Computer Science Building, Princeton University


Introduction to CS


1.  A Simple Machine

2.  Java Programming

3.  OOP

4.  Data Structures

    • Performance

    • Sorting

    • Linked Structures

    • Stacks and Queues

    • Multisets

    • Priority Queues

    • Symbol Tables

    • Graphs

5.  A Computing Machine

6.  Building a Computer

7.  Theory of Computation

8.  Systems

9.  Scientific Computation

10.  Perspective


 Lecture Notes

Assignments

FAQ









4.2 SORTING


This section under major construction.

Some of this material is adapated directly from Chapter 6, Algorithms in Java, 3rd edition, by Robert Sedgewick, Addison Wesley.


Sorting. The sorting problem is to rearrange a set of items in ascending order. A prime reason why sorting is so useful is that it is much easier to search for an element in a sorted list than an unsorted one. Given someone's first and last name, it is easy to find their phone number using a phone book since the entries are sorted by last name. An iPod organizes MP3 files by artist name or song title, Google display search results in descending order of "importance", a spreadsheet displays columns sorted by a particular field, Matlab sorts the real eigenvalues of a symmetric matrix in descending order. Once a list is sorted, it is also easier to perform other tasks: find the median element, the closest pair, statistical outliers, remove duplicates in a mailing list, or detect double registrations in a voting precinct. Sorting also arises as a critical subroutine in many applications that appear to have nothing to do with sorting at all including: data compression (see the Burrows-Wheeler programming assignment), computer graphics (convex hull, closest pair), computational biology (longest common substring discussed below), supply chain management (schedule jobs to minimize weighted sum of completion times), combinatorial optimization (Kruskal's algorithm), social choice and voting (Kendall's tau distance), Historically, sorting was most important for commercial applications, but sorting also plays a major role in the scientific computing infrastructure. NASA and the fluids mechanics community use sorting to study problems in rarefied flow; these collision detection problems are especially challenging since they involve ten of billions of particles and can only be solved on supercomputers in parallel. Similar sorting techniques are used in some fast N-body simulation codes. Another important scientific application of sorting is for load balancing the processors of a parallel supercomputers. Scientists rely on clever sorting algorithm to perform load-balancing on such systems. For these reasons and others, the quicksort algorithm was honored as one of top 10 algorithms for science and engineering of 20th century.

Rules of the game. We shall begin by considering methods that sort arrays of real numbers, although the same ideas apply to sorting integers, strings, dates, or any other entity where pairwise comparison is a well-defined operation. (See below.) We use two basic abstractions when sorting: less and exch. The less function takes two inputs and return true if the first input is less than the second one, and false otherwise. For floating point numbers, this function is a one-liner. The exch functions takes an array a and two indices i and j and exchanges a[i] with a[j]. This is the classic swapping idiom that we have encountered before.

static boolean less(double x, double y) {
   return (x 

Insertion sort. The method that people often use to sort bridge hands is to consider the cards one at a time, inserting each into its proper place among those already considered (keeping them sorted). In a computer implementation, we need to make space for the element being inserted by moving larger elements one position to the right, and then inserting the element into the vacated position. Program InsertionSort.java implements this method, which is known as insertion sort.

static void insertionSort(double[] a) {
   int N = a.length;
   for (int i = 0; i  0; j--)
         if (less(a[j], a[j-1]))
            exch(a, j, j-1);
         else break;
}

The program maintains the invariant that the elements to the left of the current index i are in sorted order during the sort, but they are not necessarily in their final position, as they may have to be moved to make room for smaller elements encountered later. The array is, however, fully sorted when the index reaches the right end. The figure below illustrates the method in operation on a sample file.

Insertion sort

Analysis of insertion sort. We first perform an empirical analysis. Then we validate our observations using a mathematical analysis.

Quicksort. Quicksort is a divide-and-conquer method for sorting. It works by partitioning an array of elements into two parts, then sorting the parts independently. As we shall see, the precise position of the partition depends on the initial order of the elements in the input file. The crux of the method is the partitioning process, which rearranges the array to make the following three conditions hold:

We achieve a complete sort by partitioning, then recursively applying the method to the subfiles, as depicted below.

public static void quicksort(double[] a) {
   quicksort(a, 0, a.length - 1);
}

private static void quicksort(double[] a, int left, int right) {
   if (right <= left) return; int i=partition(a, left, right); quicksort(a, left, i-1); quicksort(a, i+1, right); } 

We use the following general strategy to implement partitioning. First, we arbitrarily choose a[right] to be the partitioning element - the one that will go into its final position. Next, we scan from the left end of the array until we find an element greater than the partitioning element, and we scan from the right end of the array until we find an element less than the partitioning element. The two elements that stopped the scans are obviously out of place in the final partitioned array, so we exchange them. Continuing in this way, we ensure that no array elements to the left of the left index are greater than the partitioning element, and no array elements to the right of the right index are less than the partitioning element, as depicted in the following diagram.

Quicksort partitioning

When the scan indices cross, all that we need to do to complete the partitioning process is to exchange a[right] with the leftmost element of the right subfile (the element pointed to by the left index i). The code below implements this idea.

static int partition(double[] a, int left, int right) {
   int i = left - 1;
   int j = right;

   while(true) {
      while (less(a[++i], a[right]))   // left pointer
         if (i == right) break;

      while (less(a[right], a[--j]))   // right pointer
         if (j == left) break;

      if (i >= j) break;               // pointers cross?

      exch(a, i, j);                   // swap left and right
   }

   exch(a, i, right);                  // swap pivot
   return i;
}
This PowerPoint demo shows the results of partioning on the L in QUICKSORTISCOOL. The following illustrates the results of partioning on the rightmost E of the input the input ASORTINGEXAMPLE.

Quicksort partitioning


Program QuickSort.java takes a command line input N, reads in N strings from standard input, and quicksorts them. Here are the results of the full algorithm on the input ASORTINGEXAMPLE.


Quicksort

Analysis of quicksort. Quicksort is more difficult to analyze mathematically than the programs we've seen so far because the program is recursive and the running time can vary greatly depending on the input. Nevertheless, the performance of quicksort is well-understood. On average, quicksort uses about 2N ln N comparisons. This analysis assumes that the input has been randomly shuffled prior to sorting. If the input is degenerate (e.g., sorted or reverse sorted) then quicksort can take as many as N^2/2 comparisons.

To gain some intuition as to where the N log N terms comes from, let's assume that N is a power of 2 and that each partitioning step of quicksort divides the file exactly in half. Let T(n) denote the number of comparisions to quicksort an input of size n. Partioning a file of n elements requires n comparisons (possibly n+1 or n+2). By our optimisitc assumption T(n) = 2T(n/2) + n. Clearly T(1) = 0. Now T(2) = 2, T(4) = 8, T(8) = 24, T(16) = 64, T(32) = 160, and so forth. In general T(N) = N lg N. We can visualize this recurrence with a recursion tree. At level i (starting at i = 0), we sort k = 2^i subfiles of size N/k. Thus each level requires N comparisons. The depth of the tree is lg N. This is a total of N lg N.

N log N Divide-and-Conquer analysis

This intuition undercounts the number of comparisons because we will not get a perfectly even split.

Binary search.

Searching is a fundamental computational problem. For example, suppose we want to find the phone number of a classmate and we have access to a phone book which lists the names of each student in alphabetical order along with their phone numbers. A naive solution would be to examine each entry one at a time, and see if it matches the name of the classmate. A much more efficient solution exploits the phone book's ordering. We can open up to the middle of the phone book. If the name of the classmate appears on that page, we are done. Otherwise, we can eliminate half of the phone book from further consideration and repeat.

The recursive phone book strategy is known as binary search. Given a sorted array a and an element key, our task is to locate that element within the array, if it exists. It is based on the idea that if the elements in the table are in ascending order, we can eliminate half of them from consideration by comparing the one that we seek with the one at the middle position of the table. If it is equal, we have a successful search. If it is less, we apply the same method to the left half of the table. If it is greater, we apply the same method to the right half of the table. To implement this strategy, we maintain two indices left and right that mark the range of the array where the element might occur. If the range is empty, we return -1 to indicate an unsuccessful search. Otherwise, we examine the element mid in the middle of the range and compare it to key. Depending on the outcome, we either (i) return the index of the matching element, (ii) update the right endpoint of the range, or (iii) update the left half of the range.

public static int binarySearch(long[] a, long key) {
   int bot = 0;
   int top = a.length - 1;
   while (bot <= top) { int mid=(bot + top) / 2; if (key < a[mid]) top=mid - 1; else if (key> a[mid]) bot = mid + 1;
      else return mid;
   }
   return -1;
}
// precondition array a in ascending order
public static int binarySearch(long[] a, long key) {
   int bot = -1;
   int top = a.length;
   while (top - bot > 1) {
      int mid = (bot + top) / 2;
      if (key > a[mid]) bot = mid;
      else              top = mid;
   }
   if (a[top] == key) return  top;
   else               return -top - 1;
}

Correctness / program verification. In the early 1970s, Dijkstra championed the approach of developing a program along with a proof of correctness. Deceptively simple program and surprisingly hard to implement correctly. Bentley: asked professional programmers to code up the description of binary search in their favorite language. After a couple hours, most programmers believed they had written correct code. After 30 minutes of analysis and review, 90% had bugs. Knuth 6.2.1: first binary search algorithm was published in 1946, but first published binary search without bugs did not appear until 1962. Second version: if key appears, top is the index of the leftmost occurrence; otherwise top is the index of the leftmost element greater than key. While loop invariant says top >= bot + 2. This implies bot a[bot] , with the contention that a[-1] is -infinity and a[N] is +infinity.

Running time. Now we analyze the running time of the binary seach algorithm. We choose as our basic operation the number of key comparisons. Each time through the while loop, we decrease the length of the interval by roughly a factor of 2. If the array has N elements, then after one iteration the range consists of roughly N/2 elements, after two iterations it is roughly N/4, and after three it is roughly N/8. We say "roughly" because the range actually decreases from n to n/2 where we are using integer division. This repeated shrinking can happen at most log N times, since this is the number of iterations it takes to drive N down to 1. Since we do only a constant amount of work in the body of the while loop, we obtain a running time of Θ(log N). This analysis implies that we can search truly huge data sets. If N = 1 million, we need only 20 comparisons. If N = 1 billion, we need only 30. If N = 1 trillion, we need only 40.

Sorting generic objects. In many applications, we are interested in sorting non-numeric data, including strings and dates. We assume that we can compare two elements to see whether the first is bigger than, smaller than, or equal to the second. To help enforce this assumption, we use Java's Comparable interface and engineer our sort method so that it sorts arrays of Comparable objects.

public interface Comparable {
   public int compareTo(Object b);

By implementing the Comparable interface, the data type promises to implement a method compareTo so that that a.compareTo(b) returns a negative integer if a b, and zero if they are equal. Although the compiler does not check, the implementation of compareTo must be consistent, e.g, if a > b, then b compareTo method should throw an exception if a and b are of incompatible types or one is null. Many of the built-in Java types implement this interface, including String, Integer, File, URI, BigInteger, and Date. It is also easy to implement the interface with user-defined types. Here it is for our rational number ADT (assuming values are nonnegative). [In Java 1.5, it suffices to only write compareTo(Rational b).]

public class Rational implements Comparable {
   private long num, den;
   ...
   public int compareTo(Object o) {
       Rational a = this;
       Rational b = (Rational) o;
       if      (a.num * b.den  a.den * b.num) return  1;
       else                                    return  0;
   }
}

We use two basic abstractions when sorting: less and exch. The less function takes two inputs and return true if the first input is "less" than the second one, and false otherwise. This function is easy to code since the elements come with a compareTo method. The exch functions takes an array a and two indices i and j and swap a[i] and a[j]. This is the classic swapping idiom that we have encountered before.

static boolean less(Comparable x, Comparable y) {
   return (x.compareTo(y) <0); } static void exch(comparable[] a, int i, int j) { comparable swap=a[i]; a[i]=a[j]; a[j]=swap; } 

Using these abstractions, the body of insertion sort is identical to the code for sorting integers. The only difference is that the function takes an array of type Comparable instead of int.

static void insertionSort(Comparable[] a) {
   int N = a.length;
   for (int i = 0; i  0; j--)
         if (less(a[j], a[j-1]))
            exch(a, j, j-1);
         else break;
}
We can also run this program on the built-in String data type since it implements the Comparable interface. We use the data file dickens.txt (from Project Gutenberg). The file contains most of the writing of Charles Dickens and comprises over 5.5 million words and 31 million characters. We will see how long our program takes to read in the first N words and sort them.

Java library. The Java library Arrays contains static methods for manipulating arrays, including sorting and binary search, but not shuffling or reversing. The function Arrays.sort is applicable when sorting primitive types or objects. The algorithm for sorting strings (and other objects) is a tuned version of mergesort. It is guaranteed to be stable and run in time proportional to N log N. Program SystemSort.java illustrates how to use it. The algorithm for primitive types is a variant of quicksort developed by Bentley and McIlroy. It is extremely efficient for most inputs that arise in practice, including inputs that are already sorted. However, using a clever techinique described by M. D. McIlroy in A Killer Adversary for Quicksort, it is possible to construct pathological inputs that make the system sort run in quadratic time. Even worse, it is blows the function call stack. To see Java's sorting library break, here are some killer inputs of varying sizes: 10,000, 20,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000. You can test them out using the program IntegerSort.java which takes a command line input N, reads in N integers from standard input, and sorts them using the system sort.

Stock market symbols. Program MarketSymbol.java reads in a database of market symbols associated with equities listed on the NYSE, NASDAQ, and other exchanges. Then it prompts the user to enter market symbols and prints out the corresponding equity using binary search. It uses the file mktsymbols.txt, which contains 127,482 entries. The data file stores the entries in alphabetical order, sorted by symbol name.

Longest repeated substring. An application of sorting to computational biology and plagiarism detection. Use system sort to suffix sort an input string and find longest repeated substring. Applications: computational biology, plagiarism detection, SCO lawsuit that Linux contains copies of their code. Program LRS.java uses String.substring to build the array of substrings and Arrays.sort to sort them. In Java, this is not terribly inefficient since each call to substring reuses the characters from the original String. However, it uses a substantial amount of memory because each of the N strings formed uses 16 bytes (the character array itself is reused and consumes 12 + 2N bytes). Exercise XYZ suggests a more memory efficient solution.

Q+A

Q. Why doesn't the Java library use a randomized version of quicksort?

A. Good question. At the very least, the library should cutoff to some guaranteed N log N algorithm if it "realizes" it is in trouble. Programmers may want their libraries to be deterministic for debugging. But the library only uses quicksort for primitive types when stability is not an issue, so the programmer probably wouldn't notice the randomness, except in running time.

Exercises

  1. What would happen if we replaced <= below with== in the quicksort function?
    if (right <= left) return; 

  2. What would happen if we replaced the following code with an empty statement in the partitioning subroutine?
    if (i == right) break;
    

    Answer: still correct (because partitioning element acts as a sentinel) and a little faster.

  3. What would happen if we replaced the following code with an empty statement in the partitioning subroutine?
    if (j == left) break;
    

    Answer: array out-of-bounds possible if partitioning element is smallest.

  4. What would happen if we replaced the >= with == in the partitioning subroutine?
    if (i >= j) break;
    

  5. What would happen if we replaced i in the following code with j in the partitioning subroutine?
    exch(a, i, right);
    

  6. Consider the following implementation of the compareTo method for String. How does the third line help with efficiency?
    public int compareTo(Object x) {
       String s = this;
       String t = (String) x;
       if (s == t) return 0;
       int n = Math.min(s.length(), t.length());
       for (int i = 0; i  t.charAt(i)) return +1;
       }
       return s.length() - t.length();
    }
    

    Answer: avoid directly comparing all n characters if s and t are references to the same string.

  7. Given a sorted list of N integers and a target integer x, determine in O(N) time whether there are any two that sum to exactly x.
  8. Let f be a monotonically increasing function with f(0) <0 and f(n)> 0. Find the smallest integer i such that f(i) > 0. Devise an algorithm that makes O(log N) calls to f().
  9. Give an O(N log N) algorithm for computing the median of a sequence of N integers. Hint: sort first.
  10. Given two sorted lists of size N1 and N2, find the median of all elements in O(log N) time where N = N1 + N2.
  11. Give an O(N log N) algorithm for computing the mode (value that occurs most frequently) of a sequence of N integers.
  12. Give a O(N) algorithm for sorting a sequence of N integers that are between 0 and 99.
  13. Give a O(N) algorithm for computing the median of a sequence of N integers that are between 0 to 99.
  14. Given a sequence of N real numbers, find the pair of integers that are closest in value. Give a O(N log N) algorithm.
  15. Given a sequence of N real numbers, find the pair of integers that are farthest apart in value. Give a O(N) algorithm.
  16. Element uniqueness. Give an array of N long integers, devise an O(N log N) algorithm to determine if any two are equal.
  17. Give three lists of N names each, devise an O(N log N) algorithm to determine if there is any name common to all three lists, and if so, return the first such name.
  18. Give a sorted array of N elements, possibly with duplicates, find the index of the first and last occurrence of k in O(log N) time.
  19. Write a function removeDuplicates that takes an array of strings as input and returns a new array of strings with all duplicates removed.
    public static String removeDuplicates(String[] duplicates) {
       Arrays.sort(duplicates);
       int N = 0;
       for (int i = 0; i 
  20. Given an array of N integers, the 2-sum problem is to find a pair of integers whose sum is closest to zero. Describe an O(N log N) algorithm for the problem. Solution: sort by absolute value - the best pair is now adjacent.
  21. Given an array of N integers, the 3-sum problem is to determine find three integers whose sum is closest to zero. Design an algorithm that uses O(N2 log N) time and O(N) space. Hint: sort and binary search.
  22. Design an algorithm for the 3-sum problem that runs in O(N2) time and O(N) space. Hint: sort and clever search.

Creative Exercises

  1. Floor and ceiling. Given a set of comparable elements, the ceiling of x is the smallest element in the set greater than or equal to x, and the floor is the largest element less than or equal to x. Suppose you have an array of N elements in ascending order. Give an O(log N) algorithm to find the floor and ceiling of x.
  2. Union of intervals. Given N intervals on the real line, determine the length of their union in O(N log N) time. For example the union of the four intervals [1, 3], [2, 4.5], [6, 9], and [7, 8] is 6.5.
  3. Coffee can problem. (David Gries). Suppose you have a coffee can which contains an unknown number of black beans and an unknown number of white beans. Repeat the following process until exactly one bean remains: Select two beans from the can at random. If they are both the same color, throw them both out, but insert another black bean. If they are different colors, throw the black one away, but return the white one. Prove that this process terminates with exactly one bean left. What can you deduce about the color of the last bean as a function of the initial number of black and white beans? Hint: find a useful invariant maintained by the process.
  4. Spam campaign. To initiate an illegal spam campaign, you have a list of email addresses from various domains (the part of the email address that follows the @ symbol). To better forge the return addresses, you want to send the email from another user at the same domain. For example, you might want to forge an email from wayne@princeton.edu to rs@princeton.edu. How would you process the email list to make this an efficient task?
  5. Order statistics. Given an array of N elements, not necessarily in ascending order, devised an algorithm to find the kth largest one. It should run in O(N) time on random inputs.
  6. Kendall's tau distance. Given two permutations, Kendall's tau distance is the number of pairs out of position. "Bubblesort metric." Give an O(N log N) algorithm to compute the Kendall tau distance between two permutations of size N. Useful in top-k lists, social choice and voting theorey, comparing genes using expression profiles, and ranking search engine results.
  7. Antipodal points. Given N points on a circle, centered at the origin, design an algorithm that determines whether there are two points that are antipodal, i.e., the line connecting the two points goes through the origin. Your algorithm should run in time proportional to N log N.
  8. Antipodal points. Repeat the previous question, but assume the points are given in clockwise order. Your algorithm should run in time proportional to N.
  9. Identity. Given an array a of N distinct integers (positive or negative) in ascending order. Devise an algorithm to find an index i such that a[i] = i if such an index exists. Hint: binary search.
  10. Majority. Given an array of N strings. An element is a majority if it appears more than N/2 times. Devise an algorithm to identify the majority if it exists. Your algorithm should run in linearathmic time.
  11. Majority. Repeat the previous exercise, but this time your algorithm should run in linear time, and only use a constant amount of extra space. Moreover, you may only compare elements for equality, not for lexicographic order.
  12. Bogosort. Bogosort is a randomized algorithm that works by throwing the N cards up in the air, collecting them, and checking whether they wound up in increasing order. If they didn't, repeat until they do. Implemenet bogosort using the shuffling algorithm from Section 2.5. Estimate the running time as a function of N.
  13. Sort by reverse domain. Write a program to read in a list of domain names from standard input, and print the reverse domains in sorted order. For example, the reverse domain of cs.princeton.edu is edu.princeton.cs. This is useful for web log analysis.
  14. L1 norm. There are N circuit elements in the plane. You need to run a special wire (parallel to the x-axis) across the circuit. Each circuit element must be connected to the special wire. Where should you put the special wire? Hint: median minimizes L1 norm.
  15. Finding common elements. Given two arrays of N 64-bit integers, design an algorithm to print out all elements that appear in both lists. The output should be in sorted order. Your algorithm should run in N log N. Hint: mergesort, mergesort, merge. Remark: not possible to do better than N log N in comparsion based model.
  16. Finding common elements. Repeat the above exercise but assume the first array has M integers and the second has N integers where M is much less than N. Give an algorithm that runs in N log M time. Hint: sort and binary search.
  17. Prefix free codes. In data compression, a set of binary code words is prefix-free if no string is a prefix of another. For example, { 01, 10, 0010, 1111 } is prefix free, but { 01, 10, 0010, 1010 } is not because 10 is a prefix of 1010. Design an efficient algorithm to determine if a set of binary code words is prefix-free. Hint: sort.
  18. Word frequencies. Write a program Frequency.java that reads in a sorted list of strings and prints out the number of times each string appears in descending order. Used in conjunction with QuickSort.java, you can process an unsorted text file and print out the number of times each string appears with the command
    java Quicksort 
  19. Nuts and bolts. (Reference.) You have a mixed pile of N nuts and N bolts and need to quickly find the corresponding pairs of nuts and bolts. Each nut matches exactly one bolt, and each bolt matches exactly one nut. By fitting a nut and bolt together, you can see which is bigger. But it is not possible to directly compare two nuts or two bolts. Given an efficient method for solving the problem. Hint: customize quicksort to the problem.
  20. Anagrams. Design a O(N log N) algorithm to read in a list of words and print out all anagrams. For example, the strings "comedian" and "demoniac" are anagrams of each other. Assume there are N words and each word contains at most 20 letters. Designing a O(N^2) algorithms should not be too difficult, but getting it down to O(N log N) requires some cleverness.
  21. Pattern recognition. Given a list of N points in the plane, find all subset of 3 or more points that are collinear.
  22. Pattern recognition. Given a list of N points in the plane in general position (no three are collinear), find a new point p that is not collinear with any pair of the N original points.
  23. Search in a sorted, rotated list. Given a sorted list of N integers that has been rotated an unknown number of positions, e.g., 15 36 1 7 12 13 14, design an O(log N) algorithm to determine if a given integer is in the list.
  24. Counting inversions. Each user ranks N songs in order of preference. Given a preference list, find the user with the closest preferences. Measure "closest" according to the number of inversions. Devise an N log N algorithm for the problem.
  25. Throwing cats from an N-story building. Suppose that you have an N story building and a bunch of cats. Suppose also that a cat dies if it is thrown off floor F or higher, and lives otherwise. Devise a strategy to determine the floor F, while killing O(log N) cats.
  26. Throwing cats from a building. Repeat the previous exercise, but devise a strategy that kills O(log F) cats. Hint: repeated doubling and binary search.
  27. Throwing two cats from an N-story building. Repeat the previous question, but now assume you only have two cats. Now your goal is to minimize the number of throws. Devise a strategy to determine F that involves throwing cats O(√N) times (before killing them both). This application might occur in practice if search hits (cat surviving fall) are much cheaper than misses (cat dying).
  28. Throwing two cats from a building. Repeat the previous question, but only throw O(√F) cats. Reference: ???.
  29. Nearly sorted. Given an array of N elements, each which is at most k positions from its target position, devise an algorithm that sorts in O(N log k) time.

    Solution 1: divide the file into N/k pieces of size k, and sort each piece in O(k log k) time, say using mergesort. Note that this preserves the property that no element is more than k elements out of position. Now, merge each blocks of k elements with the block to its left.

    Solution 2: insert the first k elements into a binary heap. Insert the next element from the array into the heap, and delete the minimum element from the heap. Repeat.

  30. Merging k sorted lists. Suppose you have k sorted lists with a total of N elements. Give an O(N log k) algorithm to produce a sorted list of all N elements.
  31. Dutch national flag problem. Given an array a[] of elements and a partitioning element p, rearrange it into three pieces: those with keys less than a[p], those with keys equal to a[p], and those with keys greater than a[p]. Do it in-place, i.e., with at most a constant amount of extra memory. Application: 3-way quicksort. Prove that your algorithm is correct in all cases by identifying loop invariants.
  32. Longest common substring. Given two strings s and t, find the longest substring that appears in both. Hint: concatenate the two strings together, separated by a special character that doesn't appear in either string, say '\0'. Then, suffix sort, and look for the longest substring in two adjacent words, one that comes from s, one from t.
  33. Longest common reverse complemented substring. Given two DNA strings, find the longest substring that appears in one, and whose reverse Watson-Crick complement appears in the other. Two strings s and t are reverse complements if t is the reverse of s except with the following substitutions A<->T, C<->G. For example ATTTCGG and CCGAAAT are reverse complements of each other. Hint: suffix sort.
  34. Circular string linearization. Plasmids contain DNA in a circular molecule instead of a linear one. To facilitate search in a database of DNA strings, we need a place to break it up to form a linear string. A natural choice is the place that leaves the lexicographically smallest string. Devise an algorithm to compute this canonical representation of the circular string Hint: suffix sort.
  35. Burrows-Wheeler transform. Read in a text string and encodes it using the Burrows Wheeler transform. Another application of suffix sorting.
  36. Find all matches. Given a text string, find all matches of the query string. Hint: combine suffix sorting and binary search.
  37. Longest repeated substring with less memory. Instead of using an array of substrings where suffixes[i] referes to the ith sorted suffix, maintain an array of integers so that index[i] referes to the offset of the ith sorted suffix. To compare the substrings represented by a = index[i] and b = index[j], compare the character s.charAt(a) against s.charAt(b), s.charAt(a+1) against s.charAt(b+1), and so forth. How much memory do you save? Is your program faster?
  38. Idle time. Suppose that a parallel machine processes n jobs. Job j is processed from sj to tj. Given the list of start and finish times, find the largest interval where the machine is idle. Find the largest interval where the machine is non-idle.
  39. Local minimum of an array. Given an array a of N distinct integers, design an O(log N) algorithm to find a local minimum: an index i such that a[i-1]
  40. Local minimum of a matrix. Given an N-by-N array a of N2 distinct integers, design an O(N) algorithm to find a local minimum: an pair of indices i and j such that a[i][j]
  41. Bitonic search. An array is bitonic if it is comprised of an increasing sequence of integers followed immediately by a decreasing sequence of integers. Given a bitonic array a of N distinct integers, describe how to determine whether a given integer is in the array in O(log N) steps. Hint: find the local maximum, then binary search in each piece.
  42. Monotone 2d array. Give an n-by-n array of elements such that each row is in ascending order and each column is in ascendeing order, devise an O(n) algorithm to determine if a given element x in the array. You may assume all elements in the n-by-n array are distinct.
  43. Maxima. Given a set of n points in the plane, point (xi, yi) dominates (xj, yj) if xi > xj and yi > yj. A maxima is a point that is not dominatese by any other point in the set. Devise an O(n log n) algorithm to find all maxima. Application: on x-axis is space efficiency, on y-axis is time efficiency. Maxima are useful algorithms. Hint: sort in ascending order according to x-coordinate; scan from right to left, recording the highest y-value seen so far, and mark these as maxima.
  44. Point in a convex polygon. Write a program that can test whether points fall within a convex polygon, using linear preprocessing time and logarithmic test time per point. First, read in the points which define the polygon. Assume that these appear in the standard format, the points appearing clockwise around the polygon with the point with lowest y coordinate first. Note that this implies that the y coordinates of the points increase, then decrease. During input, remember which point is at the "top" (where the y coordinates start to go down). This divides the polygon into a "right" part (increasing y coordinates) and a "left" part (decreasing y coordinates). In summary, after input, you have the points of the polygon in an array, and you have an index into the array which tells where the right part ends and the left part begins.

    Next, read in some test points, testing, one at a time, whether or not each is inside the polygon, using the following method. If a horizontal line were drawn through the given test point, it would hit one line from the left part of the polygon and one line from the right part of the polygon. The point is inside the polygon if and only if it falls between those two lines. But from the way that the polygon is stored (the y coordinates of each part are sorted), the two lines can be found in logarithmic time using binary search. After the two lines are found, it is a simple matter to test whether the point falls between them.

  45. Largest empty rectangle. Given a set of n points in the plane, with coordinates between 0 and 1,000, find the area of the largest rectangle that contains no points.
  46. Largest area under a histogram. Given a histogram of n points 1..n. The height of point i is hi. Find the largest axis-aligned rectangle that fits under the histogram. O(n log n) is good. O(n) is also possible.
  47. Helipad landing. Given an n-by-n grid of integer altitudes values, find the largest axis-aligned rectangle that contains values of all the same height. Reference: ???
  48. Compound words. Read in a list of words from standard input, and print out all two-word compound words. If after, thought, and afterthought are in the list, then afterthought is a compound word. Note: the components in the compound word need not have the same length.