4.1 ANALYSIS OF ALGORITHMS
This section under major construction.
In this section we describe a systematic method and powerful theory
for understanding the performance and resource consumption of the
program that we write.
Give an algorithm that solves a problem,
we would like to be able to predict with confidence how long it
will take, how much memory it will use, or how much of some other
scarce resource it will consume.
The following are among the reasons that we are interested
in such analyses:
- To compare different algorithms for the same task.
- To predict performance in a new environment.
- To set values of algorithm parameters.
Our approach is the scientific method, and it involves the following 5 step approach.
- Observe some feature of the universe.
- Hypothesize a model that is consistent with the observations.
- Predict events using the hypothesis.
- Verify the predictions by making further observations.
- Validate the theory by repeating the previous steps until the hypothesis agrees with the observations.
One of the key tenets of the scientific method is that the experiments we design must be reproducible, and this is certainly the case here. The main difference between applying the scientific method to the natural sciences and to the analysis of algorithms is that the "universe" we are observing is the computer itself.
Observations. Our first challenge is to determine how to make quantitative measurements of the running time or memory usage of our programs. We cannot expect to exactly determine these quantities since there are several important factors outside a given programmer's domain of influence. First, Java programs are translated into bytecode, and the bytecode is interpreted or translated into runtime code on a virtual machine (VM). The compiler, translator, and VM implementations all have an effect on which instructions on an actual machine are executed, so it can be a daunting task to figure out exactly how long even one Java statement might take to execute. In an environment where resources are being shared, even the same program can have varying performance characteristics at two different times. Second, many programs are extremely sensitive to their input data, and performance might fluctuate wildly depending on the input.
Although we cannot hope to measure the running time of our program exactly, there are a number of tools available to help us estimate these quantities. Perhaps the simplest is a stopwatch. We can simply run a program on various inputs, measure the amount of time to process each input, and create a chart of the results. To automate the process, we can use the Java profiler which reports back the running time and various other information about the performance of our program on a particular input.
To illustrate the approach, we use the program Quadratic.java. It takes a command line parameter N, reads in N long integers from standard input, stores the N values in an array, and finds the pair of elements whose sum is closest to 0.
We execute the program with the -Xprof option to obtain the following profiling or "benchmarking" information.
// read in the data int N = Integer.parseInt(args[0]); long[] a = new long[N]; for (int i = 0; i
% java -Xprof Quadratic 5000
The output contains a wealth of information. For our purposes, the most important piece of information is the number of seconds listed in the "flat profile." In this case, the profiler says our program took 3.18 seconds. Running it a second times may yield an answer of 3.28 or 3.16 since the measurement is not perfectly accurate. We repeat this experiment for different inputs of size 10,000 and also for inputs of sizes 20,000, 40,000 and 80,000. The results are summarized in the table and plot below.
Hypotheses. Despite all of the complicating factors in understanding the running times of our programs, it is often possible to create accurate models that will predict precisely how long a particular program will take, or to know that one program will do better than another in particular situations. Moreover, we can often acquire such knowledge by using a combination empirical observations and a relatively small set of mathematical tools.
Empirical analysis. Our first qualitative observation is that the input size increases, so does the running time. With additional experiments, we also observe that the running time is relatively insensitive to the input itself, only to its size. Now, our goal is to better quantify the relationship between input size and running time. One way to see the effect is to plot the running time versus input size. From this, we may hypothesize that there is a quadratic relationship between running time and input size. When we increase the size of the input by a factor of 2, the running time increases by a factor of 4. If we plot using a log-log plot, we see that the slope of the line is xxx and the intercept is yyy, so we may estimate that our program takes approximately cN2 nanoseconds to process an input of N integers.
Mathematical analysis. Fitting the observed data to a curve enables us to make predictions. However, we are not satisfied with simply understanding what is the running time as a function of the input size. We also want to understand why it is so. For example, why is the shape of the curve quadratic instead of linear or cubic? Why is it any polynomial? Now, we will try to quantify the running time from a mathematical viewpoint by simply looking at the code, without using any measurements or observations. Of course, we can not succeed completely because the running time of our program depends on the machine and compiler on which we run it. Nevertheless, in many cases, we can glean the most important information by studying the algorithm. The first step is to identify the abstract operations on which the algorithm is based in order to separate the analysis from the implementation. For example, when studying a sorting algorithm, we wish to quantify the number of comparisons and exchanges that a particular algorithm requires. In a scientific application, our primary interest might be the number of floating point operations (flops).
We now provide a mathematical analysis the program Quadratic.java. Our goal is to estimate the number of instructions that the program executes for a given input size N. There are two main parts of the program: reading in the input and find the pair whose sum is closest to zero. We focus on the latter part because it dominates the running time. (See Exercise XYZ for an analysis of the first part.)
For simplicity, we will assume that each operation (variable declaration, assignment, increment, sum, absolute value, comparison) [array access??] takes one step.
long best = Long.MAX_VALUE; for (int i = 0; i
- The first statement consists of 1 variable declaration and 1 assignment statement.
- The for loop over i: The initialization consists of 1 variable declartion (i) and 1 assignment statement (i = 0); the loop continuation condition requires N+1 comparisons (N of which evaluate to true, and one false); the increment part occurs N times.
- The for loop over j: The j loop itself is executed N times, once for each value of i. This means that we must do each of the following operations N times: declare j, compute i+1, and initialize j, for a total of 3N steps. Now we analyze the total number of times that the increment statement (j++) is execute. When i = 0 the j loop iterates N-1 times; when i = 1 the j loop iterates N-2 times; and so forth. Overall, this is N-1 + N-2 + ... + 1 = N(N-1)/2 times. This sum arises frequently in computer science because it is the number of distinct pairs of N elements. The loop continuation condition is executed once more per loop than the increment statement, so there are a total of N(N-1)/2 + N comparisons.
- The body of the j loop: The body is executed once for each distinct pair of N elements. As we've seen, this is N(N-1)/2 times. The body consists of one variable declaration, one addition, one comparison, two absolute values, and either one or two assignment statements (depending on the result of the comparison) for a total of between 6N(N-1)/2 and 7N(N-1)/2 steps.
Order of growth.
Computer scientists use order of growth notation to simplify the expressions that arise in the analysis of algorithms. Informally, the order of growth is the term that grows the fastest as N increases, ignoring the leading coefficient. For example, we determined that the double loop of Quadratic.java takes 5 + 1.5N + 4.5N2 steps. The order of growth of this program is Θ(N2). Disregarding lower order terms is justified since we are primarily interested in running times for large values of N, in which case the effect of the leading term overwhelms the smaller terms. We can partially justify disregarding the leading coefficient because it is measured in the number of steps. But we are really interested in the running time (in seconds). We really should be weighting each step by the actual time it takes to execute that type of instruction on a particular machine and with a particular compiler. Formally, this notation means that there exist constants 0 2 and bN2 for all positive integers N. We can choose a = 4.5 and b = 11 in the example above.We use order of growth notation since it is a simple but powerful model of running time. For example, if an algorithm has Θ(N2) running time, then we expect the running time to quadruple if we double the size of the problem. Order of growth is usually easier to calculate than meticulously trying to count the total number of steps. We now consider an order of growth analysis of the program Cubic.java. It takes a command line argument N, reads in N long integers, and finds the triple of values whose sum is closest to 0. Although this problem seems contrived, it is deeply related to many problem in computational geometry (see section xyz). The main computation loop is shown below.
The bottleneck is iterating over all triples of integers. There are N choose 3 = N(N-1)(N-2)/6 ways to select 3 of N integers. Thus, the order of growth is &Theta(N3) or cubic. If we double the size of the problem, we we should expect the running time to go up eightfold.
long best = Long.MAX_VALUE; for (int i = 0; i
The table below contains the most common order of growth terms. The running time of a particular program is likely to be some constant multiplied by one of these terms (the leading term) plus some smaller terms. We refer to the running time of such programs or algorithms simply as logarithmic, linear, linearithmic, quadratic, cubic, exponential, factorial, and so forth.
| Complexity | Description | Examples |
|---|---|---|
| 1 | Constant algorithm does not depend on the input size. Execute one instruction a fixed number of times |
Arithmetic operations (+, -, *, /, %)
Comparison operators (<, >, ==, !=) Variable declaration Assignment statement Invoking a method or function |
| log N | Logarithmic algorithm gets slightly slower as N grows. Whenever N doubles, the running time increases by a constant. |
Bits in binary representation of N Binary search Insert, delete into heap or BST |
| N |
Linear algorithm is optimal if you need to process N inputs. Whenever N doubles, then so does the running time. |
Iterate over N elements
Allocate array of size N Concatentate two string of length N |
| N log N |
Linearithmic algorithm scales to huge problems. Whenever N doubles, the running time more (but not much more) than doubles. |
Quicksort
Mergesort FFT |
| N2 |
Quadratic algorithm practical for use only on relatively small problems. Whenever N doubles, the running time increases fourfold. |
All pairs of N elements
Allocate N-by-N array |
| N3 |
Cubic algorithm is pratical for use on only small problems. Whenever N doubles, the running time increases eightfold. |
All triples of N elements N-by-N matrix multiplication |
| 2N |
Exponential algorithm is not usually appropriate for practical use. Whenever N doubles, the running time squares! |
Number of N-bit integers All subsets of N elements Discs moved in Towers of Hanoi |
| N! |
Factorial algorithm is worse that exponential. Whenever N increases by 1, the running time increases by a factor of N |
All permutations of N elements |
Predictions and validations. To test out our hypothesis, we can make several predictions. The total running time of a program is the sum over all instructions of the frequency × cost. The frequency is the number of times an instruction is executed. It depends on the algorithm and the input. The cost is the amount of time it takes to execute an instruction. It depends on the compiler and machine. It would be prohibitively time-consuming to try to analyze the running time of a Java program for each machine, input, and compiler combination. Donald Knuth championed an easier alternative.
- Analyze order of growth as a function of input size N.
- For medium N, run and measure the actual time on your computer.
- For large N, use (i) and (ii) to predict time.
Suppose our algorithm is too slow to run on our computer. We can predict how long it will take on a faster computer or a supercomputer. Give table showing effects of a faster computer for linear, linearithmic, qudratic, cubic, exponential, and factorial algorithm on existing 1GHz computer (say in 1 minute), on 5GHz computer on supercomputer 1,000 times as fast. Reference
Given two algorithms that solve the same problem, we want to know which one will solve our problem using fewer computational resources. The observation that one algorithm is 10 times faster than another is unlikely to escape the notice of someone who waits 3 seconds for one to finish and 30 seconds for the other, but it is easy to overlook a small constant in a mathematical analysis. Two programs might not be comparable at all: one may run much more efficiently on one particular kind of input and the second may run efficiently under other circumstances. Empirical analysis serves to validate our mathematical analyses. It also reveals when our algorithm is too slow for practical consumption and tells us to seek better algorithmic ideas.
Profiling individual instructions. We have learned how to estimate the number of instructions executed in our programs. This tells us the order of growth of our program. However, not every machine instruction or Java statement takes the same amount of time so the leading coefficient may be misleading. Program Timing.java tests the execution time of various operations including: arithmetic, exponential, and trigonemtric operations. Math.sqrt, Math.pow, Math.exp, Math.log, Math.min, Math.random. The program stores the current system time before and after the code to be timed. The function System.currentTimeMillis returns the current time in miliseconds since January 1, 1970.
From this, we see that on our system division is slower than addition and multiplication. We also see that the trigonemtric operations are substatially slower (factor of xxx) than addition. This technique measures that actual elapsed time as measured on a wall clock. If your system is not running many other applications, this can produce accurate results. However, if you are on a public cluster machine, there might be many other processes running. To get a more accurate estimate, you want the amount of CPU time that your program consumes. Also, the JIT compiler needs to get warmed up, so we disregard the first bunch of output.
long start = System.currentTimeMillis(); // CODE TO TIME GOES HERE long stop = System.currentTimeMillis(); long elapsed = (stop - start) / 1000.0; System.out.println("Elapsed time in seconds = " + elapsed);
Here is a huge list of Java performance tips.
Worst case vs. average case. Actual real-world data (words of Moby Dick), random (randomly generated integers between one and a million), or perverse (input is in reverese sorted order). Are average case bounds as good as worst case bounds? From a theoretical viewpoint, we, of course, prefer worst case bounds since they are stronger. But what about in practice? In practice, the answer is not so simple. There are obvious situations where applications demand good worst case bounds. For example, software controlling aircraft or nuclear reactors must run efficiently, and if our software fails to run as expected, we take little solace in knowing that we encountered a statistically insignificant event. There are other seemingly less critical applications where worst case running times are also important. By exploiting hash tables that don't guarantee worst-case performance, attackers can launch denial-of-service attacks by sending pathological inputs to the server and causing it to perform a linear amount of work per operation instead of logarithmic, resulting in 100 - 10,000 times performance degradation. Such algorithmic complexity attacks can be used against web sites running perl 5.8.0, the Linux 2.4.20 kernel, the Squid 2.5 web proxy cache, and the Bro intrusion detection system 0.8a20, are described in this paper. Moral: if you have a hash table into which attackers can insert arbitrary keys, you'd better be using a hash function for which they cannot easily generate collisions.
Euclid's algorithm. In Section 2.7, we considered Euclid's algorithm for computing the
static int gcd(int p, int q) { if (q == 0) return p; else return gcd(q, p % q); }
We will now analyze the running time of this algorithm and explain why
it is fast in practice. Recursive algorithms can be trickier to analyze
than straightforward nested loops since we must determine how many times
the function calls itself recursively. First we assume that p > q.
If not, then the first recursive call effectively swaps p and q.
Now, we argue that p decreases by a factor of 2 after at most 2 recursive
calls. To see this, there are two cases to consider.
If q ≤ p / 2, then the next recursive call will have p' = q ≤ p / 2
so p decreases by at least a factor of 2 after only one recursvie call.
Otherwise, if p / 2
The following table summarizes typical memory consumption of
various types on a typical 32-bit Java Virtual Machine (JVM).
Note than a boolean consumes one byte of memory
(wasting 7 of the 8 bits), whereas a boolean array
packs 8 boolean variables to the byte. Strings and arrays
require some overhead, so creating lots of tiny strings can
consume more memory than you might expect.
Increasing memory and stack space.
Sometimes you need more memory or stack space than the default
allotment.
You can increase the amount of memory alotted to Java by executing with
java -Xmx200m Hello where 200m means
200 megabytes. The default setting is typically 64MB.
You can increase the amount of stack space alotted to Java by
executing with java -Xss200k Hello where 200k
means 200 kilobytes. The default setting is typically 128KB.
It's possible to increase both the amount of memory and stack space
by executing with
executing with java -Xmx200m -Xss200k Hello.
Perspective.
Perhaps the most common mistake made in selecting an algorithm
is to pay too much attention to performance characteristics.
Improving the running time of a program by a factor of 10 is
inconsequential if the program takes only a few microseconds. Even if a
program takes a few minutes, it may not be worth the time and effort
required to make it run 10 times faster, particularly if we expect to use
the program only a few times. The total time required to
implement and debug an improved algorithm might be substantially more
than the time required simply to run a slightly slower one - we may as
well let the computer do the work. Worse, we may spend a considerable
amount of time and effort implementing ideas that should improve
a program but actually do not do so.
Perhaps the second most common mistake made in selecting an algorithm is to
ignore performance characteristics. Faster algorithms are often more
complicated than brute-force solutions, and implementors are often
willing to accept a slower algorithm to avoid having to deal with
added complexity. However, we
can sometimes reap huge savings with just a few lines of code. Users
of a surprising number of computer systems lose substantial time
waiting for simple quadratic algorithms to finish solving a problem,
even though N log N or linear algorithms are available that are only
slightly more complicated and could therefore solve the problem in a
fraction of the time. When we are dealing with huge problem sizes, we
have no choice but to seek a better algorithm, as we shall see.
Q. How long does retrieving the length of an array take?
A. Constant time - it's length is stored as a separate variable.
Q. How long do string operations take?
A. The methods length, charAt, and substring
take constant time.
The methods toLowerCase and replace take linear time.
The methods compareTo, startsWith, and indexOf
take time proportional to the number of characters needed to resolve
the answer (constant in the best case and linear in the worst case).
String concatenation takes time proporitional to the total number
of characters in the result.
The array value contains a reference to the sequence
of characters. The string that is represented consists of
the characters value[offset] through value[offset+count-1].
Q. Why does Java need so many fields to represent a string?
A. The array value contains a reference to the sequence
of characters. The string that is represented consists of
the characters value[offset] through value[offset+count-1].
The variable hash is used to cache the hash code of the
string so that it need not be computed more than once.
Java implements a string this way so that the substring
method can reuse the character array without requiring extra
memory (beyond the 24 bytes of header information).
Q. Why does allocating an array of size N take time proportional to N?
A. In Java, all array elements are automatically initialized to default
values (0, false, or null).
In principle, this could be a constant time operation if the system
defers initialization of each element
until just before the program accesses that element for the first time.
Q. Should I perform micro-optimizations? Loop-unrolling, inlining functions.
A. "Premature optimization is the root of all evil" - C.A.R. Hoare.
Micro-optimizations are very rarely useful, especially when they come
at the expense of code readability. Modern compilers are typically
much better at optimizing code than humans. In fact, hand-optimized code
can confuse the compiler and result in slower code.
Instead you should focus on using correct algorithms and data structures.
The first code fragmnent takes time proporitional to N2,
whereas the second one takes time proporitional to N.
Unfortunately, it's running
time is quadratic in the number of '/'s in the input.
By sending multiple simultaneous requests with large numbers of '/'s,
you can deluge a server and starve other processes for CPU time,
thereby creating a denial of service attack.
Fix the version of no2slash so that it runs in linear
time and does not allow for the above attack.
Devise an O(N) time algorithm to determine
whether or not a given integer x is in a Young tableaux.
Answer: Start at the upper right corner. If element = x,
done. If element > x, go left. Otherwise go down.
If you reach bottom left corner, then it's not in table.
O(N) since can go left at most N times and down at most N times.
Subset sum.
Write a program Exponential.java
that takes a command line integer N, reads in N long integer
from standard input, and finds the subset
whose sum is closest to 0. Give the order of growth of your
algorithm.
Solutions. (a) allocate a boolean array of 220 bits.
(b) copy all numbers starting with 0 to one tape and 1 to the other.
Choose leading bit of missing number to be whichever tape has
fewer entries (breaking ties arbitrarily). Recur on the smaller
half. (c) Pick twenty 20-bit integers at random and search list
sequentially to see if they're missing. There
are 48,576 missing integers, so you'll have at least a
(1 - (1 - 48576/220)20) > 0.5 chance of getting one.
If you get unlucky, repeat.
Estimating memory usage.
To estimate how much memory our program uses, we can count up
the number of variables and weight them by the number of bytes
according to their type.
In Java, a char is 2 bytes (16 bits),
an int is 4 bytes (32 bits), a double
is 8 bytes (64 bits), and a reference is 4 bytes (32 bits).
An array of N elements requires 12 bytes of header information plus
the number of bytes needed to store the N data elements.
As when estimating the running time of our programs, we are
primarily interested in the asymptotic behavior when N is large.
Thus, we can safely disregard lower order terms.
Primitive Type
Memory (bytes)
boolean
1
char
2
int
4
float
4
long
8
double
8
Type
Memory (bytes)
object reference
4
boolean[]
16 + N/8
char[]
16 + 2N
int[]
16 + 4N
double[]
16 + 8N
String
40 + 2N
Memory of user-defined objects.
The table above indicates that storing a reference to an object
uses 4 bytes of memory. This does not count the amount of memory
to store the object itself.
To determine the memory consumption of an object, we sum up
the amount of memory used by each instance variable (and any memory
that they reference). We must also account for some overhead
associated with each object, typically 8 bytes. As an example,
here is (roughly) how Sun implements a String.
A String object stores
stores three integer values (4 bytes each) plus one
reference to a character array (4 bytes). The array itself
uses 16 + 2N bytes to store N characters.
This is a total of 40 + 2N bytes after we include the 8 bytes of
overhead for any object.
To see an example, read in the first N words of Moby Dick
into an string array and count up the memory consumption.
The average word length is ....
Typically the JVM allocates memory (on the heap?)
in 8 byte blocks so a string of size 1, 2, 3, or 4
each consume the same (48 bytes) amount of memory.
public class String {
private char value[]; // 4 bytes + memory for array
private int offset; // 4 bytes
private int count; // 4 bytes
private int hash; // 4 bytes
...
}
Lessons
Q + A
Exercises
String s = "";
for (int i = 0; i
public static int f(int n) {
if (n == 0) return 1;
else return f(n-1) + f(n-1);
}
public static String method1(int n) {
String s = "a";
for (int i = 0; i
public static String method1(int N) {
if (N == 0) return "";
String temp = method1(N / 2);
if (N % 2 == 0) return temp + temp;
else return temp + temp + "x";
}
public static String method2(int N) {
String s = "";
for (int i = 0; i
long best = Long.MAX_VALUE;
for (int i = 0; i
Answer: linear time. The bottlenecks are the
array initialization and the input loop.
int N = Integer.parseInt(args[0]);
long[] a = new long[N];
for (int i = 0; i
for (int i = 0; i
for (int i = 0; i
int[] a = new int[N];
boolean[] taken = new boolean[N];
Random random = new Random();
int count = 0;
while (count
int[] a = new int[N];
Random = new Random();
for (int i = 0; i
public static String reverse(String s) {
int N = s.length();
String reverse = "";
for (int i = 0; i
public static String reverse(String s) {
int N = s.length();
if (N <= 1) return s; string left=s.substring(0, n/2); string right=s.substring(N/2, n); return reverse(right) + reverse(left); }
public static String reverse(String s) {
int N = s.length();
char[] a = new char[N];
for (int i = 0; i
public static String random(int N) {
if (N == 0) return "";
int r = (int) (26 * Math.random()); // between 0 and 25
char c = 'a' + r; // between 'a' and 'z'
return random(N/2) + c + random(N - N/2 - 1);
}
Answer: N choose 3 = N(N-1)(N-2)/3!.
int x = 0;
for (int i = 0; i
Creative Exercises
void no2slash(char *name) {
int x, y;
for(x = 0; name[x]; )
if(x && (name[x-1] == '/') && (name[x] == '/'))
for(y = x+1; name[y-1]; y++)
name[y-1] = name[y];
else x++;
}
offset = 0
for i = 2 to n do
if a[i-1] = '/' and a[i] = '/' then
offset = offset + 1
else
a[i-offset] = a[i]
end for
5 23 54 67 89
6 69 73 74 90
10 71 83 84 91
60 73 84 86 92
99 91 92 93 94
1 0 1 1 1 0 0 0
0 0 0 1 0 1 0 0
0 0 1 1 1 0 0 0
0 0 1 1 1 0 1 0
0 0 1 1 1 1 1 1
0 1 0 1 1 1 1 0
0 1 0 1 1 1 1 0
0 0 0 1 1 1 1 0
