4. FUNDAMENTAL ABSTRACT DATA TYPES
This chapter under major construction.
Overview. In this chapter we describe and implement some of the most important algorithms and data structures in use on computers today. We begin by considering a powerful framework for measuring and analyzing the efficiency of our programs. This enables us to compare algorithms and accurately predict performance. Next, we consider several novel algorithms for the classic problem of sorting. In preparation for various data structures on collections of objects, we describe how to create linked structures in Java, including linked lists and binary trees. Then, we use indexed and linked structures to build the most important higher level data structures, including stacks and queues (using arrays or linked lists), multisets (using arrays or linked lists), priority queues (using a binary heap), symbol tables (using hash tables or binary search trees), and graphs (using adjacency matrices or adjacency lists). We will learn how to implement the core operations and understand their performance characteristics. Then, we will learn how to access and use industrial strength versions of these data structures that comprise the Java Collections library. We will also encounter situations where the built-in libraries do not provide suitable APIs, and we must craft our own linked data structures.
Need for fast algorithms. According to Moore's Law, the number of transistors on a chip doubles every 18-24 months. As a result, computer get twice as fast in the same time frame. So, do we really need fast algorithms? Paradoxically, the answer is yes, and the faster computers get, the importance of fast algorithms becomes even more pronounced. One example is laying out circuits. The size of the problem doubles every 18-24 months as well....
The discovery of efficient algorithms for many problems has direct impact on our everyday lives. It enables new technology and new scientific discoveries. In section XYZ, we will consider the FFT, which has had major impact in the digital revolution. Another striking example is the problem of N-body simulation in stellar and molecular dynamics. The brute force solution requires computing N^2 different pairwise interactions. While an undergraduate at Princeton University, Andrew Appel discovered a more efficient method that takes time proportional to N log N. This discovery enabled physicists to perform research that would be otherwise impossible. Ab initio quantum mechanical simulations in materials science (N^3 to N). Image rendering, simulation, cryptography, and compilation are other areas that inherently eat up as many computer cycles as we can throw at the problem. When Pixar started in 1985, it took 8 hours to render one frame (1/24th of a second) of computer animation. Now, even though computers are XYZ times faster, it took 10 hours to render one frame of Finding Nemo on Pixar's 2,000 processor render farm because the artwork in each frame has become that much more complex. The rate at which you have to process video images depends on the rate at which another computer can generate them. According to this article, creating the movie The Hulk consumed more than 2.5 million computer hours.
Computational scientists have an insatiable thirst for CPU cycles. Particle physicists want to simulate as many particles as their supercomputer will allow since the results are more accurate if there are more particles. It is not uncommon to simulate billions or trillions of particles. As we'll see in Section 10.4 modern cryptography relies on the tension between factoring big integer and testing whether a big integer is prime. As computers have become more powerful, we need to use bigger and bigger integers to thwart potential adversaries from breaking our cryptosystems. Compiling large programming projects (e.g., Java or Microsoft Windows) can takes hours or days. As computers have become more powerful, programmers have developed more sophisticated programs which require more and more time to compile. Many scientific applications require years of CPU computing time (protein folding, climate modeling, N-body simulation). Plasma physicists often measure the time to perform one simulation in the tens or hundreds of thousands of CPU hours.
For a more advanced treatment of these topics, see
Algorithms in Java, 3rd edition by Robert Sedgewick, Addison Wesley.
Some of the code and figures in this section are directly
adapted from this source.
