Computer Science Building, Princeton University


Introduction to CS


1.  A Simple Machine

2.  Java Programming

3.  OOP

    • Using Data Types

    • Creating Data Types

    • Modular Programming

    • Encapsulation

    • Inheritance

4.  Data Structures

5.  A Computing Machine

6.  Building a Computer

7.  Theory of Computation

8.  Systems

9.  Scientific Computation

10.  Perspective


 Lecture Notes

Assignments

FAQ









3.4. ENCAPSULATION AND ABSTRACT DATA TYPES


This section under major construction.

Developing abstract models for our data and for the ways in which our programs process those data is an essential ingredient in the process of solving problems with a computer. All computer systems are based on layers of abstraction: We adopt the abstract model of a bit that can take on a binary 0-1 value from certain physical properties of silicon and other materials; then we adopt the abstract model of a machine from dynamic properties of the values of a certain set of bits; then we adopt the abstract model of a programming language that we realize by controlling the machine with a machine-language program; then we adopt the abstract notion of an algorithm implemented as a Java program. We do not need to understand how each layer works in order to use it, only what it does. This approach has two main advantages. First, we can write useful Java programs even though we may not fully understand the physical properties of silicon. Seconds, if someone invents a better way to represent a bit that relies on light instead of silicon, we can simply change the computer, but reuse all of our Java programs. In this section, we consider how to adapt the basic Java class mechanism to achieve this same kind of separation in layers of abstraction built with Java code.

Abstract data types.

Designing reusable data types requires great care. Obviously we expect our data types to work according to their published APIs. The next most important design feature is separating the internal representation from the API, hiding all of the implementation details. An abstract data type (ADT) is a set of values and a set of operations on those values that is accessed only through an interface. The interface does not specify the way the values are represented or the way the operations are implemented. It only specifies which operations are permissible. "A nut and bolt are compatible if they have the same diameter and threads per inch." This is the interface. However, "they may be made of carbon steel, steel, bronze, nylon, titanium, whatever." We refer to a program that uses an ADT as a client and a program that specifies the data type as an implementation. The interface is a contract between the client and the implementation which specifies which operations are supported and what is the required behavior. Encapsulation is the process of separating and compartmentalizing the client from the implementation by hiding the internal representation of the data. By prohibiting clients to access directly the data representation, we are free to change it! Using ADTs enables us to: For example, although we've been using strings since our first program, we may have no idea how Java implements them. This is perfectly fine because we haven't had a pressing need to know their internal representation. However, there are times when we must be aware of the performance guarantees of individual operations. These should be documented in the interface since we should not be expected to glean them from the internal representation.

Zip codes and time bombs.

There are many examples of programmers making assumptions about the internal representation of a data type, and later having to pay a big price to fix their mistakes.

Intuition for standards. It may be easier to understand why writing to an interface or spec is so important by considering other domains. Fax machines, radio, MPEG-4, MP3 files, PDF files, HTML, etc. Simpler to use a common standard. Lack of incompatibilities enables business opportunities that would otherwise be impossible. Connection with patents? There is a tension between software patents and software standards. A patent grants a temporary monopoly, whereas a standard is meant to be used by all. An interesting article about the development of the MPEG standards.

One of the challenges of writing software is making it portable so that it works on a variety of operating systems including Windows, OS X, and Linux. Java Virtual Machine enables portability of Java across platforms.

ADTs in Java. Java's class mechanism directly supports the design of ADTs. We enforce data hiding by using the private and public access modifiers. When we declare an instance variable as private, this means that the client (code written in another module) cannot directly access that instance variable. The client can only access the ADT through the public methods and constructors. All of the data type that we have built use only private instance variables, so they are already ADTs. As a general rule, always use private as the access modifier for your instance variables. If you use public then you will greatly limit any opportunity to modify the class over time. Client programs may rely on your public variable in thousands of places, and you will not be able to remove it without breaking dependent code.

Date.

Give two implementations of a date ADT. For simplicity, we ignore leap years. See Exercise XYZ to deal with leap years. Include a constructor, toString, next, after. Program Date.java represents a date using three integers for the month, day, and year. Program Date2.java represents a date a single integer that counts the number of days since January 1, 1970. By accessing the data only through the interface, our data type can perform range checking (e.g., no such thing as January 0 or February 30). This enable our data type to maintain objects that are always in a consistent state and preserve data type invariants. The time invested in performing routine error-checking more than pays off as we start writing more complex programs. Using encapsulation enables us to isolate to a few places where an object can change state.

Generating random numbers.

Different methods to generate a random number from the standard Gaussian distribution. We have seen one method in Section xyz. Trigonometric method is simple, but may be slow due to calling several transcendental functions. More importantly, it suffers from numerical stability problems when x1 is close to 0. Better method is alternate form of Box-Muller method. reference. Both methods require two values from a uniform distribution and produce two values from the Gaussian distribution with mean 0 and standard deviation 1. Can save work by remembering the second value for the next call. Can also use Random in Java's library. Their implementation is the polar method of Box-Muller, saving the second random number for a subsequent call. (See Knuth, ACP, Section 3.4.1 Algorithm C.)

ADT for generating a random bit: LFSR.

Statistics.

Encapsulation enables us to easily replace one implementation with another. This is desirable when we anticipate creating an improved implementation with better performance or accuracy properties. We consider a simple, but representative example below. Our goal is to store statistics for a set of real values, say the sample mean x bar and the sample variance s2.

mean and variance

We want our data type to support the following methods: add, size, mean, stddev, variance. OnePass.java stores n, the sum, and the sum of squares. TwoPass.java stores n, the sum, and all of the data values in an array. The one pass algorithm is faster and uses substantially less space, but is susceptible to roundoff error. StableOnePass.java is a well-engineered alternative.

stable one pass algorithm for sample mean and variance

It is numerically stable and does not require an array to store the elements.

public void add(double x) {
   n++;
   s = s + 1.0 * (n-1) / n * (x - m) * (x - m);
   m = m + (x - m) / n;
}

public double mean()     { return m;            }
public double variance() { return s / (n - 1);  }

Complex numbers.

In Section 3.2, we considered program Complex.java, which is an ADT for complex numbers. Internally, it stores a complex number c = x + iy as its real and imaginary parts (x, y). This is called the rectangular representation. We can also represent a complex number as c = r (cos θ + i sin θ). This is known as the polar representation, and we refer to r as the modulus and θ as the angle. This representation is especially useful if we are multiplying or dividing complex numbers. Program PolarComplex.java implements the exact same complex number ADT as Complex.java, but stores the polar representation (r, θ) instead of the the rectangular representation (x, y). Since it has exactly the same public methods and public constructor as the version with rectangular coordinates, we can essentially use it interchangeably with the other version of Complex.java. The main difference to the client is in different performance properties. For example, implementing an exponentiation method becomes much easier with the polar representation, especially if the power is a negative integer or if the power itself is complex! Other operations become more cumbersome to implement and expensive to compute, e.g., addition and subtraction. There is one minor caveat that makes these two implementations not completely interchangeable: floating point precision may influence the two implementations in different ways. For example, the rectangular implementation represents 3 + 4i exactly, whereas the polar implementation must store an approximation to the angle θ since it is an irrational number. Such difference should not effect a well-behaved client, but we should be aware of any places where our ADT leaks information.

Signals and waveforms.

Measure sound in time domain (represent as fluctuations in air pressure over time) or frequency domain (represent as freqencies and amplitudes). Basilar membrane of human ear converts from time domain to frequency domain for processing my brain. Explains why we "hear" a collection of frequencies played simultaneously as separate notes. Most common representation = time domain, but most signal analysis techniques are designed to work in frequency domain. (discard phase information) In section XYZ, we'll learn about an elegant algorithm called the Fast Fourier Transform which converts between time and frequency domains.

Polynomials.

Data structure representations of degree N polynomial with T terms. Perhaps the most natural representation is to explicitly store the coefficients and exponents. There are equivalent mathematical representations.

More scientific applications.

In Section 9.2 we implement a data type to represent polynomial functions, and implement classical operations including addition, multiplication, composition, evaluation, integration, and differentiation. In Section 9.5 we implement a data type to represent real-valued matrices, and implement classical operations including addition, multiplication, and solving Ax = b. For both data types, there are two natural representations, one for sparse polynomials and matrices, and one for dense polynomials and matrices. Performance differences are enormous. For example, airline crew scheduling problems might involves ten of millions of rows and a hundred thousand columns. Solving problems of this size is a formidable challenge, and it could not even be attempted without exploiting sparsity.

First class ADTs. A first class ADT is an ADT for which you can create as many instances of the data type as desired. In.java is an object-oriented version of StdIn.java. It supports reading input from standard input, from a file, from a web site, or from a network socket. Unlike StdIn.java, you can open up several input streams in the same program. Similarly Draw.java is an object-oriented version of StdDraw.java. You can create and display several independent drawing windows.

Interface. To see all of the non-private members of a class, type

javap Complex.class


Q + A

Q. How would I add a constructor to the Complex data type that takes r and θ as arguments?

A. It's a problem since you already have one constructor that takes two real arguments. Instead, you should probably create two methods createRect(x, y) and createPolar(r, theta) that create and return new objects. This style will also help remind the client to pass in arguments of the correct form.

Q. The instance variable x in type Complex is declared with access modifier private. How come in the plus method I can access both this.x and b.x?

A. Declaring an instance variable to be private means that it is not directly accessible from any other class. Methods within the Complex class can access (read or write) the instance variables of any object in that class. Perhaps, Java should include a more restrictive access modifier, say superprivate, that makes an instance variable only accessible to the invoking object.

Q. If I don't specify an access modifier, what's the default?

A. It's something between public an private.

Q. Is it possible to defeat Java's type safety features?

A. If there are no bugs in the Java implementation, an untrusted program should not have access to the private data of a trusted program. This guarantee assumes that "the computer faithfully executes its specified instruction set." If a cosmic ray strikes the computer's memory, it can flip a bit, thereby invalidating this axiom. Appel and Govindavajhala showed how to craft a program so that when executed, any single bit error would yield a 70% chance of the program being have to gain unfettered access to the JVM. They also demonstrated that such errors could be induced by shining a 50 watt spotlight bulb at the memory for one minute. Smartcards are particularly susceptible since the attacker has physical access to the computer.

Exercises

  1. Add a method power to Complex so that a.power(b) returns the result of taking the complex number a and taking it to the complex power b.
  2. What is the principle value of ii?

    Answer: e-Π/2 = 0.207879576....

  3. Create a Rectangle ADT that represents a rectangle. Represent a rectangle by two points. Include a constructor, a toString method, a method for computing the area, and a method for drawing using our graphics library.
  4. Repeat the previous exercise, but this time represent a Rectangle as the lower left endpoint and the width and height.
  5. Repeat the previous exercise, but this time represent a Rectangle as the center and the width and height.

Creative Exercises

  1. International bank account (depositDollars, depositEuros, withdrawDollars, withdrawEuros). How to represent money (cents, BigDecimal, precision, etc.)
  2. Color. Take linear combinations of colors. (RGB vs. CMYK)
  3. Picture. 2D array of colors, 2D array of rgba ints.
  4. Turtle location starting at origin (goForward, rotate, draw) Use polar and cartesian coordinates.
  5. Polar representation of points. Point.java and PointPolar.java implement the following point interface using rectangular and polar coordinates, respectively.
    Point()
    Point(double, double)
    double x()
    double y()
    double r()
    double theta()
    double distance(Point)
    public String toString()
    
  6. Spherical coordinates. Represent a point in 3D space using Cartesian coordinates (x, y, z) or spherical coordinates (r, θ φ). To convert from one to the other, use

    spherical coordinates
  7. Colors. Can represent in RGB, CMYK, or HSV formats. Natural to have different implementations of same interface.
  8. Time. Implement an ADT that represents the time of day, say with seconds, minutes, and hours. Representation 1: three fields for hours, minues, and seconds. Representation 2: number of seconds since 12 midnight. Include compareTo, plus.
  9. ZIP codes. Implement an ADT that represents a USPS ZIP code. Support both the original 5 digit format and the newer (but optional) ZIP+4 format.
  10. Vector fields. Write a data type for force vectors in two dimensions. You should be able to add vectors using the usual rules from physics using x sin(theta) and x cos (theta)....
  11. Genome. Implement a data type to store the genome of an organism. Biologists often abstract away the genome to a sequence of nucleotides (A, C, G, or T). The data type should support the method addNucleotide, nucleotideAt(int i), and doSomeComputation. Perhaps change to addCodon.
    • StringGenome.java has one instance variable of type String. It implements addNucleotide with string concatenation. Each method call takes time proportional to the size of the current genome. Not practical spacewise either for large genomes since nucleotide is stored as a 16-bit char.
    • Genome.java implements a genome as an array of characters. The size of the array is doubled when the array fills up. The method addNucleotide is now constant time, but the space consumption is still 16 bits per nucleotide.
    • CompactGenome.java implements a genome as boolean array. We need to use two bits per nucleotide since there are 4 different nucleotides. As in the previous implementation, we use a dynamic array with repeated doubling. Now, each nucleotide consumes 2 bits of storage (instead of 16).

    Include a method to return the reverse-complemented genome. Include a method for testing for equality.

    Codon usage table. Print out summary statistics for each codon, e.g.,

    
    UUU 13.2(   724)  UCU 19.6(  1079)  UAU 16.5(   909)  UGU 12.4(   680)
    UUC 23.5(  1290)  UCC 10.6(   583)  UAC 14.7(   808)  UGC  8.0(   438)
    UUA  5.8(   317)  UCA 16.1(   884)  UAA  0.7(    38)  UGA  0.3(    15)
    UUG 17.6(   965)  UCG 11.8(   648)  UAG  0.2(    13)  UGG  9.5(   522)
    
    CUU 21.2(  1166)  CCU 10.4(   571)  CAU 13.3(   733)  CGU 10.5(   578)
    CUC 13.5(   739)  CCC  4.9(   267)  CAC  8.2(   448)  CGC  4.2(   233)
    CUA  6.5(   357)  CCA 41.0(  2251)  CAA 24.9(  1370)  CGA 10.7(   588)
    CUG 10.7(   590)  CCG 10.1(   553)  CAG 11.4(   625)  CGG  3.7(   201)
    
    AUU 27.1(  1491)  ACU 25.6(  1405)  AAU 27.2(  1495)  AGU 11.9(   653)
    AUC 23.3(  1279)  ACC 13.3(   728)  AAC 21.0(  1151)  AGC  6.8(   374)
    AUA  5.9(   324)  ACA 17.1(   940)  AAA 32.7(  1794)  AGA 14.2(   782)
    AUG 22.3(  1223)  ACG  9.2(   505)  AAG 23.9(  1311)  AGG  2.8(   156)
    
    GUU 25.7(  1412)  GCU 24.2(  1328)  GAU 49.4(  2714)  GGU 11.8(   650)
    GUC 15.3(   843)  GCC 12.6(   691)  GAC 22.1(  1212)  GGC  7.0(   384)
    GUA  8.7(   478)  GCA 16.8(   922)  GAA 39.8(  2185)  GGA 47.2(  2592)
    

Lessons