Computer Science Building, Princeton University


Introduction to CS


3.1. USING JAVA DATA TYPES


This section is under construction.


Organizing the data for processing is an essential step in the development of a computer program. In this section we will describe how to use pre-defined data types for string processing and image processing. We'll consider both data types that are built into the Java language and standard libraries (String and Color) and those that we have created especially for use in this textbook (In, Out, and Picture). This serves as a gradual introduction to data types and Java classes. In the next section we will learn how to create our own user-defined data types from scratch.

Data types.

A data type is a set of values and a set of operations defined on them. An instance of a given data type is one value from the prescribed set. For example, the data type int stores integers between -2^31 and 2^31 - 1 and supports a variety of arithmetic and logical operations that we became accustomed to using in Chapter 2. The data type String consists of sequences of Unicode characters. The String data type supports a number of useful operations, including + for string concatenation.

Data types in Java.

There are eight data types built directly into the Java language, including int, double, and boolean. These map directly to hardware (e.g., registers in the CPU). Other data types, including String, are composed from these primitive types using a special construct in Java known as a class. A class is a blueprint that specifies which values and operations are permissible. In Java, we refer to these operations as methods. An object is an instance of a class, and represents one of the permissible values. Each object is associated with a class and is created from the class blueprint. It is possible to create many objects from the same blueprint, but each object stores its own value and is manipulated independently. For example, we might create several objects from the class String, each storing a different sequence of characters.

Color. We begin by considering the Color data type in the Java library. It represents colors using RGB format: a color is composed of three integers (each between 0 and 255), representing the red, green, and blue intensities, respectively. For example, red is (255, 0, 0), green is (0, 255, 0) and blue is (0, 0, 255). Non-primary colors are obtained by mixing the three primary colors. For example, yellow is (255, 255, 0) and gray is (128, 128, 128). We can create a custom color by using the keyword new and specifying the appropriate parameters. The following code fragment creates two custom color objects, which we can access using the variables named magenta and gray.

Color magenta = new Color(255,   0, 255);
Color gray    = new Color(128, 128, 128);
We can manipulate a color object as a single composite entity. Color has accessor methods getRed, getGreen, and getBlue if we wish to retrieve the individual components. We invoke these methods by typing the name of the object, followed by the dot operator, followed by the method name, followed by any parameters in parentheses. The following code fragment from ColorTest.java prints out (255, 0, 255).
int r = magenta.getRed();
int g = magenta.getGreen();
int b = magenta.getBlue();
System.out.println("(" + r + ", " + g + ", " + b + ")");
To access the Color data type, we must include an import statement at the beginning of our program to notify the Java compiler of our intention to use it.
import java.awt.Color;
Two useful methods for image processing are brighter and darker. These methods return brighter or darker versions of the invoking color. They scale each color component up or down by an arbitrary scale factor, typically 0.7, and truncate the result to be between 0 and 255. The following code fragment creates a dark version of magenta.
Color darkMagenta = magenta.darker();
The RGB values for this version of magenta are (178, 0, 178).

The table below summarizes the methods associated with the Color data type that we will be using. The Java 1.4.2 Color API contains a complete description.


COMMAND    ARGUMENTS             RETURN TYPE   PURPOSE
Color      int r, int g, int b                 creates and initializes a new color, with the given red, green, and blue intensities (0 - 255)
getRed                           int           get the red intensity (0 - 255)
getGreen                         int           get the green intensity (0 - 255)
getBlue                          int           get the blue intensity (0 - 255)
brighter                         Color         return a brighter version of the invoking color
darker                           Color         return a darker version of the invoking color
toString                         String        return a string representation of the invoking color


Functions on colors. We can create Java functions that manipulate colors. The monochrome luminance of a color is its effective brightness. The NTSC formula for converting an RGB image to grayscale is derived from the eye's sensitivity to red, green, and blue: 0.2989r + 0.5870g + 0.1140b. The following is a Java function that takes a color as an input, and returns the corresponding grayscale color.

public static Color toGray(Color c) {
   int r = c.getRed();
   int g = c.getGreen();
   int b = c.getBlue();
   int luminance = (int) (0.2989*r + 0.5870*g + 0.1140*b);
   Color gray = new Color(luminance, luminance, luminance);
   return gray;
}
Objects enable us to write programs that manipulate colors as composite entities. In principle, we could pass the three integers r, g, and b to the function, but this is awkward and error-prone. Instead, we can create a single composite entity (composed of the three integers) and pass that. Using objects is essential when we want to return multiple values, possibly of different types, from a function. Recall that a function can return only one value, although that value can be either a primitive type or a reference to an object.
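To make the point about composite entities concrete, here is a minimal sketch of a function that both takes and returns colors as single objects. The helper blend is hypothetical (it is not part of java.awt.Color); it simply averages the components of two colors, so it effectively returns three integers through one return value.

```java
import java.awt.Color;

class ColorBlend {

    // Hypothetical helper, not part of the Java library: returns the
    // component-wise average of two colors as a single Color object.
    static Color blend(Color a, Color b) {
        int red   = (a.getRed()   + b.getRed())   / 2;
        int green = (a.getGreen() + b.getGreen()) / 2;
        int blue  = (a.getBlue()  + b.getBlue())  / 2;
        return new Color(red, green, blue);
    }

    public static void main(String[] args) {
        Color red  = new Color(255, 0, 0);
        Color blue = new Color(0, 0, 255);
        Color mix  = blend(red, blue);
        // integer division truncates 255 / 2 to 127
        System.out.println("(" + mix.getRed() + ", " + mix.getGreen() + ", " + mix.getBlue() + ")");
    }
}
```

Without the Color object, blend would need six integer parameters and would have no direct way to hand back all three results.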

Raster-based graphics. We are familiar with using StdDraw to plot geometric objects (circles, lines, rectangles). Program Picture.java is a data type for pixel-based graphics. It supports reading in a JPEG, PNG, or GIF file, getting and setting the colors of the individual pixels, and saving the resulting picture to a file or displaying on the screen. The method getColor(i, j) returns the color of pixel (i, j) using the Color data type. The method setColor(i, j, c) sets the color of pixel (i, j) according to the Color object referenced by c. In keeping with image processing tradition, (0, 0) is the upper leftmost pixel. These are our first examples of methods that take parameters. They use exactly the same syntax as when we call functions. Here are the interface methods for the class Picture.


COMMAND     ARGUMENTS               RETURN TYPE   PURPOSE
Picture     String s                              creates and initializes a new picture, read from the file s in JPG, PNG, or GIF format
Picture     int w, int h                          creates and initializes an empty w-by-h pixel image
getHeight                           int           return the height of the image in pixels
getWidth                            int           return the width of the image in pixels
getColor    int i, int j            Color         return the color of pixel (i, j)
setColor    int i, int j, Color c   void          set pixel (i, j) to color c
show                                void          view the image in a window
save        String s                void          save the image to a file of type png or jpg


Spectrum. First, we'll consider a simple application where we plot an N-by-N grid of pixels, each in a different color. Program Spectrum.java takes a command line parameter N and plots an N-by-N image of the spectrum using the Picture data type. It plots pixel (i, j) in the color (r, g, b), where r = (i * 256) / N, g = 128, and b = (j * 256) / N.

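public static void main(String[] args) {
   int N = Integer.parseInt(args[0]);
   Picture pic = new Picture(N, N);
   for (int i = 0; i < N; i++) {
      for (int j = 0; j < N; j++) {
         Color c = new Color((i * 256) / N, 128, (j * 256) / N);
         pic.setColor(i, j, c);
      }
   }
   pic.show();
}
[Figure: Color Spectrum]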

Digital image processing. Image processing refers to the act of manipulating the individual pixels of a digital image. Typical operations include cropping, shrinking, adjusting contrast, brightening, sharpening, blurring, and removing red-eye. We use the Picture data type to process an existing image saved on our hard drive.
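As a rough illustration of pixel-by-pixel processing, the sketch below converts every pixel of an image to grayscale using the NTSC luminance formula from above. Because Picture is specific to this textbook, the sketch assumes the standard library class java.awt.image.BufferedImage as a stand-in; its getRGB and setRGB methods play the role of getColor and setColor. The class and method names are illustrative only.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

class GrayscaleSketch {

    // NTSC luminance, truncated to an integer between 0 and 255.
    static Color toGray(Color c) {
        int luminance = (int) (0.2989 * c.getRed() + 0.5870 * c.getGreen() + 0.1140 * c.getBlue());
        return new Color(luminance, luminance, luminance);
    }

    // Returns a new image in which each pixel is the grayscale
    // version of the corresponding pixel in src.
    static BufferedImage toGrayscale(BufferedImage src) {
        int width  = src.getWidth();
        int height = src.getHeight();
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < width; i++) {
            for (int j = 0; j < height; j++) {
                Color c = new Color(src.getRGB(i, j));
                dst.setRGB(i, j, toGray(c).getRGB());
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // A tiny 2-by-1 test image: one red pixel and one blue pixel.
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, new Color(255, 0, 0).getRGB());
        image.setRGB(1, 0, new Color(0, 0, 255).getRGB());

        BufferedImage gray = toGrayscale(image);
        System.out.println(new Color(gray.getRGB(0, 0)).getRed());  // 76
        System.out.println(new Color(gray.getRGB(1, 0)).getRed());  // 29
    }
}
```

Pure red maps to a much lighter gray than pure blue, reflecting the larger weight that the formula gives to red.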


Strings.

We've been using strings and string concatenation since our very first Java program. Now we will explore many additional operations built in to Java's String data type that open up the world of text processing. Before using them, we must know their calling conventions. The Application Programming Interface (API) describes the set of operations associated with a data type and how to invoke them. You can find formal descriptions in Sun's online documentation of the String class. The table below summarizes several useful string processing methods and gives brief examples to illustrate their usage. As with arrays, the characters of a string are indexed starting at 0.


Operation               Description                                                        Input string s       Return value
s.length()              return length of s                                                 Hello                5
s.charAt(1)             return character of s with index 1                                 Hello                e
s.substring(1, 4)       return substring from 1 (inclusive) to 4 (exclusive)               Hello                ell
s.substring(1)          return substring starting at index 1                               Hello                ello
s.toUpperCase()         return upper case version of s                                     Hello                HELLO
s.toLowerCase()         return lower case version of s                                     Hello                hello
s.startsWith("http:")   does s start with http:?                                           http://www.cnn.com   true
s.endsWith(".com")      does s end with .com?                                              http://www.cnn.com   true
s.indexOf('.')          return index of first character in s that is .                     Hello.java.html      5
s.lastIndexOf('.')      return index of last character in s that is .                      Hello.java.html      10
s.indexOf(".java")      return index of first occurrence of .java in s                     Hello.java.html      5
s.indexOf(".java", 6)   return index of first occurrence of .java in s, starting at index 6   Hello.java.html   -1
s.trim()                return s with leading and trailing whitespace removed              " Hello there "      "Hello there"
s.replaceAll(",", ".")  return s with all occurrences of , replaced with .                 13,125,555           13.125.555
s.compareTo("abc")      compare s to abc lexicographically                                 "abc"                0
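These methods become more useful in combination. As a small sketch, the hypothetical helper betweenDots below (the name is ours, not part of the String API) uses indexOf, lastIndexOf, and substring together to pull out the text between the first and last period of a string.

```java
class StringOps {

    // Hypothetical helper: returns the text between the first and last
    // period in s. Assumes that s contains at least one period.
    static String betweenDots(String s) {
        int first = s.indexOf('.');
        int last  = s.lastIndexOf('.');
        if (first == last) return "";   // only one period: nothing in between
        return s.substring(first + 1, last);
    }

    public static void main(String[] args) {
        String s = "mona.lisa.png";
        System.out.println(s.length());            // 13
        System.out.println(s.indexOf('.'));        // 4
        System.out.println(s.lastIndexOf('.'));    // 9
        System.out.println(s.substring(5, 9));     // lisa
        System.out.println(s.endsWith(".png"));    // true
        System.out.println(betweenDots(s));        // lisa
    }
}
```

Note that substring(5, 9) stops just before index 9, which is why the second period is not part of the result.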


Convert from hexadecimal to decimal. Program Hex2Decimal.java contains a function that takes a hexadecimal string (using A-F for the digits 10-15) and returns the corresponding decimal integer. It uses several of the string library methods and Horner's method.

public static int hex2decimal(String s) {
   String digits = "0123456789ABCDEF";
   s = s.toUpperCase();
   int val = 0;
   for (int i = 0; i < s.length(); i++) {
      int d = digits.indexOf(s.charAt(i));
      val = 16*val + d;
   }
   return val;
}

Alternate solution: Integer.parseInt(String s, int radix). More robust, and works with negative integers.
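To see how the two approaches compare, the sketch below places a self-contained Horner's-method conversion next to the library call Integer.parseInt(s, 16). The class name HexSketch is illustrative. The hand-written version assumes a non-negative, well-formed hexadecimal string and does no error checking, whereas the library method accepts a leading minus sign and throws a NumberFormatException on malformed input.

```java
class HexSketch {

    // Horner's method: process digits left to right, multiplying the
    // running value by 16 before adding each new digit.
    static int hex2decimal(String s) {
        String digits = "0123456789ABCDEF";
        s = s.toUpperCase();
        int val = 0;
        for (int i = 0; i < s.length(); i++) {
            int d = digits.indexOf(s.charAt(i));   // value of the hex digit
            val = 16 * val + d;
        }
        return val;
    }

    public static void main(String[] args) {
        // 1A3 = ((1 * 16) + 10) * 16 + 3 = 419
        System.out.println(hex2decimal("1A3"));           // 419
        System.out.println(Integer.parseInt("1A3", 16));  // 419
        System.out.println(Integer.parseInt("-ff", 16));  // -255

        try {
            Integer.parseInt("xyz", 16);
        } catch (NumberFormatException e) {
            System.out.println("not a hexadecimal number");
        }
    }
}
```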

Input.

In Section XYZ we learned how to read numerical and text input from the terminal using StdIn.java. However, this supported only one input stream (standard input). Sometimes our programs need to read data from several input sources (standard input, files, web sites). Program In.java is a convenient class to do exactly this. We can specify which input stream by using the corresponding constructor. If we use the no-argument constructor, then we obtain standard input. If we pass the constructor a string, In interprets this as the name of a file or web site and reads input from that source.

Program Cat.java takes the names of two text files as command line inputs, concatenates the two files, and prints the result to standard output.

public static void main(String[] args) {
    In in1 = new In(args[0]);
    In in2 = new In(args[1]);
    System.out.println(in1.readAll());
    System.out.println(in2.readAll());
}

Screen scraping.

Now we illustrate a nice combination of the built-in String library with our In library. The goal is to query a web page, extract some information, and report back the results. This process is known as screen scraping. Program StockQuote.java takes the ticker symbol of a stock and prints out its current trading price. To report the stock price of Google (ticker symbol = goog), it reads the Web page http://finance.yahoo.com/q?s=goog. Then, it identifies the relevant information using indexOf and substring. The relevant information is enclosed between the tags <b> and </b> immediately following the text Last Trade.
public static void main(String[] args) {
    String name = "http://finance.yahoo.com/q?s=" + args[0];
    In in = new In(name);
    String input = in.readAll();
    int p        = input.indexOf("Last Trade:", 0);
    int from     = input.indexOf("<b>", p);
    int to       = input.indexOf("</b>", from);
    String price = input.substring(from + 3, to);
    System.out.println(price);
}

The program heavily depends on the web page format of Yahoo. If Yahoo changes their web page format, we would need to change our program. Nevertheless, this is likely more convenient than maintaining the data ourselves.
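Because the extraction depends on the page format, it helps to check each indexOf result before calling substring. The sketch below applies the same indexOf and substring steps to a hard-coded string instead of a live web page; the sample HTML and the helper name extract are made up for illustration, and the real page may look quite different.

```java
class ScrapeSketch {

    // Returns the text between <b> and </b> that follows the given label,
    // or null if the expected pattern does not appear in the page.
    static String extract(String page, String label) {
        int p = page.indexOf(label);
        if (p == -1) return null;
        int from = page.indexOf("<b>", p);
        if (from == -1) return null;
        int to = page.indexOf("</b>", from);
        if (to == -1) return null;
        return page.substring(from + 3, to);   // skip the 3 characters of <b>
    }

    public static void main(String[] args) {
        // Made-up fragment standing in for the downloaded page.
        String page = "<tr><td>Last Trade:</td><td><b>123.45</b></td></tr>";
        System.out.println(extract(page, "Last Trade:"));        // 123.45
        System.out.println(extract("<html></html>", "Last Trade:"));  // null
    }
}
```

Returning null when a marker is missing avoids the exception that substring would otherwise throw when given an index of -1.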

Parsing. The string library methods split and matches are especially useful for parsing data files. Both rely on regular expressions, which we will explore in more generality in Section 7.1.

Character. The library Character has several useful functions for checking properties of characters, including isWhitespace, isLowerCase, isUpperCase, isDigit, isLetter.
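As a small sketch of how split and the Character functions work together, the hypothetical function below breaks a comma-separated line into fields and counts the fields that consist entirely of digits. The function name and the sample line are ours, chosen only for illustration.

```java
class ParseSketch {

    // Counts the comma-separated fields of line that are non-empty
    // and consist only of digits (after trimming whitespace).
    static int countNumericFields(String line) {
        String[] fields = line.split(",");
        int count = 0;
        for (int k = 0; k < fields.length; k++) {
            String field = fields[k].trim();
            boolean allDigits = field.length() > 0;
            for (int i = 0; i < field.length(); i++) {
                if (!Character.isDigit(field.charAt(i))) allDigits = false;
            }
            if (allDigits) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNumericFields("13, 125, abc, 555"));   // 3
    }
}
```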

Output.

We are also interested in writing output to files or the network instead of just standard output. Program Out.java provides a mechanism for writing data to various output streams. Using Out, writing to a file is almost as easy as writing to standard output. The following program takes the names of two files as command line inputs and copies the text in the first file to the second file.
public class Copy {
   public static void main(String[] args) {
      In  in  = new In (args[0]);
      Out out = new Out(args[1]);
      String s = in.readAll();
      out.println(s);
   }
}

Random.

Another useful class in the Java library is Random. Unlike Math.random, you can set the pseudo-random number generator seed. It also has pre-packaged routines for generating an integer in a certain range or generating a variable with a Gaussian distribution.
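The following sketch shows the effect of setting the seed: two generators created with the same seed produce the same sequence, which is handy for reproducing a simulation while debugging. The method name rolls and the seed value 42 are arbitrary choices for this example.

```java
import java.util.Arrays;
import java.util.Random;

class RandomSketch {

    // Simulates n rolls of a six-sided die using the given seed.
    static int[] rolls(long seed, int n) {
        Random random = new Random(seed);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = 1 + random.nextInt(6);   // nextInt(6) is uniform on 0..5
        }
        return result;
    }

    public static void main(String[] args) {
        int[] first  = rolls(42L, 5);
        int[] second = rolls(42L, 5);
        System.out.println(Arrays.toString(first));
        System.out.println(Arrays.equals(first, second));   // true: same seed, same sequence

        Random random = new Random();                // no seed: differs from run to run
        System.out.println(random.nextGaussian());   // Gaussian with mean 0, standard deviation 1
    }
}
```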

Primitive types vs. reference types.

Java has two different categories of data types: primitive types (also known as value types) and reference types. We're already familiar with many of the eight primitive types in Java: boolean, char, byte, short, int, long, float, and double. Value types store the integral, floating point, or boolean value itself, e.g., 17, 3.14, or true. The primitive types and the associated operations are typically implemented directly in hardware, so they are especially efficient. When we declare a variable of a primitive type, the system allocates enough memory to store a value of that type (e.g., 4 bytes for an int and 8 bytes for a double). We can initialize or modify the value using the assignment operator. When we pass a variable to a function, the system passes a copy of the integral, floating point, or boolean value. This is called pass-by-value. One consequence is that if we pass an integer variable a to a function, the function cannot change the value that is stored in a since we have only passed a copy of the value stored in a.
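A minimal sketch of this consequence: the function increment below (a name chosen for illustration) modifies its parameter, but the caller's variable is unaffected because the function received only a copy of the value.

```java
class PassByValue {

    // Changes only the local copy of the argument.
    static void increment(int a) {
        a = a + 1;
    }

    public static void main(String[] args) {
        int a = 17;
        increment(a);
        System.out.println(a);   // 17: the caller's variable is unchanged
    }
}
```

To obtain the incremented value, the function would have to return it and the caller would have to assign the result, as in a = increment(a).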

Reference types have very different characteristics. A reference stores the memory address of an object. It captures the difference between a thing and its name.

Thing                     Name
Web page                  www.princeton.edu
Email inbox               wayne@princeton.edu
Bank account              45-234-23310076
US citizen                166-34-9114
Word of TOY memory        1C
Byte of computer memory   FFBEFB24
Cell phone                (609) 876-5309
House                     35 Olden Street

We say that the reference points to the object and often draw an arrow from the reference to what it points to. A reference of a given type always points to an object of the correct type or to the special value null which indicates that the reference points to nothing. Each reference points to one object, but two or more references can point to the same object. We initialize a reference by using an assignment statement. We can either set it equal to another reference (of the appropriate type) or we can also use the keyword new to make it point to a newly created object. Java manipulates objects by reference, but passes them to methods and functions by value. This means that if we pass a variable p of type Picture to a function, the function can change the state of the object referenced by p, e.g., change the colors of some of the pixels. It cannot, however, change what object p references.

Analogy. Think of an object as a house and a reference as a piece of paper with the street address of the house written on it in pencil. Each piece of paper holds at most one address, but several pieces of paper can hold the same address. We can give a piece of paper to a house painter and ask them to paint the house red. Once the house is painted, every piece of paper with that address refers to the same red house. It is also possible to erase what is on a piece of paper and write down a new street address. But if you change what is written on your piece of paper, it does not change what is written on my piece of paper.

In Program xyz, it is important to compare the strings s1 and s2 with s1.equals(s2) instead of (s1 == s2). The former is a built-in method that tests whether the two strings are composed of exactly the same sequence of characters. The latter checks whether the two references point to the same object in memory.
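The difference can be seen in a small sketch. The two helper names below are ours; the key assumption is that new String("hello") creates a distinct object whose characters match those of the literal "hello".

```java
class StringEquality {

    // True if a and b refer to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True if a and b contain exactly the same sequence of characters.
    static boolean sameCharacters(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String s1 = "hello";
        String s2 = new String("hello");   // a second object with the same characters
        String s3 = s1;                    // a second reference to the first object

        System.out.println(sameCharacters(s1, s2));   // true
        System.out.println(sameObject(s1, s2));       // false
        System.out.println(sameObject(s1, s3));       // true
    }
}
```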

The distinction between primitive and reference types is a tradeoff between efficiency and elegance. There is substantial memory overhead (around 16 bytes) associated with creating each new object. OOP purists would argue that a language should not have any primitive types, only objects and reference types.

Arrays are objects.

Arrays are treated as objects in Java, except that there is special syntax for indexing into an array using square brackets. Otherwise, an array is just another example of a reference type. When we pass an array to a function, the system passes a copy of the reference, not a copy of the array. This means that the function is free to modify the contents of the array. If the array is huge, there are substantial savings in passing a reference to the array. Give example of pass-by-value....
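One possible illustration, with function names chosen only for this sketch: zeroFirst changes the contents of the caller's array through its copy of the reference, while reassign merely points its local copy of the reference at a new array, which the caller never sees.

```java
import java.util.Arrays;

class ArrayPassing {

    // Modifies the array that the caller's reference points to.
    static void zeroFirst(int[] a) {
        a[0] = 0;
    }

    // Changes only the local copy of the reference; the caller's
    // variable still refers to the original array.
    static void reassign(int[] a) {
        a = new int[] { 9, 9, 9 };
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3 };

        zeroFirst(data);
        System.out.println(Arrays.toString(data));   // [0, 2, 3]

        reassign(data);
        System.out.println(Arrays.toString(data));   // [0, 2, 3]
    }
}
```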

Mutable vs. immutable.

Color and String are immutable. You can't change the red, green, or blue components in a color once created. You can't change the individual characters in a string once created. This is a consequence of there intentionally not being any methods available like setRed or setCharAt. In, Out, and Picture are mutable. The setColor method in Picture is specifically designed to change the color of one particular pixel. Describe significance of immutability in design or move somewhere else.

Automatic memory management.

One of the most significant features of the Java programming language is its ability to automatically manage memory. Memory management is straightforward with primitive types: allocate a fixed chunk of memory (e.g., 4 bytes for an int) when we declare a variable, and release it when the variable goes out of scope. It is important to free memory whenever possible since your computer has only a finite amount of it, and a program may eventually consume it all. Managing memory for reference types is substantially more challenging. Each time we create an object with the keyword new, the system finds a free chunk of memory of the right size and reserves it for the object. When the object is no longer accessible (e.g., the last reference to it goes out of scope), we want the system to free up the memory and recycle it for use the next time we create an object with new. In many languages (including C and C++) the programmer is responsible for marking those objects that are no longer needed. This process is tedious and notoriously error-prone. If the programmer is not diligent, the program may slowly leak memory and eventually run out. Many modern languages (including Java) include a garbage collector to transfer the burden of memory management from the programmer to the system. The garbage collector periodically identifies chunks of memory that are not in use, and notifies the system to reclaim them. If a chunk of memory has no references to it, then it can be safely garbage-collected since the programmer would have no way of accessing it.

Q + A

Q. Why use the RGB format for representing colors?

A. Although RGB is not a particularly natural way to represent color, it is commonly used in television screens, computer monitors and digital cameras. The screens (CRT or LCD) are comprised of thousands or millions of tiny red, green, and blue dots. The device can light each pixel up in various degrees of brightness. We substitute green for the primary color yellow because it is much cheaper to produce phosphors that glow bright green.

Q. When using a linear filter, each pixel becomes a weighted average of its 8 neighbors. What do I do when the pixel has fewer than 8 neighbors because it is near the border?

A. You could assume the image is toroidal (periodic boundary conditions) and make the left boundary wrap around to the right boundary.

Q. How can I convert from an uppercase letter to an integer between 0 and 25?

A. Recall that characters are 16-bit integers and that the uppercase letters are stored consecutively in ascending order. The expression c - 'A' does the job. You can also use c - '0' to convert from one of the characters '0' through '9' to the corresponding integer.

Q. Is there a difference between the empty string and null?

A. Yes. The empty string is a string consisting of 0 characters. You can invoke all of the usual string methods on it, e.g., length. You will get a NullPointerException if you try to invoke a method on a variable storing null.

Q. How can I check whether a string s is the empty string?

A. Use (s.equals("")) or (s.length() == 0).

Q. What's the substring trap?

A. The String method call s.substring(i, j) returns the substring of s starting at index i and ending at j-1 (not at j as you might suspect).

Q. Can I apply several string operations at once?

A. Yes, the statement s.trim().toLowerCase().equals("saturday") works as expected. The methods are called from left to right, and each of these methods returns the resulting string.

Q. Where can I download some test files for image processing?

A. USC SIPI contains standard test images (including Peppers and Baboon).

Q. What special capabilities do arrays have over other objects?

A. Indexing into the array using square brackets. Declaring an array involves specifying its type with square brackets. Initializing an array involves either using new with square brackets or listing its constituent values within curly braces.

Q. What special capabilities do strings have over other objects?

A. String concatenation with + and assignment using quoted sequences of characters.

Q. How can I pass an array to a function in such a way that the function cannot change the array?

A. You can't since arrays are mutable. However, you can achieve the same effect by building a wrapper data type and passing that instead. Stay tuned.

Q. How can I change the value of a string?

A. You can't since strings are immutable in Java. If you want a new string, then you must create a new one using string concatenation or one of the string methods that returns a new string such as toLowerCase or substring.

Q. I've heard that Java has no "pointers." Is this true?

A. It's true that Java doesn't have an explicit pointer type, but you should view Java references as "safe pointers." Java's implementation of references is opaque so you cannot do pointer arithmetic on them or cast them to numeric types. Java also automatically dereferences pointers as needed.

Q. Is it correct to say that Java passes primitive types by value and objects by reference?

A. Not quite. See this explanation. For C++ programmers, it's better to think of a Java reference as a "safe pointer." Java references behave like C++ references, except that assignment and == work like pointers.

Lessons

  1. Use s.charAt(i) to access the ith character of string s.
  2. Don't forget that string indices start at 0, not 1.

Exercises

  1. Write a function that takes as input a string and returns the number of occurrences of the letter e.
  2. Write a program that takes a command line input string s, reads strings from standard input, and prints out the number of times s appears. Hint: don't forget to use equals instead of == with references.
  3. Write a program that reads in the name of a month as a command line parameter and prints the number of days in that month in a non leap year.
    
    public static void main(String[] args) {
        String[] months = { "January", "February", "March",
                            "April",   "May",      "June",
                            "July",    "August",   "September",
                            "October", "November", "December" };
        int[] days      = { 31, 28, 31, 30, 31, 30,
                            31, 31, 30, 31, 30, 31 };
        String name = args[0];
        for (int i = 0; i < 12; i++)
            if (name.equalsIgnoreCase(months[i]))
                System.out.println(name + " has " + days[i] + " days");
    }

  4. Write a program that takes a command line input N and reads in N strings from standard input, and then sorts the strings in ascending order of length. Hint: use s.length() to compute the length of string s.
  5. Write a program Squeeze.java that takes as input a string and removes adjacent spaces, leaving at most one space in a row.
  6. What does the following code fragment do?
    public static void main(String[] args) {
       String s1 = args[0];
       String s2 = args[1];
       int length1 = s1.length();
       int length2 = s2.length();
       if (length1 > length2) System.out.println(length1);
       else                   System.out.println(length2);
    }
    
  7. Write a function that takes as input a string and returns the string in reverse order.
  8. What does the following recursive function return, given an input string s?
    public static String mystery(String s) {
       int N = s.length();
       if (N <= 1) return s;
       String a = s.substring(0, N/2);
       String b = s.substring(N/2, N);
       return mystery(b) + mystery(a);
    }
  9. Describe the string that the following function returns, given a positive integer N?
    public static String mystery(int N) {
       String s = "";
       while(N > 0) {
           if (N % 2 == 1) s = s + s + "x";
           else            s = s + s;
           N = N / 2;
       }
       return s;
    }
    
  10. Write a function that takes as input a string and returns true if the string is a palindrome, and false otherwise. A palindrome is a string that reads the same forwards or backwards.
  11. Write a function that takes as input a string and returns true if the string is a Watson-Crick complemented palindrome, and false otherwise. A Watson-Crick complemented palindrome is a DNA string that is equal to the complement (A-T, C-G) of its reverse.
  12. Write a function that takes as input a DNA string of A, C, G, and T characters and returns the string in reverse order with all of the characters replaced by their complements. For example, if the input is ACGGAT, then return ATCCGT.
  13. What does the following recursive function return, given two strings s and t of the same length?
    public static String mystery(String s, String t) {
       int N = s.length();
       if (N <= 1) return s + t;
       String a = mystery(s.substring(0, N/2), t.substring(0, N/2));
       String b = mystery(s.substring(N/2, N), t.substring(N/2, N));
       return a + b;
    }
  14. Write a program that reads in a string and prints out the first character that appears exactly once in the string. Ex: ABCDBADDAB -> C.
  15. Given a string, create a new string with all the consecutive duplicates removed. Ex: ABBCCCCCBBAB -> ABCBAB.
  16. Given a string s, determine whether it represents the name of a web page. Assume that all web page names start with http:

    Solution: The easiest way is using the startsWith method in Java's string library, e.g., if (s.startsWith("http:")).

  17. Given a string s that represents the name of a web page, break it up into pieces, where each piece is separated by a period, e.g., http://www.cs.princeton.edu should be broken up into www, cs, princeton, and edu, with the http:// part removed. Use either the split or indexOf methods.
  18. Given a string s that represents the name of a file, write a code fragment to determine its file extension. The file extension is the substring following the last period. For example, the file type of monalisa.jpg is jpg, and the file type of mona.lisa.png is png.

    Library solution: this solution is used in Picture.java to save an image to the file of the appropriate type.

    String extension = s.substring(s.lastIndexOf('.') + 1);
    

  19. Given a string s that represents the name of a file, write a code fragment to determine its directory portion. This is the prefix that ends with the last / character (the directory delimiter); if there is no such /, then it is the empty string. For example, the directory portion of /Users/wayne/monalisa.jpg is /Users/wayne/.
  20. Given a string s that represents the name of a file, write a code fragment to determine its base name (filename minus any directories). For /Users/wayne/monalisa.jpg, it is monalisa.jpg.
  21. What does the following code fragment print out?
    String string1 = "hello";
    String string2 = string1;
    string1 = "world";
    System.out.println(string2);
    
  22. Write a program that reads in text from standard input and prints it back out, removing any lines that consist of only whitespace.
  23. Write a program that reads in text from standard input and prints it back out, replacing all single quotation marks with double quotation marks.
  24. Write a program WidthChecker.java that takes a command line parameter N, reads text from standard input, and prints to standard output all lines that are longer than N characters (including spaces).
  25. What does the program LatinSquare.java print when N = 5?
    String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (int i = 0; i 

    A Latin square of order N is an N-by-N array consisting of N different symbols, such that each symbol appears exactly once in each row and column. Latin squares are useful in statistical design and cryptography.

  26. What does the following code fragment print?
    String s = "Hello World";
    s.toUpperCase();
    s.substring(6, 11);
    System.out.println(s);
    

    Answer: Hello World. The methods toUpperCase and substring return the resulting strings, but the program ignores these so s is never changed. To get it to print WORLD, use s = s.toUpperCase() and s = s.substring(6, 11).

  27. What happens when you execute the following code fragment?
    String s = null;
    int length = s.length();
    

    Answer: you get a NullPointerException since s is null and you are attempting to dereference it.

  28. What are the values of x and y after the two assignment statements below?
    int x = '-'-'-';
    int y = '/'/'/';
    

    Suppose that a and b are each integer arrays consisting of 100 million integers. What does the following code do? How long does it take?

    int[] t = a;
    a = b;
    b = t;
    
    Answer: it swaps them, but it does so by exchanging the two references, without copying millions of elements.
  29. What does the following statement do where c is of type char?
    System.out.println((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
    Answer: prints true if c is an uppercase or lowercase letter, and false otherwise.
  30. Write an expression that tests whether or not a character represents one of the digits '0' through '9' without using any library functions.
    boolean isDigit = ('0' <= c && c <= '9');

  31. Write a program FlipY.java that reads in an image and flips it vertically.

Creative Exercises

  1. Bounding box. Write a program BoundingBox.java that reads in an image file and outputs the smallest bounding box (rectangle parallel to the x and y axes) that contains all of the non-white pixels. Useful for automatic cropping.
  2. Anti-aliasing. Anti-aliasing is a method of removing artifacts from representing a smooth curve with a discrete number of pixels. A very crude way of doing this (which also blurs the image) is to convert an N-by-N grid of pixels into an (N-1)-by-(N-1) by making each pixel be the average of four cells in the original image as below. Write a program AntiAlias that reads in an integer N, then an N-by-N array of integers, and prints out the antialiased version. Reference.

  3. Thresholding. Write a program Threshold.java that reads in a grayscale version of a black-and-white picture, creates and plots a histogram of 256 grayscale intensities, and determines the threshold value for which pixels are black, and which are white.
  4. Linear filters. A box filter or mean filter replaces the color of pixel (x, y) by the average of its 9 neighboring pixels (including itself). The matrix [1 1 1; 1 1 1; 1 1 1] / 9 is called the convolution kernel. The kernel is the set of pixels to be averaged together. Program MeanFilter.java implements a mean filter using the Picture data type.
  5. Blur filter. Use the low-pass 3-by-3 uniform filter [1/13 1/13 1/13; 1/13 5/13 1/13; 1/13 1/13 1/13].
  6. Emboss filter. Use Prewitt masks such as [-1 0 1; -1 1 1; -1 0 1] (east), [1 0 -1; 2 0 -2; 1 0 -1], or [-1 -1 0; -1 1 1; 0 1 1] (south-east).
  7. Sharpen filter. Psychophysical experiments suggest that a photograph with crisper edges is more aesthetically pleasing than an exact photographic reproduction. Use a high-pass 3-by-3 filter. Light pixels near dark pixels are made lighter; dark pixels near light pixels are made darker. For example, the Laplace kernel [-1 -1 -1; -1 8 -1; -1 -1 -1] attempts to capture regions where the second derivative is zero.
  8. Oil painting filter. Set pixel (i, j) to the color of the most frequent value among pixels within Manhattan distance W of (i, j) in the original image.
  9. Reverse string. Write a recursive function to reverse a string. Do not use any loops. Hint: use the String method substring.
    static String reverse(String s) {
        int N = s.length();
        if (N == 0) return "";
        else return reverse(s.substring(1, N)) + s.charAt(0);
    }
    
  10. Frequency analysis of English text. Write a program LetterFrequency.java that reads in text from standard input (e.g., Moby Dick) and calculates the fraction of times each of the 26 lowercase letters appears. Ignore uppercase letters, punctuation, whitespace, etc. in your analysis. Use CharStdIn.java from Section 2.4 to read and process the text file.
  11. Complemented DNA string. Write a program to read in a DNA string (A, C, T, G) and print out its complement (substitute A for T, T for A, C for G, and G for C). Hint: use replaceAll several times, but be careful.
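    A minimal sketch of why the hint says to be careful, assuming the input contains only uppercase A, C, G, and T. The names ComplementSketch and complement are illustrative, not part of the booksite. Replacing A with T and then T with A would leave no T's at all, so the sketch routes two of the letters through lowercase placeholders first.

```java
class ComplementSketch {
    // Returns the complement of an uppercase DNA string (assumed to hold only A, C, G, T).
    static String complement(String dna) {
        // Park A and C in lowercase placeholders so later replacements cannot undo them.
        String s = dna.replaceAll("A", "t").replaceAll("C", "g");
        // Now every remaining uppercase T or G is an original one and is safe to replace.
        s = s.replaceAll("T", "A").replaceAll("G", "C");
        // Convert the placeholders to their final uppercase form.
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(complement("ACTG"));
    }
}
```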
  12. Print longest word. Read a list of words from standard input, and print out the longest word. Use the length method.
  13. Print longest word(s). Repeat the previous exercise, but print out all of the longest words if there is a tie, say up to a maximum of 10 words. Use an array of strings to store the current longest words.
  14. Parsing command line options. Unix command line programs typically support flags which configure the behavior of a program to produce different output, e.g., "wc -c". Write a program that takes any number of flags from the command line and runs whichever options the user specifies. To check options, use something like if (s.equals("-v")).
  15. Capitalization. Write a program Capitalizer.java that reads in text strings from standard input and modifies each one so that the first letter in each word is uppercase and all other letters are lowercase.
  16. Reverse domain. Write a program to read in a domain name as a command line input and print the reverse domain. For example, the reverse domain of cs.princeton.edu is edu.princeton.cs. This is useful for web log analysis.
  17. Railfence transposition cipher. Write a program RailFenceEncoder.java that reads in text from standard input and prints out the characters in the odd positions, followed by the even positions. For example, if the original message is "Attack at Dawn", then you should print out "Atc tDwtaka an". This is a crude form of cryptography.
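    The encoding step can be sketched as follows. This version takes the message as a method argument rather than reading standard input, and the names RailFenceSketch and encode are illustrative. Positions are counted from 1, as in the example above, so the "odd" characters sit at even array indices.

```java
class RailFenceSketch {
    // Returns the characters in odd positions (1st, 3rd, ...) followed by those in even positions.
    static String encode(String message) {
        StringBuilder odd = new StringBuilder();   // 1st, 3rd, 5th, ... characters
        StringBuilder even = new StringBuilder();  // 2nd, 4th, 6th, ... characters
        for (int i = 0; i < message.length(); i++) {
            if (i % 2 == 0) odd.append(message.charAt(i));
            else            even.append(message.charAt(i));
        }
        return odd.toString() + even.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Attack at Dawn"));  // prints "Atc tDwtaka an"
    }
}
```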
  18. Railfence transposition cipher. Write a program RailFenceDecoder.java that reads in a message encoded using the railfence transposition cipher and prints out the original message by reversing the encryption process.
  19. Scytale cipher. The scytale cipher is one of the first cryptographic devices used for military purposes. (See The Code Book, p. 8 for a nice picture.) It was used by the Spartans in the fifth century BCE. To scramble the text, you print out every kth character starting at the beginning, then every kth character starting at the second character, and so forth. Write a pair of programs ScytaleEncoder.java and ScytaleDecoder.java that implement this encryption scheme.
  20. Kama-sutra cipher. The Kama-sutra, written in the fourth century CE by Vatsyayana, outlines 64 arts that women should study. Number 45 outlines a method to help women conceal their secret affairs. Each letter in the alphabet is paired up with another one, as in the table below:
    A B C E F G H K L M N P R
    Q D Z U J I X Y W S O V T
    

    Then a message is encoded by replacing each letter with its pair. For example, the message "MEET AT ELEVEN" is encoded as "SUUR QR UWUPUO". This is one of the earliest known substitution ciphers. Write a program KamaSutra.java that scrambles a message using this scheme. Observe that you can unscramble a message by applying the same scheme again.
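    One possible sketch of the substitution, assuming uppercase input and using the pairing table above. The names KamaSutraSketch, TOP, BOTTOM, and encode are illustrative. Characters that are not uppercase letters pass through unchanged.

```java
class KamaSutraSketch {
    // The two rows of the pairing table: TOP.charAt(i) is paired with BOTTOM.charAt(i).
    private static final String TOP    = "ABCEFGHKLMNPR";
    private static final String BOTTOM = "QDZUJIXYWSOVT";

    // Replaces each letter with its partner; applying encode twice restores the message.
    static String encode(String message) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            int top = TOP.indexOf(c);
            int bottom = BOTTOM.indexOf(c);
            if (top >= 0)         sb.append(BOTTOM.charAt(top));
            else if (bottom >= 0) sb.append(TOP.charAt(bottom));
            else                  sb.append(c);  // spaces, punctuation, etc.
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("MEET AT ELEVEN"));  // prints "SUUR QR UWUPUO"
    }
}
```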

  21. Password checker. Write a program that reads in a string from the command line and checks whether it is a "good" password. Here, assume "good" means that it (i) is at least 8 characters long, (ii) contains at least one digit 0-9, (iii) contains at least one upper case letter, (iv) contains at least one lower case letter, and (v) contains at least one non-alphanumeric character.
  22. Subsequence. Given two strings s and t, write a program Subsequence.java that determines whether s is a subsequence of t. That is, the letters of s should appear in the same order in t, but not necessarily contiguously. For example accag is a subsequence of taagcccaaccgg.
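    A minimal sketch of one approach: scan t from left to right, advancing through s each time the next needed letter is found. The names SubsequenceSketch and isSubsequence are illustrative.

```java
class SubsequenceSketch {
    // Returns true if the letters of s appear in t in the same order (not necessarily contiguously).
    static boolean isSubsequence(String s, String t) {
        int i = 0;  // index of the next character of s still to be matched
        for (int j = 0; j < t.length() && i < s.length(); j++) {
            if (s.charAt(i) == t.charAt(j)) i++;
        }
        return i == s.length();  // true only if every character of s was matched
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("accag", "taagcccaaccgg"));  // prints true
    }
}
```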
  23. Bible codes. Some religious zealots believe that the Torah contains hidden phrases that appear by reading every kth letter, and that such patterns can be used to find the Ark of the Covenant, cure cancer, and predict the future. These claims are not based on the scientific method; they have been debunked by mathematicians and attributed to illicit data manipulation. Using the same methodology, one can find statistically similar patterns in a Hebrew translation of War and Peace.
  24. Word chain checker. Write a program that reads in a list of words from the command line and prints true if they form a word chain and false otherwise. In a word chain, adjacent words must differ in exactly one letter, e.g., HEAL, HEAD, DEAD, DEED, DEER, BEER.
  25. Haiku detector. Write a program that reads in text from standard input and checks whether it forms a haiku. A haiku consists of three lines containing the correct number of syllables (5, 7, and 5, respectively). For the purpose of this problem, define a syllable to be any maximal contiguous sequence of vowels (a, e, i, o, u, or y). According to this rule, haiku has two syllables and purpose has three syllables. Of course, the second example is wrong since the e in purpose is silent.
  26. ISBN numbers. Write a program to check whether an ISBN number is valid. Recall the check digit. An ISBN number can also have hyphens inserted at arbitrary places. Use the string method replaceAll("-", "").
  27. Longest common prefix. Write a function that takes two input strings s and t, and returns the longest common prefix of both strings. For example, if s = ACCTGAACTCCCCCC and t = ACCTAGGACCCCCC, then the longest common prefix is ACCT. Be careful if s and t start with different letters, or if one is a prefix of the other.
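    A short sketch that handles both edge cases mentioned above by stopping at the length of the shorter string. The names PrefixSketch and longestCommonPrefix are illustrative.

```java
class PrefixSketch {
    // Returns the longest string that is a prefix of both s and t (possibly empty).
    static String longestCommonPrefix(String s, String t) {
        int n = Math.min(s.length(), t.length());  // never read past the shorter string
        int i = 0;
        while (i < n && s.charAt(i) == t.charAt(i)) i++;
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonPrefix("ACCTGAACTCCCCCC", "ACCTAGGACCCCCC"));  // prints ACCT
    }
}
```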
  28. Complemented palindrome detector. In DNA sequence analysis, a complemented palindrome is a string equal to its reverse complement. Adenine (A) and Thymine (T) are complements, as are Cytosine (C) and Guanine (G). For example, ACCGGT is a complemented palindrome. Such sequences act as transcription-binding sites and are associated with gene amplification and genetic instability. Given a text input of N characters, find the longest complemented palindrome that is a substring of the text. For example, if the text is GACACCGGTTTTA then the longest complemented palindrome is ACCGGT. Hint: consider each pair of adjacent letters as the center of a possible palindrome of even length. A complemented palindrome cannot have odd length, because its middle letter would have to be its own complement.
  29. DNA validation. Write a function that takes as input a string and returns true if it consists entirely of A, C, G, and T's, and false otherwise.
  30. Highest density C+G region. Given a DNA string s of A, C, T, G and a parameter L, find a substring of s that contains the highest ratio of C + G characters among all substrings that have at least L characters.
  31. DNA to RNA. Write a function that takes a DNA string (A, C, G, T) and returns the corresponding RNA string (A, C, G, U).
  32. cDNA to mRNA. Write a program that reads in a cDNA sequence (A, C, T, G) and prints out the corresponding mRNA sequence (replace T with U).
  33. DNA complement. Write a function that takes as input a DNA string (A, C, G, T) and returns the complementary base pairs (T, G, C, A). DNA is typically found in a double helix structure. The two complementary DNA strands are joined in a spiral structure.
  34. Circular shifts. Application: computational biology. A string s is a circular shift of a string t if its characters can be circularly shifted to the right by some number of positions, e.g., actgacg is a circular shift of tgacgac, and vice versa. Write a program that checks whether one string s is a circular shift of another t. Hint: it's a one liner with indexOf and string concatenation.
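    The hinted one-liner might look like the sketch below: every circular shift of t appears as a substring of t concatenated with itself, so it suffices to check the lengths and search t + t. The names CircularShiftSketch and isCircularShift are illustrative.

```java
class CircularShiftSketch {
    // Returns true if s can be obtained from t by a circular shift.
    static boolean isCircularShift(String s, String t) {
        // The length check rules out proper substrings of t + t.
        return s.length() == t.length() && (t + t).indexOf(s) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isCircularShift("actgacg", "tgacgac"));  // prints true
    }
}
```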
  35. Substring of a circular shift. Write a function that takes two strings s and t, and returns true if s is a substring of the circular string t, and false otherwise. For example, gactt is a substring of the circular string tgacgact.
  36. DNA to Protein. A protein is a large molecule (polymer) consisting of a sequence of amino acids (monomers). Some examples of proteins are: hemoglobin, hormones, antibodies, and ferritin. There are 20 different amino acids that occur in nature. Each amino acid is specified by three bases (A, C, G, or U). Write a program to read in a protein (specified by its bases) and convert it into a sequence of amino acids. Use the following table. For example, the amino acid Isoleucine (I) is encoded by AUA, AUC, or AUU.

    Rosetta stone of life.

    UUU Phe    UCU Ser    UAU Tyr    UGU Cys
    UUC Phe    UCC Ser    UAC Tyr    UGC Cys
    UUA Leu    UCA Ser    UAA ter    UGA ter
    UUG Leu    UCG Ser    UAG ter    UGG Trp
    
    CUU Leu    CCU Pro    CAU His    CGU Arg
    CUC Leu    CCC Pro    CAC His    CGC Arg
    CUA Leu    CCA Pro    CAA Gln    CGA Arg
    CUG Leu    CCG Pro    CAG Gln    CGG Arg
    
    AUU Ile    ACU Thr    AAU Asn    AGU Ser
    AUC Ile    ACC Thr    AAC Asn    AGC Ser
    AUA Ile    ACA Thr    AAA Lys    AGA Arg
    AUG Met    ACG Thr    AAG Lys    AGG Arg
    
    GUU Val    GCU Ala    GAU Asp    GGU Gly
    GUC Val    GCC Ala    GAC Asp    GGC Gly
    GUA Val    GCA Ala    GAA Glu    GGA Gly
    GUG Val    GCG Ala    GAG Glu    GGG Gly
    


    Amino acid     3-letter  1-letter    Amino acid     3-letter  1-letter
    Alanine        ala       A           Leucine        leu       L
    Arginine       arg       R           Lysine         lys       K
    Asparagine     asn       N           Methionine     met       M
    Aspartic Acid  asp       D           Phenylalanine  phe       F
    Cysteine       cys       C           Proline        pro       P
    Glutamic Acid  glu       E           Serine         ser       S
    Glutamine      gln       Q           Threonine      thr       T
    Glycine        gly       G           Tryptophan     trp       W
    Histidine      his       H           Tyrosine       tyr       Y
    Isoleucine     ile       I           Valine         val       V


  37. Counter. Write a program that reads in a decimal string from the command line (e.g., 56789) and starts counting from that number (e.g., 56790, 56791, 56792). Do not assume that the input is a 32 or 64 bit integer, but rather an arbitrary precision integer. Implement the integer using a String (not an array).
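    The core step is adding one to a decimal string with carries, as in grade-school arithmetic. The sketch below assumes the input consists only of the digits 0 through 9. The names CounterSketch and increment are illustrative, and a StringBuilder is used only to assemble the result.

```java
class CounterSketch {
    // Returns the decimal string representing the value of s plus one.
    static String increment(String s) {
        StringBuilder sb = new StringBuilder();  // digits of the result, least significant first
        int carry = 1;                           // the "plus one" enters as an initial carry
        for (int i = s.length() - 1; i >= 0; i--) {
            int d = (s.charAt(i) - '0') + carry;
            sb.append((char) ('0' + d % 10));
            carry = d / 10;
        }
        if (carry > 0) sb.append('1');           // e.g., 999 + 1 needs an extra leading digit
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        String s = "56789";
        for (int i = 0; i < 3; i++) {
            s = increment(s);
            System.out.println(s);
        }
    }
}
```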
  38. Arbitrary precision integer arithmetic. Write a program that takes two decimal strings as inputs, and prints out their sum. Use a string to represent the integer.
  39. Boggle. The game of Boggle is played on a 4-by-4 grid of characters. There are 16 dice, each with 6 letters on them. Create a 4-by-4 grid, where each die appears in one of the cells at random, and each die displays one of the 6 characters at random.
    FORIXB MOQABJ GURILW SETUPL CMPDAE ACITAO SLCRAE ROMASH
    NODESW HEFIYE ONUDTK TEVIGN ANEDVZ PINESH ABILYT GKYLEU
    
  40. Generating cryptograms. A cryptogram is obtained by scrambling English text by replacing each letter with another letter. Write a program to generate a random permutation of the 26 letters and use this to map letters. Don't scramble punctuation or whitespace.
  41. Scrabble. Write a program to determine the longest legal Scrabble word that can be played. To be legal, the word must be in The Official Tournament and Club Wordlist (TWL98), which consists of 168,083 words of between 2 and 15 letters. The number of tiles representing each letter is given in the table below. In addition, there are two blanks, which can be used to represent any letter.
    a b c d  e f g h i j k l m n o p q r s t u v w x y z -
    9 2 2 4 12 2 3 2 9 1 1 4 2 6 8 2 1 6 4 6 4 2 2 1 2 1 2
    

  42. Soundex. The soundex algorithm is a method of encoding last names based on the way they sound rather than the way they are spelled. Names that sound the same (e.g., SMITH and SMYTH) would have the same soundex encoding. The soundex algorithm was originally invented to simplify census taking. It is also used by genealogists to cope with names with alternate spellings and by airline receptionists to avoid embarrassment when later trying to pronounce a customer's name.

    Write a program Soundex.java that reads in two lowercase strings as parameters, computes their soundex, and determines if they are equivalent. The algorithm works as follows:

    1. Keep the first letter of the string, but remove all vowels and the letters 'h', 'w', and 'y'.
    2. Assign digits to the remaining letters using the following rules:
      1:  B, F, P, V
      2:  C, G, J, K, Q, S, X, Z
      3:  D, T
      4:  L
      5:  M, N
      6:  R
      
    3. If two or more consecutive digits are the same, delete all of the duplicates.
    4. Convert the string to four characters: the first character is the first letter of the original string, the remaining three characters are the first three digits in the string. Pad the string with trailing 0's if there are not enough digits; truncate it if there are too many digits.
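    The four steps above can be sketched for a single name as follows. This is one reading of the steps, not a definitive implementation: it assumes a non-empty name, treats "remaining letters" as the letters after the first, and does not compare the first letter's own digit against its neighbor. Other Soundex variants differ on these points. The names SoundexSketch, CODES, and soundex are illustrative.

```java
class SoundexSketch {
    // Digit for each letter a..z; '0' marks letters dropped in step 1 (vowels, h, w, y).
    //                                   abcdefghijklmnopqrstuvwxyz
    private static final String CODES = "01230120022455012623010202";

    // Returns the four-character soundex code of a non-empty name.
    static String soundex(String name) {
        String s = name.toLowerCase();
        StringBuilder digits = new StringBuilder();
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'a' || c > 'z') continue;        // ignore anything that is not a letter
            char d = CODES.charAt(c - 'a');          // step 2: letter to digit
            if (d == '0') continue;                  // step 1: drop vowels, h, w, y
            int len = digits.length();
            if (len > 0 && digits.charAt(len - 1) == d) continue;  // step 3: drop repeats
            digits.append(d);
        }
        digits.append("000");                        // step 4: pad, then truncate to 3 digits
        return Character.toUpperCase(s.charAt(0)) + digits.substring(0, 3);
    }

    public static void main(String[] args) {
        System.out.println(soundex("smith") + " " + soundex("smyth"));
    }
}
```

    Two names are then equivalent when their codes are equal, e.g., soundex(a).equals(soundex(b)).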
  43. Longest word. Given a dictionary of words and a starting word s, find the longest word that can be formed, starting at s, and inserting one letter at a time such that each intermediate word is also in the dictionary. For example, if the starting word is cal, then the following is a sequence of valid words: coal, coral, choral, chorale.
  44. Phone words. Write a program PhoneWords.java that takes a 7-digit string as a command line input, reads in a list of words from standard input (e.g., the dictionary), and prints out all 7-letter words (or 3-letter words followed by 4-letter words) in the dictionary that can be formed using the standard phone rules, e.g., 266-7883 corresponds to compute.
    0:  No corresponding letters
    1:  No corresponding letters
    2:  A B C
    3:  D E F
    4:  G H I
    5:  J K L
    6:  M N O
    7:  P Q R S
    8:  T U V
    9:  W X Y Z
    
  45. Rot13. Rot13 is a very simple encryption scheme used on some Internet newsgroups to conceal potentially offensive postings. It works by cyclically shifting each lowercase or uppercase letter 13 positions. So, the letter 'a' is replaced by 'n' and the letter 'n' is replaced by 'a'. For example, the string "Encryption" is encoded as "Rapelcgvba". Write a program ROT13.java that reads in a String as a command line parameter and encodes it using Rot13.
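    A minimal sketch of the shift, written as a method rather than a full command line program. The names Rot13Sketch and rot13 are illustrative. Because the alphabet has 26 letters, shifting by 13 twice returns the original text, so the same method also decodes.

```java
class Rot13Sketch {
    // Shifts each ASCII letter 13 positions cyclically; other characters are unchanged.
    static String rot13(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z')      sb.append((char) ('a' + (c - 'a' + 13) % 26));
            else if (c >= 'A' && c <= 'Z') sb.append((char) ('A' + (c - 'A' + 13) % 26));
            else                           sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rot13("Encryption"));  // prints Rapelcgvba
    }
}
```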
  46. Longest Rot13 word. Write a program that reads in a dictionary of words into an array and determines the longest pair of words such that each is the Rot13 of the other, e.g., abjurer and nowhere.
  47. Thue-Morse weave. Recall the Thue-Morse sequence from Exercises in Section 2.3. Write a program ThueMorse.java that reads in a command line input N and plots the N-by-N Thue-Morse weave in turtle graphics. Plot cell (i, j) black if the ith and jth bits in the Thue-Morse string are different. Below are the Thue-Morse patterns for N = 4, 8, and 16.

    [Figures: 4-by-4, 8-by-8, and 16-by-16 Thue-Morse patterns]

    Because of the mesmerizing non-regularity, for large N, your eyes may have a hard time staying focused.

  48. Repetition words. Write a program Repetition.java to read in a list of dictionary words and print out all words for which each letter appears exactly twice, e.g., intestines, antiperspirantes, appeases, arraigning, hotshots, teammate, and so forth.
  49. Text twist. Write a program TextTwist.java that reads in a word from the command line and a dictionary of words from standard input, and prints out all words of at least four letters that can be formed by rearranging a subset of the letters in the input word. This forms the core of the Yahoo game Text Twist. Hint: create a profile of the input word by counting the number of times each of the 26 letters appears. Then, for each dictionary word, create a similar profile and check if each letter appears at least as many times in the input word as in the dictionary word.
  50. Word frequencies. Write a program (or several programs and use piping) that reads in a text file and prints out a list of the words in decreasing order of frequency. Consider breaking it up into 5 pieces and use piping: read in text and print the words one per line in lowercase, sort to bring identical words together, remove duplicates and print count, sort by count.
  51. VIN numbers. A VIN number is a 17-character string that uniquely identifies a motor vehicle. It also encodes the manufacturer and attributes of the vehicle. To guard against accidentally entering an incorrect VIN number, the VIN number incorporates a check digit (the 9th character). Each letter and number is assigned a value between 0 and 9. The check digit is chosen to be the weighted sum of the values mod 11, using the symbol X if the remainder is 10.
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    1 2 3 4 5 6 7 8 - 1 2 3 4 5 - 7 - 9 2 3 4 5 6 7 8 9
    
    1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10  11  12  13  14  15  16  17
     8   7   6   5   4   3   2  10   0   9   8   7   6   5   4   3   2
    

    For example the check digit of the partial VIN number 1FA-CP45E-?-LF192944 is X because the weighted sum is 373 and 373 mod 11 is 10.

     1   F   A   C   P   4   5   E   X   L   F   1   9   2   9   4   4
     1   6   1   3   7   4   5   5   -   3   6   1   9   2   9   4   4
     8   7   6   5   4   3   2  10   -   9   8   7   6   5   4   3   2
    ------------------------------------------------------------------
     8  42   6  15  28  12  10  50   -  27  48   7  54  10  36  12   8
    

    Write a program VIN.java that takes a command line string and determines whether or not it is a valid VIN number. Allow the input to be entered with upper or lower case, and allow dashes to be inserted. Do thorough error checking, e.g., that the string is the right length, that no illegal characters are used (I, O, Q), etc.
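    The check-digit computation can be sketched as below, using the letter values and position weights from the two tables above. The names VinSketch, LETTER, WEIGHT, and isValid are illustrative, and the error checking is limited to the length, the illegal letters, and the check digit itself.

```java
class VinSketch {
    // Values for A..Z from the table above; 0 marks the illegal letters I, O, and Q.
    private static final int[] LETTER = {
        1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5, 0, 7, 0, 9, 2, 3, 4, 5, 6, 7, 8, 9
    };
    // Weights for positions 1..17; the check digit (9th position) has weight 0.
    private static final int[] WEIGHT = { 8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2 };

    // Returns true if the string, ignoring dashes and case, is a VIN with a correct check digit.
    static boolean isValid(String input) {
        String vin = input.replace("-", "").toUpperCase();
        if (vin.length() != 17) return false;
        int sum = 0;
        for (int i = 0; i < 17; i++) {
            char c = vin.charAt(i);
            int value;
            if (c >= '0' && c <= '9') value = c - '0';
            else if (c >= 'A' && c <= 'Z' && c != 'I' && c != 'O' && c != 'Q') value = LETTER[c - 'A'];
            else return false;                       // illegal character
            sum += value * WEIGHT[i];
        }
        int r = sum % 11;
        char expected = (r == 10) ? 'X' : (char) ('0' + r);
        return vin.charAt(8) == expected;
    }

    public static void main(String[] args) {
        System.out.println(isValid("1FA-CP45E-X-LF192944"));  // prints true
    }
}
```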

  52. Pig Latin. Pig Latin is a fun secret language for young children. To convert a word to Pig Latin:
    • If it begins with a vowel, append "hay" to the end. At the beginning of a word, treat y as a vowel unless it is followed by a vowel.
    • If it begins with a sequence of consonants, move the consonants to the end, then append "ay". Treat a u following a q as a consonant.

    For example, "input" becomes "input-hay", "standard" becomes "andard-stay", "quit" becomes "it-quay". Write a program PigLatinCoder.java that reads in a sequence of words from standard input and prints them to standard output in Pig Latin. Write a program PigLatinDecoder.java that reads in a sequence of words encoded in Pig Latin from standard input and prints out the original words.
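    The two encoding rules can be sketched for a single word as follows. The hyphen follows the examples above. Treating a y that is not the first letter as a vowel is an assumption of this sketch, since the rules only cover y at the beginning of a word. The names PigLatinSketch, isVowel, and encode are illustrative.

```java
class PigLatinSketch {
    private static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Converts one word to Pig Latin (the result is lowercase).
    static String encode(String word) {
        String w = word.toLowerCase();
        int n = w.length();
        if (n == 0) return w;
        char first = w.charAt(0);
        // A leading y counts as a vowel unless it is followed by a vowel.
        boolean yActsAsVowel = first == 'y' && !(n > 1 && isVowel(w.charAt(1)));
        if (isVowel(first) || yActsAsVowel) return w + "-hay";
        int i = 0;  // length of the leading consonant sequence
        while (i < n) {
            char c = w.charAt(i);
            if (isVowel(c) || (i > 0 && c == 'y')) break;  // assumption: later y is a vowel
            if (c == 'q' && i + 1 < n && w.charAt(i + 1) == 'u') i++;  // keep u with its q
            i++;
        }
        return w.substring(i) + "-" + w.substring(0, i) + "ay";
    }

    public static void main(String[] args) {
        System.out.println(encode("input") + " " + encode("standard") + " " + encode("quit"));
    }
}
```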

  53. Rotating drum problem. This problem has applications to pseudo-random number generators, computational biology, and coding theory. Consider a rotating drum: a circle divided into 16 segments, each of one of two types (0 or 1). We want any sequence of 4 consecutive segments to uniquely identify the position of the drum. That is, every 4 consecutive segments should represent a different one of the 16 binary numbers from 0000 to 1111. Is this possible? A de Bruijn sequence of order n is a shortest (circular) string such that every sequence of n bits appears as a substring at least once. For example, 0000111101100101 is a de Bruijn sequence of order 4, and all 2^4 possible 4-bit sequences (0000, 0001, ..., 1111) occur exactly once. Write a program DeBruijn.java that reads in a command line parameter n and prints out an order n de Bruijn sequence. Algorithm: start with n 0's. Append a 1 if the n-tuple that would be formed has not already appeared in the sequence; append a 0 otherwise. Hint: use the methods String.indexOf and String.substring.
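    A sketch of the algorithm described above. The text does not say when to stop, so this version assumes the sequence is complete once it has 2^n characters. The names DeBruijnSketch and deBruijn are illustrative, and a StringBuilder stands in for the String methods named in the hint.

```java
class DeBruijnSketch {
    // Builds an order-n sequence by preferring 1 whenever the new n-tuple is unseen.
    static String deBruijn(int n) {
        StringBuilder s = new StringBuilder();
        for (int i = 0; i < n; i++) s.append('0');  // start with n 0's
        int target = 1 << n;                        // assumed stopping length: 2^n
        while (s.length() < target) {
            // The n-tuple that would be formed by appending a 1.
            String candidate = s.substring(s.length() - n + 1) + "1";
            if (s.indexOf(candidate) < 0) s.append('1');
            else                          s.append('0');
        }
        return s.toString();
    }

    public static void main(String[] args) {
        System.out.println(deBruijn(4));  // prints 0000111101100101
    }
}
```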
  54. Ehrenfeucht-Mycielski sequence. The Ehrenfeucht-Mycielski sequence is a binary sequence that starts with "010". Given the first n bits b_0, b_1, ..., b_{n-1}, the next bit b_n is determined by finding the longest suffix b_j, b_{j+1}, ..., b_{n-1} that occurs previously in the sequence (if it occurs multiple times, take the last such occurrence). Then, b_n is the opposite of the bit that followed the match. The sequence begins 0100110101110001000011110110010100100111010001100000101101111100. Use substring and lastIndexOf.
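    One way to sketch the rule with substring and lastIndexOf is shown below. It tries suffixes from longest to shortest and, for the first one that has an earlier occurrence, takes the last such occurrence. Allowing the earlier occurrence to overlap the suffix is an assumption of this sketch. The names EhrenfeuchtMycielskiSketch and sequence are illustrative.

```java
class EhrenfeuchtMycielskiSketch {
    // Returns the first 'length' bits of the sequence that starts with "010".
    static String sequence(int length) {
        StringBuilder s = new StringBuilder("010");
        while (s.length() < length) {
            int n = s.length();
            char next = '0';
            for (int len = n - 1; len >= 1; len--) {
                String suffix = s.substring(n - len);
                // Last occurrence that starts strictly before the suffix itself.
                int idx = s.lastIndexOf(suffix, n - len - 1);
                if (idx >= 0) {
                    char following = s.charAt(idx + len);   // bit that followed the match
                    next = (following == '0') ? '1' : '0';  // append the opposite bit
                    break;
                }
            }
            s.append(next);
        }
        return s.substring(0, length);
    }

    public static void main(String[] args) {
        System.out.println(sequence(13));  // prints 0100110101110
    }
}
```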

Under Construction

  1. Entropy. The Shannon entropy measures the information content of an input string and plays a cornerstone role in information theory and data compression. It was proposed by Claude Shannon in 1948, borrowing from the concept of entropy in statistical thermodynamics. Assuming each character i appears with probability p_i, the entropy is defined to be $H = -\sum_i p_i \log_2 p_i$, where the contribution of a term is 0 if p_i = 0. Compute the entropy of a DNA sequence.
    1. Write a program to read in an ASCII text string from standard input, count the number of times each ASCII character occurs, and compute the entropy, assuming each character appears with probability equal to its observed frequency.
    2. Repeat part 1, but use UNICODE.
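    A minimal sketch of the entropy computation for part 1, assuming a non-empty string whose characters are all 7-bit ASCII (codes 0 through 127). The names EntropySketch and entropy are illustrative. Characters that never occur are skipped, which matches the convention that their contribution is 0.

```java
class EntropySketch {
    // Returns the Shannon entropy, in bits per character, of an ASCII string.
    static double entropy(String s) {
        int[] count = new int[128];                  // one counter per ASCII code
        for (int i = 0; i < s.length(); i++) count[s.charAt(i)]++;
        double h = 0.0;
        for (int c = 0; c < 128; c++) {
            if (count[c] == 0) continue;             // contribution is 0 when p = 0
            double p = (double) count[c] / s.length();
            h -= p * Math.log(p) / Math.log(2);      // log base 2 via natural logs
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy("ACGT"));
    }
}
```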
  2. Shannon's entropy experiment. Recreate Shannon's experiment on the entropy of the English language by listing a number of letters in a sentence and prompting the user for the next symbol. Shannon concluded that there is approximately 1.1 bits of information per letter.
  3. Date format conversion. Write a program to read in a date of the form 2003-05-25 and convert it to 5/25/03.
  4. interesting English words

More stuff.

Here's a FAQ for manipulating pixels in Java. Here's a list of more possible transformations for monochrome images [pdf].

Create histogram of intensities of red, green, or blue components.

Read in two images and create a smooth animation from one picture to the other using a linear combination of the colors.

Miasma animation using MemoryImageSource.