Computer Science Building, Princeton University


Introduction to CS


1.  A Simple Machine

2.  Java Programming

3.  OOP

4.  Data Structures

    • Performance

    • Sorting

    • Linked Structures

    • Stacks and Queues

    • Multisets

    • Priority Queues

    • Symbol Tables

    • Graphs

5.  A Computing Machine

6.  Building a Computer

7.  Theory of Computation

8.  Systems

9.  Scientific Computation

10.  Perspective


 Lecture Notes

Assignments

FAQ









4.8 GRAPHS


This section under major construction.

Graphs. Graph = data type that represents pairwise connections. Vertex = element, edge = connection between pair of elements. Not the same as a graphing a function! The graph below contains 8 vertices names 0 through 7 and contains 12 edges.

Graph


Applications.

Graph arise in numerous applications.


GRAPH VERTICES EDGES
Communication telephones, computers fiber optic cable
Circuits gates, registers, processors wires
Mechanical joints rods, beams, springs
Hydraulic reservoirs, pumping stations pipelines
Financial stocks, currency transactions
Transportation street intersections, airports highways, air routes
Scheduling tasks precedence constraints
Software systems functions function calls
Internet web pages hyperlinks
Games board positions legal moves
Social networks people, actors, terrorists friendships, movie casts, associations
Protein interaction networks proteins protein-protein interactions
Genetic regulatory networks genes regulatory interactions
Neural networks neurons synapses
Infectious disease people infections
Electrical power grid transmission stations cable
Chemical compounds molecules chemical bonds


Graphs. Our graph implementation stores a list of vertex names in a symbol table. Associated with each vertex is its adjacency list: this is a list of the names of its immediate neighbors. Program Graph.java is a data type for graphs. Its internal representation is a symbol table of lists. It relies on the data type SymbolTable.java and AdjList.java, which we have already encountered in previous subsections.

Adjacency matrix vs. adjacency list. Two natural representations. Adjacency matrix is more efficient if the graph is dense; adjacency list is more efficient if the graph is sparse. Most graphs we encounter in the real world are very sparse.

Graph with adjacency list representation     Graph with adjacency matrix representation


List of neighbors. Program MovieFinder.java reads in a data file containing movie records (a movie followed by a list of actors appearing in that movie separated by backslashes) and builds a graph with movies and actors as nodes. Then, it prompted the user for the name of an actor (or movie) and prints out all movies (actors, resp) that the actor (movie, resp) appeared in. It uses In.java to read the input file one line at a time, and String.split to break up the line into tokens.

Depth first search. Web crawler.

Small world phenomenon. In the 1960s, the sociologist Stanely Milgram conducted an amazing experiment.... Six handshakes away from anyone in the world.

Other sociology applications:

Other applications.

Power laws.

Kevin Bacon game. Kevin Bacon is a prolific actor who appeared in many movies. Given the name of an actor or actress, the Kevin Bacon game is to find an alternating sequence of actors and movies that lead back to Kevin Bacon. Program Bacon.java reads in a data file containing movie records (a movie followed by a list of actors appearing in that movie, separated by backslashes), and runs breadth first search to find the shortest distance from the source (Kevin Bacon) to each other actor and movie. After computing the shortest path from Kevin Bacon to each actor, the user is prompted to enter actors and the program prints out the shortest chain.

Applications to computational biology. Here's a link to some press about a small world phenomenon experiment conducted at Columbia University.

The program BFSearcher.java takes a graph argument in the constructor. When its search method is called with a vertex s, it runs breadth first search starting at s. After running breadth first search, the client can make queries to path(v) to get the shortest path from s to v. The breadth first search algorithm uses Queue.java, which implements a FIFO queue.

Garbage collection. Automatic memory management in languages like Java is a challenging problem. Allocating memory is easy, but discovering when a program is finished with memory (and reclaiming it) is more difficult. Reference counting: doesn't work with circular linked structure. Mark-and-sweep algorithm. Root = local variables and static variables. Run DFS from roots, marking all variables references from roots, and so on. Then, make second pass: free all unmarked objects and unmark all marked objects. Or a copying GC would then move all of the marked objects to a single memory area. Uses one extra bit per object. JVM must pause while garbage collection occurs. Fragments memory.

Dijkstra. Shortest paths.


Lessons

  1. Use arrays or linked list to capture ordered structure of elements. Use binary trees to capture hierarchical structure. Use graphs to capture pairwise relationships.

Exercises

  1. Add methods vertices and edgesthat returs the number of vertices and edges in the graph, respectively.
  2. Modify addEdge to disallow self loops (an edge between a vertex v and itself), or to allow self-loops but only include one representation of the edge (instead of two).
  3. Modify addEdge to disallow parallel edges (more than one edge between v and w). To make it efficient, consider maintaining a symbol table for each list of neighbors (or a single symbol table of all the edges).
  4. Modify addEdge so that it reuses the memory for each vertex name (instead of creating many copies of the same string to put in each vertex's adjacency list).
  5. Add a boolean method exists(String v) that returns true if v is a vertex in the graph, and false otherwise.
  6. Add a boolean method exists(String v, String w) that returns true if v-w is an edge in the graph, and false otherwise. Calculate this quantity by traversing the neighbor list of v (or w).
  7. The previous implementation of exists(String v, String w) can take time proportional to the length of the adjacency list of v. To speed it up, maintain a symbol table of edges so that checking whether an edge v-w is in the graph is a single symbol table lookup.
  8. Suppose you use a stack instead of a queue when running breadth first in BFSearcher.java. Does it still compute Kevin Bacon numbers correctly?
  9. Add a method isReachable(v) to BFSearcher.java that returns true if v can reach s via some path, and false otherwise.
  10. Add a method path(String v) to BFSearcher.java that returns the shortest path from v to s as an array of strings.
  11. After doing the previous exercise, modify Bacon.java so that it prints out a proof of the Bacon number in the form below:
    % java Bacon input-hero.txt
    Stallone, Sylvester
    Kevin Bacon number is 6
    Stallone, Sylvester was in America: A Tribute to Heroes (2001) with Garcia, Andy
    Garcia, Andy was in Hero (1992) with Arnold, Tom
    Arnold, Tom was in Arnold Schwarzenegger: Hollywood Hero (1999) with Coburn, James
    Coburn, James was in Hell Is for Heroes (1962) with Guardino, Harry
    Guardino, Harry was in Hell with Heroes, The (1968) with McCarthy, Kevin
    McCarthy, Kevin was in Hero at Large (1980) with Bacon, Kevin
    
    if (!bfs.isReachable(actor)) {
       System.out.println(actor + " is not connected to Kevin Bacon");
    }
    else {
       int bacon = bfs.pathLength(actor) / 2;
       System.out.println("Kevin Bacon number is " + bacon);
       String[] proof = bfs.path(actor);
       for (int i = 0; i 
  12. Modify Bacon.java so that the user types two actors (one per line) and the program prints a shortest chain between the two actors.
  13. Create a copy constructor for Graph.java that takes as input a graph G, and creates and initializes a new copy of the graph. Any changes you make to G should not affect the newly created graph.
  14. Array of BFSearcher's to save away previous queries?

Creative Exercises

  1. Actor graph. An alternate way to compute Kevin Bacon numbers is to build a graph where each node is an actor. Two actors are connected by an edge if they appear in a movie together. Calculate Kevin Bacon numbers by running BFS on the actor graph. Compare the running time versus the algorithm described in the text. Explain why this approach is so much slower.
  2. Degree. The degree of a vertex is the number of incident edges. Add a method int getDegree(String s) to Graph that retuns the degree of the vertex s. Find the actor node in the Kevin Bacon graph that has the highest degree, i.e., the actor that has appeared in the most movies.
  3. Diameter. Find the diameter of the Kevin Bacon graph.
  4. Connected components. Write a program CCFinder.java that computes the connected components of a graph. The method search should preprocess the graph and compute all of the connected components. After the preprocessing step, the method areConnected(v, w) should return true if v and w are connected and false otherwise. The method components should return the number of connected components. Note: need a way to iterate over the vertices in the symbol table. Then run BFS from each vertex that hasn't yet been visited.
  5. Flood fill. Blob = collection of neighboring pixels of the same color. How many blobs in picture?
  6. Histogram of Kevin Bacon numbers. Modify Bacon.java so that it prints a table of Kevin Bacon numbers, indicating how many actors have a Bacon number of 1, 2, 3, etc. Use this data file that contains all of the actors once in random order.
  7. Center of the Hollywood universe. We can measure how good of a center that Kevin Bacon is by computing their Hollywood number. The Hollywood number of Kevin Bacon is the average Bacon number of all the actors. The Hollywood number of another actor is computed the same way, but we make them be the source instead of Kevin Bacon. Compute Kevin Bacon's Hollywood number and find an actor and actress with better Hollywood numbers.
  8. Fringe of the Hollywood universe. Find the actor (who is connected to Kevin Bacon) that has the highest Hollywood number.
  9. Word ladders. Write a program WordLadder.java that takes two 5 letter strings from the command line, and reads in a list of 5 letter words from standard input, and prints out a shortest word ladder connecting the two strings (if it exists). Two words can be connected in a word ladder chain if they differ in exactly one letter. As an example, the following word ladder connects green and brown.
    green greet great groat groan grown brown
    

    You can also try out your program on this list of 6 letter words.

  10. Faster word ladders. To speed things up (if the word list is very large), don't write a nested loop to try all pairs of words to see if they are adjacent. To handle 5 letter words, first sort the word list. Words that only differ in their last letter will appear consecutively in the sorted list. Sort 4 more times, but cyclically shift the letters one position to the right so that words that differ in the ith letter will appear consecutively in one of the sorted lists.

    Try out this approach using a larger word list with words of different sizes. Two words of different lengths are neighbors if the smaller word is the same as the bigger word, minus the last letter, e.g., brow and brown.

  11. All paths. Given an undirected graph and two vertices s and t, write a program AllPaths.java that can be used to count or print out all simple paths from s to t. A simple path does not revisit any vertex more than once. In two-dimensional grids, such paths are referred to as self-avoiding walks. It is a fundamental problem in in statistical physics and theoretical chemistry, e.g., to model the spatial arrangement of linear polymer molecules in a solution. Warning: there might be exponentially paths, so don't run this on large graphs. Reference.
  12. Directed graph. Create a data type Digraph.java for directed graph. It is almost identical to our Graph ADT, except that when we insert an edge v-w, we only insert w on the adjacently list of v, but not v onto the adjacency list of w.
  13. Garbage collection. Reference counting vs. mark-and-sweep.
  14. Markov chain. Implement a data type MarkovChain that supports the following operations: addTransition(v, w) that adds a transition from state v to state w and iterator that returns an iterator to the Markov chain, starting at a random state and iterating over the states with probability proportional to the number of links leaving that state.
  15. Language modeling. Create a Markov model of an input text and use it to automatically generate stylized pseudo-random text.
  16. Transitive closure. The method search should preprocess the graph. After the preprocessing step, the method isReachable(v, w) should return true if w is reachable from v along a directed path and false otherwise. Note: need a way to iterate over the vertices in the symbol table. Then run BFS from each vertex. Might need to store an N-by-N table or compute queries on the fly.
  17. Cycle. Given a directed graph, determine if there is a directed cycle. If so, print one out. Otherwise print out a topological order.
  18. Least common ancestor. Given the keys of two nodes in a binary search tree, find their least common ancestor. Hint: perform a simultaneous search for the the keys. The least common ancestor is the node where the search diverges.
  19. Topological order. Given a list of pairs of variables of the form x Topological order. Redo the previous problem, but print out all such orderings.
  20. Rogue. (Andrew Appel.) A monster and a player are each located at a distinct vertex in an undirected graph. In the role playing game Rogue, the player and the monster alternate turns. In each turn the player can move to an adjacent vertex or stays put. Determine all vertices that the player can reach before the monster. Assume the player gets the first move.
  21. Rogue. (Andrew Appel.) A monster and a player are each located at a distinct vertex in an undirected graph. The goal of the monster is to land on the same vertex as the player. Devise an optimal strategy for the monster.
  22. Hex. The game of Hex is played on a trapezoidal grid of hexagons.... Describe how to detect whether white or black has one the game using breadth first search.
  23. Hex. Prove that the game cannot end in a tie. Hint: consider the set of cells reachable from the left side of the board.
  24. Hex. Prove that the first player in Hex can always win if they play optimally. Hint: if the second player had a winning strategy, you could choose a random cell initially, and then just copy the second players winning strategy. This is called strategy stealing.
  25. Image processing. Given N-by-N image of color pixels, identify all clusters of pixels of the same color that are connected. Find connected components.
  26. Combinational circuits. Determining the truth value of a combinational circuit given its inputs is a graph reachability problem (on a directed acyclic graph).
  27. Checkers. Extend the rules of checkers to an N-by-N checkerboard. Show how to determine whether a checker can become in king in the current move. (Use BFS or DFS.) Show how to determine whether black has a winning move. (Find an Euler path.)