7.8. INTRACTABILITY
This section under major construction.
The analysis of algorithms enables us to classify algorithms
according to the amount of resources they will consume.
We often encounter problems for which we are unable to discover
any efficient algorithm. The study of intractability provides
a framework for studying our failures.
The impact of NP-completeness on the natural sciences has been undeniable. Papadimitriou lists 20 diverse scientific disciplines that were coping with internal questions. Ultimately, scientists discovered their inherent complexity after realizing that their core problems were NP-complete. NP-completeness is mentioned as a keyword in 6,000 scientific papers per year. Very few scientific theories have had such a breadth and depth of influence.
Computational complexity and lower bounds. The goal of complexity theory is to understanding the nature of efficient computation. As digital computers were developed in the 1940s and 1950s, the Turing machine served as the right theoretical model of computation. However, the Turing machine fails to accurately account for the amount of time and memory that a computation requires. In the 1960 Hartmanis and Stearns proposed measuring the time and memory needed by a computer as a function of the input size. Hartmanis-Stearns defined complexity classes in terms of multitape Turing machines and proved that some problems have "an inherent complexity that cannot be circumvented by clever programming." They also proved a formal version of the intuitive idea that if given more time or space, Turing machines can compute more things.
Computational complexity. Computational complexity the art and science of determining resource requirements for different problems. Sorting requires at least N log N comparisons (in decision tree model). Mergesort essentially matches this bound. Any solution to the Towers of Hanoi problems requires moving discs as least 2^N - 1 times. Our recursive solution in Section XYZ matches this bound. Brute force TSP takes N! steps. Using dynamic programming, can get it down to 2^N. Best lower bound = N. Essence of computational complexity = trying to find matching upper and lower bounds.
Polynomial-time. When implementing algorithms, we prefer an algorithm that takes 8 N log N steps to one that takes 3 N3 steps since when N is large, the first algorithm is significantly faster than the first. The second algorithm will ultimately solve the same problem, but it might take weeks instead of seconds. In contrast, an exponential time algorithm has a different qualitative behavior since exponential growth dwarfs technological change. For example, a brute force algorithm for the TSP might take N! steps. Even if each electron in the universe (1079) had the power of today's fastest supercomputer (1012 instructions per second), and each worked for the life of the universe (1017 seconds) on solving the problem, it would barely make a dent in solving a problem with N = 1,000 since 1000! >> 101000 >> 1079 * 1012 * 1017. As programmers gained more experience with computation, it was evident that polynomial algorithms were useful and exponential algorithms were not.
Create log-log scale plot of N, N3, N5, N10, 1.1N, 2N, N! as in Harel p. 74.
The complexity class P is the set of all decision problems solvable in a polynomial number of steps on a deterministic Turing machine. It is supposed the capture the set of problems that we can solve in practice on real machines. We list a few examples below:
| Problem | Description | Algorithm | Yes | No |
|---|---|---|---|---|
| EVEN | Is the integer x even? | Least significant bit | 18 | 17 |
| MULTIPLE | Is the integer x an integral multiple of y? | Grade school division | 51, 17 | 51, 18 |
| RELPRIME | Are the integers x and y relatively prime? | Euclid (300 BCE) | 34, 39 | 34, 51 |
| PRIME | Is the integer x prime? | Agarwal, Kayal, Saxena (2002) | 53 | 51 |
| COMPOSITE | Is the integer x composite? | Agarwal, Kayal, Saxena (2002) | 51 | 53 |
| EDIT-DISTANCE | Is x closer to y than z in terms of edit distance? |
Dynamic programming | niether, neither, another |
niether, another, neither |
| BACON | Is the Kevin Bacon number of x less than 6? |
Breadth-first search | Julia Roberts |
Akbar Abdi |
| PLANARITY | Given a graph G, can it be drawn in the plane such that no two edges cross? | Kuratowski (1930), Hopcroft-Tarjan (1974) | ||
| DIOPHANTINE | Given a (sparse) polynomial of one variable with integer coefficients, does it have an integral root? |
Smale et. al (1999) | x5 - 32 | x5 - 31 |
| LSOLVE | Given a rational matrix A and a rational vector b, does there exist a rational vector x such that Ax = b? |
Gauss-Edmonds elimination (1967) | x+y=1 2x+4y=3 |
x+y=1 2x+2y=2 |
| LP | Given a rational matrix A and a rational vector b, does there exist a rational vector x such that Ax ≤ b? |
Khachiyan's ellipsoid algorithm (1979) | x+y≤1 2x+4y≤3 |
x+y≤1 -2x-3y≤-4 -y≤1 |
We note that all of the algorithms to solve these problems
not only answer the yes-no question, but also solve the search
version of the problem. This behavior is typical.
As an example, for yes instances the algorithm for DIOPHANTINE finds
a value x such that f(x) = 0, and BACON computes the Kevin
Bacon number of x and produces the shortest chain.
For no instances, the algorithms typically report a proof
explaining why there are no such solutions. For
RELPRIME, Euclid's algorithm can be modified to output
integers a and b such ax + by = d, where d = gcd(x, y).
LSOLVE outputs a vector y such yTA = 0 and yTb ≠ 0.
LP outputs a vector y such that
yTA = 0, yTb <0 and y ≥ 0. sometimes the proof for no instances are non-obvious. one way to prove that a number is composite is to demonstrate a nontrivial factor. can you think of any succinct way to prove that a number is prime? see exercise xyz.
Extended Church-Turing Thesis. Some problems have different complexity depending on the type of machine on which they are run (single tape vs. multitape Turing machines). In the mid 1960s Cobham and Edmonds independently observed that the set of problems solvable in a polynomial number of steps remains invariant over a very wide range of computational models. Edmonds referred to polynomial algorithms as "good algorithms" and argued that polynomial time represents efficient computation. Godel letter to von Neumann in 1956 contains implicit notion that polynomiality is a desirable feature. Earlier (1953), von Neumann recognized the qualitative difference between polynomial and exponential algorithms. Of course, algorithms that take N100 or 10100 N2 steps are useless in practice, so this distinction must be qualified. The extended Church-Turing thesis asserts that the Turing machine is as efficient as any physical computing device. That is P is the set of decision problems solvable on real computers. If some piece of hardware solves a problem of size N in time T(N), then a deterministic Turing machine can do it in time T(N)k for some fixed constant k, where k depends on the particular problem. Andy Yao expresses the broad implications of this thesis:
They imply that at least in principle, to make future computers more efficient, one only needs to focus on improving the implementation technology of present-day computer designs.
The extended Church-Turing thesis is true for all known physical general purpose computers. For random access machines (e.g., your PC or Macintosh) the constant k = 2. So, for example, if a random access machine can perform a computation in time N3, then a Turing machine can do the same computation in time N3/2. One speculative model of computation - quantum computers - might be capable of solving some problems in a polynomial time that a deterministic Turing machine cannot do. Peter Shor discovered an N^3 algorithm for factoring N-digit integers that runs in polynomial time on a quantum computer, but the best known algorithm on a classical computer takes time exponential in N. Same idea could lead to a comparable speedup in simulating quantum mechanical systems. This explains the recent excitement in quantum computation, as it could result in a paradigm shift for computing. However, quantum computers do not yet violate the extended Church-Turing thesis since we don't yet know how to build them. Moreover, it is still possible that someone might discover a polynomial algorithm for factoring on a classical computer, although most experts suspect that this is not possible. Grover's algorithm: search in sqrt(N) time instead of N.
Certificates. Problems in NP have short certificates. For example, it is easy to convince someone that a number is composite by producing a factor. Then, the person just has to check (by long division) that you did not lie to them. Marin Mersenne conjectured that numbers of the form 2p - 1 are prime for p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127 and 257. His conjecture for p = 67 was disproved by F. N. Cole over two hundred and fifty years later in 1903. According to E. T. Bell's book Mathematics: Queen and Servant of Science
In the October meeting of the AMS, Cole announced a talk "On the Factorisation of Large Numbers". He walked up to the blackboard without saying a word, calculated by hand the value of 267, carefully subtracted 1. Then he multiplied two numbers (which were 193707721 and 761838257287). Both results written on the blackboard were equal. Cole silently walked back to his seat, and this is said to be the first and only talk held during an AMS meeting where the audience applauded. There were no questions. It took Cole about 3 years, each Sunday, to find this factorisation, according to what he said.
For the record 267 - 1 = 193707721 × 761838257287 = 147573952589676412927.
Does P=NP? Clay Foundation offers a 1 million dollar prize for solving it.
NP-completeness. Informally, NP-complete problems are the "hardest" problems in P; they are the ones most likely not be in P. Define. Defining the concept of NP-completeness does not mean that such problems exist. In fact, the existence of NP-complete problems is an amazing thing. This is an example of universality: if we can solve any NP-complete, then we can solve any problem in NP. It is even more amazing that that there exist "natural" problems that are NP-complete.
Some NP-complete problems. Since the discovery that SAT is NP-complete, tens of thousands of problems have been identified as NP-complete. In 1972, Karp showed that many of the most juicy open problems in discrete mathematics were NP-complete, including Tsp, Knapsack, 3Color, and Clique. Below we list a small sampling of some NP-complete problems. This is only meant to illustrate their diversity and pervasiveness.
- Bin Packing.
You have n items and m bins.
Item i weighs w[i] pounds. Each bin can hold at
most W pounds. Can you pack all n items into the
m bins without violating the given weight limit?
This problem has many industrial applications. For example, UPS may need to ship a large number of packages (items) from one distribution center to another. It wants to put them into trucks (bins), and use as few trucks as possible. Other NP-complete variants allow volume requirements: each 3-dimensional package takes up space and you also have to worry about arranging the packages within the truck.
- Knapsack. You have a set of n items. Item i weighs w[i] pounds and has benefit b[i]. Can you select a subset of the items that have total weight less than or equal to W and total benefit greater than or equal to B? For example, when you go camping, you must select items to bring based on their weight and utility. Or, suppose you are burglarizing a home and can only carry W pounds of loot in your knapsack. Each item i weighs w[i] pounds has a street value of b[i] dollars. Which items should you steal?
- Subset Sum. Given n integers does there exists a subset of them that sum exactly to B? For example, suppose the integers are {4, 5, 8, 13, 15, 24, 33}. If B = 36 then the answer is yes (and 4, 8, 24 is a certificate). If B = 14 the answer is no.
- Partition. Given n integers, can you divide them into two subsets so that each subset sums to the same number? For example, suppose the integers are {4, 5, 8, 13, 15, 24, 33}. Then the answer is yes, and {5, 13, 33} is a certificate. Load balancing for dual processors.
- Integer linear programming. Given an integer matrix A and an integer vector b, does there exist an integer vector x such that Ax ≤ b? This is a central problem in operations research since many optimization problems can be formulated in this way. Note the contrast to the linear programming problem presented above where we are looking for a rational vector instead of an integer vector. The line between problems which are tractable and problems which are intractable can be very subtle.
- SAT.
Given n Boolean variables x1, x2, ..., xN and
a logical formula, is there an assignment of truth variables
that makes the formula satisfiable, i.e., true?
For example, suppose the formula is
(x1' + x2 + x3) (x1 + x2' + x3) (x2 + x3) (x1' + x2' + x3')
Then, the answer is yes and (x1, x2, x3) = (true, true, false) is a certificate. - Clique. Given n people and a list of pairwise friendships. Is there a group or clique of k people such that every possible pair of people within the group are friends? It is convenient to draw the friendship graph, where we include a node for each person and an edge connecting each pair of friends. In the following example with n = 11 and k = 4, the answer is yes, and {2, 4, 8, 9} is a certificate.
- Longest path. Given a set of nodes and pairwise distances between nodes, does there exists a simple path of length at least L connecting some pair of nodes?
- Machine Scheduling.
Your goal is to process n jobs on m machines.
For simplicity, assume each machine can process any
one job in 1 time unit.
Also, there can be precedence constraints: perhaps job
j must finish before job k can start. Can you schedule
all of the jobs to finish in L time units?
Scheduling problems have a huge number of applications. Jobs and machines can be quite abstract: to graduate Princeton you need to take n different courses, but are unwilling to take more than m courses in any one semester. Also, many courses have prerequisites (you can't take COS 226 or 217 before taking 126, but it is fine to take 226 and 217 at the same time). Can you graduate in L semesters?
- Shortest Common Superstring. Given the genetic alphabet { a, t, g, c } and N DNA fragments (e.g., ttt, atggtg, gatgg, tgat, atttg) is there a DNA sequence with K or fewer characters that contains every DNA fragment? Suppose K = 11 in the above example; then the answer is yes and atttgatggtg is a certificate. Applications to computational biology.
- Protein folding.
Proteins in organism fold in three dimensional
dimensional space in a very specific way, to their
native state. This geometric pattern determines
the behavior and function of a protein. One of the
most widely used folding models is the two dimensional
hydrophilic-hydrophobic (H-P) model.
In this model, a protein is a sequence of 0s and 1s, and
the problem is to embed it into a 2-d lattice such that the
number of pairs of adjacent 1s in the lattice, but not in the
sequence (its energy), is minimized. For example, the sequence
011001001110010 is embedded in the figure below in such a
way that there are 5 new adjacent pairs
of 1s (denoted by asterisks).
0 --- 1 --- 1 --- 0 * * | 0 --- 1 --- 1 0 | * | | 0 --- 1 * 1 * 1 | | | 0 0 --- 0Minimizing the H-P energy of a protein is NP-hard. (Papadimitriou, et al.) It is well accept by biologists that proteins fold to minimize their energies. A version of Levinthal's paradox asks how it is possible that proteins are able to efficiently solve apparently intractable problems.
- Integration. Given integers a1, a2, ..., aN, does the integral from θ = 0 to θ = 2 π of cos(a1 θ) × cos(a2 θ) × ..., × cos(aN θ) equal 0? If you see this integral in your next Physics course, you should not expect to be able to solve it. This should not come as a big surprise because in Section 7.4 we consider a version of integration that is undecidable.
- Soccer Elimination. There are N teams in a soccer (European football) league. The winner of each game gets 3 points, the loser 0. In the event of a tie, each team gets 1 point. The goal of each team is to finish the season with the most points. At some point during the season, you want to know whether or not your favorite team is mathematically eliminated, i.e., your team has no mathematical chance of finishing the season with the most points, regardless of the outcome of the remaining games? (The version for baseball, where 1 point is awarded for a win and 0 for a loss, is solvable in polynomial-time using a clever graph algorithm, although sports writers do not appear to do it correctly!)
- Crossword puzzle. Given an integer N, and a list of valid words, is it possible to assign letters to the cells of an N-by-N grid so that all horizontal and vertical words are valid? No easier if some of the squares are black as in a crossword puzzle.
- Tetris.
- Minesweeper.
- Regular expressions. Give two regular expressions over the unary alphabet { 1 }, do they represent different languages? Give two NFAs, do they represent different languages? It may not be apparent that either problem is even decidable since we don't have an obvious bound on the size of the smallest string that is in one language but not the other. [Note that the corresponding inequivalence problem for DFAs is polynomial solvable.] The reason why we phrase the problem as inequivalence instead of equivalence is that it is easy to check that the two entities are non-equivalent by demonstrating a string s. In fact, if the two languages are different, then the smallest string is polynomial in the size of the input. Thus, we can use the efficient algorithms from Section 7.xyz to check whether s is recognized by an RE or accepted by an NFA. However, to argue that two REs are equivalent, we would need an argument that guarantees that all strings in one are in the other, and vice versa. [It is possible to devise an (exponential) algorithm to test whether two REs or NFAs are equivalent, although this should not be obvious.]
- Lemmings. Is it possible to guide a tribe of green-haired lemming creatures to safety in a level of the game Lemmings?
- Multinomial mimization over unit hypercube. Given a multinomial of N variables, is the minimum <= c, assuming all variables are bounded between 0 and 1. classic calculus problem: min f(x)=ax^2 + bx + c over [0, 1]. derivative at x=?? is 0, but minimum occurs at boundary.
- Quadratic Diophantine equations. Given positive integers a, b, and c, are there positive integers x and y such that ax2 + by = c?
- Bounded Post Correspondence Problem. Given a post correspondence problem with N cards and an integer K &le N, is there a solution that uses at most K cards? Recall it is undecidable if there is no limit on K.
- Quadratic congruence.
Given positive integers a, b, and c, is there a positive integer x
2 = a (mod b)? - Ising model in 3d. Simple mathematical model of phase transitions, e.g., when water freezes or when cooling iron becomes magnetic. Computing lowest energy state is NP-hard. Solvable in polynomial time if graph is planar, but 3d lattice is nonplanar. Holy grail of statistical mechanics for 75 years before proved NP-hard. Establishing NP-compleness means that physicists won't spend another 75 years attemptying to solve the unsolvable.
- Bandwidth minimization. Given an N-by-N matrix A and an integer B, is it possible to permute the rows and columns of A such that Aij = 0 if |i - j| > B. Useful for numerical linear algebra.
- Voting and social choice. NP-hard for an individual to manipulate a voting scheme known as single transferable vote. NP-hard to determine who has won an election in a scheme seriously proposed by Lewis Carroll (Charles Dodgson) in 1876. In Carroll's scheme, the winner is the candidate who with the fewest pairwise adjacent changes in voters' preference rankings becomes the Condercet winner (a candidate who would beat all other candidates in a pairwise election). Shapley-Shubik voting power. Computing the Kemeny optimal aggregation.
Output polynomial-time. Some problems involve more output than a single bit of information. For example, outputting a solution to the Towers of Hanoi problem requires at least 2^N steps. This requirement is not because the solution is inherently hard to compute, but rather because there is 2^N symbols of output, and it takes one of unit of time to write each output symbol. Perhaps a more natural way to measure efficiency is a function both of the input size and of the output size. A classic electrical engineering problem with DFAs is to build a DFA from a RE that uses the minimum number of states. We would like an algorithm that is polynomial in the size of the input RE (number of symbols) and also in the size of the output DFA (number of states). Unless P = NP, designing such an algorithm is impossible. In fact, it's not even possible to design a polynomial algorithm that gets the answer within a constant (or even polynomial) number of states! Without the theory of NP-completeness, researchers would waste time following unpromising research directions.
Coping with intractability. The theory of NP-completeness says that unless P = NP, there are some important problems for which we can't create an algorithm that simultaneously achieves the following three properties:
- Guarantee to solve the problem in polynomial-time.
- Guarantee to solve the problem to optimality.
- Guarantee to solve arbitrary instances of the problem.
When we encounter an NP-complete problem, we must relax one of the three requirements. We will consider solutions to the TSP problem that relax one of the three goals. Complexity theory deals with worst-case behavior. This leaves open the possibility of designing algorithms that run quickly on some instances, but take a prohibitive amount of time on others. For example, Cook et al designed an algorithm called CONCORDE that managed to solve a TSP problem with 13,509 cities in the USA to optimality. Their algorithm does not guarantee to run in polynomial time, but the instances we're interested in may be "easy."
Sometimes we may be willing to sacrifice the guarantee on finding the optimal solution. Many heuristic techniques (simulating annealing, genetic algorithms, Metropolis algorithm) have been designed to find "nearly optimal" solutions to the TSP problem. Sometimes it is even possible to prove how good the resulting solution will be. For example, Sanjeev Arora designed an approximation algorithm for the Euclidean TSP problem that guarantees to find a solution that costs at most, say 1%, above the optimum. Designing approximation algorithms is an active area of research. Unfortunately, there are also non-approximability results of the form: if you can find an approximation algorithm for problem X that guarantees to get within a factor of 2 of the optimum, then P = NP. Thus, designing approximation algorithms for some NP-complete problems is not possible.
If we are trying to solve a special class of TSP problems, e.g., where the points lie on the boundary of a circle or the vertices of an M-by-N lattice, then we can design efficient (and trivial) algorithms to solve the problem.
Exploiting intractability. Having intractability problems is occasionally a good thing. In Section XYZ, we will exploit intractable problems to design cryptographic systems.
Other complexity classes. The complexity classes P, NP, and NP-complete are the three most famous complexity classes. Scott Aaronson's website The Complexity Zoo contains a comprehensive list of other complexity classes that are useful in classifying problems according to their computational resources (time, space, parallelizability, use of randomness, quantum computing). We describe a few of the most important ones below.
PSPACE. The complexity class PSPACE = problems solvable by a Turing machine using polynomial space. Note PSPACE = NPSPACE (Savitch's theorem). Note P ⊆ NP ⊆ PSPACE ⊆ EXP, and, by the time hierarchy theorem, at least one inclusion is strict, but unknown which one. It is conjectured that all inclusions are strict. PSPACE-complete = in PSPACE and every other problem in PSPACE can be reduced to it in polynomial time.
Here is a complexity version of the halting problem. Given a Turing machine that is limited to n tape cells, does it halt in at most k steps? The problem is PSPACE-complete, where n is encoded in unary. This means that unless P = PSPACE, we are unlikely to be able to tell whether a given program, running on a computer with n units of memory, will terminate before k steps substantially faster than the trivial method of running it for k steps and seeing what happens. Eppstein's list of hard games.
Bodlaender: given a graph with vertices 1, ..., N, two players alternate in labeling the vertices red, green, or blue. The first player to label a vertex the same color as one of its neighbors loses. Determining whether there is a winning strategy for the first player is PSPACE-complete.
Given an N-by-N checkerboard with Red kings on some squares and Black kings on some other squares. Assume Black goes first. Does Black have a forced win from the given position (according to standard rules of Checkers except to accommodate larger board size and more pieces)? Versions of many conventional games are provably intractable; this partially explains their appeal. Also natural generalizations of Othello, Hex, Geography, Shanghai, Rush Hour, go-moku, Instant Insanity, and Sokoban are PSPACE-complete.
Is a given string a member of a context sensitive grammar?
Do two regular expressions describe different languages? PSPACE-complete even over the binary alphabet and if one of the regular expressions is .*.
Another example that can be made rigorous is the problem of moving a complicated object (e.g., furniture) with attachments that can move and rotate through an irregularly shaped corridor.
Another example arises in parallel computing when the challenge is to determine whether a deadlock state is possible within a system of communicating processors.
EXPTIME. The complexity class EXPTIME = all decision problem solvable in exponential time on deterministic Turing machine. Note P ⊆ NP ⊆ PSPACE ⊆ EXPTIME, and it is known that at least one inequality is strict, but unknown which one. Example: Roadblock from Harel p. 85. Natural generalization of chess, checkers, Go (with Japanese style ko termination rule), and Shogi are EXPTIME-complete. Given a borad position, can the first player force a win? One reason that chess is harder than Othello from a theoretical standpoint is that chess games can take an exponential number of moves. Checkers: player can have an exponential number of moves at a given turn because of jump sequences. Note: dependending on termination rules, checkers can either be PSPACE-complete or EXPTIME-complete.
Here is a complexity version of the halting problem. Given a Turing machine, does it halt in at most k steps? Alternatively, given a fixed Java program and a fixed input, does it terminate in at most k steps? The problem is EXPTIME-complete. Moreover, No Turing machine can guarantee to solve it in, say, O(k / log k) steps. So brute force simulation is essentially best possible. This means that provably the problem cannot be solved substantially faster than the trivial method of running the Turing machine for the first k steps and seeing what happens.
Testing the circularity of attribute grammars (problem that arises in programming language semantics). Reference: Tarjan, Complexity of Combinatorial Algorithms. An EXPTIME-complete problem cannot be solved in polynomial time on a deterministic Turing machine - it does not depend on the P ≠ NP conjecture.
EXPSPACE. EXPSPACE-complete: given two "extended" regular expressions, do they represent different languages? By extended, we allow a squaring operation (two copies of an expression). Stockmeyer and Meyer (1973). Or, more simply set intersection (Hunt, 1973). Word problem for Abelian groups (Cardoza, Lipton, Meyer, 1976).
DOUBLE-EXP. The class DOUBLE-EXP is the set of all decision problems solvable in doubly exponential time. A remarkable example is determining whether a formula in first order Presburger arithmetic is true. Presburger arithmetic consists of statements involving integers with + as the only operation (no multiplication or division). It can model statements like the following: if x and y are integer such that x &le y + 2, then y + 3 > x. In 1929 Presburger proved that his system is consistent (can't prove a contradiction like 1 > 2) and complete (every statement can be proven true or false). In 1974, Fischer and Rabin proved that any algorithm that decides the truth of a Presburger formula requires at least 2(2cN) time for some constant c, where N is the length of the formula.
Non-elementary. More than 2^2^2^...^2^N for any finite tower. Given two regular expressions that allow squaring and complementation, do they describe different languages?
Decision problem vs. search problem. We can also definite computational complexity in terms of search problems or function problems. It is more awkward to study search problems than decision problems because the definition of reduction is more subtle (dealing with the output). FP = polynomial-time function problems, FNP = polynomial-time function problems on nondeterministic Turing machine. Typically the search problem reduces to the decision problem. Such search problems are referred to as self-reducible. This partially justifies our focus on decision problems. However, sometimes decision problem is easy, but search problem is hard. subset sum example. Given N numbers, find two (disjoint) subsets of these N numbers that sum to exactly the same value. If N = 77 and all the numbers are at most twenty-one decimal digits long, then by the pigeonhole prinicple, at least two subsets must sum to the same value. This is because there are 2^77 subsets but at most 1 + 77 * 10^21 <2^77 possible sums.
3-Sum hard. Given a set of N integers, do any three of them sum to 0? Quadratic algorithm exists (see exercise xyz), but no subquadratic algorithm known. 3-SUM linear reduces to many problems in computational geometry. (find whether set of points in the plane have 3 that are collinear, decide whether a set of line segments in the plane can be split into two subsets by a line, determining whether a set of triangles cover the unit square, can you translate a polygon P to be completely inside another polygon Q, robot motion planning).
Analog computation.
The P = NP question is a mathematical question regarding the capabilities of Turing machines and classical digital computers. We might also wonder whether the same is true for analog computers. By analog, we mean any "deterministic physical device that uses a fixed number of physical variables to represent each problem variable." Internal state represented by continuous variables instead of discrete. Vergis, Steiglitz, and Dickinson proposed an analog form of the Strong Church-Turing thesis:Any finite analog computer can be simulated efficiently by a digital computer, in the sense that the time required by the digital computer to simulate the analog computer is bounded by a polynomial function of the resources used by the analog computer.The resources of the analog computer could be time, volume, mass, energy, torque, or angular momentum. Reference: The Physics of Analog Computation
Quantum computing.
Possible (but not established) that digital computers cannot simulate quantum computers in polynomial time. Feynmann quote with respect to building a computer to simulate physics"The rule of simulation that I would like to have is that the number of computer elements required to simulate a large physical system is only to be proportional to the space-time volume of the physical system. I don't want to have an explosion."Rephrase in terms of modern complexity theory by replacing "proportional to" by "bounded by a polynomial function of".
Q+A
Q. What's so hard about factoring an integer N in polynomial time - can't I just divide all potential factors less than N (or √N) into x and see if any have a remainder of zero?
A. The algorithm is correct, but remember it takes only lg N bits to represent the integer N. Thus, for an algorithm to be polynomial in the input size, it must be polynomial in lg N, and not N.
Q. Is there a decision problem that is polynomial solvable on a quantum computers, but provably not in P?
A. This is an open problem. FACTOR is a candidate, but there is no proof that FACTOR is not in P, although this is widely believed.
Q. Are there any other outstanding problems in NP which are not known to be in P?
A. Finding a (mixed) Nash equilibrium in a two person game, where the set of strategies for each player is finite. Generalized linear programming.
Exercises
-
Which of the following can we infer from the fact that the traveling
salesperson problem is NP-complete, if we assume that P is not equal to NP?
- There does not exist an algorithm that solves arbitrary instances of the TSP problem.
- There does not exist an algorithm that efficiently solves arbitrary instances of the TSP problem.
- There exists an algorithm that efficiently solves arbitrary instances of the TSP problem, but no one has been able to find it.
- The TSP is not in P.
- All algorithms that are guaranteed to solve the TSP run in polynomial time for some family of input points.
- All algorithms that are guaranteed to solve the TSP run in exponential time for all families of input points.
Answer: (b) and (d) only.
-
Which of the following can we infer from the fact that PRIMALITY is in NP
but not known to be NP-complete, if we assume that P is not equal to NP?
- There exists an algorithm that solves arbitrary instances of PRIMALITY.
- There exists an algorithm that efficiently solves arbitrary instances of PRIMALITY.
- If we found an efficient algorithm for PRIMALITY, we could immediately use it as a black box to solve TSP.
Answer: We can infer only (a) since all problems in P are decidable. If P != NP, then there are problems in NP that are neither in P or NP-complete. PRIMALITY could be one of them (although this was recently disproved). Part (c) cannot be inferred since we don't know if PRIMALITY is NP-complete.
-
Which of the following are NP-complete?
- The brute force TSP algorithm.
- The quicksort algorithm for sorting.
- The Halting problem.
- Hilbert's 10th problem.
Answer: None. NP-completeness deals with *problems* not specific algorithm for problems. The Halting problem and Hilbert's 10th problem are undecidable, so they are not in NP (and all NP-complete problems are in NP).
-
Let X and Y be two decision problems. Suppose we know that X reduces
to Y. Which of the following can we infer?
- If Y is NP-complete then so is X.
- If X is NP-complete then so is Y.
- If Y is NP-complete and X is in NP then X is NP-complete.
- If X is NP-complete and Y is in NP then Y is NP-complete.
- X and Y can't both be NP-complete.
- If X is in P, then Y is in P.
- If Y is in P, then X is in P.
Answer: (d) and (g) only. X reduces to Y means that if you had a black box to solve Y efficiently, you could use it to solve X efficiently. X is no harder than Y.
-
Suppose that X is NP-complete, X reduces to Y, and Y reduces to X.
Is Y necessarily NP-complete?
Answer: No, since Y may not be in NP. For example if X = CIRCUIT-SAT and Y = CO-CIRCUIT-SAT then X and Y satisfy the conditions, but it is unknown whether Y is in NP.
- Show that CIRCUIT-SAT reduces to CIRCUIT-DIFF. Hint: create a circuit with N inputs that always outputs 0.
- Show that CIRCUIT-DIFF reduces to CIRCUIT-SAT.
-
Show that DETERMINANT is in NP:
given an N-by-N integer matrix A, is det(A) = 0?
Solution: certificate is a nonzero vector x such that Ax = 0.
-
Show that FULL-RANK is in NP:
given an N-by-N integer matrix A, is det(A) ≠ 0?
Solution: certificate is an N-by-N inverse matrix B such that AB = I.
- Search problems vs. decision problems.
We can formulate a search problem using a corresponding
decision problem. For example, the problem of finding the
prime factorization of an integer N can be formulate using
the decision problem: given two integers N and and L,
does N have a nontrivial factor strictly less than L.
The search problem is solvable in polynomial time if and only
if the corresponding decision problem is. To see why,
we can efficiently find the smallest factor p of N by using different
values of L along with binary search. Once we have the factor p,
we can repeat the process on N/p.
Usually we can show that the search problem and the decision problem are equivalent up to polynomial factors in running time. Papadimitriou (Example 10.8) gives an interesting counterexample to the rule. Given N positive integers such that their sum is less than 2^N - 1, find two subsets whose sum is equal. For example, the 10 numbers below sum to 1014 <1023.
23 47 59 88 91 100 111 133 157 205
Since there are more subsets of N integers (2^N) than numbers between 1 and 1014, there must be two different subsets with the same sum. But nobody know a polynomial time algorithm for finding such a subset. On the other hand, the natural decision problem is trivial solvable in constant time: are there two subsets of numbers that sum to the same value?
- Pratt's primality certificate. Show that PRIMES is in NP. Use Lehmer's theorem (Fermat's Little Theorem Converse) which asserts that an integer p > 1 is prime if and only if there exists an integer x such that xN-1 = 1 (mod p) and x(p-1)/d ≠ 1 (mod p) for all prime divisors d of p01. For example, if N = 7919, then the prime factorization of p-1 = 7918 = 2 × 37 × 107. Now x = 7 satisfies 77918 = 1 (mod 7919), but 77918/2 ≠ 1 (mod 7919), 77918/37 ≠ 1 (mod 7919), 77918/107 ≠ 1 (mod 7919). This proves that 7919 is prime (assuming that you recursively certify that 2, 37, and 107 are prime).
- Pell's equation. Find all positive integer solutions to Pell's equation: x^2 - 92y^2 = 1. Solution: (1151, 120), (2649601, 276240), etc. There are infinitely many solutions, but each successive one is about 2300 times the previous one.
- Pell's equation. In 1657, Pierre Fermat challenged his colleagues with the following problem: given a positive integer c, find a positive integer y such that cy2 is a perfect square. Fermat used c = 109. It turns out the smallest solution is (x, y) = (158,070,671,986,249, 15,140,424,455,100). Write a program Pell.java that reads in an integer c and finds the smallest solution to Pell's equation: x2 - c y2 = 1. Try c = 61. The smallest solution is (1,766,319,049, 226,153,980). For c = 313, the smallest solution is ( 3,218,812,082,913,484,91,819,380,158,564,160). The problem is provably unsolvable in a polynomial number of steps (as a function of the number of bits in the input c) because the output may require exponentially many bits!
