In the previous chapter, we established the limits of what is computable. We now know that some problems, like the Halting Problem, are undecidable; no algorithm can solve them. But among the problems that are solvable, do they all present equivalent computational difficulty?
Consider sorting a list versus finding the optimal route for a traveling salesperson visiting thousands of cities. Both are solvable, but the latter feels intuitively harder. An algorithm for sorting might finish in seconds, while the best-known algorithm for the traveling salesperson problem could run for centuries on the fastest supercomputers.
Complexity theory addresses this gap. It provides a framework for classifying decidable problems based on the computational resources, primarily time and memory (space), required to solve them. Its central goal is to distinguish between problems that are “practically solvable” and those that are “intractable,” even if they are theoretically decidable. This chapter introduces the foundational concepts of time and space complexity, the crucial complexity classes P and NP, and the theory of NP-completeness, which provides a powerful tool for identifying the hardest problems in NP.
6.1 Aims
By the end of this chapter, you will be able to:
- Measure Complexity: Understand how time and space complexity are formally defined for Turing machines, emphasizing worst-case analysis and the robustness of these measures.
- Use Asymptotic Notation: Effectively use $O$, $\Omega$, and $\Theta$ notation to analyze and compare the growth rates of algorithms, abstracting away less significant details.
- Understand the Class P: Grasp the definition of the class P, its significance as the formal model for “efficiently solvable” problems, and its robustness across computational models.
- Understand the Class NP: Learn the two equivalent definitions of NP: via nondeterministic Turing machines and via polynomial-time verifiers, and appreciate its connection to proof verification.
- Grasp NP-Completeness: Understand the concepts of polynomial-time reduction, NP-hardness, and NP-completeness, and appreciate the significance of the P vs. NP problem and the Cook-Levin Theorem.
- Explore Complexity Hierarchy: Understand the relationships and hierarchy between fundamental complexity classes like DLOG, NLOG, P, NP, PSPACE, and EXPTIME.
6.2 Complexity Measures
To analyze the difficulty of a problem, we first need a formal way to measure the resources used by an algorithm. We use the multi-tape Turing machine as our standard model for abstract complexity theory. This model is simple enough for theoretical analysis yet robust enough to reflect the behavior of real computers.
6.2.1 Time Complexity
Definition 6.1 (Time Complexity)
Let $M$ be a multi-tape TM that halts on all inputs.
- The time complexity of $M$ on input $w$, denoted $\mathrm{Time}_M(w)$, is the number of computational steps in the computation of $M$ on $w$.
- The time complexity of $M$, denoted $\mathrm{Time}_M(n)$, is a function of the input length $n$: it is the maximum of $\mathrm{Time}_M(w)$ over all inputs $w$ of length $n$. This is known as worst-case analysis. We focus on the worst case because it yields a reliable upper bound on resource consumption: the algorithm is guaranteed never to take more than $\mathrm{Time}_M(n)$ steps on any input of length $n$. While average-case analysis can be useful, worst-case analysis is often simpler and provides stronger guarantees.
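To make worst-case counting concrete, here is a small illustration (hypothetical, not from the text): linear search performs at most $n$ comparisons on inputs of length $n$, and that bound is attained exactly when the key is absent.

```python
def linear_search_steps(items, key):
    """Scan left to right; return (found, number of comparisons used)."""
    steps = 0
    for item in items:
        steps += 1
        if item == key:
            return True, steps
    return False, steps

print(linear_search_steps([3, 1, 4, 1, 5], 4))  # (True, 3)
print(linear_search_steps([3, 1, 4, 1, 5], 9))  # (False, 5) -- the worst case
```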
6.2.2 Space Complexity
Definition 6.2 (Space Complexity)
Let $M$ be a multi-tape TM that halts on all inputs.
- The space complexity of $M$ on input $w$, denoted $\mathrm{Space}_M(w)$, is the maximum number of cells on any single work tape that are accessed by a head during the computation of $M$ on $w$.
- The space complexity of $M$, denoted $\mathrm{Space}_M(n)$, is the maximum of $\mathrm{Space}_M(w)$ over all inputs $w$ of length $n$. Note that space complexity refers only to the work tapes, excluding the read-only input tape; this convention is what makes sublinear (e.g., logarithmic) space bounds meaningful. The definition focuses on the maximum usage of a single tape, but this is equivalent, up to a constant factor, to counting the sum of all tape usages, since a multi-tape TM can be simulated by a single-tape TM with only a constant-factor increase in space (Lemma 6.1).
6.2.3 Robustness of Complexity Measures
The specific choice of the multi-tape Turing machine model and the definition of time/space complexity are robust. This means that fundamental results about complexity defined in this way also apply to the complexity of executing programs in arbitrary programming languages.
- Constant Factor Improvements: It’s possible to reduce time or space complexity by a constant factor by using a larger tape alphabet (Lemma 6.2, Task 6.1). This means that constant factors are generally ignored in complexity theory, as they don’t change the fundamental scaling behavior.
- Logarithmic vs. Uniform Cost Measures:
- Uniform Cost Measure: Each basic operation (e.g., arithmetic, comparison) costs 1 unit, regardless of the size of the operands. This is simple but can be unrealistic if operand sizes grow significantly.
- Logarithmic Cost Measure: The cost of an operation depends on the length of the binary representation of the operands. This is more realistic for arbitrary-precision arithmetic but can be more cumbersome to analyze. Complexity theory primarily uses the uniform cost measure for basic TM operations, but the robustness of the model ensures that the asymptotic classifications remain valid.
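The gap between the two measures is easy to exhibit. In the sketch below (an illustration, not part of the formal model), repeated squaring performs only $n$ "unit" operations, yet the operands double in bit-length at every step, so the logarithmic-cost total grows exponentially in $n$:

```python
x = 2
for step in range(1, 6):
    x = x * x  # one operation under the uniform cost measure
    print(f"after {step} squarings: {x.bit_length()} bits")
# The bit-length roughly doubles each step (3, 5, 9, 17, 33 bits here), so
# the logarithmic cost of step k is about 2**k: n squarings cost n units
# under the uniform measure but Theta(2**n) under the logarithmic measure.
```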
6.2.4 Asymptotic Notation
Exact complexity functions can be messy. In complexity theory, we are usually more interested in the rate of growth of the complexity as the input size increases. Asymptotic notation allows us to abstract away constant factors and lower-order terms, focusing on the dominant behavior for large inputs.
For functions $f, g : \mathbb{N} \to \mathbb{N}$:
- Big-O ($O$): $f(n) \in O(g(n))$ means $f$ grows no faster than $g$. Formally, there exist constants $c > 0$ and $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, $f(n) \leq c \cdot g(n)$.
- Big-Omega ($\Omega$): $f(n) \in \Omega(g(n))$ means $f$ grows at least as fast as $g$. Formally, there exist constants $c > 0$ and $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$, $f(n) \geq c \cdot g(n)$.
- Theta ($\Theta$): $f(n) \in \Theta(g(n))$ means $f$ grows at the same rate as $g$. Formally, $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$.
- Little-o ($o$): $f(n) \in o(g(n))$ means $f$ grows strictly slower than $g$. Formally, $\lim_{n \to \infty} f(n)/g(n) = 0$.
For example, an algorithm whose exact step count is $3n^2 + 5n$ is said to run in $O(n^2)$ time.
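As a worked check of the definition (with illustrative constants): for $f(n) = 3n^2 + 5n$ we have

$$3n^2 + 5n \le 3n^2 + n^2 = 4n^2 \quad \text{for all } n \ge 5,$$

so $f(n) \in O(n^2)$ with witnesses $c = 4$ and $n_0 = 5$. Since also $f(n) \geq 3n^2$ for all $n \geq 1$, we get $f(n) \in \Omega(n^2)$ and hence $f(n) \in \Theta(n^2)$.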
6.3 Complexity Classes and the Class P
For the definition of complexity classes, we use the model of the multi-tape Turing machine. We consider complexity classes here only as language classes, i.e., as sets of decision problems.
Definition 6.5 (Basic Complexity Classes)
For a function $f : \mathbb{N} \to \mathbb{N}$, let $\mathrm{TIME}(f)$ denote the class of languages decided by some MTM $M$ with $\mathrm{Time}_M(n) \in O(f(n))$, and $\mathrm{SPACE}(f)$ the class of languages decided by some MTM $M$ with $\mathrm{Space}_M(n) \in O(f(n))$. The central deterministic classes are:
- $\mathrm{P} = \bigcup_{c \in \mathbb{N}} \mathrm{TIME}(n^c)$ (Polynomial Time)
- $\mathrm{PSPACE} = \bigcup_{c \in \mathbb{N}} \mathrm{SPACE}(n^c)$ (Polynomial Space)
- $\mathrm{EXPTIME} = \bigcup_{c \in \mathbb{N}} \mathrm{TIME}(2^{n^c})$ (Exponential Time)
6.3.1 The Class P: Efficiently Solvable Problems
The most important distinction in complexity theory is between algorithms that run in polynomial time and those that run in exponential time.
| $n$ | $n^2$ | $n^3$ | $2^n$ |
|---:|---:|---:|---:|
| 10 | 100 | 1,000 | 1,024 |
| 20 | 400 | 8,000 | ~1 million |
| 50 | 2,500 | 125,000 | ~$10^{15}$ |
| 100 | 10,000 | 1,000,000 | ~$10^{30}$ |
An algorithm with exponential complexity quickly becomes unusable for even moderately sized inputs. This observation leads to the definition of the class of “efficiently solvable” problems.
P is the class of all languages that are decidable by a deterministic Turing machine in polynomial time. A problem is in P if there exists an algorithm that solves it in $O(n^c)$ time for some constant $c$. The class P is considered the formal counterpart to the intuitive notion of problems that are practically or efficiently solvable.
Robustness of P: The definition of P is robust. It remains the same whether we use single-tape, multi-tape, or other "reasonable" deterministic models, because each such model can simulate the others with at most polynomial overhead.
6.3.2 Relationships Between Deterministic Complexity Classes
There are fundamental relationships between these classes:
- $\mathrm{TIME}(f) \subseteq \mathrm{SPACE}(f)$: Any TM that runs in time $f(n)$ can visit at most $f(n)$ tape cells, so it uses at most $f(n)$ space (Lemma 6.3).
- $\mathrm{P} \subseteq \mathrm{PSPACE}$: This follows directly from the above.
- $\mathrm{SPACE}(f) \subseteq \mathrm{TIME}(2^{O(f)})$: Any TM that uses space $f(n)$ can be simulated in exponential time (Theorem 6.2). This is because the number of possible configurations in space $f(n)$ is only exponential in $f(n)$, and a TM cannot repeat a configuration in a halting computation.
- $\mathrm{PSPACE} \subseteq \mathrm{EXPTIME}$: This follows directly from the above.
Combining these, we get the hierarchy $\mathrm{P} \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME}$. For each of these inclusions individually, it is unknown whether it is proper. However, some proper inclusions must exist, as shown by the hierarchy theorems (e.g., $\mathrm{P} \subsetneq \mathrm{EXPTIME}$ and $\mathrm{NLOG} \subsetneq \mathrm{PSPACE}$). These theorems prove that with sufficiently larger resource bounds, strictly more problems can be solved.
6.3.3 Constructible Functions
The definition of complexity classes often relies on constructible functions, which are functions whose values can be computed efficiently.
Definition 6.6 (Space- and Time-Constructible Functions)
- A function $s : \mathbb{N} \to \mathbb{N}$ is space-constructible if there exists a 1-tape TM $M$ such that, for any input of length $n$, $M$ writes the value $s(n)$ (in unary, as $0^{s(n)}$) on its tape and halts, using $O(s(n))$ space.
- A function $t : \mathbb{N} \to \mathbb{N}$ is time-constructible if there exists an MTM $M$ such that, for any input of length $n$, $M$ writes the value $t(n)$ on its tape and halts, using $O(t(n))$ time.
Most common functions, such as $\log n$ (for space), $n^c$, $n \log n$, and $2^n$, are constructible. These functions are important because they allow us to define complexity classes with well-behaved bounds: a simulating machine can first mark off the allowed amount of tape or count down the allowed number of steps.
6.4 Nondeterministic Complexity Measures
Nondeterministic Turing machines (NTMs) can perform many different computations on an input. To define their complexity, we take an optimistic view: an NTM always chooses the best possible option.
Definition 6.7 (Nondeterministic Time and Space Complexity)
Let $M$ be an NTM or a nondeterministic MTM.
- For $x \in L(M)$, the nondeterministic time complexity of $M$ on $x$, $\mathrm{Time}_M(x)$, is the length of a shortest accepting computation of $M$ on $x$.
- The nondeterministic time complexity of $M$ is $\mathrm{Time}_M(n) = \max\{\mathrm{Time}_M(x) \mid x \in L(M), |x| = n\}$.
- For $x \in L(M)$, the nondeterministic space complexity of $M$ on $x$, $\mathrm{Space}_M(x)$, is the minimum space used by an accepting computation of $M$ on $x$.
- The nondeterministic space complexity of $M$ is $\mathrm{Space}_M(n) = \max\{\mathrm{Space}_M(x) \mid x \in L(M), |x| = n\}$.
Definition 6.8 (Nondeterministic Complexity Classes)
For functions $f, g : \mathbb{N} \to \mathbb{N}$, let $\mathrm{NTIME}(f)$ and $\mathrm{NSPACE}(g)$ denote the classes of languages accepted by nondeterministic MTMs with time complexity $O(f(n))$ and space complexity $O(g(n))$, respectively. Then:
- $\mathrm{NP} = \bigcup_{c \in \mathbb{N}} \mathrm{NTIME}(n^c)$ (Nondeterministic Polynomial Time)
- $\mathrm{NPSPACE} = \bigcup_{c \in \mathbb{N}} \mathrm{NSPACE}(n^c)$ (Nondeterministic Polynomial Space)
6.4.1 Relationships Between Deterministic and Nondeterministic Classes
- $\mathrm{P} \subseteq \mathrm{NP}$ and $\mathrm{PSPACE} \subseteq \mathrm{NPSPACE}$: Any deterministic TM is also a nondeterministic TM, so deterministic classes are subsets of their nondeterministic counterparts.
- $\mathrm{NTIME}(f) \subseteq \mathrm{NSPACE}(f)$: If an NTM runs in time $f(n)$, it can use at most $f(n)$ space.
- $\mathrm{NSPACE}(f) \subseteq \mathrm{TIME}(2^{O(f)})$: An NTM using space $f(n)$ can be simulated by a deterministic TM in exponential time (Lemma 6.6).
- Savitch's Theorem (Theorem 6.7): For any space-constructible function $s(n) \geq \log n$, $\mathrm{NSPACE}(s(n)) \subseteq \mathrm{SPACE}(s(n)^2)$. This is a powerful result showing that nondeterministic space can be simulated by deterministic space with only a quadratic (in particular, polynomial) overhead; a recursive sketch of the idea follows this list.
- Consequence: $\mathrm{PSPACE} = \mathrm{NPSPACE}$. Nondeterminism does not add power to polynomial space.
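The following is a minimal sketch (Python, hypothetical names) of the divide-and-conquer idea behind Savitch's theorem: reachability within $2^t$ steps of the configuration graph is decided by recursing on a midpoint configuration. The recursion depth is $t$ and each level stores only one candidate midpoint, which is where the quadratic deterministic space bound comes from.

```python
def reachable(configs, succ, c1, c2, t):
    """Can configuration c2 be reached from c1 in at most 2**t steps?

    Recurse through a midpoint m: c1 ->(2**(t-1)) m ->(2**(t-1)) c2.
    Space used is O(t * size-of-one-configuration), not the full graph.
    """
    if t == 0:
        return c1 == c2 or c2 in succ(c1)
    return any(reachable(configs, succ, c1, m, t - 1) and
               reachable(configs, succ, m, c2, t - 1)
               for m in configs)

# Tiny example: the path graph 0 -> 1 -> 2 -> 3.
configs = [0, 1, 2, 3]
succ = lambda c: {c + 1} if c < 3 else set()
print(reachable(configs, succ, 0, 3, 2))  # True: 3 steps <= 2**2
```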
The overall hierarchy of complexity classes, including nondeterministic ones, is: $\mathrm{DLOG} \subseteq \mathrm{NLOG} \subseteq \mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE} = \mathrm{NPSPACE} \subseteq \mathrm{EXPTIME}$. Again, for each of these inclusions (except $\mathrm{PSPACE} = \mathrm{NPSPACE}$), it is unknown whether it is proper.
6.5 The Class NP and Proof Verification
The relationship between P and NP is one of the most central and profound questions in theoretical computer science. It touches upon the very nature of problem-solving and the efficiency of verification versus discovery.
6.5.1 NP as Verifiability
Many problems for which we don’t know a polynomial-time algorithm share an interesting property: if provided with a potential solution, you can quickly verify if it’s correct.
Consider the Hamiltonian Path problem: given a graph, is there a path that visits every node exactly once? Finding such a path seems to require trying many possibilities. But if someone gives you a path, you can easily check in polynomial time if it’s a valid Hamiltonian path.
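To see how cheap the check is, here is a minimal verifier sketch (Python; the adjacency-set representation and names are illustrative assumptions):

```python
def is_hamiltonian_path(adj, path):
    """Verify in polynomial time that `path` is a Hamiltonian path.

    adj: dict mapping each vertex to the set of its neighbours.
    path: the certificate -- a proposed ordering of the vertices.
    """
    # The path must visit every vertex exactly once.
    if len(path) != len(adj) or set(path) != set(adj):
        return False
    # Every pair of consecutive vertices must be joined by an edge.
    return all(v in adj[u] for u, v in zip(path, path[1:]))

# A 4-cycle a-b-c-d contains the Hamiltonian path a, b, c, d:
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(is_hamiltonian_path(adj, ["a", "b", "c", "d"]))  # True
print(is_hamiltonian_path(adj, ["a", "c", "b", "d"]))  # False: no edge a-c
```

The check runs in time roughly quadratic in the number of vertices, in sharp contrast to the apparent difficulty of finding such a path.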
This “guess and check” structure is the intuition behind the class NP. The class NP can be equivalently defined using the concept of a verifier.
Definition 6.9 (Polynomial-Time Verifier)
A verifier for a language $L$ is a deterministic algorithm $V$ that takes two inputs: a string $x$ (the problem instance) and a string $c$ (a "certificate" or "witness", i.e., a potential solution). The verifier accepts the pair $(x, c)$ if and only if $c$ is a valid certificate proving that $x \in L$. The verifier must run in polynomial time with respect to the length of $x$, and the length of the certificate must also be polynomial in the length of $x$.
The class VP is the set of all languages that have a polynomial-time verifier.
Theorem 6.8 (Equivalence of NP and VP)
A language is in NP if and only if it has a polynomial-time verifier.
Proof Idea
- $\mathrm{NP} \subseteq \mathrm{VP}$: If $L \in \mathrm{NP}$, then there exists a polynomial-time NTM $M$ that accepts $L$. The verifier $V$ for $L$ takes $(x, c)$ as input. The certificate $c$ is an encoding of the sequence of nondeterministic choices that $M$ makes to accept $x$. $V$ simulates $M$ on $x$, following the choices specified by $c$. If $M$ accepts, $V$ accepts. Since $M$ runs in polynomial time, $V$ also runs in polynomial time.
- $\mathrm{VP} \subseteq \mathrm{NP}$: If $L \in \mathrm{VP}$, then there exists a polynomial-time verifier $V$ for $L$. An NTM $M$ for $L$ works as follows: on input $x$, $M$ nondeterministically "guesses" a certificate $c$ (of polynomial length). Then, $M$ runs $V$ on $(x, c)$. If $V$ accepts, $M$ accepts. Since $V$ runs in polynomial time, $M$ also runs in polynomial time.
This theorem is fundamental. It provides two equivalent ways to think about NP: as problems solvable by a polynomial-time NTM, or as problems where a proposed solution can be verified in deterministic polynomial time.
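The verifier view also makes the naive deterministic simulation of an NTM concrete: enumerate every certificate up to the polynomial length bound and run the verifier on each. A hedged sketch (hypothetical names) follows; it runs in exponential time, matching the inclusion $\mathrm{NP} \subseteq \mathrm{EXPTIME}$.

```python
from itertools import product

def decide_by_search(x, verifier, max_cert_len):
    """Decide membership deterministically, given a poly-time verifier.

    Tries all binary certificates of length 0..max_cert_len -- about
    2**(max_cert_len + 1) candidates, hence exponential time overall.
    """
    for length in range(max_cert_len + 1):
        for bits in product("01", repeat=length):
            if verifier(x, "".join(bits)):
                return True
    return False

# Toy use: x is in L iff some certificate equals x reversed.
verifier = lambda x, c: c == x[::-1]
print(decide_by_search("01", verifier, 2))  # True (certificate "10")
```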
6.5.2 The P versus NP Problem
Clearly, if a problem is in P, it is also in NP (a deterministic TM is just a special case of an NTM, and a solution can be verified by simply computing it). This gives us the inclusion $\mathrm{P} \subseteq \mathrm{NP}$. (It's important to remember that "NP" stands for Nondeterministic Polynomial time, not "Not Polynomial" time. Problems in NP are not necessarily hard; they just have efficiently verifiable solutions.)
The most famous open question in computer science, and one of the Millennium Prize Problems, is whether this inclusion is proper, i.e., whether $\mathrm{P} \subsetneq \mathrm{NP}$. If P = NP, it would mean that any problem for which a solution can be efficiently verified can also be efficiently solved. This would have staggering consequences, revolutionizing fields from cryptography to medicine to artificial intelligence. However, the overwhelming consensus among computer scientists is that $\mathrm{P} \neq \mathrm{NP}$. This belief rests largely on the failure, despite decades of effort, to find polynomial-time algorithms for thousands of well-studied problems in NP.
P vs. NP and Proofs: The P vs. NP question can be rephrased in terms of mathematical proofs:
- P: Corresponds to problems where finding a proof (solution) is efficient.
- NP: Corresponds to problems where verifying a given proof (solution) is efficient.
The question "Does P = NP?" is equivalent to asking: "Is it as easy to find a mathematical proof as it is to verify one?" Most mathematicians and computer scientists believe that finding proofs is inherently harder than verifying them.
6.6 NP-Completeness
Within the vast landscape of NP, some problems are special: they are the “hardest” problems in the class. If we could solve any one of them efficiently, we could solve all problems in NP efficiently. These are the NP-complete problems.
To formalize this, we need the concept of a polynomial-time reduction.
Definition 6.10 (Polynomial-Time Reduction)
A language $L_1$ is polynomial-time reducible to a language $L_2$, written $L_1 \leq_p L_2$, if there exists a polynomial-time computable function $f$ that transforms instances of $L_1$ into instances of $L_2$ such that for all $x$: $x \in L_1 \iff f(x) \in L_2$. This means that if we have a "black box" solver for $L_2$, we can solve $L_1$ by transforming its input and feeding it to the solver. The reduction itself must be efficient (polynomial time).
Definition 6.11 (NP-Hard and NP-Complete)
- A language $L$ is NP-hard if every language in NP is polynomial-time reducible to it ($L' \leq_p L$ for all $L' \in \mathrm{NP}$).
- A language $L$ is NP-complete if it is NP-hard and it is also in NP itself.
The NP-complete problems form a crucial equivalence class. If any NP-complete problem is found to be in P, then it would follow that P = NP. Conversely, if $\mathrm{P} \neq \mathrm{NP}$, then no NP-complete problem can be solved in polynomial time.
6.6.1 The First NP-Complete Problem: SAT (Cook-Levin Theorem)
How do we find the first NP-complete problem? We need to show that every problem in NP can be reduced to it. This was the monumental achievement of Stephen Cook and Leonid Levin.
Theorem 6.9 (Cook-Levin Theorem)
The Boolean Satisfiability Problem (SAT) is NP-complete.
Proof Idea
The proof constructs a polynomial-time reduction from an arbitrary language $L \in \mathrm{NP}$ to SAT.
- Since $L \in \mathrm{NP}$, there exists a polynomial-time NTM $M$ that accepts $L$. Let $p(n)$ be the polynomial time bound for $M$.
- For any input $w$ to $M$, we construct a Boolean formula $\phi_w$ that is satisfiable if and only if $M$ accepts $w$. The formula encodes the entire computation of $M$ on $w$.
- The variables in $\phi_w$ represent the contents of $M$'s tape, the head position, and the internal state at each time step up to $p(|w|)$. For example:
  - $C_{i,s,t}$: true if the $i$-th tape cell contains symbol $s$ at time $t$.
  - $Q_{q,t}$: true if $M$ is in state $q$ at time $t$.
  - $H_{i,t}$: true if $M$'s head is at position $i$ at time $t$.
- The formula $\phi_w$ is a conjunction of groups of clauses that enforce:
  - Initial Configuration: The configuration of $M$ at time $t = 0$ corresponds to the start state and the input $w$.
  - Unique State/Position/Symbol: At any time $t$, $M$ is in exactly one state, its head is at exactly one position, and each tape cell contains exactly one symbol.
  - Valid Transitions: The configuration at time $t + 1$ must follow from the configuration at time $t$ according to $M$'s transition function. This is the most complex part, encoding all possible moves.
  - Accepting State: $M$ reaches an accepting state at some time $t \leq p(|w|)$.
- The size of $\phi_w$ is polynomial in $p(|w|)$, and thus polynomial in $|w|$. The construction of $\phi_w$ can be done in polynomial time.
This reduction shows that if we could solve SAT in polynomial time, we could solve any problem in NP in polynomial time. Since SAT is also in NP (a satisfying assignment can be verified in polynomial time), it is NP-complete.
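As a taste of how the clause groups look, here is a small sketch (a hypothetical encoding, not the text's exact construction) that emits CNF clauses forcing exactly one variable in a group to be true, as needed for the "Unique State/Position/Symbol" constraints. Note that the number of clauses is only quadratic in the group size, which keeps $\phi_w$ polynomial.

```python
from itertools import combinations

def exactly_one(variables):
    """CNF clauses asserting that exactly one of `variables` is true.

    Literals are variable names; a leading '-' marks negation.
    One 'at least one' clause plus C(n, 2) 'at most one' clauses.
    """
    at_least_one = [list(variables)]
    at_most_one = [[f"-{a}", f"-{b}"] for a, b in combinations(variables, 2)]
    return at_least_one + at_most_one

# "M is in exactly one of the states q0, q1, q_acc at time t = 3":
print(exactly_one([f"Q_{q}_3" for q in ("q0", "q1", "qacc")]))
# [['Q_q0_3', 'Q_q1_3', 'Q_qacc_3'], ['-Q_q0_3', '-Q_q1_3'],
#  ['-Q_q0_3', '-Q_qacc_3'], ['-Q_q1_3', '-Q_qacc_3']]
```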
Once we have one NP-complete problem, we can find others by reduction. Thousands of important problems have been proven to be NP-complete, including:
- 3-SAT: A restricted version of SAT where each clause has exactly three literals.
- CLIQUE: Given a graph $G$ and an integer $k$, does $G$ contain a complete subgraph on $k$ vertices?
- VERTEX-COVER: Given a graph $G$ and an integer $k$, does $G$ have a set of $k$ vertices that touches every edge?
- HAM-PATH: Given a graph $G$, does $G$ contain a Hamiltonian path (a path that visits every node exactly once)?
- TSP (Decision Version): Given a list of cities with pairwise distances and an integer $k$, is there a tour that visits every city exactly once and has a total cost of at most $k$?
6.6.2 NP-Hardness for Optimization Problems
The concept of NP-hardness can be extended to optimization problems. An optimization problem is typically NP-hard if its decision version (e.g., "is there a solution with cost at most $k$?") is NP-hard.
Definition 6.14 (NP-Hard Optimization Problem)
An optimization problem $U$ is NP-hard if its corresponding threshold language $\mathrm{Lang}_U$ is NP-hard, where:
- $\mathrm{Lang}_U = \{(x, k) \mid x$ has a feasible solution with cost at most $k\}$ (for minimization problems)
- $\mathrm{Lang}_U = \{(x, k) \mid x$ has a feasible solution with cost at least $k\}$ (for maximization problems)
This allows us to classify optimization problems as NP-hard, implying that finding an optimal solution is likely intractable if $\mathrm{P} \neq \mathrm{NP}$.
Summary
- Complexity Theory classifies solvable problems by the resources (time, space) required to solve them, distinguishing between practical and intractable problems.
- Time and Space Complexity are measured using multi-tape Turing machines, focusing on worst-case asymptotic behavior.
- The class P consists of problems solvable in deterministic polynomial time and is the formal model for “efficiently solvable” problems.
- The class NP consists of problems whose solutions can be verified in polynomial time (equivalently, solvable by a nondeterministic TM in polynomial time).
- The P vs. NP question, which asks if P = NP, is the most important open problem in computer science, with profound implications for many fields.
- NP-complete problems are the hardest problems in NP. The discovery of a polynomial-time algorithm for any one of them would imply P = NP.
- The Cook-Levin Theorem established SAT as the first NP-complete problem, providing a starting point for proving the NP-completeness of thousands of other problems.
- The hierarchy of complexity classes ($\mathrm{DLOG} \subseteq \mathrm{NLOG} \subseteq \mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME}$) provides a structured view of computational difficulty.
Exercises
Exercise 6.1 (CLIQUE is in NP)
A $k$-clique in a graph $G = (V, E)$ is a set of $k$ vertices in which every two vertices are connected by an edge. The language is defined as $\mathrm{CLIQUE} = \{(G, k) \mid G$ is an undirected graph that contains a $k$-clique$\}$. Show that CLIQUE is in NP by describing a polynomial-time verifier for it.
Solution
To show that CLIQUE is in NP, we describe a polynomial-time verifier $V$ that takes an instance $(G, k)$ and a certificate $c$.
- Input to Verifier: $((G, k), c)$.
- Certificate $c$: The certificate is a list of $k$ vertices, say $v_1, \ldots, v_k$, which are claimed to form a $k$-clique in $G$.
- Verification Algorithm $V$:
  a. Check that $c$ contains $k$ distinct vertices from the vertex set of $G$. If not, reject.
  b. For every pair of distinct vertices $v_i, v_j$ in $c$, check that the edge $\{v_i, v_j\}$ is in $E$.
  c. If all pairs of vertices in $c$ are connected by an edge in $G$, then $V$ accepts. Otherwise, $V$ rejects.
Runtime Analysis:
- Step (a) takes polynomial time (e.g., $O(k^2 + k \cdot |V|)$ with naive scans to check distinctness and membership).
- Step (b) involves checking $\binom{k}{2}$ pairs. Since $k \leq |V|$, this is at most $O(|V|^2)$ checks. Each check involves looking up an edge in $G$, which can be done in polynomial time (e.g., $O(1)$ with an adjacency matrix, or $O(|V|)$ with adjacency lists, depending on the implementation).
- Overall, the verification algorithm runs in time polynomial in the size of $(G, k)$ and $c$.
Since a polynomial-time verifier exists, CLIQUE is in NP.
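A compact sketch of this verifier (Python; the edge-set representation is an assumption for illustration):

```python
from itertools import combinations

def verify_clique(edges, vertices, k, cert):
    """Polynomial-time verifier for CLIQUE.

    edges: set of frozensets {u, v} (the edge set of G).
    vertices: the vertex set of G.
    cert: list of vertices claimed to form a k-clique.
    """
    # Step (a): k distinct vertices of G.
    if len(set(cert)) != k or not set(cert) <= set(vertices):
        return False
    # Step (b): every pair of certificate vertices must be an edge.
    return all(frozenset(p) in edges for p in combinations(cert, 2))

V = {1, 2, 3, 4}
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
print(verify_clique(E, V, 3, [1, 2, 3]))  # True: {1, 2, 3} is a triangle
print(verify_clique(E, V, 3, [2, 3, 4]))  # False: {2, 4} is not an edge
```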
Exercise 6.2 (P=NP and Cryptography)
Explain why the statement “If P = NP, then all modern public-key cryptography is broken” is likely true.
Solution
Modern public-key cryptography (e.g., RSA, elliptic curve cryptography) relies on the computational difficulty of certain mathematical problems. These problems typically have the characteristic that:
- Generating keys/encrypting/decrypting with the private key is efficient.
- Breaking the encryption (e.g., finding the private key from the public key) is believed to be computationally intractable.
Many of these “hard” problems, such as integer factorization (for RSA) or the discrete logarithm problem, are known to be in NP. This means that if you are given a potential solution (e.g., the prime factors of a large number), you can verify its correctness in polynomial time.
If P = NP, it would imply that any problem whose solution can be efficiently verified can also be efficiently solved. Therefore, if P = NP, there would exist polynomial-time algorithms for problems like integer factorization and discrete logarithm. An attacker could then efficiently:
- Factor large numbers: Break RSA by finding the private key from the public modulus.
- Solve discrete logarithms: Break Diffie-Hellman key exchange and other related cryptosystems.
This would effectively render most widely used public-key cryptographic systems insecure, as the underlying “hard” problems would become “easy.”
Exercise 6.3 (Reductions and Membership in P)
Show that if $L_1 \leq_p L_2$ and $L_2 \in \mathrm{P}$, then $L_1 \in \mathrm{P}$.
Solution
To show $L_1 \in \mathrm{P}$, we construct a deterministic polynomial-time algorithm for $L_1$.
- Since $L_2 \in \mathrm{P}$, there exists a deterministic Turing machine $M_2$ that decides $L_2$ in polynomial time. Let its time complexity be $O(n^{c_2})$ for some constant $c_2$.
- Since $L_1 \leq_p L_2$, there exists a polynomial-time computable function $f$ that transforms instances of $L_1$ into instances of $L_2$. Let the time complexity of computing $f$ be $O(n^{c_f})$ for some constant $c_f$. The length of the output of $f$ is then also polynomially bounded, $|f(x)| \in O(|x|^{c_f})$, since $f$ can write at most one symbol per step.
Now, we construct an algorithm $M_1$ to decide $L_1$ on input $x$:
a. Compute $y = f(x)$. This step takes $O(|x|^{c_f})$ time, and $|y| \in O(|x|^{c_f})$.
b. Run $M_2$ on input $y$. This step takes $O(|y|^{c_2}) = O(|x|^{c_f \cdot c_2})$ time.
c. $M_1$ accepts $x$ if and only if $M_2$ accepts $y$.
The total time complexity of $M_1$ is $O(|x|^{c_f} + |x|^{c_f \cdot c_2})$, which is a polynomial in $|x|$ since $c_f$ and $c_2$ are constants. Therefore, $L_1 \in \mathrm{P}$.
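In code, the whole argument is a two-line composition; a minimal sketch with hypothetical stand-ins `f` and `M2`:

```python
def decide_L1(x, f, M2):
    """Decide x in L1, given a poly-time reduction f (L1 <=p L2) and a
    poly-time decider M2 for L2. Time: time(f) + time(M2 on f(x)); the
    composition of two polynomials is again a polynomial."""
    return M2(f(x))

# Toy instance: L1 = binary strings of even length, L2 = even numbers,
# reduced by mapping a string to its length.
f = len
M2 = lambda n: n % 2 == 0
print(decide_L1("1010", f, M2))  # True
print(decide_L1("101", f, M2))   # False
```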
Exercise 6.4 (NP-Hardness of MAX-SAT)
Prove that MAX-SAT is NP-hard.
Solution
To prove that an optimization problem is NP-hard, we show that its corresponding threshold language is NP-hard. For MAX-SAT, the threshold language is $\mathrm{Lang}_{\text{MAX-SAT}} = \{(\phi, k) \mid \phi$ is a CNF formula for which some assignment satisfies at least $k$ clauses$\}$. We will show that $\mathrm{SAT} \leq_p \mathrm{Lang}_{\text{MAX-SAT}}$.
- Input to Reduction: An instance of SAT, i.e., a Boolean formula $\phi$ in CNF. Let $\phi$ have $m$ clauses.
- Reduction Function $f$: The reduction takes $\phi$ as input and outputs the pair $(\phi, m)$. This function is clearly computable in polynomial time (it just counts the clauses).
- Equivalence: We need to show that $\phi \in \mathrm{SAT} \iff (\phi, m) \in \mathrm{Lang}_{\text{MAX-SAT}}$.
  - ($\Rightarrow$): If $\phi \in \mathrm{SAT}$, then there exists a satisfying assignment for $\phi$. This assignment satisfies all $m$ clauses, so $(\phi, m) \in \mathrm{Lang}_{\text{MAX-SAT}}$.
  - ($\Leftarrow$): If $(\phi, m) \in \mathrm{Lang}_{\text{MAX-SAT}}$, then some assignment satisfies at least $m$ clauses. Since $\phi$ has only $m$ clauses, that assignment satisfies all of them. Therefore, $\phi$ is satisfiable, so $\phi \in \mathrm{SAT}$.
Since SAT is NP-complete (and thus NP-hard), and we have a polynomial-time reduction from SAT to $\mathrm{Lang}_{\text{MAX-SAT}}$, it follows that $\mathrm{Lang}_{\text{MAX-SAT}}$ is NP-hard. Therefore, MAX-SAT is NP-hard.
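The reduction itself is almost trivially computable; a sketch (with an assumed list-of-clauses CNF encoding):

```python
def sat_to_maxsat_threshold(cnf):
    """Reduction f from SAT to the MAX-SAT threshold language.

    cnf: CNF formula as a list of clauses, each a list of integer
    literals (negation marked by sign). Returns (cnf, m) with m the
    clause count: cnf is satisfiable iff >= m clauses can be satisfied.
    """
    return cnf, len(cnf)

# (x1 or not x2) and (x2 or x3):
phi = [[1, -2], [2, 3]]
print(sat_to_maxsat_threshold(phi))  # ([[1, -2], [2, 3]], 2)
```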
Key Takeaways
- Complexity Theory: Beyond Computability: While computability theory tells us what problems can be solved, complexity theory tells us how efficiently they can be solved, classifying them by resource requirements (time and space).
- Polynomial Time as “Efficient”: The class P (deterministic polynomial time) is the widely accepted formal definition of problems that are “efficiently solvable” or “practically solvable.”
- NP: Efficiently Verifiable Solutions: The class NP (nondeterministic polynomial time) encompasses problems for which a proposed solution can be verified in polynomial time. This is equivalent to problems solvable by a nondeterministic Turing machine in polynomial time.
- The P vs. NP Problem: The fundamental open question in computer science is whether P = NP. The prevailing belief is that $\mathrm{P} \neq \mathrm{NP}$, implying that finding solutions is generally harder than verifying them.
- NP-Completeness: The Hardest Problems in NP: NP-complete problems are those in NP to which all other NP problems can be efficiently reduced. If any NP-complete problem could be solved in polynomial time, then P would equal NP.
- Cook-Levin Theorem: This landmark result established the Boolean Satisfiability Problem (SAT) as the first NP-complete problem, providing a starting point for proving the NP-completeness of thousands of other problems.
- Complexity Hierarchy: The relationships between complexity classes form a hierarchy: $\mathrm{DLOG} \subseteq \mathrm{NLOG} \subseteq \mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME}$. Understanding this hierarchy helps to map the landscape of computational difficulty.
- Implications for Algorithm Design: Complexity theory guides algorithm design by identifying problems that are likely intractable, prompting the development of approximation algorithms, heuristics, or specialized solutions for restricted problem instances.