In the previous chapter, we explored finite automata, a computational model with strictly finite memory. We discovered their power but also their inherent limitations, proving that simple languages like $\{0^n 1^n \mid n \ge 0\}$ are beyond their reach. To overcome these limitations, we need a model with infinite memory, which is the crucial conceptual leap.
This brings us to the Turing Machine (TM), the foundational model of computation that captures the very essence of what it means to be an “algorithm.” Proposed by Alan Turing in 1936, it was designed not to be a practical computer, but to formalize the intuitive process of a human mathematician following a set of rules to solve a problem. A TM consists of a finite-state control unit (like a DFA) and an infinite tape that it can read from, write to, and move along in both directions. This infinite tape is the crucial addition, giving the machine an unlimited memory capacity.
By studying this simple yet powerful device, we can explore the absolute limits of what is computable, setting the stage for understanding which problems can be solved by computers and which cannot.
4.1 Aims
By the end of this chapter, you will be able to:
- Understand the Formal Definition: Grasp the 7-tuple definition of a Turing Machine and its components, including the crucial role of the infinite tape.
- Trace Computations: Understand how a TM operates through configurations and computation steps, and how its behavior defines language acceptance.
- Distinguish Language Classes: Learn the critical difference between recursively enumerable (recognizable) and recursive (decidable) languages, based on the TM’s halting behavior.
- Design Basic TMs: Develop strategies for designing Turing Machines to solve specific problems, even if informally.
- Appreciate Universality: Understand the significance of multi-tape TMs, non-deterministic TMs, and the Church-Turing Thesis, which posits that TMs can compute anything that any reasonable computational model can.
- Grasp TM Encoding: Understand how Turing Machines themselves can be encoded as strings, enabling self-reference and laying the groundwork for proofs of undecidability.
4.2 Excerpt from History: Alan Turing’s Motivation
At the beginning of the 20th century, mathematicians were grappling with the foundations of mathematics, particularly the question of what constitutes an “effective procedure” or an “algorithm.” Alan Turing, in his seminal 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem,” sought to formalize this intuitive concept.
Turing’s approach was not to build a physical computer, but to define the class of methods that a human “computer” (a person performing calculations) could execute. He imagined a human mathematician working with a pencil and paper, following a set of rules. His key insights were:
- Finite State of Mind: The human brain, though complex, is always in one of a finite number of “states” at any given moment.
- Local Observation: The human can only observe and process a finite, bounded amount of information at any given time (e.g., a few symbols on the paper).
- Simple Operations: The actions performed are elementary: writing a symbol, erasing a symbol, moving attention to an adjacent spot.
- Unlimited Paper: To handle arbitrarily long computations, the paper (memory) must be potentially infinite.
By simplifying these observations, Turing arrived at his abstract machine model, which, despite its simplicity, proved to be incredibly powerful. He showed that any operation involving a finite part of the text could be broken down into a sequence of these elementary operations, thus eliminating the need for complex, arbitrary “basis operations.”
4.3 The Model of the Turing Machine
A Turing machine can be seen as a generalization of a finite automaton, equipped with an infinite memory. Informally, it consists of:
- A finite control (or finite-state unit) that holds the program (a finite set of states and transition rules). This is analogous to the CPU of a modern computer.
- An infinite tape, divided into cells, which serves as both input medium and memory (working tape). This tape extends infinitely to the right and is delimited on the left by a special boundary symbol.
- A read/write head that can move left and right along the tape, one cell at a time. It can read the symbol under it, write a new symbol, and move.
All cells on the tape not yet written to contain a special blank symbol (␣). An elementary operation (a single step) depends on the machine’s current state and the symbol under the head.
4.3.1 Formal Definition
Definition 4.1 (Turing Machine)
A Turing machine (TM) is a 7-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})$, where:
- $Q$ is a finite, non-empty set of states.
- $\Sigma$ is the input alphabet, which is a finite set of symbols. Crucially, $\Sigma$ does not contain the special boundary symbol ¢ or the blank symbol ␣.
- $\Gamma$ is the tape alphabet, which is a finite set of symbols. It must contain the input alphabet ($\Sigma \subseteq \Gamma$), and also the special symbols ¢ (left end marker) and ␣ (blank symbol).
- $\delta: (Q \setminus \{q_{\text{accept}}, q_{\text{reject}}\}) \times \Gamma \to Q \times \Gamma \times \{L, R, N\}$ is the transition function. This function defines the machine’s behavior. For a given current state $q$ and a symbol $a$ read from the tape, $\delta(q, a)$ specifies:
  - The next state $q'$.
  - The symbol $a'$ to write on the current tape cell.
  - The head movement $D \in \{L, R, N\}$ (Left, Right, No movement). The transition function is not defined for the accepting or rejecting states, as the machine halts upon entering them. A crucial property is that $\delta(q, ¢)$ must result in writing ¢ and moving right or staying put, preventing the head from moving off the left end of the tape or overwriting the boundary.
- $q_0 \in Q$ is the start state.
- $q_{\text{accept}} \in Q$ is the unique accepting state. If the TM enters this state, it accepts the input and halts.
- $q_{\text{reject}} \in Q$, with $q_{\text{reject}} \neq q_{\text{accept}}$, is the unique rejecting state. If the TM enters this state, it rejects the input and halts.
A configuration describes the TM’s complete status at any given moment. It includes the current state, the entire content of the tape, and the position of the read/write head. A common notation for a configuration is $u\,q\,v$, which means:
- The TM is in state $q$.
- The tape content is $uv$ (where $u, v \in \Gamma^*$ and the rest of the tape is blank).
- The head is currently positioned over the first symbol of $v$.
A computation step is a relation $\vdash$ between configurations, describing how the TM moves from one configuration to the next based on its transition function. For example, if $\delta(q, a) = (p, b, R)$, then $u\,q\,av \vdash ub\,p\,v$.
A computation is a sequence of configurations, starting from an initial configuration $q_0\,¢w$ (where $w$ is the input word). The computation halts if it enters $q_{\text{accept}}$ or $q_{\text{reject}}$. Otherwise, it may run indefinitely.
The language accepted by a Turing machine $M$, denoted $L(M)$, is the set of all words $w \in \Sigma^*$ for which the computation of $M$ on input $w$ eventually reaches the accepting state $q_{\text{accept}}$.
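The configuration notation and the step relation can be sketched in a few lines of Python. This is a minimal illustration with representations of our own choosing (a configuration is a `(state, tape, head)` triple; moves $L, N, R$ are encoded as $-1, 0, +1$), not notation taken from the text:

```python
def step(delta, cfg, blank='_'):
    """Apply the transition function once: one computation step.
    delta maps (state, symbol) to (state', write, move), move in {-1, 0, +1}."""
    state, tape, pos = cfg
    state2, write, move = delta[(state, tape[pos])]
    tape2 = tape[:pos] + write + tape[pos + 1:]   # overwrite the scanned cell
    pos2 = pos + move
    if pos2 == len(tape2):
        tape2 += blank                            # the tape is unbounded to the right
    return (state2, tape2, pos2)

def show(cfg):
    """Render a configuration in the u q v notation."""
    state, tape, pos = cfg
    return tape[:pos] + ' ' + state + ' ' + tape[pos:]
```

For instance, with `delta = {('q', 'a'): ('p', 'b', +1)}`, the configuration `('q', 'ab', 0)` steps to `('p', 'bb', 1)`, mirroring $u\,q\,av \vdash ub\,p\,v$.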
4.3.2 Language Classes: Recursively Enumerable vs. Recursive
The behavior of a TM on inputs defines two crucial classes of languages, distinguishing between problems that can be “recognized” and those that can be “decided.”
Definition 4.2 (Recursively Enumerable and Recursive Languages)
- A language $L$ is recursively enumerable (RE) if there exists a TM $M$ such that $L = L(M)$. This means for any $w \in L$, $M$ halts and accepts. For $w \notin L$, $M$ either halts and rejects, or it continues to run indefinitely (does not halt). These languages are also called Turing-recognizable.
- Intuition: If a word is in the language, the TM will eventually confirm it. If not, the TM might never stop searching for a proof of membership.
- A language $L$ is recursive if there exists a TM $M$ that halts on all inputs and $L = L(M)$. For any input $w$, $M$ is guaranteed to halt in either $q_{\text{accept}}$ or $q_{\text{reject}}$. These languages are also called decidable or Turing-decidable.
- Intuition: The TM always gives a definitive “yes” or “no” answer in finite time for any input.
Every recursive language is recursively enumerable. However, as we will see in the next chapter, the reverse is not true; there exist languages that are recursively enumerable but not recursive. A TM that halts on all inputs is a formal model of an algorithm in the strictest sense (a procedure guaranteed to terminate and produce a result).
A TM for $L = \{0^{2^n} \mid n \ge 0\}$

This language consists of strings of `0`s whose length is a power of two (e.g., `0`, `00`, `0000`, `00000000`). Note: the single string `0` is in $L$, since $2^0 = 1$.

Strategy: The TM can recognize this language by repeatedly halving the number of `0`s.

1. Initial Check: If the input is empty ($\varepsilon$), reject. If the input has length 1 (the string `0`), accept.
2. Sweep and Mark: Sweep across the tape from left to right. For every two `0`s encountered, mark the first one (e.g., replace it with `X`) and leave the second one. If an odd number of `0`s (greater than one) is encountered, reject.
3. Rewind: If the sweep completes successfully (an even number of `0`s were found), rewind the head to the beginning.
4. Compress: Sweep across the tape again. For every `0`, replace it with a blank ␣. For every `X`, replace it with a `0`. This effectively halves the number of `0`s. After this pass, compact the `0`s to the left side of the tape.
5. Repeat: Go back to step 2 with the new, shorter string of `0`s.
6. Acceptance: If at any point the tape contains exactly one `0`, accept.
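The strategy can be mirrored at a high level in Python. This sketch checks only the count arithmetic that the sweeps implement; it is not an actual transition table:

```python
def accepts_power_of_two(tape: str) -> bool:
    """High-level sketch of the sweep-and-halve strategy for L = {0^(2^n)}."""
    if not tape or any(c != '0' for c in tape):
        return False                  # empty input or a non-0 symbol: reject
    n = len(tape)
    # Each iteration stands for one sweep-and-compress pass over the tape.
    while n > 1:
        if n % 2 == 1:
            return False              # odd count greater than one: reject
        n //= 2                       # marking every other 0 halves the count
    return True                       # exactly one 0 remains: accept
```

For example, inputs of length 1, 2, 4, and 8 are accepted, while length 6 is rejected after one halving leaves three `0`s.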
4.4 Variants of Turing Machines
To make designing TMs easier and to better model real computers, we can introduce variants that, while seemingly more powerful, are ultimately equivalent to the basic single-tape TM in terms of what they can compute.
4.4.1 Multi-tape Turing Machines
A $k$-tape Turing Machine is a TM equipped with:
- A read-only input tape (containing the initial input, delimited by ¢ on the left and a right end marker \$ on the right).
- $k$ read/write “work” tapes, each with its own independent head. These tapes are infinite to the right and delimited by ¢ on the left.
Its transition function takes the current state and the $k + 1$ symbols under all heads (one input head, $k$ work tape heads) to determine:
- The next state.
- What to write on each of the $k$ work tapes.
- How to move each of the $k + 1$ heads (Left, Right, or No movement).
This model is much more convenient for designing algorithms, as it allows for parallel processing of information and easier manipulation of data.
A 2-tape TM for $L = \{w \# w \mid w \in \{0, 1\}^*\}$
This language consists of words where a binary string $w$ is followed by the separator $\#$ and then by an identical copy of $w$.
- Initial Scan: Scan the input tape to verify that the input has the form $x \# y$ (i.e., exactly one $\#$ symbol). If not, reject.
- Copy $x$: Rewind the input head to the beginning. Copy the prefix $x$ (the part before the $\#$) from the input tape onto the first work tape.
- Position Heads: Move the input head to the first symbol of $y$ (the part after the $\#$). Move the first work tape head to the beginning of the copied $x$.
- Compare: Move both the input head and the first work tape head to the right in lockstep, comparing symbols one by one.
- Decision:
- If a mismatch occurs, reject.
- If both parts are exhausted simultaneously (i.e., both heads reach their respective blank symbols or end markers) with no mismatch, accept.
- If one part is exhausted before the other, reject (lengths don’t match).
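At a high level, the steps above correspond to this Python sketch; it checks the same conditions without modeling the tapes explicitly (a simplification of ours, not the book’s construction):

```python
def accepts_w_hash_w(s: str) -> bool:
    """High-level sketch of the 2-tape comparison for {w#w : w in {0,1}*}."""
    # Initial scan: exactly one '#' separator, binary symbols elsewhere
    if s.count('#') != 1:
        return False
    left, right = s.split('#')
    if not all(c in '01' for c in left + right):
        return False
    # Copy phase: the work tape holds a copy of the prefix
    work = list(left)
    # Lockstep comparison; unequal lengths mean one side is exhausted first
    if len(work) != len(right):
        return False
    return all(a == b for a, b in zip(work, right))
```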
Despite their convenience and apparent parallelism, multi-tape TMs are not more powerful than standard single-tape TMs. They have the same fundamental computational capabilities, differing only in efficiency.
Theorem 4.1 (Equivalence of TM Models)
For every multi-tape TM, there is an equivalent single-tape TM.
Proof Idea
A single-tape TM can simulate a $k$-tape TM by encoding all $k$ work tapes (plus the input tape) onto its single tape. It achieves this by dividing its single tape into multiple “tracks.” For example, for a $k$-tape TM, the single tape might have $2(k + 1)$ tracks: one for each of the $k$ work tapes, one for the input tape, and one for marking the head position on each virtual tape.
To simulate one step of the multi-tape TM, the single-tape TM performs the following sequence of operations:
- Scan for Heads: It sweeps across its entire tape from left to right to find the positions of all virtual heads and reads the symbols under them. It stores this information in its finite control.
- Determine Action: Based on the current state (of the simulated multi-tape TM) and the symbols read under the virtual heads, it determines the next state, the symbols to write, and the head movements for each virtual tape according to the multi-tape TM’s transition function.
- Update Tape: It then sweeps back across its tape, updating the symbols on the appropriate tracks and moving the head markers for each virtual tape according to the determined actions.
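The first of these sweeps, collecting the symbols under the virtual heads, might look as follows. The per-cell layout (one symbol and one head-marker bit per virtual tape, shown here for a 2-tape machine) is one illustrative choice, not fixed by the construction:

```python
# Each cell of the single tape packs two tracks per virtual tape:
# (sym1, head1, sym2, head2), where headN marks virtual head N's position.
def read_virtual_heads(tape):
    """One left-to-right sweep that finds the symbol under each virtual head."""
    symbols = [None, None]
    for sym1, head1, sym2, head2 in tape:
        if head1:
            symbols[0] = sym1
        if head2:
            symbols[1] = sym2
    return symbols
```

The result is stored in the finite control, after which the machine determines the simulated step and sweeps back to update the tracks.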
This simulation is much slower (a polynomial slowdown: simulating $t$ steps of the multi-tape TM takes $O(t^2)$ single-tape steps), but it correctly mimics the behavior of the multi-tape TM, proving that the models are equivalent in computational power.
4.4.2 The Church-Turing Thesis
The fact that such different models (single-tape, multi-tape, and many others like lambda calculus, register machines, etc.) all turn out to be equivalent in power is not a coincidence. It leads to a foundational principle of computer science.
The Church-Turing Thesis
The intuitive notion of an “algorithm” is captured exactly by the formal model of a Turing machine that always halts.
More broadly, any function that can be computed by an algorithm can be computed by a Turing machine.
This is not a theorem that can be proven mathematically, but rather a fundamental thesis that has been universally accepted due to overwhelming evidence and lack of counterexamples. It asserts that any problem that can be solved by any reasonable computational device, now or in the future, can also be solved by a Turing machine. It provides the solid foundation for proving that certain problems are unsolvable: if we can prove they are not solvable by a Turing machine, we can be confident they are not solvable by any algorithm at all.
4.5 Non-deterministic Turing Machines
Just as with finite automata, we can introduce non-determinism into Turing machines. A non-deterministic Turing machine (NTM) has a transition function that, for a given state and tape symbol, can specify a set of possible moves, rather than a single unique move.
An NTM accepts an input if at least one of its possible computation paths leads to the accepting state. This is often described as the machine “guessing” the correct path, or exploring all possible paths simultaneously (like in a multiverse of computations). If no path leads to acceptance, the input is rejected.
Theorem 4.2 (Equivalence of NTMs and DTMs)
For every non-deterministic TM, there is an equivalent deterministic TM.
Proof Idea
A deterministic TM (DTM) can simulate an NTM by systematically exploring the NTM’s computation tree using a breadth-first search (BFS) strategy.
The DTM uses multiple tapes to manage this simulation:
- Input Tape: Holds the original input, which is never modified.
- Simulation Tape: Used to execute a single path of the NTM’s computation.
- Address Tape: Stores the “address” of a node in the computation tree. This address indicates which sequence of non-deterministic choices led to the current configuration being simulated.
The DTM works as follows:
- It systematically generates all possible computation paths of the NTM in a breadth-first manner. It tries all paths of length 1, then all paths of length 2, and so on.
- For each path, it copies the original input to the simulation tape and then simulates the NTM along that specific path.
- If, during the simulation of any path, the NTM enters an accepting state, the DTM immediately accepts and halts.
- If the NTM has no accepting paths, the DTM may run forever (if the computation tree is infinite), correctly reflecting the NTM’s behavior.
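The BFS simulation can be sketched as follows. The configuration representation, the `max_steps` cutoff, and the move encoding $-1/0/+1$ for $L/N/R$ are our own illustrative choices:

```python
from collections import deque

def ntm_accepts(delta, start, accept, tape, max_steps=10_000):
    """BFS over the NTM's computation tree. delta maps (state, symbol)
    to a *set* of (new_state, write, move) triples."""
    start_cfg = (start, tuple(tape), 0)
    queue = deque([start_cfg])
    seen = {start_cfg}
    steps = 0
    while queue and steps < max_steps:
        state, t, pos = queue.popleft()
        steps += 1
        if state == accept:
            return True                       # some path accepts
        sym = t[pos] if pos < len(t) else '_' # '_' is the blank
        for nstate, write, move in delta.get((state, sym), ()):
            nt = list(t) + ['_'] * (pos + 1 - len(t))
            nt[pos] = write
            ncfg = (nstate, tuple(nt), max(0, pos + move))
            if ncfg not in seen:              # explore each configuration once
                seen.add(ncfg)
                queue.append(ncfg)
    return False
```

A real simulator would run without a step cutoff, and could therefore loop forever when no path accepts, exactly as the text describes.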
This simulation proves that NTMs are no more powerful than DTMs in terms of what languages they can recognize. However, this simulation can cause an exponential slowdown. If an NTM can solve a problem in $t$ steps, the simulating DTM might require $2^{O(t)}$ steps. The question of whether this exponential slowdown is necessary for all problems is the famous P vs. NP problem, one of the most significant unsolved problems in computer science.
4.6 Encoding of Turing Machines
Just as a program written in a high-level language can be compiled into machine code (a sequence of bits), any Turing machine can be encoded as a string over a finite alphabet (e.g., $\{0, 1\}$). This encoding allows us to treat Turing machines themselves as data that can be processed by other Turing machines.
Definition 4.5 (Encoding of a TM)
For any TM $M$, we can construct a string $\langle M \rangle$ that uniquely describes it.
A common encoding scheme involves assigning unique binary codes to each state, tape symbol, and head movement direction. Then, each transition rule $\delta(q, a) = (q', a', D)$ can be encoded by concatenating the codes for $q$, $a$, $q'$, $a'$, and $D$. Finally, the entire TM is encoded by concatenating the codes for all its transitions, along with information about the number of states and tape symbols, separated by special delimiters (e.g., $\#$).
For example, if we have codes for states $q_i$, symbols $a_j$, and movements $L, R, N$, a transition $\delta(q, a) = (q', a', D)$ could be encoded as:

`Code(q_current) # Code(symbol_read) # Code(q_next) # Code(symbol_write) # Code(move_direction)`

The entire TM would then be a concatenation of these transition encodings, prefixed by information about the TM’s overall structure.
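As a concrete toy scheme (our own choice, purely for illustration), the indices of states, symbols, and moves can be coded in unary over $\{0, 1\}$, with single `1`s separating the five components of a transition and `11` separating transitions:

```python
def unary(i: int) -> str:
    """Code index i as 0^i (a toy unary code)."""
    return '0' * i

def encode_transition(q, a, q2, a2, d) -> str:
    """Encode delta(q, a) = (q2, a2, d), all given as integer indices."""
    return '1'.join(unary(x) for x in (q, a, q2, a2, d))

def encode_tm(transitions) -> str:
    """Concatenate transition codes, separated by '11'. A full scheme would
    also prefix the numbers of states and tape symbols."""
    return '11'.join(encode_transition(*t) for t in transitions)
```

For instance, `encode_transition(1, 2, 3, 1, 2)` yields `'0100100010100'`.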
This encoding is crucial because it allows us to treat Turing machines as inputs to other Turing machines. This self-referential capability is profoundly powerful, enabling a TM to analyze, simulate, or even modify other TMs. This opens the door to asking questions like: “Can a TM decide if another given TM will halt on a given input?” This leads directly to the concept of a Universal Turing Machine and the proof of the Halting Problem’s undecidability in the next chapter.
The set of all such encodings, $\{\langle M \rangle \mid M \text{ is a TM}\}$, is a language over $\{0, 1\}$. Since these encodings are binary strings, they can be ordered canonically. This allows us to speak of the $i$-th Turing machine, $M_i$, which is the TM whose encoding is the $i$-th string in this language in canonical order.
Summary
- The Turing Machine is the fundamental formal model of an algorithm, consisting of a finite control, an infinite tape for memory and input, and a read/write head.
- Recursive (decidable) languages are those for which a TM exists that halts on all inputs, providing a definitive “yes” or “no” answer. Recursively enumerable (recognizable) languages are those for which a TM halts and accepts if the input is in the language, but may run indefinitely otherwise.
- Variants like multi-tape and non-deterministic TMs are more convenient for algorithm design but are computationally equivalent to the standard single-tape model. This equivalence supports the Church-Turing Thesis, a foundational axiom stating that TMs capture the intuitive notion of computability.
- All TMs can be encoded as strings, allowing them to be used as inputs to other TMs. This self-referential property is key to understanding the limits of computation and the existence of undecidable problems.
Previous Chapter: Chapter 3 - Finite Automata Next Up: Chapter 5 - Computability
Exercises
Exercise 4.1 (Non-Regularity Proof)
Prove that the language $L = \{0^n 1 0^n \mid n \ge 0\}$ is not regular.
Solution
We use the Pumping Lemma for Regular Languages.

- Assume for Contradiction: Assume $L$ is regular.
- Apply Pumping Lemma: By the Pumping Lemma, there exists a pumping length $p \ge 1$.
- Choose a “Clever” String: Choose the string $w = 0^p 1 0^p$. This string is in $L$ because it has a `1` in the middle, with $p$ zeros before and $p$ zeros after. Its length is $2p + 1$, which is $\ge p$.
- Consider All Decompositions: By the Pumping Lemma, $w$ can be decomposed into $w = xyz$ such that $|xy| \le p$ and $|y| \ge 1$. Since $|xy| \le p$ and $w$ starts with $p$ zeros, both $x$ and $y$ must consist entirely of the initial `0`s. Let $x = 0^j$, $y = 0^k$, and $z = 0^{p - j - k} 1 0^p$, where $j \ge 0$, $k \ge 1$, and $j + k \le p$.
- Find a Contradiction: Let’s pump down with $i = 0$. The new string is $x y^0 z = xz = 0^{p - k} 1 0^p$. Since $k \ge 1$, we know that $p - k < p$. Therefore, the string $xz$ has $p - k$ zeros before the `1` and $p$ zeros after the `1`. Since $p - k \ne p$, the number of zeros before the `1` is not equal to the number of zeros after the `1`.
- Conclude: Thus, the pumped string $xz \notin L$. This contradicts condition (3) of the Pumping Lemma. Therefore, $L$ is not a regular language.
Exercise 4.2 (TM Design Sketch)
Sketch the design of a Turing Machine for the language $L = \{a^n b^n \mid n \ge 0\}$.
Solution
Strategy: The TM will match `a`’s with `b`’s by marking them.

1. Scan and Mark `a`: Start at the leftmost `a`. Replace it with a special marker, say `X`. Move right.
2. Find and Mark `b`: Scan right, skipping `a`’s, until the first `b` is found. If no `b` is found before a blank or end of input, reject (mismatch in counts or order). Replace this `b` with another marker, say `Y`. Move left.
3. Rewind: Scan left, skipping `a`’s and `Y`’s, until an `X` is found. Move one step right to the next unmarked `a`.
4. Repeat: Repeat steps 1–3 until all `a`’s are marked.
5. Final Check: After all `a`’s are marked, scan the tape to ensure there are no unmarked `b`’s remaining. If any `b` is found, reject. If only `X`s, `Y`s, and blanks remain, accept.

Example Trace for `aabb`:
- `¢aabb␣` (start)
- `¢Xabb␣` (mark first `a`)
- `¢XaYb␣` (mark first `b`)
- `¢XaYb␣` (rewind to next `a`)
- `¢XXYb␣` (mark second `a`)
- `¢XXYY␣` (mark second `b`)
- `¢XXYY␣` (rewind: no unmarked `a` found)
- `¢XXYY␣` (scan for remaining `b`s: none found) → Accept.
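The sketch can be written out as an explicit transition table and executed. The state names and markers below are our own choices, and any missing (state, symbol) pair plays the role of the reject state:

```python
# Transition table for the marking TM; '_' is the blank.
# States: 's' = scan for an unmarked a, 'f' = find a matching b,
# 'r' = rewind, 'c' = final check, 'acc' = accept.
DELTA = {
    ('s', 'a'): ('f', 'X', +1),   # mark an a, then go look for a b
    ('s', 'Y'): ('c', 'Y', +1),   # no unmarked a left: do the final check
    ('s', '_'): ('acc', '_', 0),  # empty input: n = 0, accept
    ('f', 'a'): ('f', 'a', +1),
    ('f', 'Y'): ('f', 'Y', +1),
    ('f', 'b'): ('r', 'Y', -1),   # mark the b, then rewind
    ('r', 'a'): ('r', 'a', -1),
    ('r', 'Y'): ('r', 'Y', -1),
    ('r', 'X'): ('s', 'X', +1),   # step right past the last marked a
    ('c', 'Y'): ('c', 'Y', +1),
    ('c', '_'): ('acc', '_', 0),  # only markers remain: accept
}

def run(w: str) -> bool:
    """Run the table on input w; True = accept, False = reject."""
    tape, pos, state = list(w) + ['_'], 0, 's'
    while state != 'acc':
        key = (state, tape[pos])
        if key not in DELTA:
            return False              # implicit reject state
        state, tape[pos], move = DELTA[key]
        pos += move
        if pos == len(tape):
            tape.append('_')          # the tape is unbounded to the right
    return True
```

Running `run('aabb')` accepts, reproducing the trace above, while `run('aab')` and `run('ba')` reject.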
Exercise 4.3 (Multi-tape TM Design)
Describe informally how a 2-tape TM could accept the language $L = \{a^n b^n c^n \mid n \ge 0\}$.
Solution
A 2-tape TM can efficiently compare the counts of `a`’s, `b`’s, and `c`’s.

- Phase 1: Count `a`’s:
  - Scan the input tape from left to right. For each `a` encountered, write a marker (e.g., `1`) on Work Tape 1.
  - The phase ends at the first symbol that is not an `a`.
  - After processing all `a`’s, rewind Work Tape 1 to its beginning.
- Phase 2: Match `b`’s:
  - Continue scanning the input tape. For each `b` encountered, move the head on Work Tape 1 one position to the right (consuming a `1` marker).
  - If an `a` is encountered in this phase, reject (incorrect order).
  - If Work Tape 1 runs out of `1`s before all `b`’s are processed, or `1`s remain after the `b`’s end, reject (mismatch in `a` and `b` counts).
  - After processing all `b`’s, rewind Work Tape 1 to its beginning.
- Phase 3: Match `c`’s:
  - Continue scanning the input tape. For each `c` encountered, move the head on Work Tape 1 one position to the right (consuming a `1` marker).
  - If any symbol other than `c` is encountered before the end of the input, reject.
  - If Work Tape 1 runs out of `1`s before all `c`’s are processed (or vice versa), reject (mismatch in `a` and `c` counts).
- Final Check: After processing all `c`’s, ensure the input tape head is at the end of the input (blank symbol) and Work Tape 1 is exhausted (all `1`s consumed). If all conditions are met, accept. Otherwise, reject.
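The three phases can be mirrored at a high level in Python, with an integer standing in for the marker count on Work Tape 1. This is an illustrative sketch of the phase structure, not a transition table:

```python
def accepts_anbncn(s: str) -> bool:
    """High-level sketch of the 2-tape phases for {a^n b^n c^n : n >= 0}.
    `work` plays the role of the 1-markers on Work Tape 1."""
    i, work = 0, 0
    # Phase 1: count the leading a's
    while i < len(s) and s[i] == 'a':
        work += 1
        i += 1
    # Phase 2: consume one marker per b
    b = 0
    while i < len(s) and s[i] == 'b':
        b += 1
        i += 1
    if b != work:
        return False                  # mismatch in a and b counts
    # Phase 3: consume one marker per c
    c = 0
    while i < len(s) and s[i] == 'c':
        c += 1
        i += 1
    if c != work:
        return False                  # mismatch in a and c counts
    return i == len(s)                # final check: nothing after the c's
```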
Key Takeaways
- Turing Machines as Universal Models: The TM is the most widely accepted formal model for algorithms, capable of performing any computation that a modern computer can. Its infinite tape provides unlimited memory, overcoming the limitations of finite automata.
- Decidability vs. Recognizability: A crucial distinction exists between recursive (decidable) languages (for which a TM always halts, giving a definitive yes/no) and recursively enumerable (recognizable) languages (for which a TM halts and accepts only if the input is in the language, possibly running forever otherwise).
- Equivalence of TM Variants: Multi-tape and non-deterministic Turing machines, while offering practical advantages in design and potentially efficiency, are fundamentally equivalent in computational power to the basic single-tape deterministic Turing machine. This supports the robustness of the TM model.
- The Church-Turing Thesis: This foundational axiom states that any intuitively computable function can be computed by a Turing machine. It forms the bedrock for proving the limits of computation.
- Encoding TMs: The ability to encode Turing machines as strings allows them to be treated as data, enabling self-reference and leading to profound results about computability, such as the existence of universal TMs and undecidable problems.