Lecture from: 17.10.2025 | Video: Videos ETHZ
We’ve formally defined the Turing Machine (TM), our model for what an “algorithm” is. It consists of a finite control, a single infinite tape, and a read/write head.
![](/Semester-3/Theoretical-Computer-Science/Lecture-Notes/attachments/Pasted-image-20251201223720.png)
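As a warm-up, here is a minimal sketch of this model in Python. The sparse-dictionary tape, the step budget, and the toy example machine (which accepts inputs containing at least one 1) are my own illustrative choices, not notation from the lecture.

```python
# A minimal sketch of a deterministic single-tape TM simulator.
BLANK = "_"

def run_tm(delta, start, accept, reject, word, max_steps=10_000):
    """Simulate a single-tape TM on `word`.

    delta maps (state, symbol) -> (new_state, write_symbol, move),
    where move is "L", "R", or "N".
    """
    tape = dict(enumerate(word))          # sparse tape: position -> symbol
    state, pos = start, 0
    for _ in range(max_steps):
        if state == accept:
            return True
        if state == reject:
            return False
        symbol = tape.get(pos, BLANK)
        state, write, move = delta[(state, symbol)]
        tape[pos] = write
        pos += {"L": -1, "R": +1, "N": 0}[move]
    raise RuntimeError("step budget exhausted (machine may loop)")

# Example: a toy machine accepting inputs that contain at least one '1'.
delta = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q_acc", "1", "N"),
    ("q0", BLANK): ("q_rej", BLANK, "N"),
}
print(run_tm(delta, "q0", "q_acc", "q_rej", "0010"))  # True
```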
Today, we’ll explore some extensions to this basic model. It might seem like adding more features would make the model more powerful, allowing it to solve problems that the standard TM cannot. The surprising and profound result we will establish is that these extensions, while convenient, do not increase the fundamental computational power of the model.
The Multi-Tape Turing Machine (MTM)
The most significant and useful extension is the Multi-Tape Turing Machine, often called a $k$-tape TM. This model is much closer to how we think about programming. It has:
- A single, shared finite control.
- A dedicated, read-only input tape. The head on this tape can move left and right but cannot alter the input.
- A set of $k$ read/write work tapes. Each work tape has its own independent read/write head.
![](/Semester-3/Theoretical-Computer-Science/Lecture-Notes/attachments/Pasted-image-20251201223758.png)
Computation with an MTM
- Initial Configuration: The input word is placed on the input tape, with its head at the start. All work tapes are initially blank. The machine is in its start state $q_0$.
- Computation Step: In a single step, the machine reads the symbols under all heads (one input head, $k$ work heads). Based on its current state and this tuple of symbols, the transition function $\delta$ dictates:
- A new state for the finite control.
- A new symbol to write on each of the $k$ work tapes.
- An independent movement (L, R, or N) for each of the $k+1$ heads.
A configuration for a $k$-tape TM is a tuple that captures the state, the position of the input head, and the contents and head position for each of the $k$ work tapes:

$$(q, w, i, u_1, i_1, u_2, i_2, \ldots, u_k, i_k)$$

where $q$ is the current state, $w$ the input word, $i$ the position of the input head, and $u_j$, $i_j$ the contents and head position of the $j$-th work tape.
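To make the tuple concrete, here is a sketch of a $k$-tape configuration and a single step in Python; the dictionary-based tapes and the shape of the transition table are assumptions for illustration.

```python
# A sketch of a k-tape MTM configuration and one computation step.
from dataclasses import dataclass, field

BLANK = "_"

@dataclass
class MTMConfig:
    state: str
    input_tape: str                             # read-only input word w
    input_pos: int = 0                          # input-head position i
    work: list = field(default_factory=list)    # one dict per work tape (u_j)
    heads: list = field(default_factory=list)   # one head position per tape (i_j)

def step(cfg, delta):
    """Apply one MTM step. `delta` maps
    (state, input_symbol, (work_symbols...)) ->
    (new_state, input_move, ((write_symbol, move), ...))."""
    inp = (cfg.input_tape[cfg.input_pos]
           if cfg.input_pos < len(cfg.input_tape) else BLANK)
    reads = tuple(t.get(h, BLANK) for t, h in zip(cfg.work, cfg.heads))
    new_state, in_move, actions = delta[(cfg.state, inp, reads)]
    cfg.state = new_state
    cfg.input_pos += {"L": -1, "R": +1, "N": 0}[in_move]
    for i, (write, move) in enumerate(actions):
        cfg.work[i][cfg.heads[i]] = write       # write on work tape i
        cfg.heads[i] += {"L": -1, "R": +1, "N": 0}[move]
    return cfg
```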
Equivalence of Computational Models
Our goal is to show that this seemingly more powerful model can’t actually do more than the standard single-tape TM. To do this, we need to define what it means for two machines, or two entire models, to be “equivalent.”
Definition (Equivalence of Machines)
Two machines $M_1$ and $M_2$ (TMs or MTMs) with the same input alphabet $\Sigma$ are equivalent if, for every input word $w \in \Sigma^*$, they produce the same outcome:
- $M_1$ accepts $w$ $\iff$ $M_2$ accepts $w$.
- $M_1$ rejects $w$ $\iff$ $M_2$ rejects $w$.
- $M_1$ loops on $w$ $\iff$ $M_2$ loops on $w$.
Note that if two machines are equivalent, they accept the same language ($L(M_1) = L(M_2)$), but the reverse is not necessarily true. Two machines could accept the same language but one might reject a non-member while the other loops on it.
Definition (Equivalence of Models)
Two machine models (classes of machines) $\mathcal{A}$ and $\mathcal{B}$ are equivalent if for every machine in one class, there exists an equivalent machine in the other class, and vice versa:
- $\forall A \in \mathcal{A}\ \exists B \in \mathcal{B}$ such that $A$ and $B$ are equivalent.
- $\forall B \in \mathcal{B}\ \exists A \in \mathcal{A}$ such that $A$ and $B$ are equivalent.
Theorem
The single-tape Turing Machine (TM) model and the multi-tape Turing Machine (MTM) model are equivalent.
We prove this in two parts.
Lemma 4.1: For every TM $A$, there exists an equivalent 1-tape MTM $B$. This direction is trivial. A standard TM is just a special case of a 1-tape MTM where the single work tape is also used for input. We can easily simulate this by having the MTM first copy its input tape to its work tape and then proceed exactly as the original TM would.
Lemma 4.2: For every MTM $A$, there exists an equivalent single-tape TM $B$. This is the heart of the proof. We need to show how a single tape can simulate multiple tapes.
Proof Idea: Simulating Multiple Tapes on One
The idea is to use the single tape of the TM to store the contents of all the MTM’s tapes, interleaved. We can expand the alphabet of our single-tape TM to include tuples. Each symbol on the new tape will represent a “slice” through all the tapes of the MTM.
![](/Semester-3/Theoretical-Computer-Science/Lecture-Notes/attachments/Pasted-image-20251201224321.png)
![](/Semester-3/Theoretical-Computer-Science/Lecture-Notes/attachments/Pasted-image-20251201224335.png)
To keep track of the head positions, we augment the alphabet even further. For each tape track, we have two versions of each symbol: a plain version and a “marked” version (indicated by a dot or box). The marked symbol signifies that the head for that tape is currently at that position.
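A small sketch of this encoding in Python may help; packing each cell as a tuple of (symbol, head-is-here) pairs is one concrete choice among several.

```python
# Interleave k work tapes into one tape whose cells are k-tuples of
# (symbol, marked) pairs; `marked` plays the role of the dotted symbol.
BLANK = "_"

def encode(work_tapes, heads):
    """Pack k work tapes (lists of symbols) and their head positions
    into a single tape of k-track cells."""
    width = max(max(len(t) for t in work_tapes), max(heads) + 1)
    tape = []
    for pos in range(width):
        cell = tuple(
            (t[pos] if pos < len(t) else BLANK, pos == h)
            for t, h in zip(work_tapes, heads)
        )
        tape.append(cell)
    return tape

# Two tapes with heads at positions 2 and 0:
print(encode([list("abc"), list("xy")], [2, 0]))
# [(('a', False), ('x', True)), (('b', False), ('y', False)), (('c', True), ('_', False))]
```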
A single step of the MTM is simulated by the single-tape TM in a multi-step process (sketched in code after this list):
- Sweep Left: The TM head sweeps from its rightmost position all the way to the left end of the tape. During this sweep, it reads and stores the marked symbols (the symbols under each of the MTM’s virtual heads) in its finite control.
- Consult Transition Function: Once it has collected all the necessary information (the MTM’s current state and all symbols under its heads), the TM has everything it needs to determine the MTM’s next move. It consults the MTM’s transition function $\delta$.
- Sweep Right: The TM head sweeps back to the right. During this sweep, it updates the tape according to the transition rule: it overwrites the marked symbols with new ones and moves the “marks” one step left or right as required. It also updates its own internal state to the MTM’s new state.
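Here is a rough sketch of one simulated step under the encoding above. The interface of `delta` (state plus read symbols in, new state plus per-track actions out) is assumed for illustration, and the read-only input tape is left out to keep it short.

```python
# One simulated MTM step on the interleaved tape: one sweep to collect
# the marked symbols, consult delta, one sweep to apply writes and moves.
BLANK = "_"

def simulate_step(tape, state, delta, k):
    # Sweep 1: read the symbol under each virtual head (and remember where
    # each mark sits) into the simulating TM's "finite control".
    under = [None] * k
    marks = [None] * k
    for pos, cell in enumerate(tape):
        for i, (sym, marked) in enumerate(cell):
            if marked:
                under[i], marks[i] = sym, pos

    # Consult the MTM's transition function.
    new_state, actions = delta[(state, tuple(under))]

    # Sweep 2: overwrite each marked symbol and shift its mark by one cell
    # (or leave it in place). Assumes no head ever moves left of cell 0.
    for i, (write, move) in enumerate(actions):
        old = marks[i]
        new = old + {"L": -1, "R": +1, "N": 0}[move]
        if new == len(tape):            # grow the simulated tape with blanks
            tape.append(tuple((BLANK, False) for _ in range(k)))
        row = list(tape[old]); row[i] = (write, False); tape[old] = tuple(row)
        row = list(tape[new]); row[i] = (tape[new][i][0], True); tape[new] = tuple(row)
    return new_state, tape
```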
This simulation is much slower than the MTM (each simulated step costs two full sweeps over the used portion of the tape), but it works. It proves that anything a multi-tape machine can compute, a single-tape machine can also compute.
The Church-Turing Thesis
This equivalence between single-tape and multi-tape TMs is not an isolated result. It’s part of a much broader pattern. Over the decades, computer scientists have proposed many different models of computation: machines with multiple heads, 2D tapes, random-access memory (RAM machines), and even abstract functional systems like the Lambda Calculus.
Remarkably, every single one of these models that captures our intuitive notion of “algorithm” has been proven to be equivalent in computational power to the standard Turing Machine. This leads to one of the most fundamental principles of computer science.
The Church-Turing Thesis:
The Turing Machine is a suitable formalization of our intuitive notion of an “algorithm.”
This is a “thesis,” not a “theorem,” because “intuitive notion of an algorithm” is not a formal mathematical object. We can’t prove it. However, nearly a century of evidence supports it. No one has ever found a convincing model of computation that can solve a problem that a Turing Machine cannot.
This thesis allows us to make a powerful claim:
- The class of recursive languages ($\mathcal{L}_R$) represents the class of all algorithmically solvable (decidable) problems.
- If we can prove that no Turing Machine exists for a given problem, we can conclude that no algorithm exists for that problem, on any computing device, ever.
![](/Semester-3/Theoretical-Computer-Science/Lecture-Notes/attachments/Pasted-image-20251201224448.png)
Nondeterministic Turing Machines
Just as with finite automata, we can introduce the concept of nondeterminism to Turing Machines.
Important: This is not about randomness! A nondeterministic machine is not a probabilistic one.
Nondeterminism means that at any point, the machine may have multiple possible next moves. Its transition function maps a state and symbol not to a single outcome, but to a set of possible outcomes.
Think of it as the machine exploring many possible computation paths in parallel.
Acceptance Condition: A nondeterministic TM accepts an input word if there exists at least one possible sequence of choices (one computation path) that leads to the accept state. If all possible paths lead to rejection or loop forever, the word is not accepted.
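One way to make this acceptance condition concrete is to search the tree of configurations breadth-first, which is how a deterministic simulation would explore all paths. The interface below (a transition table mapping to *sets* of options, and a step budget standing in for "loops forever") is an assumption for illustration.

```python
# Sketch: "accepts iff there EXISTS an accepting path", via BFS over
# configurations (state, head position, tape contents) of an NTM.
from collections import deque

BLANK = "_"

def ntm_accepts(delta, start, accept, word, max_steps=10_000):
    """Return True iff some computation path reaches `accept`
    within the explored budget. delta maps (state, symbol) to an
    iterable of (new_state, write, move) options."""
    init = (start, 0, word)
    seen = {init}
    queue = deque([init])
    steps = 0
    while queue and steps < max_steps:
        state, pos, tape = queue.popleft()
        steps += 1
        if state == accept:
            return True                 # one accepting path suffices
        sym = tape[pos] if pos < len(tape) else BLANK
        for new_state, write, move in delta.get((state, sym), ()):
            t = list(tape)
            while pos >= len(t):
                t.append(BLANK)
            t[pos] = write
            new_pos = max(0, pos + {"L": -1, "R": +1, "N": 0}[move])
            cfg = (new_state, new_pos, "".join(t))
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False  # no accepting path found within the budget
```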
This “guess and check” model is incredibly powerful for thinking about problems. For example, to solve the Hamilton Cycle problem on a graph:
- Guess: Nondeterministically write a permutation of the graph’s vertices onto the tape.
- Check: Deterministically verify if this permutation forms a valid Hamilton cycle in the input graph.
If a Hamilton cycle exists, there is a path of “correct” guesses that will write it onto the tape, and the verification step will succeed. If no cycle exists, no matter what is guessed, the verification will fail.
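The "check" half is completely deterministic and easy to write down; here is a sketch, with the edge-set encoding of the graph as an assumed convention.

```python
# Verify a guessed vertex permutation: it must visit every vertex exactly
# once, and consecutive vertices (including the wrap-around) must be adjacent.

def is_hamilton_cycle(perm, edges, n):
    """perm: guessed vertex order; edges: set of frozenset pairs; n: |V|."""
    if sorted(perm) != list(range(n)):      # each vertex exactly once
        return False
    return all(
        frozenset((perm[i], perm[(i + 1) % n])) in edges
        for i in range(n)
    )

# Example: the 4-cycle 0-1-2-3-0.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(is_hamilton_cycle([0, 1, 2, 3], edges, 4))  # True
print(is_hamilton_cycle([0, 2, 1, 3], edges, 4))  # False (0-2 is not an edge)
```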
The big question, which we will explore later, is whether this nondeterministic guessing power actually allows us to solve more problems than a deterministic TM. For Turing Machines, the answer is surprisingly “no” in terms of computability, but it has massive implications for efficiency (the P vs. NP problem).