A Tool Sharpened: Kolmogorov Complexity Recap

Before we dive into today’s main event, let’s recall the essential tools we’ve developed. The whole game rests on a simple, powerful idea: counting.

The Existence of Randomness

We established that for any length $n$, there must be strings of length $n$ that are incompressible. The argument is beautifully simple:

  • We have $2^n$ possible strings of length $n$. Think of these as our pigeons.
  • A “short description” is a program of length less than $n$. The number of such programs is at most the number of binary strings of length less than $n$, which is $2^0 + 2^1 + \dots + 2^{n-1} = 2^n - 1$. These are our pigeonholes.

Since we have more pigeons than pigeonholes, at least one string $x$ of length $n$ cannot be generated by any shorter program. Its Kolmogorov complexity must be at least its own length: $K(x) \ge n$.

This same logic applies to numbers. The binary representation of a number always starts with a ‘1’, so we have slightly less freedom. For any length $n$, there are $2^{n-1}$ numbers whose binary representation has length $n$. A similar counting argument shows that for any $n$, there must exist a number $m$ with $|\mathrm{bin}(m)| = n$ that is nearly incompressible:

$$K(m) \ge \log m - c$$

for some small constant $c$ (all logarithms here are base 2).

Crucially, this argument isn’t just about Kolmogorov complexity. It applies to any fixed compression scheme. No single compression algorithm can shorten every string. There will always be strings that are incompressible with respect to that specific algorithm. This is the key we’ll use today.
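
As a quick illustration (my own sketch, not part of the original argument), here is the pigeonhole count in Python: for every length $n$ there are strictly more strings than possible shorter descriptions.

```python
# Pigeonhole count: 2^n strings of length n, but only 2^n - 1 binary
# strings of length < n are available as shorter descriptions.
for n in range(1, 8):
    pigeons = 2 ** n             # strings of length n
    pigeonholes = 2 ** n - 1     # strings of length 0, 1, ..., n-1
    print(f"n={n}: {pigeons} strings vs. {pigeonholes} shorter descriptions")
```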

A Weaker Prime Number Theorem

Our goal is to prove a beautiful result about the density of prime numbers. The famous Prime Number Theorem states that $\pi(n)$, the number of primes up to $n$, is about $n / \ln n$. We will prove a slightly weaker, but still powerful, lower bound.

Lemma 2.4 (variant): For infinitely many numbers $n$, the number of primes $\pi(n)$ satisfies:

$$\pi(n) \ge c \cdot \frac{n}{\log n \cdot (\log\log n)^2}$$

for some constant $c > 0$.
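
To get a feel for the statement, here is a small Python check (an illustration only; the lemma’s constant $c$ is left implicit, so we just print both quantities side by side):

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

primes = primes_up_to(10 ** 6)
for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    pi_n = sum(1 for p in primes if p <= n)
    shape = n / (math.log2(n) * math.log2(math.log2(n)) ** 2)
    print(f"n={n:>7}  pi(n)={pi_n:>6}  n/(log n (log log n)^2) = {shape:9.1f}")
```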

The Big Idea: Proof by Compression

The strategy is a proof by contradiction, and it’s one of the most elegant applications of information theory.

  1. The “What If”: What if primes were too rare, sparser than our lemma claims?
  2. The Consequence: If primes were too sparse, we could invent a clever compression scheme for numbers based on their prime factorization. This scheme would be “too good.”
  3. The Contradiction: It would be so good it could compress every large number. But we know from our counting argument that for any fixed compression scheme, there must be numbers it cannot compress.
  4. The Conclusion: Our initial assumption must be wrong. Therefore, primes cannot be that rare.

Let’s build this compression scheme.

Step 1: The Compression Scheme

Take any natural number $n$. Let $p_i$ be the largest prime factor of $n$, where $i$ is its index in the ordered list of primes ($p_1 = 2,\ p_2 = 3,\ p_3 = 5,\ \dots$). We can perfectly reconstruct $n$ from the pair of numbers $(i,\ n/p_i)$.

The magic happens if primes are sparse. If they are, then $p_i$ is a very large number, but its index $i$ is a much smaller number. Representing $p_i$ by $i$ is a huge saving.
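
Here is a sketch of this round trip in Python, leaning on sympy for the prime utilities (the function names split and merge are mine):

```python
from sympy import factorint, prime, primepi

def split(n):
    """Represent n as (i, m), where p_i is the largest prime factor of n
    and m = n / p_i; the pair determines n uniquely."""
    p = max(factorint(n))             # largest prime factor of n
    return int(primepi(p)), n // p    # primepi(p) = index of p among the primes

def merge(i, m):
    """Reconstruct n from the pair (i, m)."""
    return prime(i) * m               # prime(i) = the i-th prime

n = 123456789
i, m = split(n)
assert merge(i, m) == n
print(i, m)    # the index i is far smaller than the prime p_i itself
```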

Step 2: The Encoding Problem

How do we encode the pair $(i,\ n/p_i)$ into a single, unambiguous binary string? The naive approach of just concatenating their binary representations fails:

If $\mathrm{bin}(x) = 101$ and $\mathrm{bin}(y) = 1011$, the result is 1011011. We have no way of knowing where the first number ends and the second begins.

We need a self-delimiting encoding for the first part.

A First Fix: Doubling Bits

Let’s invent a new encoding, which we’ll call $\overline{x}$, that is self-delimiting.

  • Encode 0 as 00.
  • Encode 1 as 11.
  • Use 01 as a special “end-of-string” marker.

For example, if $\mathrm{bin}(x) = 101$, then $\overline{x} = 11\,00\,11\,01$. Now we can create an unambiguous encoding for our pair:

$$\overline{i}\ \mathrm{bin}(n/p_i)$$

The length of this encoding is roughly $2\log i + \log(n/p_i)$. This works, but doubling the length of $\mathrm{bin}(i)$ is expensive. We can do better.
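
A minimal Python sketch of this encoding and its decoder (function names are mine), confirming that the split point really is unambiguous:

```python
def dbl(x):
    """Self-delimiting code: double every bit of bin(x), then append '01'."""
    return "".join(2 * bit for bit in bin(x)[2:]) + "01"

def undbl(s):
    """Read one dbl-coded number off the front of s; return (value, rest)."""
    bits, pos = [], 0
    while s[pos:pos + 2] != "01":    # '00' -> bit 0, '11' -> bit 1
        bits.append(s[pos])
        pos += 2
    return int("".join(bits), 2), s[pos + 2:]

x, y = 5, 11
code = dbl(x) + bin(y)[2:]           # '11001101' + '1011'
value, rest = undbl(code)
assert (value, int(rest, 2)) == (x, y)
```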

A Better Fix: Encoding the Length

Instead of doubling the bits of $i$ itself, let’s just encode the length of its binary representation in a self-delimiting way:

$$\overline{|\mathrm{bin}(i)|}\ \mathrm{bin}(i)\ \mathrm{bin}(n/p_i)$$

This is much cheaper! The length of the first part is now roughly $2\log\log i + \log i$, which is $\log i + O(\log\log i)$.

The Ultimate Refinement: Iterating the Idea

Why stop there? The length of the length is also a number. We can encode its length! This gives us our final, highly efficient compression scheme, comp:

$$\mathrm{comp}(n) = \overline{|\mathrm{bin}(|\mathrm{bin}(i)|)|}\ \mathrm{bin}(|\mathrm{bin}(i)|)\ \mathrm{bin}(i)\ \mathrm{bin}(n/p_i)$$

The length of this compressed representation is:

$$|\mathrm{comp}(n)| \approx \log\frac{n}{p_i} + \log i + \log\log i + 2\log\log\log i$$
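
Here is a sketch of all three schemes in Python (my own illustration), comparing only the code lengths. Note that the final refinement overtakes the previous one only once the index is astronomically large; what matters for the proof is the asymptotic form above.

```python
def dbl(x):
    """Doubled-bit self-delimiting code, as before."""
    return "".join(2 * bit for bit in bin(x)[2:]) + "01"

def comp(i, m):
    """Final scheme: dbl(|bin(|bin(i)|)|) bin(|bin(i)|) bin(i) bin(m)."""
    bi = bin(i)[2:]                   # bin(i)
    bl = bin(len(bi))[2:]             # bin(|bin(i)|)
    return dbl(len(bl)) + bl + bi + bin(m)[2:]

i, m = 2 ** 1000 + 1, 12345           # a deliberately huge index, just for sizing
scheme1 = len(dbl(i) + bin(m)[2:])                             # double bin(i)
scheme2 = len(dbl(len(bin(i)[2:])) + bin(i)[2:] + bin(m)[2:])  # encode the length
scheme3 = len(comp(i, m))                                      # iterate once more
print(scheme1, scheme2, scheme3)      # 2018, 1037, 1035
```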

Step 3: The Contradiction

Now we bring it all together. We need to find a number that is incompressible. But we need it to be incompressible in two ways:

  1. It must be Kolmogorov-random, so we can reason about its prime factors.
  2. It must be incompressible by our specific scheme, comp, so we can set up our inequality.

We know from our counting arguments that most numbers have these properties. So we can choose an infinite sequence of numbers $n_1 < n_2 < n_3 < \dots$ such that for each $j$:

$$K(n_j) \ge \log n_j - c_1 \qquad\text{and}\qquad |\mathrm{comp}(n_j)| \ge |\mathrm{bin}(n_j)| \ge \log n_j$$

The first condition, combined with our previous proof, guarantees that infinitely many different primes occur as largest prime factors within this sequence. This ensures our argument isn’t just about a single prime.

The second condition gives us our main inequality. Writing $n$ for a member of the sequence and $p_i$ for its largest prime factor, let’s write it out:

$$\log n \le |\mathrm{comp}(n)| \approx \log\frac{n}{p_i} + \log i + \log\log i + 2\log\log\log i + O(1)$$

Now, the magic. We know that $\log\frac{n}{p_i} = \log n - \log p_i$.

The $\log n$ terms on both sides cancel! We can rearrange to get a direct relationship between the prime $p_i$ and its index $i$:

$$\log p_i \le \log i + \log\log i + 2\log\log\log i + c$$

Exponentiating both sides ($x \mapsto 2^x$) gives us an upper bound on the size of the $i$-th prime:

$$p_i \le 2^c \cdot i \cdot \log i \cdot (\log\log i)^2$$

This inequality must hold for infinitely many primes $p_i$.

Step 4: The Final Result

This result tells us that the $i$-th prime cannot be “too large” relative to its index $i$. We can rephrase this to get our theorem. Let $n = p_i$; then $i = \pi(n)$, since $p_i$ is by definition the $\pi(p_i)$-th prime. Substituting into our inequality:

$$n \le 2^c \cdot \pi(n) \cdot \log \pi(n) \cdot (\log\log \pi(n))^2$$

Since $\pi(n) \le n$, we know $\log \pi(n) \le \log n$ and $\log\log \pi(n) \le \log\log n$. We can rearrange to get a lower bound on $\pi(n)$:

$$\pi(n) \ge \frac{n}{2^c \cdot \log n \cdot (\log\log n)^2}$$

And the proof is complete. We have used the abstract idea of incompressibility to derive a concrete, quantitative result about the distribution of prime numbers.

Introduction to Finite Automata

We now shift gears from the highly abstract world of Kolmogorov Complexity to the simplest, most concrete model of computation: the Finite Automaton.

The Core Idea: Computation with Constant Memory

What is the defining characteristic of a Finite Automaton? It is an algorithm whose memory requirement is constant and does not grow with the size of the input.

How can we model this? Imagine a program with no variables. The only “memory” it has is its state, which corresponds to the current line number of the program counter. Since the program has a finite number of lines, it has a finite number of states.

The computation proceeds as follows (a minimal simulator sketch follows the list):

  1. The program starts in a designated initial state (e.g., line 0).
  2. It reads the input string one symbol at a time, from left to right.
  3. After reading a symbol, based on its current state and the symbol it just read, it transitions to a new state. This is like a goto statement: if (input == '0') goto line_7; else goto line_3;.
  4. The input symbol is then discarded. The automaton cannot go back and re-read it.
  5. After reading the entire input string, the automaton halts in some final state.
  6. The set of all possible states (program lines) is partitioned into accepting and rejecting states. If the final state is an accepting one, the input string is accepted (“yes”). Otherwise, it is rejected (“no”).
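
Here is that run loop as a minimal sketch in Python (representing the transition function as a dictionary is my choice, not notation from the lecture):

```python
def run_dfa(delta, start, accepting, word):
    """Run a finite automaton: the current state is the ONLY memory."""
    state = start
    for symbol in word:                  # read left to right, one symbol at a time
        state = delta[(state, symbol)]   # transition, then discard the symbol
    return state in accepting            # accept iff we halt in an accepting state
```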

A More Intuitive Representation: State Diagrams

Writing programs with goto is a nightmare to analyze. A much clearer way to represent a finite automaton is as a directed graph, called a state diagram.

The parts of the diagram correspond directly to the parts of the automaton:

  • Nodes (Vertices): Represent the states of the automaton. We often label them $q_0, q_1, q_2, \dots$.
  • Start State: One state is designated as the start state, indicated by an incoming arrow with no source.
  • Accepting States: A subset of states are designated as accepting (or final) states, indicated by a double circle.
  • Edges (Transitions): An edge from state $q_i$ to $q_j$ labeled with a symbol $a$ means: “If you are in state $q_i$ and you read the symbol $a$, move to state $q_j$.”

Example: The Parity Checker

Let’s analyze the automaton from the lecture. What language does it recognize?

The key to understanding an automaton is to figure out the meaning of each state. Each state represents a property of the prefix of the input string that has been read so far.

Let’s analyze the structure. Imagine cutting the automaton in half, both vertically and horizontally.

  • Vertical Cut: The line separates $\{q_0, q_2\}$ from $\{q_1, q_3\}$. Notice that every transition on a 1 crosses this line. This tells us something about the parity of 1s. If we are on the left, we have seen an even number of 1s. If we are on the right, we have seen an odd number of 1s.
  • Horizontal Cut: The line separates $\{q_0, q_1\}$ from $\{q_2, q_3\}$. Every transition on a 0 crosses this line. This tells us about the parity of 0s. If we are on the top, we have seen an even number of 0s. If we are on the bottom, we have seen an odd number of 0s.

So, the meaning of each state is:

  • $q_0$: Even 0s, Even 1s.
  • $q_1$: Even 0s, Odd 1s.
  • $q_2$: Odd 0s, Even 1s.
  • $q_3$: Odd 0s, Odd 1s.

The accepting states are $q_0$ and $q_3$. When do we end up in one of these states?

  • To be in $q_0$: We’ve read an even number of 0s and an even number of 1s. The total length is (even + even) = even.
  • To be in $q_3$: We’ve read an odd number of 0s and an odd number of 1s. The total length is (odd + odd) = even.

In both accepting cases, the total length of the string is even. In the rejecting states ($q_1$ and $q_2$), the total length is odd. Therefore, this automaton accepts the language of all binary strings of even length.
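
A sketch of the parity checker in Python (encoding each state as a pair of parity bits is my own shorthand for $q_0, \dots, q_3$):

```python
# q0 = (even, even), q1 = (even, odd), q2 = (odd, even), q3 = (odd, odd)
def accepts_even_length(word):
    zeros_odd, ones_odd = False, False      # start in q0
    for symbol in word:
        if symbol == "0":
            zeros_odd = not zeros_odd       # a 0 crosses the horizontal cut
        else:
            ones_odd = not ones_odd         # a 1 crosses the vertical cut
    return zeros_odd == ones_odd            # accepting states q0 and q3

for w in ["", "01", "011", "0110", "11011"]:
    assert accepts_even_length(w) == (len(w) % 2 == 0)
```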

Design Exercises

Let’s practice designing automata for some simple languages over the alphabet $\Sigma = \{0, 1\}$.

  1. Language $L = \Sigma^*$ (accepts all strings, including the empty string): We need a single state that is both the start and accepting state. Any input symbol just returns us to this state.

  2. Language $L = \Sigma^+$ (accepts all non-empty strings): The empty string should be rejected, so the start state is not accepting. Any input symbol takes us to an accepting state $q_1$, and from there we stay in an accepting state forever.

  3. Language $L = \{011\}$ (accepts only this specific string): We need a linear path of states that spells out 011. The final state on this path is accepting. Any deviation from this path must lead to a non-accepting “trap” state (also called a “garbage state”), from which there is no escape. The state labels here are very descriptive: $q_\varepsilon$ (we’ve seen nothing), $q_0$ (we’ve seen the prefix 0), $q_{01}$ (we’ve seen 01), and $q_{011}$ (we’ve seen the full word). A sketch of this automaton in code follows the list.
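
Here is the third automaton as a transition table in Python (a sketch; the state names mirror the prefixes, with trap as the garbage state):

```python
# Automaton for L = {011}: a linear path plus a non-accepting trap state.
delta = {
    ("q_e", "0"): "q_0",     ("q_e", "1"): "trap",
    ("q_0", "0"): "trap",    ("q_0", "1"): "q_01",
    ("q_01", "0"): "trap",   ("q_01", "1"): "q_011",
    ("q_011", "0"): "trap",  ("q_011", "1"): "trap",
    ("trap", "0"): "trap",   ("trap", "1"): "trap",
}

def accepts(word):
    state = "q_e"                        # start state: we have seen nothing
    for symbol in word:
        state = delta[(state, symbol)]
    return state == "q_011"              # the single accepting state

assert accepts("011")
assert not any(accepts(w) for w in ["", "0", "01", "0111", "1011"])
```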

Searching for Patterns: Substring, Suffix, and Prefix

This is a classic and powerful application of finite automata.

Let’s design an automaton for $L_{\mathrm{substring}}$, the language of all strings containing 0110 as a substring.

The core idea is a “search path” for 0110. Once we find it, we move to a final accepting state and stay there forever. The tricky part is handling “mismatches.” If we are partway through the pattern and read the wrong symbol, we don’t necessarily go back to the start. We go to the state that represents the longest proper suffix of what we’ve seen that is also a prefix of our target pattern.

Let’s trace the failure transitions (a small Python check follows the list):

  • In $q_{01}$ (we’ve seen 01): If we see a 0, the pattern is broken. But the last symbol is a 0, which is a prefix of 0110. So we go to state $q_0$.
  • In $q_{011}$ (we’ve seen 011): If we see a 1, the pattern is broken. The last part of what we’ve seen is 11. Neither 11 nor 1 is a prefix of 0110, so we have to start our search over and go back to $q_\varepsilon$.
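
The failure logic can be checked mechanically. This sketch (my own) computes, for any state and input symbol, the longest suffix of “what we have seen” that is still a prefix of the pattern:

```python
def next_state(pattern, matched, symbol):
    """From state 'matched' on input 'symbol': the longest suffix of
    matched + symbol that is also a prefix of the pattern."""
    seen = matched + symbol
    for k in range(min(len(seen), len(pattern)), 0, -1):
        if seen.endswith(pattern[:k]):
            return pattern[:k]
    return ""                            # no overlap: back to the start state

# Reproduce the failure transitions traced above (pattern 0110):
print(next_state("0110", "01", "0"))     # -> '0'    (go to state q_0)
print(next_state("0110", "011", "1"))    # -> ''     (start the search over)
print(next_state("0110", "011", "0"))    # -> '0110' (the full match)
```

For the substring automaton, the full-match state is a success trap, so its outgoing transitions stay put instead of following this rule.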

Let’s design for $L_{\mathrm{prefix}}$, the language of strings starting with 0110.

This is the simplest case. We build the path for 0110. Any deviation from this path is an immediate failure and goes to a non-accepting trap state. Once the prefix is successfully read, we land in an accepting state and stay there.

Finally, let’s design for $L_{\mathrm{suffix}}$, the language of strings ending with 0110.

This is the most subtle. The structure is similar to the substring automaton, but the final state is not a “success trap.” If more symbols arrive after we’ve found 0110, the string might no longer end in 0110, so we must transition out of the final state. The logic for these transitions is the same as the substring automaton’s failure logic.

For example, if we are in the accepting state $q_{0110}$ and we read a 0, the last four symbols read are 1100. The longest suffix of this that is a prefix of our target 0110 is just 0. So we transition from $q_{0110}$ to $q_0$.
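
Putting it together, here is a sketch of the suffix automaton in Python (my own illustration): the same transition rule applies everywhere, including out of the accepting state, and we can test it against Python’s endswith:

```python
import random

def ends_with_via_dfa(pattern, word):
    """State = the longest prefix of the pattern that is a suffix of the
    input read so far. No success trap: the state can shrink again."""
    state = ""
    for symbol in word:
        seen = state + symbol
        k = min(len(seen), len(pattern))
        while k > 0 and not seen.endswith(pattern[:k]):
            k -= 1
        state = pattern[:k]
    return state == pattern              # accept iff the input ends in the pattern

for _ in range(1000):
    w = "".join(random.choice("01") for _ in range(random.randrange(12)))
    assert ends_with_via_dfa("0110", w) == w.endswith("0110")
```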
