Recap and Warm-up

In our last lecture, we introduced Kolmogorov complexity, $K(x)$, a powerful concept defining the information content of a string $x$ as the length of the shortest program that generates it.

A crucial point to remember is that we’re less concerned with the exact complexity of a single, concrete object. Why? Because there’s always a “dirty constant” (Schmutzfaktor), an additive overhead that depends on the chosen programming language. This makes precise measurement for a single string tricky. Instead, we focus on the asymptotic behavior for an infinite sequence of objects. As the objects get larger, this constant becomes negligible, and the true, underlying information content becomes clear.

Let’s start with a warm-up exercise to solidify this.

Bounding Complexity for Structured Sequences

Consider the set of words of the form $x = 0^N$, where the length $N$ is a number of the form $N = 2^i \cdot 3^j \cdot 5^k$ for some natural numbers $i$, $j$, $k$. How can we find an upper bound on $K(x)$?

The strategy is always to devise a short program that generates $x$.

# A program to generate x = 0^N where N = 2^i * 3^j * 5^k
def generate_x(i, j, k):
  # Calculate the large number N from the small exponents
  N = (2**i) * (3**j) * (5**k)
  
  # Build a string of N zeros, then print it
  output = ""
  for _ in range(N):
    output += '0'
  print(output)

To generate a specific $x$ from this family, the program needs the specific values of $i$, $j$, and $k$. For a concrete $x$, the program looks like this:

# A specific program for a specific x
i = ... # The specific exponent for 2
j = ... # The specific exponent for 3
k = ... # The specific exponent for 5
 
N = (2**i) * (3**j) * (5**k)
print('0' * N)

The length of this program is the length of the code template (a constant, $c$) plus the length of the binary representations of the numbers $i$, $j$, and $k$.

How large can these exponents be? Since $N = 2^i \cdot 3^j \cdot 5^k$, we know that $2^i \le N$, which implies $i \le \log_2 N$. The same logic applies to $j$ and $k$. The exponents are, at most, logarithmic in the length of $x$ (recall that $|x| = N$).

The number of bits needed to represent a number like $i$ is roughly $\log_2 i$. So, since each exponent is at most $\log_2 N$, the number of bits for each of our exponents is roughly $\log_2 \log_2 N$.

Therefore, we can bound the complexity: $K(x) \le 3 \log_2 \log_2 |x| + c$.

This demonstrates that strings with this kind of deep mathematical structure are highly compressible and have very low information content.
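
A quick numeric sanity check of this bound (the concrete exponents below are our own illustration, not from the lecture):

import math

i, j, k = 10, 7, 5
N = (2**i) * (3**j) * (5**k)                            # N = |x| = 6_998_400_000
exponent_bits = sum(e.bit_length() for e in (i, j, k))  # 4 + 3 + 3 = 10 bits
print(exponent_bits, 3 * math.log2(math.log2(N)))       # ~10 vs. ~15.1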

Defining Randomness

Kolmogorov complexity gives us something profound that probability theory cannot: a formal definition of randomness for a single object. The intuition is simple and beautiful:

An object is random if it has no pattern. The shortest way to describe it is to present the object itself.

In our formal language:

A binary string $x$ is random if it is incompressible. Formally, $K(x) \ge |x|$.

A random string has no internal structure that a program could exploit to generate it from a shorter description.
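
Although $K$ itself is uncomputable, as we will prove at the end of this lecture, any lossless compressor yields an upper bound on it (up to an additive constant), which makes this intuition tangible. A crude demonstration using Python's zlib (our choice of tool, not part of the lecture):

import os
import zlib

structured = b'0' * 4096        # highly patterned string
patternless = os.urandom(4096)  # uniformly sampled; almost surely pattern-free

# The compressed length is only an UPPER bound on the true complexity.
print(len(zlib.compress(structured)))   # a few dozen bytes
print(len(zlib.compress(patternless)))  # about 4096 bytes, often slightly more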

The Existence of Random Strings

This definition would be hollow if random strings were a mere theoretical curiosity. However, a simple counting argument shows they are not only real but abundant.

Lemma 2.5: For every natural number $n$, there exists a random binary string of length $n$.

Proof by Counting

This is a classic application of the pigeonhole principle.

  1. The Objects to Describe (Pigeons): There are exactly $2^n$ distinct binary strings of length $n$.
  2. The Possible Short Descriptions (Pigeonholes): A “short description” is a program of length less than $n$. The total number of binary strings (and thus possible programs) of length less than $n$ is: $2^0 + 2^1 + \cdots + 2^{n-1} = 2^n - 1$.

We have $2^n$ strings that need a description, but only $2^n - 1$ available short descriptions. Therefore, at least one string of length $n$ cannot be generated by any program shorter than $n$. Its shortest program must have a length of at least $n$, making it random.

In fact, we can make a much stronger statement: most strings are random, or very nearly so. A similar counting argument shows that, for every $n$, more than half of all binary strings of length $n$ satisfy $K(x) \ge n - 1$, i.e., they cannot be compressed by even a single bit. Randomness is the norm, not the exception.
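
Both counts, and the "more than half" claim, are easy to verify mechanically; a small check of our own:

n = 20
pigeons = 2**n                             # strings of length n
pigeonholes = sum(2**i for i in range(n))  # programs of length < n
print(pigeons - pigeonholes)               # 1: one string is always left over

# At most 2^(n-1) - 1 strings have a program shorter than n - 1 bits, so
# more than half of all strings of length n satisfy K(x) >= n - 1.
print(sum(2**i for i in range(n - 1)) / 2**n)  # 0.4999... < 0.5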

Randomness for Numbers

We can extend this concept to natural numbers by considering their binary representation.

A natural number $n$ is random if its standard binary representation, $\mathrm{Bin}(n)$, is a random string; that is, $K(n) \ge |\mathrm{Bin}(n)| - c$ for some universal constant $c$.

The exact length of the binary representation of $n$ is $|\mathrm{Bin}(n)| = \lceil \log_2(n+1) \rceil = \lfloor \log_2 n \rfloor + 1$ for $n \ge 1$.
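
A quick check of this length formula (in Python, n.bit_length() returns exactly $\lfloor \log_2 n \rfloor + 1$ for $n \ge 1$):

for n in [1, 2, 5, 255, 256]:
  # bin(n) carries a '0b' prefix, so Bin(n) itself has len(bin(n)) - 2 bits
  print(n, bin(n)[2:], len(bin(n)) - 2, n.bit_length())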

The Invariance Theorem: A Robust Definition

A potential weakness of our definition is its reliance on a specific programming language. What if a string is compressible in Python but not in Java? The Invariance Theorem assures us this is not a major issue.

Theorem: For any two universal programming languages (e.g., Turing machines, Python, Java) $L_1$ and $L_2$, there exists a constant $c_{L_1, L_2}$ such that for all strings $x$: $|K_{L_1}(x) - K_{L_2}(x)| \le c_{L_1, L_2}$.

This means the complexity of a string might change from one language to another, but only by a fixed additive constant. For large strings, this difference is negligible.

Proof Idea

We can write a program in language $L_1$ that functions as an interpreter (Übersetzer) for language $L_2$. This interpreter is a fixed program with a constant length, $c_{2 \to 1}$.

To generate a string $x$ in language $L_1$, we can construct the following program:

  1. The code for the interpreter of $L_2$ (written in $L_1$).
  2. The shortest program for $x$ written in language $L_2$.

This combined program is a valid program in language $L_1$. It works by interpreting and executing the $L_2$ code to produce $x$. Its total length is $c_{2 \to 1} + K_{L_2}(x)$. Since $K_{L_1}(x)$ is the length of the shortest program for $x$ in $L_1$, it must be at most the length of this construction. Swapping the roles of the two languages gives the symmetric bound.
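
Spelling out the length accounting, with $c_{2 \to 1}$ and $c_{1 \to 2}$ denoting the lengths of the two interpreters:

$K_{L_1}(x) \le K_{L_2}(x) + c_{2 \to 1}$ and $K_{L_2}(x) \le K_{L_1}(x) + c_{1 \to 2}$, hence $|K_{L_1}(x) - K_{L_2}(x)| \le \max\{c_{2 \to 1}, c_{1 \to 2}\} =: c_{L_1, L_2}$.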

Kolmogorov Complexity as a Proof Tool

Now we’ll see how this abstract concept can be used as a powerful research instrument to prove concrete mathematical theorems.

A New Proof for the Infinitude of Primes

We all know Euclid’s classic proof by contradiction. Here is a completely different argument based on information theory.

Lemma 2.6 (variant): There are infinitely many prime numbers.

Proof by Contradiction

  1. Assumption: Assume there are only finitely many primes: $p_1 < p_2 < \cdots < p_k$.
  2. Representation: Any natural number $n$ can be uniquely described by its prime factorization using this finite set of primes: $n = p_1^{e_1} \cdot p_2^{e_2} \cdots p_k^{e_k}$. This means the list of exponents $(e_1, \ldots, e_k)$ is a complete description of $n$.
  3. Compression: We can write a program that takes these exponents and reconstructs $n$. The primes are fixed and can be hardcoded. The only information needed to specify a particular $n$ is the list of its exponents (see the sketch after this proof).
    • The size of each exponent $e_i$ is at most $\log_2 n$, because $2^{e_i} \le p_i^{e_i} \le n$.
    • The number of bits needed to represent each exponent is thus at most about $\log_2 \log_2 n$.
    • Since $k$ is a fixed constant, the total length of the description for $n$ is bounded by: $K(n) \le k \cdot \log_2 \log_2 n + c$ for some constant $c$.
  4. Contradiction: We know that there are infinitely many random numbers, so we can pick a random number $n$ as large as we like. For this random number, by definition: $K(n) \ge \log_2 n - c'$ for some fixed constant $c'$. This leads to the inequality: $\log_2 n - c' \le k \cdot \log_2 \log_2 n + c$. But $\log_2 n$ grows asymptotically faster than $\log_2 \log_2 n$, so for a sufficiently large $n$ this inequality cannot hold. This is a contradiction.
  5. Conclusion: Our assumption that there is a finite number of primes must be false.

This style of argument is incredibly powerful: if assuming a certain mathematical structure allows for too much compression (violating the existence of random objects), then that structure cannot exist.
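
To see the over-compression concretely, here is a small sketch under the proof's (false) assumption, pretending the first four primes were all of them; the function name and the encoding are our own illustration:

import math

PRIMES = [2, 3, 5, 7]  # the proof's assumption: a complete, finite list of primes

def exponent_description(n):
  # The exponent list (e1, ..., ek) is the entire description of n.
  exponents = []
  for p in PRIMES:
    e = 0
    while n % p == 0:
      n //= p
      e += 1
    exponents.append(e)
  assert n == 1, "n has a prime factor outside PRIMES"
  return exponents

n = 2**20 * 3**15 * 7**9
exps = exponent_description(n)            # [20, 15, 0, 9]
bits = sum(e.bit_length() for e in exps)  # 5 + 4 + 0 + 4 = 13 bits
print(round(math.log2(n)), bits)          # ~69 bits for Bin(n) vs. 13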

Complexity of Words in a Recursive Language

Let’s connect complexity to the languages we defined earlier. What can we say about the complexity of words in a recursive (i.e., decidable) language?

Lemma 2.6: Let $L \subseteq \{0, 1\}^*$ be an infinite, recursive language, and let $z_n$ be the $n$-th word in $L$ (in canonical order). Then there exists a constant $c_L$ such that for all $n$: $K(z_n) \le \lceil \log_2(n + 1) \rceil + c_L$.

Proof Idea

Since $L$ is recursive, there exists an algorithm decide_L(y) that returns true if $y \in L$ and false otherwise. We can use this to build a program that generates $z_n$.

def generate_nth_word(n):
  # This part of the program is fixed and has constant size c_L
  # It includes the code for decide_L and the generator logic.
  
  counter = 0
  y = "" # Start with the first word in canonical order (lambda)
  
  while True:
    if decide_L(y):
      counter += 1
    
    if counter == n:
      print(y)
      return
      
    y = next_string_in_canonical_order(y)

This program’s only input is the integer $n$, whose binary representation has length about $\log_2 n$ (exactly $\lceil \log_2(n + 1) \rceil$ bits). The rest of the program is a fixed constant of size $c_L$. Therefore, the complexity of $z_n$ is bounded by $K(z_n) \le \lceil \log_2(n + 1) \rceil + c_L$.
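
The enumeration above relies on the helper next_string_in_canonical_order, which was left abstract. A minimal sketch, where canonical order means shorter strings first with ties broken lexicographically:

def next_string_in_canonical_order(y):
  # Treat y as a binary counter; when it overflows, move to the next length.
  for i in range(len(y) - 1, -1, -1):
    if y[i] == '0':
      return y[:i] + '1' + '0' * (len(y) - i - 1)
  return '0' * (len(y) + 1)

# Starting from the empty word: "", "0", "1", "00", "01", "10", "11", "000", ...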

A Crucial Caveat

Does this lemma imply that all words in a recursive language are simple (have low complexity)? No!

The key is that $n$ is the order of the word in the language, not its length. Consider the language $L = \{0, 1\}^*$ of all binary strings. This language is recursive (the decider just always returns true). But what is the $n$-th word? The number of words of length at most $\ell$ is about $2^{\ell + 1}$. So, a word of length $\ell$ will have an order $n$ that is roughly $2^\ell$.

Plugging this into our bound: $K(z_n) \le \log_2 n + c_L \approx \log_2 2^\ell + c_L = \ell + c_L = |z_n| + c_L$.

This bound is trivial; every string $x$ already satisfies $K(x) \le |x| + c$, since a program can simply contain $x$ verbatim and print it. The lemma is only powerful for “sparse” languages, where the length of the $n$-th word grows much faster than $\log_2 n$: for $L = \{0^{2^m} : m \in \mathbb{N}\}$, for example, the $n$-th word has length $2^n$ but complexity at most about $\log_2 n + c_L$.

The Uncomputability of Kolmogorov Complexity

We have a beautiful, robust definition of information. Now for the final, stunning conclusion: it is impossible to compute.

Theorem: The function $K$ is not computable.

Proof by Contradiction

  1. Assumption: Assume there exists an algorithm ComputeK(x) that, for any string $x$, halts and returns the integer $K(x)$.
  2. Construction: We can use this hypothetical algorithm to build a new program, FindComplexString(n), which performs the following task:
    def FindComplexString(n):
      # This program takes an integer n as input.
      y = "" # Start with the first word in canonical order (lambda)
      
      while True:
        # Use our hypothetical algorithm to find the complexity of y
        if ComputeK(y) >= n:
          # We found the first string with complexity at least n.
          print(y)
          return 
        
        # Generate the next string in the sequence
        y = next_string_in_canonical_order(y) 
  3. Analysis: This program, FindComplexString(n), generates a specific string, let’s call it $x_n$: the first string in canonical order with complexity at least $n$. (Such a string exists for every $n$ by Lemma 2.5, so the loop always terminates.) The program itself serves as a description for $x_n$.
    • The code for the generator, the loop, and the call to ComputeK is fixed. Its length is a constant, $c$.
    • The only variable part of the program is the input value $n$. The length of the binary representation of $n$ is about $\log_2 n$.
    • Therefore, we have constructed a program of length roughly $\log_2 n + c$ that generates $x_n$. This gives us an upper bound on the complexity of $x_n$: $K(x_n) \le \log_2 n + c$.
  4. Contradiction: By its very construction, $x_n$ is a string whose complexity is at least $n$: $K(x_n) \ge n$. Combining our two findings, we get the absurd inequality: $n \le K(x_n) \le \log_2 n + c$. For any fixed constant $c$, we can choose $n$ large enough that $\log_2 n + c < n$. This is a fundamental contradiction.
  5. Conclusion: Our initial assumption, that an algorithm ComputeK(x) exists, must be false.
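
To feel how quickly the contradiction bites, take an illustrative constant $c = 1000$: for $n = 2048$, the construction yields a program of only about $\log_2 2048 + 1000 = 1011$ bits that prints a string of complexity at least $2048$, yet $1011 < 2048$.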

We have arrived at a profound and somewhat unsettling result. We have found what seems to be the “correct” definition of information and randomness, but it describes a property that is fundamentally beyond our ability to measure or compute.
