In the world of computing, everything ultimately reduces to sequences of symbols. Programs are symbol sequences, data is stored as bits, and every interaction with a computer, from a mouse click to a complex calculation, can be represented in this fundamental way. To study computation, we first need a precise vocabulary to describe these symbol sequences.

This chapter introduces the fundamental vocabulary of theoretical computer science. We will build a formal framework for describing information, which will serve as the bedrock for all the concepts that follow, from simple automata to the limits of what can be computed.

2.1 Aims

This chapter is designed to equip you with the foundational concepts for our entire journey. By the end, you will be able to:

  1. Define the Core Vocabulary: Understand and use the fundamental concepts of Alphabets, Words, and Languages. These are the basic building blocks for describing information in a computational context, much like letters form words and sentences in natural languages.
  2. Formalize Algorithmic Problems: Use this new vocabulary to precisely define what an “algorithmic problem” is. We will focus on crucial classes such as Decision Problems, Optimization Problems, Relation Problems, and Generation/Enumeration Problems.
  3. Measure Information Content: Explore the idea of compressibility by introducing Kolmogorov Complexity. This powerful concept allows us to measure the intrinsic information content of a word and provides a formal, mathematical definition for the elusive concept of randomness. It also serves as a valuable instrument for investigating computations and proving non-existence results.

2.2 Alphabets, Words, and Languages

Just as natural languages are built from letters, the languages of computation are built from symbols. To precisely define what computers process and how they do it, we must first formalize these fundamental building blocks.

2.2.1 Alphabets: The Building Blocks

Definition 2.1 (Alphabet)

An alphabet is a finite, non-empty set of symbols. We typically denote it with the Greek letter $\Sigma$ (Sigma). The elements of an alphabet are called letters, characters, or symbols.

The choice of symbols is arbitrary; what matters is that the set is finite. The meaning of the symbols comes from how we interpret them, not from their appearance. For instance, ‘0’ and ‘1’ could just as easily be ‘A’ and ‘B’ – their computational role remains the same.

Common Examples of Alphabets:

  • The Boolean Alphabet: $\Sigma_{\text{bool}} = \{0, 1\}$. This is the most fundamental alphabet in computing, representing binary data.
  • The Latin Alphabet: $\Sigma_{\text{lat}} = \{a, b, c, \dots, z\}$.
  • A Keyboard Alphabet: $\Sigma_{\text{keyboard}}$ includes all symbols on a standard computer keyboard: letters, digits, punctuation, and special characters like the blank symbol (␣).
  • The Alphabet for m-adic Numbers: $\Sigma_m = \{0, 1, \dots, m-1\}$ for any integer $m \geq 2$.
  • The Alphabet for Logic: $\Sigma_{\text{logic}} = \{0, 1, x, (, ), \land, \lor, \neg\}$, used to represent Boolean formulas.

2.2.2 Words: Sequences of Symbols

Once we have an alphabet, we can form sequences of symbols. In computer science, any text, regardless of its length or meaning in natural language, is considered a “word.”

Definition 2.2 (Word)

A word (or string) over an alphabet $\Sigma$ is a finite sequence of letters from $\Sigma$.

  • The empty word, denoted by $\varepsilon$ (or sometimes $\lambda$), is the sequence with zero letters. Its length is $|\varepsilon| = 0$.
  • The length of a word $w$, denoted $|w|$, is the number of letters in the sequence.
  • $\Sigma^*$ is the set of all possible words over the alphabet $\Sigma$, including the empty word. This set represents every possible finite combination of letters from $\Sigma$.
  • $\Sigma^+ = \Sigma^* \setminus \{\varepsilon\}$ is the set of all non-empty words over $\Sigma$.

For example, 010011 is a word over $\Sigma_{\text{bool}}$ with length $|010011| = 6$. The set $(\Sigma_{\text{bool}})^*$ contains all possible binary strings: $\{\varepsilon, 0, 1, 00, 01, 10, 11, 000, \dots\}$.
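As a quick illustration (ours, not part of the formal development), the following Python sketch models words as strings and enumerates the words of $(\Sigma_{\text{bool}})^*$ up to a given length bound; the helper name `words_up_to` is ours.

```python
from itertools import product

SIGMA_BOOL = "01"  # the Boolean alphabet {0, 1}

def words_up_to(alphabet: str, max_len: int):
    """Yield every word over `alphabet` of length 0..max_len,
    starting with the empty word epsilon (modeled as "")."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

print(list(words_up_to(SIGMA_BOOL, 2)))
# ['', '0', '1', '00', '01', '10', '11']
print(len("010011"))  # the length |010011| = 6
```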

2.2.3 Representing Objects as Words (Encoding)

The true power of this formalism comes from its ability to represent any object of interest in computation as a unique word. This universal representation process is called encoding. It’s how we translate complex ideas—numbers, graphs, logical formulas, or even entire computer programs—into the simple sequences of symbols that computers can process.

Representing Numbers

A binary word $x = x_1 x_2 \dots x_n \in (\Sigma_{\text{bool}})^*$ can represent the natural number $\text{Number}(x) = \sum_{i=1}^{n} 2^{n-i} \cdot x_i$. We denote the standard, shortest binary representation of a number $m$ as $\text{Bin}(m)$. For $m \geq 1$, this representation conventionally starts with a ‘1’. For example, $\text{Bin}(5) = 101$ and $\text{Bin}(12) = 1100$.

A sequence of numbers, like $m_1, m_2, \dots, m_k$, can be encoded as a single word using a special separator symbol $\#$, for instance: $\text{Bin}(m_1)\#\text{Bin}(m_2)\#\cdots\#\text{Bin}(m_k)$.
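A minimal Python sketch of these conventions (our illustration; `bin_repr` and `encode_sequence` are hypothetical helper names):

```python
def bin_repr(m: int) -> str:
    """Shortest binary representation Bin(m); starts with '1' for m >= 1."""
    return format(m, "b")

def encode_sequence(numbers) -> str:
    """Encode m1, ..., mk as Bin(m1)#Bin(m2)#...#Bin(mk)."""
    return "#".join(bin_repr(m) for m in numbers)

print(bin_repr(12))                # 1100
print(encode_sequence([2, 3, 5]))  # 10#11#101
```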

Representing Graphs

A directed graph $G$ with $n$ vertices $v_1, \dots, v_n$ can be represented by its adjacency matrix $M_G = [a_{ij}]_{i,j=1,\dots,n}$. In this matrix, the entry $a_{ij}$ is 1 if there is an edge from vertex $v_i$ to vertex $v_j$, and 0 otherwise.

We can flatten this matrix into a word by listing its rows, separated by a $\#$ symbol.

Graph Encoding

The graph with the adjacency matrix

$$M_G = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$ is encoded as the word `0011#0011#0101#0000#` over the alphabet $\{0,1,\#\}$. This representation is unambiguous; we can perfectly reconstruct the graph from the word.

Weighted graphs can be represented similarly by encoding the integer weights in binary, using a single $\#$ to separate weights in a row and a double $\#\#$ to separate rows. For example, an edge weight of 7 would be encoded as $\text{Bin}(7) = 111$.
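The following sketch (ours) implements the unweighted encoding just described and demonstrates the claimed unambiguity by decoding the word back into the matrix:

```python
def encode_graph(adj) -> str:
    """Flatten an adjacency matrix into a word over {0, 1, #},
    one row at a time, each row terminated by '#'."""
    return "".join("".join(str(bit) for bit in row) + "#" for row in adj)

def decode_graph(word: str):
    """Recover the adjacency matrix; the encoding is unambiguous."""
    rows = word.split("#")[:-1]  # drop the empty piece after the final '#'
    return [[int(bit) for bit in row] for row in rows]

M = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]
w = encode_graph(M)
print(w)                     # 0011#0011#0101#0000#
assert decode_graph(w) == M  # round-trip shows the representation is unique
```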

Representing Logical Formulas

Boolean formulas can also be encoded as words. We can use an alphabet like $\Sigma_{\text{logic}} = \{0, 1, x, (, ), \land, \lor, \neg\}$. To handle an infinite supply of variables $x_1, x_2, x_3, \dots$, we can encode the variable $x_i$ as the letter $x$ followed by $\text{Bin}(i)$.

Formula Encoding

The formula $(x_1 \lor x_7) \land \neg(x_{12})$ is encoded as: `(x1 ∨ x111) ∧ ¬(x1100)`. Here, $\text{Bin}(1) = 1$, $\text{Bin}(7) = 111$, and $\text{Bin}(12) = 1100$.
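A one-line sketch of the variable encoding (our illustration; `encode_var` is a hypothetical name):

```python
def encode_var(i: int) -> str:
    """Encode the variable x_i as the letter 'x' followed by Bin(i)."""
    return "x" + format(i, "b")

print(encode_var(1), encode_var(7), encode_var(12))  # x1 x111 x1100
```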

2.2.4 Operations on Words

We can manipulate words using several fundamental operations.

Definition 2.3 (Concatenation)

The concatenation of two words $u$ and $v$, written $uv$ (or $u \cdot v$), is the word formed by appending $v$ to the end of $u$. For example, if $u = 01$ and $v = 10$, then $uv = 0110$.

Concatenation is associative (i.e., $(uv)w = u(vw)$ for all $u, v, w \in \Sigma^*$). The set $\Sigma^*$ of all words with the concatenation operation forms a mathematical structure called a monoid, with the empty word as the neutral element ($\varepsilon w = w \varepsilon = w$). Concatenation is generally not commutative (i.e., $uv \neq vu$ unless specific conditions are met). The length of concatenated words is additive: $|uv| = |u| + |v|$.

Definition 2.4 (Reversal)

For a word $w = w_1 w_2 \dots w_n$, where $w_i \in \Sigma$, the reversal of $w$ is the word $w^R = w_n w_{n-1} \dots w_1$. The reversal of the empty word is $\varepsilon^R = \varepsilon$. A useful property is that for any words $u, v$, $(uv)^R = v^R u^R$.

Definition 2.5 (Iteration)

For any word $w$ and any integer $i \geq 0$, the $i$-th iteration of $w$, denoted $w^i$, is the word formed by concatenating $w$ with itself $i$ times. By definition, $w^0 = \varepsilon$ and $w^{i+1} = w^i w$. For example, $(ab)^3 = ababab$.

Definition 2.6 (Subwords)

Let $v, w \in \Sigma^*$.

  • $v$ is a subword (or substring) of $w$ if there exist words $x, y \in \Sigma^*$ such that $w = xvy$.
  • $v$ is a prefix of $w$ if there exists a word $y \in \Sigma^*$ such that $w = vy$.
  • $v$ is a suffix of $w$ if there exists a word $x \in \Sigma^*$ such that $w = xv$. A subword, prefix, or suffix $v$ of $w$ is proper if it is not equal to the word itself (i.e., $v \neq w$).

Definition 2.7 (Symbol Count)

For a word $w \in \Sigma^*$ and a symbol $a \in \Sigma$, the notation $|w|_a$ denotes the number of occurrences of the symbol $a$ in $w$. For example, for $w = abbab$, $|w|_a = 2$ and $|w|_b = 3$. The total length of a word is the sum of the counts of all its symbols: $|w| = \sum_{a \in \Sigma} |w|_a$.
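Python strings happen to model all of these word operations directly; the following sketch (ours) checks the stated identities on the concrete word $w = abbab$:

```python
u, v = "ab", "bab"
w = u + v                                  # concatenation: uv = "abbab"
assert (u + v)[::-1] == v[::-1] + u[::-1]  # reversal: (uv)^R = v^R u^R
assert "ab" * 3 == "ababab"                # iteration: (ab)^3
assert w.startswith(u) and w.endswith(v)   # u is a prefix, v a suffix of w
assert "bba" in w                          # "bba" is a subword of w
assert w.count("a") == 2 and w.count("b") == 3  # |w|_a = 2, |w|_b = 3
assert len(w) == w.count("a") + w.count("b")    # |w| = sum of symbol counts
print("all identities hold for this example")
```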

2.2.5 Languages: Sets of Words

In formal language theory, the term “language” has a very broad meaning, encompassing any collection of words.

Definition 2.8 (Language)

A language $L$ over an alphabet $\Sigma$ is any subset of $\Sigma^*$, i.e., $L \subseteq \Sigma^*$.

This means any set of strings, finite or infinite, is a language. Think of it as a collection of “valid” or “meaningful” words from the infinite pool of all possible words. Languages are the primary way we define the properties of strings we are interested in.

Examples of languages:

  • $\emptyset$ (the empty language, containing no words).
  • $\{\varepsilon\}$ (the language containing only the empty word; note that $\{\varepsilon\} \neq \emptyset$).
  • $\{a^n b^n \mid n \in \mathbb{N}\}$ over $\{a, b\}$. This language represents strings with a block of a’s followed by an equal number of b’s.
  • $\{w \in \{a, b\}^* \mid |w|_a = |w|_b\}$ (all words with an equal number of a’s and b’s).
  • The set of all syntactically correct C++ programs (a language over $\Sigma_{\text{keyboard}}$).

Since languages are sets, we can apply standard set operations like union ($L_1 \cup L_2$), intersection ($L_1 \cap L_2$), and complement ($L^C = \Sigma^* \setminus L$). We can also extend word operations to languages:

  • Concatenation: $L_1 \cdot L_2 = \{uv \mid u \in L_1, v \in L_2\}$. This forms a new language by concatenating every word from $L_1$ with every word from $L_2$.
  • Kleene Star: $L^* = \bigcup_{i \geq 0} L^i$, where $L^0 = \{\varepsilon\}$ and $L^{i+1} = L^i \cdot L$. The Kleene star represents concatenating words from the language zero or more times, and always includes the empty word.
  • Positive Closure: $L^+ = \bigcup_{i \geq 1} L^i = L \cdot L^*$. This is similar to the Kleene star, but it contains the empty word only if $\varepsilon \in L$. (A small sketch of these operations on finite languages follows below.)
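Here is the promised sketch (ours) of these operations on small finite languages; the Kleene star is necessarily truncated to a length bound, since $L^*$ is infinite for any $L$ containing a non-empty word. The helpers `concat` and `star_up_to` are hypothetical names.

```python
def concat(L1, L2):
    """L1 . L2 = { uv | u in L1, v in L2 }"""
    return {u + v for u in L1 for v in L2}

def star_up_to(L, max_len):
    """All words of L* having length <= max_len."""
    result = {""}            # L^0 = {epsilon}
    frontier = {""}
    while frontier:
        frontier = {w + u for w in frontier for u in L
                    if len(w + u) <= max_len} - result
        result |= frontier
    return result

L1, L2 = {"a", "ab"}, {"b", "bb"}
print(sorted(concat(L1, L2)))         # ['ab', 'abb', 'abbb']
print(sorted(star_up_to({"ab"}, 6)))  # ['', 'ab', 'abab', 'ababab']
```

Note that $|L_1 \cdot L_2| = 3$ here even though $|L_1| \cdot |L_2| = 4$: the words $a \cdot bb$ and $ab \cdot b$ coincide.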

Lemma 2.1 (Distributivity of Concatenation over Union)

For any languages $L_1, L_2, L_3$ over an alphabet $\Sigma$, the following holds: $L_1 \cdot (L_2 \cup L_3) = (L_1 \cdot L_2) \cup (L_1 \cdot L_3)$. This means concatenation distributes over union, similar to multiplication over addition in arithmetic.

Lemma 2.2 (Inclusion for Concatenation and Intersection)

For any languages $L_1, L_2, L_3$ over an alphabet $\Sigma$, the following inclusion holds: $L_1 \cdot (L_2 \cap L_3) \subseteq (L_1 \cdot L_2) \cap (L_1 \cdot L_3)$.

Lemma 2.3 (Strict Inclusion Example)

There exist languages $L_1, L_2, L_3$ for which the inclusion in Lemma 2.2 is proper (i.e., not an equality). For example, take $L_1 = \{\varepsilon, a\}$, $L_2 = \{a\}$, and $L_3 = \{aa\}$: the left-hand side is empty because $L_2 \cap L_3 = \emptyset$, yet $aa \in (L_1 \cdot L_2) \cap (L_1 \cdot L_3)$.

2.2.6 Canonical Order and Homomorphisms

To work with infinite sets of words, it’s often useful to have a standardized, predictable way to list them. This is where the concept of a canonical order comes in.

Definition 2.9 (Canonical Order)

Let $\Sigma = \{s_1, s_2, \dots, s_m\}$ be an alphabet with a fixed ordering on its symbols, $s_1 < s_2 < \dots < s_m$. The canonical order (also called the shortlex or length-lexicographical order; note that it is not the plain dictionary order) on $\Sigma^*$ is defined as follows: for any two words $u, v \in \Sigma^*$, we say $u < v$ if:

  1. $|u| < |v|$ (shorter words come first), OR
  2. $|u| = |v|$ and $u$ comes before $v$ in the standard dictionary order (comparing symbol by symbol from left to right).

For example, for $\Sigma_{\text{bool}}$ with the ordering $0 < 1$, the canonical order begins: $\varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, \dots$

This ordering is crucial for enumeration problems, where we need to list the words of a language in a specific sequence.
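Because the canonical order compares length first, it is easy to realize in Python with a sort key; this sketch (ours) assumes the symbol order matches the character order, as it does for '0' < '1':

```python
def canonical_key(w: str):
    """Sort key for the canonical order: length first, then the
    dictionary order among words of equal length."""
    return (len(w), w)

words = ["10", "1", "", "01", "0", "11", "00"]
print(sorted(words, key=canonical_key))
# ['', '0', '1', '00', '01', '10', '11']
```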

Another powerful tool for relating languages to each other is a homomorphism, which is a function that preserves the structure of word concatenation.

Definition 2.10 (Homomorphism)

A homomorphism is a function $h \colon \Sigma_1^* \to \Sigma_2^*$ between the words over two alphabets $\Sigma_1$ and $\Sigma_2$ that satisfies two properties:

  1. $h(\varepsilon) = \varepsilon$ (it maps the empty word to the empty word).
  2. $h(uv) = h(u)h(v)$ for all words $u, v \in \Sigma_1^*$.

Because of the second property, a homomorphism is completely determined by its action on the individual symbols of the alphabet $\Sigma_1$. If we know $h(a)$ for every $a \in \Sigma_1$, we can find the image of any word by concatenating the images of its symbols. For example, if $w = a_1 a_2 \dots a_n$, then $h(w) = h(a_1) h(a_2) \cdots h(a_n)$.

Homomorphism for Encoding

We can use a homomorphism to encode words from one alphabet into another. Let $\Sigma_1 = \{a, b\}$ and $\Sigma_2 = \Sigma_{\text{bool}}$. Define a homomorphism $h$ by $h(a) = 0$ and $h(b) = 11$. Then the word $aba$ is mapped to: $h(aba) = h(a)h(b)h(a) = 0 \cdot 11 \cdot 0 = 0110$.
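A sketch (ours) of this determination-by-symbols: represent $h$ by a dictionary of symbol images and extend it to whole words by concatenation; `apply_homomorphism` is a hypothetical helper name.

```python
def apply_homomorphism(h: dict, w: str) -> str:
    """h maps each symbol to a word; extend h to whole words by
    concatenating the images symbol by symbol."""
    return "".join(h[a] for a in w)

h = {"a": "0", "b": "11"}
print(apply_homomorphism(h, "aba"))  # 0110
# h(epsilon) = epsilon and h(uv) = h(u)h(v) hold by construction:
assert apply_homomorphism(h, "") == ""
assert apply_homomorphism(h, "ab" + "ba") == \
       apply_homomorphism(h, "ab") + apply_homomorphism(h, "ba")
```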

Homomorphisms are fundamental in formal language theory for simplifying proofs and relating different classes of languages.


2.3 Algorithmic Problems

With our formal vocabulary of words and languages, we can now define what we mean by an “algorithmic problem.” Just as we formalized data, we must formalize the questions we ask about that data to study their solvability and complexity. We’ll focus on several main types.

2.3.1 Decision Problems

A decision problem is a question with a yes/no answer. We can formalize this using languages.

Definition 2.11 (Decision Problem)

A decision problem is a pair $(\Sigma, L)$, where $\Sigma$ is an alphabet and $L \subseteq \Sigma^*$ is a language. The problem is: for any given word $x \in \Sigma^*$, decide whether $x \in L$.

An algorithm solves (or decides) this problem if it halts on every input $x \in \Sigma^*$ and outputs “yes” (e.g., 1) if $x \in L$, and “no” (e.g., 0) if $x \notin L$. A language for which such an algorithm exists is called recursive or decidable.

Primality Test

  • Problem: Given a number, is it prime?
  • Formalization: Let $\Sigma = \Sigma_{\text{bool}}$ and $L_{\text{PRIM}} = \{\text{Bin}(p) \mid p \text{ is prime}\}$. The problem is $(\Sigma_{\text{bool}}, L_{\text{PRIM}})$. An algorithm for this problem would take a binary string $x$ as input and output 1 if $\text{Number}(x)$ is prime, and 0 otherwise (see the sketch below).
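A sketch (ours) of one such decision algorithm, using trial division purely for simplicity; any halting primality test would serve:

```python
def decide_prim(x: str) -> int:
    """Decide membership of the binary word x in L_PRIM:
    output 1 if Number(x) is prime, 0 otherwise."""
    if x == "":               # the empty word is not in L_PRIM
        return 0
    m = int(x, 2)             # Number(x)
    if m < 2:
        return 0
    d = 2
    while d * d <= m:
        if m % d == 0:
            return 0
        d += 1
    return 1

print(decide_prim("101"), decide_prim("1111"))  # 1 (5 is prime), 0 (15 = 3*5)
```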

Hamiltonian Cycle Problem

  • Problem: Does a given graph contain a cycle that visits every vertex exactly once?
  • Formalization: Let $\Sigma = \{0, 1, \#\}$ and $L_{\text{HC}} = \{w \in \Sigma^* \mid w \text{ encodes a graph containing a Hamiltonian cycle}\}$. The problem is $(\Sigma, L_{\text{HC}})$. The input would be an encoded graph, and the output would be 1 or 0.

2.3.2 Optimization Problems

Many real-world problems ask for the “best” solution, not just a yes/no answer.

Definition 2.12 (Optimization Problem)

An optimization problem is a 6-tuple $\mathcal{U} = (\Sigma_I, \Sigma_O, L, \mathcal{M}, \text{cost}, \text{goal})$, where:

  • $\Sigma_I$ is the input alphabet.
  • $\Sigma_O$ is the output alphabet.
  • $L \subseteq \Sigma_I^*$ is the language of valid problem instances (i.e., inputs that have a meaningful interpretation). An $x \in L$ is called a problem instance.
  • $\mathcal{M}$ is a function from $L$ to $\mathcal{P}(\Sigma_O^*)$, where $\mathcal{M}(x) \subseteq \Sigma_O^*$ is the set of feasible solutions for an instance $x$.
  • cost is a function assigning a positive real number $\text{cost}(u, x)$ to every pair of a feasible solution $u \in \mathcal{M}(x)$ and an instance $x \in L$.
  • goal $\in \{\text{minimum}, \text{maximum}\}$ is the optimization objective.

The task is to find a feasible solution with the optimal (minimum or maximum, according to goal) cost. An algorithm solves $\mathcal{U}$ if for every instance $x \in L$ it outputs a solution $y \in \mathcal{M}(x)$ such that $\text{cost}(y, x)$ is optimal over $\mathcal{M}(x)$.

Traveling Salesman Problem (TSP)

  • Input: An edge-weighted complete graph $(K_n, c)$. This would be encoded as a word over $\{0, 1, \#\}$.
  • Feasible Solutions $\mathcal{M}(x)$: The set of all Hamiltonian cycles of $K_n$. Each cycle can be encoded as a word, e.g., as the sequence of its vertex indices.
  • Cost: The sum of the weights of the edges in a cycle.
  • Goal: minimum.

The main difficulty in optimization problems is that the set of feasible solutions is often astronomically large, making it impossible to check them all. For a complete graph with $n$ vertices, there are $(n-1)!/2$ distinct Hamiltonian cycles, a number that grows extremely fast (see the brute-force sketch below).
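To make the blow-up concrete, here is a brute-force sketch (ours, with a made-up cost matrix) that inspects every cycle through a fixed start vertex; already for moderate $n$, walking through all $(n-1)!$ permutations is hopeless:

```python
from itertools import permutations

def tsp_brute_force(cost):
    """cost[i][j] = weight of edge {i, j}; tries all (n-1)! vertex
    orders with vertex 0 fixed (each undirected cycle appears twice)."""
    n = len(cost)
    best_tour, best_cost = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        c = sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if c < best_cost:
            best_tour, best_cost = tour, c
    return best_tour, best_cost

cost = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
print(tsp_brute_force(cost))  # ((0, 1, 3, 2), 23) for this instance
```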

2.3.3 Relation Problems

Beyond decision and optimization problems lies a more general class: relation problems. While a decision problem asks for a yes/no answer (a function from $\Sigma^*$ to $\{0, 1\}$) and an optimization problem seeks a single best solution, a relation problem allows for multiple valid outputs for a single input.

Definition 2.13 (Relation Problem)

A relation problem is defined by a relation $R \subseteq \Sigma_1^* \times \Sigma_2^*$, where $\Sigma_1$ and $\Sigma_2$ are the input and output alphabets. The task is: for a given input $x \in \Sigma_1^*$, find any output $y \in \Sigma_2^*$ such that $(x, y) \in R$.

If for a given $x$ no such $y$ exists, the problem may be undefined or require a specific output indicating this. The key difference from a function is that a relation can associate one input with many possible correct outputs.

Finding a Factor

  • Problem: Given a composite number, find one of its non-trivial factors.
  • Formalization: Let $R_{\text{fac}} \subseteq \Sigma_{\text{bool}}^* \times \Sigma_{\text{bool}}^*$ be the relation where $(x, y) \in R_{\text{fac}}$ if and only if $\text{Number}(x)$ is composite and $\text{Number}(y)$ is a factor of $\text{Number}(x)$ such that $1 < \text{Number}(y) < \text{Number}(x)$.
  • Task: For the input $x = 1111$, representing the number 15, both the output representing 3 ($y = 11$) and the output representing 5 ($y = 101$) are correct. An algorithm only needs to find one of them (see the sketch below).
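A sketch (ours) of one algorithm solving this relation problem; it happens to return the smallest non-trivial factor, but any valid $y$ would do:

```python
def find_factor(x: str):
    """Return Bin(d) for some non-trivial factor d of Number(x),
    or None when Number(x) has no such factor (i.e., is not composite)."""
    m = int(x, 2)
    d = 2
    while d * d <= m:
        if m % d == 0:
            return format(d, "b")  # one valid answer among possibly many
        d += 1
    return None

print(find_factor("1111"))  # '11' (3); '101' (5) would be equally correct
```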

2.3.4 Generation and Enumeration Problems

Some algorithmic tasks don’t require an input at all. Their purpose is to generate a specific output or to list all members of a set.

Definition 2.14 (Generation Problem)

The task of creating a single, specific word $w \in \Sigma^*$. An algorithm solves this by taking no input (or a trivial one like $\varepsilon$) and producing $w$. A program that generates $w$ can be considered an alternative representation of $w$.

Definition 2.15 (Enumeration Problem)

The task of listing the words of a language $L$. An algorithm for this task, given a positive integer $i$, typically outputs the first $i$ words of $L$ according to a defined order (like the canonical order).

A language is recursively enumerable if an algorithm exists that can list all of its words (though the process might never halt if the language is infinite). If an algorithm can also decide membership in the language (a decision problem), the language is called recursive or decidable.

Enumerating Primes

  • Problem: List the first $i$ prime numbers.
  • Formalization: Let $L_{\text{PRIM}} = \{\text{Bin}(p) \mid p \text{ is prime}\}$ be the language of binary representations of prime numbers.
  • Task: An enumeration algorithm, given input $i$, would output the first $i$ words from $L_{\text{PRIM}}$ in canonical order. For $i = 4$, the output would be 10, 11, 101, 111 (representing 2, 3, 5, 7), as in the sketch below.
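A sketch (ours) of such an enumerator. Because $\text{Bin}$ is monotone, the canonical order on $L_{\text{PRIM}}$ coincides with the numeric order of the primes, so scanning $m = 2, 3, 4, \dots$ suffices:

```python
def is_prime(m: int) -> bool:
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def enumerate_prim(i: int):
    """Output the first i words of L_PRIM in canonical order."""
    out, m = [], 2
    while len(out) < i:
        if is_prime(m):
            out.append(format(m, "b"))
        m += 1
    return out

print(enumerate_prim(4))  # ['10', '11', '101', '111']
```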

2.4 Kolmogorov Complexity

Is the string 0101010101010101 complex? What about 1101001011101001? Intuitively, the first is simple because it has a short description: “repeat 01 eight times.” The second seems random, with no obvious pattern. Kolmogorov complexity formalizes this intuition by measuring the intrinsic information content of a word.

Definition 2.16 (Kolmogorov Complexity)

The Kolmogorov complexity of a word $w \in (\Sigma_{\text{bool}})^*$, denoted $K(w)$, is the length of the shortest program (in a fixed, universal programming language) that generates $w$ as its output and then halts.

This definition elegantly circumvents the problem of choosing a specific compression method by allowing any computable compression. A program that generates a string is the ultimate compressed representation of that string. The length of this shortest program is the measure of its information content.

Lemma 2.4 (Upper Bound)

There exists a constant $d$ such that for every word $w \in (\Sigma_{\text{bool}})^*$: $K(w) \leq |w| + d$. This is because we can always write a simple program that just contains the string literally and prints it. The constant $d$ accounts for the fixed length of the print instruction itself and any overhead for encoding the literal string. This shows that the Kolmogorov complexity of a word is never significantly larger than its length.

Lemma 2.5 (Incompressibility)

For every integer $n \geq 1$, there exists at least one word $w$ of length $n$ such that $K(w) \geq n$.

This follows from a simple counting argument: there are $2^n$ words of length $n$, but only $\sum_{i=0}^{n-1} 2^i = 2^n - 1$ programs of length less than $n$, so at least one word of length $n$ has no shorter program. This implies that most long strings are incompressible, which leads to the idea of defining randomness as incompressibility.

Theorem 2.1 (Invariance Theorem)

Let $A$ and $B$ be any two universal programming languages. There exists a constant $c_{A,B}$ (that depends only on $A$ and $B$) such that for all words $w$: $|K_A(w) - K_B(w)| \leq c_{A,B}$. This theorem is crucial. It states that the choice of programming language does not significantly change the Kolmogorov complexity of a string. The complexity is robust and language-independent, up to an additive constant. The constant $c_{A,B}$ represents the length of a compiler or interpreter program that translates between the two languages. This justifies speaking of “the” Kolmogorov complexity without specifying a particular programming language.

The existence of an algorithm to decide a language can lead to strong statements about the complexity of the words in that language.

Theorem 2.2 (Kolmogorov Complexity of Decidable Language Elements)

Let $L \subseteq (\Sigma_{\text{bool}})^*$ be a recursive (decidable) language. Let $z_n$ be the $n$-th word in $L$ according to the canonical order. Then there exists a constant $c$ (depending only on $L$) such that for all $n$: $K(z_n) \leq \lceil \log_2(n+1) \rceil + c$.

A word is considered random if it is incompressible, i.e., if $K(w) \geq |w|$. This provides a powerful, algorithmic definition of randomness, aligning with the intuition that a random object has no shorter description than itself.

Definition 2.17 (Randomness)

A word $w \in (\Sigma_{\text{bool}})^*$ is called random if its Kolmogorov complexity is at least its length, i.e., $K(w) \geq |w|$. A number $n$ is called random if $K(\text{Bin}(n)) \geq \lceil \log_2(n+1) \rceil - 1$.

Application: Lower Bounds for Prime Numbers (Weaker Prime Number Theorem)

Kolmogorov complexity can be used to prove results in number theory. Here, we present a proof idea for a weaker version of the Prime Number Theorem, demonstrating that prime numbers are not “too sparse.”

Lemma 2.6 (Prime Factors of Incompressible Numbers)

Let $n_1 < n_2 < n_3 < \dots$ be a strictly increasing infinite sequence of natural numbers such that $K(n_i) \geq \lceil \log_2 n_i \rceil / 2$ for every $i$. For every $i$, let $p_i$ be the largest prime number that divides $n_i$. Then the set $\{p_i \mid i \in \mathbb{N}\}$ is infinite.

This lemma, combined with careful encoding arguments, can be used to derive lower bounds on the density of prime numbers, showing that they appear frequently enough to support various computational tasks, especially in randomized algorithms.


2.5 Summary and Outlook

This chapter has laid the essential groundwork for our study of theoretical computer science.

  • We defined alphabets, words, and languages as the formal tools to describe data and problems, emphasizing that any computational object can be encoded as a word.
  • We categorized algorithmic tasks into decision problems, optimization problems, relation problems, and generation/enumeration problems, providing a precise framework for discussing what algorithms do.
  • We introduced Kolmogorov complexity as a robust measure of the intrinsic information content of a string, which also provides a formal definition of randomness. This concept is powerful for both theoretical proofs and understanding the limits of compression.

With this formal language in hand, we are now ready to explore the machines that process it. In the next chapter, we will start with the simplest model of computation: the finite automaton, and use the concepts developed here to understand its capabilities and limitations.



2.6 Exercises

Exercise 2.1

How many different words of length $n$ exist over an alphabet $\Sigma$ with $|\Sigma| = m$?

Exercise 2.2

Given the alphabet $\Sigma_{\text{bool}} = \{0, 1\}$. Let $n$ and $i$ be positive integers with $i \leq n$. (a) Determine the number of different words of length $n$ with exactly $i$ occurrences of the symbol 1. (b) Determine the number of different words of length $n$ with at most $i$ occurrences of the symbol 1.

Exercise 2.3

The binary representation $\text{Bin}(m)$ of every positive number $m$ begins with a 1. How long is $\text{Bin}(m)$ for a given number $m$?

Exercise 2.4

Let $x = x_1 x_2 \dots x_n \in (\Sigma_m)^*$ for an integer $m \geq 2$. Consider $x$ as an $m$-adic representation of a number $\text{Number}_m(x)$. How do you compute $\text{Number}_m(x)$ from the symbols $x_1, \dots, x_n$?

Exercise 2.5

The proposed representation of a graph as a word over $\{0, 1, \#\}$ has length $n^2 + n$ for a graph with $n$ vertices (using $n$ rows of $n$ bits and $n$ separators). Think of a shorter unique representation of graphs over the alphabet $\{0, 1, \#\}$.

Exercise 2.6

Design a representation for graphs over the alphabet $\Sigma_{\text{bool}} = \{0, 1\}$.

Exercise 2.7

Prove that for any words $u, v \in \Sigma^*$, $(uv)^R = v^R u^R$.

Exercise 2.8

What is the maximum number of distinct subwords a word of length $n$ can have? List all different subwords of the word abbcbbab.

Exercise 2.9

Let and . Which words are in the language ?

Exercise 2.10

Let be languages over the alphabet . Does hold?

Exercise 2.11

Let and for two alphabets and with . Does hold?

Exercise 2.12

Do there exist languages such that is finite and is infinite?

Exercise 2.13

Prove or disprove: .

Exercise 2.14

Let $h$ be a homomorphism from $\Sigma_1^*$ to $\Sigma_2^*$. Prove by induction that for every word $w = w_1 w_2 \dots w_n \in \Sigma_1^*$, where $w_i \in \Sigma_1$ for $i = 1, \dots, n$, we have $h(w) = h(w_1) h(w_2) \cdots h(w_n)$.

Exercise 2.15

Define a homomorphism $h$ from $\Sigma_1^*$ to $\Sigma_2^*$, for alphabets of your choice, that maps infinitely many different words from $\Sigma_1^*$ to the same word from $\Sigma_2^*$.

Exercise 2.16

Define an injective homomorphism from $\Sigma_{\text{logic}}^*$ to $\Sigma_{\text{bool}}^*$ to create a unique representation of Boolean formulas over $\{0, 1\}$.

Exercise 2.17

Let $\Sigma_1$ and $\Sigma_2$ be two alphabets. Let $h$ be a homomorphism from $\Sigma_1^*$ to $\Sigma_2^*$. For every language $L \subseteq \Sigma_1^*$, we define $h(L) = \{h(w) \mid w \in L\}$. Let $L_1, L_2 \subseteq \Sigma_1^*$. Prove or disprove the following statement: $h(L_1 \cdot L_2) = h(L_1) \cdot h(L_2)$.

Exercise 2.18

For the Traveling Salesman Problem (TSP) on a complete graph with $n$ vertices, how many feasible solutions (Hamiltonian cycles) are there?

Exercise 2.19

The minimum vertex cover problem (MIN-VC) is a minimization problem where one seeks a vertex cover of minimal cardinality for a given (undirected) graph $G$. (A vertex cover of $G$ is a set $U$ of vertices such that every edge of $G$ has at least one endpoint in $U$.) (a) Determine the set of all vertex covers of the graph in Figure 2.5. (b) Give a formal description of MIN-VC. Specify the representation of inputs and outputs over the alphabet $\{0, 1, \#\}$.

Exercise 2.24

Argue why $K((01)^n)$ is approximately $\log_2 n$, up to an additive constant $c$. What does this tell you about the “randomness” of the string $(01)^n$?

Exercise 2.27

Prove that for any integers $n \geq i \geq 1$, there are at least $2^n - 2^{n-i} + 1$ distinct words $w$ of length $n$ such that $K(w) \geq n - i$.

Exercise 2.28

Prove that there are infinitely many natural numbers $n$ such that $K(n) \geq \lceil \log_2 n \rceil / 2$.


Key Takeaways

  • Formal Language Theory is Foundational: Alphabets, words, and languages provide the precise mathematical framework for describing all computational objects and problems.
  • Encoding is Universal: Any object relevant to computation can be uniquely encoded as a word over a finite alphabet, enabling universal processing by algorithms.
  • Problem Types: Algorithmic tasks are formally categorized into decision, optimization, relation, generation, and enumeration problems, each with specific objectives.
  • Kolmogorov Complexity: This concept offers a robust, machine-independent measure of a word’s intrinsic information content, defining randomness as incompressibility. It’s a powerful tool for theoretical proofs.
  • Limits of Computation: Even at this foundational level, we begin to see hints of the inherent limits of computation, both in terms of what can be compressed and what can be efficiently solved.