We continue our exploration of finite automata, focusing on the crucial topic of proving non-existence: showing that no finite automaton can recognize a given language. We’ve established that finite automata can recognize a certain class of languages, the regular languages. But what are their limits?

We have three primary techniques to prove that a language is not regular. All three are indirect proofs by contradiction. They all start with the same premise: “Assume the language is regular.” This assumption implies the existence of a finite automaton, and it’s the finiteness of this automaton that we will exploit to derive a contradiction.

Recap: Lemma 3.3

Our first technique, which we covered in detail, was Lemma 3.3. Let’s quickly recap its core idea.

Lemma 3.3: If a language L is regular, then there exists a DFA A for it. If two different words, x and y, drive this DFA to the same state q, then for any suffix z, the words xz and yz must also end in the same state. Consequently, either both xz and yz are in L, or neither is.

We used this by finding a set of words larger than the number of states, guaranteeing a collision (two words landing in the same state), and then choosing a “killer suffix” that made one word valid and the other invalid, leading to a contradiction.
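
To make this recap concrete, here is a minimal Python sketch of the pigeonhole step. The DFA encoding, the helper names run and find_collision, and the toy automaton are illustrative choices, not part of the lecture material.

```python
def run(delta, start, word):
    """Return the state a DFA (given by its transition dict) reaches
    after reading `word` from state `start`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state

def find_collision(delta, start, words):
    """Pigeonhole step of Lemma 3.3: among the given words, return two
    different ones that drive the DFA into the same state (guaranteed
    to exist as soon as there are more words than states)."""
    seen = {}
    for w in words:
        q = run(delta, start, w)
        if q in seen and seen[q] != w:
            return seen[q], w
        seen[q] = w
    return None

# Toy 3-state DFA over {0, 1}: it counts the number of 0's modulo 3.
delta = {(q, '0'): (q + 1) % 3 for q in range(3)}
delta.update({(q, '1'): q for q in range(3)})

# Four words, three states: two of them must collide.
print(find_collision(delta, 0, ['0', '00', '000', '0000']))  # ('0', '0000')
```

If the colliding words are 0^i and 0^j with i ≠ j, and the DFA were claimed to recognize {0^n 1^n | n ∈ ℕ}, the “killer suffix” 1^i would force the same verdict on 0^i 1^i (in the language) and 0^j 1^i (not in the language): the contradiction.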

The Pumping Lemma for Regular Languages

Today, we introduce the second, and perhaps most famous, technique: the Pumping Lemma. The underlying principle is the same, exploiting the finite memory of the automaton, but the approach is slightly different. Instead of looking at two different words, we look at one single, very long word.

Lemma 3.4 (Pumping Lemma for Regular Languages): Let L be a regular language. Then there exists a constant n₀ ∈ ℕ (the “pumping length”) such that every word w with |w| ≥ n₀ can be decomposed into three parts, w = xyz, satisfying three conditions:

  1. The prefix and the pumpable part are short: |xy| ≤ n₀.
  2. The pumpable part is not empty: |y| ≥ 1 (that is, y ≠ ε).
  3. The word can be “pumped”: For all k ≥ 0, the words x y^k z are either all in L or all outside of L. Formally: {x y^k z | k ≥ 0} ⊆ L or {x y^k z | k ≥ 0} ∩ L = ∅.

The Proof Idea

The proof is beautifully intuitive.

  1. Assume L is regular. This means there exists a DFA M = (Q, Σ, δ, q₀, F) with a finite number of states. Let’s set the pumping length to be the number of states in this automaton, n₀ = |Q|.
  2. Take a long word. Choose any word w with length |w| ≥ n₀.
  3. The Pigeonhole Principle again. As the automaton processes the first n₀ symbols of w, it passes through n₀ + 1 configurations (the start configuration plus one for each symbol read). Since there are only n₀ states, it must visit at least one state twice within this prefix.
  4. Finding the loop. This repeated state creates a loop in the computation path.
  5. Decomposition. We can now decompose the word into w = xyz:
    • x is the prefix that takes the automaton from the start state to the first time it enters the repeated state.
    • y is the substring that takes the automaton around the loop and back to the repeated state.
    • z is the rest of the word.
  6. Checking the conditions:
    • (i) |xy| ≤ n₀: The loop must occur within the first n₀ symbols, so this holds.
    • (ii) |y| ≥ 1: A loop requires at least one transition, so at least one symbol is read along it.
    • (iii) Pumping: We can traverse the loop zero times (skipping it), once (the original word), twice, or any number of times. Since the rest of the computation (on z) starts from the same state in every case, the final outcome (accept or reject) must be the same for all pumped versions of the word (a small code sketch of this construction follows the list).
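
The construction in this proof can be carried out mechanically. The following Python sketch (the function names pump_decomposition and same_fate are assumptions for illustration) locates the first repeated state within the prefix and returns the decomposition, then checks that all pumped variants end in the same state:

```python
def pump_decomposition(delta, start, word):
    """Find w = x + y + z as in the proof: the loop y is the part of the
    word read between the first and second visit to the first repeated
    state.  Succeeds whenever |word| >= number of states of the DFA."""
    visited = {start: 0}      # state -> position at which it was first seen
    state = start
    for pos, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in visited:  # pigeonhole: a state repeats within |Q| steps
            i, j = visited[state], pos
            return word[:i], word[i:j], word[j:]
        visited[state] = pos
    raise ValueError("word is shorter than the number of states")

def same_fate(delta, start, accepting, x, y, z, ks=range(5)):
    """Condition (iii): every pumped word x y^k z ends in the same state,
    so all of them are accepted or all of them are rejected."""
    def final(word):
        state = start
        for symbol in word:
            state = delta[(state, symbol)]
        return state
    return len({final(x + y * k + z) in accepting for k in ks}) == 1
```

Running pump_decomposition on any DFA and any sufficiently long input word produces a split for which same_fate returns True; this is exactly what conditions (i)–(iii) assert.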

Applying the Pumping Lemma: The Game

Using the Pumping Lemma to prove a language is not regular is like playing a game against an adversary.

  1. The Adversary’s Move: The Pumping Lemma states that if L is regular, there exists a pumping length n₀. To prove it’s not regular, we must show that for any possible n₀ the adversary gives us, we can win.
  2. Our Move: The lemma makes a claim about all words w with |w| ≥ n₀. We only need to find one clever counterexample word w (that depends on n₀) to break the lemma.
  3. The Adversary’s Move: The lemma says there exists a decomposition w = xyz. We must show that for all possible decompositions that satisfy conditions (i) and (ii), we can find a contradiction.
  4. Our Move: We show that for any such decomposition, condition (iii) fails. We find a pumping factor k (often k = 0 or k = 2) such that x y^k z has a different membership status in L than the original word w.

Example 1: L = {0^n 1^n | n ∈ ℕ} is not regular

Proof by Contradiction:

  1. Assume L is regular. By the Pumping Lemma, there exists a pumping length n₀.
  2. We choose a word. We need a word w ∈ L with |w| ≥ n₀. A strategic choice is w = 0^{n₀} 1^{n₀}.
  3. The adversary gives a decomposition. The lemma guarantees that w can be split into w = xyz such that:
    • (i) |xy| ≤ n₀
    • (ii) |y| ≥ 1
  4. We find the contradiction. From condition (i), we know that the substring xy must lie within the initial block of 0’s. This means x and y consist only of 0’s. From condition (ii), we know y contains at least one 0. So, y = 0^j for some j with 1 ≤ j ≤ n₀.
  5. Now we pump. Let’s choose k = 2. The new word is x y² z = 0^{n₀+j} 1^{n₀}. Since j ≥ 1, the number of 0’s is not equal to the number of 1’s. Therefore, x y² z ∉ L.
  6. This is a contradiction. The original word w was in L, but the pumped word is not. This violates condition (iii) of the Pumping Lemma. (A brute-force check of this argument is sketched below.)
  7. Conclusion: Our initial assumption was wrong. L = {0^n 1^n | n ∈ ℕ} is not regular.
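
The quantifier structure of this proof (“for every admissible decomposition there is a breaking k”) can also be checked by brute force for a concrete pumping length. The sketch below (the helper names in_L and refute_all_decompositions are made up for this illustration) enumerates every admissible split of w = 0^{n₀} 1^{n₀} and confirms that pumping with k = 2 always leaves L:

```python
def in_L(w):
    """Membership test for L = { 0^n 1^n : n in N }."""
    n = len(w) // 2
    return w == '0' * n + '1' * n

def refute_all_decompositions(n0):
    """For w = 0^n0 1^n0, check that every split w = xyz with
    |xy| <= n0 and |y| >= 1 is pumped out of L by k = 2."""
    w = '0' * n0 + '1' * n0
    for xy_len in range(1, n0 + 1):          # |xy| <= n0
        for y_len in range(1, xy_len + 1):   # |y| >= 1
            x, y, z = w[:xy_len - y_len], w[xy_len - y_len:xy_len], w[xy_len:]
            assert in_L(x + y + z) and not in_L(x + y * 2 + z)
    return True

print(refute_all_decompositions(7))   # True: no decomposition survives pumping
```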

Example 2: L = {0^{n²} | n ∈ ℕ} is not regular

Proof by Contradiction:

  1. Assume L is regular. There exists a pumping length n₀.
  2. We choose a word. Let’s pick w = 0^{n₀²}. This word is in L and |w| = n₀² ≥ n₀.
  3. The adversary gives a decomposition w = xyz where |xy| ≤ n₀ and |y| ≥ 1.
  4. We find the contradiction. The substring y must be a block of zeros, so y = 0^j. From the conditions, we know 1 ≤ j ≤ n₀.
  5. Let’s pump with k = 2. The new word is x y² z = 0^{n₀² + j}.
  6. Is the length of this new word a perfect square? (The inequality below is also checked numerically after this proof.)
    • The current length is n₀² + j.
    • The next perfect square after n₀² is (n₀ + 1)² = n₀² + 2n₀ + 1.
    • We know 1 ≤ j ≤ n₀ < 2n₀ + 1. Therefore, n₀² < n₀² + j < (n₀ + 1)².
    • The length lies strictly between two consecutive perfect squares, so it cannot be a perfect square itself. Thus, 0^{n₀² + j} ∉ L.
  7. This is a contradiction. w ∈ L but x y² z ∉ L.
  8. Conclusion: L = {0^{n²} | n ∈ ℕ} is not regular.
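
The key inequality of step 6 can be sanity-checked numerically; this throwaway script is not part of the proof, just a quick confirmation of the arithmetic.

```python
from math import isqrt

def is_square(m):
    return isqrt(m) ** 2 == m

# For every n0 >= 1 and every 1 <= j <= n0, the length n0^2 + j lies
# strictly between n0^2 and (n0 + 1)^2, so it is never a perfect square.
assert all(not is_square(n0 * n0 + j)
           for n0 in range(1, 200)
           for j in range(1, n0 + 1))
print("no pumped length is a perfect square (checked up to n0 = 199)")
```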

A Third Technique: Kolmogorov Complexity

Our third method for proving non-regularity is perhaps the most elegant. It connects the finite memory of automata directly to the information content of strings.

Theorem 3.1: Let L ⊆ {0, 1}* be a regular language. For any prefix x ∈ {0, 1}*, consider the “suffix language” L_x = {y ∈ {0, 1}* | xy ∈ L}. Then there exists a constant c (depending only on L, not on x) such that for all n ≥ 1: K(z_n) ≤ ⌈log₂(n + 1)⌉ + c, where z_n denotes the n-th word of L_x in canonical order.

In simpler terms: for any regular language, the words that can follow a given prefix to form a valid string cannot be too complex. Their Kolmogorov complexity is bounded by the logarithm of their rank, i.e., of their position in the canonical order of the suffix language.

Proof Idea

If L is regular, there is a DFA M for it. For any prefix x, the automaton ends up in some state q after reading x. The language L_x is then the set of all words that drive the automaton from state q to an accepting state. This means L_x is itself a regular language, accepted by an automaton M_q that is identical to M but with q as its start state.

We can write a program that, given n, finds the n-th word z_n in L_x. This program needs:

  1. The description of the automaton M_q (constant size).
  2. The description of the start state q (constant size).
  3. The integer n (size ⌈log₂(n + 1)⌉ bits).

The program simulates M_q, starting from state q, on all possible strings in canonical order, counting until it finds the n-th one that is accepted. The length of this program gives us the upper bound on K(z_n).
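
A possible concrete form of this program, in Python (the function name nth_word, the dictionary encoding of the DFA, and the binary alphabet are assumptions for illustration; only the argument n varies, everything else is a fixed part of the program text):

```python
from itertools import count, product

def nth_word(delta, q, accepting, n, alphabet=('0', '1')):
    """Return the n-th word (n >= 1) of L_x in canonical order, i.e. the
    n-th string, ordered by length and then lexicographically, that the
    automaton accepts when started in state q.  Assumes L_x has at least
    n words; otherwise the enumeration runs forever."""
    found = 0
    for length in count(0):                        # lengths 0, 1, 2, ...
        for letters in product(alphabet, repeat=length):
            word = ''.join(letters)
            state = q
            for symbol in word:                    # simulate M_q on the candidate
                state = delta[(state, symbol)]
            if state in accepting:
                found += 1
                if found == n:
                    return word
```

Everything except n has size independent of n, so describing z_n by this program plus the binary encoding of n yields exactly the bound K(z_n) ≤ ⌈log₂(n + 1)⌉ + c.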

Application: L = {0^n 1^n | n ∈ ℕ} is not regular (again)

Proof by Contradiction:

  1. Assume L = {0^n 1^n | n ∈ ℕ} is regular. Then Theorem 3.1 holds for L.
  2. Consider a class of prefix languages. For every m ≥ 1, let’s look at the prefix 0^m 1. The corresponding suffix language is L_{0^m 1} = {y | 0^m 1 y ∈ L}. By the definition of L, the only word that can follow 0^m 1 to form a valid string is 1^{m-1}. So, L_{0^m 1} = {1^{m-1}}.
  3. Find the first word. For each of these languages, the first word in canonical order is z_1 = 1^{m-1} (it is the only word). So, for n = 1, we can apply the theorem.
  4. Apply the theorem. According to Theorem 3.1, there is a constant c such that for all m ≥ 1: K(1^{m-1}) ≤ ⌈log₂(1 + 1)⌉ + c = 1 + c.
  5. The Contradiction. This implies that all strings of the form 1^{m-1} have a constant, bounded Kolmogorov complexity. But this is absurd. The set {1^{m-1} | m ≥ 1} is an infinite set of distinct strings, and there can only be a finite number of programs of length at most 1 + c, each of which produces at most one string. Therefore, there must be strings 1^{m-1} whose complexity is much larger than 1 + c.
  6. Conclusion: Our assumption that L is regular leads to a contradiction. L = {0^n 1^n | n ∈ ℕ} is not regular.