We continue our exploration of finite automata, focusing on the crucial topic of proving non-existence: showing that no finite automaton can recognize a given language. We’ve established that finite automata can recognize a certain class of languages, the regular languages. But what are their limits?

We have three primary techniques to prove that a language is not regular. All three are indirect proofs by contradiction. They all start with the same premise: “Assume the language is regular.” This assumption implies the existence of a finite automaton, and it’s the finiteness of this automaton that we will exploit to derive a contradiction.

Recap: Lemma 3.3

Our first technique, which we covered in detail, was Lemma 3.3. Let’s quickly recap its core idea.

Lemma 3.3: If a language L is regular, then there exists a DFA A for it. If two different words, x and y, drive this DFA to the same state q, then for any suffix z, the words xz and yz must also end in the same state. Consequently, either both xz and yz are in L, or neither is.

We used this by finding a set of words larger than the number of states, guaranteeing a collision (two words landing in the same state), and then choosing a “killer suffix” that made one word valid and the other invalid, leading to a contradiction.
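
To make this recap concrete, here is a minimal Python sketch of the pigeonhole step. The DFA encoding, the helper names run and find_collision, and the toy automaton are illustrative choices, not part of the lecture material.

```python
def run(delta, start, word):
    """Return the state a DFA (given by its transition dict) reaches
    after reading `word` from state `start`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state

def find_collision(delta, start, words):
    """Pigeonhole step of Lemma 3.3: among the given words, return two
    different ones that drive the DFA into the same state (guaranteed
    to exist as soon as there are more words than states)."""
    seen = {}
    for w in words:
        q = run(delta, start, w)
        if q in seen and seen[q] != w:
            return seen[q], w
        seen[q] = w
    return None

# Toy 3-state DFA over {0, 1}: it counts the number of 0's modulo 3.
delta = {(q, '0'): (q + 1) % 3 for q in range(3)}
delta.update({(q, '1'): q for q in range(3)})

# Four words, three states: two of them must collide.
print(find_collision(delta, 0, ['0', '00', '000', '0000']))  # ('0', '0000')
```

If the colliding words are 0^i and 0^j with i ≠ j, and the DFA were claimed to recognize {0^n 1^n | n ∈ ℕ}, the “killer suffix” 1^i would force the same verdict on 0^i 1^i (in the language) and 0^j 1^i (not in the language): the contradiction.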

The Pumping Lemma for Regular Languages

Today, we introduce the second, and perhaps most famous, technique: the Pumping Lemma. The underlying principle is the same, exploiting the finite memory of the automaton, but the approach is slightly different. Instead of looking at two different words, we look at one single, very long word.

Lemma 3.4 (Pumping Lemma for Regular Languages): Let L be a regular language. Then there exists a constant n₀ ∈ ℕ (the “pumping length”) such that every word w with |w| ≥ n₀ can be decomposed into three parts, w = xyz, satisfying three conditions:

  1. The prefix and the pumpable part are short: |xy| ≤ n₀.
  2. The pumpable part is not empty: |y| ≥ 1 (that is, y ≠ ε).
  3. The word can be “pumped”: For all k ≥ 0, the words x y^k z are either all in L or all outside of L. Formally: {x y^k z | k ≥ 0} ⊆ L or {x y^k z | k ≥ 0} ∩ L = ∅.

The Proof Idea

The proof is beautifully intuitive.

  1. Assume L is regular. This means there exists a DFA M = (Q, Σ, δ, q₀, F) with a finite number of states. Let’s set the pumping length to be the number of states in this automaton, n₀ = |Q|.
  2. Take a long word. Choose any word w with length |w| ≥ n₀.
  3. The Pigeonhole Principle again. As the automaton processes the first n₀ symbols of w, it passes through n₀ + 1 configurations (the start configuration plus one for each symbol read). Since there are only n₀ states, it must visit at least one state twice within this prefix.
  4. Finding the loop. This repeated state creates a loop in the computation path.
  5. Decomposition. We can now decompose the word into w = xyz:
    • x is the prefix that takes the automaton from the start state to the first time it enters the repeated state.
    • y is the substring that takes the automaton around the loop and back to the repeated state.
    • z is the rest of the word.
  6. Checking the conditions:
    • (i) |xy| ≤ n₀: The loop must occur within the first n₀ symbols, so this holds.
    • (ii) |y| ≥ 1: A loop requires at least one transition, so at least one symbol is read along it.
    • (iii) Pumping: We can traverse the loop zero times (skipping it), once (the original word), twice, or any number of times. Since the rest of the computation (on z) starts from the same state in every case, the final outcome (accept or reject) must be the same for all pumped versions of the word (a small code sketch of this construction follows the list).
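
The construction in this proof can be carried out mechanically. The following Python sketch (the function names pump_decomposition and same_fate are assumptions for illustration) locates the first repeated state within the prefix and returns the decomposition, then checks that all pumped variants end in the same state:

```python
def pump_decomposition(delta, start, word):
    """Find w = x + y + z as in the proof: the loop y is the part of the
    word read between the first and second visit to the first repeated
    state.  Succeeds whenever |word| >= number of states of the DFA."""
    visited = {start: 0}      # state -> position at which it was first seen
    state = start
    for pos, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in visited:  # pigeonhole: a state repeats within |Q| steps
            i, j = visited[state], pos
            return word[:i], word[i:j], word[j:]
        visited[state] = pos
    raise ValueError("word is shorter than the number of states")

def same_fate(delta, start, accepting, x, y, z, ks=range(5)):
    """Condition (iii): every pumped word x y^k z ends in the same state,
    so all of them are accepted or all of them are rejected."""
    def final(word):
        state = start
        for symbol in word:
            state = delta[(state, symbol)]
        return state
    return len({final(x + y * k + z) in accepting for k in ks}) == 1
```

Running pump_decomposition on any DFA and any sufficiently long input word produces a split for which same_fate returns True; this is exactly what conditions (i)–(iii) assert.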

Applying the Pumping Lemma: The Game

Using the Pumping Lemma to prove a language is not regular is like playing a game against an adversary.

  1. The Adversary’s Move: The Pumping Lemma states that if L is regular, there exists a pumping length n₀. To prove it’s not regular, we must show that for any possible n₀ the adversary gives us, we can win.
  2. Our Move: The lemma makes a claim about all words w with |w| ≥ n₀. We only need to find one clever counterexample word w (that depends on n₀) to break the lemma.
  3. The Adversary’s Move: The lemma says there exists a decomposition w = xyz. We must show that for all possible decompositions that satisfy conditions (i) and (ii), we can find a contradiction.
  4. Our Move: We show that for any such decomposition, condition (iii) fails. We find a pumping factor k (often k = 0 or k = 2) such that x y^k z has a different membership status in L than the original word w.

Example 1: L = {0^n 1^n | n ∈ ℕ} is not regular

Proof by Contradiction:

  1. Assume L is regular. By the Pumping Lemma, there exists a pumping length n₀.
  2. We choose a word. We need a word w ∈ L with |w| ≥ n₀. A strategic choice is w = 0^{n₀} 1^{n₀}.
  3. The adversary gives a decomposition. The lemma guarantees that w can be split into w = xyz such that:
    • (i) |xy| ≤ n₀
    • (ii) |y| ≥ 1
  4. We find the contradiction. From condition (i), we know that the substring xy must lie within the initial block of 0’s. This means x and y consist only of 0’s. From condition (ii), we know y contains at least one 0. So, y = 0^j for some j with 1 ≤ j ≤ n₀.
  5. Now we pump. Let’s choose k = 2. The new word is x y² z = 0^{n₀+j} 1^{n₀}. Since j ≥ 1, the number of 0’s is not equal to the number of 1’s. Therefore, x y² z ∉ L.
  6. This is a contradiction. The original word w was in L, but the pumped word is not. This violates condition (iii) of the Pumping Lemma. (A brute-force check of this argument is sketched below.)
  7. Conclusion: Our initial assumption was wrong. L = {0^n 1^n | n ∈ ℕ} is not regular.
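
The quantifier structure of this proof (“for every admissible decomposition there is a breaking k”) can also be checked by brute force for a concrete pumping length. The sketch below (the helper names in_L and refute_all_decompositions are made up for this illustration) enumerates every admissible split of w = 0^{n₀} 1^{n₀} and confirms that pumping with k = 2 always leaves L:

```python
def in_L(w):
    """Membership test for L = { 0^n 1^n : n in N }."""
    n = len(w) // 2
    return w == '0' * n + '1' * n

def refute_all_decompositions(n0):
    """For w = 0^n0 1^n0, check that every split w = xyz with
    |xy| <= n0 and |y| >= 1 is pumped out of L by k = 2."""
    w = '0' * n0 + '1' * n0
    for xy_len in range(1, n0 + 1):          # |xy| <= n0
        for y_len in range(1, xy_len + 1):   # |y| >= 1
            x, y, z = w[:xy_len - y_len], w[xy_len - y_len:xy_len], w[xy_len:]
            assert in_L(x + y + z) and not in_L(x + y * 2 + z)
    return True

print(refute_all_decompositions(7))   # True: no decomposition survives pumping
```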

Example 2: L = {0^{n²} | n ∈ ℕ} is not regular

Proof by Contradiction:

  1. Assume L is regular. There exists a pumping length n₀.
  2. We choose a word. Let’s pick w = 0^{n₀²}. This word is in L and |w| = n₀² ≥ n₀.
  3. The adversary gives a decomposition w = xyz where |xy| ≤ n₀ and |y| ≥ 1.
  4. We find the contradiction. The substring y must be a block of zeros, so y = 0^j. From the conditions, we know 1 ≤ j ≤ n₀.
  5. Let’s pump with k = 2. The new word is x y² z = 0^{n₀² + j}.
  6. Is the length of this new word a perfect square? (The inequality below is also checked numerically after this proof.)
    • The current length is n₀² + j.
    • The next perfect square after n₀² is (n₀ + 1)² = n₀² + 2n₀ + 1.
    • We know 1 ≤ j ≤ n₀ < 2n₀ + 1. Therefore, n₀² < n₀² + j < (n₀ + 1)².
    • The length lies strictly between two consecutive perfect squares, so it cannot be a perfect square itself. Thus, 0^{n₀² + j} ∉ L.
  7. This is a contradiction. w ∈ L but x y² z ∉ L.
  8. Conclusion: L = {0^{n²} | n ∈ ℕ} is not regular.
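
The key inequality of step 6 can be sanity-checked numerically; this throwaway script is not part of the proof, just a quick confirmation of the arithmetic.

```python
from math import isqrt

def is_square(m):
    return isqrt(m) ** 2 == m

# For every n0 >= 1 and every 1 <= j <= n0, the length n0^2 + j lies
# strictly between n0^2 and (n0 + 1)^2, so it is never a perfect square.
assert all(not is_square(n0 * n0 + j)
           for n0 in range(1, 200)
           for j in range(1, n0 + 1))
print("no pumped length is a perfect square (checked up to n0 = 199)")
```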

A Third Technique: Kolmogorov Complexity

Our third method for proving non-regularity is perhaps the most elegant. It connects the finite memory of automata directly to the information content of strings.

Theorem 3.1: Let L ⊆ {0, 1}* be a regular language. For any prefix x ∈ {0, 1}*, consider the “suffix language” L_x = {y ∈ {0, 1}* | xy ∈ L}. Then there exists a constant c (depending only on L, not on x) such that for all n ≥ 1: K(z_n) ≤ ⌈log₂(n + 1)⌉ + c, where z_n denotes the n-th word of L_x in canonical order.

In simpler terms: for any regular language, the words that can follow a given prefix to form a valid string cannot be too complex. Their Kolmogorov complexity is bounded by the logarithm of their rank, i.e., of their position in the canonical order of the suffix language.

Proof Idea

If L is regular, there is a DFA M for it. For any prefix x, the automaton ends up in some state q after reading x. The language L_x is then the set of all words that drive the automaton from state q to an accepting state. This means L_x is itself a regular language, accepted by an automaton M_q that is identical to M but with q as its start state.

We can write a program that, given n, finds the n-th word z_n in L_x. This program needs:

  1. The description of the automaton M_q (constant size).
  2. The description of the start state q (constant size).
  3. The integer n (size ⌈log₂(n + 1)⌉ bits).

The program simulates M_q, starting from state q, on all possible strings in canonical order, counting until it finds the n-th one that is accepted. The length of this program gives us the upper bound on K(z_n).
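
A possible concrete form of this program, in Python (the function name nth_word, the dictionary encoding of the DFA, and the binary alphabet are assumptions for illustration; only the argument n varies, everything else is a fixed part of the program text):

```python
from itertools import count, product

def nth_word(delta, q, accepting, n, alphabet=('0', '1')):
    """Return the n-th word (n >= 1) of L_x in canonical order, i.e. the
    n-th string, ordered by length and then lexicographically, that the
    automaton accepts when started in state q.  Assumes L_x has at least
    n words; otherwise the enumeration runs forever."""
    found = 0
    for length in count(0):                        # lengths 0, 1, 2, ...
        for letters in product(alphabet, repeat=length):
            word = ''.join(letters)
            state = q
            for symbol in word:                    # simulate M_q on the candidate
                state = delta[(state, symbol)]
            if state in accepting:
                found += 1
                if found == n:
                    return word
```

Everything except n has size independent of n, so describing z_n by this program plus the binary encoding of n yields exactly the bound K(z_n) ≤ ⌈log₂(n + 1)⌉ + c.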

Application: L = {0^n 1^n | n ∈ ℕ} is not regular (again)

Proof by Contradiction:

  1. Assume L = {0^n 1^n | n ∈ ℕ} is regular. Then Theorem 3.1 holds for L.
  2. Consider a class of prefix languages. For every m ≥ 1, let’s look at the prefix 0^m 1. The corresponding suffix language is L_{0^m 1} = {y | 0^m 1 y ∈ L}. By the definition of L, the only word that can follow 0^m 1 to form a valid string is 1^{m-1}. So, L_{0^m 1} = {1^{m-1}}.
  3. Find the first word. For each of these languages, the first word in canonical order is z_1 = 1^{m-1} (it is the only word). So, for n = 1, we can apply the theorem.
  4. Apply the theorem. According to Theorem 3.1, there is a constant c such that for all m ≥ 1: K(1^{m-1}) ≤ ⌈log₂(1 + 1)⌉ + c = 1 + c.
  5. The Contradiction. This implies that all strings of the form 1^{m-1} have a constant, bounded Kolmogorov complexity. But this is absurd. The set {1^{m-1} | m ≥ 1} is an infinite set of distinct strings, and there can only be a finite number of programs of length at most 1 + c, each of which produces at most one string. Therefore, there must be strings 1^{m-1} whose complexity is much larger than 1 + c.
  6. Conclusion: Our assumption that L is regular leads to a contradiction. L = {0^n 1^n | n ∈ ℕ} is not regular.