[next] [prev] [prev-tail] [tail] [up]

2.4 Limitations of Finite-Memory Programs

      A Pumping Lemma for Regular Languages
      Applications of the Pumping Lemma
      A Generalization to the Pumping Lemma

It can be intuitively argued that there are computations that finite-memory programs cannot carry out, because of the limitations imposed on the amount of memory the programs can use. For instance, it can be argued that { aⁿbⁿ | n 0 } is not recognizable by any finite-memory program. The reasoning here is that upon reaching the first b in a given input, the program must remember how many a's it read. Moreover, the argument continues that each finite-memory program has an upper bound on the number of values that it can record, whereas no such bound exists on the number of a's that the inputs can contain. As a result, one can conclude that each finite-memory program can recognize only a finite number of strings in the set { aⁿbⁿ | n 0 }.

The purposes of this section are to show that there are computations that cannot be carried out by finite-memory programs, and to provide formal tools for identifying such computations. The proofs rely on abstractions of the intuitive argument above. However, it should be mentioned that the problem of determining for any given language, whether the language is recognizable by a finite-memory program, can be shown to be undecidable (see Theorem 4.5.6). Therefore, no tool can be expected to provide an algorithm that decides the problem in its general form.

A Pumping Lemma for Regular Languages

The following theorem provides necessary conditions for a language to be decidable by a finite-memory program. The proof of the theorem relies on the observations that the finite-memory programs must repeat a state on long inputs, and that the subcomputations between the repetitions of the states can be pumped.

Theorem 2.4.1 (Pumping lemma for regular languages) Every regular language L has a number m for which the following conditions hold. If w is in L and |w| m, then w can be written as xyz, where xy^kz is in L for each k 0. Moreover, |xy| m, and |y| > 0.

Proof Consider any regular language L. Let M be a finite-state automaton that recognizes L. By Theorem 2.3.1 it can be assumed that M has no transition rules. Denote by m the number of states of M.

On input w = a₁ a_n from L the finite-state automaton M has a computation of the form

The computation goes through some sequence p₀, p₁, . . . , p_n of n + 1 states, where p₀ is the initial state of M and p_n is an accepting state of M. In each move of the computation exactly one input symbol is being read.

If the length n of the input is equal at least to the number m of states of M, then the computation consists of m or more moves and some state q must be repeated within the first m moves. That is, if n m then p_i = p_j for some i and j such that 0 i < j m. In such a case, take x = a₁ a_i, y = a_i+1 a_j, and z = a_j+1 a_n.

With such a decomposition xyz of w the above computation of M takes the form

During the computation the state q = p_i = p_j of M is repeated. The string x is consumed before reaching the state q that is repeated. The string y is consumed between the repetition of the state q. The string z is consumed after the repetition of state q.

Consequently, M also has an accepting computation of the form

for each k 0. That is, M has an accepting computation on xy^kz for each k 0, where M starts and ends consuming each y in state q.

The substring y that is consumed between the repetition of state q is not empty, because by assumption M has no transition rules.

Example 2.4.1 Let L be the regular language accepted by the finite-state automaton of Figure 2.4.1.

Figure 2.4.1

A finite-state automaton.

Using the terminology in the proof of the pumping lemma (Theorem 2.4.1), L has the constant m = 3.

On input w = ababaa, the finite-state automaton goes through the sequence q₀, q₁, q₀, q₁, q₀, q₁, q₂ of states. For such an input the pumping lemma provides the decomposition x = , y = ab, z = abaa; and the decomposition x = a, y = ba, z = baa. The first decomposition is due to the first repetition of state q₀; the second is a result of to the first repetition of state q₁.

For each string w of a minimum length 3, the pumping lemma implies a decomposition xyz in which the string y must be either ab or ba or ac. If y = ab, then x = and the repetition of q₀ is assumed. If y = ba, then x = a and the repetition of q₁ is assumed. If y = ac, then x = a and the repetition of q₁ is assumed.

Applications of the Pumping Lemma

For proving that a given language L is not regular, the pumping lemma implies the following schema of reduction to contradiction.

For the purpose of the proof assume that L is a regular language.
Let m denote the constant implied by the pumping lemma for L, under the assumption in (a) that L is regular.
Find a string w in L, whose length is at least m. Require that w implies a k, for each decomposition xyz of w, such that xy^kz is not in L. That is, find a w that implies, by using the pumping lemma, that a string not in L must, in fact, be there.
Use the contradiction in (c) to conclude that the pumping lemma does not apply for L.
Use the conclusion in (d) to imply that the assumption in (a), that L is regular, is false.

It should be emphasized that in the previous schema the pumping lemma implies only the existence of a constant m for the assumed regular language L, and the existence of a decomposition xyz for the chosen string w. This lemma does not provide any information about the specific values of m, x, y, and z besides the restriction that they satisfy the conditions |xy|

m and |y| > 0. The importance for the schema of the condition |xy|

m lies in allowing some limitation on the possible decompositions that are to be considered for the chosen w. The importance of the restriction |y| > 0 is in enabling a proper change in the pumped string.

Example 2.4.2 Consider the nonregular language L = { 0ⁿ1ⁿ | n 0 }. To prove that L is nonregular assume to the contrary that it is regular. From the assumption that L is regular deduce the existence of a fixed constant m that satisfies the conditions of the pumping lemma for L.

Choose the string w = 0^m1^m in L. By the pumping lemma, 0^m1^m has a decomposition of the form xyz, where |xy| m, |y| > 0, and xy^kz is in L for each k 0. That is, the decomposition must be of the form x = 0ⁱ, y = 0^j, and z = 0^m-i-j1^m for some i and j such that j > 0. (Note that the values of i, j, and m cannot be chosen arbitrarily.) Moreover, xy⁰z must be in L. However, xy⁰z = 0^m-j1^m cannot be in L because j > 0. It follows that the pumping lemma does not apply for L, consequently contradicting the assumption that L is regular.

Other choices of w can also be used to show that L is not regular. However, they might result in a more complex analysis. For instance, for w = 0^m-11^m-1 the pumping lemma provides three possible forms of decompositions:

x = 0ⁱ, y = 0^j, z = 0^m-i-j-11^m-1 for some j > 0.
x = 0^m-1-j, y = 0^j1, z = 1^m-2 for some j > 0.
x = 0^m-1, y = 1, z = 1^m-2.

In such a case, each of the three forms of decompositions must be shown to be inappropriate to conclude that the pumping lemma does not apply to w. For (a) the choice of k = 0 provides xy⁰z = 0^m-1-j1^m-1 not in L. For (b) the choice of k = 2 provides xy²z = 0^m-110^j1^m-1 not in L. For (c) the choice of c = 0 provides xy⁰z = 0^m-11^m-2 not in L.

Example 2.4.3 Consider the nonregular language L = { ^rev | is in {a, b}* }. To prove that L is not regular assume to the contrary that it is regular. Then deduce the existence of a fixed constant m that satisfies the conditions of the pumping lemma for L.

Choose w = a^mbba^m in L. By the pumping lemma, a^mbba^m = xyz for some x, y, and z such that |xy| m, |y| > 0 and xy^kz is in L for each k 0. That is, x = aⁱ, y = a^j, and z = a^m-i-jbba^m for some i and j such that j > 0. However, xy⁰z = a^m-jbba^m is not in L, therefore contradicting the assumption that L is regular.

It should be noted that not every choice for w implies the desired contradiction. For instance, consider the choice of a^2m for w. By the pumping lemma, a^2m has a decomposition xyz in which x = aⁱ, y = a^j, and z = a^2m-i-j for some i and j such that j > 0. With such a decomposition, xy^kz = a^2m+(k-1)j is not in L if and only if 2m + (k - 1)j is an odd integer. On the other hand, 2m + (k - 1)j is an odd integer if and only if k is an even number and j is an odd number. However, although k can arbitrarily be chosen to equal any value, such is not the case with j. Consequently, the choice of a^2m for w does not guarantee the desired contradiction.

A Generalization to the Pumping Lemma

The proof of the pumping lemma is based on the observation that a state is repeated in each computation on a "long" input, with a portion of the input being consumed between the repetition. The repetition of the state allows the pumping of the subcomputation between the repetition to obtain new accepting computations on different inputs. The proof of the pumping lemma with minor modifications also holds for the following more general theorem.

Theorem 2.4.2 For each relation R that is computable by a finite-state transducer, there exists a constant m that satisfies the following conditions. If (v, w) is in R and |v| + |w| m, then v can be written as x_vy_vz_v and w can be written as x_wy_wz_w, where (x_vy_v^kz_v, x_wy_w^kz_w) is in R for each k 0. Moreover, |x_vy_v| + |x_wy_w| m, and |y_v| + |y_w| > 0.

A schema, similar to the one that uses the pumping lemma for determining nonregular languages, can utilize Theorem 2.4.2 for determining relations that are not computable by finite-state transducers.

Example 2.4.4 The relation R = { (u, u^rev) | u is in {0, 1}* } is not computable by a finite-state transducer. If R were computable by a finite-state transducer, then there would be a constant m that satisfies the conditions of Theorem 2.4.2 for R. In such a case, since (0^m1^m, 1^m0^m) is in R, then u = 0^m1^m could be written as x_vy_vz_v and u^rev = 1^m0^m could be written as x_wy_wz_w, where x_v = 0^i_v, y_v = 0^j_v, z_v = 0^m-i_v-j_v1^m, x_w = 1^i_w, y_w = 1^j_w, z_w = 1^m-i_w-j_w0^m, and j_v + j_w > 0. Moreover, it would be implied that (x_vy_v⁰z_v, x_wy_w⁰z_w) = (0^m-j_v1^m, 1^m-j_w0^m) must also be in R, which is not the case.

[next] [prev] [prev-tail] [front] [up]