[next] [prev] [prev-tail] [tail] [up]

7.5  Uniform Families of Circuits and Sequential Computations

      From Sequential Time to Circuit Size
      A Modified Version of M
      A Circuit cn for Simulating M
      The Subcircuit MOVE i
      A Uniform Circuits Constructor
      From Circuits Size to Sequential Time
      U_FNC, U_NC, and NC
      Sequential Space and Parallel Time

The size of circuits is a major resource for parallel computations, as is time for sequential computations. The following theorem shows that these two types of resources are polynomially related.

Notation     In what follows DTIME _F (T(n)) will denote the class of functions computable by O(T(n)) time-bounded, deterministic Turing transducers. The class of functions with size complexity SIZE _F (Z(n)) will be denoted O(Z(n)). The class of languages whose characteristic functions are in SIZE _F (Z(n)) will be denoted SIZE (Z(n)) . U_SIZE _F (Z(n)) will denote the class of functions computable by uniform families of circuits of size complexity O(Z(n)). The class of languages whose characteristic functions are in U_SIZE _F (Z(n)) will be denoted U_SIZE (Z(n)) . U_DEPTH _F (D(n)) will denote the class of functions computable by uniform families of circuits of depth complexity O(D(n)), and the class of languages whose characteristic functions are in U_DEPTH _F (D(n)) will be denoted U_DEPTH (D(n)) . U_SIZE _DEPTH _F (Z(n), D(n)) will denote the class of functions computable by uniform families of circuits with simultaneous size complexity Z(n) and depth complexity D(n).

Theorem 7.5.1    If log T(n) is fully space-constructible, then

  U  DTIME _F (T d(n))   =    U  U_SIZE_F (Td(n))
d>=0                     d>=0

The proof of the theorem is implied from the two lemmas below.

 From Sequential Time to Circuit Size

The proof of the first lemma consists of unrolling the hardware of deterministic Turing transducers.

Lemma 7.5.1    If log T(n) is fully space-constructible, then

 DTIME _F (T(n))   (_   U_SIZE_F (T2(n)).

Proof     Consider any T(n) time-bounded, deterministic Turing transducer M = <Q, S, G, D, d, q0, B, F>, where log T(n) is fully space-constructible. With no loss of generality assume that S = {0, 1}. Let m denote the number of auxiliary work tapes of M.

 A Modified Version of M

Assume that D does not contain the symbols a and b. Modify M in the following way.

  1.  Modify each transition rule that provides no output to a rule that provides the output b.
  2.  Remove the transition rules that originate at the accepting states, convert the accepting states into nonaccepting states, add a new nonaccepting state, and add new transition rules that force M to go from the old accepting states to the new state while writing the symbol a. Call the new state an a state.
  3.  For each state q, input symbol c, and auxiliary work-tape symbols b1, . . . , bm on which d(q, c, b1, . . . , bm) is undefined, add the transition rule (q, c, b1, . . . , bm, q, 0, b1, 0, . . . , bm, 0, a) to d. a is assumed to equal a if q is the a state, and a is assumed to equal b if q is not the a state.
The modified M is a deterministic Turing transducer, which on each input has a computation of an unbounded number of moves. On an input on which the original M has i moves, the modified M enters an infinite loop in the i + 1st move. In each move the modified M writes one symbol onto the output tape. The output of the modified M in the i + 1st, i + 2 nd, . . . moves is a if and only if the input is accepted by the original M. Moreover, the output of the original M can be obtained from the string that the modified M writes on the output tape, by removing all the symbols a and all the symbols b.

 A Circuit cn for Simulating M

A circuit cn of the following form can simulate the original M on inputs of length n, by simulating the first t = 2 |~ log (T(n)+1) ~| moves of the modified M on the given input.

The simulation of exactly t = 2 |~ log (T(n)+1) ~| moves of (the modified) M, allows cn to generate outputs of identical length t for all the inputs of length n. Such a uniformity in the length of the outputs is needed because of the circuits' rigidity in the length of their outputs.

The choice of t = 2 |~ log (T(n)+1) ~| instead of T(n) + 1 for the number of moves of M, is made to allow the value to be calculated just by marking a space of size O(log T(n)).

cn assumes some fixed binary representation for the set S  U G  U D  U {a, b, ¢, $, - 1, 0, + 1, PICT, . . . , PICT}  U Q. The elements of the set can be represented by binary strings of identical length k. PICT, . . . , PICT are assumed to be new symbols corresponding to the heads of M.

cn consists of t + 2 subcircuits, referred to as IN, MOVE1, . . . , MOVE t, and OUT, respectively (see Figure 7.5.1).


   
[PICT]
Figure 7.5.1 A circuit cn that computes the function computable by a deterministic Turing transducer M on instances of length n.

IN is a subcircuit which derives the initial (i.e., 0th) configuration

  |            t   t     t   t
(cq0a1 ... an$,B q0B ,...,B q0B ,e)

of M on the given input a1 . . . an. IN uses the values a1, . . . , an of the input nodes x1, . . . , xn; the values of some constant nodes 0; and the values of some constant nodes 1 for obtaining the desired (representation of the) configuration.

The subcircuit MOVEi, 1 <= i <= t, derives the ith configuration

 (ca1... q'... an$,b'11... q'... b'12t,...,b'm 1... q'... b'm 2t,b'm+11 ... b'm+1 i)

of M from the i - 1st configuration

  |
(ca1 ... q... an$,b11 ... q ... b12t,...,bm1 ... q ... bm 2t,bm+1 1... bm+1 i- 1)

of M.

OUT is a subcircuit that extracts the (encoding of the) output b1 . . . bt that M has in the tth configuration. OUT does so by eliminating the symbols that are not in D  U {a, b}, for example, by using AND gates.

 The Subcircuit MOVE i

MOVE i uses components PREFIX _FINDER and SUFFIX _FINDER for determining the transition rule (q, a, b1, . . . , bm, p, d0, c1, d1, . . . , cm, dm, r) that M uses in its ith move (see Figure 7.5.2).


   
[PICT]
Figure 7.5.2 Subcircuit MOVE i for simulating a transition of a deterministic Turing transducer between two configurations.

PREFIX _FINDER determines the prefix (q, a, b1, . . . , bm) of the transition rule from the i - 1st configuration of M. SUFFIX _FINDER determines the suffix (p, d0, c1, d1, . . . , cm, dm, r) of the transition rule from (q, a, b1, . . . , bm). MOVE i uses a component MODIFIER for carrying out the necessary modifications to the i - 1st configuration of M.

PREFIX _FINDER has a component FINDER i, 0 <= i <= m, corresponding to each of the nonoutput tapes of M (see Figure 7.5.3).


   
[PICT]
Figure 7.5.3 A subcircuit PREFIX _FINDER for determining a transition rule of a Turing transducer.

FINDERi determines the symbol that is under the head of the ith tape of M. FINDERi employs a subcircuit LOCAL _FINDER i for each pair of consecutive symbols in the portion of the configuration that corresponds to the ith tape of M. LOCAL _FINDERi outputs (the representation of) the symbol a if its input corresponds to a pair of the form PICTa. Otherwise, the subcircuit LOCAL _FINDER i outputs just 0's. The output of each LOCAL _FINDER i is determined by a table look-up circuit. The outputs of all the LOCAL _FINDER i's are OR ed to obtain the desired output of FINDERi.

SUFFIX _FINDER on input (q, a, b1, . . . , bm) employs a table look-up approach to find (p, d0, c1, d1, . . . , cm, dm, r).

MODIFIER contains one component TAPE _MODIFIER i for each of the nonoutput tapes i of the Turing transducer M, 0 <= i <= m (see Figure 7.5.4).


   
[PICT]
Figure 7.5.4 A subcircuit MODIFIER for modifying a configuration of a Turing transducer.

TAPE _MODIFIERi contains one subcircuit SUBTAPE _MODIFIER for each location in the constructed configuration of the Turing transducer M. A SUBTAPE _MODIFIER that corresponds to location j receives the three symbols U, Y, and V as inputs at locations j - 1, j, and j + 1 in the configuration of M that is being modified. (The only exception occurs when the jth location is a boundary location. In such a case the SUBTAPE _MODIFIER receives only two input values.) In addition, the SUBTAPE _MODIFIER gets as input the modifications (ci and di) that are to be made in the ith tape of M. The SUBTAPE _MODIFIER outputs the symbol Y ' for the jth location in the constructed configuration of M.

 A Uniform Circuits Constructor

IN has size 0. Each FINDER i contains O(T(n)) subcircuits LOCAL _FINDER i, and a constant number of subcircuits OR. Each LOCAL _FINDER i has constant size. Each subcircuit OR has size O(T(n)). Hence, PREFIX _FINDER has size O(T(n)). SUFFIX _ FINDER has constant size, and TAPE _MODIFIER has size O(T(n)). Consequently, cn has size O(T2(n)).

An O(log T(n)) space-bounded, deterministic Turing transducer X can be constructed, to compute { (1n, cn) | n >= 0 } in a brute-force manner.  ***

Example 7.5.1    Let M be the one auxiliary-work-tape deterministic Turing transducer in Figure 7.5.5(a). M has time complexity T(n) = n + 1. For the purpose of the example take M as it is, without modifications. Using the terminology in the proof of Lemma 7.5.1, Q = {q0, q1, . . . , q4}, S = D = {0, 1}, G = {0, 1, B}, m = 1, and k = 4. Choose the following binary representation E: E(0) = 0000, E(1) = 0001, E(¢) = 0010, E($) = 0011, E(B) = 0100, E(a) = 0101, E(b) = 0110, E(PICT) = 0111, E(PICT) = 1000, E(q0) = 1001, E(q1) = 1010, E(q2) = 1011, E(q3) = 1100, E(q4) = 1101, E(-1) = 1110, E(+1) = 1111. Choose n = 3.


   
[PICT]
Figure 7.5.5 (a) A Turing transducer. (b) Corresponding subcircuit IN. (c) Corresponding subcircuit PREFIX _FINDER. (d) Corresponding subcircuit SUFFIX _FINDER.

In such a case, t = 4. The subcircuit IN is given in Figure 7.5.5(b), the subcircuit PREFIX _FINDER is given in Figure 7.5.5(c), and the subcircuit SUFFIX _FINDER is given in Figure 7.5.5(d).  ***

 From Circuits Size to Sequential Time

The previous lemma deals with applying parallelism for simulating sequential computations. The following lemma deals with the simulation of parallel computations by sequential computations.

Lemma 7.5.2    U_SIZE _F (Z(n))  (_  U d>=0DTIME _F (Zd(n)).

Proof     Consider any function Z(n), and any uniform family C = (c0, c1, c2, . . . ) of circuits of size complexity Z(n). Let X be an O(log Z(n)) space-bounded, deterministic Turing transducer that computes the function { (1n, cn) | n >= 0 }. A deterministic Turing transducer M can compute the same function as C in the following manner.

Given an input a1 . . . an, M employs X to determine the representation of the circuit cn. The representation can be found in 2O(log Z(n)) = ZO(1)(n) time because X is O(log Z(n)) space-bounded (see Theorem 5.5.1). Moreover, the representation has length O(Z(n)log Z(n)) because cn has at most Z(n) gates, and each gate (g, t, gL, gR) has a representation of length O(log Z(n)).

Having the representation of cn, the Turing transducer M evaluates the output of each node in cn. M does so by repeatedly scanning the representation of cn for quadruples (g, t, gL, gR), that correspond to nodes gL and gR, whose output values are already known. Having found such a quadruple (g, t, gL, gR), the Turing transducer M evaluates and also records the output value of g. After at most Z(n) iterations, M determines the output values of all the nodes in cn.

Finally, M determines which nodes of cn are the output nodes, and writes out their values.  ***

By Theorem 7.5.1, the time of sequential computations and the size of uniform families of circuits are polynomially related.

Corollary 7.5.1    A problem is solvable in polynomial time if and only if it is solvable by a uniform family of circuits of polynomial size complexity.

 U_FNC, U_NC, and NC

Sequential computations are considered feasible only if they are polynomially time- bounded. Similarly, families of circuits are considered feasible only if they are polynomially size-bounded. As a result, parallelism does not seem to have major influence on problems that are not solvable in polynomial time. On the other hand, for those problems that are solvable in polynomial time, parallelism is of central importance when it can significantly increase computing speed. One such class of problems is that which can be solved by uniform families of circuits, simultaneously having polynomial size complexity and polylog (i.e., O(login) for some i >= 0) depth complexity. This class of problems is denoted U_FNC .

The subclass of U_FNC, which is obtained by restricting the depth complexity of the families of circuits to O(login), is denoted U_FNC i. The subclass of decision problems in U_FNC is denoted U_NC . The subclass of decision problems in U_FNC i is denoted U_NCi.

FNC denotes the class of problems solvable by (not necessarily uniform) families of circuits that simultaneously, have polynomial size complexity and polylog depth complexity. The subclass of decision problems in FNC is denoted NC . The subclass of FNC, obtained by restricting the families of circuits to depth complexity O(login), is denoted FNC i. NC i denotes the class of decision problems in FNC i.

For nonuniform families of circuits the following contrasting theorem holds.

Theorem 7.5.2    NC1 contains undecidable problems.

Proof     Every unary language L over the alphabet {1} can be decided by a family C = (c0, c1, c2, . . . ) of circuits of simultaneous polynomial size complexity and logarithmic depth complexity. Specifically, each cn in C is a table look-up circuit that outputs 1 on a given input a1 . . . an if and only if a1 . . . an = 1n and 1n is in L.

However, a proof by diagonalization implies that the membership problem is undecidable for the unary language { 1i | The Turing machine Mi does not accept the string 1i }.  ***

 Sequential Space and Parallel Time

By Corollary 7.5.1, the definitions above, and the following lemma, the hierarchy shown in Figure 7.5.6 holds.


   
[PICT]
Figure 7.5.6 A hierarchy of decision problems between NLOG and P.

Lemma 7.5.3    NLOG  (_ U_NC2.

Proof     Consider any S(n) = O(log n) space-bounded, nondeterministic Turing machine M = <Q, S, G, d, q0, B, F> with m auxiliary work tapes. With no loss of generality assume that S = {0, 1}. Let a tuple w = (q, i, a, u1, v1, . . . , um, vm) be called a partial configuration of M on input a1 . . . an, if M has a configuration (aqab, u1qv1, . . . , umqvm) with aab = ¢a1 . . . an$ and |a| = i. Let a partial configuration be called an initial partial configuration if it corresponds to an initial configuration. Let a partial configuration be called an accepting partial configuration if it corresponds to an accepting configuration.

Each partial configuration of M requires O(log n) space. The number k of partial configurations w1, . . . , wk that M has on the set of inputs of length n satisfies k = 2O(log n) = nO(1).

Say that M can directly reach partial configuration w' from partial configuration w if w and w' correspond to some configurations _O_w and _O_w' of M, respectively, such that _O_w  |- _O_w' . Say that M can reach partial configuration w' from partial configuration w if w and w' correspond to some configurations _O_w and _O_w' of M, respectively, such that _O_w  |- * _O_w' .

For the given n, the language L(M)  /~\ {0, 1}n is decidable by a circuit cn that consists of  |~ log k ~| + 2 subcircuits, namely, DIRECT, FINAL, and  |~ log k ~| copies of INDIRECT (Figure 7.5.7).


   
[PICT]
Figure 7.5.7 A circuit cn that corresponds to an O(log n) space-bounded, nondeterministic Turing machine.

The structure of cn relies on the observation that the Turing machine M accepts a given input a1 . . . an if and only if M has partial configurations w0, . . . , wt on input a1 . . . an, such that w0 is an initial partial configuration, wt is an accepting partial configuration, and M can directly reach wi from wi-1 for 1 <= i <= t.

DIRECT has a component CHECK i j for each possible pair (wi, wj) of distinct partial configurations of M on the inputs of length n. CHECK i j has the output 1 on a given input a1 . . . an if wi as well as wj are partial configurations of M on input a1 . . . an, and M can directly reach wj from wi. Otherwise, CHECKi j has the output 0.

The component CHECK i j is a table look-up circuit. Specifically, assume that CHECK i j corresponds to the partial configurations wi = (q, l, a, u1, v1, . . . , um, vm) and wj = (^q, ^
 l, â, û1, ^v1, . . . , ûm, ^vm). In such a case, CHECK i j is the constant node 0 when M cannot directly reach wj from wi. On the other hand, when M can directly reach wj from wi, then CHECKi j is a circuit that has the output 1 on input a1 . . . an if and only if the l + 1st symbol in ¢a1 . . . an$ is a and the ^l+ 1st symbol in ¢a1 . . . an$ is â.

Each copy of the subcircuit INDIRECT modifies the values of the "variables" x 1 2x1 3, . . . , x n n-1 in parallel, where the value of x i j is modified by a component called UPDATE i j. Upon reaching the rth INDIRECT the variable x i j holds 1 if and only if M can reach wj from wi in at most 2r moves (through partial configurations of M on the given input), 1 <= r <=  |~ log k ~| . Upon leaving the rth INDIRECT the variable x i j holds 1 if and only if M can reach wj from wi in at most 2r+1 moves. In particular, upon reaching the first INDIRECT, x i j holds the output of CHECK i j. However, upon leaving the last INDIRECT, x i j holds 1 if and only if M can reach wj from wi.

FINAL determines whether M can reach an accepting partial configuration from an initial partial configuration on the given input a1 . . . an, that is, whether x i j is equal to 1 for some initial partial configuration wi and some accepting partial configuration wj.

The subcircuit DIRECT has size O(k2) = nO(1) and constant depth. Each of the subcircuits FINAL and INDIRECT has size no greater than O(k2) = nO(1) and depth no greater than O(log k) = O(log n). As a result, the circuit cn has size of at most O(k2( |~ log k ~| + 2)) = nO(1), and depth of at most O(( |~ log k ~| + 2)log k) = O(log2n). 
***

The containment of DLOG in U_NC and the conjecture that U_NC is properly contained in P, suggest that the P-complete problems can not be solved efficiently by parallel programs. The following theorem provides a tool for detecting problems that can be solved efficiently by parallel programs (e.g., the problems in Exercise 5.1.8). Moreover, the proof of the theorem implies an approach for mechanically obtaining the parallel programs from corresponding nondeterministic sequential programs that solve the problems.

Notation     In what follows, NSPACE _F (S(n)) denotes the set of functions computable by O(S(n)) space-bounded, nondeterministic Turing transducers.

Theorem 7.5.3    NSPACE _F (log n)  (_ U_FNC 2.

Proof     Consider any Turing transducer M = <Q, S, G, D, d, q0, B, F> of space complexity S(n) = O(log n). Assume that M computes some function f. In addition, with no loss of generality assume that S = D = {0, 1}. From M, for each symbol a in S, a Turing machine Ma = <Qa, S, G, da, q0a, B, Fa> can be constructed to accept the language { 1i0x | The ith output symbol of M on input x is a }.

Specifically, on a given input 1i0x, Ma records the value of i in binary on an auxiliary work tape. Then Ma follows the computation of M on input x. During the simulated computation, Ma uses the stored value of i to find the ith symbol in the output of M, while ignoring the output itself. Ma accepts 1i0x if and only if M has an accepting computation on input x with a as the ith symbol in the output.

The function f is computable by a family C = (c0, c1, c2, . . . ) of circuits of the following form. Each cn provides an output y1 . . . y2S(n)+1 of length 2 . 2S(n) on input x1 . . . xn. Each substring y2j-1y2j of the output is equal to 00, 11, or 10, depending on whether the jth symbol in the output of M is 0, 1, or undefined, respectively. y2j-1 is obtained by negating the output of a circuit that simulates Ma for a = 0 on input 1j0x1 . . . xn. y2j is obtained by a circuit that simulates Ma for a = 1 on input 1j0x1 . . . xn.

The result then follows from Lemma 7.5.3 because Ma is a logspace-bounded, Turing machine for a = 0 and for a = 1.  ***

A proof similar to the one provided for the previous theorem can be used to show that NSPACE _F (S(n))  (_  U d>0 U_SIZE _DEPTH _F (2dS(n), S2(n)) for each fully space-constructible function S(n) >= log n. By this containment and a proof similar to that of Exercise 7.5.3, the space requirements of sequential computations and the time requirements of parallel computations are polynomially related.

[next] [prev] [prev-tail] [front] [up]