
7.5 Uniform Families of Circuits and Sequential Computations

      From Sequential Time to Circuit Size
      A Modified Version of M
      A Circuit c_n for Simulating M
      The Subcircuit MOVE_i
      A Uniform Circuits Constructor
      From Circuits Size to Sequential Time
      U_FNC, U_NC, and NC
      Sequential Space and Parallel Time

The size of circuits is a major resource for parallel computations, as is time for sequential computations. The following theorem shows that these two types of resources are polynomially related.

Notation In what follows, DTIME_F(T(n)) will denote the class of functions computable by O(T(n)) time-bounded, deterministic Turing transducers. SIZE_F(Z(n)) will denote the class of functions computable by families of circuits of size complexity O(Z(n)), and the class of languages whose characteristic functions are in SIZE_F(Z(n)) will be denoted SIZE(Z(n)). U_SIZE_F(Z(n)) will denote the class of functions computable by uniform families of circuits of size complexity O(Z(n)). The class of languages whose characteristic functions are in U_SIZE_F(Z(n)) will be denoted U_SIZE(Z(n)). U_DEPTH_F(D(n)) will denote the class of functions computable by uniform families of circuits of depth complexity O(D(n)), and the class of languages whose characteristic functions are in U_DEPTH_F(D(n)) will be denoted U_DEPTH(D(n)). U_SIZE_DEPTH_F(Z(n), D(n)) will denote the class of functions computable by uniform families of circuits with simultaneous size complexity Z(n) and depth complexity D(n).

Theorem 7.5.1 If log T(n) is fully space-constructible, then

∪_{d≥0} DTIME_F(T^d(n)) = ∪_{d≥0} U_SIZE_F(T^d(n)).

The theorem follows from the two lemmas below.

From Sequential Time to Circuit Size

The proof of the first lemma consists of unrolling the hardware of deterministic Turing transducers.

Lemma 7.5.1 If log T(n) is fully space-constructible, then DTIME_F(T(n)) ⊆ U_SIZE_F(T²(n)).

Proof Consider any T(n) time-bounded, deterministic Turing transducer M = <Q, Σ, Γ, Δ, δ, q₀, B, F>, where log T(n) is fully space-constructible. With no loss of generality assume that Σ = {0, 1}. Let m denote the number of auxiliary work tapes of M.

A Modified Version of M

Assume that Δ does not contain the symbols a and b. Modify M in the following way.

Change each transition rule that provides no output into a rule that provides the output b.
Remove the transition rules that originate at the accepting states, convert the accepting states into nonaccepting states, add a new nonaccepting state, and add new transition rules that force M to go from the old accepting states to the new state while writing the symbol a. Call the new state the a state.
For each state q, input symbol c, and auxiliary work-tape symbols b₁, . . . , b_m on which δ(q, c, b₁, . . . , b_m) is undefined, add the transition rule (q, c, b₁, . . . , b_m, q, 0, b₁, 0, . . . , b_m, 0, ρ) to δ. ρ is assumed to equal a if q is the a state, and b otherwise.

The modified M is a deterministic Turing transducer that, on each input, has a computation with an unbounded number of moves. On an input on which the original M has i moves, the modified M enters an infinite loop in the (i + 1)st move. In each move the modified M writes one symbol onto the output tape. The output of the modified M in the (i + 1)st, (i + 2)nd, . . . moves is a if the input is accepted by the original M, and b otherwise. Moreover, the output of the original M can be obtained from the string that the modified M writes on the output tape by removing all the symbols a and all the symbols b.
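The recovery described in the last two sentences can be illustrated with a small Python sketch. It is not part of the construction itself. It assumes that the output alphabet of the original M is {0, 1}, so that a and b are fresh symbols, and that the string covers more moves than the original M makes. The function name is hypothetical.

```python
def decode_modified_output(out: str) -> tuple[bool, str]:
    """Recover (accepted, output of the original M) from the output of the
    modified M. Assumes `out` covers more moves than the original M makes."""
    if not out or not set(out) <= {"0", "1", "a", "b"}:
        raise ValueError("expected a nonempty string over {0, 1, a, b}")
    # After the original M halts, the modified M keeps writing a if the
    # original M accepted and b otherwise, so the last symbol decides.
    accepted = out[-1] == "a"
    # Removing every a and b leaves exactly the output of the original M.
    original = "".join(ch for ch in out if ch in "01")
    return accepted, original
```

For instance, `decode_modified_output("b1b0aaaa")` yields `(True, "10")`.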

A Circuit c_n for Simulating M

A circuit c_n of the following form can simulate the original M on inputs of length n by simulating the first t = 2^{⌈log(T(n)+1)⌉} moves of the modified M on the given input.

Simulating exactly t = 2^{⌈log(T(n)+1)⌉} moves of (the modified) M allows c_n to generate outputs of identical length t for all inputs of length n. Such uniformity in the length of the outputs is needed because a circuit has a fixed number of output nodes.

The number of moves is chosen to be t = 2^{⌈log(T(n)+1)⌉} rather than T(n) + 1 so that the value of t can be calculated just by marking off space of size O(log T(n)).
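The arithmetic behind t can be sketched in Python. The sketch assumes that logarithms are to base 2 and that the exponent is rounded up. Under these assumptions t is the smallest power of two that is at least T(n) + 1, and its exponent equals the number of bits of T(n) written in binary. This gives T(n) + 1 ≤ t < 2(T(n) + 1), so t = O(T(n)). The function name is hypothetical.

```python
def simulated_moves(T: int) -> int:
    """Return t = 2^ceil(log2(T + 1)), the smallest power of two >= T + 1."""
    if T < 0:
        raise ValueError("T must be nonnegative")
    # For T >= 0, T.bit_length() equals ceil(log2(T + 1)): the exponent is
    # the length of T in binary, which fits in O(log T) marked cells.
    return 1 << T.bit_length()
```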

c_n assumes some fixed binary representation for the set consisting of the symbols a, b, ¢, $, -1, 0, +1, the tape symbols of M, the states in Q, and m + 1 new symbols that correspond to the heads of M on its nonoutput tapes. The elements of the set are represented by binary strings of identical length k.
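Here is one way to obtain such a representation, sketched in Python. Each symbol is numbered and its number is written in binary with k = ⌈log₂ N⌉ bits, where N is the number of symbols. All names are hypothetical stand-ins, including H0 and H1 for the two head symbols. If the symbols are listed in the order used in Example 7.5.1, the sketch reproduces the encoding E given there.

```python
def fixed_width_encoding(symbols: list[str]) -> dict[str, str]:
    """Map each symbol to a distinct binary string of one common length k."""
    if len(set(symbols)) != len(symbols):
        raise ValueError("symbols must be distinct")
    # k = ceil(log2 N) bits distinguish N symbols; use at least one bit.
    k = max(1, (len(symbols) - 1).bit_length())
    return {s: format(i, f"0{k}b") for i, s in enumerate(symbols)}

# Hypothetical symbol names; H0 and H1 stand for the two head symbols.
EXAMPLE_SYMBOLS = ["0", "1", "¢", "$", "B", "a", "b", "H0", "H1",
                   "q0", "q1", "q2", "q3", "q4", "-1", "+1"]
E = fixed_width_encoding(EXAMPLE_SYMBOLS)
```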

c_n consists of t + 2 subcircuits, referred to as IN, MOVE₁, . . . , MOVE_t, and OUT, respectively (see Figure 7.5.1).

Figure 7.5.1

A circuit c_n that computes the function computable by a deterministic Turing transducer M on instances of length n.

IN is a subcircuit that derives the initial (i.e., 0th) configuration of M on the given input a₁ ⋯ a_n. IN uses the values a₁, . . . , a_n of the input nodes x₁, . . . , x_n, the values of some constant nodes 0, and the values of some constant nodes 1 to obtain (the representation of) the desired configuration.

The subcircuit MOVE_i, 1 ≤ i ≤ t, derives the ith configuration of M from the (i - 1)st configuration of M.

OUT is a subcircuit that extracts (the encoding of) the output b₁ ⋯ b_t that M has in the tth configuration. OUT does so by eliminating the symbols that are not in {a, b}, for example, by using AND gates.

The Subcircuit MOVE_i

MOVE_i uses components PREFIX_FINDER and SUFFIX_FINDER for determining the transition rule (q, a, b₁, . . . , b_m, p, d₀, c₁, d₁, . . . , c_m, d_m, ρ) that M uses in its ith move (see Figure 7.5.2).

Figure 7.5.2

Subcircuit MOVE_i for simulating a transition of a deterministic Turing transducer between two configurations.

PREFIX_FINDER determines the prefix (q, a, b₁, . . . , b_m) of the transition rule from the (i - 1)st configuration of M. SUFFIX_FINDER determines the suffix (p, d₀, c₁, d₁, . . . , c_m, d_m, ρ) of the transition rule from (q, a, b₁, . . . , b_m). MOVE_i uses a component MODIFIER to carry out the necessary modifications to the (i - 1)st configuration of M.

PREFIX_FINDER has a component FINDER_i, 0 ≤ i ≤ m, corresponding to each of the nonoutput tapes of M (see Figure 7.5.3).

Figure 7.5.3

A subcircuit PREFIX_FINDER for determining a transition rule of a Turing transducer.

FINDER_i determines the symbol that is under the head of the ith tape of M. FINDER_i employs a subcircuit LOCAL_FINDER_i for each pair of consecutive symbols in the portion of the configuration that corresponds to the ith tape of M. LOCAL_FINDER_i outputs (the representation of) the scanned symbol if its input is a pair consisting of the head symbol of the ith tape and the symbol that this head scans. Otherwise, LOCAL_FINDER_i outputs just 0's. The output of each LOCAL_FINDER_i is determined by a table look-up circuit. The outputs of all the LOCAL_FINDER_i's are ORed to obtain the desired output of FINDER_i.

SUFFIX_FINDER, on input (q, a, b₁, . . . , b_m), employs a table look-up approach to find (p, d₀, c₁, d₁, . . . , c_m, d_m, ρ).
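A table look-up circuit can be realized in sum-of-products form. Each output bit gets one AND gate per table row on which that bit is 1, with an input negated wherever the row has a 0, and an OR of those ANDs. The Python sketch below evaluates such a circuit. The function names are hypothetical. Rows missing from the table produce all 0's, which matches the convention for an unmatched LOCAL_FINDER_i. Because the table for SUFFIX_FINDER does not depend on n, the resulting circuit has constant size.

```python
Bits = tuple[int, ...]

def lookup_circuit(table: dict[Bits, Bits], n_out: int):
    """Build a sum-of-products evaluator for `table` (inputs -> outputs)."""
    # minterms[b] lists the input rows on which output bit b must be 1.
    minterms = [[row for row, out in table.items() if out[b]]
                for b in range(n_out)]

    def and_gate(x: Bits, row: Bits) -> int:
        # Literal x_i where the row has a 1, NOT x_i where it has a 0.
        return int(all(xi if ri else 1 - xi for xi, ri in zip(x, row)))

    def evaluate(x: Bits) -> Bits:
        # One OR gate per output bit over its AND gates (an empty OR is 0).
        return tuple(int(any(and_gate(x, r) for r in minterms[b]))
                     for b in range(n_out))

    return evaluate
```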

MODIFIER contains one component TAPE_MODIFIER_i for each of the nonoutput tapes i of the Turing transducer M, 0 ≤ i ≤ m (see Figure 7.5.4).

Figure 7.5.4

A subcircuit MODIFIER for modifying a configuration of a Turing transducer.

TAPE_MODIFIER_i contains one subcircuit SUBTAPE_MODIFIER for each location in the constructed configuration of the Turing transducer M. A SUBTAPE_MODIFIER that corresponds to location j receives as inputs the three symbols U, Y, and V at locations j - 1, j, and j + 1 of the configuration of M that is being modified. (The only exception occurs when the jth location is a boundary location. In that case the SUBTAPE_MODIFIER receives only two input symbols.) In addition, the SUBTAPE_MODIFIER gets as input the modifications (c_i and d_i) that are to be made in the ith tape of M. The SUBTAPE_MODIFIER outputs the symbol Y′ for the jth location in the constructed configuration of M.
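The local rule computed by each SUBTAPE_MODIFIER can be sketched in Python. To keep the sketch short, it uses a simplified, hypothetical representation: every location carries a pair (symbol, head flag) instead of the separate head symbols used in the construction. Under that assumption Y′ depends only on U, Y, V, c_i, and d_i, as in the text. For the read-only input tape, c is passed as None. All names are illustrative.

```python
from typing import NamedTuple, Optional

class Cell(NamedTuple):
    symbol: str
    head: bool  # True if the head of this tape scans this location

def subtape_modifier(U: Optional[Cell], Y: Cell, V: Optional[Cell],
                     c: Optional[str], d: int) -> Cell:
    """Compute Y' for location j from locations j-1, j, j+1 and the move (c, d).
    U or V is None at a boundary location, where only two inputs exist."""
    # The scanned location receives the written symbol c (if any).
    symbol = c if (Y.head and c is not None) else Y.symbol
    # The head ends up here if it stays here, arrives from the left
    # moving +1, or arrives from the right moving -1.
    head = ((Y.head and d == 0)
            or (U is not None and U.head and d == +1)
            or (V is not None and V.head and d == -1))
    return Cell(symbol, head)

def tape_modifier(cells: list[Cell], c: Optional[str], d: int) -> list[Cell]:
    """TAPE_MODIFIER: one SUBTAPE_MODIFIER per location, applied in parallel."""
    n = len(cells)
    return [subtape_modifier(cells[j - 1] if j > 0 else None, cells[j],
                             cells[j + 1] if j + 1 < n else None, c, d)
            for j in range(n)]
```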

A Uniform Circuits Constructor

IN has size 0. Each FINDER_i contains O(T(n)) subcircuits LOCAL_FINDER_i and a constant number of subcircuits OR. Each LOCAL_FINDER_i has constant size, and each subcircuit OR has size O(T(n)). Hence, PREFIX_FINDER has size O(T(n)). SUFFIX_FINDER has constant size, and each TAPE_MODIFIER_i has size O(T(n)), so each MOVE_i has size O(T(n)). Since c_n contains t = O(T(n)) subcircuits MOVE_i, c_n has size O(T²(n)).
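The count can be summarized as follows. The summary assumes t = 2^{⌈log(T(n)+1)⌉}, so that T(n) + 1 ≤ t < 2(T(n) + 1). It also assumes that OUT, which masks the t output symbols with AND gates, has size O(t).

```latex
\begin{aligned}
\operatorname{size}(c_n)
  &= \underbrace{0}_{\text{IN}}
   + t \cdot \underbrace{O(T(n))}_{\text{MOVE}_i}
   + \underbrace{O(t)}_{\text{OUT}} \\
  &\le 2\,(T(n)+1) \cdot O(T(n)) + O(T(n)) \;=\; O(T^2(n)).
\end{aligned}
```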

An O(log T(n)) space-bounded, deterministic Turing transducer X can be constructed to compute { (1ⁿ, c_n) | n ≥ 0 } in a brute-force manner.

Example 7.5.1 Let M be the deterministic Turing transducer with one auxiliary work tape given in Figure 7.5.5(a). M has time complexity T(n) = n + 1. For the purpose of the example, take M as it is, without modifications. Using the terminology in the proof of Lemma 7.5.1, Q = {q₀, q₁, . . . , q₄}, Σ = Δ = {0, 1}, Γ = {0, 1, B}, m = 1, and k = 4. Choose the following binary representation E: E(0) = 0000, E(1) = 0001, E(¢) = 0010, E($) = 0011, E(B) = 0100, E(a) = 0101, E(b) = 0110, E(q₀) = 1001, E(q₁) = 1010, E(q₂) = 1011, E(q₃) = 1100, E(q₄) = 1101, E(-1) = 1110, E(+1) = 1111. The two head symbols (for the input tape and for the auxiliary work tape) are represented by 0111 and 1000. Choose n = 3.

Figure 7.5.5

(a) A Turing transducer. (b) Corresponding subcircuit IN. (c) Corresponding subcircuit PREFIX_FINDER. (d) Corresponding subcircuit SUFFIX_FINDER.

In such a case, t = 4. The subcircuit IN is given in Figure 7.5.5(b), the subcircuit PREFIX_FINDER in Figure 7.5.5(c), and the subcircuit SUFFIX_FINDER in Figure 7.5.5(d).

From Circuits Size to Sequential Time

The previous lemma deals with applying parallelism for simulating sequential computations. The following lemma deals with the simulation of parallel computations by sequential computations.

Lemma 7.5.2 U_SIZE_F(Z(n)) ⊆ ∪_{d≥0} DTIME_F(Z^d(n)).

Proof Consider any function Z(n) and any uniform family C = (c₀, c₁, c₂, . . . ) of circuits of size complexity Z(n). Let X be an O(log Z(n)) space-bounded, deterministic Turing transducer that computes the function { (1ⁿ, c_n) | n ≥ 0 }. A deterministic Turing transducer M can compute the same function as C in the following manner.

Given an input a₁ ⋯ a_n, M employs X to determine the representation of the circuit c_n. The representation can be found in 2^{O(log Z(n))} = Z^{O(1)}(n) time because X is O(log Z(n)) space-bounded (see Theorem 5.5.1). Moreover, the representation has length O(Z(n) log Z(n)) because c_n has at most Z(n) gates and each gate (g, t, g_L, g_R) has a representation of length O(log Z(n)).

Having the representation of c_n, the Turing transducer M evaluates the output of each node in c_n. M does so by repeatedly scanning the representation of c_n for quadruples (g, t, g_L, g_R) whose input nodes g_L and g_R have output values that are already known. Having found such a quadruple, M evaluates and records the output value of g. After at most Z(n) iterations, M determines the output values of all the nodes in c_n. Each iteration takes time polynomial in Z(n), so the evaluation takes Z^{O(1)}(n) time in total.
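The evaluation loop of M can be sketched in Python. The gate encoding is a hypothetical choice made for the sketch. A gate is a quadruple (g, t, g_L, g_R) with t one of "AND", "OR", "NOT", and g_R = None for NOT. The values of the input and constant nodes are supplied in a dictionary. Each pass over the quadruples evaluates every gate whose inputs are already known. In an acyclic circuit every pass evaluates at least one new gate, so at most Z(n) passes are needed.

```python
from typing import Optional

Gate = tuple[int, str, int, Optional[int]]  # (g, t, g_L, g_R)

def apply_gate(t: str, left: int, right: Optional[int]) -> int:
    if t == "AND":
        return left & right
    if t == "OR":
        return left | right
    if t == "NOT":
        return 1 - left
    raise ValueError(f"unknown gate type {t!r}")

def evaluate_circuit(gates: list[Gate], known: dict[int, int]) -> dict[int, int]:
    """Repeatedly scan the quadruples, evaluating gates whose inputs are known.
    `known` holds the values of the input nodes and the constant nodes."""
    value = dict(known)
    pending = list(gates)
    while pending:
        remaining = []
        for g, t, g_l, g_r in pending:
            if g_l in value and (g_r is None or g_r in value):
                value[g] = apply_gate(t, value[g_l], value.get(g_r))
            else:
                remaining.append((g, t, g_l, g_r))
        if len(remaining) == len(pending):
            raise ValueError("circuit is cyclic or refers to an unknown node")
        pending = remaining
    return value
```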

Finally, M determines which nodes of c_n are the output nodes, and writes out their values.

By Theorem 7.5.1, the time of sequential computations and the size of uniform families of circuits are polynomially related.

Corollary 7.5.1 A problem is solvable in polynomial time if and only if it is solvable by a uniform family of circuits of polynomial size complexity.

U_FNC, U_NC, and NC

Sequential computations are considered feasible only if they are polynomially time-bounded. Similarly, families of circuits are considered feasible only if they are polynomially size-bounded. As a result, parallelism does not seem to have a major influence on problems that are not solvable in polynomial time. On the other hand, for problems that are solvable in polynomial time, parallelism is of central importance when it can significantly increase computing speed. One such class consists of the problems that can be solved by uniform families of circuits simultaneously having polynomial size complexity and polylog (i.e., O(logⁱ n) for some i ≥ 0) depth complexity. This class of problems is denoted U_FNC.

The subclass of U_FNC obtained by restricting the depth complexity of the families of circuits to O(logⁱ n) is denoted U_FNCⁱ. The subclass of decision problems in U_FNC is denoted U_NC. The subclass of decision problems in U_FNCⁱ is denoted U_NCⁱ.

FNC denotes the class of problems solvable by (not necessarily uniform) families of circuits that simultaneously have polynomial size complexity and polylog depth complexity. The subclass of decision problems in FNC is denoted NC. The subclass of FNC obtained by restricting the families of circuits to depth complexity O(logⁱ n) is denoted FNCⁱ. NCⁱ denotes the class of decision problems in FNCⁱ.

For nonuniform families of circuits the following contrasting theorem holds.

Theorem 7.5.2 NC¹ contains undecidable problems.

Proof Every unary language L over the alphabet {1} can be decided by a family C = (c₀, c₁, c₂, . . . ) of circuits of simultaneous polynomial size complexity and logarithmic depth complexity. Specifically, each c_n in C is a table look-up circuit that outputs 1 on a given input a₁ ⋯ a_n if and only if a₁ ⋯ a_n = 1ⁿ and 1ⁿ is in L.
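The circuits in this proof can be sketched in Python. If 1ⁿ is in L, c_n is a balanced tree of n - 1 two-input AND gates of depth ⌈log n⌉. Otherwise c_n is the constant node 0. The single bit that records whether 1ⁿ is in L is hard-wired into c_n, and nothing requires that bit to be computable. This is exactly where nonuniformity enters. The parameter `one_n_in_L` is a hypothetical name for that hard-wired bit.

```python
def and_tree(bits: list[int]) -> tuple[int, int]:
    """Evaluate a balanced tree of two-input AND gates; return (value, depth)."""
    layer, depth = list(bits), 0
    while len(layer) > 1:
        paired = [layer[i] & layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            paired.append(layer[-1])  # an odd node passes up to the next level
        layer, depth = paired, depth + 1
    return (layer[0] if layer else 1), depth  # the empty AND (n = 0) is 1

def c_n(bits: list[int], one_n_in_L: bool) -> int:
    """Table look-up circuit for the unary language L on inputs of length n."""
    return and_tree(bits)[0] if one_n_in_L else 0
```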

However, a proof by diagonalization implies that the membership problem is undecidable for the unary language { 1ⁱ | the Turing machine M_i does not accept the string 1ⁱ }. Since this language is in NC¹, NC¹ contains an undecidable problem.

Sequential Space and Parallel Time

By Corollary 7.5.1, the definitions above, and the following lemma, the hierarchy shown in Figure 7.5.6 holds.

Figure 7.5.6

A hierarchy of decision problems between NLOG and P.

Lemma 7.5.3 NLOG ⊆ U_NC².

Proof Consider any S(n) = O(log n) space-bounded, nondeterministic Turing machine M = <Q, Σ, Γ, δ, q₀, B, F> with m auxiliary work tapes. With no loss of generality assume that Σ = {0, 1}. A tuple w = (q, i, a, u₁, v₁, . . . , u_m, v_m) is called a partial configuration of M on input a₁ ⋯ a_n if M has a configuration (uqav, u₁qv₁, . . . , u_mqv_m) with uav = ¢a₁ ⋯ a_n$ and |u| = i. A partial configuration is called an initial partial configuration if it corresponds to an initial configuration, and an accepting partial configuration if it corresponds to an accepting configuration.

Each partial configuration of M requires O(log n) space. Hence the number k of partial configurations w₁, . . . , w_k that M has on the inputs of length n satisfies k = 2^{O(log n)} = n^{O(1)}.

Say that M can directly reach partial configuration w' from partial configuration w if w and w' correspond to some configurations C_w and C_{w'} of M, respectively, such that C_w ⊢ C_{w'}. Say that M can reach partial configuration w' from partial configuration w if w and w' correspond to some configurations C_w and C_{w'} of M, respectively, such that C_w ⊢* C_{w'}.

For the given n, the language L(M) ∩ {0, 1}ⁿ is decidable by a circuit c_n that consists of log k + 2 subcircuits, namely, DIRECT, FINAL, and log k copies of INDIRECT (see Figure 7.5.7).

Figure 7.5.7

A circuit c_n that corresponds to an O(log n) space-bounded, nondeterministic Turing machine.

The structure of c_n relies on the observation that the Turing machine M accepts a given input a₁ ⋯ a_n if and only if M has partial configurations w₀, . . . , w_t on input a₁ ⋯ a_n such that w₀ is an initial partial configuration, w_t is an accepting partial configuration, and M can directly reach w_i from w_{i-1} for 1 ≤ i ≤ t.

DIRECT has a component CHECK_{i j} for each pair (w_i, w_j) of distinct partial configurations of M on the inputs of length n. CHECK_{i j} has the output 1 on a given input a₁ ⋯ a_n if w_i and w_j are both partial configurations of M on input a₁ ⋯ a_n and M can directly reach w_j from w_i. Otherwise, CHECK_{i j} has the output 0.

The component CHECK_{i j} is a table look-up circuit. Specifically, assume that CHECK_{i j} corresponds to the partial configurations w_i = (q, l, a, u₁, v₁, . . . , u_m, v_m) and w_j = (q̂, l̂, â, û₁, v̂₁, . . . , û_m, v̂_m). In such a case, CHECK_{i j} is the constant node 0 when M cannot directly reach w_j from w_i. On the other hand, when M can directly reach w_j from w_i, CHECK_{i j} is a circuit that has the output 1 on input a₁ ⋯ a_n if and only if the (l + 1)st symbol in ¢a₁ ⋯ a_n$ is a and the (l̂ + 1)st symbol in ¢a₁ ⋯ a_n$ is â.

Each copy of the subcircuit INDIRECT modifies the values of the "variables" x_{1 2}, x_{1 3}, . . . , x_{k k-1} in parallel. The value of x_{i j} is modified by a component called UPDATE_{i j}, which sets x_{i j} to 1 if x_{i j} = 1 or if x_{i l} = x_{l j} = 1 for some l. Upon reaching the rth INDIRECT, 1 ≤ r ≤ log k, the variable x_{i j} holds 1 if and only if M can reach w_j from w_i in at most 2^{r-1} moves (through partial configurations of M on the given input). Upon leaving the rth INDIRECT, the variable x_{i j} holds 1 if and only if M can reach w_j from w_i in at most 2^r moves. In particular, upon reaching the first INDIRECT, x_{i j} holds the output of CHECK_{i j}. Upon leaving the last INDIRECT, x_{i j} holds 1 if and only if M can reach w_j from w_i. The reason is that any sequence of moves between two distinct partial configurations can be shortened to one that visits each partial configuration at most once, and such a sequence has at most k - 1 ≤ 2^{log k} moves.
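DIRECT, the INDIRECT stages, and FINAL can be mirrored in Python by repeated Boolean matrix "squaring". The adjacency matrix `direct` plays the role of the CHECK_{i j} outputs for a fixed input. Partial configurations are numbered from 0 rather than 1, and the function names are hypothetical. This is a sketch of the technique, not a circuit construction.

```python
def indirect(x: list[list[bool]]) -> list[list[bool]]:
    """One INDIRECT stage: UPDATE_ij sets x_ij to x_ij OR (OR over l of x_il AND x_lj)."""
    k = len(x)
    return [[i != j and (x[i][j] or any(x[i][l] and x[l][j] for l in range(k)))
             for j in range(k)] for i in range(k)]

def reach(direct: list[list[bool]]) -> list[list[bool]]:
    """Apply ceil(log2 k) INDIRECT stages; after r stages x_ij means
    'reachable in at most 2^r moves', and k - 1 moves always suffice."""
    k = len(direct)
    x = [[i != j and direct[i][j] for j in range(k)] for i in range(k)]
    for _ in range((k - 1).bit_length()):
        x = indirect(x)
    return x

def final(x: list[list[bool]], initial: set[int], accepting: set[int]) -> bool:
    """FINAL: is some accepting partial configuration reachable from an initial one?"""
    return any(x[i][j] for i in initial for j in accepting)
```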

FINAL determines whether M can reach an accepting partial configuration from an initial partial configuration on the given input a₁ ⋯ a_n, that is, whether x_{i j} is equal to 1 for some initial partial configuration w_i and some accepting partial configuration w_j.

The subcircuit DIRECT has size O(k²) = n^{O(1)} and constant depth. The subcircuit FINAL has size O(k²). Each copy of INDIRECT has size O(k³), since each of its k² components UPDATE_{i j} uses O(k) gates. Each of these subcircuits has depth O(log k) = O(log n). As a result, the circuit c_n has size at most O(k³(log k + 2)) = n^{O(1)} and depth at most O((log k + 2) log k) = O(log² n).
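The bounds combine as follows. The derivation uses log k = O(log n), which holds because k = n^{O(1)}. It counts O(k³) gates for each INDIRECT: k² components UPDATE_{i j}, each an OR of O(k) two-input ANDs, with depth O(log k).

```latex
\begin{aligned}
\operatorname{depth}(c_n) &= \underbrace{O(1)}_{\text{DIRECT}}
  + \lceil \log k \rceil \cdot \underbrace{O(\log k)}_{\text{INDIRECT}}
  + \underbrace{O(\log k)}_{\text{FINAL}}
  = O(\log^2 k) = O(\log^2 n), \\
\operatorname{size}(c_n) &= O(k^2) + \lceil \log k \rceil \cdot O(k^3) + O(k^2) = n^{O(1)}.
\end{aligned}
```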

The containment of DLOG in U_NC and the conjecture that U_NC is properly contained in P suggest that P-complete problems cannot be solved efficiently by parallel programs. The following theorem provides a tool for detecting problems that can be solved efficiently by parallel programs (e.g., the problems in Exercise 5.1.8). Moreover, the proof of the theorem implies an approach for mechanically obtaining the parallel programs from corresponding nondeterministic sequential programs that solve the problems.

Notation In what follows, NSPACE_F(S(n)) denotes the set of functions computable by O(S(n)) space-bounded, nondeterministic Turing transducers.

Theorem 7.5.3 NSPACE_F(log n) ⊆ U_FNC².

Proof Consider any Turing transducer M = <Q, Σ, Γ, Δ, δ, q₀, B, F> of space complexity S(n) = O(log n). Assume that M computes some function f. In addition, with no loss of generality assume that Σ = Δ = {0, 1}. From M, for each symbol a in Δ, a Turing machine M_a = <Q_a, Σ, Γ, δ_a, q_{0a}, B, F_a> can be constructed to accept the language { 1ⁱ0x | the ith output symbol of M on input x is a }.

Specifically, on a given input 1ⁱ0x, M_a records the value of i in binary on an auxiliary work tape. Then M_a follows the computation of M on input x. During the simulated computation, M_a uses the stored value of i to find the ith symbol in the output of M, while ignoring the output itself. M_a accepts 1ⁱ0x if and only if M has an accepting computation on input x with a as the ith symbol in the output.

The function f is computable by a family C = (c₀, c₁, c₂, . . . ) of circuits of the following form. Each c_n provides an output y₁ ⋯ y_{2^{S(n)+1}} of length 2 · 2^{S(n)} on input x₁ ⋯ x_n. Each substring y_{2j-1}y_{2j} of the output is equal to 00, 11, or 10, depending on whether the jth symbol in the output of M is 0, 1, or undefined, respectively. y_{2j-1} is obtained by negating the output of a circuit that simulates M_a for a = 0 on input 1^j0x₁ ⋯ x_n. y_{2j} is obtained by a circuit that simulates M_a for a = 1 on input 1^j0x₁ ⋯ x_n.
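The encoding of each output position by the pair y_{2j-1}y_{2j} can be checked with a short Python sketch. `m0_accepts` and `m1_accepts` are hypothetical parameter names. They stand for whether M_0 and M_1, respectively, accept 1^j0x₁ ⋯ x_n. The pair 01 cannot arise, since the jth output symbol of M cannot be both 0 and 1.

```python
def encode_position(m0_accepts: bool, m1_accepts: bool) -> str:
    """y_{2j-1} y_{2j}: NOT(M_0 accepts) followed by (M_1 accepts)."""
    return f"{int(not m0_accepts)}{int(m1_accepts)}"

def decode_output(y: str) -> str:
    """Recover M's output from y: 00 -> 0, 11 -> 1, 10 -> undefined."""
    out = []
    for p in range(0, len(y), 2):
        pair = y[p:p + 2]
        if pair == "00":
            out.append("0")
        elif pair == "11":
            out.append("1")
        elif pair == "10":
            break  # positions past the end of M's output are undefined
        else:
            raise ValueError(f"impossible pair {pair!r}")
    return "".join(out)
```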

The result then follows from Lemma 7.5.3 because M_a is a logspace-bounded, nondeterministic Turing machine for a = 0 and for a = 1.

A proof similar to the one provided for the previous theorem can be used to show that NSPACE_F(S(n)) ⊆ ∪_{d>0} U_SIZE_DEPTH_F(2^{dS(n)}, S²(n)) for each fully space-constructible function S(n) ≥ log n. By this containment and a proof similar to that of Exercise 7.5.3, the space requirements of sequential computations and the time requirements of parallel computations are polynomially related.
