
7.6 Uniform Families of Circuits and PRAM's

      From PRAM's to Uniform Families of Circuits
      The Structure of c_n
      The Complexity of c_n
      From Uniform Families of Circuits to PRAM's
      The Simulation of Gate g_i by Processor M_i
      The Identification of Gate g_i by Processor M_i

This section shows that uniform families of circuits and PRAM's are polynomially related in the resources they require. As a corollary, U_FNC is exactly the class of problems that can be solved by PRAM's that have polynomial size complexity and polylog time complexity.

Notation In what follows, PROCESSORS_TIME_F(Z(n), T(n)) denotes the set of functions that can be computed by PRAM's having both O(Z(n)) size complexity and O(T(n)) time complexity (under the logarithmic cost criterion).

From PRAM's to Uniform Families of Circuits

The proof of the following theorem consists of showing how the hardware of any given PRAM can be unrolled to obtain a corresponding uniform family of circuits. The degenerate case, in which PRAM's are restricted to being RAM's, has been considered in Lemma 7.5.1.

Theorem 7.6.1 If log T(n) and log Z(n) are fully space-constructible, log Z(n) T(n), and n O, then

Proof Consider any PRAM <M, X, Y, A> of size complexity Z(n) and time complexity T(n). By Theorem 7.2.1 it can be assumed that the PRAM is a CREW PRAM. Consider any n and let m = Z(n) and t = T(n). The computations of the PRAM on inputs of length n can be simulated by the circuit c_n of Figure 7.6.1.

Figure 7.6.1

A circuit for simulating a computation of a PRAM.

The Structure of c_n

The circuit c_n has an underlying structure similar to the circuit c_n in the proof of Lemma 7.5.1 (see Figure 7.5.1). It consists of t + 2 subcircuits, namely, IN, STEP₁, . . . , STEP_t, and OUT. IN considers a given input of length n as an encoding of some input (v₁, . . . , v_N) of the PRAM, and determines the initial configuration of the PRAM. STEP_i determines the configuration that the PRAM reaches after its ith step. OUT extracts the output of the PRAM from the output of STEP_t.

Each configuration of the PRAM is assumed to have the form (i₁, X(i₁), i₂, X(i₂), . . . ; j₁, Y(j₁), j₂, Y(j₂), . . . ; k₁, A(k₁), k₂, A(k₂), . . . ; r₁, V₁(r₁), r₂, V₁(r₂), . . . ; . . . ; r₁, V_m(r₁), r₂, V_m(r₂), . . . ), where V_i(j) is assumed to be the value of the jth local variable of processor M_i.

STEP_i consists of three layers, namely, READ, SIMULATE, and WRITE. The READ layer simulates the reading, from the input cells and shared memory cells, that takes place during the ith step of the simulated computation. The SIMULATE layer simulates the internal computation that takes place during the ith step by the processors M₁, . . . , M_m. The WRITE layer simulates the writing, to the output cells and shared memory cells, that takes place during the ith step of the simulated computation.

With no loss of generality, it is assumed that in each step processor M_i reads from the input cell X(V_i(1)) into V_i(1), and from the shared memory cell A(V_i(2)) into V_i(2). Similarly, it is assumed that in each step M_i writes the value of V_i(3) into the output cell Y(V_i(4)), and the value of V_i(5) into A(V_i(6)).

SIMULATE contains a subcircuit SIM_RAM for each of the processors M₁, . . . , M_m. The internal computation of processor M_j is simulated by a SIM_RAM whose input is (i₁, V_j(i₁), i₂, V_j(i₂), . . .). With no loss of generality it is assumed that the index j of M_j is stored in V_j(7).
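The three layers of STEP_i can be illustrated with a small sequential sketch of one synchronous step of a toy CREW PRAM. This is only an illustration under stated assumptions, not the circuit construction itself: the function names `step`, `double_input`, and `demo`, the use of dictionaries for the cells, and the default value 0 for an unwritten shared cell are all hypothetical choices made here. The local variables follow the conventions of the text, with V_i(1), . . . , V_i(6) used for reading and writing and V_i(7) holding the processor index.

```python
def step(X, Y, A, V, internal):
    """Simulate one synchronous step of a toy CREW PRAM in three layers.

    X, Y, A: dicts for input cells, output cells, and shared memory cells.
    V: list of dicts, one per processor, mapping variable index -> value.
    internal: hypothetical function modelling a processor's internal computation.
    """
    # READ layer: M_i reads X(V_i(1)) into V_i(1) and A(V_i(2)) into V_i(2).
    for v in V:
        v[1] = X[v[1]]
        v[2] = A.get(v[2], 0)  # assumption: unwritten shared cells read as 0

    # SIMULATE layer: each processor computes on its own local variables only.
    V = [internal(dict(v)) for v in V]

    # WRITE layer: M_i writes V_i(3) into Y(V_i(4)) and V_i(5) into A(V_i(6)).
    for targets in ([v[4] for v in V], [v[6] for v in V]):
        # CREW: no two processors may write to the same cell in one step.
        assert len(set(targets)) == len(targets), "writes must be exclusive"
    for v in V:
        Y[v[4]] = v[3]
        A[v[6]] = v[5]
    return V


def double_input(v):
    """Toy internal computation: double the value that was just read."""
    v[3] = 2 * v[1]  # value for the output cell
    v[4] = v[7]      # output cell index = processor index
    v[5] = v[1]      # value for the shared memory cell
    v[6] = v[7]      # shared cell index = processor index
    return v


def demo():
    X = {1: 5, 2: 7}
    Y, A = {}, {}
    V = [{1: j, 2: j, 3: 0, 4: 0, 5: 0, 6: 0, 7: j} for j in (1, 2)]
    step(X, Y, A, V, double_input)
    return Y, A
```

In the circuit c_n each of these loops corresponds to a layer of gates working on all processors at once; the sequential loops here are safe only because reads do not change the cells and writes are exclusive.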

The Complexity of c_n

The circuits IN, READ, WRITE, and OUT can each simulate an O(log (nZ(n)T(n))) space-bounded, deterministic Turing transducer that carries out the desired task. The simulations can be as in the proof of Lemma 7.5.3. Hence, each of these circuits has size no greater than (nZ(n)T(n))^O(1) ≤ (Z(n)T(n))^O(1) and depth no greater than (log (nZ(n)T(n)))^O(1) ≤ T^O(1)(n). SIM_RAM can simulate a processor M_i indirectly as in the proof of Lemma 7.5.1, through a deterministic Turing transducer equivalent to M_i. Hence, each SIM_RAM has size no greater than T^O(1)(n).
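The bounds on the subcircuits can be combined into bounds for c_n as a whole. The following is a sketch of that arithmetic, assuming that n is at most polynomial in Z(n)T(n) and that log Z(n) ≤ T(n), and using the fact that the depth of a circuit never exceeds its size (so each SIM_RAM also has depth at most T^O(1)(n)). Here m = Z(n) and t = T(n) as in the proof.

```latex
\begin{align*}
\mathrm{size}(c_n)
  &\le \mathrm{size}(\mathrm{IN}) + \mathrm{size}(\mathrm{OUT})
     + t\bigl(\mathrm{size}(\mathrm{READ}) + m\cdot\mathrm{size}(\mathrm{SIM\_RAM})
     + \mathrm{size}(\mathrm{WRITE})\bigr) \\
  &\le 2\,(Z(n)T(n))^{O(1)}
     + T(n)\bigl(2\,(Z(n)T(n))^{O(1)} + Z(n)\,T^{O(1)}(n)\bigr)
   = (Z(n)T(n))^{O(1)}, \\[4pt]
\mathrm{depth}(c_n)
  &\le \mathrm{depth}(\mathrm{IN}) + \mathrm{depth}(\mathrm{OUT})
     + t\bigl(\mathrm{depth}(\mathrm{READ}) + \mathrm{depth}(\mathrm{SIM\_RAM})
     + \mathrm{depth}(\mathrm{WRITE})\bigr) \\
  &\le 2\,T^{O(1)}(n) + T(n)\cdot 3\,T^{O(1)}(n)
   = T^{O(1)}(n).
\end{align*}
```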

From Uniform Families of Circuits to PRAM's

The previous theorem considered the simulation of PRAM's by uniform families of circuits. The next theorem considers simulations in the other direction.

Theorem 7.6.2

Proof Consider any uniform family C = (c₀, c₁, c₂, . . . ) of circuits with size complexity Z(n) and depth complexity D(n). Let T be an S(n) = O(log Z(n)) space-bounded, deterministic Turing transducer that computes { (1ⁿ, c_n) | n ≥ 0 }. From T a CREW PRAM <M, X, Y, A> of size complexity Z^O(1)(n) and time complexity D(n) log^O(1) Z(n) can be constructed to simulate the computations of C in a straightforward manner.

The Simulation of Gate g_i by Processor M_i

Specifically, for each gate g_i in c_n, the PRAM employs a corresponding processor M_i and a corresponding shared memory cell A(i). The processor M_i is used for simulating the operation of g_i, and the cell A(i) is used for recording the outcome of the simulation.

At the start of each simulation, M_i initializes the value of A(i) to 2, as an indication that the output of g_i is not available yet. Then M_i waits until its operands become available, that is, until its operands reach values that differ from 2. M_i has the input cell X(j) as an operand if g_i gets an input from the jth input node x_j. M_i has the shared memory cell A(j) as an operand if g_i gets an input from the jth gate g_j. When its operands become available, M_i performs on them the same operation as does g_i. M_i stores the result in Y(j), if g_i is the jth output node of c_n. Otherwise, M_i stores the result in A(i).
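The waiting scheme described above can be sketched as a round-by-round simulation. This is a minimal sketch under stated assumptions: the names `simulate_circuit`, `xor_circuit`, and the operand encoding `("x", j)` / `("g", j)` are hypothetical, and the parallel processors are modelled by letting every M_i read from a snapshot of the shared memory taken at the start of each round. The sentinel value 2 and the rule for writing to Y(j) or A(i) follow the text.

```python
UNKNOWN = 2  # sentinel from the text: the output of g_i is not available yet


def simulate_circuit(gates, x, outputs):
    """Simulate a Boolean circuit with one (virtual) processor per gate.

    gates: dict i -> (kind, operands); kind is "not", "and", or "or";
           each operand is ("x", j) for input node x_j or ("g", j) for gate g_j.
    x: dict j -> 0/1 input values.
    outputs: dict i -> j, meaning gate g_i is the jth output node.
    Returns (Y, rounds).
    """
    A = {i: UNKNOWN for i in gates}  # each M_i initializes A(i) to 2
    Y, done, rounds = {}, set(), 0
    while len(done) < len(gates):
        rounds += 1
        snapshot = dict(A)  # all processors read the same pre-round memory
        progressed = False
        for i, (kind, operands) in gates.items():
            if i in done:
                continue
            vals = [x[j] if src == "x" else snapshot[j] for src, j in operands]
            if UNKNOWN in vals:
                continue  # M_i keeps waiting for its operands
            if kind == "not":
                result = 1 - vals[0]
            elif kind == "and":
                result = vals[0] & vals[1]
            else:
                result = vals[0] | vals[1]
            if i in outputs:
                Y[outputs[i]] = result  # g_i is an output node: write to Y(j)
            else:
                A[i] = result           # otherwise record the outcome in A(i)
            done.add(i)
            progressed = True
        if not progressed:
            raise ValueError("no gate can fire; the circuit has a cycle")
    return Y, rounds


def xor_circuit():
    """Hypothetical example circuit computing x_1 XOR x_2 with five gates."""
    gates = {
        1: ("not", [("x", 1)]),
        2: ("not", [("x", 2)]),
        3: ("and", [("x", 1), ("g", 2)]),
        4: ("and", [("g", 1), ("x", 2)]),
        5: ("or", [("g", 3), ("g", 4)]),
    }
    return gates, {5: 1}
```

In this sketch the number of rounds equals the depth of the circuit, which mirrors the D(n) factor in the time bound; the remaining polylogarithmic factor in the text accounts for costs that the sketch does not model.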

The Identification of Gate g_i by Processor M_i

Before the start of a simulation of c_n, the PRAM determines, for each gate g_i in c_n, the type t ∈ {¬, ∨, ∧} of g_i and the predecessors g_L and g_R of g_i. The PRAM does so by determining in parallel the output of T on input 1ⁿ, and communicating each substring of the form (g_i) and each substring of the form (g_i, t, g_L, g_R) in the output to the corresponding processor M_i.

The PRAM determines the output of T by employing a group B₁, . . . , B_{O(Z(n) log Z(n))} of processors. The task of processor B_j is to determine the jth symbol in the output of T.

B_j, in turn, employs a processor B_ja for each symbol a in the output alphabet of T. The task of B_ja is to notify B_j whether the jth symbol in the output of T is the symbol a. B_ja does so by simulating a log Z(n) space-bounded Turing machine M_T that accepts the language { 1ⁿ | a is the jth symbol in the output of T }. The simulation is performed in parallel by a group of processors that uses an approach similar to that described in the proof of Lemma 7.5.3.

Once the output of T is determined, each processor B_j that holds the symbol "(" communicates the string "(g_i )" that is held by B_j, . . . , B_{j+|(g_i )|-1} to the corresponding processor M_i of the PRAM.

Finally, each processor M_i that has received a string of the form (g_i, t, g_L, g_R) communicates with its predecessors to determine the input nodes of c_n.
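The step in which each B_j holding "(" forwards a parenthesized substring to the matching M_i can be sketched as follows. The concrete encoding is an assumption made only for illustration: gates are named by decimal integers, fields are separated by commas, and a string such as "(3,and,1,2)" stands for (g_i, t, g_L, g_R). The function name `dispatch` and the `mailbox` dictionary are likewise hypothetical. Each position j is handled independently of the others, as it would be by a separate processor B_j.

```python
def dispatch(output):
    """Deliver each parenthesized substring of T's output to its gate's mailbox.

    output: string over a hypothetical encoding such as "(1)(2)(3,and,1,2)".
    Returns a dict mapping gate index i to the list of field tuples sent to M_i.
    """
    mailbox = {}
    for j, symbol in enumerate(output):
        if symbol != "(":
            continue  # only a processor B_j that holds "(" takes action
        end = output.index(")", j)        # last symbol of the substring
        fields = tuple(output[j + 1:end].split(","))
        i = int(fields[0])                # first field names the gate g_i
        mailbox.setdefault(i, []).append(fields)
    return mailbox
```

For example, on the string "(1)(2)(3,and,1,2)" the processors at positions 0, 3, and 6 are the ones that hold "(", and gate 3 receives the tuple ("3", "and", "1", "2").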
