The study of sequential computations has been conducted through abstract models, namely RAM's and Turing transducers. RAM's turned out to be useful for designing such computations, and Turing transducers were helpful in analyzing them. Viewing parallel computations as generalizations of sequential ones, calls for similar generalizations to the associated models. A generalization along such lines to RAM's is introduced below.
A parallel random access machine, or PRAM, is a system <M, X, Y, A>, of
infinitely many RAM's M1, M2, . . . , infinitely many input cells X(1), X(2), . . . ,
infinitely many output cells Y(1), Y(2), . . . , and infinitely many shared memory cells
A(1), A(2), . . . Each Mi is called a processor of
. All the processors M1, M2, . . . are
assumed to be identical, except for the ability of each Mi to recognize its own index
i.
At the start of a computation, is presented with some N input values that are stored
in X(1), . . . , X(N), respectively. At the end of the computation the output values are
stored in Y(1), . . . , Y(K), K
0. During the computation
uses m processors
M
1, . . . , Mm, where m depends only on the input. It is assumed that each of the
processors is aware of the number N of the given input values and of the number m of
processors.
Each step in a computation consists of the five following phases, carried in parallel by all the processors.
The depth of a computation of a PRAM, and its depth and size complexity are defined with respect to the length n of the inputs in a similar way to that for parallel programs. The time requirement of a computation of a PRAM and its time complexity, under the uniform and logarithmic cost criteria, are defined in the obvious way.
When no confusion arises, because of the obvious relationship between the number N of values in the input and the length n of the input, N and n are used interchangeably.
Example 7.2.1
COMMON and PRIORITY PRAM's, = <M, X, Y, A> similar to the parallel
programs
i = <P, X, Y> of Example 7.1.1, can be used to solve the problem Q of
selecting the smallest element in a given set S of cardinality N. The communication
between the processors can be carried indirectly through the shared memory cells
A(1), A(2), . . .
Specifically, each message to Mi is stored in A(i). Where the PRAM's are similar to
1, no problem arises because all the messages are identical. Alternatively, where the
PRAM's are similar to
2 or
3, no problem arises because no write conflicts occur.
By definition, each CREW PRAM is also a COMMON PRAM, and each COM- MON PRAM is also a PRIORITY PRAM. The following result shows that each PRIORITY PRAM can be simulated by a CREW PRAM.
Theorem 7.2.1
Each PRIORITY PRAM of size complexity Z(n) and depth complexity D(n) has an
equivalent CREW PRAM
' of size complexity Z2(n) and depth complexity
D(n)log Z(n).
Proof
Consider any PRIORITY PRAM = <M, X, Y, A>.
' = <M ', X, Y, A> can be a
CREW PRAM of the following form, whose processors are denoted M1 1, . . . , M1 m,
M
2 1, . . . , M2 m, . . . , Mm 1, . . . , Mm m.
On a given input, ' simulates the computation of
.
' uses the processors
M
1 1, M2 2, . . . , Mm m for simulating the respective processors M1, M2, . . . , Mm of
. Similarly,
' records in the shared memory cell A(m2 + i) the values that
records in the
shared memory cell A(i), i
1. The main difference arises when a write is to be
simulated.
In such a case, each Mj j communicates to each Mi i the address of the cell where Mj wants to write. Each Mi i then determines from those addresses, if it has the highest priority of writing into its designated cell, and accordingly decides whether to perform the write operation. Mi i employs the processors Mi 1, . . . , Mi i-1 and the shared memory cells A(i, 1), . . . , A(i, i - 1) for such a purpose, where A(i1, i2) stands for A((i1 - 1)m + i2).
To resolve the write conflicts each Mj j stores into A(j, j) the address where Mj wants to write (see Figure 7.2.1).
![]()
|
Each processor Mir, 1 r < i, starts by reading the addresses stored in A(r, r) and in
A(i, i), and storing 1 in A(i, r) if and only if A(r, r) = A(i, i). Then the processors
Mi1, . . . , Mi
(i-1)/2
are employed in parallel to determine in O(log i) steps whether the
value 1 appears in any of the shared memory cells A(i, 1), . . . , A(i, i - 1). At each step the
number of "active" processors and the number of the active shared memory cells
is reduced by half. At any given step each of the active processors Mir stores
the value 1 in A(i, r) if and only if either A(i, 2r - 1) or A(i, 2r) holds that
value.
Mi i determines that A(i, i) holds an address that differs from those stored in
A(1, 1), . . . , A(i - 1, i - 1) by determining that A(i, 1) holds a value that differs from 1. In
such a case, Mi i determines that Mi has the highest priority for writing in its designated
cell. Otherwise, Mi i determines that Mi does not have the highest priority.