
6.2  Probabilistic Programs That Might Err


For many practical purposes, programs may be allowed a small probability of erring on some inputs.

Example 6.2.1    A brute-force algorithm for solving the nonprimality problem takes exponential time (see Example 5.1.3). The program in Figure 6.2.1 is an example of a probabilistic program that determines the nonprimality of numbers in polynomial expected time.


 
read x
y := random({2, . . . , √x})
if x is divisible by y then
    answer := yes       /* not a prime number */
else answer := no
 
Figure 6.2.1 An undesirable probabilistic program for the nonprimality problem.

The program has zero probability for an error on inputs that are prime numbers. However, for infinitely many nonprime numbers the program has a high probability of giving a wrong answer. Specifically, the probability for an error on a nonprime number m is 1 - s/(√m - 1), where s denotes the number of distinct divisors of m in {2, . . . , √m}. In particular, the probability for an error reaches the value of 1 - 1/(√m - 1) for those numbers m that are squares of prime numbers.

The probability of getting a wrong answer for a given number m can be reduced by executing the program k times. In such a case, the number m is declared to be nonprime with full confidence if the answer yes is obtained in any of the k executions. Otherwise, m is determined to be prime, with probability of at most (1 - 1/(√m - 1))^k for an error. With k = c(√m - 1) this probability approaches the value of (1/e)^c < 0.37^c as m increases, where e is the constant 2.71828 . . . However, such a value for k is prohibitively high, because it is exponential in the length of the representation of m, that is, in log m.
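The following Python sketch illustrates this repetition scheme; the function names naive_nonprime and repeated_naive_nonprime are introduced here only for illustration, and the input is assumed to satisfy x >= 4 so that the sampling range is nonempty.

    import math
    import random

    def naive_nonprime(x):
        # One execution of the program in Figure 6.2.1: sample a candidate
        # divisor from {2, . . . , floor(sqrt(x))} and test divisibility.
        y = random.randint(2, math.isqrt(x))
        return x % y == 0        # yes: x is certainly not a prime number

    def repeated_naive_nonprime(x, k):
        # Declare x nonprime if any of k independent executions answers yes;
        # otherwise declare it prime, with error probability at most
        # (1 - 1/(sqrt(x) - 1))**k when x is nonprime.
        return any(naive_nonprime(x) for _ in range(k))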

An improved probabilistic program can be obtained by using the following known result.

Result     Let W_m(b) be a predicate that is true if and only if either of the following two conditions holds.

  1.  (b^(m-1) - 1) mod m ≠ 0.
  2.  1 < gcd(b^t - 1, m) < m for some t and i such that m - 1 = t · 2^i.
Then for each integer m >= 2 the conditions below hold.
  1.  m is a prime number if and only if W_m(b) is false for all b such that 2 <= b < m.
  2.  If m is not prime, then the set { b | 2 <= b < m, and W_m(b) holds } is of cardinality at least (3/4)(m - 1).

The result implies the probabilistic program in Figure 6.2.2.


 
read x
y := random({2, . . . , x - 1})
if W_x(y) then answer := yes
else answer := no
 
Figure 6.2.2 A good probabilistic program for the nonprimality problem.

For prime numbers m the program always provides the right answer. On the other hand, for nonprime numbers m the program provides the right answer with probability of at least (3/4)(m - 1)/(m - 2), that is, with probability of at most 1 - (3/4)(m - 1)/(m - 2) <= 1/4 for an error.

The probability for a wrong answer can be reduced to any desired constant ε by executing the program k >= log_(1/4) ε times. For instance, an error probability of at most 10^(-6) requires only k = 10 executions, since (1/4)^10 < 10^(-6). That is, the number of times k that the program has to be executed is independent of the input m.

Checking the condition (b^(m-1) - 1) mod m ≠ 0 can be done in polynomial time by using the relation (a + b) mod m = ( (a mod m) + (b mod m) ) mod m and the relation (ab) mod m = ( (a mod m)(b mod m) ) mod m. Checking the condition 1 < gcd(b^t - 1, m) < m can be done in polynomial time by using Euclid's algorithm (see Exercise 5.1.1). Consequently, the program in Figure 6.2.2 is of polynomial time complexity.  ***
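A Python sketch of the program in Figure 6.2.2, together with its k-fold repetition, might look as follows. It relies only on the predicate stated in the result above; the names W and probably_nonprime are introduced here for illustration, and the input is assumed to satisfy m >= 3.

    import math
    import random

    def W(m, b):
        # Condition 1: (b**(m-1) - 1) mod m != 0, checked with fast modular
        # exponentiation.
        if pow(b, m - 1, m) != 1:
            return True
        # Condition 2: 1 < gcd(b**t - 1, m) < m for some t and i with
        # m - 1 = t * 2**i.
        t = m - 1
        while True:
            g = math.gcd(pow(b, t, m) - 1, m)
            if 1 < g < m:
                return True
            if t % 2 != 0:
                return False
            t //= 2

    def probably_nonprime(m, k):
        # A single yes answer is conclusive; otherwise m is declared prime,
        # with error probability at most (1/4)**k.
        for _ in range(k):
            b = random.randint(2, m - 1)
            if W(m, b):
                return True      # m is certainly not a prime number
        return False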

Example 6.2.2    Consider the problem of deciding, for any given matrices A, B, and C, whether AB ≠ C. A brute-force algorithm for deciding the problem can compute D = AB and E = D - C, and then check whether E differs from the N × N matrix whose entries are all 0. A brute-force multiplication of A and B requires O(N^3) time (under the uniform cost criterion), if A and B are of dimension N × N. Therefore, the brute-force algorithm for deciding whether AB ≠ C also takes O(N^3) time.

The inequality AB ≠ C holds if and only if the inequality

    (AB - C)x̂ ≠ 0

holds for some column vector x̂ = (x̂_1, . . . , x̂_N), where 0 stands for the column vector of N zeros. Consequently, the inequality AB ≠ C can be determined by a probabilistic program that determines
  1.  A column vector x̂ = (x̂_1, . . . , x̂_N) of random numbers from {-1, 1}.
  2.  The value of the vector ŷ = Bx̂.
  3.  The value of the vector ẑ = Aŷ.
  4.  The value of the vector û = Cx̂.
  5.  The value of the vector

          v̂ = ẑ - û = Aŷ - Cx̂ = ABx̂ - Cx̂ = (AB - C)x̂,

      whose ith entry equals d_i1 x̂_1 + . . . + d_iN x̂_N, where d_ij denotes the entry in row i and column j of the matrix AB - C.
If some of the entries of the vector v̂ are nonzero, then the probabilistic program reports that the inequality AB ≠ C must hold. Otherwise, the program reports that AB = C, with probability of at most 1/2 for an error. The program takes O(N^2) time.

The analysis of the program for its error probability relies on the following result.

Result     Let d_1 x_1 + . . . + d_N x_N = 0 be a linear equation with coefficients that satisfy the inequality (d_1, . . . , d_N) ≠ (0, . . . , 0). Then the linear equation has at most 2^(N-1) solutions (x̂_1, . . . , x̂_N) over {-1, 1}.

Proof     Consider any such linear equation d_1 x_1 + . . . + d_N x_N = 0. With no loss of generality assume that d_1 > 0.

If (x̂_1, . . . , x̂_N) is a solution to the linear equation over {-1, 1}, then (-x̂_1, x̂_2, . . . , x̂_N) does not solve the equation. On the other hand, if both (x̂_1, . . . , x̂_N) and (x̃_1, . . . , x̃_N) solve the equation over {-1, 1}, then the inequality (-x̂_1, x̂_2, . . . , x̂_N) ≠ (-x̃_1, x̃_2, . . . , x̃_N) holds whenever (x̂_1, . . . , x̂_N) ≠ (x̃_1, . . . , x̃_N).

As a result, each solution to the equation has an associated assignment that does not solve the equation. That is, at most half of the possible assignments over {-1, 1} to (x1, . . . , xN ) can serve as solutions to the equation.  ***
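For small N the bound can be checked exhaustively; the following sketch (with the illustrative name count_solutions) counts the assignments over {-1, 1} that satisfy a given equation.

    from itertools import product

    def count_solutions(d):
        # Count the assignments (x_1, . . . , x_N) over {-1, 1} satisfying
        # d_1*x_1 + . . . + d_N*x_N = 0.
        N = len(d)
        return sum(1 for x in product((-1, 1), repeat=N)
                   if sum(di * xi for di, xi in zip(d, x)) == 0)

For every coefficient vector d that is not all zeros, count_solutions(d) is at most 2**(len(d) - 1), in agreement with the result above.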

The probabilistic program can be executed k times on any given triplet A, B, and C. In such a case, if any of the executions results in a nonzero vector v̂, then AB ≠ C must hold. Otherwise, AB = C with probability of at most (1/2)^k for an error. That is, by repeatedly executing the probabilistic program, one can reduce the error probability to any desired magnitude. Moreover, the resulting error probability of (1/2)^k is guaranteed for all the possible choices of matrices A, B, and C.  ***
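A Python sketch of this repeated check might look as follows, representing A, B, and C as N × N lists of lists; the name probably_not_equal is introduced here for illustration. Each execution performs only three matrix-by-vector products, so it runs in O(N^2) time.

    import random

    def probably_not_equal(A, B, C, k):
        # Repeat the O(N^2) randomized check k times.
        N = len(A)

        def mat_vec(M, x):
            # Ordinary matrix-by-vector product.
            return [sum(M[i][j] * x[j] for j in range(N)) for i in range(N)]

        for _ in range(k):
            x = [random.choice((-1, 1)) for _ in range(N)]
            v = [zi - ui for zi, ui in zip(mat_vec(A, mat_vec(B, x)),
                                           mat_vec(C, x))]
            if any(entry != 0 for entry in v):
                return True      # AB != C holds with certainty
        return False             # AB = C, with error probability at most (1/2)**k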

 Error Probability of Repeatedly Executed Probabilistic Programs

The probabilistic programs in the previous two examples exhibit the property of one-sided error probability. Specifically, their yes answers are always correct. On the other hand, their no answers might sometimes be correct and sometimes wrong. In general, however, probabilistic programs might err on more than one kind of answer. In such a case, an answer can be arrived at by taking the majority of the answers obtained in repeated computations on the given instance, as sketched below.
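A minimal sketch of such a majority vote, assuming that prog(x) denotes one independent computation of the probabilistic program on input x, and using the illustrative name majority_output:

    from collections import Counter

    def majority_output(prog, x, N):
        # Run the program 2N + 1 times on x and return the output that occurs
        # at least N + 1 times, if any; otherwise the outcome is undefined.
        counts = Counter(prog(x) for _ in range(2 * N + 1))
        answer, occurrences = counts.most_common(1)[0]
        return answer if occurrences >= N + 1 else None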

The following lemma analyzes the probability of error in repeated computations to establish an answer by absolute majority.

Lemma 6.2.1    Consider any probabilistic program P. Assume that P has probability ε of providing an output that differs from y on input x. Then P has probability

    P(N, ε) = Σ_{k=0}^{N} C(2N + 1, k) ε^(2N+1-k) (1 - ε)^k

of having at least N + 1 computations with outputs that differ from y in any sequence of 2N + 1 computations of P on input x. Here C(2N + 1, k) denotes the binomial coefficient (2N + 1)!/( k!(2N + 1 - k)! ).

Proof     Let P, x, y, and ε be as in the statement of the lemma. Let f(N, ε, k) denote the probability that, in a sequence of 2N + 1 computations on input x, P will have exactly k computations with an output that is equal to y. Let P(N, ε) denote the probability that, in a sequence of 2N + 1 computations on input x, P will have at least N + 1 computations with outputs that differ from y.

By definition,

    P(N, ε) = (probability that in exactly 2N + 1 computations the output will not equal y)
              + (probability that in exactly 2N computations the output will not equal y)
              + . . .
              + (probability that in exactly N + 1 computations the output will not equal y)
            = (probability that in exactly 0 computations the output will equal y)
              + (probability that in exactly 1 computation the output will equal y)
              + . . .
              + (probability that in exactly N computations the output will equal y)
            = Σ_{k=0}^{N} f(N, ε, k).

The probability of having the answer y at, and only at, the i_1st, . . . , i_kth computations, in a sequence of 2N + 1 computations of P on input x, is equal to

    ε^(i_1 - 1)(1 - ε) ε^(i_2 - i_1 - 1)(1 - ε) ε^(i_3 - i_2 - 1)(1 - ε) . . . ε^(i_k - i_(k-1) - 1)(1 - ε) ε^(2N + 1 - i_k) = (1 - ε)^k ε^(2N + 1 - k),

where i_1, . . . , i_k are assumed to satisfy 1 <= i_1 < i_2 < . . . < i_k <= 2N + 1.

Each collection {C_1, . . . , C_(2N+1)} of 2N + 1 computations of P has

    (2N + 1)(2N)(2N - 1) . . . (2N + 1 - k + 1) = (2N + 1)!/(2N + 1 - k)!

possible arrangements C_(i_1) . . . C_(i_k) of k computations. Of these arrangements only

    (2N + 1)!/( k!(2N + 1 - k)! ) = C(2N + 1, k)

satisfy the condition i_1 < . . . < i_k. Consequently, in each sequence of 2N + 1 computations of P there are C(2N + 1, k) possible ways to obtain the output y exactly k times.

The result follows because f(N, ε, k) = (the number of possible sequences of 2N + 1 computations in which exactly k computations have the output y) times (the probability of having a sequence of 2N + 1 computations with exactly k outputs of y).  ***
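A short numerical sketch of P(N, ε), using the formula of the lemma (the function name majority_error is introduced here for illustration):

    from math import comb

    def majority_error(N, eps):
        # P(N, eps): the probability that at least N + 1 of 2N + 1 independent
        # computations produce an output that differs from y, when each
        # computation does so with probability eps.
        return sum(comb(2 * N + 1, k) * eps ** (2 * N + 1 - k) * (1 - eps) ** k
                   for k in range(N + 1))

For instance, majority_error(5, 0.25) is approximately 0.034, and majority_error(N, 0.5) evaluates to approximately 1/2 for every N, in agreement with Theorem 6.2.1 below.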

In general, we are interested only in probabilistic programs that run in polynomial time and whose error probability can be reduced to any desired constant by executing the program some polynomial number of times. The following theorem considers the usefulness of probabilistic programs that might err.

Theorem 6.2.1    Let P(N, ε) be as in the statement of Lemma 6.2.1. Then P(N, ε) has the following properties.

  1.  P(N, 1/2) = 1/2 for all N.
  2.  P(N, ε) approaches 0 for each constant ε < 1/2, when N approaches ∞.
  3.  P(N, 1/2 - 1/N) approaches a constant that is greater than 1/(2e^6), when N approaches ∞, where e is the constant 2.71828 . . .

Proof    

  1.  The equality P(N, 1/2) = 1/2 for all N is implied by the following relations, in which the second equality holds because all terms share the factor (1/2)^(2N+1) and C(2N + 1, k) = C(2N + 1, 2N + 1 - k), and the third equality is the binomial theorem.

          2P(N, 1/2) = 2 Σ_{k=0}^{N} C(2N + 1, k) (1/2)^(2N+1-k) (1 - 1/2)^k
                     = Σ_{k=0}^{N} C(2N + 1, k) (1/2)^(2N+1-k) (1 - 1/2)^k
                       + Σ_{k=N+1}^{2N+1} C(2N + 1, k) (1/2)^(2N+1-k) (1 - 1/2)^k
                     = ( 1/2 + (1 - 1/2) )^(2N+1)
                     = 1
  2.  The equality

          Σ_{k=0}^{2N+1} C(2N + 1, k) = 2^(2N+1)

      implies the following inequality for 0 <= k <= 2N + 1.

          C(2N + 1, k) <= 2^(2N+1)

      Consequently, the result is implied by the following relations, which hold for ε = 1/2 - δ.

          P(N, ε) = Σ_{k=0}^{N} C(2N + 1, k) ε^(2N+1-k) (1 - ε)^k
                  <= (N + 1) C(2N + 1, N) ε^(N+1) (1 - ε)^N
                  <= (N + 1) 2^(2N+1) ε^(N+1) (1 - ε)^N
                  = (N + 1) 2^(2N+1) ( (1 - 2δ)/2 )^(N+1) ( (1 + 2δ)/2 )^N
                  = 2 ( (1 - 2δ)/2 ) (N + 1) (1 - 2δ)^N (1 + 2δ)^N
                  = 2ε(N + 1)(1 - 4δ^2)^N

      Since 1 - 4δ^2 is a constant smaller than 1, the last expression approaches 0 when N approaches ∞.
  3.  The result follows from the relations below, because (1 - 2/N)^(N/2) approaches 1/e when N approaches ∞.

          P(N, 1/2 - 1/N) = Σ_{k=0}^{N} C(2N + 1, k) (1/2 - 1/N)^(2N+1-k) ( 1 - (1/2 - 1/N) )^k
                          = Σ_{k=0}^{N} C(2N + 1, k) (1/2 - 1/N)^(2N+1-k) (1/2 + 1/N)^k
                          = (1/2 - 1/N)^(2N+1) Σ_{k=0}^{N} C(2N + 1, k) ( (1/2 + 1/N)/(1/2 - 1/N) )^k
                          > (1/2 - 1/N)^(2N+1) Σ_{k=0}^{N} C(2N + 1, k)
                          = (1/2) (1/2 - 1/N)^(2N+1) Σ_{k=0}^{2N+1} C(2N + 1, k)
                          = (1/2) (1/2 - 1/N)^(2N+1) (1 + 1)^(2N+1)
                          = (1/2) (1 - 2/N)^(2N+1)
                          = (1/2) ( (1 - 2/N)^(N/2) )^(4 + 2/N)
                          >= (1/2) ( (1 - 2/N)^(N/2) )^6

      ***

According to Theorem 6.2.1(1), a probabilistic program must have an error probability ε(x) smaller than 1/2 in order for repeated executions to reduce the probability of obtaining a wrong solution. According to Theorem 6.2.1(2), an error probability ε(x) smaller than some constant ε < 1/2 allows the error to be reduced to any desired magnitude at a rate that is independent of the given input x. On the other hand, by Theorem 6.2.1(3) the required number of repetitions is bounded below by f(x) = 1/( (1/2) - ε(x) ), because the probability P(f(x), ε(x)) of obtaining a wrong solution through repeatedly running the program 2f(x) + 1 times is greater than 1/(2e^6). In particular, when f(x) grows faster than any polynomial in |x|, so does the required number of repetitions.

 Outputs of Probabilistic Programs

The previous theorems motivate the following definitions.

A probabilistic program P is said to have an output y on input x, if the probability of P having a computation with output y on input x is greater than 1/2. If no such y exists, then the output of P on input x is undefined.

A probabilistic program P is said to compute a function f(x) if P on each input x has probability 1 - ε(x) for an accepting computation with output f(x), where

  1.  ε(x) < 1/2 whenever f(x) is defined.
  2.  ε(x) is undefined whenever f(x) is undefined.
ε(x) is said to be the error probability of P. The error probability ε(x) is said to be a bounded-error probability if there exists a constant ε < 1/2 such that ε(x) <= ε for all x on which ε(x) is defined. P is said to be a bounded-error probabilistic program if it has a bounded-error probability.

By the previous discussion it follows that probabilistic programs that have both polynomial time complexity and bounded-error probability are "good" programs.
