**8.4 A Review of Probability Theory**

Let Ω denote a finite collection of mutually exclusive statements about the world. By ℰ = 2^Ω we denote the set of all events. The empty set ∅, by definition a subset of every set, is called the impossible event, since the outcome of a random selection can never be an element of ∅. On the other hand, the set Ω itself always contains the actual outcome; it is therefore called the certain event. If A and B are events, then so are the union A ∪ B and the complements Ā and B̄. For example, the event A ∪ B occurs if and only if A occurs or B occurs. We call the pair (Ω, ℰ) the sample space. Define a function P: ℰ → [0, 1] to be a probability if it satisfies the following conditions, which are well known as the Kolmogorov axioms:

(1) P(A) ≥ 0 for all A ∈ ℰ

(2) P(Ω) = 1

(3) For A, B ∈ ℰ, from A ∩ B = ∅ it follows that

P(A ∪ B) = P(A) + P(B)

P(A) or P(B) is known as the prior probability of A or B occurring. The prior probability of an event is not conditioned on the occurrence of any other event. Suppose it is noted in an experiment that for 1 ≤ i ≤ n, the event Ai occurred ki times. Then under the conventional evaluation, called the maximum likelihood evaluation:

Pm(Ai) = ki / (k1 + k2 + ... + kn)

but under an alternative evaluation, called the Bayesian evaluation:

Pb(Ai) = (ki + 1) / (k1 + k2 + ... + kn + n)

Under this evaluation, we implicitly assume that each event has already occurred once even before the experiment commenced. When k1 + k2 + ... + kn → ∞,

Pm(Ai) = Pb(Ai)

Nevertheless, for any finite counts,

0 < Pb(Ai) < 1 ,

whereas Pm(Ai) can take the extreme values 0 and 1.
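The two evaluations can be sketched in Python (a minimal sketch; the function names `p_ml` and `p_bayes` are illustrative, not from the text):

```python
# Sketch: maximum likelihood vs. Bayesian evaluation of event
# probabilities from observed counts k_i.

def p_ml(counts, i):
    """Maximum likelihood evaluation: Pm(Ai) = ki / (k1 + ... + kn)."""
    return counts[i] / sum(counts)

def p_bayes(counts, i):
    """Bayesian evaluation: Pb(Ai) = (ki + 1) / (k1 + ... + kn + n),
    as if each event had already occurred once before the experiment."""
    return (counts[i] + 1) / (sum(counts) + len(counts))

counts = [0, 3, 7]          # k_i for events A1, A2, A3
print(p_ml(counts, 0))      # 0.0 -- ML can assign probability zero
print(p_bayes(counts, 0))   # (0+1)/(10+3), strictly inside (0, 1)
```

Note how an unobserved event (k = 0) gets probability 0 under maximum likelihood but a small positive probability under the Bayesian evaluation.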

Let P(A|B) denote the probability of event A occurring conditioned on event B having already occurred. P(A|B) is known as the posterior probability of A subject to B, or the conditional probability of A given B.

For a single event A ∈ ℰ, the following hold:

P(A) ≤ 1

P(Ā) = 1 − P(A)

For A ⊆ B and A, B ∈ ℰ,

P(A) ≤ P(B) , (monotonicity)

P(B ∩ Ā) = P(B) − P(A) , (subtractivity)

and for arbitrary A, B ∈ ℰ,

P(A ∪ B) ≤ P(A) + P(B) , (subadditivity)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Finally, for a number of events {Ai | i = 1, ..., n},

P(A1 ∪ A2 ∪ ... ∪ An) = S1 − S2 + S3 − S4 + ... + (−1)^(n−1) Sn

where S1 = Σi P(Ai) ,

S2 = Σi<j P(Ai ∩ Aj) ,

S3 = Σi<j<k P(Ai ∩ Aj ∩ Ak) , ... ,

Sn = P(A1 ∩ A2 ∩ ... ∩ An)
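This inclusion-exclusion formula can be checked numerically on a small finite sample space with equally likely outcomes (a sketch; the particular sets are illustrative):

```python
# Sketch: verify inclusion-exclusion, P(A1 ∪ ... ∪ An) = S1 - S2 + S3 - ...,
# on a finite sample space with equally likely outcomes.
from itertools import combinations

omega = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}]

def prob(a):
    # P(A) = |A| / |Omega| under the uniform measure
    return len(a) / len(omega)

# Left side: P(A1 ∪ A2 ∪ ... ∪ An) computed directly.
lhs = prob(set().union(*events))

# Right side: alternating sum of S_r, where S_r sums P over all
# r-wise intersections of the events.
n = len(events)
rhs = 0.0
for r in range(1, n + 1):
    s_r = sum(prob(set.intersection(*c)) for c in combinations(events, r))
    rhs += (-1) ** (r - 1) * s_r

print(lhs, rhs)  # both 0.8
```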

For conditional probability P(A|B), with A, B ∈ ℰ and P(B) > 0, define

P(A|B) = P(A ∩ B) / P(B) .

We then have

P(A1 ∩ A2 ∩ ... ∩ An)

= P(A1) · P(A2|A1) · P(A3|A1 ∩ A2) ··· P(An|A1 ∩ A2 ∩ ... ∩ An−1)

where Ai ∈ ℰ.
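This product rule for intersections can likewise be checked on a finite sample space with equally likely outcomes (a sketch; the sets and names are illustrative):

```python
# Sketch: check P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2|A1) · P(A3|A1 ∩ A2)
# on a finite sample space with equally likely outcomes.
omega = set(range(12))
a1, a2, a3 = {0, 1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {4, 5, 6, 7}

def prob(s):
    return len(s) / len(omega)

def cond(a, b):
    # P(A|B) = P(A ∩ B) / P(B); the |Omega| factors cancel.
    return len(a & b) / len(b)

direct = prob(a1 & a2 & a3)
chain = prob(a1) * cond(a2, a1) * cond(a3, a1 & a2)
print(direct, chain)  # both equal 1/6
```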

If Ai ∈ ℰ for i = 1, ..., n, and

Ai ∩ Aj = ∅ for i ≠ j, and A1 ∪ A2 ∪ ... ∪ An = Ω ,

and P(Ai) > 0, then for any given B ∈ ℰ,

P(B) = Σi P(Ai) · P(B|Ai)


This is the complete (total) probability of event B. Therefore the conditional probability can be written as

P(Ai|B) = [P(Ai) · P(B|Ai)] / [Σj P(Aj) · P(B|Aj)]

This is the Bayes formula. A number of different versions of this formula will be discussed.
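The complete-probability and Bayes formulas together can be sketched as follows (the priors and likelihoods are illustrative numbers, not from the text):

```python
# Sketch: total probability and the Bayes formula for a partition
# A1, ..., An of Omega.

def total_probability(priors, likelihoods):
    """P(B) = sum_i P(Ai) * P(B|Ai)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods, i):
    """P(Ai|B) = P(Ai) * P(B|Ai) / sum_j P(Aj) * P(B|Aj)."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)

priors = [0.5, 0.3, 0.2]        # P(Ai): a partition of Omega
likelihoods = [0.9, 0.5, 0.1]   # P(B|Ai)
print(total_probability(priors, likelihoods))  # 0.45 + 0.15 + 0.02 = 0.62
print(bayes(priors, likelihoods, 0))           # 0.45 / 0.62
```

The posterior probabilities P(Ai|B) over the partition sum to 1, as they must.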

As P(Ā) = 1 − P(A) and P(A|B) = P(A ∩ B) / P(B) ,

it can be derived that

P(Ā|B) = 1 − P(A|B)

Definition: The prior odds on event A are

O(A) = P(A) / P(Ā) .

Since P(Ā) = 1 − P(A) ,

O(A) = P(A) / (1 − P(A))

Therefore P(A) can be represented by its prior odds:

P(A) = O(A) / (1 + O(A))

Definition: The posterior odds on event A conditioned on event B are

O(A|B) = P(A|B) / P(Ā|B) .

Similarly,

O(A|B) = P(A|B) / (1 − P(A|B)) , and thus

P(A|B) = O(A|B) / (1 + O(A|B))
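The conversion between a probability and its odds is a one-line computation in each direction (a minimal sketch; helper names are illustrative):

```python
# Sketch: converting between a probability and its odds.

def odds(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1.0 - p)

def prob_from_odds(o):
    """P(A) = O(A) / (1 + O(A))."""
    return o / (1.0 + o)

p = 0.8
print(odds(p))                  # 4.0
print(prob_from_odds(odds(p)))  # 0.8 again; the same formulas apply
                                # to the posterior odds O(A|B)
```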

Assume event A is a hypothesis h and event B is a piece of evidence e. With the definition of conditional probability, the following hold:

P(h|ē) = P(h ∩ ē) / P(ē) ,

P(h̄|ē) = P(h̄ ∩ ē) / P(ē) ,

P(h|e) = P(h ∩ e) / P(e) , and

P(h̄|e) = P(h̄ ∩ e) / P(e)

The odds on h conditioned on e being absent are obtained by:

O(h|ē) = P(h|ē) / P(h̄|ē) = [P(ē|h) · P(h)] / [P(ē|h̄) · P(h̄)]

= [P(ē|h) / P(ē|h̄)] · O(h)

This is called an odds-likelihood formulation of Bayes' theorem. Depending on the context, the following expressions can be used synonymously: e does not occur, e is absent, e does not exist, and e is false.

Similarly,

O(h|e) = [P(e|h) / P(e|h̄)] · O(h) .

This is called an odds-likelihood formulation of Bayes' theorem. The following expressions can be used synonymously: e occurs, e is present, e exists, and e is true.
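The odds-likelihood update for a single piece of evidence can be sketched as follows (all numbers are illustrative):

```python
# Sketch: the odds-likelihood form of Bayes' theorem for one piece of
# evidence e: O(h|e) = [P(e|h) / P(e|h_bar)] * O(h).

def update_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Multiply the prior odds on h by the likelihood ratio of e."""
    return (p_e_given_h / p_e_given_not_h) * prior_odds

o_h = 0.25                              # prior odds O(h), i.e. P(h) = 0.2
o_h_given_e = update_odds(o_h, 0.8, 0.2)
print(o_h_given_e)                      # likelihood ratio 4.0, so odds 1.0

# Converting back: P(h|e) = O(h|e) / (1 + O(h|e))
print(o_h_given_e / (1 + o_h_given_e))  # 0.5
```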

For a hypothesis supported by multiple pieces of evidence, generalizing the above, we have

O(h|ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [P(ē1 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em | h) / P(ē1 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em | h̄)] · O(h) .

When all pieces of evidence are mutually independent given h and given h̄,

O(h|ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [∏i=1..k P(ēi|h) / P(ēi|h̄)] · [∏i=k+1..m P(ei|h) / P(ei|h̄)] · O(h)
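Under the independence assumption, each present or absent piece of evidence contributes one likelihood ratio, and the ratios simply multiply. A minimal sketch (the ratios are illustrative numbers):

```python
# Sketch: combining multiple mutually independent pieces of evidence in
# the odds-likelihood form -- the likelihood ratios multiply.

def combined_odds(prior_odds, likelihood_ratios):
    """O(h | e1 ∩ ... ∩ em) = (product of ratios) · O(h), valid when the
    evidences are mutually independent given h and given h_bar."""
    o = prior_odds
    for lr in likelihood_ratios:
        o *= lr
    return o

# P(ei|h)/P(ei|h_bar) for present evidence, P(ēi|h)/P(ēi|h_bar) for
# absent evidence:
ratios = [4.0, 2.5, 0.5]           # the last piece of evidence is absent
print(combined_odds(1.0, ratios))  # 4.0 * 2.5 * 0.5 = 5.0
```

A ratio above 1 raises the odds on h, a ratio below 1 lowers them, and the order in which the evidence arrives does not matter.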