**8.4 A Review of Probability Theory**

Let Ω denote a finite collection of mutually exclusive statements about the world. By ℰ = 2^Ω we denote the set of all events. The empty set ∅, by definition a subset of every set, is called the impossible event, since the outcome of a random selection can never be an element of ∅. On the other hand, the set Ω itself always contains the actual outcome; it is therefore called the certain event. If A and B are events, then so are the union A ∪ B and the complements Ā and B̄. For example, the event A ∪ B occurs if and only if A occurs or B occurs. We call the pair (Ω, ℰ) the sample space. Define a function P: ℰ → [0, 1] to be a probability if it satisfies the following conditions, which are well known as the Kolmogorov axioms:

(1) P(A) ≥ 0 for all A ∈ ℰ

(2) P(Ω) = 1

(3) For A, B ∈ ℰ, from A ∩ B = ∅ it follows that

P(A ∪ B) = P(A) + P(B)

P(A) or P(B) is known as the prior probability of A or B occurring. The prior probability of an event is not conditioned on the occurrence of any other event. Suppose it is noted in an experiment that for 1 ≤ i ≤ n, the event Ai occurred ki times. Then under the conventional evaluation, called the maximum likelihood evaluation:

Pm(Ai) = ki / (k1 + k2 + ... + kn)

but under an alternative evaluation, called the Bayesian evaluation:

Pb(Ai) = (ki + 1) / (k1 + k2 + ... + kn + n)

Under this evaluation, we implicitly assume that each event has already occurred once even before the experiment commenced. When k1 + k2 + ... + kn → ∞,

Pm(Ai) = Pb(Ai)

Nevertheless, for any finite counts,

0 < Pb(Ai) < 1 ,

whereas Pm(Ai) can take the extreme values 0 and 1.
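The two evaluations can be sketched in Python (a minimal sketch; the function names `p_ml` and `p_bayes` are illustrative, not from the text):

```python
# Sketch: maximum likelihood vs. Bayesian evaluation of event
# probabilities from observed counts k_i.

def p_ml(counts, i):
    """Maximum likelihood evaluation: Pm(Ai) = ki / (k1 + ... + kn)."""
    return counts[i] / sum(counts)

def p_bayes(counts, i):
    """Bayesian evaluation: Pb(Ai) = (ki + 1) / (k1 + ... + kn + n),
    as if each event had already occurred once before the experiment."""
    return (counts[i] + 1) / (sum(counts) + len(counts))

counts = [0, 3, 7]          # k_i for events A1, A2, A3
print(p_ml(counts, 0))      # 0.0 -- ML can assign probability zero
print(p_bayes(counts, 0))   # (0+1)/(10+3), strictly inside (0, 1)
```

Note how an unobserved event (k = 0) gets probability 0 under maximum likelihood but a small positive probability under the Bayesian evaluation.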

Let P(A|B) denote the probability of event A occurring conditioned on event B having already occurred. P(A|B) is known as the posterior probability of A subject to B, or the conditional probability of A given B.

For a single event A ∈ ℰ, the following hold:

P(A) ≤ 1

P(Ā) = 1 − P(A)

For A ⊆ B and A, B ∈ ℰ,

P(A) ≤ P(B) , (monotonicity)

P(B ∩ Ā) = P(B) − P(A) , (subtractivity)

and for arbitrary A, B ∈ ℰ,

P(A ∪ B) ≤ P(A) + P(B) , (subadditivity)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Finally, for a number of events {Ai | i = 1, ..., n},

P(A1 ∪ A2 ∪ ... ∪ An) = S1 − S2 + S3 − S4 + ... + (−1)^(n−1) Sn

where S1 = Σi P(Ai) ,

S2 = Σi<j P(Ai ∩ Aj) ,

S3 = Σi<j<k P(Ai ∩ Aj ∩ Ak) , ... ,

Sn = P(A1 ∩ A2 ∩ ... ∩ An)
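This inclusion-exclusion formula can be checked numerically on a small finite sample space with equally likely outcomes (a sketch; the particular sets are illustrative):

```python
# Sketch: verify inclusion-exclusion, P(A1 ∪ ... ∪ An) = S1 - S2 + S3 - ...,
# on a finite sample space with equally likely outcomes.
from itertools import combinations

omega = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}]

def prob(a):
    # P(A) = |A| / |Omega| under the uniform measure
    return len(a) / len(omega)

# Left side: P(A1 ∪ A2 ∪ ... ∪ An) computed directly.
lhs = prob(set().union(*events))

# Right side: alternating sum of S_r, where S_r sums P over all
# r-wise intersections of the events.
n = len(events)
rhs = 0.0
for r in range(1, n + 1):
    s_r = sum(prob(set.intersection(*c)) for c in combinations(events, r))
    rhs += (-1) ** (r - 1) * s_r

print(lhs, rhs)  # both 0.8
```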

For conditional probability P(A|B), with A, B ∈ ℰ and P(B) > 0, define

P(A|B) = P(A ∩ B) / P(B) .

We then have

P(A1 ∩ A2 ∩ ... ∩ An)

= P(A1) · P(A2|A1) · P(A3|A1 ∩ A2) ··· P(An|A1 ∩ A2 ∩ ... ∩ An−1)

where Ai ∈ ℰ.
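This product rule for intersections can likewise be checked on a finite sample space with equally likely outcomes (a sketch; the sets and names are illustrative):

```python
# Sketch: check P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2|A1) · P(A3|A1 ∩ A2)
# on a finite sample space with equally likely outcomes.
omega = set(range(12))
a1, a2, a3 = {0, 1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {4, 5, 6, 7}

def prob(s):
    return len(s) / len(omega)

def cond(a, b):
    # P(A|B) = P(A ∩ B) / P(B); the |Omega| factors cancel.
    return len(a & b) / len(b)

direct = prob(a1 & a2 & a3)
chain = prob(a1) * cond(a2, a1) * cond(a3, a1 & a2)
print(direct, chain)  # both equal 1/6
```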

If Ai ∈ ℰ for i = 1, ..., n, and

Ai ∩ Aj = ∅ for i ≠ j, and A1 ∪ A2 ∪ ... ∪ An = Ω ,

and P(Ai) > 0, then for any given B ∈ ℰ,

P(B) = Σi P(Ai) · P(B|Ai)


This is the complete (total) probability of event B. Therefore the conditional probability can be written as

P(Ai|B) = [P(Ai) · P(B|Ai)] / [Σj P(Aj) · P(B|Aj)]

This is the Bayes formula. A number of different versions of this formula will be discussed.
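The complete-probability and Bayes formulas together can be sketched as follows (the priors and likelihoods are illustrative numbers, not from the text):

```python
# Sketch: total probability and the Bayes formula for a partition
# A1, ..., An of Omega.

def total_probability(priors, likelihoods):
    """P(B) = sum_i P(Ai) * P(B|Ai)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods, i):
    """P(Ai|B) = P(Ai) * P(B|Ai) / sum_j P(Aj) * P(B|Aj)."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)

priors = [0.5, 0.3, 0.2]        # P(Ai): a partition of Omega
likelihoods = [0.9, 0.5, 0.1]   # P(B|Ai)
print(total_probability(priors, likelihoods))  # 0.45 + 0.15 + 0.02 = 0.62
print(bayes(priors, likelihoods, 0))           # 0.45 / 0.62
```

The posterior probabilities P(Ai|B) over the partition sum to 1, as they must.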

As P(Ā) = 1 − P(A) and P(A|B) = P(A ∩ B) / P(B) ,

it can be derived that

P(Ā|B) = 1 − P(A|B)

Definition: The prior odds on event A are

O(A) = P(A) / P(Ā) .

Since P(Ā) = 1 − P(A) ,

O(A) = P(A) / (1 − P(A))

Therefore P(A) can be represented by its prior odds:

P(A) = O(A) / (1 + O(A))

Definition: The posterior odds on event A conditioned on event B are

O(A|B) = P(A|B) / P(Ā|B) .

Similarly,

O(A|B) = P(A|B) / (1 − P(A|B)) , and thus

P(A|B) = O(A|B) / (1 + O(A|B))
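The conversion between a probability and its odds is a one-line computation in each direction (a minimal sketch; helper names are illustrative):

```python
# Sketch: converting between a probability and its odds.

def odds(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1.0 - p)

def prob_from_odds(o):
    """P(A) = O(A) / (1 + O(A))."""
    return o / (1.0 + o)

p = 0.8
print(odds(p))                  # 4.0
print(prob_from_odds(odds(p)))  # 0.8 again; the same formulas apply
                                # to the posterior odds O(A|B)
```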

Assume event A is a hypothesis h and event B is a piece of evidence e. With the definition of conditional probability, the following hold:

P(h|ē) = P(h ∩ ē) / P(ē) ,

P(h̄|ē) = P(h̄ ∩ ē) / P(ē) ,

P(h|e) = P(h ∩ e) / P(e) , and

P(h̄|e) = P(h̄ ∩ e) / P(e)

The odds on h conditioned on e being absent are obtained by:

O(h|ē) = P(h|ē) / P(h̄|ē) = [P(ē|h) · P(h)] / [P(ē|h̄) · P(h̄)]

= [P(ē|h) / P(ē|h̄)] · O(h)

This is called an odds-likelihood formulation of Bayes' theorem. Depending on the context, the following expressions can be used synonymously: e does not occur, e is absent, e does not exist, and e is false.

Similarly,

O(h|e) = [P(e|h) / P(e|h̄)] · O(h) .

This is called an odds-likelihood formulation of Bayes' theorem. The following expressions can be used synonymously: e occurs, e is present, e exists, and e is true.
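The odds-likelihood update for a single piece of evidence can be sketched as follows (all numbers are illustrative):

```python
# Sketch: the odds-likelihood form of Bayes' theorem for one piece of
# evidence e: O(h|e) = [P(e|h) / P(e|h_bar)] * O(h).

def update_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Multiply the prior odds on h by the likelihood ratio of e."""
    return (p_e_given_h / p_e_given_not_h) * prior_odds

o_h = 0.25                              # prior odds O(h), i.e. P(h) = 0.2
o_h_given_e = update_odds(o_h, 0.8, 0.2)
print(o_h_given_e)                      # likelihood ratio 4.0, so odds 1.0

# Converting back: P(h|e) = O(h|e) / (1 + O(h|e))
print(o_h_given_e / (1 + o_h_given_e))  # 0.5
```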

For a hypothesis supported by multiple pieces of evidence, generalizing the above, we have

O(h|ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [P(ē1 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em | h) / P(ē1 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em | h̄)] · O(h) .

When all pieces of evidence are mutually independent given h and given h̄,

O(h|ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [∏i=1..k P(ēi|h) / P(ēi|h̄)] · [∏i=k+1..m P(ei|h) / P(ei|h̄)] · O(h)
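Under the independence assumption, each present or absent piece of evidence contributes one likelihood ratio, and the ratios simply multiply. A minimal sketch (the ratios are illustrative numbers):

```python
# Sketch: combining multiple mutually independent pieces of evidence in
# the odds-likelihood form -- the likelihood ratios multiply.

def combined_odds(prior_odds, likelihood_ratios):
    """O(h | e1 ∩ ... ∩ em) = (product of ratios) · O(h), valid when the
    evidences are mutually independent given h and given h_bar."""
    o = prior_odds
    for lr in likelihood_ratios:
        o *= lr
    return o

# P(ei|h)/P(ei|h_bar) for present evidence, P(ēi|h)/P(ēi|h_bar) for
# absent evidence:
ratios = [4.0, 2.5, 0.5]           # the last piece of evidence is absent
print(combined_odds(1.0, ratios))  # 4.0 * 2.5 * 0.5 = 5.0
```

A ratio above 1 raises the odds on h, a ratio below 1 lowers them, and the order in which the evidence arrives does not matter.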