8.4 A Review of Probability Theory
Let $\Omega$ denote a finite collection of mutually exclusive statements about the world. By $\varepsilon = 2^{\Omega}$ we denote the set of all events. The empty set $\emptyset$, by definition a subset of every set, is called the impossible event, since the outcome of a random selection can never be an element of $\emptyset$. On the other hand, the set $\Omega$ itself always contains the actual outcome, and is therefore called the certain event. If $A$ and $B$ are events, then so are their union $A \cup B$ and the complements $\bar{A}$ and $\bar{B}$, respectively. For example, the event $A \cup B$ occurs if and only if $A$ occurs or $B$ occurs. We call the pair $(\Omega, \varepsilon)$ the sample space. Define a function $P: \varepsilon \to [0, 1]$ to be a probability if it satisfies the following conditions, which are well known as the Kolmogorov axioms:
(1) $P(A) \ge 0$ for all $A \subseteq \Omega$;
(2) $P(\Omega) = 1$;
(3) for $A, B \subseteq \Omega$, from $A \cap B = \emptyset$ it follows that
$$P(A \cup B) = P(A) + P(B).$$
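To make the axioms concrete, here is a minimal sketch in Python, assuming a probability on a finite sample space is represented by a non-negative weight for each elementary outcome; all names (`omega`, `weights`, `prob`) are illustrative, not from the text:

```python
# A minimal sketch: a probability on a finite sample space,
# represented as a weight for every elementary outcome in Omega.
omega = {"rain", "snow", "sun"}                # mutually exclusive outcomes
weights = {"rain": 0.3, "snow": 0.1, "sun": 0.6}

def prob(event):
    """P(event) = sum of the weights of the outcomes in the event."""
    return sum(weights[w] for w in event)

A, B = {"rain"}, {"snow"}                      # disjoint events: A & B == set()
assert prob(A) >= 0                            # axiom (1): non-negativity
assert abs(prob(omega) - 1.0) < 1e-12          # axiom (2): P(Omega) = 1
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12  # axiom (3): additivity
```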
$P(A)$ (or $P(B)$) is known as the prior probability of $A$ (or $B$) occurring. The prior probability of an event is not conditioned on the occurrence of any other event. Suppose it is noted in an experiment that for $1 \le i \le n$, the event $A_i$ occurred $k_i$ times. Then under the conventional evaluation, called the maximum likelihood evaluation,
$$P_m(A_i) = \frac{k_i}{\sum_{j=1}^{n} k_j},$$
but under an alternative evaluation, called the Bayesian evaluation:
$$P_b(A_i) = \frac{k_i + 1}{\sum_{j=1}^{n} k_j + n}.$$
Under this evaluation, we implicitly assume that each event has already occurred once even before the experiment commenced. When $\sum_{j=1}^{n} k_j \to \infty$,
$$P_m(A_i) = P_b(A_i).$$
Nevertheless, unlike the maximum likelihood evaluation, the Bayesian evaluation always satisfies
$$0 < P_b(A_i) < 1.$$
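As a concrete illustration of the two evaluations, the following sketch computes both from a list of raw counts $k_i$; the function names `p_ml` and `p_bayes` are hypothetical:

```python
# Maximum likelihood vs. Bayesian (add-one) evaluation of event
# probabilities from observed counts k_i.
def p_ml(counts):
    total = sum(counts)
    return [k / total for k in counts]

def p_bayes(counts):
    n, total = len(counts), sum(counts)
    return [(k + 1) / (total + n) for k in counts]

counts = [0, 3, 7]          # A_1 never observed in 10 trials
print(p_ml(counts))         # [0.0, 0.3, 0.7] -- P_m(A_1) = 0
print(p_bayes(counts))      # [1/13, 4/13, 8/13] -- 0 < P_b(A_i) < 1
```

The example shows the practical difference: an event never observed gets probability 0 under $P_m$, but remains strictly positive under $P_b$.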
Let $P(A \mid B)$ denote the probability of event $A$ occurring conditioned on event $B$ having already occurred. $P(A \mid B)$ is known as the posterior probability of $A$ subject to $B$, or the conditional probability of $A$ given $B$.
For a single event $A \subseteq \Omega$, the following hold:
$$P(A) \le 1,$$
$$P(\bar{A}) = 1 - P(A).$$
For $A \subseteq B$ with $A, B \subseteq \Omega$:
$$P(A) \le P(B) \quad \text{(monotonicity)},$$
$$P(B \cap \bar{A}) = P(B) - P(A) \quad \text{(subtractivity)}.$$
For arbitrary $A, B \subseteq \Omega$:
$$P(A \cup B) \le P(A) + P(B) \quad \text{(subadditivity)},$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Finally, for a collection of events $\{A_i \mid i = 1, \dots, n\}$,
$$P(A_1 \cup A_2 \cup \dots \cup A_n) = S_1 - S_2 + S_3 - S_4 + \dots + (-1)^{n-1} S_n,$$
where
$$S_1 = \sum_{i} P(A_i), \quad S_2 = \sum_{i<j} P(A_i \cap A_j), \quad S_3 = \sum_{i<j<k} P(A_i \cap A_j \cap A_k), \quad \dots, \quad S_n = P(A_1 \cap A_2 \cap \dots \cap A_n).$$
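The inclusion-exclusion formula can be checked mechanically. Below is a small sketch, assuming equally likely outcomes on a finite sample space; all names are illustrative:

```python
# Inclusion-exclusion on a finite sample space with equally
# likely outcomes.
from itertools import combinations
from functools import reduce

def p_union(events, n_outcomes):
    """P(A_1 u ... u A_n) = S_1 - S_2 + ... + (-1)^(n-1) S_n."""
    total = 0.0
    for r in range(1, len(events) + 1):
        s_r = sum(len(reduce(set.intersection, combo)) / n_outcomes
                  for combo in combinations(events, r))
        total += (-1) ** (r - 1) * s_r
    return total

# Outcomes 0..9 equally likely; three overlapping events.
A1, A2, A3 = {0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}
print(p_union([A1, A2, A3], 10))                        # 0.8
assert abs(p_union([A1, A2, A3], 10) - len(A1 | A2 | A3) / 10) < 1e-12
```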
For the conditional probability $P(A \mid B)$, with $A, B \subseteq \Omega$ and $P(B) > 0$, define
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
We then have
$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1}),$$
where $A_i \subseteq \Omega$.
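As a worked instance of this chain rule (the numbers are a standard illustrative example, not from the text), consider drawing two aces in a row from a 52-card deck without replacement:

```python
# Chain rule: P(A_1 and A_2) = P(A_1) * P(A_2 | A_1).
p_a1 = 4 / 52           # P(A_1): first card is an ace
p_a2_given_a1 = 3 / 51  # P(A_2 | A_1): second ace given the first
p_both = p_a1 * p_a2_given_a1
print(p_both)           # 0.004524... = P(A_1 n A_2) = 1/221
```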
If $A_i \in \varepsilon$ for $i = 1, \dots, n$, with $A_i \cap A_j = \emptyset$ for $i \ne j$, $\bigcup_{i=1}^{n} A_i = \Omega$, and $P(A_i) > 0$, then for any given $B \in \varepsilon$,
$$P(B) = \sum_{i=1}^{n} P(A_i) \, P(B \mid A_i).$$
This is the complete probability for event $B$. Therefore the conditional probability can be written as
$$P(A_i \mid B) = \frac{P(A_i) \, P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j) \, P(B \mid A_j)}.$$
This is the Bayes formula. A number of different versions of this formula will be discussed.
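Here is a minimal sketch of the complete-probability and Bayes formulas, assuming the priors $P(A_j)$ and likelihoods $P(B \mid A_j)$ are given as lists; the function name `bayes` is hypothetical:

```python
# Bayes formula over a partition A_1, ..., A_n of Omega.
def bayes(priors, likelihoods, i):
    """P(A_i | B) from P(A_j) and P(B | A_j) over a partition."""
    p_b = sum(p * l for p, l in zip(priors, likelihoods))  # complete probability
    return priors[i] * likelihoods[i] / p_b

priors = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3); sums to 1
likelihoods = [0.9, 0.5, 0.1]     # P(B | A_j) for each partition cell
print(bayes(priors, likelihoods, 0))   # P(A_1 | B) ~ 0.726
```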
As $P(\bar{A}) = 1 - P(A)$ and $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, it can be derived that
$$P(\bar{A} \mid B) = 1 - P(A \mid B).$$
Definition: The prior odds on event $A$ is
$$O(A) = \frac{P(A)}{P(\bar{A})}.$$
Since $P(\bar{A}) = 1 - P(A)$,
$$O(A) = \frac{P(A)}{1 - P(A)}.$$
Therefore $P(A)$ can be represented by its prior odds:
$$P(A) = \frac{O(A)}{1 + O(A)}.$$
Definition: The posterior odds on event $A$ conditioned on event $B$ is
$$O(A \mid B) = \frac{P(A \mid B)}{P(\bar{A} \mid B)}.$$
Similarly,
$$O(A \mid B) = \frac{P(A \mid B)}{1 - P(A \mid B)}$$
and thus
$$P(A \mid B) = \frac{O(A \mid B)}{1 + O(A \mid B)}.$$
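The two conversions are easy to verify in code. A minimal sketch, with hypothetical helper names:

```python
# Probability <-> odds conversions as defined above.
def odds(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)

def prob_from_odds(o):
    """P(A) = O(A) / (1 + O(A))."""
    return o / (1 + o)

p = 0.8
assert abs(prob_from_odds(odds(p)) - p) < 1e-12   # round trip recovers P(A)
print(odds(p))                                     # -> 4.0 (odds of 4 to 1)
```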
Assume event $A$ is a hypothesis $h$ and event $B$ is a piece of evidence $e$. With the definition of conditional probability, the following hold:
$$P(h \mid \bar{e}) = \frac{P(\bar{e} \mid h) \, P(h)}{P(\bar{e})},$$
$$P(\bar{h} \mid \bar{e}) = \frac{P(\bar{e} \mid \bar{h}) \, P(\bar{h})}{P(\bar{e})},$$
$$P(h \mid e) = \frac{P(e \mid h) \, P(h)}{P(e)}, \quad \text{and}$$
$$P(\bar{h} \mid e) = \frac{P(e \mid \bar{h}) \, P(\bar{h})}{P(e)}.$$
The odds on $h$ conditioned on $e$ being absent is obtained by
$$O(h \mid \bar{e}) = \frac{P(h \mid \bar{e})}{P(\bar{h} \mid \bar{e})} = \frac{P(\bar{e} \mid h) \, P(h)}{P(\bar{e} \mid \bar{h}) \, P(\bar{h})} = \frac{P(\bar{e} \mid h)}{P(\bar{e} \mid \bar{h})} \cdot O(h).$$
This is called an odds-likelihood formulation of the Bayes theorem. Depending on the context, the following expressions can be used synonymously: $e$ does not occur, $e$ is absent, $e$ does not exist, and $e$ is false.
Similarly,
$$O(h \mid e) = \frac{P(e \mid h)}{P(e \mid \bar{h})} \cdot O(h).$$
This is also called an odds-likelihood formulation of the Bayes theorem. The following expressions can be used synonymously: $e$ occurs, $e$ is present, $e$ exists, and $e$ is true.
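Here is a sketch of the odds-likelihood update in both forms, with illustrative numbers (none of these values come from the text):

```python
# Odds-likelihood update: O(h|e) = LR * O(h),
# where LR = P(e|h) / P(e|h_bar).
def update_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    return (p_e_given_h / p_e_given_not_h) * prior_odds

o_h = 0.25                                     # prior odds: P(h) = 0.2
o_h_given_e = update_odds(o_h, 0.9, 0.3)       # evidence e observed
o_h_given_not_e = update_odds(o_h, 0.1, 0.7)   # evidence e absent
print(o_h_given_e)       # 0.75 -> P(h|e) ~ 0.429; e raised the odds on h
print(o_h_given_not_e)   # ~0.036 -> the absence of e lowered the odds on h
```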
For a hypothesis supported by multiple pieces of evidence, generalizing the above gives
$$O(h \mid \bar{e}_1 \cap \bar{e}_2 \cap \dots \cap \bar{e}_k \cap e_{k+1} \cap \dots \cap e_m) = \frac{P(\bar{e}_1 \cap \dots \cap \bar{e}_k \cap e_{k+1} \cap \dots \cap e_m \mid h)}{P(\bar{e}_1 \cap \dots \cap \bar{e}_k \cap e_{k+1} \cap \dots \cap e_m \mid \bar{h})} \cdot O(h).$$
When all pieces of evidence are mutually independent conditioned on $h$ and on $\bar{h}$,
$$O(h \mid \bar{e}_1 \cap \bar{e}_2 \cap \dots \cap \bar{e}_k \cap e_{k+1} \cap \dots \cap e_m) = \prod_{i=1}^{k} \frac{P(\bar{e}_i \mid h)}{P(\bar{e}_i \mid \bar{h})} \cdot \prod_{i=k+1}^{m} \frac{P(e_i \mid h)}{P(e_i \mid \bar{h})} \cdot O(h).$$
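Under the independence assumption, the likelihood ratios simply multiply, which the following sketch illustrates (all numbers and names are hypothetical):

```python
# Multiple-evidence update under conditional independence given
# h and h_bar: the likelihood ratios multiply.
from math import prod

def combined_odds(prior_odds, likelihood_ratios):
    """O(h | e_1, ..., e_m) = (product of LR_i) * O(h)."""
    return prod(likelihood_ratios) * prior_odds

# LR_i = P(e_i|h)/P(e_i|h_bar) for evidence that is present,
# and P(e_i_bar|h)/P(e_i_bar|h_bar) for evidence that is absent.
lrs = [3.0, 2.0, 1 / 7]          # two pieces present, one absent
o = combined_odds(0.25, lrs)
p = o / (1 + o)                  # convert back to a probability
print(o, p)                      # ~0.214, ~0.176
```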