Probability_Calculus

Jake

24/09/2022

Worlds and Degrees of Belief

  • We can assign a ‘degree of belief’ (probability) to each world
    • The probability of a sentence, \(\alpha\), is the sum of the probabilities of the worlds in which \(\alpha\) is true.

\[ P(\alpha)=\sum_{w_i\models\alpha}P(w_i)\]

  • Can use this idea to create joint probability distribution tables (a small sketch of this follows)
    • This representation, however, is exponential in the number of variables, so we instead represent these distributions compactly with graphical models
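
A minimal sketch of the worlds representation in Python; the variables (Rain, Sprinkler) and all numbers are made up for illustration. Each world is a complete assignment, and \(P(\alpha)\) is obtained by summing the probabilities of the worlds that satisfy \(\alpha\).

```python
# Hypothetical joint distribution over worlds for two binary variables
# (Rain, Sprinkler); each world is a complete assignment of all variables.
worlds = {
    (True, True): 0.05,
    (True, False): 0.25,
    (False, True): 0.30,
    (False, False): 0.40,
}

def prob(sentence):
    """P(alpha) = sum of the probabilities of the worlds where alpha holds."""
    return sum(p for w, p in worlds.items() if sentence(w))

print(prob(lambda w: w[0]))  # P(Rain) = 0.05 + 0.25 = 0.30
```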

Properties of Belief

\[ 0\leq P(\alpha)\leq 1\]

\[ P(\alpha) = 0\text{ when }\alpha\text{ is inconsistent} \]

\[ P(\alpha) = 1\text{ when }\alpha\text{ is valid} \]

\[ P(\alpha) + P(\neg\alpha) = 1\]

\[ P(\alpha\lor\beta) = P(\alpha)+P(\beta)-P(\alpha\land\beta) \]
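
As a quick sanity check, these properties can be verified numerically on the small made-up worlds table from the sketch above (repeated here so the snippet is self-contained).

```python
worlds = {(True, True): 0.05, (True, False): 0.25,
          (False, True): 0.30, (False, False): 0.40}

def prob(sentence):
    return sum(p for w, p in worlds.items() if sentence(w))

rain = lambda w: w[0]
sprinkler = lambda w: w[1]

# P(alpha) + P(not alpha) = 1
assert abs(prob(rain) + prob(lambda w: not rain(w)) - 1.0) < 1e-9

# Inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b)
lhs = prob(lambda w: rain(w) or sprinkler(w))
rhs = prob(rain) + prob(sprinkler) - prob(lambda w: rain(w) and sprinkler(w))
assert abs(lhs - rhs) < 1e-9
```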

Updating Belief

  • In terms of worlds:

\[ P(w|\beta)=\begin{cases} 0,\quad \text{if }w\models\neg\beta\\\frac{P(w)}{P(\beta)},\quad\text{if }w\models\beta\end{cases}\]

  • In terms of sentences (Bayes conditioning):

\[ P(\alpha|\beta) = \frac{P(\alpha\land\beta)}{P(\beta)}\]
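
A minimal sketch of conditioning on the worlds representation (same made-up table as above): worlds inconsistent with the evidence \(\beta\) drop to probability 0, and the remaining worlds are renormalised by \(P(\beta)\).

```python
worlds = {(True, True): 0.05, (True, False): 0.25,
          (False, True): 0.30, (False, False): 0.40}

def condition(worlds, evidence):
    """Return the updated distribution P(w | evidence) over all worlds."""
    p_evidence = sum(p for w, p in worlds.items() if evidence(w))
    return {w: (p / p_evidence if evidence(w) else 0.0)
            for w, p in worlds.items()}

# Condition on the second variable (Sprinkler) being true.
posterior = condition(worlds, lambda w: w[1])
print(posterior[(True, True)])   # 0.05 / 0.35 ≈ 0.143
print(posterior[(True, False)])  # 0.0, inconsistent with the evidence
```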

Quantifying Uncertainty

  • We use entropy to measure the degree of uncertainty
    • We use the convention \(0\log_2(0) = 0\)

Entropy

  • The entropy of a variable \(X\) is:

\[ H(X) = -\sum_xP(x)\log_2P(x)\]

  • This can be extended to sets of variables:

\[ H(\textbf{X}) = -\sum_{\textbf{x}}P(\textbf{x})\log_2P(\textbf{x}) \]
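
A minimal sketch of this entropy computation in Python, using the \(0\log_2 0 = 0\) convention; the example distributions are illustrative.

```python
import math

def entropy(dist):
    """H(X) = -sum_x P(x) log2 P(x); terms with P(x) = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

print(entropy({"heads": 0.5, "tails": 0.5}))  # 1.0 bit: maximal uncertainty
print(entropy({"heads": 0.9, "tails": 0.1}))  # ≈ 0.469 bits: less uncertain
print(entropy({"heads": 1.0, "tails": 0.0}))  # -0.0, i.e. 0 bits: no uncertainty
```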

Conditional Entropy

  • Conditional entropy quantifies the uncertainty of \(X\) after observing \(Y\)
    • Note \(H(X|Y)\leq H(X)\), as uncertainty can only stay the same or decrease once more information is known
    • However, conditioning on a specific value \(y\) may increase uncertainty: \(H(X|y)\) can exceed \(H(X)\)

\[ H(X|Y) = \sum_yP(y)H(X|y) \]

\[ H(X|y) = -\sum_xP(x|y)\log_2P(x|y) \]
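
A minimal sketch of conditional entropy on a small made-up joint distribution \(P(x,y)\); for this table it shows \(H(X|Y)\leq H(X)\).

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Made-up joint distribution P(x, y), keyed by (x, y).
joint = {("a", 0): 0.4, ("a", 1): 0.1,
         ("b", 0): 0.1, ("b", 1): 0.4}

def cond_entropy(joint):
    """H(X|Y) = sum_y P(y) H(X|y)."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    total = 0.0
    for y, py in p_y.items():
        p_x_given_y = {x: p / py for (x, yy), p in joint.items() if yy == y}
        total += py * entropy(p_x_given_y)
    return total

p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

print(entropy(p_x))         # H(X)   = 1.0
print(cond_entropy(joint))  # H(X|Y) ≈ 0.722, not larger than H(X)
```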

Independence

  • If observing one variable does not change belief in another, they are considered independent.

\[ P(\alpha|\beta) = P(\alpha),\quad P(\alpha\land\beta)=P(\alpha)P(\beta)\] \[ \alpha\perp\beta,\quad\beta\perp\alpha\]

Conditional Independence

  • Independence can be seen as a dynamic notion
    • Two independent events may become dependent after some evidence (see the sketch at the end of this subsection)
    • Two dependent events may become independent after some evidence
  • \(\alpha\) is conditionally independent of \(\beta\) given event \(\gamma\) iff:

\[ P(\alpha|\beta\land\gamma) = P(\alpha|\gamma)\]

\[ P(\alpha\land\beta|\gamma) = P(\alpha|\gamma)P(\beta|\gamma)\]

\[ \alpha\perp\beta|\gamma,\quad I_P(\alpha,\gamma,\beta)\]

  • Note that this is a symmetric property

\[ \alpha\perp\beta|\gamma\equiv \beta\perp\alpha|\gamma\]
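
A minimal sketch of the "independent events becoming dependent" case (an illustrative example, not taken from the notes): two independent fair coin flips \(A\) and \(B\), with \(C\) defined as their exclusive-or. \(A\) and \(B\) are marginally independent but become dependent once \(C\) is observed.

```python
from itertools import product

# Worlds (A, B, C) with A, B independent fair coins and C = A xor B.
worlds = {(a, b, a ^ b): 0.25 for a, b in product([0, 1], repeat=2)}

def prob(event):
    return sum(p for w, p in worlds.items() if event(w))

def cond(event, given):
    return prob(lambda w: event(w) and given(w)) / prob(given)

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C0 = lambda w: w[2] == 0

# Marginally independent: P(A and B) = P(A) P(B)
print(prob(lambda w: A(w) and B(w)), prob(A) * prob(B))              # 0.25 0.25

# Dependent given C = 0: P(A and B | C=0) != P(A|C=0) P(B|C=0)
print(cond(lambda w: A(w) and B(w), C0), cond(A, C0) * cond(B, C0))  # 0.5 0.25
```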

Mutual Information

  • Independence is a special case of the more general notion of mutual information
  • Mutual information quantifies the impact of observing one variable on the uncertainty of another
    • It will be 0 if the variables are independent

\[ MI(X;Y) = \sum_{x,y}P(x,y)\log_2\left(\frac{P(x,y)}{P(x)P(y)}\right)\]

\[ MI(X;Y) = H(X)-H(X|Y) = H(Y)-H(Y|X)\]
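
A minimal sketch checking the identity \(MI(X;Y)=H(X)-H(X|Y)\) on a small made-up joint distribution.

```python
import math

joint = {("a", 0): 0.4, ("a", 1): 0.1,
         ("b", 0): 0.1, ("b", 1): 0.4}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(idx):
    out = {}
    for key, p in joint.items():
        out[key[idx]] = out.get(key[idx], 0.0) + p
    return out

p_x, p_y = marginal(0), marginal(1)

# Mutual information from its definition.
mi = sum(p * math.log2(p / (p_x[x] * p_y[y]))
         for (x, y), p in joint.items() if p > 0)

# H(X|Y) = sum_y P(y) H(X|y)
h_x_given_y = sum(py * entropy({x: p / py for (x, y), p in joint.items() if y == yy})
                  for yy, py in p_y.items())

print(mi, entropy(p_x) - h_x_given_y)  # both ≈ 0.278
```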

Conditional Mutual Information

\[ MI(X;Y|Z) = \sum_{x,y,z}P(x,y,z)\log_2\left(\frac{P(x,y|z)}{P(x|z)P(y|z)}\right)\]

\[ MI(X;Y|Z) = H(X|Z)-H(X|Y,Z) = H(Y|Z)-H(Y|X,Z)\]

Conditional Probability

Bayes Conditioning

\[ P(A|B) = \frac{P(A,B)}{P(B)}\]

Bayes Conditioning for Multiple Variables

  • Can extend Bayes conditioning to multiple variables.
    • Example for 3 variables:

\[ P(A,B|C) = \frac{P(A,B,C)}{P(C)}\]

\[ P(A|B,C) = \frac{P(A,B,C)}{P(B,C)}\]

Conditional Probability for Multiple Variables

  • Can easily manipulate conditional probabilities (a numeric sanity check follows the equations below):

  • Shifting the conditional bar to the left:

\[ P(A,B|C,D) = P(A|B,C,D)*P(B|C,D)\]

  • Shifting the conditional bar to the right:

\[ P(A,B|C,D) = \frac{P(A,B,C|D)}{P(C|D)}\]

  • Changing the order of conditioned variables:
    • The variables of a joint probability can be reordered freely (conjunction is commutative), as long as they stay on the same side of the conditioning bar.

\[ P(A,B|C,D) = \frac{P(A,B,C,D)}{P(C,D)} = \frac{P(D,A,B,C)}{P(C,D)} = \frac{P(D|A,B,C)*P(A,B,C)}{P(C,D)}\]
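
A minimal sketch checking these identities numerically on a randomly generated, made-up joint distribution over four binary variables \(A,B,C,D\).

```python
from itertools import product
import random

random.seed(1)
weights = [random.random() for _ in range(16)]
total = sum(weights)
joint = {w: v / total for w, v in zip(product([0, 1], repeat=4), weights)}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def cond(event, given):
    return prob(lambda w: event(w) and given(w)) / prob(given)

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C = lambda w: w[2] == 1
D = lambda w: w[3] == 1
AB = lambda w: A(w) and B(w)
CD = lambda w: C(w) and D(w)

lhs = cond(AB, CD)
# Shift the bar left:  P(A,B|C,D) = P(A|B,C,D) * P(B|C,D)
left = cond(A, lambda w: B(w) and CD(w)) * cond(B, CD)
# Shift the bar right: P(A,B|C,D) = P(A,B,C|D) / P(C|D)
right = cond(lambda w: AB(w) and C(w), D) / cond(C, D)
print(abs(lhs - left) < 1e-9, abs(lhs - right) < 1e-9)  # True True
```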

Chain Rule

  • Repeated applications of Bayes conditioning

\[ P(\alpha_1\land...\land\alpha_n)=P(\alpha_1|\alpha_2\land...\land\alpha_n)*P(\alpha_2|\alpha_3\land\alpha_4\land...\land\alpha_n)*...*P(\alpha_n)\]

  • Example:

\[ P(A,B,C) = P(A)*P(B|A)*P(C|B,A)\]
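
A minimal sketch verifying the chain-rule factorisation \(P(A,B,C)=P(A)P(B|A)P(C|B,A)\) on a made-up joint distribution over three binary variables.

```python
from itertools import product
import random

random.seed(0)
weights = [random.random() for _ in range(8)]
total = sum(weights)
joint = {w: v / total for w, v in zip(product([0, 1], repeat=3), weights)}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def cond(event, given):
    return prob(lambda w: event(w) and given(w)) / prob(given)

a, b, c = 1, 0, 1  # an arbitrary assignment to (A, B, C)
lhs = joint[(a, b, c)]
rhs = (prob(lambda w: w[0] == a)
       * cond(lambda w: w[1] == b, lambda w: w[0] == a)
       * cond(lambda w: w[2] == c, lambda w: w[0] == a and w[1] == b))
print(abs(lhs - rhs) < 1e-9)  # True
```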

Law of Total Probability

  • An instance of marginalising a probability distribution: for mutually exclusive and exhaustive events \(\beta_1,...,\beta_n\)

\[ P(\alpha) = \sum^n_{i=1}P(\alpha\land\beta_i)\]
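
A minimal sketch: summing \(P(\alpha\land\beta_i)\) over a set of mutually exclusive, exhaustive events \(\beta_i\) recovers \(P(\alpha)\). The table below is made up.

```python
# Made-up joint probabilities P(a_j, b_i), where the b_i partition the worlds.
joint = {("a1", "b1"): 0.2, ("a1", "b2"): 0.1, ("a1", "b3"): 0.3,
         ("a2", "b1"): 0.1, ("a2", "b2"): 0.2, ("a2", "b3"): 0.1}

# P(a1) = sum_i P(a1 and b_i)
p_a1 = sum(p for (a, b), p in joint.items() if a == "a1")
print(p_a1)  # 0.6 (up to float rounding)
```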