Context

Let’s start with an example, taken from the student recommendation-letter setting, that involves just three random variables. We have a student described by:

  • intelligence \(I\): a variable representing the student’s intelligence, which can be high or low
  • difficulty \(D\): the student is taking a class that may or may not be difficult, so this random variable takes the values easy and hard
  • grade \(G\): the grade the student receives, assumed to be A, B, or C

Joint Distribution

Here is an example of a joint distribution over this set of three random variables, written \(P(I, D, G)\).

There are \(2 \times 2 \times 3 = 12\) joint assignments in total, one for each combination of values of \(I\), \(D\), and \(G\).
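The count can be checked by enumerating the assignments directly. This is a minimal sketch; the value labels ("low", "high", and so on) are names assumed here for readability and are not fixed by the text.

```python
from itertools import product

# Value labels are assumptions made for this sketch.
intelligence = ["low", "high"]   # values of I
difficulty = ["easy", "hard"]    # values of D
grade = ["A", "B", "C"]          # values of G

# Each joint assignment is one (I, D, G) triple.
assignments = list(product(intelligence, difficulty, grade))

for i, d, g in assignments:
    print(i, d, g)

print(len(assignments))  # 2 * 2 * 3 = 12
```

A joint distribution attaches one probability to each of these 12 triples, and the 12 probabilities sum to 1.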

Conditioning

Assume that we observe that the student got an A.

\(\rightarrow\) This is an assignment to the variable \(G\), namely \(G = g_1\).

That immediately eliminates all assignments that are not consistent with the observation \(G = g_1\).

\(\rightarrow\) What remains is a reduced set of probabilities, called the reduction.

\(\Longrightarrow\) The reduced entries form an unnormalized measure: they do not sum to 1, so we renormalize by dividing each entry by their sum, which gives the conditional distribution \(P(I, D \mid g_1)\).
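The two steps, reduction and renormalization, can be sketched as follows. The joint table in this sketch is hypothetical: its 12 numbers are made up for illustration and are not the values from the original example. Exact fractions are used so the sums can be compared without rounding.

```python
from fractions import Fraction

# HYPOTHETICAL joint P(I, D, G), given in hundredths.
# These numbers are invented for illustration only.
weights = {
    ("low", "easy", "A"): 12, ("low", "easy", "B"): 17, ("low", "easy", "C"): 13,
    ("low", "hard", "A"): 2,  ("low", "hard", "B"): 7,  ("low", "hard", "C"): 19,
    ("high", "easy", "A"): 14, ("high", "easy", "B"): 3, ("high", "easy", "C"): 1,
    ("high", "hard", "A"): 6,  ("high", "hard", "B"): 4, ("high", "hard", "C"): 2,
}
joint = {key: Fraction(w, 100) for key, w in weights.items()}
assert sum(joint.values()) == 1  # a valid joint distribution

# Reduction: drop every assignment inconsistent with the observation G = A.
reduced = {(i, d): p for (i, d, g), p in joint.items() if g == "A"}

# The reduced entries sum to P(G = A), not to 1.
total = sum(reduced.values())

# Renormalization: divide by that sum to obtain P(I, D | G = A).
conditional = {key: p / total for key, p in reduced.items()}

for key, p in conditional.items():
    print(key, p)
```

With these made-up numbers the four surviving entries sum to 17/50 before renormalization and to 1 afterwards.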

Marginalization

What is \(\sum_I P(I,D)\)?

\(\sum_I P(I, D) = P(D)\): summing the joint over all values of \(I\) leaves the marginal distribution over \(D\).

As a concrete example of marginalization, say you throw two 6-sided dice, \(D_1\) and \(D_2\). This defines a joint probability distribution \(P(D_1, D_2)\). The probability that \(D_2 = 1\) is equal to

\(\sum_{i=1}^6 P(D_1 = i, D_2 = 1)\), since \(D_1\) can only take on the values 1 to 6.
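The sum above can be evaluated in a few lines. This sketch assumes the dice are fair and independent, so every pair of faces has probability 1/36; the text does not state this, and the function name `marginal_d2` is a hypothetical helper introduced here.

```python
from fractions import Fraction

faces = range(1, 7)

# ASSUMPTION: fair, independent dice, so each (d1, d2) pair has probability 1/36.
joint = {(d1, d2): Fraction(1, 36) for d1 in faces for d2 in faces}

def marginal_d2(value):
    """Sum the joint over all values of D1 to get P(D2 = value)."""
    return sum(joint[(d1, value)] for d1 in faces)

print(marginal_d2(1))  # 1/6
```

The same function gives the marginal for any other face of \(D_2\), and the six marginal probabilities sum to 1.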