Here is a little Greek-gods-and-weather DAG written up in R, with an example of how to take “forward samples” from the full joint distribution on its variables.
flip = function(p) runif(1) < p  # a single biased coin flip: TRUE with probability p

rain.sim = function(i) {
  flip = function(p) runif(1) < p  # redefined locally so rain.sim is self-contained
  # upstream (cause) variables
  Zeus.angry = flip(.3)
  Zeus.crying = flip(.2)
  # downstream (effect) variables
  thunder = (Zeus.angry && flip(.7)) || flip(.3)
  rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
  forgetful.gardener = flip(.4)
  sprinkler = forgetful.gardener && flip(.7)
  wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
  return(c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying, thunder = thunder, rain = rain,
           forgetful.gardener = forgetful.gardener, sprinkler = sprinkler, wet.grass = wet.grass))
}
sapply(1:10, FUN=rain.sim)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## Zeus.angry FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Zeus.crying FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## thunder FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## rain FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## forgetful.gardener TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## sprinkler TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## wet.grass TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [,10]
## Zeus.angry TRUE
## Zeus.crying FALSE
## thunder FALSE
## rain TRUE
## forgetful.gardener FALSE
## sprinkler FALSE
## wet.grass TRUE
What about thunder and wet.grass? If you hear thunder while in bed at night, have you learned something about how likely it is that the grass will be wet in the morning? Why or why not? Yes: they are not (unconditionally) independent. If you hear thunder, it becomes more likely that Zeus is angry, which makes it more likely that it is now raining, which in turn makes it more likely that the grass will be wet in the morning.
If, however, you already knew whether it was raining, hearing thunder would tell you nothing more: thunder and wet.grass are conditionally independent given rain, since the only path between them runs through rain, which is the parent of wet.grass. Conditioning on rain blocks the flow of information along that path, so learning that there was thunder, in addition to knowing that it was raining, does not change the probability of wet grass.
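Before building the general query tools below, we can sanity-check both claims directly from forward samples. This is just a sketch: the sample size of 20,000 is an arbitrary choice, and the exact numbers will vary from run to run.
sims = as.data.frame(t(sapply(1:20000, FUN = rain.sim)))  # one forward sample per row

# Unconditional dependence: P(wet.grass | thunder) is noticeably higher than P(wet.grass)
mean(sims$wet.grass[sims$thunder])
mean(sims$wet.grass)

# Conditional independence given rain: these two estimates should be roughly equal
mean(sims$wet.grass[sims$rain & sims$thunder])
mean(sims$wet.grass[sims$rain])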
Here’s an example of how to estimate probabilities from this DAG using Monte Carlo sampling:
query = function(variable.name, n.sims = 500) {
  sims = sapply(1:n.sims, FUN = rain.sim)
  proportion = length(which(sims[variable.name, ] == TRUE)) / n.sims
  return(proportion)
}
query('Zeus.angry')
## [1] 0.3
query('thunder')
## [1] 0.44
For each variable in the model, use the query() function to find out its approximate unconditional probability. (Or, more precisely, its approximate probability conditional on this model.) Make sure that you surround the variable name in quotes (single or double), both here and when working with the conditional.query() function below.
query('Zeus.angry')
## [1] 0.328
query('Zeus.crying')
## [1] 0.208
query('thunder')
## [1] 0.44
query('rain')
## [1] 0.36
query('forgetful.gardener')
## [1] 0.418
query('sprinkler')
## [1] 0.234
query('wet.grass')
## [1] 0.542
Most often, what we’re doing with a DAG is observing some ‘downstream’ variables (effects) and trying to infer the values of some ‘upstream’ variables (causes). This means that the unconditional probability is not really what’s of interest, but rather the conditional probability given our observations. Here’s an example of how to write a conditional query function that will estimate conditional probability using rejection sampling - a simple and reliable, but frequently inefficient, method of Monte Carlo approximation.
conditional.query = function(query.variable, conditioners, n.sims = 1000) {
  samples = c()
  # repeat until you have n.sims-many accepted samples
  while (length(samples) < n.sims) {
    Zeus.angry = flip(.3)
    Zeus.crying = flip(.2)
    thunder = (Zeus.angry && flip(.7)) || flip(.3)
    rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
    forgetful.gardener = flip(.4)
    sprinkler = forgetful.gardener && flip(.7)
    wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
    truth.values = c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying, thunder = thunder, rain = rain,
                     forgetful.gardener = forgetful.gardener, sprinkler = sprinkler, wet.grass = wet.grass)
    query.index = which(query.variable == names(truth.values))
    conditioner.indices = which(names(truth.values) %in% conditioners)
    # accept the sample only if all conditioning statements are true
    if (all(truth.values[conditioner.indices] == TRUE)) {
      samples = c(samples, truth.values[query.index])
    }
  }
  return(length(which(samples == TRUE)) / n.sims)  # proportion 'true' among accepted samples
}
query('wet.grass')
## [1] 0.584
conditional.query('wet.grass', c('thunder'))
## [1] 0.635
conditional.query('wet.grass', c('thunder', 'Zeus.angry'))
## [1] 0.791
conditional.query('wet.grass', c('Zeus.angry'))
## [1] 0.807
We can see that \(P(\mathbf{wet\ grass}|\mathbf{thunder})\) is greater than \(P(\mathbf{wet\ grass})\), because knowing thunder increases the probability of Zeus.angry, which in turn increases the probability of rain and ultimately wet.grass. For the same reason, we can see \(P(\mathbf{wet\ grass}|\mathbf{Zeus\ angry})\) is also greater than \(P(\mathbf{wet\ grass})\).
However, Zeus.angry is the parent of thunder, so conditioning on it blocks the information flow from thunder to wet.grass; that is, thunder and wet.grass are conditionally independent given Zeus.angry. We can therefore see that \(P(\mathbf{wet\ grass}|\mathbf{thunder},\mathbf{Zeus\ angry})\) is roughly the same as \(P(\mathbf{wet\ grass}|\mathbf{Zeus\ angry})\): knowing thunder in addition to Zeus.angry gives no additional information about wet.grass.
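Since these are Monte Carlo estimates, the rough equality is easier to see with more samples. A quick re-run with a larger n.sims makes the point (the value 10000 is an arbitrary choice, and the numbers will still vary slightly from run to run):
conditional.query('wet.grass', c('thunder', 'Zeus.angry'), n.sims = 10000)
conditional.query('wet.grass', c('Zeus.angry'), n.sims = 10000)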
How is the running time of conditional.query() affected as you increase the number of items in the conditioners vector that you feed to this function? Using a for-loop (or something fancier if you want), test this question systematically for conditioners vectors with lengths from 0 to 7. Use system.time() to time the results.

Ideally, for each variable \(X\) and number \(i\), we would calculate the average time it takes to compute the conditional probability of \(X\) given \(i\) variables. However, this would take too long (e.g., when \(i = 3\) there are \({7 \choose 3} = 35\) different sets of 3 variables, and there are 7 possibilities for \(X\)). Instead, we randomly pick a variable and an order in which the variables are added to the conditioners, and see how the runtime changes. We repeat this process 3 times.
rain.vars <- c('Zeus.angry', 'Zeus.crying', 'thunder', 'rain', 'forgetful.gardener', 'sprinkler', 'wet.grass')

track.time <- function(variables){
  variables <- sample(variables, 7)  # permute the variables
  times <- rep(0, 8)
  # unconditional probability of the last variable
  times[1] <- system.time(conditional.query(variables[7], c()))['elapsed']
  # probability of the last variable, conditioned on the first i variables
  for (i in 1:7){
    times[i+1] <- system.time(conditional.query(variables[7], variables[1:i]))['elapsed']
  }
  names(times) <- paste(0:7, c('unconditional', variables))
  return(times)
}
track.time(rain.vars) # variable 7 conditioned on the first i variables
## 0 unconditional 1 Zeus.crying 2 Zeus.angry
## 0.108 0.263 0.807
## 3 rain 4 wet.grass 5 sprinkler
## 0.780 0.858 2.743
## 6 forgetful.gardener 7 thunder
## 2.554 3.353
track.time(rain.vars) # variable 7 conditioned on the first i variables
## 0 unconditional 1 forgetful.gardener 2 Zeus.angry
## 0.074 0.081 0.167
## 3 thunder 4 wet.grass 5 Zeus.crying
## 0.207 0.265 1.017
## 6 rain 7 sprinkler
## 1.018 3.313
track.time(rain.vars) # variable 7 conditioned on the first i variables
## 0 unconditional 1 Zeus.crying 2 forgetful.gardener
## 0.075 0.259 0.236
## 3 wet.grass 4 rain 5 thunder
## 0.274 0.265 0.614
## 6 sprinkler 7 Zeus.angry
## 1.699 3.404
We can see that, as the number of items in the conditioners vector increases, the query generally takes longer to run.
Explain, given that conditional.query() is implemented using a while-loop, why the result in (1f) obtains.

As the number of items in the conditioners vector increases, the probability that all of the conditioning statements are true decreases, since more constraints need to be satisfied. A random forward draw of all the variables from the graphical model is therefore less likely to satisfy the constraints, which means that, on average, the while-loop must generate more forward draws before conditional.query() has collected enough accepted samples. Hence conditional.query() takes longer to run as the number of items in the conditioners vector increases.
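One way to make this concrete is to estimate the acceptance rate directly: the proportion of forward draws in which the first \(i\) conditioning variables are all true. The sketch below uses rain.sim() and rain.vars from above, with the variables taken in their original (unpermuted) order; the sample size of 20,000 is arbitrary and the numbers will vary from run to run. The reciprocal of the acceptance rate is roughly the number of forward draws needed per accepted sample, which is why the while-loop slows down.
sims = sapply(1:20000, FUN = rain.sim)   # 7 x 20000 matrix of forward draws (variables as rows)
accept.rate = sapply(1:7, function(i)
  mean(colSums(sims[rain.vars[1:i], , drop = FALSE]) == i))  # P(first i variables all TRUE)
names(accept.rate) = rain.vars
round(accept.rate, 3)
round(1 / accept.rate, 1)   # approximate forward draws needed per accepted sample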
\[ \begin{aligned} \mathbf{Var}(X) & ≡ \mathbb{E}[(X − \mathbf{E}[X])^2] \\ & = \mathbb{E}[X^2 − 2 * X * \mathbf{E}[X] + \mathbf{E}[X]^2] \\ & = \mathbb{E}[X^2] − \mathbb{E}[2 * X * \mathbf{E}[X]] + \mathbb{E}[\mathbf{E}[X]^2] \quad \text{ (linearity)} \\ & = \mathbb{E}[X^2] − 2\mathbf{E}[X]\mathbb{E}[X] + \mathbf{E}[X]^2 \quad \text{ (linearity, $\mathbf{E}[X]$ is a constant)} \\ & = \mathbb{E}[X^2] − 2\mathbf{E}[X]^2 + \mathbf{E}[X]^2 \\ & = \mathbb{E}[X^2] − \mathbf{E}[X]^2 \end{aligned} \]
I do not know whether the typesetting difference between \(\mathbb{E}\) and \(\mathbf{E}\) is intentional. Here I treat \(\mathbb{E}\) as the process of calculating the expectation and use \(\mathbf{E}\) to denote the result of this process, which kind of makes it clearer that \(\mathbf{E}[X]\) is just a constant (like \(2\)).
Of course, this distinction is not really necessary.
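As a quick numerical sanity check on the identity \(\mathbf{Var}(X) = \mathbb{E}[X^2] − \mathbf{E}[X]^2\), here is a small simulation (the distribution and sample size are arbitrary choices; note that R's built-in var() uses the \(n-1\) denominator, so the population-style version is computed by hand):
x = rnorm(1e5, mean = 2, sd = 3)
mean((x - mean(x))^2)   # E[(X - E[X])^2], the definition
mean(x^2) - mean(x)^2   # E[X^2] - E[X]^2, the identity; agrees up to floating-point error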
First, as in the previous exercise, we show that \(\mathbf{Cov}(X, Y)=\mathbb{E}[XY] − \mathbf{E}[X]\mathbf{E}[Y]\):
\[ \begin{aligned} \mathbf{Cov}(X, Y) & ≡ \mathbb{E}[(X − \mathbf{E}[X])(Y - \mathbf{E}[Y])]\\ & = \mathbb{E}[XY − X * \mathbf{E}[Y] − \mathbf{E}[X] * Y + \mathbf{E}[X]\mathbf{E}[Y]] \\ & = \mathbb{E}[XY] − \mathbf{E}[Y]\mathbb{E}[X] - \mathbf{E}[X]\mathbb{E}[Y] + \mathbb{E}[\mathbf{E}[X]\mathbf{E}[Y]] \\ & = \mathbb{E}[XY] − \mathbf{E}[Y]\mathbf{E}[X] - \mathbf{E}[X]\mathbf{E}[Y] + \mathbf{E}[X]\mathbf{E}[Y] \\ & = \mathbb{E}[XY] − \mathbf{E}[X]\mathbf{E}[Y] \quad \text{ (!)} \end{aligned} \] Next, we calculate \(\mathbb{E}[XY]\) from the definition \[ \begin{aligned} \mathbb{E}[XY] & ≡ \sum_{x,y} xyP(X = x, Y = y) \\ & = \sum_{x,y} xyP(X = x)P(Y = y) \quad \text{ (independence of $X$ and $Y$)}\\ & = \sum_x\sum_y xP(X = x) \times yP(Y = y) \\ & = \left(\sum_x x P(X = x)\right) \times \left(\sum_y yP(Y = y)\right) \quad \text{ (constant wrt each other)}\\ & = \mathbf{E}[X]\mathbf{E}[Y] \end{aligned} \]
Now, substituting this back into (!), we have \[ \begin{aligned} \mathbf{Cov}(X, Y) & = \mathbb{E}[XY] − \mathbf{E}[X]\mathbf{E}[Y] \\ & = \mathbf{E}[X]\mathbf{E}[Y] −\mathbf{E}[X]\mathbf{E}[Y] \\ & = 0 \end{aligned} \] which finishes the proof.
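Again, a quick empirical illustration: for two independently generated variables, the population-style sample covariance is close to zero (the distributions below are arbitrary choices, and with a finite sample the estimate will be near, but not exactly, zero):
x = rbinom(1e5, size = 1, prob = 0.3)
y = rbinom(1e5, size = 1, prob = 0.6)
mean(x * y) - mean(x) * mean(y)   # E[XY] - E[X]E[Y], approximately 0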