1

flip = function(p) runif(1) < p
rain.sim = function(i) {
  flip = function(p) runif(1) < p
  Zeus.angry = flip(.3)
  Zeus.crying = flip(.2)
  thunder = (Zeus.angry && flip(.7)) || flip(.3)
  rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
  forgetful.gardener = flip(.4)
  sprinkler = forgetful.gardener && flip(.7)
  wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
  return(c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying, thunder = thunder, 
           forgetful.gardener = forgetful.gardener, sprinkler = sprinkler, rain = rain, wet.grass = wet.grass))
}
conditions = sapply(1:100000, FUN=rain.sim)
conditions = t(conditions)
head(conditions)
##      Zeus.angry Zeus.crying thunder forgetful.gardener sprinkler  rain
## [1,]       TRUE       FALSE    TRUE              FALSE     FALSE  TRUE
## [2,]      FALSE        TRUE   FALSE               TRUE      TRUE  TRUE
## [3,]      FALSE        TRUE    TRUE               TRUE      TRUE  TRUE
## [4,]       TRUE       FALSE    TRUE              FALSE     FALSE  TRUE
## [5,]      FALSE       FALSE   FALSE              FALSE     FALSE FALSE
## [6,]      FALSE       FALSE   FALSE              FALSE     FALSE FALSE
##      wet.grass
## [1,]      TRUE
## [2,]      TRUE
## [3,]      TRUE
## [4,]      TRUE
## [5,]      TRUE
## [6,]     FALSE

(a)

(b)
What is the relationship between thunder and wet.grass? If you hear thunder while in bed at night, have you learned something about how likely it is that the grass will be wet in the morning? Why or why not?
If we want to know whether knowledge of thunder affects the probability of wet.grass, we must determine whether P(wet.grass) = P(wet.grass|thunder).
First we can show that knowledge of thunder affects the probability of Zeus's anger. The unconditional probability of anger, P(anger), is .3, whereas P(anger|thunder) works out to roughly .53.

This can be proven mathematically.

P.anger = .3

thunder is true if (anger AND flip(.7)) is true OR flip(.3) is true. Call these two events A and B:

P(A) = P(anger) * .7 = .3 * .7 = .21
P(B) = .3

Because A and B are not mutually exclusive, we add their probabilities and subtract their intersection: P(thunder) = P(A) + P(B) - P(A AND B). A and B are independent, so P(A AND B) = P(A) * P(B), giving P(thunder) = .21 + .3 - (.21 * .3).

P.thunder = .21 + .3 - ((.3*.7) * .3)
P.thunder
## [1] 0.447

P.thunder (.447) matches the proportion of thunder = TRUE in the simulated conditions.

thunder.true = conditions[which(conditions[,3] == T),]
nrow(thunder.true)/nrow(conditions)
## [1] 0.4473

The probability of anger given thunder can be determined using Bayes' rule:

P(A|B) = P(A) * P(B|A) / P(B)
P(anger|thunder) = (P(anger) * P(thunder|anger)) / P(thunder)

Note that P(thunder|anger) is not simply .7: even when Zeus is angry, thunder can also fire through the background flip(.3), so P(thunder|anger) = 1 - (1 - .7)(1 - .3) = .79.

P.thunder.given.anger = 1 - (1 - .7) * (1 - .3)
P.anger.given.thunder = (P.anger * P.thunder.given.anger)/P.thunder
P.anger.given.thunder
## [1] 0.5302013

As P.anger.given.thunder (.530) is much higher than P.anger (.3), it is clear that the probability of anger is influenced by knowledge of thunder. And since anger is an ancestor of wet.grass (anger → rain → wet.grass), the probability of wet.grass will also change given this new information.

This value also matches the proportion of anger = TRUE in the subset of conditions where thunder = TRUE.

anger.given.thunder = thunder.true[which(thunder.true[,1] ==T),]
nrow(anger.given.thunder)/nrow(thunder.true)
## [1] 0.5309188

(c)
What if you also saw rain through the window - should learning about whether there was thunder still influence your best guess about the probability of wet grass?

If rain (a direct parent of wet.grass) is already known, then thunder no longer influences the probability of wet.grass: the only path from thunder to wet.grass runs through rain (thunder ← Zeus.angry → rain → wet.grass), and observing rain blocks that path.
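
As a quick empirical sketch (using the conditions matrix simulated above; the column names are those returned by rain.sim), we can compare the proportion of wet grass among the rainy simulations with and without thunder; the two proportions should be approximately equal.

rain.rows = conditions[conditions[, "rain"], ]
mean(rain.rows[rain.rows[, "thunder"], "wet.grass"])     # ~ P(wet.grass | rain, thunder)
mean(rain.rows[!rain.rows[, "thunder"], "wet.grass"])    # ~ P(wet.grass | rain, no thunder)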

(d)
For each variable, run the query() function to find out its approximate unconditional probability. (Or, more precisely, its approximate probability conditional on this model.) Make sure that you surround the variable name in quotes (single or double), both here and when working with the conditional.query() function below.

query = function(variable.name, n.sims = 100000) {
    sims = sapply(1:n.sims, FUN=rain.sim)
    proportion = length(which(sims[variable.name,] == TRUE))/n.sims
    return(proportion)
}
P.Zeus.angry.q = query('Zeus.angry')
P.Zeus.crying.q = query('Zeus.crying')
P.thunder.q = query('thunder')
P.gardener.q = query('forgetful.gardener')
P.sprinkler.q = query('sprinkler')
P.rain.q = query('rain')
P.wet.grass.q = query('wet.grass')
P.Zeus.angry.q
## [1] 0.30021
P.Zeus.crying.q
## [1] 0.20181
P.thunder.q
## [1] 0.44809
P.gardener.q
## [1] 0.40215
P.sprinkler.q
## [1] 0.28003
P.rain.q
## [1] 0.39087
P.wet.grass.q
## [1] 0.53489
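
As a rough analytic cross-check (a sketch based on the probabilities hard-coded in rain.sim; it relies on rain and sprinkler being independent, which holds because they share no ancestors), the simulated values line up with the corresponding noisy-OR calculations:

p.rain      = 1 - (1 - .3*.8) * (1 - .2*.99)    # ~0.390
p.sprinkler = .4 * .7                            # 0.28
p.thunder   = 1 - (1 - .3*.7) * (1 - .3)         # 0.447
p.wet.grass = 1 - (1 - .8*p.rain) * (1 - .9*p.sprinkler) * (1 - .1)  # ~0.537
c(rain = p.rain, sprinkler = p.sprinkler, thunder = p.thunder, wet.grass = p.wet.grass)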

(e)
What is the estimated P(wet grass)? What about P(wet grass | thunder)? P(wet grass | thunder, Zeus angry)? Explain, in terms of the conditional independence relationships encoded in the DAG you drew above, why the approximate probabilities are distributed in this way.

P(wet.grass) = P.wet.grass.q

P.wet.grass.q
## [1] 0.53489

P(wet.grass|thunder)

conditional.query = function(query.variable, conditioners, n.sims=1000) {
    samples = c()
    while (length(samples) < n.sims) {
      Zeus.angry = flip(.3)
      Zeus.crying = flip(.2)
      thunder = (Zeus.angry && flip(.7)) || flip(.3)
      rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
      forgetful.gardener = flip(.4)
      sprinkler = forgetful.gardener && flip(.7)
      wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
      truth.values = c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying, thunder = thunder, 
                       forgetful.gardener = forgetful.gardener, sprinkler = sprinkler, rain = rain, wet.grass = wet.grass)
      query.index = which(query.variable == names(truth.values))
      conditioner.indices = which(names(truth.values) %in% conditioners)
      if (all(truth.values[conditioner.indices] == T)) { 
        samples = c(samples, truth.values[query.index])
      }
      # accept sample only if all conditioning statements are true
      # repeat until you have n.sims-many samples
    } 
    return(length(which(samples == T)) / n.sims) # proportion 'true' among accepted samples
}
P.wet.grass.given.thunder.cq = conditional.query('wet.grass','thunder')
P.wet.grass.given.thunder.cq
## [1] 0.592

P(wet.grass|thunder,anger)

P.wet.grass.given.thunder.and.anger.cq = conditional.query('wet.grass',c('thunder','Zeus.angry'))
P.wet.grass.given.thunder.and.anger.cq
## [1] 0.787

To further substantiate the answer in (c), we can see that P(wet.grass|anger) is essentially the same as P(wet.grass|thunder,anger): once Zeus.angry is known, thunder carries no additional information about wet.grass, because the only path from thunder to wet.grass passes through Zeus.angry.

P(wet.grass|anger)

P.wet.grass.given.anger.cq = conditional.query('wet.grass','Zeus.angry')
P.wet.grass.given.anger.cq
## [1] 0.783

(f)
How is the performance (runtime) of conditional.query() affected as you increase the number of items in the conditioners vector that you feed to this function? Using a for-loop (or something fancier if you want), test this question systematically for conditioners vectors with lengths from 0 to 7. Use system.time() to time the results.

weather.conditions=c('','Zeus.angry','Zeus.crying','thunder','forgetful.gardener','sprinkler','rain','wet.grass')
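# note: '' acts as a placeholder for "no conditioners"; it never matches a variable name, so the i = 1 case below is effectively an unconditional query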

times=c()
for (i in 1:8) {times=c(times, system.time(conditional.query('wet.grass',weather.conditions[1:i])))}
elapsed.times=times[c(3,8,13,18,23,28,33,38)]
elapsed.times
## elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed 
##   0.148   0.246   0.903   1.457   1.108   3.879   3.752   4.090

(g)
Explain, in terms of (a) the unconditional probability of the conditioning statements and (b) the implementation of conditional.query() using a while-loop, why the result in (1f) obtains.

conditional.query() uses a while-loop to run the rain simulation repeatedly, keeping a sample only when every conditioner is TRUE, until the requested number of accepted samples (here 1000) is reached. The expected number of raw simulations is therefore n.sims divided by the joint probability that all of the conditioners are TRUE, so the lower the unconditional probability of the conditioning set, the longer the function takes. This is why we see big leaps in the elapsed times in (f) when Zeus.crying and sprinkler are added to the set of conditioners: these are the least probable conditions and thus slow the sampler down the most.
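
As a rough sketch (an illustration rather than a benchmark; joint.prob() is a hypothetical helper defined here, not part of the assignment), the relative cost of each step in (f) can be predicted from the empirical joint probability of the conditioning set, estimated from the conditions matrix simulated earlier:

joint.prob = function(conds) {
  conds = setdiff(conds, '')                   # drop the 'no conditioner' placeholder
  if (length(conds) == 0) return(1)
  mean(apply(conditions[, conds, drop = FALSE], 1, all))
}
# expected raw simulations per accepted sample = 1 / P(all conditioners TRUE)
sapply(1:8, function(i) 1 / joint.prob(weather.conditions[1:i]))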

The unconditional probabilities of the conditioning statements were computed in (d), and are copied here:

P.Zeus.angry.q
## [1] 0.30021
P.Zeus.crying.q
## [1] 0.20181
P.thunder.q
## [1] 0.44809
P.gardener.q
## [1] 0.40215
P.sprinkler.q
## [1] 0.28003
P.rain.q
## [1] 0.39087
P.wet.grass.q
## [1] 0.53489

If we instead order the conditioners in descending order of their unconditional probability and rerun the timing loop above, the elapsed times should climb more gradually, since the earlier (smaller) conditioning sets are now the more probable ones.

weather.conditions=c('','wet.grass','thunder','forgetful.gardener','rain','Zeus.angry','sprinkler','Zeus.crying')

times=c()
for (i in 1:8) {times=c(times, system.time(conditional.query('wet.grass',weather.conditions[1:i])))}
elapsed.times=times[c(3,8,13,18,23,28,33,38)]
elapsed.times
## elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed 
##   0.157   0.215   0.297   0.280   0.330   0.415   0.993   3.979

Multivariate Probability Practice
1
Show from the definition of the variance as Var(X) ≡ E[(X - E[X])^2] that it can equivalently be written as Var(X) = E[X^2] - E[X]^2, which we stated without proof in Section 2.9.2.

VAR(X) = E[(X-E[X])^2] #original equation
VAR(X) = E[X^2 - 2XE[X] + E[X]^2] #expand the quadratic
VAR(X) = E[X^2] - 2E[X]E[X] + E[X]^2 #apply E to each term; 2 and E[X] are constants, so E[2XE[X]] = 2E[X]E[X] and E[E[X]^2] = E[X]^2
VAR(X) = E[X^2] - 2E[X]^2 + E[X]^2 #simplify
VAR(X) = E[X^2] - E[X]^2 #reduce: -2E[X]^2 + E[X]^2 = -E[X]^2
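
A quick numerical sanity check (illustrative only; sample means stand in for expectations, and both lines below are computed from the same simulated draw, so they should agree up to floating-point error):

set.seed(1)
x = rnorm(100000, mean = 2, sd = 3)
mean((x - mean(x))^2)      # E[(X - E[X])^2]
mean(x^2) - mean(x)^2      # E[X^2] - E[X]^2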

2
Use linearity of the expectation [Section 3.3.1] to prove that if two discrete random variables X and Y are conditionally independent given your state of knowledge, then Cov(X,Y) = 0. (Hint: using p(X=x, Y=y) = p(X=x) × p(Y=y), you can rewrite ∑_{x,y} [x × p(X=x) × y × p(Y=y)] as [∑_x x × p(X=x)] × [∑_y y × p(Y=y)], since x and p(X=x) are constant with respect to y, and vice versa.)

There are many ways to rewrite the covariance equation. In Levy 3.3.2, we are offered equation A:

A: Cov(X,Y)=E[(X-E[X])(Y-E[Y])]

This can be simplified to equation B:

A: Cov(X,Y)=E[(X-E[X])(Y-E[Y])]
Cov(X,Y)=E[XY-E[X]Y-E[Y]X+E[X]E[Y]]
Cov(X,Y)=E[XY]-E[X]E[Y]-E[Y]E[X]+E[X]E[Y]
B: Cov(X,Y)=E[XY]-E[X]E[Y]

If two discrete random variables X and Y are conditionally independent given our state of knowledge, then by definition C holds true.

C: P(X|Y) = P(X)

Equation D is another way to write out independence:

D: P(X,Y)=P(X)P(Y)

Using D together with linearity of the expectation, the expectation of the product factors into the product of the expectations:

E[XY] = ∑_{x,y} x × y × P(X=x, Y=y)
E[XY] = ∑_{x,y} x × p(X=x) × y × p(Y=y)  (by D)
E[XY] = [∑_x x × p(X=x)] × [∑_y y × p(Y=y)]  (x and p(X=x) are constant with respect to y)
E[XY] = E[X]E[Y]

Substituting E[XY] = E[X]E[Y] into B:

B: Cov(X,Y)=E[XY]-E[X]E[Y]
Cov(X,Y)=E[X]E[Y]-E[X]E[Y]
Cov(X,Y)=0

So the covariance of (conditionally) independent random variables must be zero.
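
A quick numerical sketch (illustrative only; the two distributions below are arbitrary choices, not part of the problem): for two independently sampled discrete random variables, the sample covariance comes out near zero.

set.seed(1)
x = sample(0:3, 100000, replace = TRUE, prob = c(.1, .2, .3, .4))
y = sample(0:1, 100000, replace = TRUE, prob = c(.6, .4))
mean(x * y) - mean(x) * mean(y)   # E[XY] - E[X]E[Y], should be near 0
cov(x, y)                         # same quantity up to the n/(n-1) correction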