Part I - Inference in DAGs using rejection sampling
Question 1:
“wet.grass” is conditionally independent of “thunder” given knowledge of its parents “rain” and “sprinkler”. If, however, both “rain” and “sprinkler” are unknown, knowledge about “thunder” does have an effect on the probability of “wet.grass”, because “wet.grass” and “thunder” share a common ancestor in “Zeus.angry”. If the only information we have is “thunder”, we can make an upstream inference about the probability of “Zeus.angry”, and then reason downstream from our updated belief about “Zeus.angry” to “rain”, a parent of “wet.grass”. In this way, given no other information, knowledge about “thunder” affects the likelihood of “wet.grass”.
If, on the other hand, we saw “rain” through the window, learning about “thunder” tells us nothing new, because “wet.grass” is conditionally independent of “thunder” given knowledge of its parent “rain”.
*see code below:
flip = function(p) runif(1) < p
#Coded DAG model
rain.sim = function(i) {
  Zeus.angry = flip(.3)
  Zeus.crying = flip(.2)
  thunder = (Zeus.angry && flip(.7)) || flip(.3)
  rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
  forgetful.gardner = flip(.4)
  sprinkler = forgetful.gardner && flip(.7)
  wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
  return(c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying,
           thunder = thunder, forgetful.gardner = forgetful.gardner, rain = rain,
           sprinkler = sprinkler, wet.grass = wet.grass))
}
#Query function using Monte-Carlo
query = function(variable.name, n.sims=500) {
  sims = sapply(1:n.sims, FUN=rain.sim)
  return(length(which(sims[variable.name,] == T)) / n.sims)
}
query('Zeus.angry')
## [1] 0.316
query('Zeus.crying')
## [1] 0.182
query('forgetful.gardner')
## [1] 0.402
query('thunder')
## [1] 0.442
query('rain')
## [1] 0.398
query('sprinkler')
## [1] 0.26
query('wet.grass')
## [1] 0.572
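As a rough sanity check on these Monte Carlo estimates (a sketch added here, using only the flip probabilities from the coded model), the marginal of “thunder” can be computed exactly:
#Exact marginal P(thunder): thunder = (Zeus.angry && flip(.7)) || flip(.3), so
#P(thunder | Zeus.angry) = 1 - (1 - .7)*(1 - .3) = .79 and P(thunder | !Zeus.angry) = .3
p.Zeus.angry = 0.3
p.Zeus.angry * 0.79 + (1 - p.Zeus.angry) * 0.3
#= 0.447, in line with the simulated 0.442 above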
#Conditional query function using rejection sampling
conditional.query = function(query.var, conditioners, n.sims=500) {
  #vector to hold our samples
  samples = c()
  while (length(samples) < n.sims) {
    #DAG modeled here
    Zeus.angry = flip(.3)
    Zeus.crying = flip(.2)
    thunder = (Zeus.angry && flip(.7)) || flip(.3)
    rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
    forgetful.gardner = flip(.4)
    sprinkler = forgetful.gardner && flip(.7)
    wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
    #vector of truth values for each state
    truth.values = c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying,
                     thunder = thunder, rain = rain, forgetful.gardner = forgetful.gardner,
                     sprinkler = sprinkler, wet.grass = wet.grass)
    #query.index to find the index of the state we're looking for
    query.index = which(query.var == names(truth.values))
    #conditioner.indices to find indices for conditioners
    conditioner.indices = which(names(truth.values) %in% conditioners)
    #If all the conditioner states are true, save the state of our query var
    #into samples; otherwise reject the draw
    if (all(truth.values[conditioner.indices] == T)) {
      samples = c(samples, truth.values[query.index])
    }
  }
  #Get the proportion of T instances among the accepted samples
  #(those in which the conditioners are all true)
  return(length(which(samples == T)) / n.sims)
}
query('wet.grass')
## [1] 0.56
As we learn more information, for example that there is “thunder”, we can update our model, inferring upstream about “Zeus.angry”, which in turn modifies the probability of “rain”, a parent of “wet.grass”. From the model, P(“thunder”|“Zeus.angry”) = 1 - (1 - .7)(1 - .3) = .79 (thunder fires if either flip succeeds), and P(“thunder”|not “Zeus.angry”) = .3. Applying Bayes’ rule in which “Zeus.angry” == A and “thunder” == B we can model: P(A|B) == [P(B|A)P(A)] / P(B) == [P(B|A)P(A)] / [P(B|A)P(A) + P(B|~A)P(~A)] == (.79*.3)/((.79*.3)+(.3*.7)) ≈ .53
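The same posterior can be computed directly in R (a small sketch, not part of the original write-up; the variable names here are just for illustration):
#Exact posterior P(Zeus.angry | thunder) via Bayes' rule
p.A = 0.3                                 #prior P(Zeus.angry)
p.B.given.A = 1 - (1 - 0.7) * (1 - 0.3)   #P(thunder | Zeus.angry) = .79
p.B.given.notA = 0.3                      #P(thunder | !Zeus.angry)
(p.B.given.A * p.A) / (p.B.given.A * p.A + p.B.given.notA * (1 - p.A))
#= 0.237 / 0.447, roughly 0.53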
conditional.query('Zeus.angry', 'thunder')
## [1] 0.482
So the likelihood of “Zeus.angry” increases (from .3 to roughly .5) if we know “thunder”. This, in turn, increases the probability of “wet.grass” via “Zeus.angry’s” child “rain”, which is a parent of “wet.grass”:
conditional.query('wet.grass', 'thunder')
## [1] 0.606
However, given “Zeus.angry”, “thunder” gives us no new information. So we should see two things. First, adding the condition “Zeus.angry” should increase our estimate of the probability of “wet.grass”. Second, P(“wet.grass”|“Zeus.angry”, “thunder”) should be (approximately) equal to P(“wet.grass”|“Zeus.angry”), because “thunder” carries no new information.
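For an exact target for both checks (a sketch using the model’s flip probabilities; because “thunder” is d-separated from “wet.grass” once “Zeus.angry” is known, the same value applies whether or not “thunder” is added as a conditioner):
#Exact P(wet.grass | Zeus.angry)
p.rain.given.A = 1 - (1 - 0.8) * (1 - 0.2 * 0.99)   #rain given Zeus.angry
p.sprinkler = 0.4 * 0.7                             #forgetful.gardner && flip(.7)
1 - (1 - p.rain.given.A * 0.8) * (1 - p.sprinkler * 0.9) * (1 - 0.1)
#roughly 0.78, consistent with both rejection-sampling estimates below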
conditional.query('wet.grass', 'Zeus.angry')
## [1] 0.782
conditional.query('wet.grass', c('Zeus.angry', 'thunder'))
## [1] 0.788
Runtime is not only a function of the number of conditions, but also of the starting state, the number of immediate neighbors, and shared ancestors. Our lowest runtimes at the larger condition counts are for “Zeus.angry” and “Zeus.crying”, while the longer runtimes tend to be for “wet.grass” and “rain” (“forgetful.gardner” is also quite high) as the number of conditioners increases. Could it be that “forgetful.gardner’s” relatively long runtime has to do with its relative isolation (it is connected to a single neighbor, which is itself connected to only one other node), creating a kind of processing bottleneck?
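One way to probe this (a hedged sketch, not part of the original timing code; the helper acceptance.rate is introduced here purely for illustration): conditional.query needs roughly n.sims / P(all conditioners TRUE) raw draws, so estimating that joint probability by plain Monte Carlo should predict which conditioner sets are expensive.
#Estimate the acceptance rate of a conditioner set (proportion of raw draws kept)
acceptance.rate = function(conditioners, n.sims=500) {
  sims = sapply(1:n.sims, FUN=rain.sim)
  mean(apply(sims[conditioners, , drop=FALSE] == T, 2, all))
}
#e.g. acceptance.rate(c('rain', 'sprinkler', 'wet.grass'))
#lower acceptance rates should correspond to longer conditional.query runtimes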
#Vector of possible conditions
possible.conditions = c("Zeus.angry", "Zeus.crying", "thunder", "rain", "forgetful.gardner", "sprinkler", "wet.grass")
#Function returns a vector of run times
#Params-----1st: the state we're interested in i.e. "wet.grass"
#Params-----2nd: the vector of other possible states
query.timer = function(query.item, conditioners) {
  #Vector we'll return which holds runtimes
  r.times = c()
  count = 1 #iterator
  #Remove the state we're interested in from the conditions vector
  remove = c(query.item)
  new.conditioners = setdiff(conditioners, remove)
  #Runtime for the unconditional probability first
  r.times = c(r.times, system.time(query(query.item))["elapsed"])
  #Loop through the remaining conditional probabilities, incrementing the number of conditions
  len = length(new.conditioners)
  while (count <= len) {
    r.times = c(r.times, system.time(conditional.query(query.item,
                                                       new.conditioners[1:count]))["elapsed"])
    #--For debugging: print(conditional.query(query.item, new.conditioners[1:count]))--#
    count = count + 1
  }
  return(r.times)
}
time.matrix = matrix(ncol=7, nrow=0)
for (n in 1:length(possible.conditions)) {
time.matrix = rbind(time.matrix, query.timer(possible.conditions[n], possible.conditions))
}
rownames(time.matrix) = possible.conditions
colnames(time.matrix) = 1:7
time.matrix
## 1 2 3 4 5 6 7
## Zeus.angry 0.015 0.110 0.350 0.249 0.519 0.815 0.794
## Zeus.crying 0.012 0.076 0.094 0.129 0.220 0.329 0.419
## thunder 0.013 0.076 0.323 0.306 0.773 1.016 1.143
## rain 0.014 0.078 0.316 0.365 0.956 1.334 1.528
## forgetful.gardner 0.015 0.078 0.316 0.580 0.504 1.534 1.382
## sprinkler 0.012 0.108 0.424 0.419 0.414 0.925 1.025
## wet.grass 0.012 0.076 0.315 0.535 0.399 0.969 1.376
#Transposing cols and rows for compatibility with matplot()
time.matrix = t(time.matrix)
matplot(time.matrix, type = c("b"), pch=16, col = 1:7, xlab="Number of conditions", ylab="time 'elapsed'", main = "Run time by starting state and number of conditions")
legend("topleft", legend = colnames(time.matrix), col=1:7, pch=16)
Part II - Multivariate Probability Space
Question 1 (Levy Exercise 3.1): \[\mathbf{Var}(X) = \mathbb{E}[(X - \mathbb{E}(X))^2]\]
By expanding the equation we get:
\[= \mathbb{E}[(X - \mathbb{E}(X))(X - \mathbb{E}(X))]\]
\[= \mathbb{E}[X^2 - 2X\mathbb{E}(X) + (\mathbb{E}(X))^2]\]
\[= \mathbb{E}(X^2) - \mathbb{E}[2X\mathbb{E}(X)] + \mathbb{E}[(\mathbb{E}(X))^2]\]
Since \(\mathbb{E}(X)\) is a constant, \(\mathbb{E}[2X\mathbb{E}(X)] = 2\mathbb{E}(X)\mathbb{E}(X)\) and \(\mathbb{E}[(\mathbb{E}(X))^2] = (\mathbb{E}(X))^2\), so:
\[= \mathbb{E}(X^2) - 2(\mathbb{E}(X))^2 + (\mathbb{E}(X))^2\]
\[= \mathbb{E}(X^2) - (\mathbb{E}(X))^2\]
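As a quick numerical illustration of this identity (a sketch added here, not part of the derivation), the two forms agree on simulated data when the population form of the variance is used (dividing by n rather than n - 1, unlike R's built-in var()):
#Check Var(X) = E(X^2) - (E(X))^2 numerically (population form)
set.seed(1)
x = rnorm(1e5, mean = 2, sd = 3)
mean((x - mean(x))^2)    #definition: E[(X - E(X))^2]
mean(x^2) - mean(x)^2    #expanded form: E(X^2) - (E(X))^2
The two printed values match to floating-point precision, since the identity holds for any fixed set of numbers, not just in expectation.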
Question 2 (Levy Exercise 3.2): \[\mathbf{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}(X))(Y - \mathbb{E}(Y))]\]
We can expand the covariance definition as:
\[= \mathbb{E}(XY) - \mathbb{E}[X\mathbb{E}(Y)] - \mathbb{E}[Y\mathbb{E}(X)] + \mathbb{E}(X)\mathbb{E}(Y)\]
\[= \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y) - \mathbb{E}(Y)\mathbb{E}(X) + \mathbb{E}(X)\mathbb{E}(Y)\]
\[= \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)\]
However, since X and Y are independent, \(\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)\), so we end up with:
\[\mathbb{E}(X)\mathbb{E}(Y) - \mathbb{E}(X)\mathbb{E}(Y) = 0\]
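A matching numerical illustration (again a sketch, with arbitrary distributions chosen for the example): for independently drawn X and Y, the sample analogue of the covariance sits near zero, differing from 0 only through Monte Carlo error.
#Check that Cov(X, Y) is approximately 0 for independent X and Y
set.seed(1)
x = rnorm(1e5)
y = runif(1e5)
mean((x - mean(x)) * (y - mean(y)))   #close to 0; any deviation is sampling noise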