Part I - Inference in DAGs using rejection sampling

Question 1:

  1. “wet.grass” is conditionally independent of “thunder” given knowledge of its parents “rain” and “sprinkler.” If, however, both “rain” and “sprinkler” are unknown, knowledge about “thunder” does have an effect on the probability of “wet.grass”, because “wet.grass” and “thunder” share a common ancestor in “Zeus.angry”. If the only information we have is “thunder”, we can make an upstream inference about the probability of “Zeus.angry”. We can then reason downstream from our updated knowledge about “Zeus.angry” to the parent of “wet.grass” (“rain”). In this way, given no other information, knowledge about “thunder” affects the probability of “wet.grass”.

  2. If we saw “rain” through the window, learning about “thunder” provides no new information, because “wet.grass” is conditionally independent of “thunder” given knowledge of its parent “rain”.

  3. See code below:

flip = function(p) runif(1) < p  #TRUE with probability p
#Coded DAG model: draws one forward sample of every node
rain.sim = function(i) {
  Zeus.angry = flip(.3)
  Zeus.crying = flip(.2)
  thunder = (Zeus.angry && flip(.7)) || flip(.3)
  rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
  forgetful.gardner = flip(.4)
  sprinkler = forgetful.gardner && flip(.7)
  wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)
  return(c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying,
           thunder = thunder, forgetful.gardner = forgetful.gardner,
           rain = rain, sprinkler = sprinkler, wet.grass = wet.grass))
}
 
#Query function: estimates the marginal probability that a variable is TRUE via Monte Carlo simulation
query = function(variable.name, n.sims=500) {
  sims = sapply(1:n.sims, FUN=rain.sim)
  return(length(which(sims[variable.name,] == T)) / n.sims)
}
query('Zeus.angry')
## [1] 0.316
query('Zeus.crying')
## [1] 0.182
query('forgetful.gardner')
## [1] 0.402
query('thunder')
## [1] 0.442
query('rain')
## [1] 0.398
query('sprinkler')
## [1] 0.26
query('wet.grass')
## [1] 0.572
#Conditional query function using rejection sampling
conditional.query = function(query.var, conditioners, n.sims=500) {
  #vector to hold our samples
  samples = c()

  while (length(samples) < n.sims) {
    #DAG modeled here
    Zeus.angry = flip(.3)
    Zeus.crying = flip(.2)
    thunder = (Zeus.angry && flip(.7)) || flip(.3)
    rain = (Zeus.angry && flip(.8)) || (Zeus.crying && flip(.99))
    forgetful.gardner = flip(.4)
    sprinkler = forgetful.gardner && flip(.7)
    wet.grass = (rain && flip(.8)) || (sprinkler && flip(.9)) || flip(.1)

    #vector of truth values for each state
    truth.values = c(Zeus.angry = Zeus.angry, Zeus.crying = Zeus.crying,
    thunder = thunder, rain = rain, forgetful.gardner = forgetful.gardner,
    sprinkler = sprinkler, wet.grass = wet.grass)

    #query.index to find index of the states we're looking for
    query.index = which(query.var == names(truth.values))
    #conditioner.indices to find indices for conditioners
    conditioner.indices = which(names(truth.values) %in% conditioners)
    #If we have an instance of all the conditioner states being true then
    #save the state of our query var into samples
    if (all(truth.values[conditioner.indices] == T)) {
      samples = c(samples, truth.values[query.index])
    }
  }
  #Return the proportion of accepted samples (conditioners all TRUE)
  #in which the query variable is TRUE
  return(length(which(samples==T)) / n.sims)
}
  4. The conditional probability of “wet.grass” increases as we condition on more (positive) information. The unconditional probability of “wet.grass” is simply:
query('wet.grass')
## [1] 0.56

As we learn more information, for example that there is “thunder”, we can update our model, inferring upstream about “Zeus.angry”, which in turn modifies the probability of “rain”, a parent of “wet.grass”. In the coded model, P(“thunder”|“Zeus.angry”) = 1 - (1 - .7)(1 - .3) = .79 (thunder can come from Zeus’s flip(.7) or from the background flip(.3)), while P(“thunder”|~“Zeus.angry”) = .3. Applying Bayes’ rule with “Zeus.angry” = A and “thunder” = B: P(A|B) = [P(B|A)P(A)] / P(B) = [P(B|A)P(A)] / [P(B|A)P(A) + P(B|~A)P(~A)] = (.79*.3) / ((.79*.3) + (.3*.7)) ≈ .53
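
As a sanity check, here is a minimal sketch that recomputes this posterior directly from the model’s parameters (the variable names below, e.g. p.thunder.given.angry, are just illustrative, not part of the assignment code):

p.angry = .3
#thunder fires via Zeus's flip(.7) OR the independent background flip(.3)
p.thunder.given.angry = 1 - (1 - .7) * (1 - .3)  #0.79
p.thunder.given.calm = .3
#Bayes' rule: P(Zeus.angry | thunder)
(p.thunder.given.angry * p.angry) /
  (p.thunder.given.angry * p.angry + p.thunder.given.calm * (1 - p.angry))
#approximately 0.53, which the rejection-sampling estimate below should approach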

conditional.query('Zeus.angry', 'thunder')
## [1] 0.482

So the probability of “Zeus.angry” increases (from .3 to ~.5) if we know “thunder”. This, in turn, increases the probability of “wet.grass” via “Zeus.angry”’s child “rain”, which is a parent of “wet.grass”:

conditional.query('wet.grass', 'thunder')
## [1] 0.606

However, given “Zeus.angry”, “thunder” now gives us no new information. So we should see two things. First, adding the condition “Zeus.angry” should increase our estimate of “wet.grass”. Second, that estimate should be essentially unchanged when we also condition on “thunder”, i.e. P(“wet.grass”|“Zeus.angry”, “thunder”) ≈ P(“wet.grass”|“Zeus.angry”), because “thunder” contains no new information.

conditional.query('wet.grass', 'Zeus.angry')
## [1] 0.782
conditional.query('wet.grass', c('Zeus.angry', 'thunder'))
## [1] 0.788
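
The claim in part 2 above can be checked the same way (a quick, optional sketch; the exact numbers will vary from run to run): once we condition on “rain”, additionally conditioning on “thunder” should leave the estimate for “wet.grass” essentially unchanged, since every path from “thunder” to “wet.grass” is blocked once its parent “rain” is known.

conditional.query('wet.grass', 'rain')
conditional.query('wet.grass', c('rain', 'thunder'))
#Both estimates should be noticeably higher than the unconditional ~0.56 and
#should agree with each other up to sampling noise.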
  5. Unsurprisingly, the more conditions we add to the computation, the longer our runtime. Let’s look quickly to see if the run times also change depending on the state we’re interested in making predictions about – that is, whether our state is more upstream (“Zeus.angry”) or downstream (“wet.grass”) – along with the number of conditions we consider.

–> Runtime is not only a function of the number of conditions, but also of which variables end up in the conditioning set: because the state we query is removed from that set, the cost is largely driven by how probable it is that the remaining conditioners all come out TRUE at once (rarely-true conditioners such as “Zeus.crying” or “sprinkler” force many more rejections). Our two lowest runtimes across all condition quantities are “Zeus.angry” and “Zeus.crying”, while the longer runtimes tend to be “wet.grass” and “rain” (“forgetful.gardner” is also quite high) as the number of conditioners increases. Could it be that “forgetful.gardner”’s (relatively) long runtime also has to do with its relative isolation (it is connected to a single neighbor, which is itself connected to only one other node), creating a kind of processing bottleneck?

#Vector of possible conditions
possible.conditions = c("Zeus.angry", "Zeus.crying", "thunder", "rain", "forgetful.gardner", "sprinkler", "wet.grass")
#Function returns a vector of run times
  #Params-----1st: the state we're interested in i.e. "wet.grass"
  #Params-----2nd: the vector of other possible states
query.timer = function(query.item, conditioners) {
  #Vector we'll return which holds runtimes
  r.times = c()
  count = 1 #iterator
  
  #Remove the state we're interested in from conditions vector
  remove = c(query.item)
  new.conditioners = setdiff(conditioners, remove)
  
  #Runtime for the unconditional probability first
  r.times = c(r.times, system.time(query(query.item))["elapsed"])
  
  #Loop through remaining conditional probabilities incrementing number of conditions
  len = length(new.conditioners)
  while(count <= len) {
    r.times = c(r.times, system.time(conditional.query(query.item, 
                                                       new.conditioners[1:count]))["elapsed"])
    #--For debugging: print(conditional.query(query.item, new.conditioners[1:count]))--#
    count = count + 1
  }
  return(r.times)
}

time.matrix = matrix(ncol=7, nrow=0)
for (n in 1:length(possible.conditions)) {
  time.matrix = rbind(time.matrix, query.timer(possible.conditions[n], possible.conditions))
}
rownames(time.matrix) = possible.conditions
colnames(time.matrix) = 0:6  #column k holds the runtime with k conditioners (0 = unconditional)
time.matrix
##                       0     1     2     3     4     5     6
## Zeus.angry        0.015 0.110 0.350 0.249 0.519 0.815 0.794
## Zeus.crying       0.012 0.076 0.094 0.129 0.220 0.329 0.419
## thunder           0.013 0.076 0.323 0.306 0.773 1.016 1.143
## rain              0.014 0.078 0.316 0.365 0.956 1.334 1.528
## forgetful.gardner 0.015 0.078 0.316 0.580 0.504 1.534 1.382
## sprinkler         0.012 0.108 0.424 0.419 0.414 0.925 1.025
## wet.grass         0.012 0.076 0.315 0.535 0.399 0.969 1.376
#Transposing cols and rows for compatibility with matplot()
time.matrix = t(time.matrix)
matplot(0:6, time.matrix, type = "b", pch=16, col = 1:7, xlab="Number of conditions", ylab="time 'elapsed'", main = "Run time by starting state and number of conditions")
legend("topleft", legend = colnames(time.matrix), col=1:7, pch=16)

  6. Our function conditional.query() is a kind of brute-force approach to generating a vector of truth values. conditional.query() adds a T or F value to our vector “samples” only in instances in which all of the conditioner states were true. So if we’re interested in the value of “wet.grass” given “Zeus.angry” and “rain”, conditional.query() will only record the value of “wet.grass” if both “Zeus.angry” and “rain” evaluate to true. This form of iteration takes a really long time, especially as we add more conditions, because each additional condition is another opportunity for the sample to be rejected (meaning we add nothing to our samples vector). Once we’ve finally collected n instances in which all our conditions evaluate to T, we can stop iterating in the while loop and take the proportion of T values (for the state we’re interested in) relative to the length of the vector. That is to say, we estimate how often state A is T among the samples in which all of its conditioners are also T.
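
This also explains the timing pattern above: the expected number of raw simulations needed to collect n.sims accepted samples is roughly n.sims divided by the probability that all of the conditioners come out TRUE in a single forward sample. Below is a minimal sketch for estimating that acceptance probability from unconditioned simulations (the helper name acceptance.rate is mine, not part of the assignment code):

#Estimate P(all conditioners TRUE) from plain forward simulations
acceptance.rate = function(conditioners, n.sims = 5000) {
  sims = sapply(1:n.sims, FUN = rain.sim)
  mean(apply(sims[conditioners, , drop = FALSE], 2, all))
}
acceptance.rate(c('Zeus.angry', 'rain'))        #fairly common joint event: few rejections
acceptance.rate(c('Zeus.crying', 'sprinkler'))  #rare joint event: many rejections, long runtimes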

Part II - Multivariate Probability Space

Question 1 (Levy Exercise 3.1): \[\mathbf{Var}(X) = \mathbb{E}[(X - \mathbb{E}(X))^2]\] By expanding the equation we get: \[= \mathbb{E}[(X - \mathbb{E}(X))(X - \mathbb{E}(X))]\] \[= \mathbb{E}[X^2 - 2X\mathbb{E}(X) + (\mathbb{E}(X))^2]\] \[= \mathbb{E}(X^2) - \mathbb{E}[2X\mathbb{E}(X)] + \mathbb{E}[(\mathbb{E}(X))^2]\] Since \(\mathbb{E}(X)\) is a constant, this becomes: \[= \mathbb{E}(X^2) - 2\mathbb{E}(X)\mathbb{E}(X) + (\mathbb{E}(X))^2\] \[= \mathbb{E}(X^2) - 2(\mathbb{E}(X))^2 + (\mathbb{E}(X))^2\] \[= \mathbb{E}(X^2) - (\mathbb{E}(X))^2\]
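
As a quick illustration (not part of the exercise), for a Bernoulli variable \(X\) with \(P(X = 1) = p\) we have \(X^2 = X\), so \[\mathbb{E}(X^2) - (\mathbb{E}(X))^2 = p - p^2 = p(1 - p),\] the familiar Bernoulli variance.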

Question 2 (Levy Exercise 3.2): \[\mathbf{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}(X))(Y - \mathbb{E}(Y))]\] Expanding the product and using linearity of expectation, we can restate the covariance definition as: \[= \mathbb{E}(XY) - \mathbb{E}(X\mathbb{E}(Y)) - \mathbb{E}(Y\mathbb{E}(X)) + \mathbb{E}(X)\mathbb{E}(Y)\] \[= \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y) - \mathbb{E}(Y)\mathbb{E}(X) + \mathbb{E}(X)\mathbb{E}(Y)\] \[= \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)\] However, since X and Y are independent, \(\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)\), so we end up getting:

\[\mathbb{E}(X)\mathbb{E}(Y) - \mathbb{E}(X)\mathbb{E}(Y) = 0\]
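
A quick, optional numerical sanity check in R (simulated values will differ slightly from run to run):

set.seed(1)
x = rnorm(1e5, mean = 2, sd = 3)
y = runif(1e5)              #generated independently of x
mean(x^2) - mean(x)^2       #approximately var(x), i.e. about 9
cov(x, y)                   #approximately 0, since x and y are independent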