Exercises

  1. Levy ch.1, pp.34-6, exercise 2.2, modified: “Give an example in words - involving language understanding, and one that was not specifically discussed in class - where two events \(A\) and \(B\) are conditionally independent given some state of knowledge \(C\), but when another piece of knowledge \(D\) is learned, \(A\) and \(B\) lose conditional independence.”

Imagine we hear a Spanish translation of the sentence fragment “The mouse was chased by …” Let \(A\) be the event that the noun of the following NP is masculine and \(B\) be the event that the noun of the following NP is plural. Given the state of knowledge at this point, \(C\), \(A\) and \(B\) are conditionally independent: knowing the noun’s gender tells us nothing about its number, and vice versa.

However, suppose we further learn \(D\): the definite article of this NP starts with ‘l.’ Then \(A\) and \(B\) lose conditional independence. On the one hand, \(p(B|C,D)<1\), since the article could be ‘la,’ ‘los,’ or ‘las,’ so the noun could still be either singular or plural. On the other hand, \(p(B|A,C,D)=1\): if the noun is masculine and its definite article starts with ‘l,’ the article must be the plural ‘los,’ since the masculine singular ‘el’ does not start with ‘l.’
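
To make the argument concrete, here is a minimal numerical sketch in R. The uniform prior over the four definite articles is purely an illustrative assumption, not part of the exercise:

article <- c('el', 'la', 'los', 'las')
masc    <- c(TRUE, FALSE, TRUE, FALSE)   # event A: the noun is masculine
plural  <- c(FALSE, FALSE, TRUE, TRUE)   # event B: the noun is plural
p       <- rep(0.25, 4)                  # prior over articles (assumed uniform)
# Under C alone, A and B are independent:
sum(p[masc & plural]) == sum(p[masc]) * sum(p[plural])  # TRUE: 0.25 == 0.5 * 0.5
# Event D: the article starts with 'l'; condition on D and renormalize:
D <- substr(article, 1, 1) == 'l'
p.D <- p[D] / sum(p[D])
sum(p.D[plural[D]])                                # p(B | C, D) = 2/3 < 1
sum(p.D[masc[D] & plural[D]]) / sum(p.D[masc[D]])  # p(B | A, C, D) = 1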

  1. Levy ch.1, pp.34-6, exercise 2.3:

The probability of drawing any particular permutation of ‘tea’ is \(\pi_t\cdot\pi_e\cdot\pi_a\). Since there are \(3!\) different permutations, the probability of being able to spell ‘tea’ is \(3!\cdot\pi_t\cdot\pi_e\cdot\pi_a\).

Similarly, the probability of drawing any particular permutation of ‘tee’ is \(\pi_t\cdot\pi_e\cdot\pi_e\), but there are only \(3!/2! = 3\) distinct permutations (‘tee’, ‘ete’ and ‘eet’), since the two e’s are indistinguishable. Hence the probability of being able to spell ‘tee’ is \(3\cdot\pi_t\cdot\pi_e\cdot\pi_e\).

# letter frequencies from Levy 2.5.2
p.e <- 0.126
p.t <- 0.099
p.a <- 0.082
factorial(3) * p.t * p.e * p.a  # probability of 'tea'
## [1] 0.006137208
3 * p.t * p.e * p.e  # probability of 'tee'
## [1] 0.004715172

The stipulation of infinitely many copies ensures that drawing letters does not affect the probability distribution of the next letter: sampling from an infinite supply is equivalent to sampling with replacement. If there were only one copy (or finitely many copies, for that matter), then sampling without replacement would change the distribution, however slightly, after each draw. In that case, since we do not know the total number of letters in a copy, we could not compute the exact probability. However, because the count of each letter is very large, drawing a few letters barely changes the distribution, so the exact probability should be very close to what we obtained above.
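
To see how small the without-replacement correction would be, suppose (hypothetically) that a single copy contained \(N\) letters in total, with \(n_t\), \(n_e\), and \(n_a\) tokens of ‘t,’ ‘e,’ and ‘a.’ The probability of spelling ‘tea’ without replacement would be \(3!\cdot\frac{n_t}{N}\cdot\frac{n_e}{N-1}\cdot\frac{n_a}{N-2}\), and that of ‘tee’ would be \(3\cdot\frac{n_t}{N}\cdot\frac{n_e}{N-1}\cdot\frac{n_e-1}{N-2}\); both converge to the with-replacement values above as \(N\) grows with the letter proportions held fixed.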

  1. Re-do both parts of the previous exercise using a simulation in R instead of mathematical reasoning. That is, think carefully about the generative process by which this example proceeds, and write a program which implements a model of this process and uses it to generate many draws of 3 letters. Use the sample() function, and take the proportion of samples that have the desired property as your best estimate of the probability. Hints: you can learn about sample() by typing ?sample into the console. Make sure that you think carefully about what vector you are sampling from. You may fill in letters for which Levy does not specify frequencies in 2.5.2 with a generic ‘other’ value. Also, make sure that you think carefully about whether to set sample(..., replace=TRUE) or sample(..., replace=FALSE) when answering each sub-question.

Alice in Wonderland has around 120000 letters, but to illustrate the point that sampling with or without replacement makes little difference in this case, we pretend that it has only a tenth as many, i.e., 12000. The smaller vector makes each draw faster, so we can run more samples and estimate the probabilities more precisely; moreover, since without-replacement effects are larger in a smaller population, if they are negligible here they are certainly negligible at full size.

We will use 'o' for any letter other than 'e', 't' and 'a'.

N <- 12000
n.e <- round(N * p.e)  # count of 'e' (round() guards against floating-point error in the products)
n.t <- round(N * p.t)  # count of 't'
n.a <- round(N * p.a)  # count of 'a'
vec.alice <- rep('o', N)  # start with all-'other', then fill in the three letters of interest
vec.alice[1 : n.e] <- 'e'
vec.alice[(n.e + 1) : (n.e + n.t)] <- 't'
vec.alice[(n.e + n.t + 1) : (n.e + n.t + n.a)] <- 'a'

sample.letters <- function(vec, n, replace=FALSE) {
  # sample n letters from vec and concatenate them into a single string
  return(paste0(sample(x=vec, size=n, replace=replace), collapse=''))
}

spell.tea <- function(s){
  # whether the letters in the string can spell 'tea'
  return(s=='tea' || s=='tae' || s=='eta' || s=='eat' || s=='ate' || s=='aet')
}
spell.tee <- function(s){
  # whether the letters in the string can spell 'tee'
  return(s=='tee' || s=='ete' || s=='eet')
}
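
A quick sanity check of these helper functions on a few hand-picked strings:

spell.tea('ate')  # TRUE
spell.tea('tee')  # FALSE
spell.tee('eet')  # TRUE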

First we consider sampling from infinite copies, which effectively means sampling with replacement.

n.samples <- 50000
samples.rep <- sapply(1:n.samples, 
                      function(i) sample.letters(vec.alice, 3, replace=TRUE)) 
p.tea.rep <- sum(sapply(samples.rep, spell.tea)) / n.samples 
p.tea.rep  # expected 0.0061
## [1] 0.00568
p.tee.rep <- sum(sapply(samples.rep, spell.tee)) / n.samples 
p.tee.rep  # expected 0.0047
## [1] 0.0042

Now we consider sampling from only one copy, i.e., without replacement.

samples.norep <- sapply(1:n.samples, 
                      function(i) sample.letters(vec.alice, 3, replace=FALSE)) 
p.tea.norep <- sum(sapply(samples.norep, spell.tea)) / n.samples 
p.tea.norep  # expected roughly 0.0061
## [1] 0.00596
p.tee.norep <- sum(sapply(samples.norep, spell.tee)) / n.samples 
p.tee.norep  # expected roughly 0.0047
## [1] 0.0049
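
As a further cross-check, we can compute the exact without-replacement probabilities for our pretend 12000-letter copy, using the counting formulas sketched above (this relies on N and the counts n.e, n.t and n.a defined earlier):

factorial(3) * (n.t/N) * (n.e/(N-1)) * (n.a/(N-2))  # exact 'tea' without replacement, ~0.0061
3 * (n.t/N) * (n.e/(N-1)) * ((n.e-1)/(N-2))         # exact 'tee' without replacement, ~0.0047
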
  1. Levy ch.1, pp.34-6, exercise 2.10: “For adult female native speakers of American English, the distribution of first-formant frequencies for the vowel [ɛ] is reasonably well modeled as a normal distribution with mean 608Hz and standard deviation 77.5Hz. What is the probability that the first-formant frequency of an utterance of [ɛ] for a randomly selected adult female native speaker of American English will be between 555Hz and 697Hz?” Show the R code that you used to calculate the answer. (Hint: look back at Monday’s class notes, specifically the part about using R to find the cumulative probability of continuous distributions.)

pnorm(697, mean=608, sd=77.5) - pnorm(555, mean=608, sd=77.5)
## [1] 0.6275673
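
As a quick sanity check (not part of the exercise), we can estimate the same probability by simulation with rnorm(); the sample size of 100000 is an arbitrary choice:

f1 <- rnorm(100000, mean=608, sd=77.5)  # simulated first-formant frequencies
mean(f1 > 555 & f1 < 697)               # should come out close to 0.6276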
  1. Design a simulation implementing Pearl’s rain/sprinkler/wet grass example as discussed in class. Make sure that the samples that you generate assign a truth-value to each variable - rain, sprinkler, and wet grass - and that they have the dependency structure assumed: rain and sprinkler are uncaused, occurring with some fixed probability (say, both are flip(.3)), and wet grass occurs if and only if either it rained or the sprinkler was on. You’ll want to begin your code by defining the ‘coin flip’ function.

flip = function(p) runif(1,0,1) < p  # TRUE with probability p
sim = function(i) {
  rain = flip(0.3)
  sprinkler = flip(0.3)
  wet.grass = rain || sprinkler  # wet grass iff it rained or the sprinkler was on
  return(c(rain, sprinkler, wet.grass))  # truth-values of all three variables
}
samples = sapply(1:1000, FUN=sim)

Each column is the result of one simulation and each row records the values of one random variable (row 1 for rain, row 2 for sprinkler, and row 3 for wet.grass).

rownames(samples)=c("rain", "sprinkler", "wet.grass")
colnames(samples)=paste('sample', 1:1000) # create a vector ("sample 1", "sample 2", ... "sample 1000")
samples.wet.grass <- samples[, which(samples["wet.grass", ])] # keep only samples where the grass is wet
dim(samples.wet.grass)
## [1]   3 483
mean(samples.wet.grass["rain", ]) # estimate of p(rain | wet grass)
## [1] 0.5652174
mean(samples.wet.grass["sprinkler", ]) # estimate of p(sprinkler | wet grass)
## [1] 0.610766
samples.wet.grass.rain <- samples[, which(samples["wet.grass", ] & samples["rain", ])] # condition on wet grass and rain
mean(samples.wet.grass.rain["sprinkler", ]) # estimate of p(sprinkler | wet grass, rain)
## [1] 0.3113553

We can see that the proportion of samples in which sprinkler is true drops back to the base rate (around 0.3). The reason is that the knowledge that it rained fully explains the observation of wet grass: rain “explains away” the sprinkler. Given rain, the wet grass tells us nothing beyond our prior about whether the sprinkler was on, since either way would be equally consistent with the observation.
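
We can confirm this with exact arithmetic under the model’s stated probabilities (rain and sprinkler each true with probability 0.3, independently):

p.rain <- 0.3
p.sprinkler <- 0.3
p.wet <- 1 - (1 - p.rain) * (1 - p.sprinkler)  # p(wet grass) = 0.51
p.sprinkler / p.wet  # p(sprinkler | wet grass) ~ 0.588, since sprinkler implies wet grass
p.sprinkler          # p(sprinkler | wet grass, rain) = p(sprinkler) = 0.3, since rain alone implies wet grass

The simulated conditional proportions above are all within sampling error of these exact values.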