Imagine we hear a Spanish translation of the sentence fragment “The mouse was chased by …” Let \(A\) be the event that the noun of the following NP is masculine and \(B\) be the event that the noun of the following NP is plural. Given the state of knowledge at this point, \(C\), \(A\) and \(B\) are conditionally independent.
However, suppose we further learn \(D\): the article of this NP starts with ‘l’. Then \(A\) and \(B\) are no longer conditionally independent, because while \(p(B | C,D)<1\) (the noun could still be either singular or plural), \(p(B|A, C, D)=1\) (if the noun is masculine and the definite article starts with ‘l’, then it has to be the plural article ‘los’, since the singular ‘el’ does not start with ‘l’).
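The argument can be checked on a toy model. The sketch below assumes that the NP has a definite article and that the four forms ‘el’, ‘la’, ‘los’ and ‘las’ are equally likely; the uniform weights and the `articles` table are illustrative assumptions, not corpus estimates.

```r
# Toy model: the four Spanish definite articles, ASSUMED equally likely
# (illustrative weights only, not estimated from data).
articles <- data.frame(
  form      = c("el", "la", "los", "las"),
  masculine = c(TRUE, FALSE, TRUE, FALSE),
  plural    = c(FALSE, FALSE, TRUE, TRUE),
  prob      = rep(1/4, 4)
)

# D: the article starts with 'l'
starts.l <- substr(articles$form, 1, 1) == "l"

# p(B | C, D): plural, given that the article starts with 'l'
p.B.given.D <- sum(articles$prob[starts.l & articles$plural]) /
  sum(articles$prob[starts.l])

# p(B | A, C, D): plural, given masculine and an article starting with 'l'
p.B.given.AD <- sum(articles$prob[starts.l & articles$masculine & articles$plural]) /
  sum(articles$prob[starts.l & articles$masculine])

p.B.given.D   # 2/3 under the uniform assumption
p.B.given.AD  # 1
```

Under these assumptions, learning \(A\) changes the probability of \(B\) once \(D\) is known, which is exactly what the loss of conditional independence means.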
The probability of drawing any particular permutation of ‘tea’ is \(\pi_t\cdot\pi_e\cdot\pi_a\). Since there are \(3!\) different permutations, the probability of being able to spell ‘tea’ is \(3!\cdot\pi_t\cdot\pi_e\cdot\pi_a\).
Similarly, the probability of drawing any particular permutation of ‘tee’ is \(\pi_t\cdot\pi_e\cdot\pi_e\), but there are only 3 different permutations (‘tee’, ‘ete’ and ‘eet’). Hence the probability of being able to spell ‘tee’ is \(3\cdot\pi_t\cdot\pi_e\cdot\pi_e\).
p.e <- 0.126
p.t <- 0.099
p.a <- 0.082
factorial(3) * p.t * p.e * p.a # probability of 'tea'
## [1] 0.006137208
3 * p.t * p.e * p.e # probability of 'tee'
## [1] 0.004715172
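As a cross-check, both values are multinomial probabilities, so they can also be obtained from R's `dmultinom()`. This is a sketch that lumps every letter other than ‘t’, ‘e’ and ‘a’ into a single ‘other’ category; the names `p.other` and `probs` are introduced here for illustration.

```r
# Letter probabilities from the text; 'other' collects all remaining letters
p.t <- 0.099; p.e <- 0.126; p.a <- 0.082
p.other <- 1 - (p.t + p.e + p.a)
probs <- c(t = p.t, e = p.e, a = p.a, other = p.other)

# Counts of (t, e, a, other) among the three letters drawn
p.tea <- dmultinom(c(1, 1, 1, 0), prob = probs)  # 3!/(1! 1! 1! 0!) * p.t * p.e * p.a
p.tee <- dmultinom(c(1, 2, 0, 0), prob = probs)  # 3!/(1! 2! 0! 0!) * p.t * p.e^2

c(tea = p.tea, tee = p.tee)
```

The multinomial coefficient \(3!/(1!\,2!) = 3\) is the count of distinct arrangements of ‘tee’ that was enumerated by hand above.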
The specification of infinitely many copies ensures that drawing letters does not affect the probability distribution of the next letter. If there is only one copy (or finitely many copies, for that matter), then sampling (drawing letters) without replacement means that the probability distribution changes, however slightly, after each draw. In this case, since we do not know the total number of letters in the copy, we cannot compute the exact probability. However, since the count of every letter is very large, drawing a few letters will not change the probability distribution much. Hence the exact probability should be very close to what we obtained above.
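For any assumed text length the without-replacement probability can be computed exactly by counting, which shows how quickly it approaches the with-replacement value. The sketch below uses hypothetical lengths (`sizes`) and rounds the letter counts to integers; the helper names `exact.tea` and `exact.tee` are introduced here for illustration.

```r
p.t <- 0.099; p.e <- 0.126; p.a <- 0.082

# Exact probability of drawing one 't', one 'e' and one 'a' (in any order)
# in 3 draws without replacement from a text of N letters.
exact.tea <- function(N) {
  n.t <- round(N * p.t); n.e <- round(N * p.e); n.a <- round(N * p.a)
  n.t * n.e * n.a / choose(N, 3)
}

# Exact probability of drawing one 't' and two 'e's without replacement.
exact.tee <- function(N) {
  n.t <- round(N * p.t); n.e <- round(N * p.e)
  n.t * choose(n.e, 2) / choose(N, 3)
}

sizes <- c(1000, 12000, 120000)  # hypothetical text lengths
rbind(tea = sapply(sizes, exact.tea),
      tee = sapply(sizes, exact.tee))
```

As the assumed length grows, both rows should approach the infinite-copy values of roughly 0.0061 and 0.0047.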
sample() function, and estimate probabilities as the proportion of samples that have the desired property as your best estimate of the probability. Hints: you can learn about sample() by typing ?sample into the console. Make sure that you think carefully about what vector you are sampling from. You may fill in letters for which Levy does not specify frequencies in 2.5.2 with a generic ‘other’ value. Also, make sure that you think carefully about whether to set sample(..., replace=TRUE) or sample(..., replace = FALSE) when answering each sub-question.
Alice in Wonderland has around 120000 letters, but to illustrate the point that sampling with or without replacement does not make much difference in this case, we pretend that it only has 1/10 of the letters, i.e., 12000. This enables us to run more samples to better estimate the probabilities.
We will use 'o' for any letter other than 'e', 't' and 'a'.
N <- 12000
n.e <- round(N*p.e) # round so that the counts are exact integers when used as indices
n.t <- round(N*p.t)
n.a <- round(N*p.a)
vec.alice <- rep('o', N)
vec.alice[1 : n.e] <- 'e'
vec.alice[(n.e + 1) : (n.e + n.t)] <- 't'
vec.alice[(n.e + n.t + 1) : (n.e + n.t + n.a)] <- 'a'
sample.letters <- function(vec, n, replace=FALSE) {
  # sample n letters from vec, concatenate the result vector to make it a string
  return(paste0(sample(x=vec, size=n, replace=replace), collapse=''))
}
spell.tea <- function(s){
# whether the letters in the string can spell 'tea'
return(s=='tea' || s=='tae' || s=='eta' || s=='eat' || s=='ate' || s=='aet')
}
spell.tee <- function(s){
# whether the letters in the string can spell 'tee'
return(s=='tee' || s=='ete' || s=='eet')
}
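The two helpers above list every permutation by hand. A more general alternative, sketched here with a hypothetical helper `can.spell` that is not part of the original solution, compares the sorted letters of the sample with the sorted letters of the target word.

```r
# TRUE if the letters of string s can be rearranged to spell word
can.spell <- function(s, word) {
  # split a string into characters, sort them, and glue them back together
  sorted <- function(x) paste(sort(strsplit(x, "")[[1]]), collapse = "")
  sorted(s) == sorted(word)
}

can.spell("ate", "tea")  # TRUE
can.spell("ete", "tee")  # TRUE
can.spell("tee", "tea")  # FALSE
```

This avoids enumerating permutations, which matters for longer target words, at the cost of a sort per sample.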
First we consider sampling from infinitely many copies, which effectively means sampling with replacement.
n.samples <- 50000
samples.rep <- sapply(1:n.samples,
function(i) sample.letters(vec.alice, 3, replace=TRUE))
p.tea.rep <- sum(sapply(samples.rep, spell.tea)) / n.samples
p.tea.rep # expected 0.0061
## [1] 0.00568
p.tee.rep <- sum(sapply(samples.rep, spell.tee)) / n.samples
p.tee.rep # expected 0.0047
## [1] 0.0042
Now we consider sampling from only one copy, i.e., without replacement.
samples.norep <- sapply(1:n.samples,
function(i) sample.letters(vec.alice, 3, replace=FALSE))
p.tea.norep <- sum(sapply(samples.norep, spell.tea)) / n.samples
p.tea.norep # expected roughly 0.0061
## [1] 0.00596
p.tee.norep <- sum(sapply(samples.norep, spell.tee)) / n.samples
p.tee.norep # expected roughly 0.0047
## [1] 0.0049
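The simulated proportions differ from the analytic values in the third decimal place, which is what Monte Carlo noise of this size should produce. A rough check, assuming the samples are independent so that the usual binomial standard error applies (the helper `se` and the vector `z` are introduced here for illustration):

```r
n.samples <- 50000
p.tea <- factorial(3) * 0.099 * 0.126 * 0.082  # analytic value computed earlier
p.tee <- 3 * 0.099 * 0.126^2

# standard error of a proportion estimated from n independent samples
se <- function(p, n) sqrt(p * (1 - p) / n)
se.tea <- se(p.tea, n.samples)  # roughly 0.00035
se.tee <- se(p.tee, n.samples)  # roughly 0.00031

# distance of each simulated estimate reported above from the analytic value,
# measured in standard errors
z <- c(tea.rep   = (0.00568 - p.tea) / se.tea,
       tee.rep   = (0.0042  - p.tee) / se.tee,
       tea.norep = (0.00596 - p.tea) / se.tea,
       tee.norep = (0.0049  - p.tee) / se.tee)
round(z, 2)
```

All four estimates fall within two standard errors of the analytic values, so the simulation does not distinguish sampling with replacement from sampling without it.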
pnorm(697, mean=608, sd=77.5) - pnorm(555, mean=608, sd=77.5)
## [1] 0.6275673
flip(.3)); and wet grass occurs if and only if: either it rained or the sprinkler was on. You’ll want to begin your code by defining the ‘coin flip’ function.
flip = function(p) runif(1,0,1) < p
sapply()-based simulation methods discussed in class on Monday
sim = function(i) {
rain = flip(0.3)
sprinkler = flip(0.3)
wet.grass = rain || sprinkler
return(c(rain, sprinkler, wet.grass)) # what you return should be a non-trivial vector
}
samples = sapply(1:1000, FUN=sim)
sapply(1:m, ...) is an \(n \times m\) matrix - i.e., one with \(n\) rows and \(m\) columns. What do the columns represent? What does each row represent?
Each column is the result of a simulation and each row records the values of a random variable (row 1 is for rain, row 2 for sprinkler and row 3 for wet.grass).
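The orientation is easy to confirm on a small case. This sketch uses a toy function unrelated to the rain model (the name `toy` is introduced here for illustration): each call returns a vector of length 3, and there are 4 calls.

```r
# each call returns 3 values; sapply binds the results together as columns
toy <- sapply(1:4, function(i) c(i, i^2, i^3))
dim(toy)  # 3 rows (one per returned value), 4 columns (one per call)
toy[, 2]  # everything returned by the second call: 2 4 8
```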
rownames() and colnames() to add informative row and column names to the matrix of samples. [Hint: what does paste('sample', 1:1000) do?]
rownames(samples)=c("rain", "sprinkler", "wet.grass")
colnames(samples)=paste('sample', 1:1000) # create a vector ("sample 1", "sample 2", ... "sample 1000")
which() to define a new matrix with only the samples in which your observation was true: wet.grass == TRUE. What are the dimensions of this matrix? What is the proportion of these samples in which each of rain and sprinkler is true?
samples.wet.grass <- samples[, which(samples["wet.grass", ])]
dim(samples.wet.grass)
## [1] 3 483
mean(samples.wet.grass["rain", ]) # proportion of rain
## [1] 0.5652174
mean(samples.wet.grass["sprinkler", ]) # proportion of sprinkler
## [1] 0.610766
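These simulated proportions can be compared with the exact values obtained by enumerating the four possible worlds. The sketch assumes, as in `sim` above, that rain and sprinkler are independent and each has probability 0.3; the `worlds` table is introduced here for illustration.

```r
p.rain <- 0.3; p.sprinkler <- 0.3

# all four combinations of rain and sprinkler, with their joint probabilities
worlds <- expand.grid(rain = c(TRUE, FALSE), sprinkler = c(TRUE, FALSE))
worlds$prob <- ifelse(worlds$rain, p.rain, 1 - p.rain) *
  ifelse(worlds$sprinkler, p.sprinkler, 1 - p.sprinkler)
worlds$wet.grass <- worlds$rain | worlds$sprinkler

p.wet <- sum(worlds$prob[worlds$wet.grass])  # 1 - 0.7^2 = 0.51
p.rain.given.wet <- sum(worlds$prob[worlds$wet.grass & worlds$rain]) / p.wet

p.wet             # compare with 483/1000 in the simulation
p.rain.given.wet  # 0.3/0.51, about 0.588
```

By symmetry the exact value for sprinkler is the same, so the gap between 0.565 and 0.611 in the simulation is sampling noise.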
which() to select the subset in which rain and wet.grass are BOTH true. What is the proportion of these samples in which sprinkler is true? On an intuitive level, why is this?
samples.wet.grass.rain <- samples[, which(samples["wet.grass", ] & samples["rain", ])]
mean(samples.wet.grass.rain["sprinkler", ]) # proportion of sprinkler
## [1] 0.3113553
We can see that the proportion of these samples in which sprinkler is true drops back to the base rate (around 0.3). The reason is that knowing it rained fully explains the wet grass. Once rain is known, the observation is equally consistent with the sprinkler being on or off, so it tells us nothing about the sprinkler beyond what we already knew a priori.
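The same conclusion follows from Bayes' rule. Writing \(R\), \(S\) and \(W\) for rain, sprinkler and wet grass (labels introduced here), and assuming as in the simulation that \(R\) and \(S\) are independent and that \(W\) holds exactly when \(R\) or \(S\) does, we have \(p(W \mid R, S) = p(W \mid R) = 1\), so

\[
p(S \mid R, W) = \frac{p(W \mid R, S)\, p(S \mid R)}{p(W \mid R)} = \frac{1 \cdot p(S)}{1} = p(S) = 0.3 .
\]

The simulated value of 0.311 is within sampling noise of this exact result.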