Question 1

Context: Imagine I am interpreting the utterance “I’m hot!” from a person named Bill. State A - Bill is feeling overheated. State B - Bill is feeling lucky. Knowledge C - Bill and I are in the U.S. Given knowledge C, states A and B are conditionally independent, because knowledge of C does nothing to alter the likelihood of A or B. Knowledge D - Bill and I are in a casino playing poker and Bill has won three consecutive pots. Given knowledge D, A and B are dependent: the likelihood of Bill feeling lucky (State B) considering knowledge D dramatically increases, consequently decreasing the likelihood of State A.
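
Formally (a standard definition, not stated in the original): A and B are conditionally independent given C when

P(A, B | C) = P(A | C) * P(B | C), or equivalently P(A | B, C) = P(A | C)

whereas given D the factorization fails: learning B shifts the probability of A, so P(A | B, D) ≠ P(A | D).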

Question 2

  1. The probability that we will randomly select ‘t’, ‘e’, and ‘a’ in that particular order is the product of their individual probabilities.
t = .099
e = .126
a = .082
prob.tea = t * e * a
prob.tea
## [1] 0.001022868

However, given that order is of no consequence, there are six ways we might order these letters: tea, tae, eat, eta, aet, ate. Therefore, the probability that we make three selections that allow us to spell “tea” is 6 * prob.tea.

6 * prob.tea #Probability of spelling tea
## [1] 0.006137208

We can follow the same steps to calculate the probability of spelling “tee.” However, in this case there are only 3 distinct orderings of the letters: tee, ete, eet.

prob.tee = t * e * e
3 * prob.tee #Probability of spelling tee
## [1] 0.004715172
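
A side note (my addition): rather than enumerating the orderings, the counts can be computed directly, since three distinct letters have factorial(3) orderings while a repeated letter divides that by factorial(2):

factorial(3)                 # 6 orderings of three distinct letters (t, e, a)
factorial(3) / factorial(2)  # 3 distinct orderings when one letter repeats (t, e, e)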
  2. It was necessary for us to have infinite copies of the text because that made our RVs independent (selecting ‘t’ had no effect on the probability of selecting subsequent letters). However, if we had a single text and selected one letter at a time without replacement, then each selection would affect the ones after it: the letter frequencies would need to be updated after every draw, as would the sample space itself (once every copy of some letter had been drawn). So, in this circumstance in which we do not know the overall character frequencies, we would lose the ability to accurately compute the probability of being able to spell “tea” (or any string of characters). A toy sketch of the difference follows.
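
As that sketch (the single 100-character text and its letter counts here are invented for illustration, not taken from the assignment), drawing without replacement shrinks the pool at every step, so the per-draw probabilities change:

# hypothetical text of 100 characters containing 10 't's, 13 'e's, and 8 'a's
p.with    = (10/100) * (13/100) * (8/100)  # with replacement: draws independent, ~0.00104
p.without = (10/100) * (13/99)  * (8/98)   # without replacement: pool shrinks each draw, ~0.00107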

Question 3

Simulating the above question in R, we first populate a vector of letters “chars” as well as a vector of letter probabilities “probs”. I’ve initially given each letter a random relative frequency between .0300 and .0302, which spreads the remaining probability mass (1 − .307 ≈ .693, over the other 23 letters) roughly evenly; sample() renormalizes the prob vector in any case. Of course we do have the relative frequencies for “t”, “e”, and “a”, so I populate those manually.

chars = letters
probs = runif(26, .0300, .0302)
probs[1] = .082   # 'a'
probs[5] = .126   # 'e'
probs[20] = .099  # 't'

Now, I keep a “count” variable to count all the occurrences of “t”, “e”, “a” permutations, sampling 1,000,000 times and counting every time we get a match in letters.

count = 0
for (i in 1:1000000) {
  selection = paste(sample(chars, size=3, replace=T, prob = probs), collapse="")
  if (selection %in% c("tea", "tae", "eat", "eta", "aet", "ate"))
    count = count + 1
}

Dividing the occurrences of letter matches by the number of samples, we get a frequency of:

frequency = count / 1000000
print(frequency)
## [1] 0.006071

We can do the same for “tee” permutations:

count = 0
for (i in 1:1000000) {
  selection = paste(sample(chars, size=3, replace=T, prob = probs), collapse="")
  if (selection %in% c("tee", "ete", "eet"))
    count = count + 1
}
frequency = count / 1000000
print(frequency)
## [1] 0.004798
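
As an aside (a sketch reusing the chars and probs defined above), the same estimates can be had without an explicit loop:

draws = replicate(1000000, paste(sample(chars, size=3, replace=T, prob = probs), collapse=""))
mean(draws %in% c("tea", "tae", "eat", "eta", "aet", "ate"))  # should approximate the "tea" frequency above
mean(draws %in% c("tee", "ete", "eet"))                       # should approximate the "tee" frequency above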

Question 4

First, we draw a random sample from a normal distribution using our mean = 608 and sd = 77.5. I’ve arbitrarily set the number of observations in rnorm() to 1000. Here’s a histogram of the sample, for fun.

n.distr = rnorm(1000, mean = 608, sd = 77.5)

hist(n.distr, main = "First-formant frequencies for the vowel [E]\nfor female Native American English Speakers", xlab = "first-formant frequencies", ylab = "Occurrences")

Now, we can feed this into an empirical cumulative distribution function (“c.distr”) using ecdf(), again plotting for fun…

c.distr = ecdf(n.distr)
plot(c.distr, main = "Cumulative Probability\nfor the vowel [E]\nfemale Native American English Speakers", xlab = "first-formant frequencies", ylab = "Probability")

Since we have our cumulative distribution function, we can evaluate it at a given frequency: c.distr(x) returns the proportion of sampled frequencies at or below x.

c.distr(555)
## [1] 0.275
c.distr(697)
## [1] 0.876
#Which is to say there's a probability of c.distr(555) of randomly selecting a frequency under 555Hz and a probability of c.distr(697) of randomly selecting a frequency under 697Hz

Since we’re interested in the probability of randomly selecting a frequency between 555Hz and 697Hz, we simply take the difference of the values returned at those frequencies:

prob = c.distr(697) - c.distr(555)
prob
## [1] 0.601
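
For comparison (my addition, not part of the original answer), the exact probability under the assumed normal distribution comes straight from pnorm(), and the empirical estimate above should land near it:

pnorm(697, mean = 608, sd = 77.5) - pnorm(555, mean = 608, sd = 77.5)  # exact P(555 < X < 697), ~0.63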

Question 5

First we define our flip function, which we will use for our “rain” and “sprinkler” states, and implement our simulation function “sim” using “flip.” I’ve made two variables to hold the rain and sprinkler probabilities we’ll pass to flip.

flip = function(p) runif(1, 0, 1) < p  # returns TRUE with probability p
rain.prob = .24
sprinkler.prob = .31

sim = function(i) {
  rain = flip(rain.prob)
  sprinkler = flip(sprinkler.prob)
  wetgrass = rain || sprinkler  # wet if it rained or the sprinkler was on
  return(c(rain, sprinkler, wetgrass))
}

We generate 10,000 samples from our model, storing them in a matrix we’ll call “samples”. Since our “sim” function returns a vector of length 3 (for the truth-functional outcomes of rain, sprinkler, and wetgrass), our matrix will be 3 x 10000, with rows 1:3 representing (rain, sprinkler, wetgrass) and each column representing a trial (observation). I name the rows and columns to reflect this. Having the matrix oriented this way doesn’t feel as intuitive as having each row represent an individual trial, so I transpose the rows and columns of samples, storing the result in a new matrix “samples.transposed”.

samples = sapply(1:10000, FUN=sim)
dim(samples)
## [1]     3 10000
rownames(samples) = c("rain", "sprinkler", "wetgrass")
colnames(samples) = paste('sample', 1:10000)
samples[,1:10]
##           sample 1 sample 2 sample 3 sample 4 sample 5 sample 6 sample 7
## rain         FALSE    FALSE     TRUE    FALSE     TRUE    FALSE    FALSE
## sprinkler    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE    FALSE
## wetgrass     FALSE    FALSE     TRUE    FALSE     TRUE     TRUE    FALSE
##           sample 8 sample 9 sample 10
## rain         FALSE    FALSE     FALSE
## sprinkler     TRUE    FALSE     FALSE
## wetgrass      TRUE    FALSE     FALSE
samples.transposed = t(samples)
dim(samples.transposed)
## [1] 10000     3

Here’s a look at our first 10 samples now stored in our “samples.transposed” matrix:

samples.transposed[1:10,]
##            rain sprinkler wetgrass
## sample 1  FALSE     FALSE    FALSE
## sample 2  FALSE     FALSE    FALSE
## sample 3   TRUE     FALSE     TRUE
## sample 4  FALSE     FALSE    FALSE
## sample 5   TRUE      TRUE     TRUE
## sample 6  FALSE      TRUE     TRUE
## sample 7  FALSE     FALSE    FALSE
## sample 8  FALSE      TRUE     TRUE
## sample 9  FALSE     FALSE    FALSE
## sample 10 FALSE     FALSE    FALSE

Now I’m going to pull the indices out of samples.transposed for the samples in which wetgrass == TRUE. I’m breaking this into two steps for clarity.

wetgrass.indices = which(samples.transposed[, "wetgrass"] == T)

As an aside, we can see how many TRUE wetgrass states we have by taking the length of this vector. Dividing that by the total number of samples gives us an approximation of how often we observed wetgrass, as a relative frequency.

rfrequency.wetgrass = length(wetgrass.indices)
percent.wetgrass = rfrequency.wetgrass / nrow(samples.transposed)
percent.wetgrass
## [1] 0.4693
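
For reference (my addition), the analytic value follows from the independence of rain and sprinkler, since the grass stays dry only when both fail to occur:

1 - (1 - rain.prob) * (1 - sprinkler.prob)  # P(wetgrass) = 1 - .76 * .69 = .4756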

We can save all samples in which the grass was wet into another matrix we’ll call “samples.wetgrassT”:

samples.wetgrassT = samples.transposed[wetgrass.indices,]
dim(samples.wetgrassT)
## [1] 4693    3

From samples.wetgrassT we can pull data to populate two more vector variables, “rainT” and “sprinklerT”, storing all the indices in which “rain” is true and “sprinkler” is true, respectively. We can then find the proportions of these samples:

#Indices in which it rained
rainT = which(samples.wetgrassT[, "rain"])
#Proportion of instances in which it rained:
length(rainT) / rfrequency.wetgrass
## [1] 0.5024505
#Indices in which it sprinklered
sprinklerT = which(samples.wetgrassT[, "sprinkler"])
#Proportion of instances in which it sprinklered:
length(sprinklerT) / rfrequency.wetgrass
## [1] 0.6535265
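
These proportions also have analytic counterparts (my derivation, not part of the original). Because rain implies wet grass, P(rain & wetgrass) = P(rain), so P(rain | wetgrass) = P(rain) / P(wetgrass), and likewise for the sprinkler:

p.wet = 1 - (1 - rain.prob) * (1 - sprinkler.prob)  # .4756, as above
rain.prob / p.wet       # P(rain | wetgrass), ~.5046
sprinkler.prob / p.wet  # P(sprinkler | wetgrass), ~.6518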

To find the instances in which it both rained and sprinklered, I introduce a new vector “rain.sprinklerT” and use which() again:

rain.sprinklerT = which(samples.wetgrassT[rainT, "sprinkler"])
#Number of times it both rained and sprinklered
length(rain.sprinklerT)
## [1] 732
#Relative frequency that it both rained and sprinklered
length(rain.sprinklerT) / rfrequency.wetgrass
## [1] 0.155977
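
Analytically (again my derivation): rain and sprinkler are independent, and their conjunction implies wet grass, so P(rain & sprinkler | wetgrass) = P(rain) * P(sprinkler) / P(wetgrass):

rain.prob * sprinkler.prob / p.wet  # ~.1564, using p.wet from above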

For fun I also did this using nested for-loops and stored the samples in which it BOTH rained and sprinkled in “bothT.”

bothT = numeric()
for (i in 1:length(rainT)) {
  for (j in 1:length(sprinklerT)) {
    if (rainT[i] == sprinklerT[j]) {
      bothT = c(bothT, rainT[i])
    }
  }
}
#Number of times it both rained and sprinklered
length(bothT)
## [1] 732

Obviously, using “which()” is way better…
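
Simpler still (a base-R alternative I’d suggest), intersect() finds the common indices directly:

both.intersect = intersect(rainT, sprinklerT)
length(both.intersect)  # should match the count above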

The last bit of the question is somewhat vague… if the sprinkler was on, we’d want to select the subset in which sprinkler == T (wetgrass would of course also be T), so the proportion of that sample in which sprinkler == T would trivially be 100%. I take the intended question to be: what proportion of the sprinkler == T samples also have rain == T? I calculate that here:

mean(samples.wetgrassT[sprinklerT, "rain"])  #proportion of sprinkler==T samples in which it also rained
## [1] 0.2386697

Notice how close this number is to the original probability that it would rain. This makes sense because rain and sprinkler are independent RVs: once we know the sprinkler was on, the grass is guaranteed to be wet, so conditioning on wetgrass adds no further information, and rain occurs at its base rate.

rain.prob
## [1] 0.24