Pages 45-46
Exercise 1
The data set warlpiri provides information about the use of the ergative case in Lajamanu Warlpiri. Data were elicited for adults and children of various ages. The question of interest is to what extent the use of the ergative case marker is predictable from the animacy of the subject, word order, and the age of the speaker (adult versus child). Explore this data set with respect to this issue by means of a mosaic plot.
ergative.xtabs=xtabs(~AgeGroup+WordOrder+CaseMarking+AnimacyOfSubject, data=warlpiri)
mosaicplot(ergative.xtabs, main="Ergative") #I found that there are many ways to make a mosaic plot (depending on the order of the variables), and that this configuration was most informative.
Exercise 2
In Chapter 1 we created a data frame with mean reaction times and mean base frequencies for neologisms in the Dutch suffix -heid. Reconstruct the data frame heid2. Both reaction times and frequencies are logarithmically transformed. Use exp() to undo these transformations and make a scatterplot of the averaged reaction times (MeanRT) against the frequency of the base (BaseFrequency). Compare this scatterplot with a scatterplot using the log-transformed values.
heid2=aggregate(heid$RT,list(heid$Word),mean)
colnames(heid2)=c("Word","MeanRT")
items=heid[,c("Word","BaseFrequency")]
items=unique(items)
heid2=merge(heid2,items,by.x="Word",by.y="Word")
heid2$ExpMeanRT=exp(heid2$MeanRT)
heid2$ExpBaseFrequency=exp(heid2$BaseFrequency)
par(mfrow=c(1,2))
plot(heid2$ExpBaseFrequency,heid2$ExpMeanRT,xlab="Base Frequency",ylab="Mean Reaction Time")
plot(heid2$BaseFrequency,heid2$MeanRT,xlab="Log Base Frequency",ylab="Log Mean Reaction Time")
Exercise 3
The data set moby is a character vector with the text of Melville’s Moby Dick. In this exercise, we consider whether Zipf’s law holds for Moby Dick. According to Zipf’s law [Zipf, 1949], the frequency of a word is inversely proportional to its rank in a numerically sorted list. The word with the highest frequency has rank 1, the word with the second-highest frequency has rank 2, etc. If Zipf’s law holds, a plot of log frequency against log rank should reveal a straight line. We make a table of word frequencies with table() – we cannot use xtabs(), because moby is a vector and xtabs() expects a data frame – and sort the frequencies in reverse numerical order.
moby.table=table(moby)
moby.table=sort(moby.table,decreasing=TRUE)
moby.table[1:5]
## moby
##   the    of   and     a    to
## 13717  6512  6008  4551  4514
We now have the word frequencies. We use the colon operator and length(), which returns the length of a vector, to construct the corresponding ranks.
ranks=1:length(moby.table)
ranks[1:5]
## [1] 1 2 3 4 5
Make a scatterplot of log frequency against log rank.
MobyLogFrequency=log(moby.table)
MobyLogRank=log(ranks)
par(mfrow=c(1,1))
plot(MobyLogRank,MobyLogFrequency,xlab="Log Rank",ylab="Log Frequency")
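To probe the straight-line prediction further, we can fit a least-squares line in log-log space and overlay it on the scatterplot. This is a minimal sketch (moby.lm is a name introduced here, not part of the exercise); Zipf's law predicts a slope near -1.
moby.lm=lm(as.numeric(MobyLogFrequency)~MobyLogRank) #fit log frequency as a linear function of log rank
abline(moby.lm,lty=2) #overlay the fitted line on the existing plot
coef(moby.lm) #the slope estimates the Zipf exponent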
Exercise 4
The column labeled Trial in the data set lexdec specifies, for each subject, the trial number of the responses. For a given subject, the first trial in the experiment has trial number 1, the second has trial number 2, etc. Use xylowess.fnc() to explore the possibility that the subjects proceeded through the experiment in different ways, some revealing effects of learning, and others effects of fatigue.
lexdec$DiffFromMeanRT=lexdec$meanRT-lexdec$RT #Here I centered each reaction time on the item's mean RT across all participants (the meanRT column). A positive value indicates a faster-than-expected RT, whereas a negative value represents a slower-than-expected RT.
xylowess.fnc(DiffFromMeanRT~Trial|Subject, data=lexdec) #This plot shows the difference from the mean reaction time over trials, separately for each subject. A fit line with a positive slope indicates that the subject was getting faster than expected (learning), whereas a negative slope may mean that the subject was getting tired and responding more slowly than expected. One could also look at accuracy over trials, but those data weren't immediately available in this form.
###Homework 2###
Levy ch.1, pp.34-6, exercise 2.2, modified: “Give an example in words (involving language understanding, and one that was not specifically discussed in class) where two events \(A\) and \(B\) are conditionally independent given some state of knowledge \(C\), but when another piece of knowledge \(D\) is learned, \(A\) and \(B\) lose conditional independence.”
In Russian, the vowel qualities of two neighboring syllables \(A\) and \(B\) are conditionally independent given the knowledge \(C\) that they are separated by a word boundary. However, upon learning \(D\), namely that word \(A\) is a preposition, the vowel qualities of \(A\) and \(B\) are no longer conditionally independent, since a preposition forms a single phonological word with the word that follows it, and vowel reduction then applies across the boundary.
Levy ch.1, pp.34-6, exercise 2.3:
+ “You obtain infinitely many copies of the text Alice in Wonderland and decide to play a word game with it. You cut apart each page of each copy into individual letters, throw all the letters in a bag, shake the bag, and draw three letters at random from the bag. What is the probability that you will be able to spell ‘tea’? What about ‘tee’? [Hint: see Section 2.5.2; perhaps peek at Section A.8 as well.]”
Since we have infinitely many copies of the text, we can treat the distributional frequencies measured in one copy as reliable probabilities. We can also assume that drawing one letter from the bag leaves the remaining probabilities unaltered, so the draws are independent. The probability of drawing ‘t’ is 0.099, of drawing ‘e’ is 0.126, and of drawing ‘a’ is 0.082. The probability of these three independent events occurring in a fixed order is 0.099 * 0.126 * 0.082 ≈ 0.00102; multiplied by six for the six possible permutations (TEA, TAE, ATE, AET, EAT, ETA), this gives approximately 0.00614.
The probability of drawing the letters of ‘tee’ in a fixed order is 0.099 * 0.126 * 0.126 ≈ 0.00157; multiplied by three for the three possible permutations (TEE, ETE, EET), this gives approximately 0.00472.
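The same arithmetic can be checked in R (a minimal sketch; p.t, p.e, and p.a are names introduced here for the probabilities Levy gives in Section 2.5.2):
p.t=0.099; p.e=0.126; p.a=0.082
factorial(3)*p.t*p.e*p.a #P('tea'): 3! = 6 orderings of three distinct letters; about 0.00614
3*p.t*p.e^2 #P('tee'): 3!/2! = 3 distinct orderings; about 0.00472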
In order to solve the above problem, it was necessary that there be an infinite number of copies of the text. If I had only one copy, I could determine the distributional frequencies of the characters in the text, which would give me some idea of the probability, but the exact probability would still be unknown. This is because relative frequency is not the same thing as probability.
Sara Kessler and I worked together on this problem.
Re-do both parts of the previous exercise using a simulation in R instead of mathematical reasoning. That is, think carefully about the generative process by which this example proceeds, and write a program which implements a model of this process and uses it to generate many draws of 3 letters. Use the sample() function, and take the proportion of samples that have the desired property as your best estimate of the probability. Hints: you can learn about sample() by typing ?sample into the console. Make sure that you think carefully about what vector you are sampling from. You may fill in letters for which Levy does not specify frequencies in 2.5.2 with a generic ‘other’ value. Also, make sure that you think carefully about whether to set sample(..., replace=TRUE) or sample(..., replace=FALSE) when answering each sub-question.
Given infinite copies of the text Alice in Wonderland, we can simulate 10,000 draws of three letters from that infinite bag with the code below. Because the bag is effectively infinite, sampling never depletes it, so we sample with replacement; sorting each draw lets us test for the letters of ‘tea’ regardless of the order in which they come out of the bag:
aT=1 #numeric codes for the letters of interest; note aother=0 < aT=1 < aE=2 < aA=3
aE=2
aA=3
aother=0
AliceInf = function(z) {
  #draw 3 letters with replacement and sort them, so that any draw containing
  #t, e, a becomes (1,2,3) and any draw containing t, e, e becomes (1,2,2)
  trial=sort(sample(c(aT,aE,aA,aother), size=3, replace=TRUE, prob=c(0.099,.126,.082,1-.099-.126-.082)))
  return(trial)
}
AliceInf.Samples=sapply(1:10000, FUN=AliceInf)
AliceInf.Samples=t(AliceInf.Samples)
colnames(AliceInf.Samples) <- paste('Choice',1:3,sep="")
rownames(AliceInf.Samples) <- paste('Sample',1:10000,sep="")
head(AliceInf.Samples)
##         Choice1 Choice2 Choice3
## Sample1       0       0       0
## Sample2       0       0       0
## Sample3       0       0       1
## Sample4       0       0       2
## Sample5       0       0       0
## Sample6       0       0       1
aTEA2.True=AliceInf.Samples[which(AliceInf.Samples[,"Choice1"]==1 & AliceInf.Samples[,"Choice2"]==2 & AliceInf.Samples[,"Choice3"]==3),] #one-step equivalent of the stepwise intersection below
aT.True=AliceInf.Samples[which(AliceInf.Samples[,"Choice1"]==1),]
aTE.True=aT.True[which(aT.True[,"Choice2"]==2),]
aTEA.True=aTE.True[which(aTE.True[,"Choice3"]==3),]
head(aTEA.True) #show only the first few of the matching samples
##            Choice1 Choice2 Choice3
## Sample250        1       2       3
## Sample264        1       2       3
## Sample679        1       2       3
## Sample732        1       2       3
## Sample836        1       2       3
## Sample839        1       2       3
nrow(aTEA.True) #56 of the 10,000 samples spell 'tea'
## [1] 56
Proportion.TEA = nrow(aTEA.True)/nrow(AliceInf.Samples)
Proportion.TEA
## [1] 0.0056
aTEE.True=aTE.True[which(aTE.True[,"Choice3"]==2),]
head(aTEE.True)
##            Choice1 Choice2 Choice3
## Sample95         1       2       2
## Sample115        1       2       2
## Sample508        1       2       2
## Sample581        1       2       2
## Sample968        1       2       2
## Sample1046       1       2       2
nrow(aTEE.True) #48 of the 10,000 samples spell 'tee'
## [1] 48
Proportion.TEE = nrow(aTEE.True)/nrow(AliceInf.Samples)
Proportion.TEE
## [1] 0.0048
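Since each sample is sorted, the same proportions can be computed more compactly, without building the intermediate matrices (a sketch using the numeric codes defined above):
mean(apply(AliceInf.Samples,1,function(r) all(r==c(1,2,3)))) #proportion of samples spelling 'tea'; matches Proportion.TEA
mean(apply(AliceInf.Samples,1,function(r) all(r==c(1,2,2)))) #proportion of samples spelling 'tee'; matches Proportion.TEE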
With just one copy of the text, we cannot use probabilities, but only distributional frequencies. As the text Alice in Wonderland is known to have 27,500 words and the average length of an English word is 5.10 letters, we can estimate that the text contains 140,250 letters. If the probabilities above reflect the true letter frequencies in the text, we can estimate that there are 13,885 Ts, 17,672 Es, 11,501 As, and 97,192 other letters in the text. Because the bag is now finite, each letter drawn is gone for good, so we sample without replacement. We can simulate 10,000 draws from this finite bag with the following code:
aT=1
aE=2
aA=3
aother=0
#build the finite bag of 140,250 letters once, outside the function, rather than on every call
bag=c(rep(aT,13885), rep(aE,17672), rep(aA,11501), rep(aother,97192))
Alice1 = function(z) {
  #draw 3 letters without replacement: each draw depletes the finite bag
  trial=sort(sample(bag, size=3, replace=FALSE))
  return(trial)
}
Alice1.Samples=sapply(1:10000, FUN=Alice1)
Alice1.Samples=t(Alice1.Samples)
colnames(Alice1.Samples) <- paste('Choice',1:3,sep="")
rownames(Alice1.Samples) <- paste('Sample',1:10000,sep="")
head(Alice1.Samples)
##         Choice1 Choice2 Choice3
## Sample1       0       0       0
## Sample2       0       2       2
## Sample3       0       0       0
## Sample4       0       0       0
## Sample5       0       0       1
## Sample6       0       1       3
aT1.True=Alice1.Samples[which(Alice1.Samples[,"Choice1"]==1),]
aTE1.True=aT1.True[which(aT1.True[,"Choice2"]==2),]
aTEA1.True=aTE1.True[which(aTE1.True[,"Choice3"]==3),]
head(aTEA1.True)
##           Choice1 Choice2 Choice3
## Sample72        1       2       3
## Sample88        1       2       3
## Sample276       1       2       3
## Sample317       1       2       3
## Sample454       1       2       3
## Sample462       1       2       3
nrow(aTEA1.True) #68 of the 10,000 samples spell 'tea'
## [1] 68
Proportion.TEA1 = nrow(aTEA1.True)/nrow(Alice1.Samples)
Proportion.TEA1
## [1] 0.0068
aTEE1.True=aTE1.True[which(aTE1.True[,"Choice3"]==2),]
head(aTEE1.True)
##           Choice1 Choice2 Choice3
## Sample24        1       2       2
## Sample150       1       2       2
## Sample346       1       2       2
## Sample471       1       2       2
## Sample870       1       2       2
## Sample979       1       2       2
nrow(aTEE1.True) #41 of the 10,000 samples spell 'tee'
## [1] 41
Proportion.TEE1 = nrow(aTEE1.True)/nrow(Alice1.Samples)
Proportion.TEE1
## [1] 0.0041
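For comparison, the exact without-replacement probabilities can be computed directly from the letter counts (a sketch; the counts are the estimates above, not true counts from the text):
N=13885+17672+11501+97192 #total letters in the single-copy bag
factorial(3)*13885*17672*11501/(N*(N-1)*(N-2)) #P('tea'); nearly identical to the with-replacement value, since N is large
3*13885*17672*17671/(N*(N-1)*(N-2)) #P('tee'): a t plus two of the 17,672 Es, in 3 possible orders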
Levy ch.1, pp.34-6, exercise 2.10: “For adult female native speakers of American English, the distribution of first-formant frequencies for the vowel [E] is reasonably well modeled as a normal distribution with mean 608Hz and standard deviation 77.5Hz. What is the probability that the first-formant frequency of an utterance of [E] for a randomly selected adult female native speaker of American English will be between 555Hz and 697Hz?” Show the R code that you used to calculate the answer. (Hint: look back at Monday’s class notes, specifically the part about using R to find the cumulative probability of continuous distributions.)
pnorm(q=697, mean=608, sd=77.5) - pnorm(q=555, mean=608, sd=77.5) #P(X < 697) minus P(X < 555) gives the probability mass under the normal density between 555Hz and 697Hz
## [1] 0.6275673
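The same answer can be obtained by first standardizing the interval endpoints to z-scores (a sketch using the same parameters):
pnorm((697-608)/77.5) - pnorm((555-608)/77.5) #about 0.6276, matching the result above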
Sara Kessler and I worked together on this problem.
Design a simulation implementing Pearl’s rain/sprinkler/wet grass as discussed in class. Make sure that the samples that you generate assign a truth-value to each variable - rain, sprinkler, and wet grass - and that they have the dependency structure assumed: rain and sprinkler are uncaused, occurring with some fixed probability (say, both are flip(.3)); and wet grass occurs if and only if: either it rained or the sprinkler was on. You’ll want to begin your code by defining the ‘coin flip’ function.
flip = function(p) runif(1,0,1) < p #returns TRUE with probability p, FALSE otherwise
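As a quick sanity check of flip() (not part of the assignment), the proportion of TRUEs over many flips should approach p:
mean(sapply(1:10000, function(i) flip(.3))) #should be close to 0.3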
Use the sapply()-based simulation methods discussed in class on Monday, starting with this schematic code chunk:
sim = function(i) {
  rain = flip(.3)
  sprinkler = flip(.3)
  wet.grass = rain | sprinkler
  return(c(rain, sprinkler, wet.grass)) #return() takes a single unnamed argument
}
samples=sapply(1:10000, FUN=sim)
The value of sapply(1:m, ...) is an \(n \times m\) matrix, i.e., one with \(n\) rows and \(m\) columns. What do the columns represent? What does each row represent?
The sim function returned a matrix of \(3\) rows by \(10000\) columns. The rows represent the truth values of (1) whether it rained, (2) whether the sprinkler was on, and (3) whether the grass is wet, which follows from rows (1) and (2). Each column represents one run of the sim function. To make this matrix easier to read, I transposed it, turning the wide matrix into a long one.
samples=t(samples) #transpose to rows and columns
head(samples)
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,]  TRUE FALSE  TRUE
## [3,] FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE
## [5,]  TRUE FALSE  TRUE
## [6,]  TRUE FALSE  TRUE
Use rownames() and colnames() to add informative row and column names to the matrix of samples. [Hint: what does paste('sample', 1:10000) do?]
rownames(samples) <- paste('Sample',1:10000)
colnames(samples) <- c("q.rain","q.sprinkler","q.wet.grass")
head(samples)
##          q.rain q.sprinkler q.wet.grass
## Sample 1  FALSE       FALSE       FALSE
## Sample 2   TRUE       FALSE        TRUE
## Sample 3  FALSE       FALSE       FALSE
## Sample 4  FALSE       FALSE       FALSE
## Sample 5   TRUE       FALSE        TRUE
## Sample 6   TRUE       FALSE        TRUE
Use which() to define a new matrix with only the samples in which your observation was true: wet.grass == TRUE. What are the dimensions of this matrix? What is the proportion of these samples in which each of rain and sprinkler is true?
true.wet.grass=samples[which(samples[,"q.wet.grass"]==T),] #new matrix containing the subset of samples where wet.grass is TRUE
total.true.wet.grass=nrow(true.wet.grass) #the dimensions of this matrix are 5101 x 3
total.true.wet.grass
## [1] 5101
ncol(true.wet.grass)
## [1] 3
#proportion of true.wet.grass where rain is true
length(which(true.wet.grass[,"q.rain"]==T))/total.true.wet.grass
## [1] 0.5994903
#proportion of true.wet.grass where sprinkler is true
length(which(true.wet.grass[,"q.sprinkler"]==T))/total.true.wet.grass
## [1] 0.5747893
#the proportions of rain and sprinkler being true, given that the grass is known to be wet, are NOT 0.3; this exemplifies the problem we discussed in class, whereby the probabilities of rain and sprinkler shift upward once we look out the window and see that the grass is wet
Use which() to select the subset in which sprinkler and wet.grass are BOTH true. What is the proportion of these samples in which rain is true? On an intuitive level, why is this?
true.sprinkler=true.wet.grass[which(true.wet.grass[,"q.sprinkler"]==T),] #new matrix where sprinkler and wet.grass are both true
total.true.sprinkler=nrow(true.sprinkler)
true.rain.and.sprinkler=length(which(true.sprinkler[,"q.rain"]==T))/total.true.sprinkler
true.rain.and.sprinkler #proportion of samples in which rain is true, given that sprinkler and wet.grass are both true
## [1] 0.303206
This problem is a beautiful example of conditional independence and of ‘explaining away’. Rain (condition A) and sprinkler (condition B) are independent variables, each occurring with probability 0.3. Upon observing wet grass (condition C), these variables are no longer independent, and the probability of each rises to approximately 0.6. If, however, it becomes known that one of them, say the sprinkler (condition D), was definitely on, the wet grass is fully explained, and the probability of the other condition returns to 0.3.
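An analytic cross-check of these proportions (assuming the flip(.3) parameters above): since the grass is certain to be wet whenever it rains, P(rain | wet.grass) = P(rain)/P(wet.grass).
0.3/(1-0.7*0.7) #P(rain | wet.grass) = 0.3/0.51, about 0.588; close to the simulated 0.599 and 0.575
#once the sprinkler is known to be on, the wet grass is fully explained and carries no
#further information about rain, so P(rain | wet.grass, sprinkler) = P(rain) = 0.3,
#matching the simulated 0.303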