library(languageR)
library(MASS)
library(lattice)
1. The data set warlpiri (data courtesy Carmel O’Shannessy) provides information about the use of the ergative case in Lajamanu Warlpiri. Data were elicited for adults and children of various ages. The question of interest is to what extent the use of the ergative case marker is predictable from the animacy of the subject, word order, and the age of the speaker (adult versus child). Explore this data set with respect to this issue by means of a mosaic plot. (First construct a contingency table with xtabs(), then supply this contingency table as argument to mosaicplot().)
str(warlpiri)
## 'data.frame': 347 obs. of 9 variables:
## $ Speaker : Factor w/ 27 levels "Sub10","Sub1027",..: 27 27 27 27 27 27 27 27 27 27 ...
## $ Sentence : Factor w/ 343 levels "and_nganayi_kurdu_pawu_ngu_manu_maliki_ji_",..: 50 73 76 145 153 233 248 258 259 260 ...
## $ AgeGroup : Factor w/ 2 levels "adult","child": 2 2 2 2 2 2 2 2 2 2 ...
## $ CaseMarking : Factor w/ 2 levels "ergative","other": 1 1 1 1 1 1 1 1 1 1 ...
## $ WordOrder : Factor w/ 2 levels "subInitial","subNotInitial": 2 1 1 2 1 1 1 2 1 2 ...
## $ AnimacyOfSubject : Factor w/ 2 levels "animate","inanimate": 1 1 1 1 1 2 2 1 1 2 ...
## $ OvertnessOfObject: Factor w/ 2 levels "notOvert","overt": 2 1 1 1 1 1 2 2 1 1 ...
## $ AnimacyOfObject : Factor w/ 2 levels "animate","inanimate": 1 1 1 1 1 1 1 1 2 1 ...
## $ Text : Factor w/ 3 levels "texta","textb",..: 1 2 2 1 1 3 3 3 3 3 ...
(warlpiri.xtabs <- xtabs(~CaseMarking + AnimacyOfSubject + WordOrder + AgeGroup,
data = warlpiri))
## , , WordOrder = subInitial, AgeGroup = adult
##
## AnimacyOfSubject
## CaseMarking animate inanimate
## ergative 102 16
## other 9 5
##
## , , WordOrder = subNotInitial, AgeGroup = adult
##
## AnimacyOfSubject
## CaseMarking animate inanimate
## ergative 38 9
## other 4 4
##
## , , WordOrder = subInitial, AgeGroup = child
##
## AnimacyOfSubject
## CaseMarking animate inanimate
## ergative 64 13
## other 23 4
##
## , , WordOrder = subNotInitial, AgeGroup = child
##
## AnimacyOfSubject
## CaseMarking animate inanimate
## ergative 40 10
## other 3 3
mosaicplot(warlpiri.xtabs, main = "Usage of ergative")
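The pattern in the mosaic plot can also be read off numerically. The following is a minimal sketch, not part of the original solution: it re-enters the cell counts printed above by hand (so it runs without languageR) and computes the proportion of ergative marking for each combination of animacy, word order, and age group. The object names counts, props, and erg.prop are illustrative.

```r
# Cell counts copied from the xtabs() output above; array() fills the first
# dimension (CaseMarking) fastest, then AnimacyOfSubject.
counts <- array(
  c(102,  9, 16, 5,   # subInitial,    adult
     38,  4,  9, 4,   # subNotInitial, adult
     64, 23, 13, 4,   # subInitial,    child
     40,  3, 10, 3),  # subNotInitial, child
  dim = c(2, 2, 2, 2),
  dimnames = list(
    CaseMarking      = c("ergative", "other"),
    AnimacyOfSubject = c("animate", "inanimate"),
    WordOrder        = c("subInitial", "subNotInitial"),
    AgeGroup         = c("adult", "child")
  )
)

# Divide each cell by the total of its AnimacyOfSubject x WordOrder x AgeGroup
# combination, then keep only the ergative share.
props <- prop.table(counts, margin = c(2, 3, 4))
erg.prop <- props["ergative", , , ]
print(round(erg.prop, 2))
```

In these counts, the ergative share for children with animate, sentence-initial subjects (64 of 87) is lower than the corresponding adult share (102 of 111).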
2. In Chapter 1 we created a data frame with mean reaction times and mean base frequencies for neologisms in the Dutch suffix -heid. Reconstruct the data frame heid2. Both reaction times and frequencies are logarithmically transformed. Use exp() to undo these transformations and make a scatterplot of the averaged reaction times (MeanRT) against the frequency of the base (BaseFrequency). Compare this scatterplot with a scatterplot using the log-transformed values.
# reconstruct heid2 as shown on p. 17
heid2 <- aggregate(heid$RT, list(heid$Word), mean)
colnames(heid2) <- c("Word", "MeanRT")
items <- heid[, c("Word", "BaseFrequency")]
items <- unique(items)
(heid2 <- merge(heid2, items, by.x = "Word", by.y = "Word"))
## Word MeanRT BaseFrequency
## 1 aftandsheid 6.705 4.20
## 2 antiekheid 6.542 6.75
## 3 banaalheid 6.588 5.74
## 4 basaalheid 6.586 3.56
## 5 bebrildheid 6.673 3.61
## 6 beschutheid 6.552 4.79
## 7 beuheid 6.637 5.07
## 8 bezweetheid 6.500 4.75
## 9 blusbaarheid 6.691 0.00
## 10 contentheid 6.548 4.50
## 11 coulantheid 6.538 1.79
## 12 dementheid 6.524 3.83
## 13 enormheid 6.453 8.42
## 14 erkendheid 6.466 6.52
## 15 gammelheid 6.578 4.91
## 16 geurloosheid 6.539 2.56
## 17 jofelheid 6.505 3.33
## 18 kalkrijkheid 6.644 3.14
## 19 koketheid 6.587 4.79
## 20 kortafheid 6.508 5.87
## 21 labielheid 6.565 4.38
## 22 lobbigheid 6.660 0.00
## 23 ludiekheid 6.666 4.14
## 24 markantheid 6.594 5.16
## 25 onattentheid 6.610 1.61
## 26 onbelastheid 6.692 3.18
## 27 ondiepheid 6.612 5.66
## 28 ontroerdheid 6.481 5.55
## 29 onwelheid 6.645 3.97
## 30 ovaalheid 6.487 5.86
## 31 pipsheid 6.571 3.33
## 32 pitloosheid 6.542 0.00
## 33 riantheid 6.632 4.53
## 34 royaalheid 6.543 6.09
## 35 saprijkheid 6.687 0.00
## 36 summierheid 6.666 5.30
## 37 tactvolheid 6.583 4.75
## 38 tembaarheid 6.640 0.00
## 39 tilbaarheid 6.638 0.00
## 40 visrijkheid 6.521 3.04
colnames(heid2)
## [1] "Word" "MeanRT" "BaseFrequency"
# rename the two log-scale columns
colnames(heid2)[2] <- "logMeanRT"
colnames(heid2)[3] <- "logBaseFrequency"
# add two new columns holding the values on the original (untransformed) scale
heid2$MeanRT <- exp(heid2$logMeanRT)
heid2$BaseFrequency <- exp(heid2$logBaseFrequency)
str(heid2)
## 'data.frame': 40 obs. of 5 variables:
## $ Word : Factor w/ 40 levels "aftandsheid",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ logMeanRT : num 6.71 6.54 6.59 6.59 6.67 ...
## $ logBaseFrequency: num 4.2 6.75 5.74 3.56 3.61 4.79 5.07 4.75 0 4.5 ...
## $ MeanRT : num 816 694 726 725 791 ...
## $ BaseFrequency : num 66.7 854.1 311.1 35.2 37 ...
# draw the scatterplots: mean RT (y-axis) against base frequency (x-axis)
par(mfrow = c(2, 1))
plot(heid2$BaseFrequency, heid2$MeanRT, xlab = "Base Frequency", ylab = "Mean RT")
plot(heid2$logBaseFrequency, heid2$logMeanRT, xlab = "Log Base Frequency", ylab = "Log Mean RT")
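To put a number on what the two scatterplots show, one option is to compare correlations on both scales. This is a minimal sketch, assuming only the first ten rows of the table printed above (typed in by hand and rounded, so the results are illustrative and not the full-data result); the names log.rt and log.freq are made up for this example. Pearson's correlation depends on the scale, whereas Spearman's rank correlation is unchanged by exp() because exp() preserves order.

```r
# First ten rows of heid2 as printed above (log scale, rounded).
log.rt   <- c(6.705, 6.542, 6.588, 6.586, 6.673, 6.552, 6.637, 6.500, 6.691, 6.548)
log.freq <- c(4.20, 6.75, 5.74, 3.56, 3.61, 4.79, 5.07, 4.75, 0.00, 4.50)

# Undo the log transformation.
rt   <- exp(log.rt)
freq <- exp(log.freq)

# Pearson correlation on each scale (generally differs between scales).
pearson.log <- cor(log.freq, log.rt)
pearson.raw <- cor(freq, rt)

# Spearman correlation uses ranks only, so it is the same on both scales.
spearman.log <- cor(log.freq, log.rt, method = "spearman")
spearman.raw <- cor(freq, rt, method = "spearman")

print(c(pearson.log = pearson.log, pearson.raw = pearson.raw))
print(c(spearman.log = spearman.log, spearman.raw = spearman.raw))
```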
3. The data set moby is a character vector with the text of Melville’s Moby Dick. In this exercise, we consider whether Zipf’s law holds for Moby Dick. According to Zipf’s law (Zipf, 1949), the frequency of a word is inversely proportional to its rank in a numerically sorted list. The word with the highest frequency has rank 1, the word with the next highest frequency has rank 2, etc. If Zipf’s law holds, a plot of log frequency against log rank should reveal a straight line. We make a table of word frequencies with table() (we cannot use xtabs(), because moby is a vector and xtabs() expects a data frame) and sort the frequencies in reverse numerical order:
moby.table <- table(moby)
moby.table <- sort(moby.table, decreasing = TRUE)
moby.table[1:10]
## moby
## the of and a to in that his it I
## 13717 6512 6008 4551 4514 3908 2982 2457 2209 2122
We now have the word frequencies. We use the colon operator and length(), which returns the length of a vector, to construct the corresponding ranks:
ranks <- 1:length(moby.table)
ranks[1:10]
## [1] 1 2 3 4 5 6 7 8 9 10
Make a scatterplot of log frequency against log rank.
par(mfrow = c(1, 1))
# as.vector() drops the table class so that plot() receives plain numeric frequencies
plot(log(ranks), log(as.vector(moby.table)), xlab = "Log Rank", ylab = "Log Frequency")
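If Zipf's law held exactly, frequency would equal a constant divided by rank, and the regression of log frequency on log rank would have slope -1. A minimal sketch with an idealized, made-up frequency list (toy.ranks and toy.freq are hypothetical, not taken from moby) shows how the slope could be checked with lm(); for the real data one would pass the log frequencies from moby.table and log(ranks) instead.

```r
# Idealized Zipfian frequencies: frequency = constant / rank (hypothetical data).
toy.ranks <- 1:50
toy.freq  <- 1000 / toy.ranks

# Fit a straight line on the log-log scale and extract its slope.
zipf.fit <- lm(log(toy.freq) ~ log(toy.ranks))
slope <- unname(coef(zipf.fit)[2])
print(slope)  # close to -1 for these idealized frequencies
```

A slope that departs clearly from -1, or systematic curvature in the residuals, would indicate that the straight-line prediction only holds approximately.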
4. The column labeled Trial in the data set lexdec specifies, for each subject, the trial number of the responses. For a given subject, the first trial in the experiment has trial number 1, the second has trial number 2, etc. Use xylowess.fnc() to explore the possibility that the subjects proceeded through the experiment in different ways, some revealing effects of learning, and others effects of fatigue.
par(mfrow = c(1, 1))
xylowess.fnc(RT ~ Trial | Subject, data = lexdec, xlab = "trial number", ylab = "log reaction time")
# The plot suggests that subjects such as T2 show a learning effect (their
# response times decrease as they complete more trials), whereas subjects
# such as D clearly show fatigue: their response times increase over the
# last trials they complete.
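The visual impression from the lowess panels can be complemented with a rough per-subject summary. This is a minimal sketch, assuming a straight-line trend per subject (a simplification of the smoother) and using made-up data in place of lexdec; toy, S1, and S2 are hypothetical. A negative slope is consistent with learning, a positive slope with fatigue.

```r
# Hypothetical data: S1 speeds up over trials, S2 slows down.
toy <- data.frame(
  Subject = rep(c("S1", "S2"), each = 5),
  Trial   = rep(1:5, times = 2),
  RT      = c(6.60, 6.55, 6.52, 6.48, 6.45,   # S1: decreasing log RT
              6.30, 6.33, 6.35, 6.40, 6.44)   # S2: increasing log RT
)

# Fit RT ~ Trial separately for each subject and keep only the slope.
slopes <- sapply(split(toy, toy$Subject), function(d) {
  unname(coef(lm(RT ~ Trial, data = d))["Trial"])
})
print(round(slopes, 3))
```

With the real data, the same pattern would apply to split(lexdec, lexdec$Subject).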
5. The data set english lists lexical decision and word naming latencies for two age groups. Inspect the distribution of the naming latencies (RTnaming). First plot a histogram for the naming latencies with truehist(). Then plot the density.
par(mfrow = c(2, 1))
truehist(english$RTnaming, col = "lightgrey", xlab = "log naming latency", main = "histogram")
plot(density(english$RTnaming), main = "density for log naming latencies")
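truehist() scales the bars so that their total area is 1, which puts the histogram on the same vertical scale as a kernel density estimate. Below is a minimal sketch of that scaling, using base R's hist() with plot = FALSE and simulated values in place of english$RTnaming (the two-component mixture is an assumption made purely for illustration):

```r
set.seed(1)
# Simulated log latencies: two hypothetical groups with different centres.
x <- c(rnorm(200, mean = 6.15, sd = 0.05), rnorm(200, mean = 6.48, sd = 0.05))

# Density-scaled histogram: bar heights times bar widths sum to 1.
h <- hist(x, plot = FALSE)
hist.area <- sum(h$density * diff(h$breaks))

# Kernel density estimate: approximate the area under the curve numerically
# on the equally spaced grid that density() returns.
d <- density(x)
dens.area <- sum(d$y) * diff(d$x[1:2])

print(c(hist.area = hist.area, dens.area = dens.area))
```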
The voicekey registering the naming responses is sensitive to the different acoustic properties of a word’s initial phoneme. The column Voice specifies whether a word’s initial phoneme was voiced or voiceless. Use bwplot() to make a trellis boxplot for the distribution of the naming latencies across voiced and voiceless phonemes with the age group of the subjects (AgeSubject) as grouping factor.
par(mfrow = c(1, 1))
# AgeSubject after "|" is the conditioning factor; no groups argument is needed
bwplot(RTnaming ~ Voice | AgeSubject, data = english,
    xlab = "Initial phoneme", ylab = "log naming latency")
# do the medians match?
(tapply(english[english$AgeSubject == "old", ]$RTnaming, english[english$AgeSubject ==
"old", ]$Voice, median))
## voiced voiceless
## 6.476 6.500
(tapply(english[english$AgeSubject == "young", ]$RTnaming, english[english$AgeSubject ==
"young", ]$Voice, median))
## voiced voiceless
## 6.130 6.165
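The two tapply() calls above can be folded into one by passing both factors as a list, which returns a matrix of medians with age group in rows and voicing in columns. A minimal sketch on made-up values (toy is hypothetical; with the real data one would use english in its place):

```r
# Hypothetical log naming latencies for two age groups and two voicing levels.
toy <- data.frame(
  RTnaming   = c(6.4, 6.5, 6.6,  6.5, 6.6, 6.7,   # old:   voiced, voiceless
                 6.0, 6.1, 6.2,  6.1, 6.2, 6.3),  # young: voiced, voiceless
  Voice      = rep(rep(c("voiced", "voiceless"), each = 3), times = 2),
  AgeSubject = rep(c("old", "young"), each = 6)
)

# One call: medians cross-classified by age group (rows) and voicing (columns).
med <- with(toy, tapply(RTnaming, list(AgeSubject, Voice), median))
print(med)
```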