library(languageR)
R terminal. Extract the column names from the data frame. Also extract the number of rows.
spanishMeta
## Author YearOfBirth TextName PubDate Nwords FullName
## 1 C 1916 X14458gll 1983 2972 Cela
## 2 C 1916 X14459gll 1951 3040 Cela
## 3 C 1916 X14460gll 1956 3066 Cela
## 4 C 1916 X14461gll 1948 3044 Cela
## 5 C 1916 X14462gll 1942 3053 Cela
## 6 M 1943 X14463gll 1986 3013 Mendoza
## 7 M 1943 X14464gll 1992 3049 Mendoza
## 8 M 1943 X14465gll 1989 3042 Mendoza
## 9 M 1943 X14466gll 1982 3039 Mendoza
## 10 M 1943 X14467gll 2002 3045 Mendoza
## 11 V 1936 X14472gll 1965 3037 VargasLLosa
## 12 V 1936 X14473gll 1963 3067 VargasLLosa
## 13 V 1936 X14474gll 1977 3020 VargasLLosa
## 14 V 1936 X14475gll 1987 3016 VargasLLosa
## 15 V 1936 X14476gll 1981 3054 VargasLLosa
colnames(spanishMeta)
## [1] "Author" "YearOfBirth" "TextName" "PubDate" "Nwords"
## [6] "FullName"
nrow(spanishMeta)
## [1] 15
meta for each author. Also calculate the mean publication date of the texts sampled for each author.
meta <- spanishMeta
xtabs(~ Author, meta)
## Author
## C M V
## 5 5 5
tapply(meta$PubDate, meta$Author, mean)
## C M V
## 1956.0 1990.2 1974.6
meta by year of birth (YearOfBirth) and the number of words sampled from the texts (Nwords).
meta[order(meta$YearOfBirth, meta$Nwords), ]
## Author YearOfBirth TextName PubDate Nwords FullName
## 1 C 1916 X14458gll 1983 2972 Cela
## 2 C 1916 X14459gll 1951 3040 Cela
## 4 C 1916 X14461gll 1948 3044 Cela
## 5 C 1916 X14462gll 1942 3053 Cela
## 3 C 1916 X14460gll 1956 3066 Cela
## 14 V 1936 X14475gll 1987 3016 VargasLLosa
## 13 V 1936 X14474gll 1977 3020 VargasLLosa
## 11 V 1936 X14472gll 1965 3037 VargasLLosa
## 15 V 1936 X14476gll 1981 3054 VargasLLosa
## 12 V 1936 X14473gll 1963 3067 VargasLLosa
## 6 M 1943 X14463gll 1986 3013 Mendoza
## 9 M 1943 X14466gll 1982 3039 Mendoza
## 8 M 1943 X14465gll 1989 3042 Mendoza
## 10 M 1943 X14467gll 2002 3045 Mendoza
## 7 M 1943 X14464gll 1992 3049 Mendoza
(There is ambiguity here. I assume the question means to use Nwords to break ties. If not, we can similarly sort meta twice.)
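A minimal sketch of the two readings, using a small made-up data frame (toy is a hypothetical stand-in for meta, with a few values copied from the listing above). Because order() leaves ties in their original order, sorting by Nwords first and then by YearOfBirth should give the same rows as the single call with both keys.

```r
# toy is a hypothetical stand-in for meta
toy <- data.frame(
  YearOfBirth = c(1916, 1916, 1943, 1943, 1936, 1936),
  Nwords      = c(2972, 3040, 3013, 3049, 3037, 3067)
)

# Reading 1: Nwords breaks ties within YearOfBirth (one call, two keys)
both_keys <- toy[order(toy$YearOfBirth, toy$Nwords), ]

# Reading 2: two separate sorts, one per column
by_year  <- toy[order(toy$YearOfBirth), ]
by_words <- toy[order(toy$Nwords), ]

# Sorting the Nwords-sorted frame again by YearOfBirth keeps the Nwords
# order within each year, because order() does not reshuffle ties
two_pass <- by_words[order(by_words$YearOfBirth), ]

print(both_keys)
print(two_pass)
```

For the real data, the same pattern would apply with meta in place of toy.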
meta. Sort this vector. Consult the help page for sort() and sort the vector in reverse numerical order. Also sort the row names of meta.
pubdates <- meta$PubDate
sort(pubdates)
## [1] 1942 1948 1951 1956 1963 1965 1977 1981 1982 1983 1986 1987 1989 1992
## [15] 2002
sort(pubdates, decreasing=TRUE)
## [1] 2002 1992 1989 1987 1986 1983 1982 1981 1977 1965 1963 1956 1951 1948
## [15] 1942
sort(rownames(meta))
## [1] "1" "10" "11" "12" "13" "14" "15" "2" "3" "4" "5" "6" "7" "8"
## [15] "9"
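Note that the row names are character strings, so sort() compares them as text and "10" comes before "2". A small sketch, assuming numeric order is what we actually want, converts them to integers first (rn is a name introduced here for illustration only):

```r
rn <- as.character(1:15)      # same row names as meta

sort(rn)                      # text order: "1", "10", "11", ...
sort(as.integer(rn))          # numeric order, returned as integers
rn[order(as.integer(rn))]     # numeric order, kept as character strings
```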
meta all rows with texts that were published before 1980.
meta[pubdates < 1980, ]
## Author YearOfBirth TextName PubDate Nwords FullName
## 2 C 1916 X14459gll 1951 3040 Cela
## 3 C 1916 X14460gll 1956 3066 Cela
## 4 C 1916 X14461gll 1948 3044 Cela
## 5 C 1916 X14462gll 1942 3053 Cela
## 11 V 1936 X14472gll 1965 3037 VargasLLosa
## 12 V 1936 X14473gll 1963 3067 VargasLLosa
## 13 V 1936 X14474gll 1977 3020 VargasLLosa
length(). Recalculate the mean year of publication by means of the functions sum() and length().
mean(pubdates)
## [1] 1973.6
sum(pubdates) / length(pubdates)
## [1] 1973.6
(I understand the point of this exercise. Still, it is really weird: dates are not additive, so it makes little sense to calculate the mean…)
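One way to read the result despite this worry: the mean does not depend on where we put year 0. A sketch, re-entering the publication dates by hand from the sorted output above and using 1942 as an arbitrary reference year (pd, ref and offsets are names made up for this illustration):

```r
# Publication dates copied from the sorted output above
pd <- c(1942, 1948, 1951, 1956, 1963, 1965, 1977, 1981,
        1982, 1983, 1986, 1987, 1989, 1992, 2002)

ref     <- 1942       # arbitrary reference year
offsets <- pd - ref   # durations in years, which can be added up

# Average the durations, then shift back to calendar years
shifted_mean <- ref + sum(offsets) / length(offsets)

shifted_mean
mean(pd)
```

Both lines should print the same value, whatever reference year is chosen.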
composer = data.frame(Author = c("Cela","Mendoza","VargasLLosa"),
Favorite = c("Stravinsky", "Bach", "Villa-Lobos"))
composer
## Author Favorite
## 1 Cela Stravinsky
## 2 Mendoza Bach
## 3 VargasLLosa Villa-Lobos
Add the information in this new data frame to meta with merge().
merge(meta, composer, by.x="FullName", by.y="Author")
## FullName Author YearOfBirth TextName PubDate Nwords Favorite
## 1 Cela C 1916 X14458gll 1983 2972 Stravinsky
## 2 Cela C 1916 X14459gll 1951 3040 Stravinsky
## 3 Cela C 1916 X14460gll 1956 3066 Stravinsky
## 4 Cela C 1916 X14461gll 1948 3044 Stravinsky
## 5 Cela C 1916 X14462gll 1942 3053 Stravinsky
## 6 Mendoza M 1943 X14463gll 1986 3013 Bach
## 7 Mendoza M 1943 X14464gll 1992 3049 Bach
## 8 Mendoza M 1943 X14465gll 1989 3042 Bach
## 9 Mendoza M 1943 X14466gll 1982 3039 Bach
## 10 Mendoza M 1943 X14467gll 2002 3045 Bach
## 11 VargasLLosa V 1936 X14472gll 1965 3037 Villa-Lobos
## 12 VargasLLosa V 1936 X14473gll 1963 3067 Villa-Lobos
## 13 VargasLLosa V 1936 X14474gll 1977 3020 Villa-Lobos
## 14 VargasLLosa V 1936 X14475gll 1987 3016 Villa-Lobos
## 15 VargasLLosa V 1936 X14476gll 1981 3054 Villa-Lobos
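By default merge() keeps only rows whose key occurs in both data frames. That is harmless here, because every FullName in meta has an entry in composer. A hedged sketch with made-up data (texts, favorites and the author "Unknown" are hypothetical) shows what happens otherwise, and how all.x = TRUE keeps the unmatched rows:

```r
# Hypothetical data: "Unknown" has no entry in favorites
texts <- data.frame(FullName = c("Cela", "Mendoza", "Unknown"),
                    Nwords   = c(2972, 3013, 3000),
                    stringsAsFactors = FALSE)
favorites <- data.frame(Author   = c("Cela", "Mendoza"),
                        Favorite = c("Stravinsky", "Bach"),
                        stringsAsFactors = FALSE)

# Default: only keys present in both data frames survive
inner <- merge(texts, favorites, by.x = "FullName", by.y = "Author")

# all.x = TRUE: keep every row of texts, filling gaps with NA
left <- merge(texts, favorites, by.x = "FullName", by.y = "Author",
              all.x = TRUE)

nrow(inner)   # the "Unknown" row is dropped
left          # the "Unknown" row is kept, with NA for Favorite
```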
warlpiri (data courtesy Carmel O’Shannessy) provides information about the use of the ergative case in Lajamanu Warlpiri. Data were elicited for adults and children of various ages. The question of interest is to what extent the use of the ergative case marker is predictable from the animacy of the subject, word order, and the age of the speaker (adult versus child). Explore this data set with respect to this issue by means of a mosaic plot. (First construct a contingency table with xtabs(), then supply this contingency table as argument to mosaicplot().)
warlpiri.xtabs <- xtabs(~ CaseMarking + AnimacyOfSubject + WordOrder + AgeGroup,
                        warlpiri)
mosaicplot(warlpiri.xtabs, main="Ergative case in Lajamanu Warlpiri")
heid2. Both reaction times and frequencies are logarithmically transformed. Use exp() to undo these transformations and make a scatterplot of the averaged reaction times (MeanRT) against the frequency of the base (BaseFrequency). Compare this scatterplot with a scatterplot using the log-transformed values.
heid2 <- aggregate(heid$RT, list(heid$Word), mean)
colnames(heid2) <- c("Word", "MeanRT")
items <- heid[, c("Word", "BaseFrequency")]
items <- unique(items)
heid2 <- merge(heid2, items, by.x = "Word", by.y = "Word")
The scatterplot using the log-transformed values is below on the left, and the one after undoing the log-transformations is on the right:
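par(mfrow=c(1,2))
# Left panel: log-transformed values, as stored in heid2
plot(heid2$BaseFrequency, heid2$MeanRT,
     xlab="log BaseFrequency", ylab="log MeanRT")
# Undo the log transformations (this overwrites the columns in heid2)
heid2$BaseFrequency <- exp(heid2$BaseFrequency)
heid2$MeanRT <- exp(heid2$MeanRT)
# Right panel: values on the original scale
plot(heid2$BaseFrequency, heid2$MeanRT,
     xlab="BaseFrequency", ylab="MeanRT")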
Most words have low base frequencies and only a few have very high ones. Without the log transformations (the right plot), most of the points are therefore crammed against the y-axis, which makes it very difficult to see any trend in the data. With the log transformations (the left plot), the points are spread out more evenly and better reveal the trend that MeanRT decreases as BaseFrequency increases.
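A rough numerical illustration of the same point, with made-up frequencies (freq and share_low are hypothetical and not taken from heid): we compute what share of the points falls in the lowest tenth of the axis range, before and after taking logs.

```r
# Made-up, strongly skewed frequencies
freq <- c(1, 2, 3, 5, 8, 20, 100, 5000)

# Share of points that land in the lowest 10% of the axis range
share_low <- function(x) {
  pos <- (x - min(x)) / (max(x) - min(x))  # position on the axis, 0 to 1
  mean(pos < 0.1)
}

share_low(freq)        # raw scale
share_low(log(freq))   # log scale
```

On the raw scale nearly all of these points sit at the very left edge of the plot, whereas on the log scale only a minority does.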
moby is a character vector with the text of Melville’s Moby Dick. In this exercise, we consider whether Zipf’s law holds for Moby Dick. According to Zipf’s law [Zipf, 1949], the frequency of a word is inversely proportional to its rank in a numerically sorted list. The word with the highest frequency has rank 1, the word with the second-highest frequency has rank 2, etc. If Zipf’s law holds, a plot of log frequency against log rank should reveal a straight line. We make a table of word frequencies with table() (we cannot use xtabs(), because moby is a vector and xtabs() expects a data frame) and sort the frequencies in reverse numerical order.
moby.table = table(moby)
moby.table = sort(moby.table, decreasing = TRUE)
moby.table[1:5]
## moby
## the of and a to
## 13717 6512 6008 4551 4514
We now have the word frequencies. We use the colon operator and length(), which returns the length of a vector, to construct the corresponding ranks.
ranks = 1 : length(moby.table)
ranks[1:5]
## [1] 1 2 3 4 5
Make a scatterplot of log frequency against log rank.
plot(log(ranks), log(moby.table))
We can see that the points are indeed roughly on a straight line, especially in the middle range of the data.
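To go beyond eyeballing, we could also fit a straight line. If frequency is inversely proportional to rank, f = C / r, then log f = log C - log r, so the slope in the log-log plot should be close to -1. A minimal sketch on synthetic counts that follow the law exactly (ranks.toy, zipf.freq and the constant 10000 are made up for illustration):

```r
# Synthetic frequencies that obey Zipf's law exactly: f = C / r
ranks.toy <- 1:1000
zipf.freq <- 10000 / ranks.toy

# Straight-line fit in log-log space
fit <- lm(log(zipf.freq) ~ log(ranks.toy))

coef(fit)   # intercept should be near log(10000), slope near -1
```

For the real data, one would presumably fit log(as.numeric(moby.table)) against log(ranks) and see how far the slope is from -1. The fit is dominated by the many rare words, so the plot remains the better diagnostic.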
Trial in the data set lexdec specifies, for each subject, the trial number of the responses. For a given subject, the first trial in the experiment has trial number 1, the second has trial number 2, etc. Use xylowess.fnc() to explore the possibility that the subjects proceeded through the experiment in different ways, some revealing effects of learning, and others effects of fatigue.
xylowess.fnc(RT ~ Trial | Subject, lexdec)
We can see that as the trial number increases, some subjects’ RTs decrease (e.g., T2 and J), suggesting effects of learning. Other subjects, such as D, clearly show increasing RTs, suggesting effects of fatigue.
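The plot can be backed up with one crude number per subject: the slope of a straight-line fit of RT on Trial, negative for speeding up and positive for slowing down. This is only a sketch on made-up data (the subjects S1 and S2 and their RTs are hypothetical), and a straight line is a much blunter tool than the lowess smoother drawn by xylowess.fnc().

```r
# Hypothetical data: S1 gets faster over trials, S2 gets slower
toy <- data.frame(
  Subject = rep(c("S1", "S2"), each = 5),
  Trial   = rep(1:5, times = 2),
  RT      = c(6.6, 6.5, 6.5, 6.4, 6.3,
              6.2, 6.3, 6.3, 6.4, 6.5)
)

# One linear fit per subject; keep only the slope for Trial
slopes <- sapply(split(toy, toy$Subject), function(d) {
  coef(lm(RT ~ Trial, data = d))[["Trial"]]
})

slopes   # negative: learning-like trend, positive: fatigue-like trend
```

Assuming the column names shown in the call above, the same pattern should carry over to the real data by replacing toy with lexdec.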