library(languageR)
The data set spanishMeta contains metadata about fifteen texts sampled from three Spanish authors. Each line in this file provides information on a single text. Later in this book we will consider whether these authors can be distinguished on the basis of the quantitative characteristics of their personal styles (gauged by the relative frequencies of function words and tag trigrams).
1. Display this data frame in the R terminal.
meta <- spanishMeta
# print the complete data frame
meta
## Author YearOfBirth TextName PubDate Nwords FullName
## 1 C 1916 X14458gll 1983 2972 Cela
## 2 C 1916 X14459gll 1951 3040 Cela
## 3 C 1916 X14460gll 1956 3066 Cela
## 4 C 1916 X14461gll 1948 3044 Cela
## 5 C 1916 X14462gll 1942 3053 Cela
## 6 M 1943 X14463gll 1986 3013 Mendoza
## 7 M 1943 X14464gll 1992 3049 Mendoza
## 8 M 1943 X14465gll 1989 3042 Mendoza
## 9 M 1943 X14466gll 1982 3039 Mendoza
## 10 M 1943 X14467gll 2002 3045 Mendoza
## 11 V 1936 X14472gll 1965 3037 VargasLLosa
## 12 V 1936 X14473gll 1963 3067 VargasLLosa
## 13 V 1936 X14474gll 1977 3020 VargasLLosa
## 14 V 1936 X14475gll 1987 3016 VargasLLosa
## 15 V 1936 X14476gll 1981 3054 VargasLLosa
# the function str() prints the structure of a data frame compactly:
# very useful when data frames are large
str(meta)
## 'data.frame': 15 obs. of 6 variables:
## $ Author : Factor w/ 3 levels "C","M","V": 1 1 1 1 1 2 2 2 2 2 ...
## $ YearOfBirth: int 1916 1916 1916 1916 1916 1943 1943 1943 1943 1943 ...
## $ TextName : Factor w/ 15 levels "X14458gll","X14459gll",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ PubDate : int 1983 1951 1956 1948 1942 1986 1992 1989 1982 2002 ...
## $ Nwords : int 2972 3040 3066 3044 3053 3013 3049 3042 3039 3045 ...
## $ FullName : Factor w/ 3 levels "Cela","Mendoza",..: 1 1 1 1 1 2 2 2 2 2 ...
Extract the column names from the data frame. Also extract the number of rows.
(filas <- nrow(meta))
## [1] 15
(columnas <- colnames(meta))
## [1] "Author" "YearOfBirth" "TextName" "PubDate" "Nwords"
## [6] "FullName"
2. Calculate how many different texts are available in meta for each author.
levels(meta$FullName)
## [1] "Cela" "Mendoza" "VargasLLosa"
(textosXautor.xtab <- xtabs(~FullName, data = meta))
## FullName
## Cela Mendoza VargasLLosa
## 5 5 5
Also calculate the mean publication date of the texts sampled for each author.
(fechapubmedia <- tapply(meta$PubDate, meta$FullName, mean))
## Cela Mendoza VargasLLosa
## 1956 1990 1975
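The same per-author summaries can also be obtained with other base R functions. The sketch below is only an illustration: it rebuilds two columns of meta by hand from the listing above so that it runs without languageR, and the names meta_mini, textos_por_autor and media_por_autor are made up for this example.

```r
# Hand-built copy of two columns of spanishMeta (values copied from the
# listing above); meta_mini is a hypothetical name used only in this sketch.
meta_mini <- data.frame(
  FullName = factor(rep(c("Cela", "Mendoza", "VargasLLosa"), each = 5)),
  PubDate  = c(1983, 1951, 1956, 1948, 1942,
               1986, 1992, 1989, 1982, 2002,
               1965, 1963, 1977, 1987, 1981)
)

# Number of texts per author: table() counts the occurrences of each level
textos_por_autor <- table(meta_mini$FullName)
textos_por_autor

# Mean publication date per author, returned as a data frame
# with one row per author instead of a named vector
media_por_autor <- aggregate(PubDate ~ FullName, data = meta_mini, FUN = mean)
media_por_autor
```

The unrounded means are 1956, 1990.2 and 1974.6; the tapply() output above shows whole years, presumably because of the number of significant digits used for printing in that session.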
3. Sort the rows in meta by year of birth (YearOfBirth) and the number of words sampled from the texts (Nwords).
meta[order(meta$YearOfBirth, meta$Nwords), ]
## Author YearOfBirth TextName PubDate Nwords FullName
## 1 C 1916 X14458gll 1983 2972 Cela
## 2 C 1916 X14459gll 1951 3040 Cela
## 4 C 1916 X14461gll 1948 3044 Cela
## 5 C 1916 X14462gll 1942 3053 Cela
## 3 C 1916 X14460gll 1956 3066 Cela
## 14 V 1936 X14475gll 1987 3016 VargasLLosa
## 13 V 1936 X14474gll 1977 3020 VargasLLosa
## 11 V 1936 X14472gll 1965 3037 VargasLLosa
## 15 V 1936 X14476gll 1981 3054 VargasLLosa
## 12 V 1936 X14473gll 1963 3067 VargasLLosa
## 6 M 1943 X14463gll 1986 3013 Mendoza
## 9 M 1943 X14466gll 1982 3039 Mendoza
## 8 M 1943 X14465gll 1989 3042 Mendoza
## 10 M 1943 X14467gll 2002 3045 Mendoza
## 7 M 1943 X14464gll 1992 3049 Mendoza
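order() sorts every key in ascending order by default. As a small sketch of a mixed ordering (not part of the exercise), negating a numeric key reverses the direction for that key only. The data frame meta_mini is a hand-built stand-in holding two columns of meta, so the example runs without languageR.

```r
# Hand-built copy of two columns of spanishMeta (values copied from the
# listing above); meta_mini is a hypothetical name used only in this sketch.
meta_mini <- data.frame(
  YearOfBirth = rep(c(1916, 1943, 1936), each = 5),
  Nwords      = c(2972, 3040, 3066, 3044, 3053,
                  3013, 3049, 3042, 3039, 3045,
                  3037, 3067, 3020, 3016, 3054)
)

# Ascending year of birth; within each year, descending number of words.
# The minus sign works because Nwords is numeric.
ord <- order(meta_mini$YearOfBirth, -meta_mini$Nwords)
ordenado <- meta_mini[ord, ]
ordenado
```

For non-numeric keys the minus sign is not available; sorting all keys in reverse is possible with the decreasing = TRUE argument of order().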
4. Extract the vector of publication dates from meta.
(fechaspub <- meta$PubDate)
## [1] 1983 1951 1956 1948 1942 1986 1992 1989 1982 2002 1965 1963 1977 1987
## [15] 1981
Sort this vector. Consult the help page for sort() and sort the vector in reverse numerical order.
sort(fechaspub)
## [1] 1942 1948 1951 1956 1963 1965 1977 1981 1982 1983 1986 1987 1989 1992
## [15] 2002
sort(fechaspub, decreasing = TRUE)
## [1] 2002 1992 1989 1987 1986 1983 1982 1981 1977 1965 1963 1956 1951 1948
## [15] 1942
Also sort the row names of meta.
sort(rownames(meta), decreasing = TRUE)
## [1] "9" "8" "7" "6" "5" "4" "3" "2" "15" "14" "13" "12" "11" "10"
## [15] "1"
# Careful! Row names are character strings; to sort them
# numerically they must first be converted to integers
sort(as.integer(rownames(meta)), decreasing = TRUE)
## [1] 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
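The difference between the two orderings comes from how strings are compared: character by character, from left to right, so "10" precedes "9" because "1" precedes "9". A minimal illustration with three made-up values:

```r
# Sorted as strings: compared character by character
como_texto <- sort(c("9", "10", "2"))
como_texto

# Sorted as numbers after conversion with as.integer()
como_numero <- sort(as.integer(c("9", "10", "2")))
como_numero
```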
5. Extract from meta all rows with texts that were published before 1980.
(antes1980 <- meta[meta$PubDate < 1980, ])
## Author YearOfBirth TextName PubDate Nwords FullName
## 2 C 1916 X14459gll 1951 3040 Cela
## 3 C 1916 X14460gll 1956 3066 Cela
## 4 C 1916 X14461gll 1948 3044 Cela
## 5 C 1916 X14462gll 1942 3053 Cela
## 11 V 1936 X14472gll 1965 3037 VargasLLosa
## 12 V 1936 X14473gll 1963 3067 VargasLLosa
## 13 V 1936 X14474gll 1977 3020 VargasLLosa
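Row selection with a logical condition can also be written with subset(), and conditions can be combined with &. The following sketch again uses a hand-built stand-in for meta (meta_mini, a hypothetical name) so that it runs on its own.

```r
# Hand-built copy of two columns of spanishMeta (values copied from the
# listing above); meta_mini is a hypothetical name used only in this sketch.
meta_mini <- data.frame(
  FullName = factor(rep(c("Cela", "Mendoza", "VargasLLosa"), each = 5)),
  PubDate  = c(1983, 1951, 1956, 1948, 1942,
               1986, 1992, 1989, 1982, 2002,
               1965, 1963, 1977, 1987, 1981)
)

# subset() evaluates the condition inside the data frame,
# so the column can be named without the meta_mini$ prefix
antes1980_sub <- subset(meta_mini, PubDate < 1980)
nrow(antes1980_sub)

# Two conditions joined with &: texts by Cela published before 1980
cela_antes1980 <- meta_mini[meta_mini$PubDate < 1980 &
                            meta_mini$FullName == "Cela", ]
cela_antes1980
```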
6. Calculate the mean publication date for all texts. The arithmetic mean is defined as the sum of the observations in a vector divided by the number of elements in the vector. The length of a vector is provided by the function length(). Recalculate the mean year of publication by means of the functions sum() and length().
(fechamedia <- mean(meta$PubDate))
## [1] 1974
(fechamedia2 <- sum(meta$PubDate)/length(meta$PubDate))
## [1] 1974
7. We create a new data frame with fictitious information on each author’s favorite composer with the function data.frame().
(composer <- data.frame(Author = c("Cela", "Mendoza", "VargasLLosa"), Favorite = c("Stravinsky",
"Bach", "Villa-Lobos")))
## Author Favorite
## 1 Cela Stravinsky
## 2 Mendoza Bach
## 3 VargasLLosa Villa-Lobos
Add the information in this new data frame to meta with merge().
(nuevometa <- merge(meta, composer, by.x = "FullName", by.y = "Author"))
## FullName Author YearOfBirth TextName PubDate Nwords Favorite
## 1 Cela C 1916 X14458gll 1983 2972 Stravinsky
## 2 Cela C 1916 X14459gll 1951 3040 Stravinsky
## 3 Cela C 1916 X14460gll 1956 3066 Stravinsky
## 4 Cela C 1916 X14461gll 1948 3044 Stravinsky
## 5 Cela C 1916 X14462gll 1942 3053 Stravinsky
## 6 Mendoza M 1943 X14463gll 1986 3013 Bach
## 7 Mendoza M 1943 X14464gll 1992 3049 Bach
## 8 Mendoza M 1943 X14465gll 1989 3042 Bach
## 9 Mendoza M 1943 X14466gll 1982 3039 Bach
## 10 Mendoza M 1943 X14467gll 2002 3045 Bach
## 11 VargasLLosa V 1936 X14472gll 1965 3037 Villa-Lobos
## 12 VargasLLosa V 1936 X14473gll 1963 3067 Villa-Lobos
## 13 VargasLLosa V 1936 X14474gll 1977 3020 Villa-Lobos
## 14 VargasLLosa V 1936 X14475gll 1987 3016 Villa-Lobos
## 15 VargasLLosa V 1936 X14476gll 1981 3054 Villa-Lobos
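By default merge() keeps only the rows whose key occurs in both data frames. The sketch below shows what happens when one author is missing from the second data frame, and how all.x = TRUE keeps the unmatched rows. Both data frames are small made-up variants for illustration (textos and composer_parcial are hypothetical names), not the data used above.

```r
# Made-up miniature data: one text per author
textos <- data.frame(
  FullName = c("Cela", "Mendoza", "VargasLLosa"),
  Nwords   = c(2972, 3013, 3037)
)

# Hypothetical composer table in which VargasLLosa is missing
composer_parcial <- data.frame(
  Author   = c("Cela", "Mendoza"),
  Favorite = c("Stravinsky", "Bach")
)

# Default behaviour: rows without a match are dropped
solo_coincidentes <- merge(textos, composer_parcial,
                           by.x = "FullName", by.y = "Author")
solo_coincidentes

# all.x = TRUE keeps every row of the first data frame
# and fills the missing Favorite with NA
todos <- merge(textos, composer_parcial,
               by.x = "FullName", by.y = "Author", all.x = TRUE)
todos
```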