This week covers what I call “loop functions” in R: functions that let you execute loop-like behavior in a compact form. They typically have the word “apply” in their names and are particularly convenient when you need to run a loop on the command line while using R interactively. They are some of the more interesting functions in the R language.
Writing for and while loops is useful when programming, but not particularly easy when working interactively on the command line. R provides several functions that implement looping to make life easier:
lapply: Loop over a list and evaluate a function on each element
sapply: Same as lapply but try to simplify the result
apply: Apply a function over the margins of an array
tapply: Apply a function over subsets of a vector
mapply: Multivariate version of lapply
An auxiliary function split is also useful, particularly in conjunction with lapply.
lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its … argument. If X is not a list, it will be coerced to a list using as.list.
lapply always returns a list, regardless of the class of the input.
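The calls behind the three printed results that follow are not shown. A minimal sketch of calls that give output of the same shape, assuming a named list of numeric vectors and the base functions mean and runif (the variable names are illustrative, and the random values will differ from run to run):

```r
# Mean of each element of a named list; the result is always a list.
x <- list(a = 1:5, b = rnorm(10))
means <- lapply(x, mean)
print(means)

# Arguments placed after FUN are passed through to FUN via `...`:
# here min and max go to runif(), so element i holds i uniform draws on [0, 10].
draws <- lapply(1:4, runif, min = 0, max = 10)
print(draws)
```

Note that the integer vector 1:4 is coerced to a list before runif is applied to each element.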
## $a
## [1] 3
##
## $b
## [1] -0.09775161
## $a
## [1] 2.5
##
## $b
## [1] 0.3634831
##
## $c
## [1] 0.6652225
##
## $d
## [1] 4.976192
## [[1]]
## [1] 5.297485
##
## [[2]]
## [1] 0.2882099 7.7498925
##
## [[3]]
## [1] 8.652340 2.975772 5.403049
##
## [[4]]
## [1] 0.49798448 7.04317793 7.11307719 0.03610844
lapply and friends make heavy use of anonymous functions.
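The list x printed next is not defined in the text. Judging from the output it can be reconstructed as follows; the matrix() calls are an inference from the printed values, not necessarily the original code:

```r
# Reconstructed from the printed output: a 2x2 and a 3x2 integer matrix,
# both filled column by column.
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
print(x)

# The anonymous function has no name and exists only for this call.
first_cols <- lapply(x, function(elt) elt[, 1])
print(first_cols)
```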
## $a
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## $b
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# An anonymous function for extracting the first column of each matrix.
lapply(x, function(elt) elt[,1])
## $a
## [1] 1 2
##
## $b
## [1] 1 2 3
sapply will try to simplify the result of lapply if possible.
If the result is a list where every element is length 1, then a vector is returned
If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
If it can’t figure things out, a list is returned
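The three rules above can be sketched as follows. The list mirrors the element names seen in the printed output (a, b, c, de), but its contents are an assumption and the random values will differ:

```r
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), de = rnorm(100, 5))

# Every result has length 1, so sapply() returns a named numeric vector.
v <- sapply(x, mean)

# Every result has the same length (2), so sapply() returns a matrix
# with one column per list element.
m <- sapply(x, range)

# The results have different lengths, so sapply() falls back to a list,
# exactly as lapply() would.
l <- sapply(1:3, function(n) seq_len(n))
```

Calling mean(x) on the list itself does not loop over the elements; it returns NA with a warning, which is why a loop function is needed.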
Example
## $a
## [1] 2.5
##
## $b
## [1] 0.7113761
##
## $c
## [1] 1.087103
##
## $de
## [1] 5.085903
## a b c de
## 2.5000000 0.7113761 1.0871031 5.0859026
## Warning in mean.default(x): argument is not numeric or logical: returning NA
## [1] NA
apply is used to evaluate a function (often an anonymous one) over the margins of an array.
It is most often used to apply a function to the rows or columns of a matrix
It can be used with general arrays, e.g. taking the average of an array of matrices
It is not really faster than writing a loop, but it works in one line!
## function (X, MARGIN, FUN, ...)
X is an array
MARGIN is an integer vector indicating which margins should be “retained”.
FUN is a function to be applied
… is for other arguments to be passed to FUN
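A minimal sketch of how MARGIN works, assuming a 3x2 matrix like the one printed in this section and a 20x10 matrix of random normals for the quantile call (both matrices are assumptions chosen to match the shape of the output):

```r
x <- matrix(1:6, nrow = 3, ncol = 2)

# MARGIN = 1 retains the rows: one sum per row.
row_sums <- apply(x, 1, sum)
# MARGIN = 2 retains the columns: one sum per column.
col_sums <- apply(x, 2, sum)

# Arguments after FUN are passed on to it. Each row of y yields two
# quantiles, so the result has 2 rows and one column per row of y.
y <- matrix(rnorm(200), nrow = 20, ncol = 10)
q <- apply(y, 1, quantile, probs = c(0.25, 0.75))
```

For plain sums and means, rowSums, colSums, rowMeans and colMeans do the same job and are much faster on large matrices.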
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
## [1] 5 7 9
## [1] 6 15
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## 25% -0.6757766 -0.1301717 -0.4645866 -0.2652885 -0.2390933 -0.6922377 0.2949940
## 75% 0.1625239 0.9880443 0.4148126 0.7966673 0.6607092 0.8237715 0.9689675
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## 25% -0.9211555 -1.4433841 -0.7638743 0.03183095 -0.1327484 0.1416290 -0.5415094
## 75% 0.3238000 0.3993523 0.1504409 0.93155607 1.2156302 0.7694749 0.7011250
## [,15] [,16] [,17] [,18] [,19] [,20]
## 25% 0.0257989 -1.1401260 -1.1654758 -0.5814301 -0.04017409 -0.2467302
## 75% 0.5632047 0.3206481 0.6079709 0.9306677 0.25727992 1.5483479
mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
## function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
FUN is a function to apply
… contains arguments to apply over
MoreArgs is a list of other arguments to FUN.
SIMPLIFY indicates whether the result should be simplified
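As an illustration of applying a function "in parallel" over several arguments, here is a small sketch; the choice of rep and rnorm is an assumption made for the example:

```r
# Pairs the arguments element by element:
# rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1).
reps <- mapply(rep, 1:4, 4:1)

# n and mean vary from call to call; MoreArgs holds arguments that
# stay fixed across all calls (here sd = 1).
draws <- mapply(rnorm, n = 1:3, mean = c(0, 10, 100),
                MoreArgs = list(sd = 1))
```

Because the results have different lengths, both calls return a list even though SIMPLIFY defaults to TRUE.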
tapply is used to apply a function over subsets of a vector.
## function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
X is a vector
INDEX is a factor or a list of factors (or else they are coerced to factors)
FUN is a function to be applied
… contains other arguments to be passed to FUN
simplify indicates whether the result should be simplified
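A minimal sketch of tapply on a small fixed vector; the data and the grouping factor built with gl() are assumptions for illustration:

```r
x <- c(1, 2, 3, 10, 20, 30)
f <- gl(2, 3)   # factor with values 1, 1, 1, 2, 2, 2

# One mean per level of f, returned as a named array.
group_means <- tapply(x, f, mean)

# range() returns two values per group, so the result is a list-like
# array with one entry per group.
group_ranges <- tapply(x, f, range, simplify = FALSE)
```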
split takes a vector or other object and splits it into groups determined by a factor or list of factors.
## function (x, f, drop = FALSE, ...)
x is a vector (or list) or data frame
f is a factor (or coerced to one) or a list of factors
drop indicates whether empty factor levels should be dropped
Example
# Create a data frame
N <- 12
sex <- sample(c("f", "m"), N, replace=TRUE)
group <- sample(rep(c("CG", "WL", "T"), 4), N, replace=FALSE)
age <- sample(18:35, N, replace=TRUE)
IQ <- round(rnorm(N, mean=100, sd=15))
rating <- round(runif(N, min=0, max=6))
(myDf <- data.frame(id=1:N, sex, group, age, IQ, rating))
## id sex group age IQ rating
## 1 1 f T 31 88 5
## 2 2 m CG 35 81 4
## 3 3 f WL 32 95 5
## 4 4 m T 29 103 5
## 5 5 f CG 32 98 1
## 6 6 f WL 20 96 4
## 7 7 f CG 24 89 5
## 8 8 m T 18 124 2
## 9 9 m WL 19 90 6
## 10 10 f CG 24 100 0
## 11 11 f WL 34 83 3
## 12 12 m T 18 94 2
Group by the ‘group’ variable
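The split calls themselves are not shown. The two listings that follow are consistent with splitting first on group alone and then on the sex-by-group combination. A sketch on a small fixed stand-in for myDf (the data frame df and its values are hypothetical, not the random data above):

```r
df <- data.frame(
  id    = 1:6,
  sex   = c("f", "m", "f", "m", "f", "f"),
  group = c("T", "CG", "WL", "T", "CG", "WL")
)

# One data frame per level of group.
by_group <- split(df, df$group)

# A list of factors splits on every sex-by-group combination,
# including combinations that have no rows.
by_sex_group <- split(df, list(df$sex, df$group))

# drop = TRUE omits the empty combinations.
by_sex_group_nonempty <- split(df, list(df$sex, df$group), drop = TRUE)
```

The result is an ordinary list, so it can be passed straight to lapply or sapply to compute a summary per group.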
## $CG
## id sex group age IQ rating
## 2 2 m CG 35 81 4
## 5 5 f CG 32 98 1
## 7 7 f CG 24 89 5
## 10 10 f CG 24 100 0
##
## $T
## id sex group age IQ rating
## 1 1 f T 31 88 5
## 4 4 m T 29 103 5
## 8 8 m T 18 124 2
## 12 12 m T 18 94 2
##
## $WL
## id sex group age IQ rating
## 3 3 f WL 32 95 5
## 6 6 f WL 20 96 4
## 9 9 m WL 19 90 6
## 11 11 f WL 34 83 3
## $f.CG
## id sex group age IQ rating
## 5 5 f CG 32 98 1
## 7 7 f CG 24 89 5
## 10 10 f CG 24 100 0
##
## $m.CG
## id sex group age IQ rating
## 2 2 m CG 35 81 4
##
## $f.T
## id sex group age IQ rating
## 1 1 f T 31 88 5
##
## $m.T
## id sex group age IQ rating
## 4 4 m T 29 103 5
## 8 8 m T 18 124 2
## 12 12 m T 18 94 2
##
## $f.WL
## id sex group age IQ rating
## 3 3 f WL 32 95 5
## 6 6 f WL 20 96 4
## 11 11 f WL 34 83 3
##
## $m.WL
## id sex group age IQ rating
## 9 9 m WL 19 90 6
## X name landmass zone area population language religion bars stripes
## 1 1 Afghanistan 5 1 648 16 10 2 0 3
## 2 2 Albania 3 1 29 3 6 6 0 0
## 3 3 Algeria 4 1 2388 20 8 2 2 0
## 4 4 American-Samoa 6 3 0 0 1 1 0 0
## 5 5 Andorra 3 1 0 0 6 0 3 0
## 6 6 Angola 4 2 1247 7 10 5 0 2
## colours red green blue gold white black orange mainhue circles crosses
## 1 5 1 1 0 1 1 1 0 green 0 0
## 2 3 1 0 0 1 0 1 0 red 0 0
## 3 3 1 1 0 0 1 0 0 green 0 0
## 4 5 1 0 1 1 1 0 1 blue 0 0
## 5 3 1 0 1 1 0 0 0 gold 0 0
## 6 3 1 0 0 1 0 1 0 red 0 0
## saltires quarters sunstars crescent triangle icon animate text topleft
## 1 0 0 1 0 0 1 0 0 black
## 2 0 0 1 0 0 0 1 0 red
## 3 0 0 1 1 0 0 0 0 green
## 4 0 0 0 0 1 1 1 0 blue
## 5 0 0 0 0 0 0 0 0 blue
## 6 0 0 1 0 0 1 0 0 red
## botright
## 1 green
## 2 red
## 3 white
## 4 red
## 5 red
## 6 black
## [1] 194 31
# To open a more complete description of the dataset in a separate text file, type viewinfo()
class(flags)
## [1] "data.frame"
The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one.
Since a data frame is really just a list of vectors (you can see this with as.list(flags)), we can use lapply() to apply the class() function to each column of the flags dataset. Let’s see it in action!
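The call that produced the listing that follows is not shown; the later text stores its result in cls_list. A sketch on a hypothetical three-column miniature of flags (flags_demo is a stand-in, since the real dataset ships with the lesson):

```r
# Hypothetical miniature of flags: same idea, three columns only.
flags_demo <- data.frame(
  name     = c("Afghanistan", "Albania", "Algeria"),
  landmass = c(5L, 3L, 4L),
  area     = c(648L, 29L, 2388L),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so lapply() visits each column in turn.
cls_list <- lapply(flags_demo, class)
print(cls_list)
```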
## $X
## [1] "integer"
##
## $name
## [1] "character"
##
## $landmass
## [1] "integer"
##
## $zone
## [1] "integer"
##
## $area
## [1] "integer"
##
## $population
## [1] "integer"
##
## $language
## [1] "integer"
##
## $religion
## [1] "integer"
##
## $bars
## [1] "integer"
##
## $stripes
## [1] "integer"
##
## $colours
## [1] "integer"
##
## $red
## [1] "integer"
##
## $green
## [1] "integer"
##
## $blue
## [1] "integer"
##
## $gold
## [1] "integer"
##
## $white
## [1] "integer"
##
## $black
## [1] "integer"
##
## $orange
## [1] "integer"
##
## $mainhue
## [1] "character"
##
## $circles
## [1] "integer"
##
## $crosses
## [1] "integer"
##
## $saltires
## [1] "integer"
##
## $quarters
## [1] "integer"
##
## $sunstars
## [1] "integer"
##
## $crescent
## [1] "integer"
##
## $triangle
## [1] "integer"
##
## $icon
## [1] "integer"
##
## $animate
## [1] "integer"
##
## $text
## [1] "integer"
##
## $topleft
## [1] "character"
##
## $botright
## [1] "character"
The ‘l’ in ‘lapply’ stands for ‘list’. Type class(cls_list) to confirm that lapply() returned a list.
## [1] "list"
As expected, we got a list of length 31, one element for each variable/column. The output would be considerably more compact if we could represent it as a vector instead of a list.
You may remember from a previous lesson that lists are most helpful for storing multiple classes of data. In this case, since every element of the list returned by lapply() is a character vector of length one (i.e. “integer” or “character”), cls_list can be simplified to a character vector. To do this manually, type as.character(cls_list).
## [1] "integer" "character" "integer" "integer" "integer" "integer"
## [7] "integer" "integer" "integer" "integer" "integer" "integer"
## [13] "integer" "integer" "integer" "integer" "integer" "integer"
## [19] "character" "integer" "integer" "integer" "integer" "integer"
## [25] "integer" "integer" "integer" "integer" "integer" "character"
## [31] "character"
sapply() allows you to automate this process by calling lapply() behind the scenes, but then attempting to simplify (hence the ‘s’ in ‘sapply’) the result for you. Use sapply() the same way you used lapply() to get the class of each column of the flags dataset and store the result in cls_vect.
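A sketch comparing the manual simplification with sapply(), again on a hypothetical miniature of the dataset (flags_demo is a stand-in):

```r
flags_demo <- data.frame(
  name     = c("Afghanistan", "Albania", "Algeria"),
  landmass = c(5L, 3L, 4L),
  area     = c(648L, 29L, 2388L),
  stringsAsFactors = FALSE
)

cls_list <- lapply(flags_demo, class)

# Manual simplification: a character vector, but the column names are lost.
cls_manual <- as.character(cls_list)

# sapply() simplifies automatically and keeps the column names.
cls_vect <- sapply(flags_demo, class)
class(cls_vect)
```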
## [1] "character"
In general, if the result is a list where every element is of length one, then sapply() returns a vector.
If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix.
If sapply() can’t figure things out, then it just returns a list, no different from what lapply() would give you.
Seven columns of our dataset (red through orange) are indicator variables, each representing a different color. In this copy of the data, which has a leading index column X, they sit in columns 12 through 18, so the 11:17 subset used below starts one column early: it includes colours and omits orange. The value of the indicator variable is 1 if the color is present in a country’s flag and 0 otherwise.
Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on their flag, we can just add up all of the 1s and 0s in the ‘orange’ column. Try sum(flags$orange) to see this.
## [1] 26
Now we want to repeat this operation for each of the colors recorded in the dataset.
flag_colors <- flags[,11:17] # Note the comma before 11:17. This subsetting command tells R that we want all rows, but only columns 11 through 17.
head(flag_colors)
## colours red green blue gold white black
## 1 5 1 1 0 1 1 1
## 2 3 1 0 0 1 0 1
## 3 3 1 1 0 0 1 0
## 4 5 1 0 1 1 1 0
## 5 3 1 0 1 1 0 0
## 6 3 1 0 0 1 0 1
To get a list containing the sum of each column of flag_colors, call the lapply() function with two arguments. The first argument is the object over which we are looping (i.e. flag_colors) and the second argument is the name of the function we wish to apply to each column (i.e. sum). Remember that the second argument is just the name of the function with no parentheses, etc.
## $colours
## [1] 672
##
## $red
## [1] 153
##
## $green
## [1] 91
##
## $blue
## [1] 99
##
## $gold
## [1] 91
##
## $white
## [1] 146
##
## $black
## [1] 52
The result is a list, since lapply() always returns a list. Each element of this list is of length one, so the result can be simplified to a vector by calling sapply() instead of lapply(). Try it now.
## colours red green blue gold white black
## 672 153 91 99 91 146 52
Perhaps it’s more informative to find the proportion of flags (out of 194) containing each color. Since each indicator column is just a bunch of 1s and 0s, the arithmetic mean of each column will give us the proportion of 1s. (If it’s not clear why, think of a simpler situation where you have three 1s and two 0s: (1 + 1 + 1 + 0 + 0)/5 = 3/5 = 0.6.) The colours column is a count rather than an indicator, so its mean is the average number of colours per flag, not a proportion.
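The counting and proportion steps can be sketched on a tiny table of indicators; flag_colors_demo and its values are hypothetical, chosen so that the red column reproduces the three-1s-and-two-0s example:

```r
# Hypothetical 0/1 indicator columns for five flags.
flag_colors_demo <- data.frame(
  red   = c(1L, 1L, 1L, 0L, 0L),
  green = c(1L, 0L, 0L, 0L, 0L)
)

# Number of flags containing each color.
counts <- sapply(flag_colors_demo, sum)

# Proportion of flags containing each color: the mean of a 0/1 column.
props <- sapply(flag_colors_demo, mean)
```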
## colours red green blue gold white black
## 3.4639175 0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412
sapply() instead returns a matrix when each element of the list returned by lapply() is a vector of the same length (> 1).
## mainhue circles crosses saltires quarters
## 1 green 0 0 0 0
## 2 red 0 0 0 0
## 3 green 0 0 0 0
## 4 blue 0 0 0 0
## 5 gold 0 0 0 0
## 6 red 0 0 0 0
Each of these columns (i.e. variables) except mainhue, which holds a color name, represents the number of times a particular shape or design appears on a country’s flag. We are interested in the minimum and maximum number of times each shape or design appears.
The range() function returns the minimum and maximum of its first argument, which should be a numeric vector. Use lapply() to apply the range function to each column of flag_shapes.
## $mainhue
## [1] "black" "white"
##
## $circles
## [1] 0 4
##
## $crosses
## [1] 0 2
##
## $saltires
## [1] 0 1
##
## $quarters
## [1] 0 4
Do the same operation, but using sapply() and store the result in a variable called shape_mat.
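A sketch of the list-versus-matrix difference on hypothetical shape counts (shapes_demo is a stand-in for flag_shapes). With numeric columns only, the matrix is numeric; a character column such as mainhue forces every entry to character, which is why the values in the printed shape_mat appear in quotes:

```r
shapes_demo <- data.frame(
  circles = c(0L, 1L, 4L, 0L),
  crosses = c(0L, 2L, 0L, 1L)
)

# lapply(): a list of two length-2 vectors (min, max).
shape_list <- lapply(shapes_demo, range)

# sapply(): the same values as a matrix, row 1 = min, row 2 = max.
shape_mat <- sapply(shapes_demo, range)
class(shape_mat)
```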
## mainhue circles crosses saltires quarters
## [1,] "black" "0" "0" "0" "0"
## [2,] "white" "4" "2" "1" "4"
## [1] "matrix" "array"
As we’ve seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we’ve looked at so far. Let’s look at an example where sapply() can’t figure out how to simplify the result and thus returns a list, no different from lapply().
When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the ‘unique’ elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)).
## [1] 3 4 5 6
## $X
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194
##
## $name
## [1] "Afghanistan" "Albania"
## [3] "Algeria" "American-Samoa"
## [5] "Andorra" "Angola"
## [7] "Anguilla" "Antigua-Barbuda"
## [9] "Argentina" "Argentine"
## [11] "Australia" "Austria"
## [13] "Bahamas" "Bahrain"
## [15] "Bangladesh" "Barbados"
## [17] "Belgium" "Belize"
## [19] "Benin" "Bermuda"
## [21] "Bhutan" "Bolivia"
## [23] "Botswana" "Brazil"
## [25] "British-Virgin-Isles" "Brunei"
## [27] "Bulgaria" "Burkina"
## [29] "Burma" "Burundi"
## [31] "Cameroon" "Canada"
## [33] "Cape-Verde-Islands" "Cayman-Islands"
## [35] "Central-African-Republic" "Chad"
## [37] "Chile" "China"
## [39] "Colombia" "Comorro-Islands"
## [41] "Congo" "Cook-Islands"
## [43] "Costa-Rica" "Cuba"
## [45] "Cyprus" "Czechoslovakia"
## [47] "Denmark" "Djibouti"
## [49] "Dominica" "Dominican-Republic"
## [51] "Ecuador" "Egypt"
## [53] "El-Salvador" "Equatorial-Guinea"
## [55] "Ethiopia" "Faeroes"
## [57] "Falklands-Malvinas" "Fiji"
## [59] "Finland" "France"
## [61] "French-Guiana" "French-Polynesia"
## [63] "Gabon" "Gambia"
## [65] "Germany-DDR" "Germany-FRG"
## [67] "Ghana" "Gibraltar"
## [69] "Greece" "Greenland"
## [71] "Grenada" "Guam"
## [73] "Guatemala" "Guinea"
## [75] "Guinea-Bissau" "Guyana"
## [77] "Haiti" "Honduras"
## [79] "Hong-Kong" "Hungary"
## [81] "Iceland" "India"
## [83] "Indonesia" "Iran"
## [85] "Iraq" "Ireland"
## [87] "Israel" "Italy"
## [89] "Ivory-Coast" "Jamaica"
## [91] "Japan" "Jordan"
## [93] "Kampuchea" "Kenya"
## [95] "Kiribati" "Kuwait"
## [97] "Laos" "Lebanon"
## [99] "Lesotho" "Liberia"
## [101] "Libya" "Liechtenstein"
## [103] "Luxembourg" "Malagasy"
## [105] "Malawi" "Malaysia"
## [107] "Maldive-Islands" "Mali"
## [109] "Malta" "Marianas"
## [111] "Mauritania" "Mauritius"
## [113] "Mexico" "Micronesia"
## [115] "Monaco" "Mongolia"
## [117] "Montserrat" "Morocco"
## [119] "Mozambique" "Nauru"
## [121] "Nepal" "Netherlands"
## [123] "Netherlands-Antilles" "New-Zealand"
## [125] "Nicaragua" "Niger"
## [127] "Nigeria" "Niue"
## [129] "North-Korea" "North-Yemen"
## [131] "Norway" "Oman"
## [133] "Pakistan" "Panama"
## [135] "Papua-New-Guinea" "Parguay"
## [137] "Peru" "Philippines"
## [139] "Poland" "Portugal"
## [141] "Puerto-Rico" "Qatar"
## [143] "Romania" "Rwanda"
## [145] "San-Marino" "Sao-Tome"
## [147] "Saudi-Arabia" "Senegal"
## [149] "Seychelles" "Sierra-Leone"
## [151] "Singapore" "Soloman-Islands"
## [153] "Somalia" "South-Africa"
## [155] "South-Korea" "South-Yemen"
## [157] "Spain" "Sri-Lanka"
## [159] "St-Helena" "St-Kitts-Nevis"
## [161] "St-Lucia" "St-Vincent"
## [163] "Sudan" "Surinam"
## [165] "Swaziland" "Sweden"
## [167] "Switzerland" "Syria"
## [169] "Taiwan" "Tanzania"
## [171] "Thailand" "Togo"
## [173] "Tonga" "Trinidad-Tobago"
## [175] "Tunisia" "Turkey"
## [177] "Turks-Cocos-Islands" "Tuvalu"
## [179] "UAE" "Uganda"
## [181] "UK" "Uruguay"
## [183] "US-Virgin-Isles" "USA"
## [185] "USSR" "Vanuatu"
## [187] "Vatican-City" "Venezuela"
## [189] "Vietnam" "Western-Samoa"
## [191] "Yugoslavia" "Zaire"
## [193] "Zambia" "Zimbabwe"
##
## $landmass
## [1] 5 3 4 6 1 2
##
## $zone
## [1] 1 3 2 4
##
## $area
## [1] 648 29 2388 0 1247 2777 7690 84 19 1 143 31
## [13] 23 113 47 1099 600 8512 6 111 274 678 28 474
## [25] 9976 4 623 1284 757 9561 1139 2 342 51 115 9
## [37] 128 43 22 49 284 1001 21 1222 12 18 337 547
## [49] 91 268 10 108 249 239 132 2176 109 246 36 215
## [61] 112 93 103 3268 1904 1648 435 70 301 323 11 372
## [73] 98 181 583 236 30 1760 3 587 118 333 1240 1031
## [85] 1973 1566 447 783 140 41 1267 925 121 195 324 212
## [97] 804 76 463 407 1285 300 313 92 237 26 2150 196
## [109] 72 637 1221 99 288 505 66 2506 63 17 450 185
## [121] 945 514 57 5 164 781 245 178 9363 22402 15 912
## [133] 256 905 753 391
##
## $population
## [1] 16 3 20 0 7 28 15 8 90 10 1 6 119 9 35
## [16] 4 24 2 11 1008 5 47 31 54 17 61 14 684 157 39
## [31] 57 118 13 77 12 56 18 84 48 36 22 29 38 49 45
## [46] 231 274 60
##
## $language
## [1] 10 6 8 1 2 4 3 5 7 9
##
## $religion
## [1] 2 6 1 0 5 3 4 7
##
## $bars
## [1] 0 2 3 1 5
##
## $stripes
## [1] 3 0 2 1 5 9 11 14 4 6 13 7
##
## $colours
## [1] 5 3 2 8 6 4 7 1
##
## $red
## [1] 1 0
##
## $green
## [1] 1 0
##
## $blue
## [1] 0 1
##
## $gold
## [1] 1 0
##
## $white
## [1] 1 0
##
## $black
## [1] 1 0
##
## $orange
## [1] 0 1
##
## $mainhue
## [1] "green" "red" "blue" "gold" "white" "orange" "black" "brown"
##
## $circles
## [1] 0 1 4 2
##
## $crosses
## [1] 0 1 2
##
## $saltires
## [1] 0 1
##
## $quarters
## [1] 0 1 4
##
## $sunstars
## [1] 1 0 6 22 14 3 4 5 15 10 7 2 9 50
##
## $crescent
## [1] 0 1
##
## $triangle
## [1] 0 1
##
## $icon
## [1] 1 0
##
## $animate
## [1] 0 1
##
## $text
## [1] 0 1
##
## $topleft
## [1] "black" "red" "green" "blue" "white" "orange" "gold"
##
## $botright
## [1] "green" "red" "white" "black" "blue" "gold" "orange" "brown"
Since unique_vals is a list, you can use what you’ve learned to determine the length of each element of unique_vals (i.e. the number of unique values for each variable). Simplify the result, if possible. Hint: Apply the length() function to each element of unique_vals.
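A minimal sketch of the hint, using a small hypothetical list in place of the real columns (unique_demo stands in for unique_vals):

```r
# unique() on each element, as with lapply(flags, unique).
unique_demo <- lapply(
  list(zone = c(1, 3, 1, 2), red = c(1, 0, 1, 1)),
  unique
)

# length() returns one number per element, so sapply() can simplify
# the result to a named integer vector.
n_unique <- sapply(unique_demo, length)
```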
## X name landmass zone area population language
## 194 194 6 4 136 48 10
## religion bars stripes colours red green blue
## 8 5 12 8 2 2 2
## gold white black orange mainhue circles crosses
## 2 2 2 2 8 4 3
## saltires quarters sunstars crescent triangle icon animate
## 2 3 14 2 2 2 2
## text topleft botright
## 2 7 8
Use sapply() to apply the unique() function to each column of the flags dataset to see that you get the same unsimplified list that you got from lapply().
## $X
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194
##
## $name
## [1] "Afghanistan" "Albania"
## [3] "Algeria" "American-Samoa"
## [5] "Andorra" "Angola"
## [7] "Anguilla" "Antigua-Barbuda"
## [9] "Argentina" "Argentine"
## [11] "Australia" "Austria"
## [13] "Bahamas" "Bahrain"
## [15] "Bangladesh" "Barbados"
## [17] "Belgium" "Belize"
## [19] "Benin" "Bermuda"
## [21] "Bhutan" "Bolivia"
## [23] "Botswana" "Brazil"
## [25] "British-Virgin-Isles" "Brunei"
## [27] "Bulgaria" "Burkina"
## [29] "Burma" "Burundi"
## [31] "Cameroon" "Canada"
## [33] "Cape-Verde-Islands" "Cayman-Islands"
## [35] "Central-African-Republic" "Chad"
## [37] "Chile" "China"
## [39] "Colombia" "Comorro-Islands"
## [41] "Congo" "Cook-Islands"
## [43] "Costa-Rica" "Cuba"
## [45] "Cyprus" "Czechoslovakia"
## [47] "Denmark" "Djibouti"
## [49] "Dominica" "Dominican-Republic"
## [51] "Ecuador" "Egypt"
## [53] "El-Salvador" "Equatorial-Guinea"
## [55] "Ethiopia" "Faeroes"
## [57] "Falklands-Malvinas" "Fiji"
## [59] "Finland" "France"
## [61] "French-Guiana" "French-Polynesia"
## [63] "Gabon" "Gambia"
## [65] "Germany-DDR" "Germany-FRG"
## [67] "Ghana" "Gibraltar"
## [69] "Greece" "Greenland"
## [71] "Grenada" "Guam"
## [73] "Guatemala" "Guinea"
## [75] "Guinea-Bissau" "Guyana"
## [77] "Haiti" "Honduras"
## [79] "Hong-Kong" "Hungary"
## [81] "Iceland" "India"
## [83] "Indonesia" "Iran"
## [85] "Iraq" "Ireland"
## [87] "Israel" "Italy"
## [89] "Ivory-Coast" "Jamaica"
## [91] "Japan" "Jordan"
## [93] "Kampuchea" "Kenya"
## [95] "Kiribati" "Kuwait"
## [97] "Laos" "Lebanon"
## [99] "Lesotho" "Liberia"
## [101] "Libya" "Liechtenstein"
## [103] "Luxembourg" "Malagasy"
## [105] "Malawi" "Malaysia"
## [107] "Maldive-Islands" "Mali"
## [109] "Malta" "Marianas"
## [111] "Mauritania" "Mauritius"
## [113] "Mexico" "Micronesia"
## [115] "Monaco" "Mongolia"
## [117] "Montserrat" "Morocco"
## [119] "Mozambique" "Nauru"
## [121] "Nepal" "Netherlands"
## [123] "Netherlands-Antilles" "New-Zealand"
## [125] "Nicaragua" "Niger"
## [127] "Nigeria" "Niue"
## [129] "North-Korea" "North-Yemen"
## [131] "Norway" "Oman"
## [133] "Pakistan" "Panama"
## [135] "Papua-New-Guinea" "Parguay"
## [137] "Peru" "Philippines"
## [139] "Poland" "Portugal"
## [141] "Puerto-Rico" "Qatar"
## [143] "Romania" "Rwanda"
## [145] "San-Marino" "Sao-Tome"
## [147] "Saudi-Arabia" "Senegal"
## [149] "Seychelles" "Sierra-Leone"
## [151] "Singapore" "Soloman-Islands"
## [153] "Somalia" "South-Africa"
## [155] "South-Korea" "South-Yemen"
## [157] "Spain" "Sri-Lanka"
## [159] "St-Helena" "St-Kitts-Nevis"
## [161] "St-Lucia" "St-Vincent"
## [163] "Sudan" "Surinam"
## [165] "Swaziland" "Sweden"
## [167] "Switzerland" "Syria"
## [169] "Taiwan" "Tanzania"
## [171] "Thailand" "Togo"
## [173] "Tonga" "Trinidad-Tobago"
## [175] "Tunisia" "Turkey"
## [177] "Turks-Cocos-Islands" "Tuvalu"
## [179] "UAE" "Uganda"
## [181] "UK" "Uruguay"
## [183] "US-Virgin-Isles" "USA"
## [185] "USSR" "Vanuatu"
## [187] "Vatican-City" "Venezuela"
## [189] "Vietnam" "Western-Samoa"
## [191] "Yugoslavia" "Zaire"
## [193] "Zambia" "Zimbabwe"
##
## $landmass
## [1] 5 3 4 6 1 2
##
## $zone
## [1] 1 3 2 4
##
## $area
## [1] 648 29 2388 0 1247 2777 7690 84 19 1 143 31
## [13] 23 113 47 1099 600 8512 6 111 274 678 28 474
## [25] 9976 4 623 1284 757 9561 1139 2 342 51 115 9
## [37] 128 43 22 49 284 1001 21 1222 12 18 337 547
## [49] 91 268 10 108 249 239 132 2176 109 246 36 215
## [61] 112 93 103 3268 1904 1648 435 70 301 323 11 372
## [73] 98 181 583 236 30 1760 3 587 118 333 1240 1031
## [85] 1973 1566 447 783 140 41 1267 925 121 195 324 212
## [97] 804 76 463 407 1285 300 313 92 237 26 2150 196
## [109] 72 637 1221 99 288 505 66 2506 63 17 450 185
## [121] 945 514 57 5 164 781 245 178 9363 22402 15 912
## [133] 256 905 753 391
##
## $population
## [1] 16 3 20 0 7 28 15 8 90 10 1 6 119 9 35
## [16] 4 24 2 11 1008 5 47 31 54 17 61 14 684 157 39
## [31] 57 118 13 77 12 56 18 84 48 36 22 29 38 49 45
## [46] 231 274 60
##
## $language
## [1] 10 6 8 1 2 4 3 5 7 9
##
## $religion
## [1] 2 6 1 0 5 3 4 7
##
## $bars
## [1] 0 2 3 1 5
##
## $stripes
## [1] 3 0 2 1 5 9 11 14 4 6 13 7
##
## $colours
## [1] 5 3 2 8 6 4 7 1
##
## $red
## [1] 1 0
##
## $green
## [1] 1 0
##
## $blue
## [1] 0 1
##
## $gold
## [1] 1 0
##
## $white
## [1] 1 0
##
## $black
## [1] 1 0
##
## $orange
## [1] 0 1
##
## $mainhue
## [1] "green" "red" "blue" "gold" "white" "orange" "black" "brown"
##
## $circles
## [1] 0 1 4 2
##
## $crosses
## [1] 0 1 2
##
## $saltires
## [1] 0 1
##
## $quarters
## [1] 0 1 4
##
## $sunstars
## [1] 1 0 6 22 14 3 4 5 15 10 7 2 9 50
##
## $crescent
## [1] 0 1
##
## $triangle
## [1] 0 1
##
## $icon
## [1] 1 0
##
## $animate
## [1] 0 1
##
## $text
## [1] 0 1
##
## $topleft
## [1] "black" "red" "green" "blue" "white" "orange" "gold"
##
## $botright
## [1] "green" "red" "white" "black" "blue" "gold" "orange" "brown"
Occasionally, you may need to apply a function that is not yet defined, thus requiring you to write your own.
Pretend you are interested in only the second item from each element of the unique_vals list that you just created. Since each element of the unique_vals list is a vector and we’re not aware of any built-in function in R that returns the second element of a vector, we will construct our own function.
Our function has no name and disappears as soon as lapply() is done using it. So-called ‘anonymous functions’ can be very useful when one of R’s built-in functions isn’t an option.
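The call itself is not shown. A sketch of what such an anonymous function could look like, on a hypothetical two-element list (unique_demo stands in for unique_vals, and the argument name elem is arbitrary):

```r
unique_demo <- list(
  zone    = c(1, 3, 2, 4),
  mainhue = c("green", "red", "blue")
)

# The function is defined inline, takes one element at a time,
# and returns that element's second item.
second <- lapply(unique_demo, function(elem) elem[2])
print(second)
```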
## $X
## [1] 2
##
## $name
## [1] "Albania"
##
## $landmass
## [1] 3
##
## $zone
## [1] 3
##
## $area
## [1] 29
##
## $population
## [1] 3
##
## $language
## [1] 6
##
## $religion
## [1] 6
##
## $bars
## [1] 2
##
## $stripes
## [1] 0
##
## $colours
## [1] 3
##
## $red
## [1] 0
##
## $green
## [1] 0
##
## $blue
## [1] 1
##
## $gold
## [1] 0
##
## $white
## [1] 0
##
## $black
## [1] 0
##
## $orange
## [1] 1
##
## $mainhue
## [1] "red"
##
## $circles
## [1] 1
##
## $crosses
## [1] 1
##
## $saltires
## [1] 1
##
## $quarters
## [1] 1
##
## $sunstars
## [1] 0
##
## $crescent
## [1] 1
##
## $triangle
## [1] 1
##
## $icon
## [1] 0
##
## $animate
## [1] 1
##
## $text
## [1] 1
##
## $topleft
## [1] "red"
##
## $botright
## [1] "red"
What if you had forgotten how unique() works and mistakenly thought it returns the number of unique values contained in the object passed to it? Then you might have incorrectly expected sapply(flags, unique) to return a numeric vector, since each element of the list returned would contain a single number and sapply() could then simplify the result to a vector.
When working interactively (at the prompt), this is not much of a problem, since you see the result immediately and will quickly recognize your mistake. However, when working non-interactively (e.g. writing your own functions), a misunderstanding may go undetected and cause incorrect results later on. Therefore, you may wish to be more careful and that’s where vapply() is useful.
Whereas sapply() tries to ‘guess’ the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn’t match the format you specify, vapply() will throw an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply().
Try vapply(flags, unique, numeric(1)), which says that you expect each element of the result to be a numeric vector of length 1. Since this is NOT actually the case, YOU WILL GET AN ERROR.
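A sketch of both outcomes on a hypothetical two-column stand-in for flags. The tryCatch() wrapper is an addition for illustration, so that the error message can be inspected without stopping the script:

```r
flags_demo <- data.frame(
  landmass = c(5L, 3L, 4L, 5L),
  zone     = c(1L, 1L, 1L, 3L)
)

# unique() returns more than one value per column, so the numeric(1)
# template is violated and vapply() signals an error instead of guessing.
res <- tryCatch(
  vapply(flags_demo, unique, numeric(1)),
  error = function(e) conditionMessage(e)
)
print(res)

# class() does return exactly one string per column, so this template
# is satisfied and the result is a named character vector.
cls <- vapply(flags_demo, class, character(1))
```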
Recall from the previous lesson that sapply(flags, class) will return a character vector containing the class of each column in the dataset. Try that again now to see the result.
## X name landmass zone area population
## "integer" "character" "integer" "integer" "integer" "integer"
## language religion bars stripes colours red
## "integer" "integer" "integer" "integer" "integer" "integer"
## green blue gold white black orange
## "integer" "integer" "integer" "integer" "integer" "integer"
## mainhue circles crosses saltires quarters sunstars
## "character" "integer" "integer" "integer" "integer" "integer"
## crescent triangle icon animate text topleft
## "integer" "integer" "integer" "integer" "integer" "character"
## botright
## "character"
If we wish to be explicit about the format of the result we expect, we can use vapply(flags, class, character(1)). The ‘character(1)’ argument tells R that we expect the class function to return a character vector of length 1 when applied to EACH column of the flags dataset.
## X name landmass zone area population
## "integer" "character" "integer" "integer" "integer" "integer"
## language religion bars stripes colours red
## "integer" "integer" "integer" "integer" "integer" "integer"
## green blue gold white black orange
## "integer" "integer" "integer" "integer" "integer" "integer"
## mainhue circles crosses saltires quarters sunstars
## "character" "integer" "integer" "integer" "integer" "integer"
## crescent triangle icon animate text topleft
## "integer" "integer" "integer" "integer" "integer" "character"
## botright
## "character"
As a data analyst, you’ll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. The next function we’ll look at, tapply(), does exactly that.
The ‘landmass’ variable in our dataset takes on integer values between 1 and 6, each of which represents a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group.
##
## 1 2 3 4 5 6
## 31 17 35 52 39 20
The ‘animate’ variable in our dataset takes the value 1 if a country’s flag contains an animate image (e.g. an eagle, a tree, a human hand) and 0 otherwise. Use table(flags$animate) to see how many flags contain an animate image.
##
## 0 1
## 155 39
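The call behind the proportions printed next is not shown. They are consistent with tapply(flags$animate, flags$landmass, mean): the mean of a 0/1 indicator within each landmass group is the share of flags in that group with an animate image. A sketch on hypothetical stand-in data (flags_demo and its values are made up for illustration):

```r
flags_demo <- data.frame(
  landmass = c(1L, 1L, 2L, 2L, 2L, 2L),
  animate  = c(1L, 0L, 0L, 0L, 1L, 0L)
)

# Group sizes, as with table(flags$landmass).
table(flags_demo$landmass)

# Mean of animate within each landmass group = proportion of 1s per group.
prop_animate <- tapply(flags_demo$animate, flags_demo$landmass, mean)
print(prop_animate)
```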
## 1 2 3 4 5 6
## 0.4193548 0.1764706 0.1142857 0.1346154 0.1538462 0.3000000
Similarly, we can look at a summary of population values (in round millions) for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).
## $`0`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 3.00 27.63 9.00 684.00
##
## $`1`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 4.0 22.1 15.0 1008.0