This week covers what I call “loop functions” in R: functions that let you execute loop-like behavior in a compact form. These functions typically have the word “apply” in their names and are particularly convenient when you need to run a loop on the command line while using R interactively. They are some of the more interesting functions in the R language.

Looping on the Command Line

Writing for and while loops is useful when programming but not particularly easy when working interactively on the command line. There are some functions which implement looping to make life easier:

lapply: Loop over a list and evaluate a function on each element
sapply: Same as lapply but try to simplify the result
apply: Apply a function over the margins of an array
tapply: Apply a function over subsets of a vector
mapply: Multivariate version of lapply

An auxiliary function split is also useful, particularly in conjunction with lapply.

lapply

lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its … argument. If X is not a list, it will be coerced to a list using as.list.

lapply always returns a list, regardless of the class of the input.

x<- list(a=1:5, b=rnorm(10))
lapply(x,mean)
## $a
## [1] 3
## 
## $b
## [1] -0.09775161
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

lapply(x,mean)
## $a
## [1] 2.5
## 
## $b
## [1] 0.3634831
## 
## $c
## [1] 0.6652225
## 
## $d
## [1] 4.976192
x<-1:4
lapply(x, runif, min=0, max=10)
## [[1]]
## [1] 5.297485
## 
## [[2]]
## [1] 0.2882099 7.7498925
## 
## [[3]]
## [1] 8.652340 2.975772 5.403049
## 
## [[4]]
## [1] 0.49798448 7.04317793 7.11307719 0.03610844
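
To make the connection with ordinary loops concrete, here is a minimal sketch of what lapply does, written as an explicit for loop. The helper name loop_apply is hypothetical (it is not part of base R), and the real lapply is implemented internally in C, so this only illustrates the behavior described above.

```r
# Hypothetical helper, not part of base R: mimics lapply() with an explicit loop.
loop_apply <- function(X, FUN, ...) {
  X <- as.list(X)                     # non-list input is coerced, as lapply does
  out <- vector("list", length(X))    # pre-allocate the result list
  for (i in seq_along(X)) {
    out[[i]] <- FUN(X[[i]], ...)      # extra arguments are forwarded through ...
  }
  names(out) <- names(X)              # keep element names, if there are any
  out
}

x <- list(a = 1:4, b = c(10, 20, 30))
res_loop   <- loop_apply(x, mean)
res_lapply <- lapply(x, mean)
identical(res_loop, res_lapply)       # TRUE
```

The one-line lapply call replaces the allocation, the loop and the naming step, which is why it is so convenient at the prompt.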

lapply and friends make heavy use of anonymous functions.

x<- list(a=matrix(1:4,2,2), b=matrix(1:6,3,2))
x
## $a
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## $b
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
#An anonymous function for extracting the first column of each matrix.
lapply(x, function(elt) elt[,1])
## $a
## [1] 1 2
## 
## $b
## [1] 1 2 3
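
As a small sketch of what “anonymous” means here, the following compares the inline function from the example above with an equivalent named function. The names first_col and col_spread are made up for illustration.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Anonymous function: defined inline and discarded after the call
anon <- lapply(x, function(elt) elt[, 1])

# Equivalent named function (first_col is a hypothetical name)
first_col <- function(elt) elt[, 1]
named <- lapply(x, first_col)

identical(anon, named)   # TRUE

# An anonymous function may contain several statements inside braces
col_spread <- lapply(x, function(elt) {
  sums <- colSums(elt)       # one sum per column
  max(sums) - min(sums)      # gap between the largest and smallest column sum
})
```

The anonymous form is handy when the function is only needed once; a named function is easier to reuse and to test.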

sapply

def

sapply will try to simplify the result of lapply if possible.
If the result is a list where every element is length 1, then a vector is returned.
If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
If it can’t figure things out, a list is returned.

ex

x<-list(a = 1:4, b=rnorm(10), c=rnorm(20,1), de=rnorm(100,5))
lapply(x,mean)
## $a
## [1] 2.5
## 
## $b
## [1] 0.7113761
## 
## $c
## [1] 1.087103
## 
## $de
## [1] 5.085903
sapply(x,mean)
##         a         b         c        de 
## 2.5000000 0.7113761 1.0871031 5.0859026
mean(x)
## Warning in mean.default(x): argument is not numeric or logical: returning NA
## [1] NA
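
The three simplification cases listed above can be sketched on one small list. The data here are invented for illustration only.

```r
x <- list(a = 1:4, b = 5:10)

# Every result has length 1: a named vector comes back
v <- sapply(x, mean)

# Every result has the same length (> 1): a matrix, one column per element
m <- sapply(x, range)

# Results of differing lengths: no simplification, so a list comes back
l <- sapply(x, function(elt) elt[elt > 2])

class(v)       # "numeric"
is.matrix(m)   # TRUE
is.list(l)     # TRUE
```

Because the return type depends on the data, it can be worth checking the class of the result before using it further.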

apply

def

apply is used to evaluate a function (often an anonymous one) over the margins of an array.
It is most often used to apply a function to the rows or columns of a matrix.
It can be used with general arrays, e.g. taking the average of an array of matrices.
It is not really faster than writing a loop, but it works in one line!

str(apply)
## function (X, MARGIN, FUN, ...)

X is an array
MARGIN is an integer vector indicating which margins should be “retained”.
FUN is a function to be applied
… is for other arguments to be passed to FUN

x<-matrix(1:6, 3,2)
x
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
apply(x,1,sum)
## [1] 5 7 9
apply(x,2,sum)
## [1]  6 15
apply(x,c(1,2),sum)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
x<-matrix(rnorm(200), 20,10)
apply(x,1,quantile, probs = c(0.25, 0.75))
##           [,1]       [,2]       [,3]       [,4]       [,5]       [,6]      [,7]
## 25% -0.6757766 -0.1301717 -0.4645866 -0.2652885 -0.2390933 -0.6922377 0.2949940
## 75%  0.1625239  0.9880443  0.4148126  0.7966673  0.6607092  0.8237715 0.9689675
##           [,8]       [,9]      [,10]      [,11]      [,12]     [,13]      [,14]
## 25% -0.9211555 -1.4433841 -0.7638743 0.03183095 -0.1327484 0.1416290 -0.5415094
## 75%  0.3238000  0.3993523  0.1504409 0.93155607  1.2156302 0.7694749  0.7011250
##         [,15]      [,16]      [,17]      [,18]       [,19]      [,20]
## 25% 0.0257989 -1.1401260 -1.1654758 -0.5814301 -0.04017409 -0.2467302
## 75% 0.5632047  0.3206481  0.6079709  0.9306677  0.25727992  1.5483479
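
The notes above mention taking the average of an array of matrices. A minimal sketch of that case, using a made-up 2 x 3 x 4 array, might look like this; it also compares apply with the dedicated base R functions rowSums and colMeans.

```r
a <- array(1:24, dim = c(2, 3, 4))   # four 2 x 3 matrices stacked together

# Retain margins 1 and 2 and collapse the third: the element-wise average matrix
avg <- apply(a, c(1, 2), mean)
dim(avg)                             # 2 3

# For plain row or column sums and means, the dedicated functions agree with apply
m <- matrix(1:6, 3, 2)
all.equal(apply(m, 1, sum), rowSums(m))     # TRUE
all.equal(apply(m, 2, mean), colMeans(m))   # TRUE
```

MARGIN names the dimensions that survive in the result, so c(1, 2) keeps rows and columns and averages across the four stacked matrices.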

mapply

mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.

str(mapply)
## function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

FUN is a function to apply
… contains arguments to apply over
MoreArgs is a list of other arguments to FUN.
SIMPLIFY indicates whether the result should be simplified

ex.

# list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)) can be written as
mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
## 
## [[2]]
## [1] 2 2 2
## 
## [[3]]
## [1] 3 3
## 
## [[4]]
## [1] 4
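
The MoreArgs and SIMPLIFY arguments are not exercised by the example above, so here is a small sketch of both. The function scale_shift is hypothetical and exists only for this illustration.

```r
# Hypothetical function used only for illustration
scale_shift <- function(x, shift, factor) (x + shift) * factor

# x and shift vary in parallel; factor is held fixed through MoreArgs
res <- mapply(scale_shift, 1:3, 4:6, MoreArgs = list(factor = 10))
res        # 50 70 90

# SIMPLIFY = FALSE keeps the result as a list, as lapply would
res_list <- mapply(scale_shift, 1:3, 4:6,
                   MoreArgs = list(factor = 10), SIMPLIFY = FALSE)
is.list(res_list)   # TRUE
```

Arguments passed through … are walked element by element, while everything in MoreArgs is handed unchanged to every call.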

tapply

tapply is used to apply a function over subsets of a vector.

str(tapply)
## function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

X is a vector
INDEX is a factor or a list of factors (or else they are coerced to factors)
FUN is a function to be applied
… contains other arguments to be passed to FUN
simplify indicates whether the result should be simplified

See Exercises.
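
As a quick sketch of the arguments described above, the following applies a function to subsets of a vector defined by a grouping variable. The vectors score and team are invented for illustration.

```r
score <- c(10, 20, 30, 40, 50, 60)
team  <- c("a", "b", "a", "b", "a", "b")   # coerced to a factor by tapply

# One mean per group, simplified to a named array
means <- tapply(score, team, mean)
means[["a"]]   # 30
means[["b"]]   # 40

# When FUN returns more than one value per group, the result is list-like
rng <- tapply(score, team, range)
rng[["b"]]     # 20 60
```

X and INDEX must have the same length, since each element of INDEX says which group the matching element of X belongs to.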

Split

split takes a vector or other object and splits it into groups determined by a factor or list of factors.

str(split)
## function (x, f, drop = FALSE, ...)

x is a vector (or list) or data frame
f is a factor (or coerced to one) or a list of factors
drop indicates whether empty factor levels should be dropped

Example

#create data frame
N      <- 12
sex    <- sample(c("f", "m"), N, replace=TRUE)
group  <- sample(rep(c("CG", "WL", "T"), 4), N, replace=FALSE)
age    <- sample(18:35, N, replace=TRUE)
IQ     <- round(rnorm(N, mean=100, sd=15))
rating <- round(runif(N, min=0, max=6))
(myDf  <- data.frame(id=1:N, sex, group, age, IQ, rating))
##    id sex group age  IQ rating
## 1   1   f     T  31  88      5
## 2   2   m    CG  35  81      4
## 3   3   f    WL  32  95      5
## 4   4   m     T  29 103      5
## 5   5   f    CG  32  98      1
## 6   6   f    WL  20  96      4
## 7   7   f    CG  24  89      5
## 8   8   m     T  18 124      2
## 9   9   m    WL  19  90      6
## 10 10   f    CG  24 100      0
## 11 11   f    WL  34  83      3
## 12 12   m     T  18  94      2

Group by the ‘group’ variable

gDf<- split(myDf, myDf$group)

gDf
## $CG
##    id sex group age  IQ rating
## 2   2   m    CG  35  81      4
## 5   5   f    CG  32  98      1
## 7   7   f    CG  24  89      5
## 10 10   f    CG  24 100      0
## 
## $T
##    id sex group age  IQ rating
## 1   1   f     T  31  88      5
## 4   4   m     T  29 103      5
## 8   8   m     T  18 124      2
## 12 12   m     T  18  94      2
## 
## $WL
##    id sex group age IQ rating
## 3   3   f    WL  32 95      5
## 6   6   f    WL  20 96      4
## 9   9   m    WL  19 90      6
## 11 11   f    WL  34 83      3
gsDF<-split(myDf, list(myDf$sex, myDf$group))
gsDF
## $f.CG
##    id sex group age  IQ rating
## 5   5   f    CG  32  98      1
## 7   7   f    CG  24  89      5
## 10 10   f    CG  24 100      0
## 
## $m.CG
##   id sex group age IQ rating
## 2  2   m    CG  35 81      4
## 
## $f.T
##   id sex group age IQ rating
## 1  1   f     T  31 88      5
## 
## $m.T
##    id sex group age  IQ rating
## 4   4   m     T  29 103      5
## 8   8   m     T  18 124      2
## 12 12   m     T  18  94      2
## 
## $f.WL
##    id sex group age IQ rating
## 3   3   f    WL  32 95      5
## 6   6   f    WL  20 96      4
## 11 11   f    WL  34 83      3
## 
## $m.WL
##   id sex group age IQ rating
## 9  9   m    WL  19 90      6
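
Since split is most useful in conjunction with lapply and sapply, here is a minimal sketch of that combination: split a data frame by group, then summarise each piece. The small data frame below is made up and is unrelated to the random myDf above.

```r
df <- data.frame(
  group = c("CG", "T", "CG", "WL", "T", "WL"),
  IQ    = c(100, 110, 90, 105, 95, 115),
  stringsAsFactors = FALSE
)

# Split the data frame into one piece per group, then take the mean IQ of each
parts   <- split(df, df$group)
mean_iq <- sapply(parts, function(d) mean(d$IQ))
mean_iq   # CG 95.0, T 102.5, WL 110.0

# Splitting just the vector gives the same numbers
mean_iq2 <- sapply(split(df$IQ, df$group), mean)
identical(mean_iq, mean_iq2)   # TRUE
```

For a single vector and a single function this does the same job as tapply; splitting the whole data frame pays off when each piece needs several of its columns.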

Exercises

lapply and sapply

flags<-read.csv("flags.csv")
head(flags)
##   X           name landmass zone area population language religion bars stripes
## 1 1    Afghanistan        5    1  648         16       10        2    0       3
## 2 2        Albania        3    1   29          3        6        6    0       0
## 3 3        Algeria        4    1 2388         20        8        2    2       0
## 4 4 American-Samoa        6    3    0          0        1        1    0       0
## 5 5        Andorra        3    1    0          0        6        0    3       0
## 6 6         Angola        4    2 1247          7       10        5    0       2
##   colours red green blue gold white black orange mainhue circles crosses
## 1       5   1     1    0    1     1     1      0   green       0       0
## 2       3   1     0    0    1     0     1      0     red       0       0
## 3       3   1     1    0    0     1     0      0   green       0       0
## 4       5   1     0    1    1     1     0      1    blue       0       0
## 5       3   1     0    1    1     0     0      0    gold       0       0
## 6       3   1     0    0    1     0     1      0     red       0       0
##   saltires quarters sunstars crescent triangle icon animate text topleft
## 1        0        0        1        0        0    1       0    0   black
## 2        0        0        1        0        0    0       1    0     red
## 3        0        0        1        1        0    0       0    0   green
## 4        0        0        0        0        1    1       1    0    blue
## 5        0        0        0        0        0    0       0    0    blue
## 6        0        0        1        0        0    1       0    0     red
##   botright
## 1    green
## 2      red
## 3    white
## 4      red
## 5      red
## 6    black
dim(flags)
## [1] 194  31
#To open a more complete description of the dataset in a separate text file, type viewinfo()
class(flags)
## [1] "data.frame"

The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the same length as the original one.
Since a data frame is really just a list of vectors (you can see this with as.list(flags)), we can use lapply() to apply the class() function to each column of the flags dataset. Let’s see it in action!

cls_list <-lapply(flags, class)
cls_list
## $X
## [1] "integer"
## 
## $name
## [1] "character"
## 
## $landmass
## [1] "integer"
## 
## $zone
## [1] "integer"
## 
## $area
## [1] "integer"
## 
## $population
## [1] "integer"
## 
## $language
## [1] "integer"
## 
## $religion
## [1] "integer"
## 
## $bars
## [1] "integer"
## 
## $stripes
## [1] "integer"
## 
## $colours
## [1] "integer"
## 
## $red
## [1] "integer"
## 
## $green
## [1] "integer"
## 
## $blue
## [1] "integer"
## 
## $gold
## [1] "integer"
## 
## $white
## [1] "integer"
## 
## $black
## [1] "integer"
## 
## $orange
## [1] "integer"
## 
## $mainhue
## [1] "character"
## 
## $circles
## [1] "integer"
## 
## $crosses
## [1] "integer"
## 
## $saltires
## [1] "integer"
## 
## $quarters
## [1] "integer"
## 
## $sunstars
## [1] "integer"
## 
## $crescent
## [1] "integer"
## 
## $triangle
## [1] "integer"
## 
## $icon
## [1] "integer"
## 
## $animate
## [1] "integer"
## 
## $text
## [1] "integer"
## 
## $topleft
## [1] "character"
## 
## $botright
## [1] "character"

The ‘l’ in ‘lapply’ stands for ‘list’. Type class(cls_list) to confirm that lapply() returned a list.

class(cls_list)
## [1] "list"

As expected, we got a list of length 31, one element for each variable/column. The output would be considerably more compact if we could represent it as a vector instead of a list.

You may remember from a previous lesson that lists are most helpful for storing multiple classes of data. In this case, since every element of the list returned by lapply() is a character vector of length one (i.e. “integer” or “character”), cls_list can be simplified to a character vector. To do this manually, type as.character(cls_list).

as.character(cls_list)
##  [1] "integer"   "character" "integer"   "integer"   "integer"   "integer"  
##  [7] "integer"   "integer"   "integer"   "integer"   "integer"   "integer"  
## [13] "integer"   "integer"   "integer"   "integer"   "integer"   "integer"  
## [19] "character" "integer"   "integer"   "integer"   "integer"   "integer"  
## [25] "integer"   "integer"   "integer"   "integer"   "integer"   "character"
## [31] "character"

sapply() allows you to automate this process by calling lapply() behind the scenes, but then attempting to simplify (hence the ‘s’ in ‘sapply’) the result for you. Use sapply() the same way you used lapply() to get the class of each column of the flags dataset and store the result in cls_vect.

cls_vect<- sapply(flags,class)
class(cls_vect)
## [1] "character"

In general, if the result is a list where every element is of length one, then sapply() returns a vector.
If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix.
If sapply() can’t figure things out, then it just returns a list, no different from what lapply() would give you.

With the extra index column X in this copy of the data, columns 12 through 18 of our dataset (red through orange) are indicator variables, each representing a different color. The value of the indicator variable is 1 if the color is present in a country’s flag and 0 otherwise.
Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on their flag, we can just add up all of the 1s and 0s in the ‘orange’ column. Try sum(flags$orange) to see this.

sum(flags$orange)
## [1] 26

Now we want to repeat this operation for each of the colors recorded in the dataset.

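# Note the comma before 11:17: it tells R that we want all rows, but only columns 11 through 17.
# In this data frame, which has the extra index column X, that selects colours (a count) plus the indicators red through black.
flag_colors <- flags[, 11:17]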
head(flag_colors)
##   colours red green blue gold white black
## 1       5   1     1    0    1     1     1
## 2       3   1     0    0    1     0     1
## 3       3   1     1    0    0     1     0
## 4       5   1     0    1    1     1     0
## 5       3   1     0    1    1     0     0
## 6       3   1     0    0    1     0     1

To get a list containing the sum of each column of flag_colors, call the lapply() function with two arguments. The first argument is the object over which we are looping (i.e. flag_colors) and the second argument is the name of the function we wish to apply to each column (i.e. sum). Remember that the second argument is just the name of the function with no parentheses, etc.

lapply(flag_colors,sum)
## $colours
## [1] 672
## 
## $red
## [1] 153
## 
## $green
## [1] 91
## 
## $blue
## [1] 99
## 
## $gold
## [1] 91
## 
## $white
## [1] 146
## 
## $black
## [1] 52

The result is a list, since lapply() always returns a list. Each element of this list is of length one, so the result can be simplified to a vector by calling sapply() instead of lapply(). Try it now.

sapply(flag_colors,sum)
## colours     red   green    blue    gold   white   black 
##     672     153      91      99      91     146      52

Perhaps it’s more informative to find the proportion of flags (out of 194) containing each color. Since each indicator column is just a bunch of 1s and 0s, the arithmetic mean of each column will give us the proportion of 1s. (If it’s not clear why, think of a simpler situation where you have three 1s and two 0s: (1 + 1 + 1 + 0 + 0)/5 = 3/5 = 0.6). The colours column is a count rather than an indicator, so its mean is the average number of colours per flag, not a proportion.

sapply(flag_colors,mean)
##   colours       red     green      blue      gold     white     black 
## 3.4639175 0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412

sapply() instead returns a matrix when each element of the list returned by lapply() is a vector of the same length (> 1).

flag_shapes <- flags[, 19:23]
head(flag_shapes)
##   mainhue circles crosses saltires quarters
## 1   green       0       0        0        0
## 2     red       0       0        0        0
## 3   green       0       0        0        0
## 4    blue       0       0        0        0
## 5    gold       0       0        0        0
## 6     red       0       0        0        0

Apart from mainhue, which is a character column, each of these columns (i.e. variables) represents the number of times a particular shape or design appears on a country’s flag. We are interested in the minimum and maximum number of times each shape or design appears.

The range() function returns the minimum and maximum of its first argument, which is normally a numeric vector; for a character vector such as mainhue it returns the alphabetically first and last values. Use lapply() to apply the range function to each column of flag_shapes.

lapply(flag_shapes, range)
## $mainhue
## [1] "black" "white"
## 
## $circles
## [1] 0 4
## 
## $crosses
## [1] 0 2
## 
## $saltires
## [1] 0 1
## 
## $quarters
## [1] 0 4

Do the same operation, but using sapply() and store the result in a variable called shape_mat.

shape_mat <- sapply(flag_shapes, range)
shape_mat
##      mainhue circles crosses saltires quarters
## [1,] "black" "0"     "0"     "0"      "0"     
## [2,] "white" "4"     "2"     "1"      "4"
class(shape_mat)
## [1] "matrix" "array"
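
The numbers in shape_mat print with quotes because a matrix can hold only one type, so a single character column turns the whole result into character. The sketch below reproduces this on a small made-up data frame (shapes and its values are invented, not taken from flags) and shows one possible way to keep only the numeric columns first.

```r
# Invented stand-in for flag_shapes: one character column, two numeric ones
shapes <- data.frame(
  mainhue = c("green", "red", "blue"),
  circles = c(0, 1, 4),
  crosses = c(0, 2, 1),
  stringsAsFactors = FALSE
)

mixed <- sapply(shapes, range)
is.character(mixed)        # TRUE: the numbers were coerced to strings

# Keep only the numeric columns, then simplify again
num_cols     <- sapply(shapes, is.numeric)
numeric_only <- sapply(shapes[, num_cols], range)
is.numeric(numeric_only)   # TRUE
```

Selecting columns with a logical vector built by sapply(shapes, is.numeric) avoids hard-coding column positions, which shift whenever a column is added to the file.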

As we’ve seen, sapply() always attempts to simplify the result given by lapply(). It has been successful in doing so for each of the examples we’ve looked at so far. Let’s look at an example where sapply() can’t figure out how to simplify the result and thus returns a list, no different from lapply().

When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the ‘unique’ elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)).

unique(c(3,4,5,5,5,6,6))
## [1] 3 4 5 6

Ex - Find the unique values for each variable in the dataset

unique_vals <- lapply(flags, unique)
unique_vals
## $X
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194
## 
## $name
##   [1] "Afghanistan"              "Albania"                 
##   [3] "Algeria"                  "American-Samoa"          
##   [5] "Andorra"                  "Angola"                  
##   [7] "Anguilla"                 "Antigua-Barbuda"         
##   [9] "Argentina"                "Argentine"               
##  [11] "Australia"                "Austria"                 
##  [13] "Bahamas"                  "Bahrain"                 
##  [15] "Bangladesh"               "Barbados"                
##  [17] "Belgium"                  "Belize"                  
##  [19] "Benin"                    "Bermuda"                 
##  [21] "Bhutan"                   "Bolivia"                 
##  [23] "Botswana"                 "Brazil"                  
##  [25] "British-Virgin-Isles"     "Brunei"                  
##  [27] "Bulgaria"                 "Burkina"                 
##  [29] "Burma"                    "Burundi"                 
##  [31] "Cameroon"                 "Canada"                  
##  [33] "Cape-Verde-Islands"       "Cayman-Islands"          
##  [35] "Central-African-Republic" "Chad"                    
##  [37] "Chile"                    "China"                   
##  [39] "Colombia"                 "Comorro-Islands"         
##  [41] "Congo"                    "Cook-Islands"            
##  [43] "Costa-Rica"               "Cuba"                    
##  [45] "Cyprus"                   "Czechoslovakia"          
##  [47] "Denmark"                  "Djibouti"                
##  [49] "Dominica"                 "Dominican-Republic"      
##  [51] "Ecuador"                  "Egypt"                   
##  [53] "El-Salvador"              "Equatorial-Guinea"       
##  [55] "Ethiopia"                 "Faeroes"                 
##  [57] "Falklands-Malvinas"       "Fiji"                    
##  [59] "Finland"                  "France"                  
##  [61] "French-Guiana"            "French-Polynesia"        
##  [63] "Gabon"                    "Gambia"                  
##  [65] "Germany-DDR"              "Germany-FRG"             
##  [67] "Ghana"                    "Gibraltar"               
##  [69] "Greece"                   "Greenland"               
##  [71] "Grenada"                  "Guam"                    
##  [73] "Guatemala"                "Guinea"                  
##  [75] "Guinea-Bissau"            "Guyana"                  
##  [77] "Haiti"                    "Honduras"                
##  [79] "Hong-Kong"                "Hungary"                 
##  [81] "Iceland"                  "India"                   
##  [83] "Indonesia"                "Iran"                    
##  [85] "Iraq"                     "Ireland"                 
##  [87] "Israel"                   "Italy"                   
##  [89] "Ivory-Coast"              "Jamaica"                 
##  [91] "Japan"                    "Jordan"                  
##  [93] "Kampuchea"                "Kenya"                   
##  [95] "Kiribati"                 "Kuwait"                  
##  [97] "Laos"                     "Lebanon"                 
##  [99] "Lesotho"                  "Liberia"                 
## [101] "Libya"                    "Liechtenstein"           
## [103] "Luxembourg"               "Malagasy"                
## [105] "Malawi"                   "Malaysia"                
## [107] "Maldive-Islands"          "Mali"                    
## [109] "Malta"                    "Marianas"                
## [111] "Mauritania"               "Mauritius"               
## [113] "Mexico"                   "Micronesia"              
## [115] "Monaco"                   "Mongolia"                
## [117] "Montserrat"               "Morocco"                 
## [119] "Mozambique"               "Nauru"                   
## [121] "Nepal"                    "Netherlands"             
## [123] "Netherlands-Antilles"     "New-Zealand"             
## [125] "Nicaragua"                "Niger"                   
## [127] "Nigeria"                  "Niue"                    
## [129] "North-Korea"              "North-Yemen"             
## [131] "Norway"                   "Oman"                    
## [133] "Pakistan"                 "Panama"                  
## [135] "Papua-New-Guinea"         "Parguay"                 
## [137] "Peru"                     "Philippines"             
## [139] "Poland"                   "Portugal"                
## [141] "Puerto-Rico"              "Qatar"                   
## [143] "Romania"                  "Rwanda"                  
## [145] "San-Marino"               "Sao-Tome"                
## [147] "Saudi-Arabia"             "Senegal"                 
## [149] "Seychelles"               "Sierra-Leone"            
## [151] "Singapore"                "Soloman-Islands"         
## [153] "Somalia"                  "South-Africa"            
## [155] "South-Korea"              "South-Yemen"             
## [157] "Spain"                    "Sri-Lanka"               
## [159] "St-Helena"                "St-Kitts-Nevis"          
## [161] "St-Lucia"                 "St-Vincent"              
## [163] "Sudan"                    "Surinam"                 
## [165] "Swaziland"                "Sweden"                  
## [167] "Switzerland"              "Syria"                   
## [169] "Taiwan"                   "Tanzania"                
## [171] "Thailand"                 "Togo"                    
## [173] "Tonga"                    "Trinidad-Tobago"         
## [175] "Tunisia"                  "Turkey"                  
## [177] "Turks-Cocos-Islands"      "Tuvalu"                  
## [179] "UAE"                      "Uganda"                  
## [181] "UK"                       "Uruguay"                 
## [183] "US-Virgin-Isles"          "USA"                     
## [185] "USSR"                     "Vanuatu"                 
## [187] "Vatican-City"             "Venezuela"               
## [189] "Vietnam"                  "Western-Samoa"           
## [191] "Yugoslavia"               "Zaire"                   
## [193] "Zambia"                   "Zimbabwe"                
## 
## $landmass
## [1] 5 3 4 6 1 2
## 
## $zone
## [1] 1 3 2 4
## 
## $area
##   [1]   648    29  2388     0  1247  2777  7690    84    19     1   143    31
##  [13]    23   113    47  1099   600  8512     6   111   274   678    28   474
##  [25]  9976     4   623  1284   757  9561  1139     2   342    51   115     9
##  [37]   128    43    22    49   284  1001    21  1222    12    18   337   547
##  [49]    91   268    10   108   249   239   132  2176   109   246    36   215
##  [61]   112    93   103  3268  1904  1648   435    70   301   323    11   372
##  [73]    98   181   583   236    30  1760     3   587   118   333  1240  1031
##  [85]  1973  1566   447   783   140    41  1267   925   121   195   324   212
##  [97]   804    76   463   407  1285   300   313    92   237    26  2150   196
## [109]    72   637  1221    99   288   505    66  2506    63    17   450   185
## [121]   945   514    57     5   164   781   245   178  9363 22402    15   912
## [133]   256   905   753   391
## 
## $population
##  [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9   35
## [16]    4   24    2   11 1008    5   47   31   54   17   61   14  684  157   39
## [31]   57  118   13   77   12   56   18   84   48   36   22   29   38   49   45
## [46]  231  274   60
## 
## $language
##  [1] 10  6  8  1  2  4  3  5  7  9
## 
## $religion
## [1] 2 6 1 0 5 3 4 7
## 
## $bars
## [1] 0 2 3 1 5
## 
## $stripes
##  [1]  3  0  2  1  5  9 11 14  4  6 13  7
## 
## $colours
## [1] 5 3 2 8 6 4 7 1
## 
## $red
## [1] 1 0
## 
## $green
## [1] 1 0
## 
## $blue
## [1] 0 1
## 
## $gold
## [1] 1 0
## 
## $white
## [1] 1 0
## 
## $black
## [1] 1 0
## 
## $orange
## [1] 0 1
## 
## $mainhue
## [1] "green"  "red"    "blue"   "gold"   "white"  "orange" "black"  "brown" 
## 
## $circles
## [1] 0 1 4 2
## 
## $crosses
## [1] 0 1 2
## 
## $saltires
## [1] 0 1
## 
## $quarters
## [1] 0 1 4
## 
## $sunstars
##  [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50
## 
## $crescent
## [1] 0 1
## 
## $triangle
## [1] 0 1
## 
## $icon
## [1] 1 0
## 
## $animate
## [1] 0 1
## 
## $text
## [1] 0 1
## 
## $topleft
## [1] "black"  "red"    "green"  "blue"   "white"  "orange" "gold"  
## 
## $botright
## [1] "green"  "red"    "white"  "black"  "blue"   "gold"   "orange" "brown"

Since unique_vals is a list, you can use what you’ve learned to determine the length of each element of unique_vals (i.e. the number of unique values for each variable). Simplify the result, if possible. Hint: Apply the length() function to each element of unique_vals.

sapply(unique_vals, length)
##          X       name   landmass       zone       area population   language 
##        194        194          6          4        136         48         10 
##   religion       bars    stripes    colours        red      green       blue 
##          8          5         12          8          2          2          2 
##       gold      white      black     orange    mainhue    circles    crosses 
##          2          2          2          2          8          4          3 
##   saltires   quarters   sunstars   crescent   triangle       icon    animate 
##          2          3         14          2          2          2          2 
##       text    topleft   botright 
##          2          7          8
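
The two steps used here (unique, then length) can also be folded into a single call with an anonymous function. This is only a sketch on a small invented data frame, not on flags itself.

```r
df <- data.frame(
  zone = c(1, 3, 1, 2, 3),
  red  = c(1, 0, 1, 1, 0),
  hue  = c("green", "red", "green", "blue", "red"),
  stringsAsFactors = FALSE
)

# Two steps, as in the text: a list of unique values, then their lengths
two_step <- sapply(lapply(df, unique), length)

# One step: count the unique values inside an anonymous function
one_step <- sapply(df, function(col) length(unique(col)))

identical(two_step, one_step)   # TRUE
one_step                        # zone 3, red 2, hue 3
```

The one-step form avoids storing the intermediate list, which matters little here but can help with wide datasets.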

Use sapply() to apply the unique() function to each column of the flags dataset to see that you get the same unsimplified list that you got from lapply().

sapply(flags, unique)
## $X
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194
## 
## $name
##   [1] "Afghanistan"              "Albania"                 
##   [3] "Algeria"                  "American-Samoa"          
##   [5] "Andorra"                  "Angola"                  
##   [7] "Anguilla"                 "Antigua-Barbuda"         
##   [9] "Argentina"                "Argentine"               
##  [11] "Australia"                "Austria"                 
##  [13] "Bahamas"                  "Bahrain"                 
##  [15] "Bangladesh"               "Barbados"                
##  [17] "Belgium"                  "Belize"                  
##  [19] "Benin"                    "Bermuda"                 
##  [21] "Bhutan"                   "Bolivia"                 
##  [23] "Botswana"                 "Brazil"                  
##  [25] "British-Virgin-Isles"     "Brunei"                  
##  [27] "Bulgaria"                 "Burkina"                 
##  [29] "Burma"                    "Burundi"                 
##  [31] "Cameroon"                 "Canada"                  
##  [33] "Cape-Verde-Islands"       "Cayman-Islands"          
##  [35] "Central-African-Republic" "Chad"                    
##  [37] "Chile"                    "China"                   
##  [39] "Colombia"                 "Comorro-Islands"         
##  [41] "Congo"                    "Cook-Islands"            
##  [43] "Costa-Rica"               "Cuba"                    
##  [45] "Cyprus"                   "Czechoslovakia"          
##  [47] "Denmark"                  "Djibouti"                
##  [49] "Dominica"                 "Dominican-Republic"      
##  [51] "Ecuador"                  "Egypt"                   
##  [53] "El-Salvador"              "Equatorial-Guinea"       
##  [55] "Ethiopia"                 "Faeroes"                 
##  [57] "Falklands-Malvinas"       "Fiji"                    
##  [59] "Finland"                  "France"                  
##  [61] "French-Guiana"            "French-Polynesia"        
##  [63] "Gabon"                    "Gambia"                  
##  [65] "Germany-DDR"              "Germany-FRG"             
##  [67] "Ghana"                    "Gibraltar"               
##  [69] "Greece"                   "Greenland"               
##  [71] "Grenada"                  "Guam"                    
##  [73] "Guatemala"                "Guinea"                  
##  [75] "Guinea-Bissau"            "Guyana"                  
##  [77] "Haiti"                    "Honduras"                
##  [79] "Hong-Kong"                "Hungary"                 
##  [81] "Iceland"                  "India"                   
##  [83] "Indonesia"                "Iran"                    
##  [85] "Iraq"                     "Ireland"                 
##  [87] "Israel"                   "Italy"                   
##  [89] "Ivory-Coast"              "Jamaica"                 
##  [91] "Japan"                    "Jordan"                  
##  [93] "Kampuchea"                "Kenya"                   
##  [95] "Kiribati"                 "Kuwait"                  
##  [97] "Laos"                     "Lebanon"                 
##  [99] "Lesotho"                  "Liberia"                 
## [101] "Libya"                    "Liechtenstein"           
## [103] "Luxembourg"               "Malagasy"                
## [105] "Malawi"                   "Malaysia"                
## [107] "Maldive-Islands"          "Mali"                    
## [109] "Malta"                    "Marianas"                
## [111] "Mauritania"               "Mauritius"               
## [113] "Mexico"                   "Micronesia"              
## [115] "Monaco"                   "Mongolia"                
## [117] "Montserrat"               "Morocco"                 
## [119] "Mozambique"               "Nauru"                   
## [121] "Nepal"                    "Netherlands"             
## [123] "Netherlands-Antilles"     "New-Zealand"             
## [125] "Nicaragua"                "Niger"                   
## [127] "Nigeria"                  "Niue"                    
## [129] "North-Korea"              "North-Yemen"             
## [131] "Norway"                   "Oman"                    
## [133] "Pakistan"                 "Panama"                  
## [135] "Papua-New-Guinea"         "Parguay"                 
## [137] "Peru"                     "Philippines"             
## [139] "Poland"                   "Portugal"                
## [141] "Puerto-Rico"              "Qatar"                   
## [143] "Romania"                  "Rwanda"                  
## [145] "San-Marino"               "Sao-Tome"                
## [147] "Saudi-Arabia"             "Senegal"                 
## [149] "Seychelles"               "Sierra-Leone"            
## [151] "Singapore"                "Soloman-Islands"         
## [153] "Somalia"                  "South-Africa"            
## [155] "South-Korea"              "South-Yemen"             
## [157] "Spain"                    "Sri-Lanka"               
## [159] "St-Helena"                "St-Kitts-Nevis"          
## [161] "St-Lucia"                 "St-Vincent"              
## [163] "Sudan"                    "Surinam"                 
## [165] "Swaziland"                "Sweden"                  
## [167] "Switzerland"              "Syria"                   
## [169] "Taiwan"                   "Tanzania"                
## [171] "Thailand"                 "Togo"                    
## [173] "Tonga"                    "Trinidad-Tobago"         
## [175] "Tunisia"                  "Turkey"                  
## [177] "Turks-Cocos-Islands"      "Tuvalu"                  
## [179] "UAE"                      "Uganda"                  
## [181] "UK"                       "Uruguay"                 
## [183] "US-Virgin-Isles"          "USA"                     
## [185] "USSR"                     "Vanuatu"                 
## [187] "Vatican-City"             "Venezuela"               
## [189] "Vietnam"                  "Western-Samoa"           
## [191] "Yugoslavia"               "Zaire"                   
## [193] "Zambia"                   "Zimbabwe"                
## 
## $landmass
## [1] 5 3 4 6 1 2
## 
## $zone
## [1] 1 3 2 4
## 
## $area
##   [1]   648    29  2388     0  1247  2777  7690    84    19     1   143    31
##  [13]    23   113    47  1099   600  8512     6   111   274   678    28   474
##  [25]  9976     4   623  1284   757  9561  1139     2   342    51   115     9
##  [37]   128    43    22    49   284  1001    21  1222    12    18   337   547
##  [49]    91   268    10   108   249   239   132  2176   109   246    36   215
##  [61]   112    93   103  3268  1904  1648   435    70   301   323    11   372
##  [73]    98   181   583   236    30  1760     3   587   118   333  1240  1031
##  [85]  1973  1566   447   783   140    41  1267   925   121   195   324   212
##  [97]   804    76   463   407  1285   300   313    92   237    26  2150   196
## [109]    72   637  1221    99   288   505    66  2506    63    17   450   185
## [121]   945   514    57     5   164   781   245   178  9363 22402    15   912
## [133]   256   905   753   391
## 
## $population
##  [1]   16    3   20    0    7   28   15    8   90   10    1    6  119    9   35
## [16]    4   24    2   11 1008    5   47   31   54   17   61   14  684  157   39
## [31]   57  118   13   77   12   56   18   84   48   36   22   29   38   49   45
## [46]  231  274   60
## 
## $language
##  [1] 10  6  8  1  2  4  3  5  7  9
## 
## $religion
## [1] 2 6 1 0 5 3 4 7
## 
## $bars
## [1] 0 2 3 1 5
## 
## $stripes
##  [1]  3  0  2  1  5  9 11 14  4  6 13  7
## 
## $colours
## [1] 5 3 2 8 6 4 7 1
## 
## $red
## [1] 1 0
## 
## $green
## [1] 1 0
## 
## $blue
## [1] 0 1
## 
## $gold
## [1] 1 0
## 
## $white
## [1] 1 0
## 
## $black
## [1] 1 0
## 
## $orange
## [1] 0 1
## 
## $mainhue
## [1] "green"  "red"    "blue"   "gold"   "white"  "orange" "black"  "brown" 
## 
## $circles
## [1] 0 1 4 2
## 
## $crosses
## [1] 0 1 2
## 
## $saltires
## [1] 0 1
## 
## $quarters
## [1] 0 1 4
## 
## $sunstars
##  [1]  1  0  6 22 14  3  4  5 15 10  7  2  9 50
## 
## $crescent
## [1] 0 1
## 
## $triangle
## [1] 0 1
## 
## $icon
## [1] 1 0
## 
## $animate
## [1] 0 1
## 
## $text
## [1] 0 1
## 
## $topleft
## [1] "black"  "red"    "green"  "blue"   "white"  "orange" "gold"  
## 
## $botright
## [1] "green"  "red"    "white"  "black"  "blue"   "gold"   "orange" "brown"

Occasionally, you may need to apply a function that is not yet defined, thus requiring you to write your own.

Pretend you are interested in only the second item from each element of the unique_vals list that you just created. Since each element of the unique_vals list is a vector and we’re not aware of any built-in function in R that returns the second element of a vector, we will construct our own function.

Our function has no name and disappears as soon as lapply() is done using it. So-called ‘anonymous functions’ can be very useful when one of R’s built-in functions isn’t an option.

lapply(unique_vals, function(elem) elem[2])
## $X
## [1] 2
## 
## $name
## [1] "Albania"
## 
## $landmass
## [1] 3
## 
## $zone
## [1] 3
## 
## $area
## [1] 29
## 
## $population
## [1] 3
## 
## $language
## [1] 6
## 
## $religion
## [1] 6
## 
## $bars
## [1] 2
## 
## $stripes
## [1] 0
## 
## $colours
## [1] 3
## 
## $red
## [1] 0
## 
## $green
## [1] 0
## 
## $blue
## [1] 1
## 
## $gold
## [1] 0
## 
## $white
## [1] 0
## 
## $black
## [1] 0
## 
## $orange
## [1] 1
## 
## $mainhue
## [1] "red"
## 
## $circles
## [1] 1
## 
## $crosses
## [1] 1
## 
## $saltires
## [1] 1
## 
## $quarters
## [1] 1
## 
## $sunstars
## [1] 0
## 
## $crescent
## [1] 1
## 
## $triangle
## [1] 1
## 
## $icon
## [1] 0
## 
## $animate
## [1] 1
## 
## $text
## [1] 1
## 
## $topleft
## [1] "red"
## 
## $botright
## [1] "red"
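
The same idea can be tried on a small made-up list. This is only a sketch: the names toy and second_elem and the values are invented for illustration and are not part of the flags data. It compares the anonymous function with a named helper and with passing the extraction operator `[` directly, which also works because `[` is itself a function.

```r
# Hypothetical toy list standing in for unique_vals
toy <- list(a = c(10, 20, 30), b = c("x", "y"), c = 5:7)

# Anonymous function: defined inline, not stored anywhere after lapply() returns
anon <- lapply(toy, function(elem) elem[2])

# The same extraction with a named helper (second_elem is an illustrative name)
second_elem <- function(elem) elem[2]
named <- lapply(toy, second_elem)

# `[` is a function too, so the index can be passed through lapply()'s ... argument
bracket <- lapply(toy, `[`, 2)

identical(anon, named)     # TRUE
identical(anon, bracket)   # TRUE
```

The anonymous form is usually the most readable when the helper is only needed once.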

vapply and tapply

What if you had forgotten how unique() works and mistakenly thought it returns the number of unique values contained in the object passed to it? Then you might have incorrectly expected sapply(flags, unique) to return a numeric vector, since each element of the list returned would contain a single number and sapply() could then simplify the result to a vector.

When working interactively (at the prompt), this is not much of a problem, since you see the result immediately and will quickly recognize your mistake. However, when working non-interactively (e.g. writing your own functions), a misunderstanding may go undetected and cause incorrect results later on. Therefore, you may wish to be more careful, and that’s where vapply() is useful.

Whereas sapply() tries to ‘guess’ the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn’t match the format you specify, vapply() will throw an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply().

Try vapply(flags, unique, numeric(1)), which says that you expect each element of the result to be a numeric vector of length 1. Since this is NOT actually the case, YOU WILL GET AN ERROR. Once you get the error, type ok() to continue to the next question.

#vapply(flags, unique, numeric(1))
#ok()
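
To see the error without stopping a script, the call can be wrapped in tryCatch(). The sketch below uses a small hypothetical data frame (toy, with invented columns) rather than the flags data. It also shows a version that does satisfy a length-1 format, by counting the unique values instead of returning them.

```r
# Hypothetical toy data frame standing in for flags
toy <- data.frame(zone = c(1, 1, 2, 3), red = c(0, 1, 1, 0))

# unique() returns all distinct values, so the result lengths differ by column
# and sapply() quietly falls back to returning a list
guess <- sapply(toy, unique)
class(guess)

# vapply() stops instead, because the results are not numeric vectors of length 1;
# tryCatch() captures the error message so the script can continue
res <- tryCatch(
  vapply(toy, unique, numeric(1)),
  error = function(e) conditionMessage(e)
)
res

# Counting the unique values does produce one integer per column
n_unique <- vapply(toy, function(col) length(unique(col)), integer(1))
n_unique
```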

Recall from the previous lesson that sapply(flags, class) will return a character vector containing the class of each column in the dataset. Try that again now to see the result.

sapply(flags, class)
##           X        name    landmass        zone        area  population 
##   "integer" "character"   "integer"   "integer"   "integer"   "integer" 
##    language    religion        bars     stripes     colours         red 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##       green        blue        gold       white       black      orange 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##     mainhue     circles     crosses    saltires    quarters    sunstars 
## "character"   "integer"   "integer"   "integer"   "integer"   "integer" 
##    crescent    triangle        icon     animate        text     topleft 
##   "integer"   "integer"   "integer"   "integer"   "integer" "character" 
##    botright 
## "character"

If we wish to be explicit about the format of the result we expect, we can use vapply(flags, class, character(1)). The ‘character(1)’ argument tells R that we expect the class function to return a character vector of length 1 when applied to EACH column of the flags dataset.

vapply(flags, class, character(1))
##           X        name    landmass        zone        area  population 
##   "integer" "character"   "integer"   "integer"   "integer"   "integer" 
##    language    religion        bars     stripes     colours         red 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##       green        blue        gold       white       black      orange 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##     mainhue     circles     crosses    saltires    quarters    sunstars 
## "character"   "integer"   "integer"   "integer"   "integer"   "integer" 
##    crescent    triangle        icon     animate        text     topleft 
##   "integer"   "integer"   "integer"   "integer"   "integer" "character" 
##    botright 
## "character"
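
One concrete case where the explicit format matters in non-interactive code is empty input. The sketch below uses a hypothetical empty list (empty): sapply() has nothing to simplify and returns an empty list, while vapply() still returns a vector of the declared type.

```r
empty <- list()   # hypothetical input with no elements

s <- sapply(empty, class)                 # nothing to simplify
v <- vapply(empty, class, character(1))   # type is taken from FUN.VALUE

class(s)    # "list"
class(v)    # "character"
length(v)   # 0
```

Code that later assumes a character vector would keep working with v, but could behave unexpectedly with s.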

As a data analyst, you’ll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. The next function we’ll look at, tapply(), does exactly that.

tapply

The ‘landmass’ variable in our dataset takes on integer values between 1 and 6, each of which represents a different part of the world. Use table(flags$landmass) to see how many flags/countries fall into each group.

table(flags$landmass)
## 
##  1  2  3  4  5  6 
## 31 17 35 52 39 20

The ‘animate’ variable in our dataset takes the value 1 if a country’s flag contains an animate image (e.g. an eagle, a tree, a human hand) and 0 otherwise. Use table(flags$animate) to see how many flags contain an animate image.

table(flags$animate)
## 
##   0   1 
## 155  39

If you take the arithmetic mean of a bunch of 0s and 1s, you get the proportion of 1s. Use tapply(flags$animate, flags$landmass, mean) to apply the mean function to the ‘animate’ variable separately for each of the six landmass groups, thus giving us the proportion of flags containing an animate image WITHIN each landmass group.

tapply(flags$animate, flags$landmass,mean)
##         1         2         3         4         5         6 
## 0.4193548 0.1764706 0.1142857 0.1346154 0.1538462 0.3000000
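
The arithmetic behind this can be checked on a small made-up example. The vectors has_animal and region below are hypothetical and only illustrate the pattern. The last step spells out roughly the same computation with split() and sapply(), which is one way to think about what tapply() is doing.

```r
# Hypothetical toy vectors: a 0/1 indicator and a grouping variable
has_animal <- c(1, 0, 1, 1, 0, 0, 0, 1)
region     <- c(1, 1, 1, 1, 2, 2, 2, 2)

# The mean of a 0/1 vector is the proportion of 1s (4 of 8 here)
mean(has_animal)   # 0.5

# Proportion of 1s within each region: 3 of 4 in region 1, 1 of 4 in region 2
by_region <- tapply(has_animal, region, mean)
by_region

# The same numbers via split() followed by sapply()
by_split <- sapply(split(has_animal, region), mean)
all.equal(as.vector(by_region), as.vector(by_split))   # TRUE
```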

Similarly, we can look at a summary of population values (in round millions) for countries with and without the color red on their flag with tapply(flags$population, flags$red, summary).

tapply(flags$population, flags$red, summary)
## $`0`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    3.00   27.63    9.00  684.00 
## 
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     4.0    22.1    15.0  1008.0
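
Because summary() returns several numbers per group, tapply() cannot collapse the result into a plain vector and returns a list-like object with one element per group, as the `$`0`` and `$`1`` labels above show. A minimal sketch of pulling values back out, using hypothetical vectors (population and has_red are invented, not the flags columns):

```r
# Hypothetical toy vectors: a numeric variable and a 0/1 grouping variable
population <- c(0, 3, 20, 7, 28, 15)
has_red    <- c(0, 0, 0, 1, 1, 1)

res <- tapply(population, has_red, summary)

# One statistic from one group, selected by the group's label
res[["1"]]["Median"]

# One statistic from every group, with the result format stated explicitly
medians <- vapply(res, function(s) s[["Median"]], numeric(1))
medians
```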