Author: Abhinav Agrawal
tapply() applies a function to subsets of a vector, where the subsets are defined by one or more factor variables.
To understand this, imagine we have the ages of 20 people (males and females), and we need the average age of the males and of the females in this sample. We can first group the ages by gender, giving the ages of 12 males and the ages of 8 females, and then calculate the average age within each group.
In this example, technically, we have a quantitative variable, age, and a factor variable, gender. We split the quantitative variable into subsets by gender, which gave us the ages of 12 males and the ages of 8 females. The average was then computed separately on each of these subsets.
This is exactly what tapply() does!
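The age example above can be sketched in a few lines. The ages below are made up purely for illustration (they are not from any real sample), and the names age, gender and avg_age are hypothetical:

```r
# Hypothetical ages for 20 people: 12 males followed by 8 females
age <- c(23, 35, 41, 29, 52, 38, 27, 46, 33, 30, 44, 58,
         25, 31, 47, 36, 28, 54, 39, 42)
gender <- factor(rep(c("male", "female"), times = c(12, 8)))

# Number of people in each group
table(gender)

# Average age within each gender group
avg_age <- tapply(age, gender, mean)
avg_age
```

The result has one element per factor level, named after that level.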
Syntax of tapply: tapply(X, INDEX, FUN, …)
X = a vector, INDEX = a factor or a list of one or more factors, each of the same length as X, FUN = the function or operation to be applied, … = optional arguments passed on to FUN
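As a small sketch of how the optional arguments reach FUN, the snippet below passes na.rm = TRUE through to mean(). The vectors score and group are hypothetical toy data, not part of the original example:

```r
# Hypothetical scores with one missing value in group "a"
score <- c(10, 12, NA, 20, 22, 24)
group <- factor(c("a", "a", "a", "b", "b", "b"))

# Without na.rm, the mean of group "a" is NA
with_na <- tapply(score, group, mean)
with_na

# na.rm = TRUE is handed to mean() through the ... argument
without_na <- tapply(score, group, mean, na.rm = TRUE)
without_na
```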
We will use the iris dataset for this example. Load the iris dataset.
data(iris) # Load the dataset iris
str(iris) # Structure of the dataset
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Let us first calculate the overall mean of Sepal.Length:
mean(iris$Sepal.Length)
## [1] 5.843
Now we want the mean of Sepal.Length broken down by Species, so we use the tapply() function:
tapply(iris$Sepal.Length, iris$Species, mean)
## setosa versicolor virginica
## 5.006 5.936 6.588
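To connect this back to the "split, then apply" description, here is a rough sketch that does the two steps by hand with split() and sapply() and compares the values with tapply(). The variable names by_species, manual and direct are only illustrative:

```r
# Step 1: split Sepal.Length into one vector per species
by_species <- split(iris$Sepal.Length, iris$Species)

# Step 2: apply mean() to each piece
manual <- sapply(by_species, mean)

# The same computation in a single tapply() call
direct <- tapply(iris$Sepal.Length, iris$Species, mean)

# Compare the numeric values, ignoring names and other attributes
all.equal(as.vector(manual), as.vector(direct))
```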
Now let us look at another example, this time using mtcars, another dataset built into R.
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
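We are interested in the average mpg for each combination of transmission type (am: 0 = automatic, 1 = manual) and number of cylinders (cyl). In other words, we want the average mpg grouped by both variables. Passing a list of two factors as INDEX returns a matrix, here with cyl as the rows and am as the columns.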
tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)
## 0 1
## 4 22.90 28.07
## 6 19.12 20.57
## 8 15.05 15.40
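As a possible follow-up, the sketch below names the elements of the INDEX list so that the dimensions of the result are labelled, and then pulls out a single cell. The name avg_mpg is just an illustrative choice:

```r
# Naming the list elements labels the dimensions of the result
avg_mpg <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, am = mtcars$am), mean)
avg_mpg

# The result is a 3 x 2 matrix: rows are cyl, columns are am
dim(avg_mpg)

# Average mpg for 4-cylinder cars with am = 1, indexed by dimnames
avg_mpg["4", "1"]
```

Indexing with character strings works because the factor levels become the row and column names of the matrix.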