This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(ggplot2)
## Outline

- 1 The tapply function
- 2 How to use tapply in R?
    - 2.1 Additional arguments example: Ignore NA
- 3 Tapply in R with multiple factors
## The tapply function

The R tapply function is closely related to apply, but instead of applying a function over the rows or columns of an array, it applies a function to each group of a vector, where the groups are defined by one or more factors. The following block shows the function syntax and a simplified description of each argument.

tapply(X,                # Object that can be split into groups, typically a vector
       INDEX,            # List of one or more factors, each of the same length as X
       FUN,              # Function applied to each group of X (or NULL)
       ...,              # Additional arguments passed to FUN
       default = NA,     # If simplify = TRUE, the value used to initialize the array (fills empty cells)
       simplify = TRUE)  # If set to FALSE, returns a list object
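To see the roles of X, INDEX and FUN in isolation, here is a minimal sketch on a made-up vector (x, g, group_sums and group_ids are illustrative names, not part of the example dataset below). FUN is applied to the values of x in each group of g, and leaving FUN out (so it is NULL) returns the group number of each element instead.

```r
# Made-up data: five values and the group label of each one
x <- c(1, 2, 3, 10, 20)
g <- c("a", "a", "a", "b", "b")

# FUN (here sum) is applied to the values of x in each group of g;
# the character vector g is converted to a factor automatically
group_sums <- tapply(x, g, sum)
print(group_sums)  # a: 6, b: 30

# With FUN left out (NULL), tapply returns the group number of each element
group_ids <- tapply(x, g)
print(group_ids)   # 1 1 1 2 2
```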
## How to use tapply in R?

Using tapply in R takes only a few steps. First, consider the following example dataset, which records the price of some objects, their type and the store where they were sold.
set.seed(2)
data_set <- data.frame(price = round(rnorm(25, sd = 10, mean = 30)),
                       type  = sample(1:4, size = 25, replace = TRUE),
                       store = sample(paste("Store", 1:4), size = 25, replace = TRUE))
head(data_set)
## price type store
## 1 21 2 Store 2
## 2 32 3 Store 3
## 3 46 4 Store 4
## 4 19 3 Store 4
## 5 29 1 Store 4
## 6 31 3 Store 4
Second, store the columns as separate vectors and convert the type column to a factor with descriptive labels.
price <- data_set$price
store <- data_set$store
type <- factor(data_set$type,
labels = c("toy", "food", "electronics","drinks"))
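One caution about the conversion above: factor() assigns the labels to the sorted distinct values it actually finds, so if one of the codes 1 to 4 happened not to occur, passing four labels would fail with an error. A minimal sketch, using a hypothetical codes vector, that pins the mapping with levels = 1:4:

```r
# Hypothetical type codes; code 3 never occurs
codes <- c(2, 1, 4, 2)

# levels = 1:4 ties each code to its label even when a code is absent
f <- factor(codes, levels = 1:4,
            labels = c("toy", "food", "electronics", "drinks"))
print(as.character(f))  # "food" "toy" "drinks" "food"
print(table(f))         # electronics has a count of 0

# Without levels = 1:4, factor() sees only three distinct codes
# and rejects the four labels with an error
res <- try(factor(codes, labels = c("toy", "food", "electronics", "drinks")),
           silent = TRUE)
print(inherits(res, "try-error"))  # TRUE
```

Since tapply keeps every level of a factor, a level with no observations, such as electronics here, would still appear in its output, with the value NA.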
Finally, use the tapply function to calculate the mean price of each type of object across all stores. The result is a named one-dimensional array, so its elements can be accessed by position:
mean_prices <- tapply(price, type, mean)
mean_prices
## toy food electronics drinks
## 39.50000 30.33333 32.20000 29.33333
class(mean_prices)
## [1] "array"
mean_prices[2]
## food
## 30.33333
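Because a result like mean_prices is a named one-dimensional array, you can also select a group by its label instead of its position, or drop the array structure when a plain vector is more convenient. A short sketch on made-up data (x, grp and m are illustrative names):

```r
x <- c(5, 7, 9, 11)
grp <- factor(c("a", "b", "a", "b"))
m <- tapply(x, grp, mean)

print(m["b"])        # select by label: 9
print(dim(m))        # 2, i.e. a one-dimensional array
print(as.vector(m))  # 7 9, a plain numeric vector without names
```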
If you set the simplify argument to FALSE, tapply returns a list instead (strictly, a list with a dim attribute), with one element per group:
mean_prices_list <- tapply(price, type, mean, simplify = FALSE)
mean_prices_list
## $toy
## [1] 39.5
##
## $food
## [1] 30.33333
##
## $electronics
## [1] 32.2
##
## $drinks
## [1] 29.33333
In this case, you can access each element with the $ operator and the group name.
mean_prices_list$toy
## [1] 39.5
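The list returned with simplify = FALSE behaves like an ordinary named list, so functions such as unlist() work on it. Also note that when FUN returns more than one value per group (range(), for example), tapply cannot simplify and returns a list even with the default simplify = TRUE. A sketch on made-up data (x, grp, res and rng are illustrative names):

```r
x <- c(5, 7, 9, 11)
grp <- factor(c("a", "b", "a", "b"))

res <- tapply(x, grp, mean, simplify = FALSE)
print(is.list(res))  # TRUE
print(unlist(res))   # 7 9, named a and b

# range() returns two numbers per group, so the result is a list here too
rng <- tapply(x, grp, range)
print(rng$b)         # 7 11
```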
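### Additional arguments example: Ignore NA

Suppose that your data frame contains some NA values in its columns. Since mean returns NA whenever its input contains a missing value, any store with a missing price gets an NA mean, as the first call shows. Arguments placed after FUN, such as na.rm = TRUE, are passed on to FUN, as in the second call.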
data_set[1, 1] <- NA
data_set[2, 3] <- NA
tapply(data_set$price, data_set$store, mean)
## Store 1 Store 2 Store 3 Store 4
## 32.00000 NA 39.25000 33.14286
tapply(data_set$price, data_set$store, mean, na.rm = TRUE)
## Store 1 Store 2 Store 3 Store 4
## 32.00000 33.50000 39.25000 33.14286
The previous call is equivalent to passing a function that calls mean with na.rm = TRUE:
f <- function(x) mean(x, na.rm = TRUE)
tapply(data_set$price, data_set$store, f)
## Store 1 Store 2 Store 3 Store 4
## 32.00000 33.50000 39.25000 33.14286
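The example above places an NA in both columns: row 1 loses its price and row 2 loses its store. The two cases behave differently. A missing value in X reaches FUN, which is why na.rm = TRUE is needed, while an observation whose group is NA is left out of every group. A minimal sketch with made-up vectors (val, grp, r1 and r2 are illustrative names):

```r
val <- c(10, NA, 30, 40)
grp <- c("A", "A", "B", NA)

# NA in the values: mean() returns NA for group A
r1 <- tapply(val, grp, mean)
print(r1)  # A: NA, B: 30

# na.rm = TRUE is passed on to mean(); the fourth value has no group,
# so 40 never enters group B
r2 <- tapply(val, grp, mean, na.rm = TRUE)
print(r2)  # A: 10, B: 30
```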
## Tapply in R with multiple factors

You can apply the tapply function to multiple columns (or factor variables) at once by passing them inside a list to the INDEX argument; character vectors such as store are converted to factors automatically. In this example, we apply tapply to the type and store variables to calculate the mean price of the objects by type and store.
Mean price by product type and store:
tapply(price, list(type, store), mean)
## Store 1 Store 2 Store 3 Store 4
## toy 46 31.00000 49 36.66667
## food 26 30.33333 39 NA
## electronics 50 29.00000 32 25.00000
## drinks 22 40.00000 20 36.00000
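The two-factor result is a matrix, with the levels of the first factor as rows and those of the second as columns, so a single cell can be read with [row, column]. Naming the elements of the INDEX list also labels the dimensions. A sketch with made-up vectors (val, kind, shop and m are illustrative names):

```r
val  <- c(4, 6, 10, 20, 30)
kind <- c("x", "x", "y", "y", "x")
shop <- c("S1", "S2", "S1", "S1", "S1")

# Named list elements become the names of the dimensions
m <- tapply(val, list(kind = kind, shop = shop), mean)
print(m)
print(m["x", "S1"])  # mean of 4 and 30: 17
print(m["y", "S2"])  # NA: no observation has kind y in shop S2
print(dim(m))        # 2 2
```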
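In the previous output, the NA for food in Store 4 does not come from missing data (the price vector was extracted before any NA values were introduced): it marks a combination with no observations, which tapply fills with the value of the default argument, NA. Mean price by product type and store, changing the default argument to 0: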
tapply(price, list(type, store), mean, default = 0)
## Store 1 Store 2 Store 3 Store 4
## toy 46 31.00000 49 36.66667
## food 26 30.33333 39 0.00000
## electronics 50 29.00000 32 25.00000
## drinks 22 40.00000 20 36.00000
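Keep in mind that default only fills combinations with no observations at all. If a cell does have observations but FUN returns NA for it (for instance, because a value is missing and na.rm is not set), that NA stays. A sketch with made-up data (val, kind, shop and m are illustrative names):

```r
val  <- c(4, NA, 10)
kind <- c("x", "x", "y")
shop <- c("S1", "S2", "S1")

m <- tapply(val, list(kind, shop), mean, default = 0)
print(m)
# x/S1 = 4 and y/S1 = 10
# x/S2 = NA: the cell has an observation, but its value is missing
# y/S2 = 0: the cell is empty, so it receives the default
```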