Outline

1 The tapply function
2 How to use tapply in R?
2.1 Additional arguments example: Ignore NA
3 Tapply in R with multiple factors

The tapply function

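The R tapply function is similar to the apply function, but instead of working over the rows or columns of a matrix it applies a function to groups of values defined by one or more factors. The following block of code shows the function syntax and a simplified description of each argument.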

Syntax

tapply(X,                # Object you can split (matrix, data frame, ...)
       INDEX,            # List of factors of the same length
       FUN,              # Function to be applied to factors (or NULL)
       ...,              # Additional arguments to be passed to FUN
       default = NA,     # If simplify = TRUE, is the array initialization value
       simplify = TRUE)  # If set to FALSE returns a list object
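As a minimal sketch of the syntax above, the following call sums a small vector by group. The names demo_x and demo_group are hypothetical and are not part of the example data set used later.

```r
# Hypothetical toy data: four values split into two groups
demo_x <- c(10, 20, 30, 40)
demo_group <- factor(c("a", "b", "a", "b"))

# X = demo_x, INDEX = demo_group, FUN = sum
demo_sums <- tapply(demo_x, demo_group, sum)
demo_sums
# a = 10 + 30 = 40, b = 20 + 40 = 60
```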

How to use tapply in R?

The tapply function is easy to use. First, consider the following example data set, which records the price of some objects, their type, and the store where they were sold.

set.seed(2)

data_set <- data.frame(price = round(rnorm(25, sd = 10, mean = 30)),
                       type = sample(1:4, size = 25, replace = TRUE),
                       store = sample(paste("Store", 1:4), size = 25, replace = TRUE))

head(data_set)
##   price type   store
## 1    21    2 Store 2
## 2    32    3 Store 3
## 3    46    4 Store 4
## 4    19    3 Store 4
## 5    29    1 Store 4
## 6    31    3 Store 4

Second, store the columns as variables and convert the column named type to a factor.

price <- data_set$price
store <- data_set$store
type <- factor(data_set$type,
               labels = c("toy", "food", "electronics", "drinks"))
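The labels argument replaces the sorted unique values of the input with the given names, in order. A small sketch with hypothetical codes (demo_codes is not part of the tutorial data) shows the mapping:

```r
# Hypothetical numeric codes, deliberately not in sorted order
demo_codes <- c(2, 3, 1, 2)

# Labels are matched to the sorted unique values 1, 2, 3
demo_factor <- factor(demo_codes, labels = c("toy", "food", "electronics"))

as.character(demo_factor)
# code 1 -> "toy", code 2 -> "food", code 3 -> "electronics"
levels(demo_factor)
```

Because labels are assigned by sorted value rather than by order of appearance, the number of labels has to match the number of distinct codes that actually occur in the data, unless the levels argument is given explicitly.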

Finally, use the tapply function to calculate the mean price for each type of object, across all stores:

Mean price by product type

mean_prices <- tapply(price, type, mean) 
mean_prices
##         toy        food electronics      drinks 
##    39.50000    30.33333    32.20000    29.33333

The result is of class "array", so you can extract its elements by index:

class(mean_prices) 
## [1] "array"
mean_prices[2]
##     food 
## 30.33333
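One way to picture what tapply does here is as a split followed by an apply step. The sketch below uses hypothetical vectors (demo_price, demo_type) and compares tapply with sapply(split(...)). It illustrates the idea only and is not a description of how tapply is implemented internally.

```r
# Hypothetical prices and product types
demo_price <- c(10, 20, 30, 40, 80)
demo_type <- factor(c("toy", "food", "toy", "food", "toy"))

# Grouped mean with tapply
by_tapply <- tapply(demo_price, demo_type, mean)

# The same computation written as split + sapply
by_split <- sapply(split(demo_price, demo_type), mean)

# tapply returns a 1-d array, sapply a named vector; the values agree
all.equal(as.vector(by_tapply), unname(by_split))
```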

However, you can change the output class to a list by setting the simplify argument to FALSE.

Mean price by product type, returned as a list

mean_prices_list <- tapply(price, type, mean, simplify = FALSE)
mean_prices_list
## $toy
## [1] 39.5
## 
## $food
## [1] 30.33333
## 
## $electronics
## [1] 32.2
## 
## $drinks
## [1] 29.33333

In this case, you can access the output elements with the $ sign and the element name.

mean_prices_list$toy
## [1] 39.5
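A list result also arises when FUN returns more than one value per group, even if simplify is left at TRUE. A minimal sketch with hypothetical data (demo_price, demo_type), using range as the function:

```r
# Hypothetical prices and product types
demo_price <- c(10, 20, 30, 40, 80)
demo_type <- factor(c("toy", "food", "toy", "food", "toy"))

# range() returns two values per group, so the result cannot be
# simplified to a numeric array and comes back as a list
demo_ranges <- tapply(demo_price, demo_type, range)

is.list(demo_ranges)
demo_ranges[["toy"]]
# minimum and maximum price for "toy": 10 and 80
```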

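Additional arguments example: Ignore NA

Suppose that your data frame contains some NA values in its columns. By default, mean returns NA for any group that contains a missing price, so you need to pass na.rm = TRUE as an additional argument to ignore those values.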

Adding NA values to the data set

data_set[1, 1] <- NA
data_set[2, 3] <- NA

Mean price by store

tapply(data_set$price, data_set$store, mean)
##  Store 1  Store 2  Store 3  Store 4 
## 32.00000       NA 39.25000 33.14286
tapply(data_set$price, data_set$store, mean, na.rm = TRUE)
##  Store 1  Store 2  Store 3  Store 4 
## 32.00000 33.50000 39.25000 33.14286

The previous call is equivalent to wrapping mean in a function that sets na.rm = TRUE:

f <- function(x) mean(x, na.rm = TRUE)
tapply(data_set$price, data_set$store, f)
##  Store 1  Store 2  Store 3  Store 4 
## 32.00000 33.50000 39.25000 33.14286
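The two NA values added above behave differently: an NA in the data vector propagates into the group mean unless na.rm = TRUE is passed on to mean, while an observation whose grouping value is NA is left out of every group. A small sketch with hypothetical vectors (demo_price, demo_store):

```r
# Hypothetical data: one missing price and one missing store
demo_price <- c(10, NA, 30, 40, 50)
demo_store <- c("Store 1", "Store 1", "Store 2", "Store 2", NA)

# NA in the data: the mean of "Store 1" is NA without na.rm
with_na <- tapply(demo_price, demo_store, mean)

# na.rm = TRUE is forwarded to mean() through the ... argument
without_na <- tapply(demo_price, demo_store, mean, na.rm = TRUE)

# The same result with an anonymous function
anonymous <- tapply(demo_price, demo_store, function(x) mean(x, na.rm = TRUE))

without_na
# "Store 1" = 10, "Store 2" = 35; the price 50 has no store and is dropped
```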

Tapply in R with multiple factors

You can group by several factors at once by passing them to tapply inside a list. In this example, we group by both the type and store factors to calculate the mean price of the objects for each combination of type and store.

Mean price by product type and store

tapply(price, list(type, store), mean)
##             Store 1  Store 2 Store 3  Store 4
## toy              46 31.00000      49 36.66667
## food             26 30.33333      39       NA
## electronics      50 29.00000      32 25.00000
## drinks           22 40.00000      20 36.00000
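With two factors the result is a matrix with one row per level of the first factor and one column per level of the second, and combinations with no observations are filled with NA. A minimal sketch with hypothetical data (demo_price, demo_type, demo_store):

```r
# Hypothetical data: no "food" item was sold in "Store 2"
demo_price <- c(10, 20, 30, 40)
demo_type <- factor(c("toy", "toy", "food", "food"))
demo_store <- factor(c("Store 1", "Store 2", "Store 1", "Store 1"))

demo_means <- tapply(demo_price, list(demo_type, demo_store), mean)

dim(demo_means)                        # 2 rows (types) by 2 columns (stores)
demo_means["food", "Store 1"]          # mean of 30 and 40
is.na(demo_means["food", "Store 2"])   # empty combination
```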

Mean price by product type and store, changing default argument

tapply(price, list(type, store), mean, default = 0)
##             Store 1  Store 2 Store 3  Store 4
## toy              46 31.00000      49 36.66667
## food             26 30.33333      39  0.00000
## electronics      50 29.00000      32 25.00000
## drinks           22 40.00000      20 36.00000
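Whether 0 is a sensible fill value depends on the summary. It is most natural for count-like results, where an empty combination really does correspond to zero. The sketch below, on hypothetical data (demo_price, demo_type, demo_store), counts observations per cell with length and default = 0:

```r
# Hypothetical data: no "food" item was sold in "Store 2"
demo_price <- c(10, 20, 30, 40)
demo_type <- factor(c("toy", "toy", "food", "food"))
demo_store <- factor(c("Store 1", "Store 2", "Store 1", "Store 1"))

# Number of observations per type and store; empty cells become 0
demo_counts <- tapply(demo_price, list(demo_type, demo_store), length,
                      default = 0)
demo_counts
```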