This is part of my notes on R programming on my site: https://dataz4s.com/. It is based on Mike Marin's Statslectures video 'Apply Function in R'.
The apply functions are a family of looping functions in R. They take less code than an explicit loop, which lowers the risk of errors when writing it, and they can also be faster than, e.g., a for loop.
See ?apply for the help page. The usage is apply(X, MARGIN, FUN, …): X is the object the function is applied to (a matrix or array; a data frame is converted to a matrix first). MARGIN selects rows or columns: MARGIN = 1 means rows and MARGIN = 2 means columns. FUN is the function to apply, and … stands for any further arguments that are passed on to FUN.
Let's run an example with the dataset StockData, which can be downloaded from Mike Marin's page: https://www.statslectures.com/r-scripts-datasets
# Read in the data with read.table (base R, so no extra package is needed for a .txt file)
StockData <- read.table("C:/Users/Usuario/Documents/dataZ4s/R/Apply function/StockExample.txt")
StockData
##       Stock1 Stock2 Stock3 Stock4
## Day1  185.74   1.47   1605  95.05
## Day2  184.26   1.56   1580  97.49
## Day3  162.21   1.39   1490  88.57
## Day4  159.04   1.43   1520  85.55
## Day5  164.87   1.42   1550  92.04
## Day6  162.72   1.36   1525  91.70
## Day7  157.89     NA   1495  89.88
## Day8  159.49   1.43   1485  93.17
## Day9  150.22   1.57   1470  90.12
## Day10 151.02   1.54   1510  92.14
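If the text file is not at hand, the same table can be rebuilt by hand from the printout above. This is only a sketch: it assumes the printed values are the complete data, and it stores Stock3 as decimal numbers, whereas read.table would probably read that column as integers.

```r
# Rebuild StockData from the values printed above (NA for Stock2 on Day7)
StockData <- data.frame(
  Stock1 = c(185.74, 184.26, 162.21, 159.04, 164.87,
             162.72, 157.89, 159.49, 150.22, 151.02),
  Stock2 = c(1.47, 1.56, 1.39, 1.43, 1.42,
             1.36, NA, 1.43, 1.57, 1.54),
  Stock3 = c(1605, 1580, 1490, 1520, 1550,
             1525, 1495, 1485, 1470, 1510),
  Stock4 = c(95.05, 97.49, 88.57, 85.55, 92.04,
             91.70, 89.88, 93.17, 90.12, 92.14),
  row.names = paste0("Day", 1:10)
)

# Check the structure: 10 observations of 4 numeric variables
str(StockData)
```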
# Use the apply function to get the mean of each stock
# X is StockData, MARGIN = 2 means by column, and FUN is mean()
# NA is returned for column 2 because Stock2 has a missing value on Day7
apply(X = StockData, MARGIN = 2,FUN = mean)
## Stock1 Stock2 Stock3 Stock4
## 163.746 NA 1523.000 91.571
# na.rm is an argument of mean(), not a function; apply passes it on to mean()
# With na.rm = TRUE the NA values are removed, so we get the mean of all 4 stocks
apply(X = StockData, MARGIN = 2,FUN = mean, na.rm=TRUE)
## Stock1 Stock2 Stock3 Stock4
## 163.746000 1.463333 1523.000000 91.571000
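To see what happens with the extra argument, here is a small sketch on a made-up matrix m (not the stock data). It compares three ways of getting the column means: passing na.rm through apply, wrapping mean() in an anonymous function, and writing the for loop by hand. The object names are only for illustration.

```r
# A small made-up matrix with one missing value in column "a"
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3,
            dimnames = list(NULL, c("a", "b")))

# 1) Extra arguments after FUN are passed on to FUN
via_dots <- apply(m, 2, mean, na.rm = TRUE)

# 2) The same thing with an anonymous function
via_anon <- apply(m, 2, function(col) mean(col, na.rm = TRUE))

# 3) The for loop that apply saves us from writing
via_loop <- numeric(ncol(m))
names(via_loop) <- colnames(m)
for (j in seq_len(ncol(m))) {
  via_loop[j] <- mean(m[, j], na.rm = TRUE)
}

via_dots  # a = 1.5, b = 5
```

All three give the same result. The apply version is one line, which is the point about less code made at the top of these notes.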
# Save the result of apply to an object
AVG <- apply(X = StockData, MARGIN = 2,FUN = mean, na.rm=TRUE)
AVG
## Stock1 Stock2 Stock3 Stock4
## 163.746000 1.463333 1523.000000 91.571000
# When comfortable with the commands and the default order of the arguments, we can skip the argument names
apply(StockData, 2, mean, na.rm=TRUE)
## Stock1 Stock2 Stock3 Stock4
## 163.746000 1.463333 1523.000000 91.571000
# The colMeans command does the same as the apply command that we used above
# That it takes the mean of each column is already built into the function,
# so it only needs the data, plus the optional na.rm argument
colMeans(StockData, na.rm = TRUE)
## Stock1 Stock2 Stock3 Stock4
## 163.746000 1.463333 1523.000000 91.571000
# Max values of the stocks
apply(X = StockData, MARGIN = 2, FUN = max, na.rm=TRUE)
## Stock1 Stock2 Stock3 Stock4
## 185.74 1.57 1605.00 97.49
# 20th and 80th percentiles
apply(X = StockData, MARGIN = 2, FUN = quantile, probs=c(0.2, 0.8), na.rm=TRUE)
##      Stock1 Stock2 Stock3 Stock4
## 20% 156.516  1.408   1489 89.618
## 80% 168.748  1.548   1556 93.546
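When FUN returns more than one value, as quantile does here, apply returns a matrix instead of a vector. The sketch below uses a made-up matrix m (not the stock data) to show the shape of the result and how to pick values out of it by name.

```r
# Made-up data: two columns with 11 values each
m <- cbind(a = 1:11, b = seq(10, 110, by = 10))

# Two percentiles per column: one row per percentile, one column per column of m
q <- apply(m, 2, quantile, probs = c(0.2, 0.8))
dim(q)         # 2 2
q["80%", "b"]  # 90

# With MARGIN = 1 the result has one COLUMN per row of m
r <- apply(m, 1, range)
dim(r)         # 2 11
```

Note that with MARGIN = 1 the result comes back with the rows of m as columns, so t(r) may be needed to get one row per original row again.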
# Sum for each row
apply(X=StockData, MARGIN = 1, FUN = sum, na.rm=TRUE)
## Day1 Day2 Day3 Day4 Day5 Day6 Day7 Day8 Day9 Day10
## 1887.26 1863.31 1742.17 1766.02 1808.33 1780.78 1742.77 1739.09 1711.91 1754.70
# And like the colMeans command, there is a rowSums command
rowSums(StockData, na.rm = TRUE)
## Day1 Day2 Day3 Day4 Day5 Day6 Day7 Day8 Day9 Day10
## 1887.26 1863.31 1742.17 1766.02 1808.33 1780.78 1742.77 1739.09 1711.91 1754.70
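Base R also has rowMeans and colSums, which complete this set of four shortcuts. As a quick check on a small made-up matrix (the names m, r1, r2 are only for illustration), we can confirm that the shortcuts agree with the corresponding apply calls.

```r
# Made-up 2 x 3 matrix with one missing value
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Row sums, both ways
rowSums(m, na.rm = TRUE)                            # r1 = 6, r2 = 12
all.equal(apply(m, 1, sum, na.rm = TRUE),
          rowSums(m, na.rm = TRUE))                 # TRUE

# Column means, both ways
colMeans(m, na.rm = TRUE)                           # a = 1.5, b = 4, c = 5.5
all.equal(apply(m, 2, mean, na.rm = TRUE),
          colMeans(m, na.rm = TRUE))                # TRUE
```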
# Create line plots for each of the stocks
apply(X = StockData, MARGIN = 2, FUN = plot, type="l", main="Stock", ylab="Price", xlab="Day")
## NULL
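The NULL above is just the return value of apply: plot() draws the figure and returns nothing, so there is nothing to collect. Every plot also gets the same title, "Stock". If we want the stock name in each title, one option is a plain for loop over the column names. This is a sketch, not part of the original video, and it uses only the first three days of Stock1 and Stock2 so that it runs on its own.

```r
# First three days of two stocks, copied from the table above
m <- cbind(Stock1 = c(185.74, 184.26, 162.21),
           Stock2 = c(1.47, 1.56, 1.39))

# One line plot per column, titled with the column name
for (name in colnames(m)) {
  plot(m[, name], type = "l", main = name, ylab = "Price", xlab = "Day")
}
```

Here the for loop is arguably the clearer choice, since we are after the side effect (the plots) and not a returned value.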
# Plot for total per day
plot(apply(X=StockData, MARGIN = 1, FUN = sum, na.rm=TRUE), type = "l", ylab = "Total Market Value", xlab = "Day", main = "Markets per day")
View this page on my site: https://dataz4s.com/r-statistical-programming/apply-function-in-r/