This tutorial covers how to use for() loops to conduct iterative data analysis steps.
Let’s first create a sample dataset consisting of 10 weeks’ worth of sales data for 5 different sales employees:
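# tibble(), %>%, arrange(), group_by(), summarize() and pivot_wider() come from the tidyverse
library(tidyverse)
set.seed(999)
dat1 <- tibble(emp_id = rep(c(100101,102423,105043,102341,111302),
                            rep(10, 5)),
               week = rep(1:10,5),
               sales = sample(0:10,50,TRUE),
               calls = sample(100:300,50,TRUE),
               mgr = c(rep("Bob",20),rep("Sally",20),rep("Omar",10)))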
dat1 %>%
  arrange(week,desc(sales)) %>%
  head(5)
## # A tibble: 5 × 5
##   emp_id  week sales calls mgr
##    <dbl> <int> <int> <int> <chr>
## 1 100101     1    10   186 Bob
## 2 105043     1     9   222 Sally
## 3 111302     1     9   223 Omar
## 4 102341     1     8   132 Sally
## 5 102423     1     7   116 Bob
A for loop executes a block of code once for each item in a sequence, so the number of iterations is known before the loop runs.
A for loop consists of only 3 components:
Output: output <- vector()
Sequence: i in seq_along(data)
Body: output[[i]] <- max(data[[i]])
Conceptually:
output = for(value in sequence){execute action}
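As a minimal sketch of those three components working together, the example below uses a small made-up list named data (an assumption for illustration, not part of the sales dataset) and stores the maximum of each element.

```r
# Hypothetical input: a named list of numeric vectors (illustrative only)
data <- list(a = c(3, 8, 1), b = c(10, 2), c = c(5, 5, 7))

# 1. Output: pre-allocate a vector with one slot per input element
output <- vector(mode = "double", length = length(data))

# 2. Sequence: i takes the values 1, 2, 3
for (i in seq_along(data)) {
  # 3. Body: the work done on each iteration
  output[[i]] <- max(data[[i]])
}

output
```

Pre-allocating the output, rather than growing it inside the loop, keeps the result the same length as the input and avoids repeated copying.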
For loops can also be simple/single loops, or nested/multiple loops.
A simple for loop is a for loop with only a single sequence to iterate over. The most common way to iterate over the input elements is to use a numeric index, typically generated with seq_along():
seq_along(unique(dat1$week))
Example: Calculate the total number of sales calls made for each week
Note: When writing a loop, it can be helpful to reframe your goal in “loop format”, for example: “for each week, sum the calls made in that week.”
# Define sequence index of weeks
weekindex <- unique(dat1$week)
# Test what the numeric index sequence looks like
seq_along(weekindex)
## [1] 1 2 3 4 5 6 7 8 9 10
# Initialize output vector & give element names
output <- vector(mode = "double", length = length(weekindex))
names(output) <- paste("Week",weekindex)
# Write Loop
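for(i in seq_along(weekindex)){
  # Match on the week value, not the loop position, so the loop still works if weeks are not numbered 1, 2, 3, ...
  output[[i]] <- sum(dat1$calls[dat1$week==weekindex[[i]]])
}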
# Results
output
##  Week 1  Week 2  Week 3  Week 4  Week 5  Week 6  Week 7  Week 8  Week 9 Week 10
##     879     968     932    1018    1009     880    1116     958     959     914
We can verify these results by using a group_by() and summarize() statement:
dat1 %>%
  group_by(week) %>%
  summarize(totalcalls = sum(calls))
## # A tibble: 10 × 2
##     week totalcalls
##    <int>      <int>
##  1     1        879
##  2     2        968
##  3     3        932
##  4     4       1018
##  5     5       1009
##  6     6        880
##  7     7       1116
##  8     8        958
##  9     9        959
## 10    10        914
As expected, the results match!
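The comparison can also be made programmatically rather than by eye. The sketch below is one way to do that in base R, using a small made-up data frame called toy (hypothetical values standing in for dat1) and tapply() as the loop-free reference.

```r
# Toy data standing in for dat1 (hypothetical values, for illustration only)
toy <- data.frame(week  = c(1, 1, 2, 2, 3),
                  calls = c(100, 150, 120, 130, 200))

weekindex <- unique(toy$week)
loop_totals <- vector(mode = "double", length = length(weekindex))

# Loop version: one sum per week
for (i in seq_along(weekindex)) {
  loop_totals[[i]] <- sum(toy$calls[toy$week == weekindex[[i]]])
}

# tapply() computes the same per-week sums without a loop
grouped_totals <- tapply(toy$calls, toy$week, sum)

# as.vector() drops the week labels so only the values are compared
all.equal(loop_totals, as.vector(grouped_totals))
```

Note that this check assumes both results list the weeks in the same order, which holds here because the toy weeks are already sorted.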
Instead of using a numeric index, we can also iterate over our input elements using names.
unique(dat1$mgr)
Example: Calculate the min, max, mean, median, and sd of sales for each manager by name:
# Index of Manager Names
mgrnames <- unique(dat1$mgr)
# Test what the name index sequence looks like
mgrnames
## [1] "Bob" "Sally" "Omar"
# Function to compute our 5 Stats
mystats <- function(x) {
  min_sales <- min(x)
  mean_sales <- mean(x)
  median_sales <- median(x)
  max_sales <- max(x)
  sd_sales <- sd(x)
  list(min_sales = min_sales,
       mean_sales = mean_sales,
       median_sales = median_sales,
       max_sales = max_sales,
       sd_sales = sd_sales)
}
# Initialize Output list
output <- vector(mode = "list", length = length(mgrnames))
names(output) <- mgrnames
# Loop -> For each manager, calculate their sales stats
for(i in mgrnames){
  output[[i]] <- mystats(dat1$sales[dat1$mgr==i])
}
# Unlist results and round
round(unlist(output),2)
##      Bob.min_sales     Bob.mean_sales   Bob.median_sales      Bob.max_sales
##               0.00               4.95               5.00              10.00
##       Bob.sd_sales    Sally.min_sales   Sally.mean_sales Sally.median_sales
##               3.22               0.00               6.25               7.00
##    Sally.max_sales     Sally.sd_sales     Omar.min_sales    Omar.mean_sales
##              10.00               3.08               1.00               5.00
##  Omar.median_sales     Omar.max_sales      Omar.sd_sales
##               5.50               9.00               2.91
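Iterating by name works in the example above because the output was given names before the loop, so indexing with a character value writes into the slot that already carries that name. A stripped-down sketch of the same idea, using made-up figures in a list called sales_by_mgr (hypothetical, for illustration only):

```r
# Hypothetical sales figures per manager (illustrative only)
sales_by_mgr <- list(Bob = c(2, 4, 9), Sally = c(7, 7, 2))

# Pre-allocate a named numeric vector so each name maps to a slot
totals <- vector(mode = "double", length = length(sales_by_mgr))
names(totals) <- names(sales_by_mgr)

# nm takes the values "Bob" and "Sally" rather than 1 and 2
for (nm in names(sales_by_mgr)) {
  totals[[nm]] <- sum(sales_by_mgr[[nm]])
}

totals
```

The trade-off is that a name-based loop gives you the name directly but not the position, so it suits cases where the output is looked up by name.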
Nested for loops consist of one loop nested within another, and are used when you wish to iterate over multiple sequences or dimensions.
The components of the nested for loop are the same as in the single loop, with a couple of modifications:
Sequence: for(i in seq_along(x)) with for(j in seq_along(y)) nested inside it
Output: outmat <- matrix(nrow = length(x), ncol = length(y)) or output <- vector(length = length(x) * length(y))
Body: outmat[i,j] <- x[[i]] * y[[j]]
Example: For each manager and each week, calculate the average number of sales made
weekindex <- unique(dat1$week)
names(weekindex) <- paste("Week",weekindex)
mgr_index <- unique(dat1$mgr)
names(mgr_index) <- paste("Mgr",mgr_index)
outmat <- matrix(nrow = length(weekindex),ncol = length(mgr_index))
colnames(outmat) <- names(mgr_index)
row.names(outmat) <- names(weekindex)
for(j in seq_along(mgr_index)){
  for(i in seq_along(weekindex)){
    outmat[i,j] <- mean(dat1$sales[dat1$mgr==mgr_index[[j]] & dat1$week==weekindex[[i]]])
  }
}
outmat
##         Mgr Bob Mgr Sally Mgr Omar
## Week 1      8.5       8.5        9
## Week 2      3.5       7.0        3
## Week 3      6.0       7.0        3
## Week 4      6.5       3.5        8
## Week 5      1.0       6.0        6
## Week 6      9.5       4.0        1
## Week 7      5.0       9.5        8
## Week 8      2.5       5.0        1
## Week 9      3.0       5.0        5
## Week 10     4.0       7.0        6
Let’s quickly validate the results:
dat1 %>%
  group_by(mgr,week) %>%
  summarize(avg_sales = mean(sales)) %>%
  pivot_wider(names_from = "mgr",
              values_from = "avg_sales")
## `summarise()` has grouped output by 'mgr'. You can override using the `.groups`
## argument.
## # A tibble: 10 × 4
##     week   Bob  Omar Sally
##    <int> <dbl> <dbl> <dbl>
##  1     1   8.5     9   8.5
##  2     2   3.5     3   7
##  3     3   6       3   7
##  4     4   6.5     8   3.5
##  5     5   1       6   6
##  6     6   9.5     1   4
##  7     7   5       8   9.5
##  8     8   2.5     1   5
##  9     9   3       5   5
## 10    10   4       6   7
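Great! The results match! (The manager columns appear in a different order only because group_by() sorts the managers alphabetically.)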
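To see the nested pattern on its own, without the sales data, here is a minimal sketch using two made-up vectors x and y (hypothetical inputs). It fills a matrix with the product of every pair, which mirrors the generic body outmat[i,j] <- x[[i]] * y[[j]].

```r
# Hypothetical inputs (illustrative only)
x <- c(1, 2, 3)
y <- c(10, 20)

# Output: one row per element of x, one column per element of y
outmat <- matrix(nrow = length(x), ncol = length(y))

# Sequence: the outer loop walks the columns, the inner loop walks the rows
for (j in seq_along(y)) {
  for (i in seq_along(x)) {
    # Body: fill the cell for this (i, j) pair
    outmat[i, j] <- x[[i]] * y[[j]]
  }
}

outmat
```

The body runs length(x) * length(y) times, here 6, which is why nested loops get slow quickly as the sequences grow.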
Thanks for reading!