For Loops in R

Overview

This tutorial covers how to use for() loops to conduct iterative data analysis steps.


Sample Data

Let’s first create a sample dataset consisting of 10 weeks’ worth of sales data for 5 different sales employees:

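library(tidyverse)  # tibble(), the dplyr verbs, and pivot_wider() used below

set.seed(999)

# 5 employees x 10 weeks = 50 rows; each employee reports to one manager
dat1 <- tibble(emp_id = rep(c(100101,102423,105043,102341,111302),
                            rep(10, 5)),
               week = rep(1:10,5),
               sales = sample(0:10,50,TRUE),
               calls = sample(100:300,50,TRUE),
               mgr = c(rep("Bob",20),rep("Sally",20),rep("Omar",10)))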

dat1 %>% 
  arrange(week,desc(sales)) %>% 
  head(5)
## # A tibble: 5 × 5
##   emp_id  week sales calls mgr  
##    <dbl> <int> <int> <int> <chr>
## 1 100101     1    10   186 Bob  
## 2 105043     1     9   222 Sally
## 3 111302     1     9   223 Omar 
## 4 102341     1     8   132 Sally
## 5 102423     1     7   116 Bob

For Loops

For loops execute a block of code once for each item in a sequence, so the number of iterations is known before the loop starts.

For Loop Structure

A for loop consists of only 3 components: the output, the sequence, and the body (the action to execute).

Conceptually: output = for(value in sequence){execute action}
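
As a minimal sketch of that structure, the loop below squares each element of a made-up vector. The names `x` and `squares` are hypothetical and not part of the sample data. Note that in R the for() call itself returns nothing useful, so the output object is created first and filled in inside the body.

```r
# Hypothetical input, for illustration only
x <- c(2, 4, 6)

# Output: allocate space before the loop runs
squares <- vector(mode = "double", length = length(x))

# Sequence: the values the loop variable i takes (1, 2, 3)
for (i in seq_along(x)) {
  # Body: the action executed on each iteration
  squares[[i]] <- x[[i]]^2
}

squares
```

Allocating the output up front, rather than growing it inside the loop, is the same pattern the examples in this tutorial use.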

For loops can also be simple/single loops, or nested/multiple loops.


Single For Loop - Numeric Indexing

A simple for loop is a for loop with only a single sequence to iterate over. The most common way to iterate over the input elements is to use a numeric index, typically generated with seq_along().
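
One reason seq_along() is usually preferred over 1:length() is how each behaves when the input is empty. The sketch below uses a hypothetical zero-length vector `empty` to show the difference.

```r
# Hypothetical empty input, for illustration only
empty <- numeric(0)

# 1:length() counts down from 1 to 0, so a loop over it would run twice
1:length(empty)

# seq_along() returns a zero-length sequence, so a loop over it is skipped
seq_along(empty)

# Count how many times the loop body actually runs
iterations <- 0
for (i in seq_along(empty)) {
  iterations <- iterations + 1
}
iterations
```

With seq_along() the body never runs for an empty input, which is normally the intended behavior.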


Example: Calculate the total number of sales calls made for each week

Note: When writing a loop, it can be helpful to reframe your goal in “loop format”, e.g. “for each week, sum the calls made in that week.”

# Define sequence index of weeks
weekindex <- unique(dat1$week)

# Test what the numeric index sequence looks like
seq_along(weekindex)
##  [1]  1  2  3  4  5  6  7  8  9 10

# Initialize output vector & give element names
output <- vector(mode = "double", length = length(weekindex))
names(output) <- paste("Week",weekindex)

# Write Loop
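for(i in seq_along(weekindex)){
  # Look up the week value at position i rather than assuming weeks are numbered 1, 2, 3, ...
  output[[i]] <- sum(dat1$calls[dat1$week==weekindex[[i]]])
}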

# Results
output
##  Week 1  Week 2  Week 3  Week 4  Week 5  Week 6  Week 7  Week 8  Week 9 Week 10 
##     879     968     932    1018    1009     880    1116     958     959     914

We can verify these results by using a group_by() and summarize() statement:

dat1 %>% 
  group_by(week) %>% 
  summarize(totalcalls = sum(calls))
## # A tibble: 10 × 2
##     week totalcalls
##    <int>      <int>
##  1     1        879
##  2     2        968
##  3     3        932
##  4     4       1018
##  5     5       1009
##  6     6        880
##  7     7       1116
##  8     8        958
##  9     9        959
## 10    10        914

As expected, the results match!


For Loops & Indexing by Name

Instead of using a numeric index, we can also iterate over our input elements using names.


Example: Calculate the min, max, mean, median, and sd of sales for each manager by name:

# Index of Manager Names
mgrnames <- unique(dat1$mgr)

# Test what the name index sequence looks like
mgrnames
## [1] "Bob"   "Sally" "Omar"

# Function to compute our 5 Stats
mystats <- function(x) {
  min_sales <- min(x)
  mean_sales <- mean(x)
  median_sales <- median(x)
  max_sales <- max(x)
  sd_sales <- sd(x)
  
  list(min_sales = min_sales,
       mean_sales = mean_sales,
       median_sales = median_sales,
       max_sales = max_sales,
       sd_sales = sd_sales)
}

# Initialize Output list
output <- vector(mode = "list", length = length(mgrnames))
names(output) <- mgrnames

# Loop -> For each manager, calculate their sales stats
for(i in mgrnames){
  output[[i]] <- mystats(dat1$sales[dat1$mgr==i])
}

# Unlist results and round
round(unlist(output),2) 
##      Bob.min_sales     Bob.mean_sales   Bob.median_sales      Bob.max_sales 
##               0.00               4.95               5.00              10.00 
##       Bob.sd_sales    Sally.min_sales   Sally.mean_sales Sally.median_sales 
##               3.22               0.00               6.25               7.00 
##    Sally.max_sales     Sally.sd_sales     Omar.min_sales    Omar.mean_sales 
##              10.00               3.08               1.00               5.00 
##  Omar.median_sales     Omar.max_sales      Omar.sd_sales 
##               5.50               9.00               2.91
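
The unlisted vector is long to scan. One possible alternative, sketched here with a made-up stand-in list (`stats_by_mgr`, with hypothetical managers "A" and "B" and invented values), is to simplify the list into a matrix with one column per manager.

```r
# Hypothetical stand-in for a list of per-manager stats (values are made up)
stats_by_mgr <- list(
  A = list(min_sales = 0, max_sales = 10),
  B = list(min_sales = 1, max_sales = 9)
)

# unlist() turns each manager's stats into a named numeric vector;
# sapply() then binds those vectors into a matrix, one column per manager
stats_table <- sapply(stats_by_mgr, unlist)
stats_table
```

Applied to the real output list, the same call would be round(sapply(output, unlist), 2).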

Nested For Loops

Nested for loops consist of one loop nested within another, and are used when you wish to iterate over multiple sequences or dimensions.


Structure for Nested For Loops

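The components of the nested for loop are the same as in the single loop, with a couple of modifications: the output needs one dimension per sequence (a matrix in the example below), and each loop has its own sequence and index variable.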
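
A rough sketch of a nested loop on made-up inputs follows. The names `rows`, `cols`, and `products` are hypothetical and not part of the sample data. The output is a matrix, and each loop has its own index variable.

```r
# Hypothetical sequences, for illustration only
rows <- 1:3
cols <- 1:2

# Output: a matrix with one dimension per sequence
products <- matrix(NA_real_, nrow = length(rows), ncol = length(cols))

# The outer loop moves across columns; the inner loop moves down rows
for (j in seq_along(cols)) {
  for (i in seq_along(rows)) {
    products[i, j] <- rows[[i]] * cols[[j]]
  }
}

products
```

The inner loop runs to completion for every step of the outer loop, so the body executes length(rows) * length(cols) times in total.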


Example: For each manager and each week, calculate the average number of sales made

weekindex <- unique(dat1$week)
names(weekindex) <- paste("Week",weekindex)

mgr_index <- unique(dat1$mgr)
names(mgr_index) <- paste("Mgr",mgr_index)

outmat <- matrix(nrow = length(weekindex),ncol = length(mgr_index))
colnames(outmat) <- names(mgr_index)
row.names(outmat) <- names(weekindex)

for(j in seq_along(mgr_index)){
  for(i in seq_along(weekindex)){
    outmat[i,j] <- mean(dat1$sales[dat1$mgr==mgr_index[[j]] & dat1$week==weekindex[[i]]])
  }
}

outmat
##         Mgr Bob Mgr Sally Mgr Omar
## Week 1      8.5       8.5        9
## Week 2      3.5       7.0        3
## Week 3      6.0       7.0        3
## Week 4      6.5       3.5        8
## Week 5      1.0       6.0        6
## Week 6      9.5       4.0        1
## Week 7      5.0       9.5        8
## Week 8      2.5       5.0        1
## Week 9      3.0       5.0        5
## Week 10     4.0       7.0        6

Let’s quickly validate the results:

dat1 %>% 
  group_by(mgr,week) %>% 
  summarize(avg_sales = mean(sales)) %>% 
  pivot_wider(names_from = "mgr",
              values_from = "avg_sales")
## `summarise()` has grouped output by 'mgr'. You can override using the `.groups`
## argument.

## # A tibble: 10 × 4
##     week   Bob  Omar Sally
##    <int> <dbl> <dbl> <dbl>
##  1     1   8.5     9   8.5
##  2     2   3.5     3   7  
##  3     3   6       3   7  
##  4     4   6.5     8   3.5
##  5     5   1       6   6  
##  6     6   9.5     1   4  
##  7     7   5       8   9.5
##  8     8   2.5     1   5  
##  9     9   3       5   5  
## 10    10   4       6   7

Great! The results match!


Thanks for reading!