For Loops in R

Overview

This tutorial covers how to use for() loops to conduct iterative data analysis steps.

Sample Data

Let’s first create a sample dataset consisting of 10 weeks’ worth of sales data for 5 different sales employees:

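# Packages used throughout: tibble(), the %>% pipe and dplyr verbs, pivot_wider()
library(tibble)
library(dplyr)
library(tidyr)

set.seed(999)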

dat1 <- tibble(emp_id = rep(c(100101,102423,105043,102341,111302),
                            rep(10, 5)),
               week = rep(1:10,5),
               sales = sample(0:10,50,T),
               calls = sample(100:300,50,T),
               mgr = c(rep("Bob",20),rep("Sally",20),rep("Omar",10)))

dat1 %>% 
  arrange(week,desc(sales)) %>% 
  head(5)

## # A tibble: 5 × 5
##   emp_id  week sales calls mgr  
##    <dbl> <int> <int> <int> <chr>
## 1 100101     1    10   186 Bob  
## 2 105043     1     9   222 Sally
## 3 111302     1     9   223 Omar 
## 4 102341     1     8   132 Sally
## 5 102423     1     7   116 Bob

For Loops

A for loop executes a block of code once for each item in a sequence, so the number of iterations is known before the loop starts.

A for loop runs until the last item in the pre-specified sequence has been processed, unless it is exited early (e.g. with break).
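A small sketch of the "known in advance" property, using a hypothetical character vector `items`: R evaluates the sequence once when the loop starts (as described in the `?Control` help page), so growing the vector inside the body does not add iterations.

```r
# Hypothetical input vector
items <- c("a", "b", "c")
n_iter <- 0

for (item in items) {
  n_iter <- n_iter + 1
  # Appending to `items` here does not change the sequence the loop already captured
  items <- c(items, "extra")
}

n_iter         # 3 iterations, one per original element
length(items)  # 6: the vector grew, but the loop did not
```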

For Loop Structure

A for loop has three components:

output: pre-allocated space to store the loop's results, e.g. output <- vector()
sequence: the sequence/index to iterate over, e.g. i in seq_along(data)
body: the statement executed during each iteration, e.g. output[[i]] <- max(data[[i]])
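The three components can be seen together in a minimal sketch; `data` here is a hypothetical list of numeric vectors, not part of the sales dataset.

```r
# Hypothetical input: a list of numeric vectors
data <- list(a = c(3, 9, 1), b = c(4, 2), c = c(7, 7, 8))

# 1. output: pre-allocate one slot per input element
output <- vector(mode = "double", length = length(data))

# 2. sequence: positions 1, 2, 3 generated by seq_along()
for (i in seq_along(data)) {
  # 3. body: store the maximum of the i-th element in the i-th slot
  output[[i]] <- max(data[[i]])
}

output  # 9 4 8
```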

Conceptually: for(each value in sequence){compute a result and store it in output}

For loops can be simple (single) loops or nested (multiple) loops.

Single For Loop - Numeric Indexing

A simple for loop iterates over a single sequence. The most common way to iterate over the input elements is with a numeric index, typically generated by seq_along().

e.g. Sequence of Sales Weeks: seq_along(unique(dat1$week))
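One reason seq_along() is usually preferred over 1:length(): on a zero-length input, 1:length() produces c(1, 0), so the loop body would run twice on elements that do not exist, whereas seq_along() produces an empty sequence and the body is skipped. A quick check with a hypothetical empty vector:

```r
empty_weeks <- integer(0)   # hypothetical: no weeks of data

seq_along(empty_weeks)      # integer(0)
1:length(empty_weeks)       # 1 0

# Count how many times each style would run the loop body
runs_seq_along <- 0
for (i in seq_along(empty_weeks)) runs_seq_along <- runs_seq_along + 1

runs_colon <- 0
for (i in 1:length(empty_weeks)) runs_colon <- runs_colon + 1

c(seq_along = runs_seq_along, colon = runs_colon)  # 0 and 2
```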

Example: Calculate the total number of sales calls made for each week

Note: When writing a loop, it can be helpful to restate your goal in “loop format”.

e.g. “For each week in weeks 1:10, add up the total calls made that week”

# Define sequence index of weeks
weekindex <- unique(dat1$week)

# Test what the numeric index sequence looks like
seq_along(weekindex)

##  [1]  1  2  3  4  5  6  7  8  9 10

# Initialize output vector & give element names
output <- vector(mode = "double", length = length(weekindex))
names(output) <- paste("Week",weekindex)

# Write loop: match each row's week against the i-th week value
for(i in seq_along(weekindex)){
  output[[i]] <- sum(dat1$calls[dat1$week == weekindex[[i]]])
}

# Results
output

##  Week 1  Week 2  Week 3  Week 4  Week 5  Week 6  Week 7  Week 8  Week 9 Week 10 
##     879     968     932    1018    1009     880    1116     958     959     914

We can verify these results by using a group_by() and summarize() statement:

dat1 %>% 
  group_by(week) %>% 
  summarize(totalcalls = sum(calls))

## # A tibble: 10 × 2
##     week totalcalls
##    <int>      <int>
##  1     1        879
##  2     2        968
##  3     3        932
##  4     4       1018
##  5     5       1009
##  6     6        880
##  7     7       1116
##  8     8        958
##  9     9        959
## 10    10        914

As expected, the results match!
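Whether the loop compares each row's week against the position i or against the week value weekindex[[i]] makes no difference in this dataset, because the weeks are numbered 1 to 10 in order. The two diverge as soon as the labels do not start at 1, as this sketch with a hypothetical two-week data frame shows:

```r
# Hypothetical data whose week labels are 5 and 6, not 1 and 2
toy <- data.frame(week = c(5, 5, 6, 6), calls = c(10, 20, 30, 40))
weeks <- unique(toy$week)

by_position <- numeric(length(weeks))
by_value <- numeric(length(weeks))

for (i in seq_along(weeks)) {
  # Compares week to 1 and 2: no rows match, so each sum is 0
  by_position[[i]] <- sum(toy$calls[toy$week == i])
  # Compares week to 5 and 6: the intended totals
  by_value[[i]] <- sum(toy$calls[toy$week == weeks[[i]]])
}

by_position  # 0 0
by_value     # 30 70
```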

For Loops & Indexing by Name

Instead of using a numeric index, we can also iterate directly over the names of our input elements.

e.g. Sequence of Manager Names: unique(dat1$mgr)
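When the sequence is a character vector, the loop variable holds each name in turn, and [[name]] looks up (or fills) the matching list element. A minimal sketch with a hypothetical named list `scores`:

```r
# Hypothetical per-person scores, stored in a named list
scores <- list(Bob = c(4, 6), Sally = c(9, 7, 8))

# Pre-allocate and name the output so results land in a predictable order
totals <- vector(mode = "list", length = length(scores))
names(totals) <- names(scores)

for (nm in names(scores)) {
  # nm is a string such as "Bob", so [[nm]] indexes by name, not position
  totals[[nm]] <- sum(scores[[nm]])
}

unlist(totals)  # Bob = 10, Sally = 24
```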

Example: Calculate the min, max, mean, median, and sd of sales for each manager by name:

# Index of Manager Names
mgrnames <- unique(dat1$mgr)

# Test what the name index sequence looks like
mgrnames

## [1] "Bob"   "Sally" "Omar"

# Function to compute our 5 Stats
mystats <- function(x) {
  min_sales <- min(x)
  mean_sales <- mean(x)
  median_sales <- median(x)
  max_sales <- max(x)
  sd_sales <- sd(x)
  
  list(min_sales = min_sales,
       mean_sales = mean_sales,
       median_sales = median_sales,
       max_sales = max_sales,
       sd_sales = sd_sales)
}

# Initialize Output list
output <- vector(mode = "list", length = length(mgrnames))
names(output) <- mgrnames

# Loop -> For each manager, calculate their sales stats
for(i in mgrnames){
  output[[i]] <- mystats(dat1$sales[dat1$mgr==i])
}

# Unlist results and round
round(unlist(output),2)

##      Bob.min_sales     Bob.mean_sales   Bob.median_sales      Bob.max_sales 
##               0.00               4.95               5.00              10.00 
##       Bob.sd_sales    Sally.min_sales   Sally.mean_sales Sally.median_sales 
##               3.22               0.00               6.25               7.00 
##    Sally.max_sales     Sally.sd_sales     Omar.min_sales    Omar.mean_sales 
##              10.00               3.08               1.00               5.00 
##  Omar.median_sales     Omar.max_sales      Omar.sd_sales 
##               5.50               9.00               2.91
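The unlisted vector is long and packs both the manager and the statistic into each name. If a table with one row per manager is easier to read, one option (a sketch, not the only approach) is to convert each inner list to a one-row data frame and stack them with do.call(rbind, ...). The values below are hypothetical placeholders shaped like the loop's output list, not the actual results:

```r
# Hypothetical list shaped like the per-manager output above
output <- list(
  Bob  = list(min_sales = 0, mean_sales = 4.5, max_sales = 9),
  Omar = list(min_sales = 1, mean_sales = 5.0, max_sales = 9)
)

# Turn each manager's list into a one-row data frame, then stack the rows
stats_df <- do.call(rbind, lapply(output, as.data.frame))
stats_df
```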

Nested For Loops

Nested for loops place one loop inside another and are used when you need to iterate over multiple sequences or dimensions.

Most nested loops are double loops (one loop inside another), but you can nest as many levels as the problem requires.

Structure for Nested For Loops

The components of the nested for loop are the same as in the single loop, with a couple of modifications:

sequence: multiple sequences instead of a single one
- for(i in seq_along(x)){ for(j in seq_along(y)){ ... } }
output: either a vector of length length(x) * length(y), or a matrix/array
- output <- matrix(nrow = length(x), ncol = length(y))
- output <- vector(length = length(x) * length(y))
body: executes over both indices
- output[i, j] <- x[[i]] * y[[j]]
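Putting those pieces together, a minimal sketch with two hypothetical numeric vectors x and y fills a matrix with every pairwise product; the flat-vector variant stores the same values row by row at position (i - 1) * length(y) + j. Base R's outer(x, y) computes the same matrix without a loop, which makes a handy check.

```r
# Hypothetical inputs
x <- c(1, 2, 3)
y <- c(10, 20)

# Output option 1: a length(x) by length(y) matrix
outmat <- matrix(nrow = length(x), ncol = length(y))
# Output option 2: a flat vector with length(x) * length(y) slots
flat <- vector(mode = "double", length = length(x) * length(y))

for (i in seq_along(x)) {        # outer loop: rows
  for (j in seq_along(y)) {      # inner loop: columns
    outmat[i, j] <- x[[i]] * y[[j]]
    flat[[(i - 1) * length(y) + j]] <- x[[i]] * y[[j]]
  }
}

outmat
flat  # 10 20 20 40 30 60
```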

Example: For each manager and each week, calculate the average number of sales made

# Named index of weeks
weekindex <- unique(dat1$week)
names(weekindex) <- paste("Week",weekindex)

# Named index of managers
mgr_index <- unique(dat1$mgr)
names(mgr_index) <- paste("Mgr",mgr_index)

# Pre-allocate a weeks x managers matrix for the results
outmat <- matrix(nrow = length(weekindex),ncol = length(mgr_index))
colnames(outmat) <- names(mgr_index)
row.names(outmat) <- names(weekindex)

# Outer loop over managers (columns), inner loop over weeks (rows)
for(j in seq_along(mgr_index)){
  for(i in seq_along(weekindex)){
    outmat[i,j] <- mean(dat1$sales[dat1$mgr==mgr_index[[j]] & dat1$week==weekindex[[i]]])
  }
}

outmat

##         Mgr Bob Mgr Sally Mgr Omar
## Week 1      8.5       8.5        9
## Week 2      3.5       7.0        3
## Week 3      6.0       7.0        3
## Week 4      6.5       3.5        8
## Week 5      1.0       6.0        6
## Week 6      9.5       4.0        1
## Week 7      5.0       9.5        8
## Week 8      2.5       5.0        1
## Week 9      3.0       5.0        5
## Week 10     4.0       7.0        6
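Every manager/week combination in dat1 has at least one row, but if a combination were missing, mean() of the empty subset would return NaN. One way to guard against that, sketched here with a hypothetical toy data frame in which Sally has no week-2 row, is to check the subset before averaging and record NA instead:

```r
# Hypothetical data: Sally has no row for week 2
toy <- data.frame(mgr = c("Bob", "Bob", "Sally"),
                  week = c(1, 2, 1),
                  sales = c(4, 6, 8))
weeks <- sort(unique(toy$week))
mgrs <- unique(toy$mgr)

outmat <- matrix(nrow = length(weeks), ncol = length(mgrs),
                 dimnames = list(paste("Week", weeks), mgrs))

for (j in seq_along(mgrs)) {
  for (i in seq_along(weeks)) {
    rows <- toy$mgr == mgrs[[j]] & toy$week == weeks[[i]]
    # Record NA for empty combinations instead of letting mean() return NaN
    outmat[i, j] <- if (any(rows)) mean(toy$sales[rows]) else NA_real_
  }
}

outmat
```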

Let’s quickly validate the results:

dat1 %>% 
  group_by(mgr,week) %>% 
  summarize(avg_sales = mean(sales), .groups = "drop") %>% 
  pivot_wider(names_from = "mgr",
              values_from = "avg_sales")

## # A tibble: 10 × 4
##     week   Bob  Omar Sally
##    <int> <dbl> <dbl> <dbl>
##  1     1   8.5     9   8.5
##  2     2   3.5     3   7  
##  3     3   6       3   7  
##  4     4   6.5     8   3.5
##  5     5   1       6   6  
##  6     6   9.5     1   4  
##  7     7   5       8   9.5
##  8     8   2.5     1   5  
##  9     9   3       5   5  
## 10    10   4       6   7

Great! The results match!
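As an additional cross-check that needs no packages, base R's tapply() can average sales over every week/manager combination in one call. It is shown here on a hypothetical toy data frame; applied to dat1 as tapply(dat1$sales, list(dat1$week, dat1$mgr), mean), it should give the same averages as outmat, with unprefixed labels and the manager columns in alphabetical order.

```r
# Hypothetical data with two managers and two weeks
toy <- data.frame(mgr = c("Bob", "Bob", "Sally", "Sally"),
                  week = c(1, 2, 1, 2),
                  sales = c(4, 6, 8, 2))

# Rows indexed by week, columns by manager; each cell is mean(sales)
avg_tbl <- tapply(toy$sales, list(toy$week, toy$mgr), mean)
avg_tbl
```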

Thanks for reading!