Lecture Goals

  1. Learn how to create “for loops” to run an action over an index variable
  2. Understand how to create “design matricies” using expand.grid to make loops simpler
  3. Create multiple plots in a grid easily using a loop

In many analysis tasks, you will want to perform some calculation many times - perhaps over one or more index variables. To accomplish this, we may wish to use a for loop.

A for-loop has two elements, an index that is looped over, and an action that is performed for each index value.

A simple loop

Let’s create a vector containing the integers from 1 to 10 using a simple loop:

Result.Vec <- rep(NA, 10) # Create an initial vector of 10 NA's

for (i in 1:10) {  # Define the index
  
  Result.Vec[i] <- i   # Define the action
                       # In this case, assign the value i to ith element of Result.Vec 
  
  }  # Close the loop

Result.Vec # Show the result
##  [1]  1  2  3  4  5  6  7  8  9 10

Of course, this is a tedius way to calculate the integers 1 to 10 - we could have acheived the same result by using any one of the following much simpler functions:

seq(1, 10, 1)
1:10

For this example, we started by creating a place-holder vector filled with NAs. We then updated each element of the place-holder vector for each index value. While you don’t always need to do this, it’s often a good idea to set up the result ahead of time using a place-holder.

Using a loop to simulate p-values

Let’s do a more complicated loop and use it to test the definition of p-values. We all know that, by definition, a p-value is the probabily of getting a certain test statistic given that the null-hypothesis is false. Let’s conduct a little simulation to confirm this. For this simulation, I will generate 10,000 random samples of size 10 from the standard normal distribution (which has a mean of 0 and standard deviation of 1). For each sample, I will calculate a one-sample t.test on the sample mean, comparing it to the null hypothesis of 0. According to what our statistics teachers told us, we should see p-values less than .05 only 5% of the time:

N.Sim <- 10000  # Number of simulations

p.values <- rep(NA, N.Sim) # Place-holder vector to update with simulation results

for (i in 1:N.Sim) { # Define the action for each loop
  
  Sample.Data <- rnorm(10, mean = 0, sd = 1) # Generate 10 observations from the standard normal dist
  Test <- t.test(Sample.Data) # Conduct a one-sample t.test on the sample data
  Current.p.value <- Test$p.value # Extract the p-value from the test
  
  p.values[i] <- Current.p.value # Add the p-value to the p.values vector
  
  } # End Loop
  

p.values.lt05 <- p.values <= .05 # Create a new vector indicating which p.values are less than .05
Final.Result <- mean(p.values.lt05)  # Final result! Should be close to .05
Final.Result
## [1] 0.0459

I got a final result of 0.0459, very close to .05 - so it looks like our prediction did indeed hold up!

Looping over many values using a Design Matrix

But what if you want to vary more than one variable in a loop? For example, what if we wanted to test whether or not our result depends on the sample size? Maybe larger sample sizes are more or less likely to result in a p-value less than .05 than a sample size of 10. To test this, let’s repeat the simulation, but apply it to several sample sizes.

Design matrix: Keeping loops simple

Because this simulation requires me to loop over two variables (Simulation number and Sample Size), I will create a Design Matrix called Design.mtx. The purpose of this matrix is to allow me to use a single loop instead of two loops. I do this by creating a matrix with two columns, one for each looping variable, that contains all combinations of the looping variables. The easiest way to do this is with the expand.grid function. To see how the expand.grid function works, let’s create a matrix with all combinations of the letters A, B, C, and the numbers 1, 2 and 3:

Design.mtx <- expand.grid(
            "Letter" = c("A", "B", "C"),
            "Number" = 1:3
  )

Design.mtx
##   Letter Number
## 1      A      1
## 2      B      1
## 3      C      1
## 4      A      2
## 5      B      2
## 6      C      2
## 7      A      3
## 8      B      3
## 9      C      3

Now, if I want to run a simulation on all combinations of Letter and Number, I can do a loop over the rows of the Design Matrix, and extract the Letter and Number corresponding to each row. This makes code much simpler than if we separately conducted a loop over letters and numbers!

Now let’s start the new p-value simulation using a Design Matrix for Sample Size and Simulation:

N.Sim <- 10000 # This is the number of simulations for each SS

# Create the design matrix using expand.grid
#   which gives us all combinations of two vectors. We will run our loop over
#   all rows of this Design matrix

Design.mtx <- expand.grid(
                          "Sample.Size" = c(10, 20, 50, 100, 1000), 
                          "Sim" = 1:N.Sim
                          )

# Calculate the total number of simulation runs in the Design matrix.
#   It should be the product of the length of the vectors in expand.grid()

Total.Runs <- nrow(Design.mtx)

# Set up the result vector

p.values <- rep(NA, Total.Runs)

for (i in 1:Total.Runs) {# Define the action for each loop
  
  Current.Sample.Size <- Design.mtx$Sample.Size[i] # Get the sample size for the current run
  
  Sample.Data <- rnorm(Current.Sample.Size, mean = 0, sd = 1) 
      # Generate [Current.Sample.Size] observations from the standard normal dist
  
  Test <- t.test(Sample.Data) # Conduct a one-sample t.test on the sample data
  Current.p.value <- Test$p.value # Extract the p-value from the test
  
  p.values[i] <- Current.p.value # Add the p-value to the p.values vector
  
  }

Design.mtx$p.values <- p.values # Add the final p.values vector to the dataframe
Design.mtx$p.values.lt05 <- p.values <= .05 # Determine which values are less than .05

Final.Result <- with(Design.mtx, tapply(p.values.lt05, Sample.Size, mean)) # Calculate the proportion
 # of values that are less than .05 for each sample size

Final.Result
##     10     20     50    100   1000 
## 0.0516 0.0483 0.0493 0.0519 0.0527

The vector Final.Result shows the final proportions for each sample size, and they all look very close to .05. So, it looks like the total sample size does not affect our interpretation of a p.value!

Using a loop to generate multiple plots

You can use a loop to do more than conduct calculations. You can use them to easily generate multiple plots. For example, let’s say you ran a study where 25 participants responded to 100 problems. After collecting the data, you want to graph the distribution of response times for each participant. We can easily do this with a loop:

First, I’ll generate some random response time data (using a loop!)

Response.Times <- expand.grid(
                              "Participant" = 1:9,
                              "Response" = rep(NA, 100)
  )

True.Rates <- runif(9, 0, 1) # Here are the rates for each participant

for (i in 1:nrow(Response.Times)) {
  
  Current.Participant <- Response.Times$Participant[i]  # What is the current participant?
  Current.Participant.Rate <- True.Rates[Current.Participant] # What is this participant's true response rate?
  
  Response.Times$Response[i] <- rexp(1, Current.Participant.Rate) # Generate a single response time

}

Now that I have the respons times, I’ll create a histogram for each participant. I’ll get a bit fancy by addiing the participant number to the plot head as well as the participant’s median response time.

par(mfrow = c(3, 3)) # Create a 3x3 matrix of plotting regions

for (i in 1:9) { # Loop over participants
  
  Data.i <- Response.Times$Response[Response.Times$Participant == i] # Get response times for current participant
  Median.i <- round(median(Data.i), 2)
  
  hist(Data.i, 
       xlim = c(0, 40), 
       main = paste("Participant ", i, "\nMedian = ", Median.i, sep = "")
       ) # Generate histogram
  
  abline(v = Median.i, lwd = 2, col = "red") # Add vertical line at the median
  
  }

plot of chunk unnamed-chunk-8

Here are a few last tips for using loops:

  1. If your loop is very simple (like listing numbers 1 through 10), check if there is a built in R function that can do it with less code (e.g.; tapply())
  2. Before running your loop, create a place-holder vector or matrix that stores the results of each iteration of your loop.
  3. If you have several index variables, try to avoid using a loop for each index variable. Instead, consider using a Design matrix that contains all combinations of your index variables. Then do a single loop over the rows of the Design Matrix.
  4. If your loop takes a long time, you may want R to tell you which iteration it is on by including a command like the following:
paste("Current Iteration is ", i, "out of 1000")