loop walkthrough

I was asked about a loop I wrote, so this documents my line-by-line thinking while writing it.

data(iris)
trial_size <- 200
collected_results <- numeric(trial_size)
for (i in 1:trial_size){
  single_function_time <- system.time(cor(iris$Sepal.Width,iris$Sepal.Length))
  collected_results[i] <- single_function_time[1]
}
print(mean(collected_results))

Line 1

data(iris)

My thinking: I like the iris data set and tend to use it in any example I can.

Line 2

trial_size <- 200

My thinking: Create a numeric variable called trial_size so that the size of the trial can be controlled from one place, since it is used in a couple of places further down.

Line 3

collected_results <- numeric(trial_size)

My thinking: For each basic type of thing in R, you can create a new vector of that type with type(howmany), here numeric(trial_size). Think of it as creating a line of 200 boxes that can only hold numbers, each starting out as 0. I could also have created the right number of boxes with collected_results <- 1:trial_size, but then the boxes would already hold the numbers 1 to 200, and if something later goes wrong when filling the boxes, it is easier to spot when every box started out with the same placeholder value.
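To make the "boxes" idea concrete, here is a small sketch comparing ways of creating the vector up front. The names zeros, counted and empties are mine, for illustration only, and the rep(NA_real_, ...) variant is an alternative I am assuming you might want, not something the original loop uses.

```r
trial_size <- 5  # small value, just for illustration

zeros   <- numeric(trial_size)         # five boxes, each holding 0
counted <- 1:trial_size                # five boxes holding 1, 2, 3, 4, 5
empties <- rep(NA_real_, trial_size)   # five boxes holding NA (numeric "missing")

print(zeros)
print(counted)
print(empties)

# With NA as the starting value, boxes that were never filled are easy to count
empties[1:3] <- c(0.01, 0.02, 0.01)    # hypothetical values, not real timings
print(sum(is.na(empties)))             # number of boxes still unfilled
```

Starting from NA rather than 0 may be worth considering for timings in particular, since a very fast call can legitimately report 0 and would then look the same as a box that was never filled.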

Line 4ish

for (i in 1:trial_size){
    ...
    }

My thinking: A for loop repeats something many times. In this case it counts up from 1 to trial_size, keeps track of the current number by setting i to it, and runs the code inside the braces once for each number.
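A minimal sketch of what the loop counter does, using a small trial_size so the output is short. The second half shows seq_len(), which is not used in the original loop; I include it only as a commonly suggested alternative to 1:n, because the two behave differently when n is 0. The name empty_size is hypothetical.

```r
trial_size <- 3

# i takes the values 1, 2, 3 in turn; the body runs once per value
for (i in 1:trial_size) {
  cat("step", i, "\n")
}

# Edge case: with a size of 0, 1:n counts backwards instead of doing nothing
empty_size <- 0
print(1:empty_size)         # the two values 1 and 0
print(seq_len(empty_size))  # an empty integer vector, so a loop over it never runs
```

For a fixed trial_size of 200 the two forms give the same sequence, so this only matters if the size could ever be 0.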

Line 5

single_function_time <- system.time(cor(iris$Sepal.Width,iris$Sepal.Length))

My thinking: Let’s take one reading of system.time() on the function call and store the result.

Line 6

collected_results[i] <- single_function_time[1]

My thinking: I read the help for system.time with ?system.time and came away thinking that the user time is the most reliable figure on older systems that are short of RAM and doing a lot of virtual memory swapping or similar. I then used str(single_function_time) to poke around inside the system.time object and see what was where. The first thing in it was the user time, hence single_function_time[1]. This value is assigned to the position in the collected_results vector that matches the step of the loop we are up to. Technically, collected_results[i] subsets collected_results down to the single element at position i, and the assignment replaces that element with the value.
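The poking around described above can be sketched like this. The names by_position and by_name are mine, for illustration; indexing by the name "user.self" is an alternative to the positional [1] used in the loop, not what the original code does.

```r
data(iris)

# One timing of the same call as in the loop
single_function_time <- system.time(cor(iris$Sepal.Width, iris$Sepal.Length))

str(single_function_time)           # structure of the timing object
print(names(single_function_time))  # labels of the elements, in order

# The same element fetched two ways
by_position <- single_function_time[1]
by_name     <- single_function_time["user.self"]
print(by_position)
print(by_name)
```

Indexing by name says what is being extracted without the reader needing to remember the order of the elements.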

Line 7

print(mean(collected_results))

My thinking: Let’s get a representative number by using mean(). Having looked at a graph of the timings with plot(collected_results), I would probably use median() in future, as it would better represent the typical time.
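To show why the median can be the better summary here, this sketch uses a made-up vector of timings in which most readings are small and one is much larger. The numbers in example_timings are hypothetical, chosen only to illustrate the effect, and are not measurements from the loop.

```r
# Hypothetical timings in seconds: four quick readings and one slow outlier
example_timings <- c(0.001, 0.001, 0.001, 0.001, 0.020)

print(mean(example_timings))    # pulled upwards by the single slow reading
print(median(example_timings))  # stays at the common value

# A quick look at the spread, similar in spirit to plot(collected_results)
print(summary(example_timings))
```

One slow reading is enough to move the mean well away from the value that most runs actually took, while the median is unaffected.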