Introduction to Loops in R

A for(i in x) loop is a straightforward way to execute a block of code repeatedly, once for each element of x. In this recitation, we will (1) execute basic loops, (2) create basic functions, and (3) demonstrate the Law of Large Numbers and the Central Limit Theorem using loops.

Relevant functions: set.seed(), rnorm(), for(i in x), sample(), cat(), print(), replicate(), sqrt(), prod().
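
As a minimal sketch of how for(i in x) walks through a vector, the loop below visits each element of a small vector in order and keeps a running total. The names `x` and `total` are chosen only for this illustration.

```r
x <- c(2, 4, 6)   # Hypothetical vector to loop over
total <- 0        # Running total, updated at each iteration

for (i in x) {    # i takes the values 2, 4 and 6 in turn
  total <- total + i
  cat("i is", i, "and the running total is", total, "\n")
}
```

The loop body runs once per element of `x`, so the number of iterations equals `length(x)`.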

1. Generating Data

1.1 Drawing a random sample from a normal distribution using `rnorm()`

set.seed(150) # Setting the seed for replication purposes

myData <- rnorm(1000, mean = 45, sd = 15) # Drawing 1000 observations from a normal distribution (mean=45, sd=15)

1.2 Performing sanity checks on myData using `length()`, `mean()` and `sd()`

length(myData) # How many observations?

## [1] 1000

mean(myData) # What is the mean?

## [1] 44.52407

sd(myData) # What is the standard deviation?

## [1] 14.85105

1.3 Graphing myData
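
The plotting code for this step is not shown above. One possible sketch, assuming a base R histogram is what is wanted, recreates myData and plots it with `hist()`. The axis label, title and dashed mean line are illustrative choices, not the original figure.

```r
set.seed(150)                               # Same seed as in section 1.1
myData <- rnorm(1000, mean = 45, sd = 15)   # Recreating myData so this block runs on its own

hist(myData,
     xlab = "Value",                        # Labeling the x axis
     main = "Histogram of myData",          # Plot title (illustrative)
     col = "#4286f4")                       # Same custom color as used later in section 3.3
abline(v = mean(myData), lwd = 2, lty = 2)  # Dashed vertical line at the sample mean
```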

2. Basic Loops with `for(i in x)`

2.1 Printing 5 observations from myData using `sample()` and `print()`

set.seed(300) # Setting the seed for replication purposes

for (i in 1:5) # Specifying the number of iterations 
{
    obs <- sample(myData,size=1) # Sampling one observation from the myData vector and storing it into the "obs" object
    print(obs) # Printing the value of that "obs" object
    cat("I have finished", i,"iterations \n") # Printing a string of characters after each iteration
}

## [1] 51.39313
## I have finished 1 iterations 
## [1] 46.5734
## I have finished 2 iterations 
## [1] 58.46658
## I have finished 3 iterations 
## [1] 46.90793
## I have finished 4 iterations 
## [1] 26.78559
## I have finished 5 iterations

2.2 Creating a vector with the square root of 5 observations from myData using `sample()` and `sqrt()`

set.seed(300) # Setting the seed for replication purposes

results <- c() # Creating an empty vector to hold the results

for (i in 1:5) # Specifying the number of iterations 
{
    obs <- sample(myData,size=1) # Sampling one observation from the myData vector and storing it into the "obs" object
    results[i] <- sqrt(obs) # Calculating the square root of that "obs" object and storing it into the "results" vector
    cat("The square root of", obs, "is", results[i],"\n") # Printing a string of characters after each iteration
}

## The square root of 51.39313 is 7.1689 
## The square root of 46.5734 is 6.824471 
## The square root of 58.46658 is 7.646344 
## The square root of 46.90793 is 6.848937 
## The square root of 26.78559 is 5.175479

summary(results) # Using summary as a sanity check

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.175   6.824   6.849   6.733   7.169   7.646
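
Starting from an empty vector with `c()` works, but R has to grow `results` at every iteration. A common alternative, sketched here under the assumption that the number of iterations is known in advance, is to preallocate the vector with `numeric()`. The name `n_iter` is introduced only for this example.

```r
set.seed(150)
myData <- rnorm(1000, mean = 45, sd = 15)  # Recreating myData so this block runs on its own

set.seed(300)                              # Same seed as in section 2.2
n_iter <- 5                                # Number of iterations (hypothetical helper name)
results <- numeric(n_iter)                 # Preallocating a numeric vector of length 5, filled with zeros

for (i in seq_len(n_iter)) {
  obs <- sample(myData, size = 1)          # Sampling one observation from myData
  results[i] <- sqrt(obs)                  # Overwriting the i-th slot with its square root
}

results
```

For five iterations the difference is negligible. Preallocation mainly matters for loops with many iterations.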

2.3 Creating a vector with the sum of 5 pairs of observations from myData using `sample()` and `sum()`

set.seed(300) # Setting the seed for replication purposes

results <- c() # Creating an empty vector to hold the results

for (i in 1:5)
{
    obs <- sample(myData,size=2) # Sampling two observations from the myData vector and storing them in the "obs" object
    results[i] <- sum(obs) # Calculating the sum of the two elements of "obs" and storing it in the "results" vector
    cat("The sum of", obs[1], "and", obs[2], "is", results[i],"\n") # Printing a string of characters after each iteration
}

## The sum of 51.39313 and 42.72983 is 94.12297 
## The sum of 58.46658 and 46.90793 is 105.3745 
## The sum of 26.78559 and 44.83856 is 71.62414 
## The sum of 41.26901 and 26.99162 is 68.26063 
## The sum of 68.06239 and 43.11086 is 111.1733

summary(results) # Using summary as a sanity check

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   68.26   71.62   94.12   90.11  105.37  111.17
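
The same computation can also be written with `replicate()`, which is used later in section 3.3. The sketch below runs the loop and the `replicate()` call from the same seed. Because both draw random numbers in the same order, the two result vectors should match. The names `loop_results` and `rep_results` are chosen only for this comparison.

```r
set.seed(150)
myData <- rnorm(1000, mean = 45, sd = 15)   # Recreating myData so this block runs on its own

set.seed(300)
loop_results <- numeric(5)                  # Results from the explicit loop
for (i in 1:5) {
  loop_results[i] <- sum(sample(myData, size = 2))
}

set.seed(300)                               # Resetting the seed so both versions see the same random draws
rep_results <- replicate(5, sum(sample(myData, size = 2)))  # Re-evaluating the expression 5 times

all.equal(loop_results, rep_results)        # Comparing the two result vectors
```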

3. Law of Large Numbers and Central Limit Theorem

3.1 Creating a `die_mean()` function to calculate the mean of `n` die rolls

die <- 1:6 # Creating a "die" vector with all numbers from 1 to 6 

die_mean <- function(n) {
  mean(sample(die, size = n, replace = TRUE)) # Sampling with replacement "n" number of times from the following vector: c(1,2,3,4,5,6)
}
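
A short usage sketch may help before the function is used inside a loop. The seed below is an arbitrary choice for this illustration. The last line computes the expected value of a single roll, which the simulated means can be compared against.

```r
die <- 1:6                                     # Faces of a fair die

die_mean <- function(n) {
  mean(sample(die, size = n, replace = TRUE))  # Mean of n rolls, sampled with replacement
}

set.seed(1)            # Arbitrary seed, used only for this illustration
one_roll <- die_mean(1)    # With n = 1, the "mean" is just the face that came up
few_rolls <- die_mean(15)  # Mean of 15 rolls: somewhere between 1 and 6

mean(die)              # Expected value of one roll of a fair die: 3.5
```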

3.2 Calculating the mean of `n` die rolls using `die_mean()`

set.seed(500) # Setting the seed for replication purposes

for (n in c(10,100,1000,10000,100000)) # Here, we are using 5 iterations with the values within this vector for "n"
{
    result <- die_mean(n) # Calculating the mean of n die rolls
    cat("The mean of", n, "number of die rolls is",  result,"\n") # Printing a string of characters after each iteration
}

## The mean of 10 number of die rolls is 4.7 
## The mean of 100 number of die rolls is 3.26 
## The mean of 1000 number of die rolls is 3.526 
## The mean of 10000 number of die rolls is 3.485 
## The mean of 1e+05 number of die rolls is 3.49953

The sample mean gets closer to the expected value (here 3.5, the population mean of a fair die) as the sample size increases. This is the Law of Large Numbers.
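
Another way to look at the same idea is to track the running mean of a single long sequence of rolls. This is a sketch under the assumption that 10000 rolls are enough to see the pattern. The names `rolls` and `running_mean` are introduced only for this example.

```r
die <- 1:6                                          # Faces of a fair die

set.seed(500)                                       # Same seed as in section 3.2
rolls <- sample(die, size = 10000, replace = TRUE)  # One long sequence of die rolls
running_mean <- cumsum(rolls) / seq_along(rolls)    # Mean of the first 1, 2, ..., 10000 rolls

plot(running_mean, type = "l",
     xlab = "Number of rolls",
     ylab = "Running mean",
     ylim = c(1, 6))
abline(h = mean(die), lty = 2)                      # Dashed line at the expected value, 3.5
```

The early part of the line fluctuates widely, and it settles near the dashed line as the number of rolls grows.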

3.3 Simulating `i` means of 15 die rolls using `replicate()` and graphing the results using `hist()`

set.seed(100) # Setting the seed for replication purposes

for (i in c(100,1000,100000))
{
  x <- replicate(i, die_mean(15)) # replicate() re-evaluates the expression for each replication (so each one draws new rolls); x holds i means of 15 die rolls
  a <- round(sd(x),digits =3) # Standard deviation of the i simulated means, rounded for the plot title
  hist(x, # Specifying which data will be plotted
       xlab="Mean of 15 die rolls", # Labeling the x axis
       xlim=c(1,6), # Delimiting the x axis
       col="#4286f4", # Check out "color picker" on Google if you want to select a custom color!
       main=paste("Histogram for ", i," random draws (sd = ", a,")", sep="")) # Using the paste command to generate the main plot title
}

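Each histogram shows the sampling distribution of the mean of 15 die rolls. The Central Limit Theorem states that the distribution of the sample mean tends toward a normal distribution as the sample size (here, 15 rolls per mean) increases. Drawing more means (100, 1000 or 100000) does not change that distribution, but it makes its bell shape, centered near 3.5, easier to see.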
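
The standard deviation reported in each histogram title can be checked against theory: for means of 15 independent rolls, the standard error is the population standard deviation of one roll divided by sqrt(15). The sketch below makes that comparison, assuming 10000 replications. The names `pop_sd` and `theoretical_se` are introduced only for this example.

```r
die <- 1:6                                    # Faces of a fair die
die_mean <- function(n) {
  mean(sample(die, size = n, replace = TRUE)) # Mean of n rolls, as in section 3.1
}

set.seed(100)                                 # Same seed as in section 3.3
x <- replicate(10000, die_mean(15))           # 10000 simulated means of 15 rolls

pop_sd <- sqrt(mean((die - mean(die))^2))     # Population sd of one roll: sqrt(35/12)
theoretical_se <- pop_sd / sqrt(15)           # Standard error of the mean of 15 rolls

c(simulated = sd(x), theoretical = theoretical_se)
```

The two values should be close, and the simulated one should approach the theoretical one as the number of replications grows.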

To learn more about graphical parameters in base R, you can check this out: https://www.statmethods.net/advgraphs/parameters.html

Exercises

Exercise 1

Calculate the square root of each integer i from 100 to 115. Print a textual output that states “The square root of i is answer” for each iteration.

Relevant functions: sqrt(), cat().

Exercise 2

Set your seed at 250. Generate a random normal distribution of 5000 observations, with a mean of 20 and a standard deviation of 3. Calculate the product (*) of 10 pairs of observations from this distribution. Print a textual output that states “The product of first value and second value is answer” for each iteration.

Relevant functions: set.seed(), rnorm(), prod(), cat().

Exercise 3

Set your seed at 100. Provide an example of the Central Limit Theorem by generating 4 histograms of the means of 20 coin flips (one for 10 draws, one for 50 draws, one for 100 draws, and one for 1000 draws).

Relevant functions: function(), mean(), replicate(), hist().

Evelyne Brie
