Learning R, Part 3

Nathan Byers
April 21, 2014

Advanced topics

Now that we've covered the basics and some intermediate material, we'll get into more advanced topics:

  • writing our own functions
  • for loops
  • the lapply() function

Writing functions

  • In Part 2 we covered how functions work
  • We used the matrix() function as an example
  • To review how that function works, and what the arguments are, run ?matrix

Writing functions

  • We're going to write our own function that finds the mean of a vector of numbers
  • Since there's already a mean() function in R, we're going to call our function myMean()

Writing functions

  • First, let's create the skeleton of a function
  • We create a function by assigning it to a variable name using <-
myMean <- function(){

}

Writing functions

Since the function will require a vector, we put a variable name for that in the parentheses

myMean <- function(vector){

}

Writing functions

  • Now we put the calculation of the mean inside the curly braces
  • Because we won't know in advance how many values the vector will contain, we use the length() function to count them

Writing functions

For example, here we use length() to find the number of values in this vector

x <- c(1, 4, 3, 8, 2)
length(x)
[1] 5

Writing functions

  • We will also need the sum() function
  • Here we find the sum of the vector we created on the previous slide
sum(x)
[1] 18

Writing functions

Now we have all of the necessary pieces to write our calculation in the curly braces

myMean <- function(vector){
  sum.of.vector <- sum(vector)
  number.of.values <- length(vector)
  average <- sum.of.vector/number.of.values
  average
}

Writing functions

  • The function works by taking the vector that the user provides and running each of the lines inside the curly braces
  • The value of the last line evaluated is what the function returns
  • You can also assign the returned value to another variable
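
As a small sketch of the point about return values, the two functions below should behave identically. The names myMeanImplicit and myMeanExplicit are hypothetical and used only for this illustration. The first relies on the value of its last line, while the second states the same thing with return().

```r
# Implicit return: the value of the last expression is what comes back
# (myMeanImplicit is a hypothetical name for this illustration)
myMeanImplicit <- function(vector){
  sum(vector) / length(vector)
}

# Explicit return: the same calculation, handed back with return()
# (myMeanExplicit is also a hypothetical name)
myMeanExplicit <- function(vector){
  average <- sum(vector) / length(vector)
  return(average)
}

x <- c(1, 4, 3, 8, 2)
same.result <- identical(myMeanImplicit(x), myMeanExplicit(x))
same.result
```

Both styles are common in R code; the implicit form is shorter, while return() can be clearer when a function has several exit points.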

Writing functions

Here we use the x variable we've already created

myMean(x)
[1] 3.6
x.mean <- myMean(x)
x.mean
[1] 3.6
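
One quick way to check a hand-written function is to compare it with the built-in one. The sketch below repeats the myMean() definition from the earlier slide so that it runs on its own, and uses all.equal() to compare the two results within a small numerical tolerance. The name results.match is made up for this example.

```r
# Same definition as on the earlier slide
myMean <- function(vector){
  sum.of.vector <- sum(vector)
  number.of.values <- length(vector)
  average <- sum.of.vector/number.of.values
  average
}

x <- c(1, 4, 3, 8, 2)

# Compare our function with R's built-in mean()
# (results.match is a hypothetical variable name)
results.match <- all.equal(myMean(x), mean(x))
results.match
```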

For loops

  • Like in most programming languages, R has while loops and for loops
  • For loops are more commonly used, so that's what we'll cover
  • But in R, it's actually even more common to use a special family of loop functions, called apply() functions
  • We'll cover one of those in the next section

For loops

  • We use for() in R to repeat a block of code a set number of times
  • The general form looks like this
for(var in seq)
  • where var takes each value of seq in turn, and seq is the vector (often a sequence of numbers) over which the loop iterates
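
A minimal sketch of that general form is shown below. The variable names i and name are arbitrary choices for this illustration. The first loop iterates over the numbers 1:3, and the second shows that the values being looped over can also be character strings.

```r
# Here seq is the vector 1:3, and i plays the role of var
for(i in 1:3){
  print(i * 10)
}

# The values looped over do not have to be numbers
for(name in c("x", "y", "z")){
  print(name)
}
```

After each loop finishes, the loop variable still holds the last value it was given.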

For loops

Let's say we have three vectors, and we want to use our myMean() function on all of them

x <- c(1, 4, 3, 8, 2)
y <- c(10, 43, 8, 15, 29, -4)
z <- c(1.3, 5.6, 7)

For loops

  • Then seq would be the vector c(1, 2, 3) or equivalently 1:3, because we want to run one function 3 times
  • On the first pass var is equal to 1; for() sets this itself, so we don't need to assign it beforehand
  • Here's what the skeleton of the loop looks like so far
for(var in 1:3){

}

For loops

  • But we need a way to refer to each vector by the numbers 1, 2, and 3
  • This is a good place to use the list data type
  • We create a list of the vectors

For loops

vect.list <- list(x, y, z)
vect.list
[[1]]
[1] 1 4 3 8 2

[[2]]
[1] 10 43  8 15 29 -4

[[3]]
[1] 1.3 5.6 7.0

For loops

  • Notice that each vector in the list is indexed by the numbers 1, 2, and 3 shown in the double brackets
  • We'll use that as our index for looping the vectors through our function
  • But we also need to collect our means in a vector
  • We create an empty vector for that, the.means <- c()
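
To illustrate why the loop will use double brackets, here is a short sketch of the difference between [[ ]] and [ ] on a list. The names second.vector and second.sublist are made up for this example.

```r
x <- c(1, 4, 3, 8, 2)
y <- c(10, 43, 8, 15, 29, -4)
z <- c(1.3, 5.6, 7)
vect.list <- list(x, y, z)

# Double brackets pull out the element itself (here, a numeric vector)
second.vector <- vect.list[[2]]

# Single brackets return a list of length 1 that contains the vector
second.sublist <- vect.list[2]

class(second.vector)    # "numeric"
class(second.sublist)   # "list"
```

Because myMean() expects a vector rather than a list, the double-bracket form is the one to pass to it.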

For loops

the.means <- c()
for(var in 1:3){
  the.means[var] <- myMean(vect.list[[var]])
}

For loops

  • On the first pass through the loop, var is 1, so the empty the.means receives its first value: the mean of the first vector in the list
  • We don't need to add 1 to var ourselves; for() assigns the next value of 1:3 to var at the start of each pass
  • On the second pass var has the value 2 and myMean calculates the mean of the second vector in the list, and so on until 1:3 is used up

For loops

Our resulting vector of mean values is

the.means
[1]  3.600 16.833  4.633
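
A common variation on this loop, sketched below as one possible alternative rather than the approach used in these slides, creates the result vector at its final length with numeric() and uses seq_along() so that the loop adapts to however many vectors are in the list. The loop variable i is an arbitrary name.

```r
myMean <- function(vector){
  sum(vector) / length(vector)
}

x <- c(1, 4, 3, 8, 2)
y <- c(10, 43, 8, 15, 29, -4)
z <- c(1.3, 5.6, 7)
vect.list <- list(x, y, z)

# Create a numeric vector of zeros, one slot per list element
the.means <- numeric(length(vect.list))

# seq_along(vect.list) is 1, 2, 3 here, and adjusts if the list changes
for(i in seq_along(vect.list)){
  the.means[i] <- myMean(vect.list[[i]])
}

the.means
```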

The lapply() function

  • Actually, most experienced R programmers would not have used a for loop when presented with our example
  • Applying one function to a list of vectors is an ideal situation for the lapply() function
  • lapply() is one of the (in)famous apply() functions (run ?apply and ?lapply to see descriptions of the many related functions)

The lapply() function

  • lapply() is a simple function that takes two main arguments, plus optional extra arguments that are passed on to the function being applied
  • However, these apply functions are so different from the way most loops are done in other languages that it's sometimes hard to think about your programming problem in apply-like terms
  • Anyway, the help file shows the arguments for lapply as
lapply(X, FUN, ...)
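
The ... in that signature passes any additional arguments on to FUN. As a hedged sketch, the example below applies the built-in mean() to a small made-up list containing missing values (na.list and the result names are hypothetical), first without and then with the extra argument na.rm = TRUE.

```r
# Hypothetical example data with missing values
na.list <- list(c(1, NA, 3), c(4, 5, NA))

# Without extra arguments, each mean is NA
means.with.na <- lapply(na.list, mean)

# na.rm = TRUE is passed through ... to mean()
means.no.na <- lapply(na.list, mean, na.rm = TRUE)

means.with.na
means.no.na
```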

The lapply() function

  • Again, the situation is that you want to apply a function (FUN) to a list of data objects (X)
  • In our situation, we want to apply myMean() to the list vect.list
lapply(vect.list, myMean)
[[1]]
[1] 3.6

[[2]]
[1] 16.83

[[3]]
[1] 4.633

The lapply() function

  • Like any other function, you can assign the output to another variable
  • We can also use the class() function to see what type of object the output is
lapplyed.means <- lapply(vect.list, myMean)
class(lapplyed.means)
[1] "list"

The lapply() function

  • We can see that the output is actually another list
  • We can turn the list into a vector by using the function unlist()
vectored.means <- unlist(lapplyed.means)
class(vectored.means)
[1] "numeric"
vectored.means
[1]  3.600 16.833  4.633
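
A related function, sapply(), tries to simplify its result, so in a case like this one it should return a numeric vector directly. The sketch below is an alternative to the lapply() plus unlist() steps above, not a replacement for them; the name sapplied.means is made up for this example.

```r
myMean <- function(vector){
  sum(vector) / length(vector)
}

x <- c(1, 4, 3, 8, 2)
y <- c(10, 43, 8, 15, 29, -4)
z <- c(1.3, 5.6, 7)
vect.list <- list(x, y, z)

# sapply() applies myMean to each element and simplifies the result
sapplied.means <- sapply(vect.list, myMean)

class(sapplied.means)   # "numeric"
sapplied.means
```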

Summary
