Source file ⇒ lec15.Rmd

Today

  1. Finish loops (chapter 2 in Data Camp’s Intermediate R)
  2. Functions (chapter 3 in Data Camp’s Intermediate R)

1. Loops

Recall the syntax of for and while loops:

for ( name in vector ){
  statement
}
while (condition){
  statement
}
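As a quick refresher, here is a minimal sketch of both loop forms with concrete values. The variable names (squares, total, k) are illustrative only and do not come from the lecture.

```r
# for loop: visit each element of a vector in turn
squares <- c()
for (i in 1:5) {
  squares <- c(squares, i^2)   # append the square of i
}
squares   # 1 4 9 16 25

# while loop: keep going as long as the condition is TRUE
total <- 0
k <- 0
while (total < 10) {
  k <- k + 1
  total <- total + k           # running sum 1 + 2 + ... + k
}
k         # 4, because 1 + 2 + 3 + 4 = 10
```

Use a for loop when the values to iterate over are known up front, and a while loop when the stopping point depends on something computed inside the loop.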

Task for you:

Simulate rolling a die until you get a 4. Create a vector of your rolls.

Hint: roll <- sample(1:6,1) simulates rolling a die.

myrolls <- c()
roll <- 1000 # some number not 1 through 6
while (roll != 4) {
  roll <- sample(1:6, 1)
  myrolls <- c(myrolls, roll)
}
myrolls
## [1] 1 3 1 1 1 6 4

The break statement causes a loop to exit. Example:

# Pre-defined variables
rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
chars
##  [1] "R" "'" "s" " " "i" "n" "t" "e" "r" "n" "a" "l" "s" " " "a" "r" "e"
## [18] " " "i" "r" "r" "e" "f" "u" "t" "a" "b" "l" "y" " " "i" "n" "t" "r"
## [35] "i" "g" "u" "i" "n" "g"

Count the number of “r”s (both “r” and “R”) that come before the first letter “u” (both “u” and “U”) in the rquote character string. Store the result in a variable rcount.

rcount <- 0
for (char in chars) {
  # Stop at the first "u" or "U"; || is the scalar "or" intended for if()
  if (char == "u" || char == "U") {
    break
  }
  if (char == "r" || char == "R") {
    rcount <- rcount + 1
  }
}

# Print the resulting rcount variable to the console
rcount
## [1] 5
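For comparison, the same count can be sketched without an explicit loop. This is an alternative approach rather than the solution intended by the exercise. The names first_u and rcount_vec are hypothetical, and the sketch assumes the string contains at least one "u".

```r
rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]

# Position of the first "u" or "U" (assumes one exists)
first_u <- which(tolower(chars) == "u")[1]

# Count "r"/"R" among the characters strictly before that position
rcount_vec <- sum(tolower(chars[seq_len(first_u - 1)]) == "r")
rcount_vec   # 5, matching the loop-based rcount
```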

The break statement is particularly useful with while loops, which, if we’re not careful, might loop indefinitely (or until we kill R).

Here is an example:

# Simulate steps for random walk to cross threshold = 10
max.iter <- 10000
x <- 0
steps <- 0
mywalk <- c()
while (x < 10) {
  x <- x + sample(c(-1, 1), 1)
  steps <- steps + 1
  mywalk <- c(mywalk, x)
  if (steps == max.iter) {
    warning("Maximum iteration reached")
    break
  }
}
mywalk %>% head(100)
##   [1]  -1  -2  -3  -2  -3  -4  -3  -2  -1  -2  -3  -2  -3  -4  -3  -4  -5
##  [18]  -6  -7  -6  -5  -6  -7  -8  -9  -8  -7  -8  -9 -10 -11 -10  -9 -10
##  [35]  -9  -8  -9  -8  -7  -6  -5  -6  -5  -6  -7  -6  -5  -6  -5  -6  -7
##  [52]  -6  -7  -6  -7  -8  -9  -8  -9  -8  -9 -10  -9 -10  -9 -10 -11 -12
##  [69] -11 -12 -13 -14 -13 -14 -15 -16 -15 -14 -15 -16 -17 -18 -17 -16 -15
##  [86] -14 -13 -14 -13 -14 -13 -12 -11 -10  -9 -10 -11 -12 -11 -10

2. Functions

Functions are one of the most important constructs in R (and many other languages). They allow you to modularize your code - encapsulating a set of repeatable operations as an individual function call.

For example, I might want to have a function that simulates rolling a die. Analyze this code line by line with a neighbor.

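# is.wholenumber() is not in base R; this definition follows the example in ?is.integer
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol

die_rolling_simulation <- function(n = 10) {
  # Validate the input: n must be a positive whole number
  if (!(is.wholenumber(n) && n > 0)) {
    stop("n must be a natural number")
  }
  myrolls <- c()
  for (i in 1:n) {
    roll <- sample(1:6, 1)       # one roll of a fair die
    myrolls <- c(myrolls, roll)  # append it to the results
  }
  return(myrolls)
}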

die_rolling_simulation(20)
##  [1] 4 1 5 2 4 2 3 1 5 4 6 1 6 6 6 5 6 3 1 5
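The function above grows its result one roll at a time. As an alternative sketch (not the version used in this lecture), sample() can draw all n rolls in a single call when replace = TRUE. The name roll_dice is hypothetical.

```r
# Hypothetical vectorized variant: draw n rolls at once
roll_dice <- function(n = 10) {
  sample(1:6, n, replace = TRUE)   # replace = TRUE lets faces repeat
}

rolls <- roll_dice(20)
length(rolls)   # 20
```

Both versions return a vector of n values between 1 and 6. The vectorized form avoids repeatedly copying the growing vector.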

For another example, R has a function var that computes the unbiased estimate of variance, or sample variance, usually denoted \(s^2\). Suppose I need to repeatedly compute the maximum likelihood estimator (MLE) of variance \[\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2=\frac{n-1}{n}s^2\] instead of \(s^2\).

myvar <- function(x) {
  if (!is.vector(x)) {
    stop("x must be a vector")
  }
  n <- length(x)
  return((n - 1) * var(x) / n)
}
myvar(c(1,2,3,4))
## [1] 1.25
var(c(1,2,3,4))
## [1] 1.666667
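As a sanity check on the formula, the following sketch computes the MLE two ways for the same vector: directly from the definition, and by rescaling var(). The names mle_direct and mle_scaled are illustrative.

```r
x <- c(1, 2, 3, 4)
n <- length(x)

# Directly from the definition: average squared deviation from the mean
mle_direct <- mean((x - mean(x))^2)

# Rescaling the sample variance by (n - 1) / n
mle_scaled <- (n - 1) / n * var(x)

mle_direct   # 1.25
mle_scaled   # 1.25
```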

You should rely heavily on functions rather than having long sets of expressions in R scripts.

Functions have many important advantages:

A basic goal in writing functions is modularity.

In general, a function should

Anatomy of a function

The syntax for writing a function is

function (arglist) body

Typically we assign the function to a particular name.

myfunc <- function (arglist) body

The keyword function just tells R that you want to create a function.

Recall that the arguments to a function are its inputs, which may have default values. For example the arguments of the substring() function are:

args(substring)
## function (text, first, last = 1000000L) 
## NULL

For example:

substring("abcdef",2,4)
## [1] "bcd"

Here, if we do not explicitly specify last when we call substring, it will be assigned the default value of 1e+06, which is very large. (Why do you think this was chosen?)
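To see the default in action, here is a small sketch that calls substring() with and without last. The variable names are illustrative.

```r
# last omitted: the default (1000000L) is used
rest <- substring("abcdef", 2)
rest      # "bcdef"

# last supplied by position and by name give the same result
by_pos  <- substring("abcdef", 2, 4)
by_name <- substring("abcdef", first = 2, last = 4)
by_pos    # "bcd"
```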

A few notes on writing the arguments list.

When you’re writing your own function, it’s good practice to put the most important arguments first. Often these will not have default values.

This allows the user of your function to easily specify the arguments by position. For example, the first two arguments of aes() are x and y, so we can write

mtcars %>% ggplot(aes(mpg, wt))

rather than

mtcars %>% ggplot(aes(x = mpg, y = wt))
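To make the positional-versus-named distinction concrete for a function you write yourself, here is a minimal sketch. The function power() is hypothetical: its most important argument x comes first with no default, and the exponent p has a default of 2.

```r
# Hypothetical function: x is required, p defaults to 2
power <- function(x, p = 2) {
  x^p
}

a <- power(3)                  # p takes its default: 3^2 = 9
b <- power(3, 3)               # both arguments by position: 3^3 = 27
c_named <- power(p = 3, x = 2) # named arguments can come in any order: 2^3 = 8
```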

Next we have the body of the function, which typically consists of expressions surrounded by curly brackets. Think of these as performing some operations on the input values given by the arguments.

{
    expression 1  
    expression 2
    return(value)
}

The return expression hands control back to the caller of the function and returns a given value. If the function returns more than one thing, this is done using a named list, for example

stats <- function(x){
  if( !(is.vector(x))) { 
    stop("x must be a vector")}
  return(list(total=sum(x), avg=mean(x)))
}

stats(c(1,2,3))
## $total
## [1] 6
## 
## $avg
## [1] 2
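Once a function returns a named list, the individual pieces can be pulled out by name. The sketch below repeats the stats() definition so it runs on its own; the variable res is illustrative.

```r
stats <- function(x) {
  if (!is.vector(x)) {
    stop("x must be a vector")
  }
  return(list(total = sum(x), avg = mean(x)))
}

res <- stats(c(1, 2, 3))
res$total      # 6, extracted with $
res[["avg"]]   # 2, extracted with [[ ]]
```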

In the absence of a return expression, a function will return the last evaluated expression. This is particularly common if the function is short. For example, I could write the simple function:

sumofsquares <- function(x, y) sum(x^2+y^2)

sumofsquares(2,3)
## [1] 13

Here I don’t even need brackets {}, since there is only one expression.

A return expression anywhere in the function will cause the function to return control to the caller immediately, without evaluating the rest of the function. This is often used in conjunction with if statements. For example:

normt <- function(n, dist){
  if ( dist == "normal" ){
    return( rnorm(n) )
  } else if (dist == "t"){
    return(rt(n, df = 1, ncp = 0))
  } else stop("distribution not implemented")
}
normt(10,"t")
##  [1]  -0.16633595  -3.51477122   0.84074802   0.09909053 -13.98834171
##  [6]   1.51017102   0.49864128   0.71675840  -0.65529783   0.99753964
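The early-exit behaviour is perhaps easier to see in a deterministic case. The following sketch uses a hypothetical helper safe_log() (not from the lecture) and assumes x is a single number.

```r
# Hypothetical helper: log of x, or NA for non-positive input
safe_log <- function(x) {
  if (x <= 0) {
    return(NA_real_)   # exit immediately; the line below never runs
  }
  log(x)               # reached only when x > 0
}

safe_log(-1)   # NA
safe_log(1)    # 0
```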

A task for you

Write an R function that will take an input vector and set any negative values in the vector to zero.

nonneg.vec <- function(x){
  if( !(is.vector(x))) { 
    stop("x must be a vector")}
  ifelse(x>=0, x, 0) 
}
nonneg.vec(c(-2,1,3,-5))
## [1] 0 1 3 0
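There is more than one reasonable solution. As an alternative sketch, pmax() takes the elementwise maximum of x and 0, which has the same effect. The name nonneg_pmax is hypothetical.

```r
# Hypothetical alternative using the elementwise maximum
nonneg_pmax <- function(x) {
  if (!is.vector(x)) {
    stop("x must be a vector")
  }
  pmax(x, 0)   # each element becomes max(element, 0)
}

nonneg_pmax(c(-2, 1, 3, -5))   # 0 1 3 0
```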

i-clicker

myfunc <-  function(fx=function(x){x^2}, plotit=TRUE){
  xseq=seq(-1, 1, length = 100)
  yseq=fx(xseq)
  df <- data.frame(xseq,yseq)
  p <- df %>% ggplot(aes(x=xseq, y=yseq)) + geom_line(stat="identity", col="red")
  if(plotit){
    print(p)
    }
}

myfunc(function(x){x}, FALSE)