Source file ⇒ lec15.Rmd
Recall the syntax of for and while loops:
for (name in vector) {
  statement
}
while (condition) {
  statement
}
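As a concrete instance of both templates, the sketch below sums the squares of 1 through 5 twice, once with each loop type. The variable names (total_for, total_while, k, i) are illustrative and not part of the lecture code.

```r
# for loop: the loop variable k takes each value of the vector in turn
total_for <- 0
for (k in 1:5) {
  total_for <- total_for + k^2
}

# while loop: we must update the counter ourselves, or the loop never ends
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i^2
  i <- i + 1
}

total_for    # 55
total_while  # 55
```

A for loop is usually the better fit when the number of iterations is known in advance; a while loop suits cases where the stopping point depends on what happens inside the loop.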
Simulate rolling a die until you get a 4. Create a vector of your rolls.
Hint: roll <- sample(1:6,1) simulates rolling a die.
myrolls <- c()
roll <- 1000 # some number not 1 through 6
while (roll != 4) {
  roll <- sample(1:6, 1)
  myrolls <- c(myrolls, roll)
}
myrolls
## [1] 1 3 1 1 1 6 4
The break statement causes a loop to exit. Example:
# Pre-defined variables
rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
chars
## [1] "R" "'" "s" " " "i" "n" "t" "e" "r" "n" "a" "l" "s" " " "a" "r" "e"
## [18] " " "i" "r" "r" "e" "f" "u" "t" "a" "b" "l" "y" " " "i" "n" "t" "r"
## [35] "i" "g" "u" "i" "n" "g"
Count the number of “r”s (both “r” and “R”) that come before the first letter “u” (both “u” and “U”) in the rquote character string. Store the result in a variable rcount.
rcount <- 0
for (char in chars) {
  if (char == "u" | char == "U") {
    break
  }
  if (char == "r" | char == "R") {
    rcount <- rcount + 1
  }
}
# Print the resulting rcount variable to the console
rcount
## [1] 5
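One way to cross-check the loop is a vectorized version of the same count. This is only a sketch: the names lower, first_u, and rcount_vec are introduced here for illustration, and the fallback for a string with no "u" is an assumption about the desired behavior.

```r
rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]

# Lower-case everything so "r"/"R" and "u"/"U" are treated alike
lower <- tolower(chars)

# Position of the first "u"; if there is none, count over the whole string
first_u <- which(lower == "u")[1]
if (is.na(first_u)) {
  first_u <- length(chars) + 1
}

# Count the "r"s strictly before that position
rcount_vec <- sum(lower[seq_len(first_u - 1)] == "r")
rcount_vec  # 5
```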
Because break exits a loop immediately, it is particularly useful with while loops, which, if we’re not careful, might loop indefinitely (or until we kill R).
Here is an example:
# Simulate steps for a random walk to cross threshold = 10
max.iter <- 10000
x <- 0
steps <- 0
mywalk <- c()
while (x < 10) {
  x <- x + sample(c(-1, 1), 1)
  steps <- steps + 1
  mywalk <- c(mywalk, x)
  if (steps == max.iter) {
    warning("Maximum iteration reached")
    break
  }
}
mywalk %>% head(100)
## [1] -1 -2 -3 -2 -3 -4 -3 -2 -1 -2 -3 -2 -3 -4 -3 -4 -5
## [18] -6 -7 -6 -5 -6 -7 -8 -9 -8 -7 -8 -9 -10 -11 -10 -9 -10
## [35] -9 -8 -9 -8 -7 -6 -5 -6 -5 -6 -7 -6 -5 -6 -5 -6 -7
## [52] -6 -7 -6 -7 -8 -9 -8 -9 -8 -9 -10 -9 -10 -9 -10 -11 -12
## [69] -11 -12 -13 -14 -13 -14 -15 -16 -15 -14 -15 -16 -17 -18 -17 -16 -15
## [86] -14 -13 -14 -13 -14 -13 -12 -11 -10 -9 -10 -11 -12 -11 -10
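To see the iteration cap do its job, here is a deliberately contrived sketch in which the condition can never become FALSE: the walk only steps down, so without break the loop would run forever. The small cap of 50 and the flag hit.cap are choices made for this illustration only.

```r
max.iter <- 50
x <- 0
steps <- 0
hit.cap <- FALSE
while (x < 10) {
  x <- x - 1            # always steps down, so x < 10 stays TRUE
  steps <- steps + 1
  if (steps == max.iter) {
    hit.cap <- TRUE     # record that the cap, not the condition, ended the loop
    break
  }
}
steps    # 50
hit.cap  # TRUE
```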
Functions are one of the most important constructs in R (and many other languages). They allow you to modularize your code - encapsulating a set of repeatable operations as an individual function call.
For example I might want to have a function that simulates rolling a die. Analyze this code line by line with a neighbor.
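# is.wholenumber() is not in base R; this helper follows the example in ?is.integer
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
die_rolling_simulation <- function(n = 10) {
  # Validate the input: n must be a positive whole number
  if (!(is.wholenumber(n) && n > 0)) {
    stop("n must be a natural number")
  }
  myrolls <- c()
  for (i in 1:n) {
    roll <- sample(1:6, 1)
    myrolls <- c(myrolls, roll)
  }
  return(myrolls)
}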
die_rolling_simulation(20)
## [1] 4 1 5 2 4 2 3 1 5 4 6 1 6 6 6 5 6 3 1 5
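The loop version grows the result one roll at a time, which is easy to read line by line. As a sketch of a vectorized alternative, sample() can draw all n rolls in one call when replace = TRUE. The name die_rolling_vectorized and the exact form of the input check are assumptions made for this example.

```r
die_rolling_vectorized <- function(n = 10) {
  # Accept only a single positive whole number
  if (!(is.numeric(n) && length(n) == 1 && n > 0 && n == round(n))) {
    stop("n must be a natural number")
  }
  # Draw n rolls at once; replace = TRUE lets faces repeat
  sample(1:6, n, replace = TRUE)
}

set.seed(1)  # only so the example is reproducible
rolls <- die_rolling_vectorized(20)
length(rolls)        # 20
all(rolls %in% 1:6)  # TRUE
```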
For another example, R has a function var that computes the unbiased estimate of variance, or sample variance, usually denoted \(s^2\). Suppose I need to repeatedly compute the maximum likelihood estimator (MLE) of variance \[\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2=\frac{n-1}{n}s^2\] instead of \(s^2\).
myvar <- function(x) {
  if (!(is.vector(x))) {
    stop("x must be a vector")
  }
  n <- length(x)
  return((n - 1) * var(x) / n)
}
myvar(c(1,2,3,4))
## [1] 1.25
var(c(1,2,3,4))
## [1] 1.666667
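As a quick check of the identity between the two forms of the estimator, the sketch below computes the MLE directly from its definition and compares it with the rescaled sample variance. The names mle_direct and mle_scaled are introduced only for this comparison.

```r
x <- c(1, 2, 3, 4)
n <- length(x)

# Directly from the definition: average squared deviation from the mean
mle_direct <- mean((x - mean(x))^2)

# Via the sample variance, rescaled by (n - 1) / n
mle_scaled <- (n - 1) * var(x) / n

mle_direct                         # 1.25
all.equal(mle_direct, mle_scaled)  # TRUE
```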
You should rely heavily on functions rather than having long sets of expressions in R scripts.
Functions have many important advantages:
A basic goal in writing functions is modularity.
In general, a function should
The syntax for writing a function is
function (arglist) body
Typically we assign the function to a particular name.
myfunc <- function (arglist) body
The keyword function just tells R that you want to create a function.
Recall that the arguments to a function are its inputs, which may have default values. For example, the arguments of the substring() function are:
args(substring)
## function (text, first, last = 1000000L)
## NULL
For example:
substring("abcdef",2,4)
## [1] "bcd"
Here, if we do not explicitly specify last when we call substring, it will be assigned the default value of 1e+06, which is very large. (Why do you think this was chosen?)
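A short illustration of the default in action: when last is omitted, substring() runs to the end of the string, because the default is far larger than any typical string length. The variable names are illustrative.

```r
# last omitted: the default applies, so we get everything from position 2 on
tail_part <- substring("abcdef", 2)
tail_part  # "bcdef"

# last supplied explicitly
mid_part <- substring("abcdef", 2, 4)
mid_part   # "bcd"
```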
A few notes on writing the arguments list.
When you’re writing your own function, it’s good practice to put the most important arguments first. Often these will not have default values.
This allows the user of your function to easily specify the arguments by position. For example:
mtcars %>% ggplot(aes(mpg, wt))
rather than
mtcars %>% ggplot(aes(x = mpg, y = wt))
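The same idea applies to functions you write yourself. In this sketch, power is a hypothetical function whose most important argument, base, comes first and has no default, while exponent comes second with a default of 2.

```r
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                       # 9: exponent falls back to its default
power(3, 3)                    # 27: both arguments matched by position
power(exponent = 3, base = 2)  # 8: named arguments can come in any order
```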
Next we have the body of the function, which typically consists of expressions surrounded by curly brackets. Think of these as performing some operations on the input values given by the arguments.
{
expression 1
expression 2
return(value)
}
The return expression hands control back to the caller of the function and returns a given value. If the function returns more than one thing, this is done using a named list, for example
stats <- function(x) {
  if (!(is.vector(x))) {
    stop("x must be a vector")
  }
  return(list(total = sum(x), avg = mean(x)))
}
stats(c(1,2,3))
## $total
## [1] 6
##
## $avg
## [1] 2
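Once a function returns a named list, the caller can pull out individual pieces by name. The sketch below uses a hypothetical summary_stats function, a stripped-down version of the example above without the input check, so that it runs on its own.

```r
summary_stats <- function(x) {
  list(total = sum(x), avg = mean(x))
}

res <- summary_stats(c(1, 2, 3))
res$total      # 6, accessed with $
res[["avg"]]   # 2, accessed with [[ ]]
names(res)     # "total" "avg"
```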
In the absence of a return expression, a function will return the last evaluated expression. This is particularly common if the function is short. For example, I could write the simple function:
sumofsquares <- function(x, y) sum(x^2+y^2)
sumofsquares(2,3)
## [1] 13
Here I don’t even need brackets {}, since there is only one expression.
A return expression anywhere in the function will cause the function to return control to the caller immediately, without evaluating the rest of the function. This is often used in conjunction with if statements. For example:
normt <- function(n, dist) {
  if (dist == "normal") {
    return(rnorm(n))
  } else if (dist == "t") {
    return(rt(n, df = 1, ncp = 0))
  } else stop("distribution not implemented")
}
normt(10,"t")
## [1] -0.16633595 -3.51477122 0.84074802 0.09909053 -13.98834171
## [6] 1.51017102 0.49864128 0.71675840 -0.65529783 0.99753964
Write an R function that will take an input vector and set any negative values in the vector to zero.
nonneg.vec <- function(x) {
  if (!(is.vector(x))) {
    stop("x must be a vector")
  }
  ifelse(x >= 0, x, 0)
}
nonneg.vec(c(-2,1,3,-5))
## [1] 0 1 3 0
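For comparison, base R's pmax() (the element-wise maximum) gives the same result as the ifelse() approach. This is offered as an alternative sketch, not as the intended solution to the exercise; nonneg.pmax is a name made up for it.

```r
nonneg.pmax <- function(x) {
  if (!(is.vector(x))) {
    stop("x must be a vector")
  }
  # Element-wise maximum of each entry and 0
  pmax(x, 0)
}

nonneg.pmax(c(-2, 1, 3, -5))  # 0 1 3 0
```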
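# A function can itself be an argument: fx defaults to the squaring function
# Assumes ggplot2 and the %>% pipe are already loaded, e.g. via library(tidyverse)
myfunc <- function(fx = function(x) {x^2}, plotit = TRUE) {
  xseq <- seq(-1, 1, length = 100)
  yseq <- fx(xseq)
  df <- data.frame(xseq, yseq)
  p <- df %>% ggplot(aes(x = xseq, y = yseq)) + geom_line(stat = "identity", col = "red")
  # Draw the plot only when asked to
  if (plotit) {
    print(p)
  }
}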
myfunc(function(x){x}, FALSE)
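The key idea in myfunc, passing a function as an argument with a default, can be seen without any plotting. In this sketch, apply_on_grid is a hypothetical helper that evaluates fx on a small grid over [-1, 1] and returns the values.

```r
apply_on_grid <- function(fx = function(x) x^2, n = 5) {
  xseq <- seq(-1, 1, length.out = n)
  fx(xseq)
}

apply_on_grid()                   # 1.00 0.25 0.00 0.25 1.00 (default: squaring)
apply_on_grid(function(x) 2 * x)  # -2 -1 0 1 2 (anonymous function)
apply_on_grid(abs, n = 3)         # 1 0 1 (existing function passed by name)
```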