Source file ⇒ 2017-lec9.Rmd
Computing in R consists of sequentially evaluating statements. Flow control structures allow us to control which statements are evaluated and in what order. In R the primary ones are if/else statements, for loops, and while loops.
Expressions, such as p <- .5 and sample(c(1, 0), size = 1, prob = c(p, 1-p)), can be grouped together using curly braces “{” and “}”. A group of expressions is called a block. For today’s lecture, the word statement will refer to either a single expression or a block.
The basic syntax for an if/else statement is
if ( condition ) {
statement1
} else {
statement2
}
You can also write this on a single line as:
if(condition) statement1 else statement2
First, condition is evaluated. If the result is TRUE then statement1 is evaluated. If the result is FALSE then statement2 is evaluated. The condition should be a single value: in versions of R before 4.2.0 only the first element of a longer result is used (with a warning), while from R 4.2.0 onward a condition of length greater than one is an error. If the result is numeric, 0 is treated as FALSE and any other number as TRUE. If the result cannot be interpreted as logical, or if it is NA, you will get an error.
For example (the output below is from a version of R before 4.2.0),
a <- c(3==2,5==5,0)
a
## [1] 0 1 0
if(a){
print("hi")
} else{
print("bye")
}
## Warning in if (a) {: the condition has length > 1 and only the first
## element will be used
## [1] "bye"
However,
if("pizza"){
print("hi")
} else{
print("bye")
}
results in the error: Error in if ("pizza") { : argument is not interpretable as logical
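When a condition might be NA, it can be guarded before it reaches if. The following is a minimal sketch; safe_label is a hypothetical helper used only for illustration, and wrapping the condition in isTRUE() is one possible approach among several.

```r
# safe_label is a hypothetical helper, not part of the lecture code.
safe_label <- function(cond) {
  # isTRUE() is TRUE only for a single, non-missing logical TRUE,
  # so NA falls through to the else branch instead of raising an error.
  if (isTRUE(cond)) "hi" else "bye"
}

safe_label(5 == 5)  # condition is TRUE
safe_label(3 == 2)  # condition is FALSE
safe_label(NA)      # would be an error in a bare if(NA)
```

The trade-off is that NA is silently treated like FALSE, which may or may not be what you want.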
When we discussed Boolean algebra before, we met the operators & (AND) and | (OR).
Recall that these are both vectorized operators. For example:
x <- c(.2,3.3,.4)
(x < -1 | x > 1)
## [1] FALSE TRUE FALSE
If/else statements, on the other hand, are based on a single, “global” condition. So we often see constructions using any or all to express something related to the whole vector, like
if(any(x < -1 | x > 1)) print("hi") else print("bye")
## [1] "hi"
if(all(x < -1|x > 1)) print("hi") else print("bye")
## [1] "bye"
The result of an if/else statement can be assigned. For example, these give the same result:
x <- c(.2,3.3,.4)
y <- if ( any(x <= 0) ) log(1+x) else log(x)
y
## [1] -1.6094379 1.1939225 -0.9162907
if ( any(x <= 0) ) y <- log(1+x) else y <- log(x)
y
## [1] -1.6094379 1.1939225 -0.9162907
Also, the else clause is optional. Another way to do the above is:
if( any(x <= 0) ) x <- 1+x
y <- log(x)
y
## [1] -1.6094379 1.1939225 -0.9162907
However, this also changes x whenever the condition is TRUE.
If/else statements can be nested.
if (condition1 )
statement1
else if (condition2)
statement2
else if (condition3)
statement3
else
statement4
The conditions are evaluated, in order, until one evaluates to TRUE. Then the associated statement/block is evaluated. The statement in the final else clause is evaluated if none of the conditions evaluates to TRUE.
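As a concrete instance of the nested form, here is a small sketch. The function name describe_sign is hypothetical and chosen only for illustration; it expects a single number.

```r
# describe_sign is a hypothetical example function.
describe_sign <- function(v) {
  if (v > 0) {
    "positive"
  } else if (v < 0) {
    "negative"
  } else {
    # reached only when neither condition above is TRUE
    "zero"
  }
}

describe_sign(3.3)
describe_sign(-1)
describe_sign(0)
```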
A note about formatting if/else statements:
When the if statement is not in a block, the else (if present) must appear on the same line as the end of statement1, or immediately following the closing brace. For example,
if (condition) {statement1}
else {statement2}
is an error when typed at the top level. For instance,
if(3==2){ print("hi")}
else {print("bye")}
gives the error: Error: unexpected 'else' in "else"
I suggest using the format
if (condition) {
statement1
} else {
statement2
}
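library(dplyr)    # provides the pipe %>%
library(ggplot2)  # provides ggplot(), aes(), geom_point()
# Optionally draw a scatterplot of y against x, then return their correlation
corplot <- function(x,y,plotit=TRUE){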
df=data.frame(var1=x,var2=y)
p <- df %>% ggplot(aes(x=var1,y=var2)) + geom_point(size=3) +theme(text=element_text(size=25))
if(plotit) print(p)
cor(x,y)
}
corplot(runif(n=100,min=0,max=1),runif(n=100,min=0,max=1))
## [1] 0.04544766
m=matrix(c(1,2,3,4),nrow=2)
#myfunc <- function(m){ #myfunc is a function taking a matrix as an argument
if( !is.matrix(m)){
stop("m must be a matrix")
} else{
print("hi")
}
## [1] "hi"
#}
but
m=c(1,2,3)
#myfunc <- function(m){ #myfunc is a function taking a matrix as an argument
if( !is.matrix(m)){
stop("m must be a matrix")
} else{
print("hi")
}
#}
will throw the error: Error: m must be a matrix
y <- 3
x <- 0
ratio <- if(x!=0) y/x else NA
ratio
## [1] NA
normt <- function(n, dist){
if ( dist == "normal" ){
return( rnorm(n) )
} else if (dist == "t"){
return(rt(n, df = 1))
} else stop("distribution not implemented")
}
normt(10,"t")
## [1] 1.5966448 19.7773382 -2.0911095 -1.4787619 1.4109150 -2.8416535
## [7] -1.4607166 0.1385624 1.0839467 1.3603078
These if/else constructions are useful for global tests, not tests applied to individual elements of a vector.
However, there is a vectorized function called ifelse:
args(ifelse)
## function (test, yes, no)
## NULL
Here test is an R object that can be coerced to logical. yes and no are R objects of the same size as test (shorter ones are recycled to that length).
y <- 2
x <- c(-1,0,2,5)
y/x
## [1] -2.0 Inf 1.0 0.4
ifelse(x!=0, y/x,NA)
## [1] -2.0 NA 1.0 0.4
For each element of test, the corresponding element of yes is returned if the element is TRUE, and the corresponding element of no is returned if it is FALSE.
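Because ifelse works element by element, calls can be nested to handle more than two cases. A minimal sketch, where the variable name labels is only illustrative:

```r
x <- c(-1, 0, 2, 5)

# The inner ifelse() supplies the "no" values for the outer one,
# giving a three-way classification of each element of x.
labels <- ifelse(x < 0, "negative", ifelse(x == 0, "zero", "positive"))
labels
```

For more than a few cases, deeply nested ifelse calls become hard to read, so this is best kept to two or three levels.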
Here is an example from HW 3:
| | ADM_RATE | CONTROL |
|---|---|---|
| 477 | 0.5579 | 2 |
| 948 | 0.5646 | 2 |
| 531 | 0.4860 | 2 |
my_Scorecard<- my_Scorecard %>% mutate(CONTROL = ifelse(CONTROL==1, "Public", "Private"))
my_Scorecard[random_vec,]
| | ADM_RATE | CONTROL |
|---|---|---|
| 477 | 0.5579 | Private |
| 948 | 0.5646 | Private |
| 531 | 0.4860 | Private |
Please do the first question (conditional statements)
http://gandalf.berkeley.edu:3838/alucas/Lecture-09-collection/
Looping is the repeated evaluation of a statement or block of statements.
Much of what is handled using loops in other languages can be more efficiently handled in R using vectorized calculations or one of the apply mechanisms.
However, certain algorithms, such as those defined by a recursion in which each step depends on the result of the previous one, are most naturally written as loops.
There are two main looping constructs in R: for and while.
A for loop repeats a statement or block of statements a predefined number of times.
The syntax in R is
for ( name in vector ){
statement
}
For each element in vector, the variable name is set to the value of that element and statement is evaluated.
vector often contains integers, but can be any valid type.
Examples:
for( word in c("my","name","is","adam")){
print(word)
}
## [1] "my"
## [1] "name"
## [1] "is"
## [1] "adam"
or
xseq <- seq(-2.5, 2.5, length = 100)
yseq <- xseq^2
df <- data.frame(xseq,yseq)
p <- df %>% ggplot(aes(x=xseq, y=yseq)) + geom_line(stat="identity", col="red")
for(i in -2:2){
p <- p + geom_point(x=i,y=2,col="blue",size=3)
}
p
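Looping and vectorization can be compared directly. The following is a minimal sketch (the names total, total_vec, and fib are illustrative, not from the lecture): a sum of squares can be computed either way, while the Fibonacci recurrence, in which each value depends on the two before it, is naturally written as a loop.

```r
x <- c(1, 2, 3, 4)

# Loop version: accumulate the sum of squares one element at a time.
total <- 0
for (xi in x) {
  total <- total + xi^2
}

# Vectorized version: the same result without an explicit loop.
total_vec <- sum(x^2)

# A recurrence: each value depends on the previous two.
# The result vector is allocated once, then filled in by index.
fib <- numeric(10)
fib[1:2] <- 1
for (i in 3:10) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}
fib
```

Allocating fib up front with numeric(10) avoids repeatedly growing the vector with c() inside the loop.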
A while loop repeats a statement or block of statements for as long as a particular condition is TRUE.
The syntax in R is
while (condition){
statement
}
condition is evaluated, and if it is TRUE, statement is evaluated. This process continues until condition evaluates to FALSE.
The expression,
sample(c(1, 0), size = 1, prob = c(p, 1-p))
simulates a random coin flip, where the coin has probability p of coming up heads, represented by a 1. For p=.5, simulate flipping a coin until 20 heads are obtained. Produce the vector of 0s and 1s.
p <- .5
count <- 0
flips <- c()
while(count<20){
flip <- sample(c(1, 0), size = 1, prob = c(p, 1-p))
flips <- c(flips,flip)
count <- count + flip
}
flips
## [1] 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1
## [36] 0 1 1 1 1
The break statement causes a loop to exit. This is particularly useful with while loops, which, if we’re not careful, might loop indefinitely (or until we kill R).
Here is an example:
# Simulate steps for random walk to cross threshold = 10
max.iter <- 10000
x <- 0
steps <- 0
mywalk=c()
while(x < 10){
x <- x + sample(c(-1, 1), 1)
steps <- steps + 1
mywalk <- c(mywalk,x)
if(steps == max.iter){
warning("Maximum iteration reached")
break }
}
mywalk %>% head(100)
## [1] -1 -2 -1 0 1 2 1 2 3 2 1 2 3 4 5 6 5 4 5 4 3 4 5
## [24] 6 7 8 9 8 9 8 9 10
Please do the second problem (loops)
http://gandalf.berkeley.edu:3838/alucas/Lecture-09-collection/
Functions are one of the most important constructs in R (and many other languages). They allow you to modularize your code - encapsulating a set of repeatable operations as an individual function call.
For example, R has a function var that computes the unbiased estimate of variance, or sample variance, usually denoted \(s^2\). Suppose I need to repeatedly compute the maximum likelihood estimator (MLE) of variance \[\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2=\frac{n-1}{n}s^2\] instead of \(s^2\).
myvar <- function(x){
if( !(is.vector(x))) {
stop("x must be a vector")}
n <- length(x)
return((n-1)*var(x)/n)
}
myvar(c(1,2,3,4))
## [1] 1.25
var(c(1,2,3,4))
## [1] 1.666667
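The identity between the direct formula and the rescaled sample variance can be checked numerically. This is a small sketch; the names mle_direct and mle_scaled are illustrative only.

```r
x <- c(1, 2, 3, 4)
n <- length(x)

# Direct definition: average squared deviation from the mean.
mle_direct <- sum((x - mean(x))^2) / n

# Rescaled sample variance, as used in myvar above.
mle_scaled <- (n - 1) / n * var(x)

# all.equal() compares with a small numerical tolerance.
all.equal(mle_direct, mle_scaled)
```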
For another example I might want to have a function that simulates rolling a die. Analyze this code line by line with a neighbor.
is.whole <- function(x) is.numeric(x) && floor(x)==x
die_rolling_simulation <- function(n=10){
if( !(is.whole(n) && n>0)) {
stop("n must be a natural number")}
myrolls <- c()
for(i in 1:n){
roll <- sample(1:6,1)
myrolls <- c(myrolls,roll)
}
return(myrolls)
}
die_rolling_simulation()
## [1] 2 3 5 1 1 1 6 1 5 5
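The loop in die_rolling_simulation is useful for practice, but the same simulation can be done in one vectorized call. A sketch, assuming a hypothetical function name roll_dice and omitting the input check for brevity:

```r
# roll_dice is a hypothetical, vectorized alternative to the loop above.
roll_dice <- function(n = 10) {
  # replace = TRUE lets the same face come up more than once
  sample(1:6, size = n, replace = TRUE)
}

set.seed(1)          # fix the seed so the rolls are reproducible
rolls <- roll_dice(5)
length(rolls)        # number of rolls requested
all(rolls %in% 1:6)  # every roll is a valid face
```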
You should rely heavily on functions rather than having long sets of expressions in R scripts.
Functions have many important advantages:
A basic goal in writing functions is modularity.
In general, a function should
The syntax for writing a function is
function (arglist) body
Typically we assign the function to a particular name.
myfunc <- function (arglist) body
The keyword function just tells R that you want to create a function.
Recall that the arguments to a function are its inputs, which may have default values. For example, the arguments of the substring() function are:
args(substring)
## function (text, first, last = 1000000L)
## NULL
For example:
substring("abcdef",2,4)
## [1] "bcd"
Here, if we do not explicitly specify last when we call substring, it will be assigned the default value of 1e+06, which is very large. (Why do you think this was chosen?)
A few notes on writing the arguments list.
When you’re writing your own function, it’s good practice to put the most important arguments first. Often these will not have default values.
This allows the user of your function to easily specify the arguments by position. For example:
mtcars %>% ggplot(aes(mpg, wt))
rather than
mtcars %>% ggplot(aes(x = mpg, y = wt))
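The interaction between argument order, defaults, and matching by name can be seen in a small sketch. The function power_sum is hypothetical and exists only to illustrate the point.

```r
# power_sum is a hypothetical example: the main input comes first
# and has no default; the tuning argument p comes second with a default.
power_sum <- function(x, p = 2) {
  sum(x^p)
}

a <- power_sum(c(1, 2, 3))             # p takes its default value of 2
b <- power_sum(c(1, 2, 3), 3)          # arguments matched by position
d <- power_sum(p = 1, x = c(1, 2, 3))  # matched by name, in any order
c(a, b, d)
```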
Next we have the body of the function, which typically consists of expressions surrounded by curly brackets. Think of these as performing some operations on the input values given by the arguments.
{
expression 1
expression 2
return(value)
}
The return expression hands control back to the caller of the function and returns a given value. If the function returns more than one thing, this is done using a named list, for example
stats <- function(x){
if( !(is.vector(x))) {
stop("x must be a vector")}
return(list(total=sum(x), avg=mean(x)))
}
stats(c(1,2,3))
## $total
## [1] 6
##
## $avg
## [1] 2
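Once a function returns a named list, the individual pieces can be pulled out by name. A short sketch, repeating the definition of stats so the example runs on its own (the variable res is illustrative):

```r
# Same definition as above, repeated so this example is self-contained.
stats <- function(x) {
  if (!is.vector(x)) {
    stop("x must be a vector")
  }
  return(list(total = sum(x), avg = mean(x)))
}

res <- stats(c(1, 2, 3))
res$total     # extract one element with $
res[["avg"]]  # or with [[ ]] and the element's name
```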
In the absence of a return expression, a function will return the last evaluated expression. This is particularly common if the function is short. For example, I could write the simple function:
sumofsquares <- function(x, y) sum(x^2+y^2)
sumofsquares(2,3)
## [1] 13
Here I don’t even need braces {}, since the body is a single expression.
A return expression anywhere in the function will cause the function to return control to the caller immediately, without evaluating the rest of the function. This is often used in conjunction with if statements. For example:
normt <- function(n, dist){
if ( dist == "normal" ){
return( rnorm(n) )
} else if (dist == "t"){
return(rt(n, df = 1, ncp = 0))
} else stop("distribution not implemented")
}
normt(10,"t")
## [1] 0.77897788 -0.04010573 1.02454517 -0.65281212 -0.61048674
## [6] -0.49889556 13.87764177 1.42182713 0.39178274 -3.81849141
Please do the last exercise (functions) http://gandalf.berkeley.edu:3838/alucas/Lecture-09-collection/
Answer: There is no output
myfunc <- function(fx=function(x){x^2}, plotit=TRUE){
xseq <- seq(-1, 1, length = 100)
yseq <- fx(xseq)
df <- data.frame(xseq,yseq)
p <- df %>% ggplot(aes(x=xseq, y=yseq)) + geom_line(stat="identity", col="red")
if(plotit){
print(p)
}
}
myfunc(function(x){x}, FALSE)
Note:
sum(c(1,2,3,NA))
## [1] NA
median(c(1,2,3,NA))
## [1] NA
Answer: NA
MAD <- function(x,na.rm=FALSE){
if(na.rm){
x=x[!is.na(x)]
}
return(median(abs(x-median(x))))
}
MAD(c(1,2,3,NA))
## [1] NA
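For contrast, setting na.rm = TRUE drops the missing value before the medians are computed. A sketch that repeats the definition of MAD so it runs on its own (the names with_na and without_na are illustrative):

```r
# Same definition as above, repeated so this example is self-contained.
MAD <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  return(median(abs(x - median(x))))
}

with_na <- MAD(c(1, 2, 3, NA))                    # NA propagates through median()
without_na <- MAD(c(1, 2, 3, NA), na.rm = TRUE)   # computed on 1, 2, 3 only
without_na
```

With the NA removed, the median is 2, the absolute deviations are 1, 0, 1, and their median is 1.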