Source file ⇒ 2017-lec9.Rmd

Today (Programming in R)

  1. Conditionals and Control Flow (Chapter 1 Data Camp’s Intermediate R)
  2. Loops (Chapter 2 Data Camp’s Intermediate R)
  3. Functions (Chapter 3 Data Camp’s Intermediate R)

1. Conditionals and Control Flow

Computing in R consists of sequentially evaluating statements. Flow control structures allow us to control which statements are evaluated and in what order. In R, the primary ones covered in this lecture are if/else statements, for loops, and while loops.

Expressions, such as p <- .5 and sample(c(1, 0), size = 1, prob = c(p, 1-p)) can be grouped together using curly braces “{” and “}”. A group of expressions is called a block. For today’s lecture, the word statement will refer to either a single expression or a block.

The basic syntax for an if/else statement is

if ( condition ) {
  statement1
} else {
  statement2
}

You can also write this on a single line as:

if(condition) statement1 else statement2

First, condition is evaluated. If the first element of the result is TRUE, then statement1 is evaluated; if it is FALSE, then statement2 is evaluated. Only the first element of the result is used, with a warning if the result has more than one element (in R 4.2.0 and later, a condition of length greater than one is an error instead). If the result is numeric, 0 is treated as FALSE and any other number as TRUE. If the result cannot be interpreted as logical, or if it is NA, you will get an error.

For example,

a <- c(3==2,5==5,0)
a
## [1] 0 1 0
if(a){
  print("hi")
} else{
  print("bye")
}
## Warning in if (a) {: the condition has length > 1 and only the first
## element will be used
## [1] "bye"

However,

if("pizza"){
  print("hi")
} else{
  print("bye")
}

This results in the error: Error in if ("pizza") { : argument is not interpretable as logical
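
The coercion rules above can be checked directly. The following is a minimal sketch; the helper name describe_condition is hypothetical and not part of the lecture. It wraps the if statement in tryCatch so that the error cases are reported instead of stopping the script.

```r
# Hypothetical helper: report how `if` treats a given length-one condition.
describe_condition <- function(cond) {
  tryCatch(
    if (cond) "treated as TRUE" else "treated as FALSE",
    error = function(e) "error"
  )
}

describe_condition(TRUE)     # "treated as TRUE"
describe_condition(0)        # "treated as FALSE"
describe_condition(-2.5)     # "treated as TRUE"
describe_condition(NA)       # "error"
describe_condition("pizza")  # "error"
```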

When we discussed Boolean algebra before, we met the operators & (AND) and | (OR).

Recall that these are both vectorized operators. For example:

x <- c(.2,3.3,.4)
(x < -1 | x > 1)
## [1] FALSE  TRUE FALSE

If/else statements, on the other hand, are based on a single, “global” condition. So we often see constructions using any or all to express something related to the whole vector, like

if(any(x < -1 | x > 1)) print("hi") else print("bye")   # note the space: x<-1 would assign 1 to x
## [1] "hi"
if(all(x < -1|x > 1)) print("hi") else print("bye")
## [1] "bye"

The result of an if/else statement can be assigned. For example, these give the same result:

x <- c(.2,3.3,.4)

y <- if ( any(x <= 0) ) log(1+x) else log(x)
y
## [1] -1.6094379  1.1939225 -0.9162907
if ( any(x <= 0) ) y <- log(1+x) else y <- log(x)
y
## [1] -1.6094379  1.1939225 -0.9162907

Also, the else clause is optional. Another way to do the above is:

if( any(x <= 0) ) x <- 1+x
y <- log(x)
y
## [1] -1.6094379  1.1939225 -0.9162907

However, whenever the condition is TRUE, this changes x as well.

If/else statements can be nested.

if (condition1) {
  statement1
} else if (condition2) {
  statement2
} else if (condition3) {
  statement3
} else {
  statement4
}

The conditions are evaluated, in order, until one evaluates to TRUE. Then the associated statement/block is evaluated. The statement in the final else clause is evaluated if none of the conditions evaluates to TRUE.
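
As a concrete sketch of this pattern, the hypothetical function sign_label below (not from the lecture) classifies a single number. Only the first branch whose condition is TRUE is evaluated, and its value is returned as the last evaluated expression.

```r
# Hypothetical example: classify a single number by its sign.
sign_label <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

sign_label(-3)  # "negative"
sign_label(0)   # "zero"
sign_label(7)   # "positive"
```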

A note about formatting if/else statements:

When the if statement is not in a block, the else (if present) must appear on the same line as statement1 or immediately following the closing brace. For example,

if (condition) {statement1}
else {statement2}

will produce an error. For example,

if(3==2){ print("hi")}
else {print("bye")}

gives the error: Error: unexpected 'else' in "else"

I suggest using the format

if (condition) {
  statement1
} else {
  statement2
}

Some common uses of if/else clauses

  1. With logical arguments to tell a function what to do
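# Assumes ggplot2 and a package providing %>% (e.g. dplyr) are already loaded.
corplot <- function(x, y, plotit=TRUE){
  df <- data.frame(var1=x, var2=y)
  p <- df %>% ggplot(aes(x=var1, y=var2)) + geom_point(size=3) + theme(text=element_text(size=25))
  if(plotit) print(p)   # draw the scatterplot only when requested
  cor(x, y)             # the correlation is always returned
}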

corplot(runif(n=100,min=0,max=1),runif(n=100,min=0,max=1))

## [1] 0.04544766
  2. To verify that the arguments of a function are as expected
m=matrix(c(1,2,3,4),nrow=2)

#myfunc <- function(m){      #myfunc is a function taking a matrix as an argument 
if( !is.matrix(m)){
  stop("m must be a matrix")
} else{
  print("hi")
}
## [1] "hi"
#}

but

m=c(1,2,3)
#myfunc <- function(m){    #myfunc is a function taking a matrix as an argument 
if( !is.matrix(m)){
  stop("m must be a matrix")
} else{
  print("hi")
}
#}

will throw the error: Error: m must be a matrix

  3. To handle common numerical errors
y <- 3
x <- 0
ratio <- if(x!=0) y/x else NA
ratio
## [1] NA
  4. In general, to control which block of code is executed
normt <- function(n, dist){
  if ( dist == "normal" ){
    return( rnorm(n) )
  } else if (dist == "t"){
    return(rt(n, df = 1))
  } else stop("distribution not implemented")
}
normt(10,"t")
##  [1]  1.5966448 19.7773382 -2.0911095 -1.4787619  1.4109150 -2.8416535
##  [7] -1.4607166  0.1385624  1.0839467  1.3603078

These if/else constructions are useful for global tests, not tests applied to individual elements of a vector.
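
As a sketch of the argument-checking use listed above, here is a complete function with its checks in place. The name col_means_checked and the second (numeric) check are assumptions made for illustration; tryCatch is used only to display the error message without stopping the script.

```r
# Hypothetical function: column means of a numeric matrix, with argument checks.
col_means_checked <- function(m) {
  if (!is.matrix(m)) {
    stop("m must be a matrix")
  }
  if (!is.numeric(m)) {
    stop("m must be numeric")
  }
  colMeans(m)
}

col_means_checked(matrix(c(1, 2, 3, 4), nrow = 2))  # 1.5 3.5

# A plain vector fails the first check; capture the message to show it.
tryCatch(col_means_checked(c(1, 2, 3)),
         error = function(e) conditionMessage(e))   # "m must be a matrix"
```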

ifelse() function

However, there is a vectorized function called ifelse:

args(ifelse)
## function (test, yes, no) 
## NULL

Here test is an R object that can be coerced to logical. yes and no are R objects that are recycled, if necessary, to the length of test (in the example below, no is the single value NA).

y <- 2
x <- c(-1,0,2,5)
y/x
## [1] -2.0  Inf  1.0  0.4
ifelse(x!=0, y/x,NA)
## [1] -2.0   NA  1.0  0.4

For each element of test, the corresponding element of yes is returned if the element is TRUE, and the corresponding element of no is returned if it is FALSE.
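
The element-wise behaviour can be seen in a small sketch (the vector and labels are made up for illustration). Note that an NA in test produces an NA in the result, and that calls to ifelse can be nested to handle more than two cases.

```r
x <- c(-2, 0, 3, NA)

# Element-wise test; the single string "non-positive" is recycled as needed.
labels <- ifelse(x > 0, "positive", "non-positive")
labels
# "non-positive" "non-positive" "positive" NA

# Nested ifelse for three cases.
signs <- ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))
signs
# "negative" "zero" "positive" NA
```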

Here is an example from HW 3:

    ADM_RATE CONTROL
477   0.5579       2
948   0.5646       2
531   0.4860       2
my_Scorecard <- my_Scorecard %>%
  mutate(CONTROL = ifelse(CONTROL == 1, "Public", "Private"))

my_Scorecard[random_vec,]
    ADM_RATE CONTROL
477   0.5579 Private
948   0.5646 Private
531   0.4860 Private

In class exercise

Please do the first question (conditional statements)

http://gandalf.berkeley.edu:3838/alucas/Lecture-09-collection/

2. Loops

Looping is the repeated evaluation of a statement or block of statements.

Much of what is handled using loops in other languages can be more efficiently handled in R using vectorized calculations or one of the apply mechanisms.

However, certain algorithms, such as those in which each step depends on the result of the previous step (a recurrence), are most naturally handled by loops.

There are two main looping constructs in R: for and while.

for loops

A for loop repeats a statement or block of statements a predefined number of times.

The syntax in R is

for ( name in vector ){
  statement
}

For each element in vector, the variable name is set to the value of that element and statement is evaluated.

vector often contains integers, but can be any valid type.

Examples:

for( word in c("my","name","is","adam")){
  print(word)
}
## [1] "my"
## [1] "name"
## [1] "is"
## [1] "adam"

or

xseq <- seq(-2.5, 2.5, length = 100) 
yseq <- xseq^2
df <- data.frame(xseq,yseq) 
p <- df %>% ggplot(aes(x=xseq, y=yseq)) + geom_line(stat="identity", col="red")

for(i in -2:2){
 p <- p + geom_point(x=i,y=2,col="blue",size=3)
}
p
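
To connect for loops back to the earlier remarks on vectorization, here is a small sketch (not from the lecture). The first loop computes a recurrence, where each value depends on earlier ones, so a loop is a natural fit. The second loop is shown only for comparison, since the vectorized expression gives the same result.

```r
# Recurrence: each Fibonacci number depends on the two before it,
# so the values are filled in one at a time.
fib <- numeric(10)      # preallocate the result vector
fib[1:2] <- 1
for (i in 3:10) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}
fib
# 1 1 2 3 5 8 13 21 34 55

# By contrast, a sum of squares needs no loop: the vectorized form
# gives the same answer as accumulating element by element.
x <- c(1, 2, 3, 4)
total <- 0
for (xi in x) {
  total <- total + xi^2
}
total == sum(x^2)   # TRUE (both are 30)
```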

while loops

A while loop repeats a statement or block of statements for as many times as a particular condition is TRUE.

The syntax in R is

while (condition){
  statement
}

condition is evaluated, and if it is TRUE, statement is evaluated. This process continues until condition evaluates to FALSE.

Example

The expression,

sample(c(1, 0), size = 1, prob = c(p, 1-p))

simulates a random coin flip, where the coin has probability p of coming up heads, represented by a 1. For p=.5, simulate flipping a coin until 20 heads are obtained, and produce the vector of 0s and 1s.

p <- .5
count <- 0
flips <- c()
while(count<20){
  flip <-  sample(c(1, 0), size = 1, prob = c(p, 1-p))
  flips <- c(flips,flip)
  count <- count + flip
}
flips
##  [1] 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1
## [36] 0 1 1 1 1
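
Because the coin-flip output changes from run to run, a deterministic sketch may make the mechanics easier to follow. The example below is made up for illustration: it counts how many doublings of 1 are needed to exceed 1000. The condition is re-checked before every pass, and the loop ends the first time it is FALSE.

```r
value <- 1
doublings <- 0
while (value <= 1000) {
  value <- value * 2          # 2, 4, 8, ...
  doublings <- doublings + 1
}
doublings  # 10
value      # 1024
```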

The break statement causes a loop to exit. This is particularly useful with while loops, which, if we’re not careful, might loop indefinitely (or until we kill R).

Here is an example:

# Simulate steps for random walk to cross threshold = 10
max.iter <- 10000
x <- 0
steps <- 0
mywalk=c()
while(x < 10){
  x <- x + sample(c(-1, 1), 1)
  steps <- steps + 1
  mywalk <- c(mywalk,x)
  if(steps == max.iter){
    warning("Maximum iteration reached")
    break
  }
}
mywalk %>% head(100)
##  [1] -1 -2 -1  0  1  2  1  2  3  2  1  2  3  4  5  6  5  4  5  4  3  4  5
## [24]  6  7  8  9  8  9  8  9 10
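
break works the same way inside a for loop. As a small made-up sketch, the loop below searches for the position of the first value greater than 10 and stops as soon as it is found, so later elements are never examined.

```r
values <- c(3, 8, 1, 12, 5, 20)
first_big <- NA
for (i in seq_along(values)) {
  if (values[i] > 10) {
    first_big <- i
    break   # stop looking once the first match is found
  }
}
first_big  # 4
```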

In class exercise

Please do the second problem (loops)

http://gandalf.berkeley.edu:3838/alucas/Lecture-09-collection/

3. Functions

Functions are one of the most important constructs in R (and many other languages). They allow you to modularize your code - encapsulating a set of repeatable operations as an individual function call.

For example, R has a function var that computes the unbiased estimate of variance, or sample variance, usually denoted \(s^2\). Suppose I need to repeatedly compute the maximum likelihood estimator (MLE) of variance \[\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2=\frac{n-1}{n}s^2\] instead of \(s^2\).

myvar <- function(x){
  if( !(is.vector(x))) {
    stop("x must be a vector")
  }
  n <- length(x)
  return((n-1)*var(x)/n)
}
myvar(c(1,2,3,4))
## [1] 1.25
var(c(1,2,3,4))
## [1] 1.666667
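
The identity between the two forms of the estimator can be checked numerically. This is a sketch using the same data as above; the variable names are chosen for illustration.

```r
x <- c(1, 2, 3, 4)
n <- length(x)

mle_direct <- mean((x - mean(x))^2)   # (1/n) * sum of squared deviations
mle_scaled <- (n - 1) / n * var(x)    # rescaled sample variance

mle_direct                            # 1.25
all.equal(mle_direct, mle_scaled)     # TRUE
```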

For another example, I might want to have a function that simulates rolling a die. Analyze this code line by line with a neighbor.

is.whole <- function(x) is.numeric(x) && floor(x)==x


die_rolling_simulation <- function(n=10){
  if( !(is.whole(n) && n>0)) {   # && is the scalar, short-circuiting AND
    stop("n must be a natural number")}
  myrolls <- c()
  for(i in 1:n){
    roll <- sample(1:6,1)
    myrolls <- c(myrolls,roll)
  }
  return(myrolls)
}

die_rolling_simulation()
##  [1] 2 3 5 1 1 1 6 1 5 5

You should rely heavily on functions rather than having long sets of expressions in R scripts.

Functions have many important advantages:

A basic goal in writing functions is modularity.

In general, a function should

Anatomy of a function

The syntax for writing a function is

function (arglist) body

Typically we assign the function to a particular name.

myfunc <- function (arglist) body

The keyword function just tells R that you want to create a function.

Recall that the arguments to a function are its inputs, which may have default values. For example the arguments of the substring() function are:

args(substring)
## function (text, first, last = 1000000L) 
## NULL

For example:

substring("abcdef",2,4)
## [1] "bcd"

Here, if we do not explicitly specify last when we call substring, it will be assigned the default value of 1e+06, which is very large. (Why do you think this was chosen?)
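
Default values work the same way in functions you write yourself. In this sketch, the function power and its argument names are hypothetical.

```r
# Hypothetical function: raise x to a power, squaring by default.
power <- function(x, exponent = 2) {
  x^exponent
}

power(3)                # 9, uses the default exponent = 2
power(3, exponent = 3)  # 27, overrides the default
```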

A few notes on writing the arguments list.

When you’re writing your own function, it’s good practice to put the most important arguments first. Often these will not have default values.

This allows the user of your function to easily specify the arguments by position. For example:

mtcars %>% ggplot(aes(mpg, wt))

rather than

 mtcars %>% ggplot(aes(x = mpg, y = wt))
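
The same idea applies to your own functions. In the sketch below (rescale and its arguments are hypothetical), the main argument comes first and has no default, so it can be given by position, while the remaining arguments can be given by position or, in any order, by name.

```r
# Hypothetical function with the most important argument first.
rescale <- function(x, center = 0, scale = 1) {
  (x - center) / scale
}

rescale(c(2, 4, 6), 4, 2)                   # by position: -1 0 1
rescale(c(2, 4, 6), scale = 2, center = 4)  # by name, any order: -1 0 1
```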

Next we have the body of the function, which typically consists of expressions surrounded by curly brackets. Think of these as performing some operations on the input values given by the arguments.

{
    expression 1  
    expression 2
    return(value)
}

The return expression hands control back to the caller of the function and returns a given value. If the function returns more than one thing, this is done using a named list, for example

stats <- function(x){
  if( !(is.vector(x))) {
    stop("x must be a vector")
  }
  return(list(total=sum(x), avg=mean(x)))
}

stats(c(1,2,3))
## $total
## [1] 6
## 
## $avg
## [1] 2
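
Once a function returns a named list, the individual pieces can be extracted by name with $ or [[ ]]. A minimal sketch, using a hypothetical function min_max:

```r
# Hypothetical function returning two values in a named list.
min_max <- function(x) {
  list(smallest = min(x), largest = max(x))
}

result <- min_max(c(4, 9, 1))
result$smallest      # 1
result[["largest"]]  # 9
```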

In the absence of a return expression, a function will return the last evaluated expression. This is particularly common if the function is short. For example, I could write the simple function:

sumofsquares <- function(x, y) sum(x^2+y^2)

sumofsquares(2,3)
## [1] 13

Here I don’t even need brackets {}, since there is only one expression.

A return expression anywhere in the function will cause the function to return control to the caller immediately, without evaluating the rest of the function. This is often used in conjunction with if statements. For example:

normt <- function(n, dist){
  if ( dist == "normal" ){
    return( rnorm(n) )
  } else if (dist == "t"){
    return(rt(n, df = 1, ncp = 0))
  } else stop("distribution not implemented")
}
normt(10,"t")
##  [1]  0.77897788 -0.04010573  1.02454517 -0.65281212 -0.61048674
##  [6] -0.49889556 13.87764177  1.42182713  0.39178274 -3.81849141
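
A smaller deterministic sketch of the same early-return pattern (safe_log is a hypothetical name): when the input is not positive, the function exits at the return and the final line is never evaluated.

```r
# Hypothetical function: return early for inputs where log is undefined.
safe_log <- function(x) {
  if (x <= 0) {
    return(NA)   # exit here; the line below is never reached
  }
  log(x)         # last evaluated expression is the return value
}

safe_log(-1)  # NA
safe_log(1)   # 0
```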

In class exercise

Please do the last exercise (functions) http://gandalf.berkeley.edu:3838/alucas/Lecture-09-collection/

i-clicker

Answer: There is no output

myfunc <-  function(fx=function(x){x^2}, plotit=TRUE){
  xseq <- seq(-1, 1, length = 100)
  yseq <- fx(xseq)
  df <- data.frame(xseq,yseq)
  p <- df %>% ggplot(aes(x=xseq, y=yseq)) + geom_line(stat="identity", col="red")
  if(plotit){
    print(p)
    }
}

myfunc(function(x){x}, FALSE)

Note:

sum(c(1,2,3,NA))
## [1] NA
median(c(1,2,3,NA))
## [1] NA

Answer: NA

MAD <- function(x, na.rm=FALSE){
  if(na.rm){
    x <- x[!is.na(x)]   # drop missing values only when requested
  }
  return(median(abs(x-median(x))))
}

MAD(c(1,2,3,NA))
## [1] NA