Week 3: R Studio, Control flows and Functions

RStudio interface - page 40

From an R script in RStudio (a file with extension .R), you can send code for execution.
Use the ‘Run’ arrow to execute the commands that have been selected.
The key combination CTRL+ENTER can also be used to execute the commands that have been selected.
Find the following windows in RStudio. What is their purpose?
1. console
2. environment
3. history
4. plots
5. packages
6. help

Control Flow - Introduction

Control flows connect several building blocks of your code together

A loop can be used to repeat the same portion of code (or block of code) a number of times
In an algorithm, logical operations are used to decide whether a loop should continue or not
An ‘if’ statement is used to execute a command if some condition is satisfied
A ‘for’ loop is used to repeat a building block a pre-determined number of times
A ‘while’ loop is used to repeat a building block until some condition fails

Note: We will come back to the notion of ‘Algorithm’ in much more detail in Week 7.

Control Flow - Instructions if and else - Page 117

x <- 3
y <- 2
if (x <= y) {
  print("x smaller than y")
} else {
  print("x larger than y")
}

## [1] "x larger than y"

Control Flow - Instructions for - Page 119

A for loop has the general syntax (m,n are integers):

for (i in m:n){}

It repeats whatever commands are inside brackets {} for all values of i starting at i=m up to i=n. Note i is just a name here (it can be anything).

x <- c(1,3,7,2)
for (i in 1:4){print(x[i])}

## [1] 1
## [1] 3
## [1] 7
## [1] 2

for (k in 2:4){
  x[k-1] <- x[k]
}
x

## [1] 3 7 2 2
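The upper bound 4 above was typed by hand. As a minimal sketch of a common alternative, the index range can be built from the vector itself with seq_along(), so the loop adapts to the vector's length. The vector w and the accumulator total are hypothetical names used only for illustration.

```r
# Hypothetical vector, used only for illustration
w <- c(10, 20, 30)
total <- 0
# seq_along(w) produces 1, 2, 3, so the loop follows the length of w
for (j in seq_along(w)) {
  total <- total + w[j]
}
total
```

Note that the loop variable is called j here, which illustrates that its name is arbitrary.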

Exercise: For loop

Suppose you open a new bank account. At time $0$, you deposit $\$500$ into your account. Your account earns $i\%$ per annum of interest in year $i$ (i.e. $1\%$ in the first year, $2\%$ in the second year, etc.)

What would be your account balance at the end of the 10th year? Use a ‘for’ loop in your solution.

Solution

balance <- 500
for (i in 1:10) {
  balance <- balance*(1 + i/100)
}
balance

## [1] 850.9107

Control Flow - Instructions while - Page 119

A while loop has the general syntax:

while (condition){}

It repeats whatever commands are inside brackets {} as long as condition is TRUE.

x <- 2
y <- 1
while (x + y < 6) {
  x <- x + y
  print(x + y)
}

## [1] 4
## [1] 5
## [1] 6

x

## [1] 5

# (x=2,y=1) -> (x+y=3<6) -> (x=3,y=1) -> (x+y=4<6) -> 
#(x=4,y=1) -> (x+y=5<6) -> (x=5,y=1) -> (x+y=6) -> End

Exercise: While loop

With the help of function rpois(n=1, lambda = 2), generate a series of random variables $X_1, X_2, X_3, \ldots$ until one of them is equal to $5$. Store those random variables and display the full sequence.

Solution

X <- rpois(n=1, lambda = 2) # We generate our first random variable
X.vector <- X # To start with, our sequence only contains one result (X)
while (X != 5) {
  X <- rpois(n=1, lambda = 2) # We generate a new random variable
  X.vector <- c(X.vector,X) # We add it to the sequence
}
X.vector

##  [1] 0 0 1 0 4 3 2 1 1 1 2 3 2 2 6 0 4 4 3 1 1 1 6 0 3 2 1 1 1 4 1 3 2 3 1 4 1 0
## [39] 2 2 2 3 2 3 0 0 3 2 1 3 0 6 3 3 1 1 0 2 3 0 0 4 2 1 5

Notes

Loops can be slow in R; where a vectorized alternative exists, prefer it.
Other loop controls exist: break exits a loop, and next skips to the next iteration within a loop. See more info and examples on Pages 119-120.
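A minimal sketch of how break and next behave inside a for loop. The vector odds and the cut-off 7 are assumptions chosen for illustration only.

```r
# Collect odd numbers, stopping once the loop variable exceeds 7
odds <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # even value: skip straight to the next iteration
  if (i > 7) break        # leave the loop entirely
  odds <- c(odds, i)      # only reached for odd values up to 7
}
odds
```

Here next discards the even values, and break ends the loop when i reaches 9, so the values 9 and 10 are never stored.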

Vectorization - Pages 85-86

As we have already seen, R ‘naturally’ operates on vectors and matrices, applying operations element-by-element

x <- c(1,2,3)
y <- c(4,5,6)
x+y

## [1] 5 7 9

M <- matrix(1:9, nrow=3)
exp(M)

##           [,1]      [,2]     [,3]
## [1,]  2.718282  54.59815 1096.633
## [2,]  7.389056 148.41316 2980.958
## [3,] 20.085537 403.42879 8103.084

We call this behaviour ‘vectorization’, and it is one of the main strengths of R
Vectorization is much quicker than using loops!
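As an illustration, the bank account exercise from the for loop section can be rewritten without an explicit loop. This is only a sketch; the names rates, balance.vec and balance.path are hypothetical.

```r
# Interest rates for years 1 to 10: 1%, 2%, ..., 10%
rates <- (1:10) / 100
# Final balance: the deposit times the product of all growth factors
balance.vec <- 500 * prod(1 + rates)
balance.vec
# Balance at the end of each year, if the whole path is of interest
balance.path <- 500 * cumprod(1 + rates)
balance.path
```

The last element of balance.path equals balance.vec, which matches the result obtained with the for loop.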

Vectorization - Example - Page 86

x <- runif(5000000) # Generate 5 million random elements
z <- 0
system.time(for(i in 1:5000000){z <- z + x[i]})

##    user  system elapsed 
##   0.073   0.000   0.073

z

## [1] 2500198

system.time(zz <- sum(x))

##    user  system elapsed 
##   0.008   0.000   0.008

zz

## [1] 2500198

Functions - calling functions - pages 43-44

A function in R is defined by its name and by the list of its parameters (or arguments). Most functions output a value.
Using a function (or calling or executing it) is done by typing its name followed, in brackets, by the list of arguments to be used. Arguments are separated by commas. Each argument can be followed by the sign = and the value to be given to the argument.

functionname(arg1 = value1, arg2 = value2, arg3 = value3)

Note that you do not necessarily need to indicate the names of the arguments, but only their values, as long as you follow their order.
For any R function, some arguments must be specified and others are optional (because a default value is already given in the code of the function).
Can you name some functions you already know and that we have seen?

Functions - calling functions - page 44

Some functions don’t need any arguments! Compare factorial(), which requires one, with date(), which takes none.

factorial(6)

## [1] 720

date()

## [1] "Mon Jun 19 02:03:04 2023"

Functions - Arguments - page 45

The function log(x, base = exp(1)) can take two arguments: x (its value must be specified) and base (optional, because a default value is provided as exp(1)).
You can call a function by playing with the arguments in several different ways. This is an important feature of R which makes it easier to use. All the following commands will execute the same calculations

log(3)

## [1] 1.098612

log(x = 3)

## [1] 1.098612

log(x = 3, base = exp(1))

## [1] 1.098612

log(x = 3, exp(1))

## [1] 1.098612

log(3, base = exp(1))

## [1] 1.098612

log(3, exp(1))

## [1] 1.098612

log(base = exp(1), 3)

## [1] 1.098612

log(base = exp(1), x = 3)

## [1] 1.098612

Developing functions - Creating a function - Page 194

An important part of coding in R is creating your own functions.

Creating a function is done following the general syntax: function(<list of arguments>){<body of the function>}, where

<list of arguments> is a list of named arguments (also called formal arguments);
<body of the function> represents, as the name suggests, the contents of the code to execute when the function is called.

Developing functions - Calling a function - Pages 194-195

To execute a function, the user calls it by name, followed by the effective arguments listed between brackets () and separated by commas. Here an effective argument is the value assigned to a formal argument.

# This line creates a function called 'hello' with one argument called 'name'
hello <- function(name){cat("Hello, my dear", name, "!")}
# This line executes the function, with the effective argument 'Josephine'
hello(name = "Josephine")

## Hello, my dear Josephine !

Note: R allows calling a function without typing in the complete name of a formal argument, provided the abbreviation identifies that argument uniquely:

hello(na="Jinxia")

## Hello, my dear Jinxia !

hello(n="Samantha")

## Hello, my dear Samantha !
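Partial matching only works when the abbreviation points to a single formal argument. The sketch below uses a hypothetical function greet() whose two arguments share the prefix "n"; the error is caught with tryCatch() so that the script keeps running.

```r
# Hypothetical function: both formal arguments start with "n"
greet <- function(name, number) {
  paste("Hello", name, "- ticket", number)
}
# Each abbreviation is unique, so this call works
ok <- greet(na = "Ali", nu = 7)
ok
# 'n' could mean name or number, so R stops with an error
result <- tryCatch(greet(n = "Ali", nu = 7),
                   error = function(e) "ambiguous")
result
```

Typing argument names in full avoids this kind of ambiguity and makes code easier to read.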

Developing functions - Body of a function - Page 195

The body of a function can be a simple R instruction, or a sequence of R instructions. In the latter case, the instructions must be enclosed between the characters { and } to delimit the beginning and end of the body of the function.
Several R instructions can be written on the same line as long as they are separated by a semicolon ‘;’
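A short sketch of both cases described above. The function names cube and shifted.cube are hypothetical.

```r
# Body made of a single instruction: the braces are optional
cube <- function(x) x^3
# Body made of two instructions on one line, separated by ';'
shifted.cube <- function(x) { y <- x + 1; y^3 }
cube(2)
shifted.cube(2)
```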

Exercise - Page 195

Create a function hello() in R, such that

there is a single argument called name
it returns ‘Hello, my dear’ followed by the name in capital letters followed by ‘ !’.

Target output

>hello("Peter")

Hello, my dear PETER !

Hint: Use function toupper().

Solution - Page 195

hello <- function(name){
  # Convert the name to upper case.
  name <- toupper(name)
  cat("Hello, my dear", name, "!")
  }

hello("Peter")

## Hello, my dear PETER !

Functions - Multiple arguments example

Of course, a function can have more than one argument. Here, function CDF.pois() has two arguments, x and lambda. It calculates the CDF $F_X(x)$ at x of a Poisson random variable with parameter equal to lambda. Note the use of a for loop.

CDF.pois <- function(x, lambda){
  # Initialise the cdf to 0
  cdf <- 0
  # For k from 0 to x, add together the probability masses p(k)
  for (k in 0:x){
    cdf <- cdf + exp(-lambda)*lambda^k/factorial(k)
  }
  # Return the result
  return(cdf)
}
CDF.pois(x = 3, lambda = 4)

## [1] 0.4334701

Note: we have every right to use a function within a function. For instance, here we used the (already defined) function factorial() inside our new function CDF.pois().
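For comparison, here is a sketch of the same calculation without a loop, using the built-in functions dpois() (Poisson probability mass) and ppois() (Poisson CDF). The name CDF.pois.vec is hypothetical; this is an alternative illustration, not a replacement for the function above.

```r
# Vectorized alternative: sum the probability masses p(0), ..., p(x)
CDF.pois.vec <- function(x, lambda) {
  sum(dpois(0:x, lambda))
}
CDF.pois.vec(x = 3, lambda = 4)
# Built-in Poisson CDF, which should give the same value
ppois(3, lambda = 4)
```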

Functions - Exercise - pages 45-46

Code a function which takes two arguments $n$ and $p$ and calculates the binomial coefficient \[{n \choose p}=\frac{n!}{p!(n-p)!}\]

Test your function by evaluating the result of \[{5 \choose 3}\] which should yield $10$.

Functions - Solution - pages 45-46

binomial <- function(n,p) {factorial(n)/(factorial(p)*factorial(n-p))}
binomial(5,3)

## [1] 10
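R also provides the built-in function choose() for binomial coefficients, which can be used to check a hand-written version. In the sketch below the factorial-based function is given the hypothetical name binomial.coef, since a function called binomial() already exists in R's stats package and would be masked.

```r
# Factorial-based formula, under a name that does not clash with stats::binomial
binomial.coef <- function(n, p) {
  factorial(n) / (factorial(p) * factorial(n - p))
}
binomial.coef(5, 3)
# Built-in equivalent
choose(5, 3)
# The two agree for other inputs as well
all.equal(binomial.coef(10, 4), choose(10, 4))
```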

Developing functions - Default argument values - Page 195

When declaring a function, all arguments are identified by a unique name.
Each argument can be associated with a default value. To specify a default value, use the character = followed by the default value.
When the function is called with no effective argument for that argument, the default value will be used.

# Declare function 'binomial' with default values
binomial <- function(n=5,p=3){factorial(n)/(factorial(p)*factorial(n-p))}

binomial() # Use both default values

## [1] 10

binomial(n=6) # Specify first argument, but use default value for the second

## [1] 20

Developing functions - Object returned by a function - Pages 198-199

A way to explicitly tell an R function what object to return is to use the function return(). This instruction halts the execution of the code in the body of the function and returns the object between brackets.

binomial <- function(n=5,p=3){
  return(factorial(n)/(factorial(p)*factorial(n-p)))
  # Everything below will not be executed, because we returned the first part
  my.unif <- runif(1)
  while (my.unif <= 1){ # Note that this part would be an infinite loop! Beware of those!
    my.unif <- runif(1)
  }
}

Note that if there is no ‘return()’ in the body of the function, then the function will return the result of the last evaluated expression.
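A minimal sketch contrasting the two behaviours. The functions add.one and safe.sqrt are hypothetical examples.

```r
# No return(): the value of the last evaluated expression is returned
add.one <- function(x) {
  y <- x + 1
  y
}
# return() used for an early exit
safe.sqrt <- function(x) {
  if (x < 0) return(NA)   # execution stops here for negative input
  sqrt(x)                 # otherwise this last expression is returned
}
add.one(2)
safe.sqrt(-4)
safe.sqrt(9)
```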

Developing functions - Variable scope in the body of a function - Pages 200-201

Variables defined inside the body of a function have a local scope during function execution. This means that a variable inside the body of a function is physically different from another variable with the same name, but defined in the work space of your R session.
Generally speaking, local scope means that a variable only exists inside the body of the function. After the execution of the function, the variable is thus automatically deleted from the memory of the computer.
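The sketch below illustrates local scope. The names v, change.v and local.only are hypothetical.

```r
v <- 10                        # defined in the workspace
change.v <- function() {
  v <- 99                      # local v, distinct from the workspace v
  local.only <- "temporary"    # exists only while the function runs
  v
}
inside <- change.v()
inside                         # value of the local v
v                              # the workspace v is unchanged
exists("local.only")           # FALSE: the local variable is gone
```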

Exercise

Create a function in R that calculates the present value of an annuity (paying $1$ per year). The inputs are

the number of years, which is by default $1$
whether the payments are paid in arrears or not, which is by default TRUE
the annual interest rate, which is by default $6\%$

Note: recall that the present value of an annuity that pays $1$ at the end of each year for $n$ years, at annual interest rate $i$, is \[\frac{1-(1+i)^{-n}}{i}.\] If payments occur at the beginning of each year (rather than in arrears), then the present value is \[(1+i)\frac{1-(1+i)^{-n}}{i}.\]

Solution

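annuity_cal <- function(n=1, arrears=TRUE, i=0.06){
  discount <- 1/(1+i)
  temp <- (1-discount^n)/i   # present value with payments in arrears
  if (arrears) {
    return(temp)
  } else {
    # payments at the start of each year: one extra year of interest
    return(temp*(1+i))
  }
}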

annuity_cal()

## [1] 0.9433962

annuity_cal(a=FALSE)

## [1] 1

annuity_cal(n=5, a=FALSE, i=0.07)

## [1] 4.387211
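One way to check these values is to sum the discounted payments directly and compare with the closed form used in annuity_cal(). The sketch below does this for n = 5 and i = 7%; the variable names are hypothetical.

```r
n <- 5
i <- 0.07
# Payments of 1 at times 1, ..., n (in arrears)
pv.arrears <- sum((1 + i)^(-(1:n)))
# Payments of 1 at times 0, ..., n-1 (at the start of each year)
pv.advance <- sum((1 + i)^(-(0:(n - 1))))
# Closed form for payments in arrears
closed.form <- (1 - (1 + i)^(-n)) / i
pv.arrears
pv.advance
all.equal(pv.arrears, closed.form)
all.equal(pv.advance, closed.form * (1 + i))
```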

Exercise

Create a function in R that plots the density/distribution function of a normal random variable. The arguments are

mean $\mu$, which is by default 0
variance $\sigma^2$, which is by default 1
whether a density function is plotted, which is by default TRUE; if FALSE, then the cumulative distribution function is plotted

The output is either the density or the distribution function over the range $(\mu-4\sigma,\mu+4\sigma)$.

Hint: You will need functions dnorm() and pnorm() as well as function plot().

Note: There is more to come about graphical tools in Week 5.

Solution

plot_norm <- function(mean=0, variance=1, density=TRUE) {
  temp <- seq(from=mean-4*sqrt(variance), to=mean+4*sqrt(variance), by=sqrt(variance)/50)
  
  if (density) { # Note: writing 'if (density)' is equivalent to writing 'if (density == TRUE)'
    plot(temp, dnorm(temp, mean, sqrt(variance)))
  } else {
    plot(temp, pnorm(temp, mean, sqrt(variance)))
  }
}

plot_norm()

plot_norm(1,4,TRUE)

plot_norm(1,4,FALSE)

Term2 2023
