Module 3 Lesson 9: R Programming Structures

Introduction

R is a block-structured language in the manner of the ALGOL-descendant family, such as C, C++, Python, Perl, and so on. As you’ve seen, blocks are delineated by braces, though braces are optional if the block consists of just a single statement. Statements are separated by newline characters or, optionally, by semicolons.

Control Statements

Loops

x <- c(5,12,13)
for (n in x) print(n^2)

## [1] 25
## [1] 144
## [1] 169

Looping with while and repeat is also available, complete with break, a statement that causes control to leave the loop. Here is an example that uses all three:

i <- 1
while (i <= 10) i <- i+4
i

## [1] 13

i <- 1
while(TRUE) { # similar loop to above
  i <- i+4
  if (i > 10) break
}
i

## [1] 13

i <- 1
repeat { # again similar
  i <- i+4
  if (i > 10) break
}
i

## [1] 13

In the first code snippet, the variable i took on the values 1, 5, 9, and 13 as the loop went through its iterations. In that last case, the condition i <= 10 failed, so the loop ended.
This code shows three different ways of accomplishing the same thing, with break playing a key role in the second and third ways. Note that repeat has no Boolean exit condition. You must use break (or something like return()). Of course, break can be used with for loops, too.
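As a small sketch of break inside a for loop, the following searches a vector for the first element whose square exceeds 100 and stops as soon as it finds one. The threshold and the variable name first_big are made up for illustration.

```r
x <- c(5, 12, 13)
first_big <- NA  # hypothetical holder for the first match
for (n in x) {
  if (n^2 > 100) {
    first_big <- n
    break  # leave the for loop as soon as a match is found
  }
}
first_big  # 12: the loop never reaches 13
```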

Looping Over Nonvector Sets

R does not directly support iteration over nonvector sets, but there are a couple of indirect yet easy ways to accomplish it:
- Use lapply(), assuming that the iterations of the loop are independent of each other, thus allowing them to be performed in any order.
- Use get(). As its name implies, this function takes as an argument a character string representing the name of some object and returns the object of that name. It sounds simple, but get() is a very powerful function.
Let’s look at an example of using get(). Say we have two matrices, u and v, containing statistical data, and we wish to apply R’s linear regression function lm() to each of them.

u<-matrix(c(1,2,3,1,2,4), nrow=3)
v<-matrix(c(8,12,20,15,10,2), nrow=3)
for (m in c("u","v")) {
  z <- get(m)
  print(lm(z[,2] ~ z[,1]))
}

## 
## Call:
## lm(formula = z[, 2] ~ z[, 1])
## 
## Coefficients:
## (Intercept)       z[, 1]  
##     -0.6667       1.5000  
## 
## 
## Call:
## lm(formula = z[, 2] ~ z[, 1])
## 
## Coefficients:
## (Intercept)       z[, 1]  
##      23.286       -1.071

Here, m was first set to the string "u". Then these lines use get() to assign the matrix u to z, which allows the call to lm() on u:

z <- get(m)
print(lm(z[,2] ~ z[,1]))

The same then occurs with v.
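The lapply() route mentioned earlier can be sketched on the same two matrices. This is one possible arrangement, not the only one: the matrices go into a named list, and an anonymous function fits the regression for each element. The names fits and coefs are illustrative.

```r
u <- matrix(c(1, 2, 3, 1, 2, 4), nrow = 3)
v <- matrix(c(8, 12, 20, 15, 10, 2), nrow = 3)

# Put the matrices in a named list, then fit one regression per element.
fits <- lapply(list(u = u, v = v), function(z) lm(z[, 2] ~ z[, 1]))

# Collect the coefficients: one column per matrix, one row per coefficient.
coefs <- sapply(fits, coef)
round(coefs, 3)
```

Because each fit is independent of the others, the order in which lapply() processes the list does not matter.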

if-else

The syntax for if-else looks like this:

if (r == 4) {
  x<-1
} else {
  x<-3
  y<-4
}

It looks simple, but there is an important subtlety here. The if section consists of just a single statement:

x <- 1

So, you might guess that the braces around that statement are not necessary. However, they are indeed needed.
The right brace before the else is used by the R parser to deduce that this is an if-else rather than just an if. In interactive mode, without braces, the parser would mistakenly think the latter and act accordingly, which is not what we want.
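One way to see this without typing at the prompt is to hand both versions to parse(), which reads top-level code the same way the interactive parser does. This is only a sketch: the strings bad and good are made up, and nothing is evaluated, so r does not need to exist.

```r
# Without braces: the parser finishes the `if` at the newline,
# so the `else` on the next line is a syntax error.
bad <- "if (r == 4) x <- 1\nelse x <- 3"
bad_result <- tryCatch(parse(text = bad), error = function(e) "syntax error")

# With braces: `} else {` keeps the whole statement together.
good <- "if (r == 4) {\n  x <- 1\n} else {\n  x <- 3\n}"
good_result <- parse(text = good)

bad_result           # "syntax error"
length(good_result)  # 1: a single if-else expression
```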
An if-else statement works as a function call, and as such, it returns a value: that of the last expression evaluated in whichever branch is taken.

v <- if (cond) expression1 else expression2

This will set v to the result of expression1 or expression2, depending on whether cond is true. You can use this fact to compact your code. Here’s a simple example:

x <- 2
y <- if(x == 2) x else x+1
y

## [1] 2

x <- 3
y <- if(x == 2) x else x+1
y

## [1] 4

Without taking this tack, the code

y <- if(x == 2) x else x+1

would instead consist of the somewhat more cluttered

if(x == 2) y <- x else y <- x+1

Arithmetic and Boolean Operators and Values

Figure 1

Though R ostensibly has no scalar types, with scalars being treated as one-element vectors, we see an exception in Figure 1. There are different Boolean operators for the scalar and vector cases. This may seem odd, but a simple example will demonstrate the need for such a distinction.
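Since the figure itself is not reproduced here, a few of R's arithmetic and Boolean operators can be tried directly. This is a small sketch with made-up values a and b, not a complete list of operators.

```r
a <- 7
b <- 2
a + b    # addition: 9
a ^ b    # exponentiation: 49
a %% b   # modular arithmetic (remainder): 1
a %/% b  # integer division: 3
a == b   # test for equality: FALSE
!(a == b)            # Boolean negation: TRUE
(a > 5) && (b > 5)   # Boolean AND for scalars: FALSE
(a > 5) || (b > 5)   # Boolean OR for scalars: TRUE
```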

x<-c(TRUE, FALSE, TRUE)
y<-c(TRUE,TRUE,FALSE)
x&y

## [1]  TRUE FALSE FALSE

x[1] && y[1]

## [1] TRUE

x && y # older versions of R look at just the first elements of each vector; R 4.3.0 and later signal an error

## [1] TRUE

if (x[1] && y[1]) print("both TRUE")

## [1] "both TRUE"

if (x & y) print("both TRUE")

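## [1] "both TRUE"
## Warning in if (x & y) print("both TRUE"): the condition has length > 1 and
## only the first element will be used

The central point is that in evaluating an if, we need a single Boolean, not a vector of Booleans, hence the warning seen in the preceding example, as well as the need for having both the & and && operators. Recent versions of R are stricter still: since R 4.2.0, an if condition of length greater than one is an error rather than a warning, and since R 4.3.0, so is applying && to a vector of length greater than one.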
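When the goal really is to test a whole vector, one common approach is to reduce it to a single Boolean first with all() or any(). The sketch below reuses the x and y from the example above.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# x & y is TRUE FALSE FALSE, so all() is FALSE and any() is TRUE.
if (all(x & y)) print("all pairs TRUE") else print("not all pairs TRUE")
if (any(x & y)) print("at least one pair TRUE")
```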
The Boolean values TRUE and FALSE can be abbreviated as T and F (both must be capitalized). These values change to 1 and 0 in arithmetic expressions:

1 < 2

## [1] TRUE

(1 < 2) * (3 < 4)

## [1] 1

(1 < 2) * (3 < 4) * (5 < 1)

## [1] 0

(1 < 2) == TRUE

## [1] TRUE

(1 < 2) == 1

## [1] TRUE

In the second computation, for instance, the comparison 1 < 2 returns TRUE, and 3 < 4 yields TRUE as well. Both values are treated as 1 values, so the product is 1.
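The same conversion is what makes counting with logical vectors work. As a brief sketch, with made-up exam scores and a made-up passing mark of 65:

```r
scores <- c(52, 71, 88, 64, 95)  # hypothetical exam scores
passed <- scores >= 65           # logical vector: FALSE TRUE TRUE FALSE TRUE
sum(passed)                      # TRUE counts as 1, so this is the number passed: 3
mean(passed)                     # proportion passed: 0.6
```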

On the surface, R functions look similar to those of C, Java, and so on. However, they have much more of a functional programming flavor, which has direct implications for the R programmer.

Default Values for Arguments

In the previous lessons, we read in a data set from a file named exams:

testscores <- read.table("exams",header=TRUE)

The argument header=TRUE tells R that we have a header line, so R should not count that first line in the file as data. This is an example of the use of named arguments. Here are the first few lines of the function:

read.table
function (file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, stringsAsFactors = default.stringsAsFactors(), encoding = "unknown")
{
if (is.character(file)) {
file <- file(file, "r")
on.exit(close(file))
...
...

The second formal argument is named header. The = FALSE field means that this argument is optional, and if we don’t specify it, the default value will be FALSE. If we don’t want the default value, we supply our own in the call, most clearly by naming the argument:

testscores <- read.table("exams",header=TRUE)
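Defaults work the same way in functions we write ourselves. Here is a minimal sketch; the function round_mean() and its digits argument are hypothetical, chosen only to show an optional argument being omitted and then overridden by name.

```r
# `digits` is optional and defaults to 2.
round_mean <- function(x, digits = 2) {
  round(mean(x), digits)
}

vals <- c(1.2345, 2.3456, 3.4567)
round_mean(vals)              # uses the default, digits = 2
round_mean(vals, digits = 3)  # overrides the default by name
```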

Return Values

The return value of a function can be any R object. Although the return value is often a list, it could even be another function.
You can transmit a value back to the caller by explicitly calling return(). Without this call, the value of the last executed statement will be returned by default. For instance, consider the oddcount() function below:

oddcount <- function(x) {
  k <- 0 # assign 0 to k
  for (n in x) {
    if (n %% 2 == 1) k <- k+1 # %% is the modulo operator
  }
  return(k)
}

This function returns the count of odd numbers in the argument. We could slightly simplify the code by eliminating the call to return(). To do this, we evaluate the expression to be returned, k, as our last statement in the code:

oddcount <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 1) k <- k+1
  }
  k
}

On the other hand, consider this code:

oddcount <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 1) k <- k+1
  }
}

It wouldn’t work, for a rather subtle reason: The last executed statement here is the call to for(), which returns the value NULL (and does so, in R parlance, invisibly, meaning that it is discarded if not stored by assignment). Thus, the function would return NULL rather than the count k.
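This is easy to check by storing the result of such a function. In the sketch below, the name oddcount_noreturn is made up so that it does not overwrite the working oddcount().

```r
# No final expression: the for loop is the last statement executed.
oddcount_noreturn <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 1) k <- k + 1
  }
}

result <- oddcount_noreturn(c(1, 3, 5))
is.null(result)  # TRUE: the count k never reached the caller
```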

Functions are Objects

R functions are first-class objects (of the class “function”, of course), meaning that they can be used for the most part just like other objects. This is seen in the syntax of function creation:

g <- function(x) {
  return(x+1)
}

Here, function() is a built-in R function whose job is to create functions!
On the right-hand side, there are really two arguments to function(): The first is the formal argument list for the function we’re creating (here, just x), and the second is the body of that function (here, just the single statement return(x+1)). That second argument is an unevaluated R expression. So, the point is that the right-hand side creates a function object, which is then assigned to g.
By the way, even the “{” is a function, as you can verify by typing this:

?"{"

Its job is to make a single unit of what could be several statements. These two arguments to function() can later be accessed via the R functions formals() and body(), as follows:

formals(g)

## $x

body(g)

## {
##     return(x + 1)
## }
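The result of formals() is a list-like object, so its pieces can be picked out by name. A short sketch, using a hypothetical function scale_by() with one required and one optional argument:

```r
scale_by <- function(x, factor = 10) {
  x * factor
}

names(formals(scale_by))  # the argument names: "x" "factor"
formals(scale_by)$factor  # the default value of `factor`: 10
```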

Recall that when using R in interactive mode, simply typing the name of an object results in printing that object to the screen. Functions are no exception, since they are objects just like anything else:

g

## function(x) {
##   return(x+1)
## }

This is handy if you’re using a function that you wrote but which you’ve forgotten the details of. Printing out a function is also useful if you are not quite sure what an R library function does. By looking at the code, you may understand it better. For example, if you are not sure as to the exact behavior of the graphics function abline(), you could browse through its code to better understand how to use it.

abline

## function (a = NULL, b = NULL, h = NULL, v = NULL, reg = NULL, 
##     coef = NULL, untf = FALSE, ...) 
## {
##     int_abline <- function(a, b, h, v, untf, col = par("col"), 
##         lty = par("lty"), lwd = par("lwd"), ...) .External.graphics(C_abline, 
##         a, b, h, v, untf, col, lty, lwd, ...)
##     if (!is.null(reg)) {
##         if (!is.null(a)) 
##             warning("'a' is overridden by 'reg'")
##         a <- reg
##     }
##     if (is.object(a) || is.list(a)) {
##         p <- length(coefa <- as.vector(coef(a)))
##         if (p > 2) 
##             warning(gettextf("only using the first two of %d regression coefficients", 
##                 p), domain = NA)
##         islm <- inherits(a, "lm")
##         noInt <- if (islm) 
##             !as.logical(attr(stats::terms(a), "intercept"))
##         else p == 1
##         if (noInt) {
##             a <- 0
##             b <- coefa[1L]
##         }
##         else {
##             a <- coefa[1L]
##             b <- if (p >= 2) 
##                 coefa[2L]
##             else 0
##         }
##     }
##     if (!is.null(coef)) {
##         if (!is.null(a)) 
##             warning("'a' and 'b' are overridden by 'coef'")
##         a <- coef[1L]
##         b <- coef[2L]
##     }
##     int_abline(a = a, b = b, h = h, v = v, untf = untf, ...)
##     invisible()
## }
## <bytecode: 0x000000001507a8e0>
## <environment: namespace:graphics>

If you wish to view a lengthy function in this way, run it through page():

page(abline)

An alternative is to edit it using the edit() function, which we will discuss in the next lecture.
Note, though, that some of R’s most fundamental built-in functions are written directly in C, and thus they are not viewable in this manner. Here’s an example:

sum

## function (..., na.rm = FALSE)  .Primitive("sum")
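If you would rather test for this than read the printout, is.primitive() reports whether a function is one of these built-ins. A quick sketch comparing sum with abline() from above:

```r
is.primitive(sum)               # TRUE: implemented internally, no R code to view
is.primitive(graphics::abline)  # FALSE: an ordinary R function with viewable code
```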

Since functions are objects, you can also assign them, use them as arguments to other functions, and so on.

f1 <- function(a,b) return(a+b)
f2 <- function(a,b) return(a-b)
f <- f1
f(3,2)

## [1] 5

f <- f2
f(3,2)

## [1] 1

g <- function(h,a,b) h(a,b)
g(f1,3,2)

## [1] 5

g(f2,3,2)

## [1] 1

And since functions are objects, you can loop through a list consisting of several functions. This would be useful, for instance, if you wished to write a loop to plot a number of functions on the same graph, as follows:

g1 <- function(x) return(sin(x))
g2 <- function(x) return(sqrt(x^2+1))
g3 <- function(x) return(2*x-1)
plot(c(0,1),c(-1,1.5)) # prepare the graph, specifying X and Y ranges
for (f in c(g1,g2,g3)) plot(f,0,1,add=T) # add plot to existing graph
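The same idea works without any plotting. As a sketch, the functions can be stored in a named list and each one called at a chosen point with sapply(); the names funs and vals_at_zero are illustrative.

```r
g1 <- function(x) return(sin(x))
g2 <- function(x) return(sqrt(x^2 + 1))
g3 <- function(x) return(2 * x - 1)

# A named list of functions; the anonymous function calls each one at x = 0.
funs <- list(g1 = g1, g2 = g2, g3 = g3)
vals_at_zero <- sapply(funs, function(f) f(0))
vals_at_zero  # g1: 0, g2: 1, g3: -1
```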

The functions formals() and body() can even be used as replacement functions. We’ll discuss replacement functions in the next lecture, but for now, consider how you could change the body of a function by assignment:

g <- function(x) return(x+1)
body(g) <- quote(2*x + 3)
g

## function (x) 
## 2 * x + 3

g(3)

## [1] 9

The reason quote() was needed is that technically, the body of a function has the class “call”, which is the class produced by quote(). Without the call to quote(), R would try to evaluate the quantity 2*x+3. So if x had been defined and equal to 3, for example, we would assign 9 to the body of g(), certainly not what we want.
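The difference can be seen by trying both forms side by side. This is a sketch with a global x set to 3 and a one-argument g; the variable names quoted_result and unquoted_result are made up.

```r
x <- 3
g <- function(x) x + 1

body(g) <- quote(2 * x + 3)  # the body is the unevaluated expression
class(body(g))               # "call"
quoted_result <- g(5)        # 2 * 5 + 3 = 13

body(g) <- 2 * x + 3         # evaluated first, so the body is the constant 9
unquoted_result <- g(5)      # 9, regardless of the argument

c(quoted_result, unquoted_result)
```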