1 An overview

This chapter gives an overview of the R language, designed to help you understand R code and write your own.

1.1 Expressions

R code is composed of a series of expressions. Examples of expressions in R include assignment statements, conditional statements, and arithmetic expressions. Here are a few examples of expressions:

x <- 1
if (1 > 2) "yes" else "no"
## [1] "no"
127 %% 10
## [1] 7

Expressions are composed of objects and functions. You may separate expressions with new lines or with semicolons. For example, here is a series of expressions separated by semicolons:

"this expression will be printed"; 7 + 13; exp(0+1i*pi)
## [1] "this expression will be printed"
## [1] 20
## [1] -1+0i

1.2 Objects

All R code manipulates objects. Examples of objects in R include numeric vectors, character vectors, lists, and functions. Here are some examples of objects:

c(1,2,3,4,5) # a numerical vector (with five elements)
## [1] 1 2 3 4 5
"This is an object too" # a character vector (with one element)
## [1] "This is an object too"
list(c(1,2,3,4,5),"This is an object too", " this is a list") # a list
## [[1]]
## [1] 1 2 3 4 5
## 
## [[2]]
## [1] "This is an object too"
## 
## [[3]]
## [1] " this is a list"
function(x,y) {x + y}  # a function
## function(x,y) {x + y}

1.3 Symbols

Formally, variable names in R are called symbols. When you assign an object to a variable name, you are actually assigning the object to a symbol in the current environment. For example, the statement:

 x <- 1

assigns the symbol x to the object 1 in the current environment.

1.4 Functions

A function is an object in R that takes some input objects (called the arguments of the function) and returns an output object. All work in R is done by functions. Every statement in R (setting variables, doing arithmetic, repeating code in a loop) can be written as a function. Here are a few more examples of R syntax and the corresponding function calls:

apples <- 3  # pretty assignment
apples
## [1] 3
`<-`(apples,3) # functional form of assignment
apples
## [1] 3
`<-`(oranges,4) # another assignment statement, so that we can compare apples and oranges
oranges
## [1] 4
apples + oranges # pretty arithmetic expression
## [1] 7
`+`(apples,oranges) # functional form of arithmetic expression
## [1] 7
# pretty form of if-then statement
if (apples > oranges) "apples are better" else "oranges are better" 
## [1] "oranges are better"
# functional form of if-then statement
`if`(apples > oranges,"apples are better","oranges are better") 
## [1] "oranges are better"
x <- c("apple","orange","banana","pear")
x[2] # pretty form of vector reference
## [1] "orange"
`[`(x,2) # functional form of vector reference
## [1] "orange"

1.5 Objects are copied in assignment statements

In assignment statements, most objects are immutable. R will copy the object, not just the reference to the object. For example:

u <- list(1)
v <- u
u[[1]] <- "hat"
u
## [[1]]
## [1] "hat"
v
## [[1]]
## [1] 1

This is also true in function calls. Consider the following function,

f <- function(x,i) {x[i] = 4}
w <- c(10, 11, 12, 13)
f(w,1)
w
## [1] 10 11 12 13

The vector w is copied when it is passed to the function, so it is not modified by the function. The value x is modified inside the context of the function.
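Because of these copy semantics, the usual way to "modify" an object with a function is to return the changed copy and assign the result. Here is a minimal sketch; the helper name set_element is hypothetical and introduced only for illustration:

```r
# set_element is a hypothetical helper, not part of R
set_element <- function(x, i, value) {
  x[i] <- value  # changes the local copy only
  x              # return the modified copy
}
w <- c(10, 11, 12, 13)
w2 <- set_element(w, 1, 4)
w
## [1] 10 11 12 13
w2
## [1]  4 11 12 13
```

The original vector w is untouched; only the returned copy w2 carries the change.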

1.6 Everything in R is an object

In the last few sections, most examples of objects were objects that stored data: vectors, lists, and other data structures. However, everything in R is an object: functions, symbols, and even R expressions.

For example, function names in R are really symbol objects that point to function objects. (That relationship is, in turn, stored in an environment object.) You can assign a symbol to refer to a numeric object and then change the symbol to refer to a function:

x <- 1
x
## [1] 1
x(2)
## Error in x(2): could not find function "x"
x <- function(i) i^2
x
## function(i) i^2
x(2)
## [1] 4
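To make the claim about symbols and expressions concrete, the following sketch uses quote() to capture a symbol and an expression without evaluating them, and then inspects their classes:

```r
s <- quote(x)        # a symbol object (not evaluated)
class(s)
## [1] "name"
e <- quote(1 + 2)    # an unevaluated expression (a call object)
class(e)
## [1] "call"
eval(e)              # evaluate the stored expression
## [1] 3
f <- function(i) i^2
class(f)
## [1] "function"
```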

1.7 Special values

There are a few special values that are used in R.

1.7.1 NA

In R, NA values are used to represent missing values. (NA stands for “not available”.) You may encounter NA values in text loaded into R (to represent missing values) or in data loaded from databases (to replace NULL values).

v <- c(1,2,3)
v
## [1] 1 2 3
length(v) <- 4
v
## [1]  1  2  3 NA
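In practice you will usually want to detect NA values or exclude them from a computation. A short sketch using is.na() and the na.rm argument of sum():

```r
v <- c(1, 2, 3, NA)
is.na(v)              # which elements are missing?
## [1] FALSE FALSE FALSE  TRUE
sum(v)                # NA propagates through arithmetic
## [1] NA
sum(v, na.rm = TRUE)  # drop missing values before summing
## [1] 6
```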

1.7.2 Inf and -Inf

If a computation results in a number that is too big, R will return Inf for a positive number and -Inf for a negative number (meaning positive and negative infinity, respectively):

2 ^ 1024
## [1] Inf
-2 ^ 1024
## [1] -Inf

This is also the value returned when you divide a nonzero number by 0 (the sign follows the numerator):

1 / 0
## [1] Inf

1.7.3 NaN

Sometimes, a computation will produce a result that makes little sense. In these cases, R will often return NaN (meaning “not a number”):

Inf - Inf
## [1] NaN
0 / 0
## [1] NaN
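NaN and NA are related but not identical. The following sketch shows how the standard test functions treat them:

```r
is.nan(0 / 0)
## [1] TRUE
is.na(0 / 0)   # NaN also counts as missing
## [1] TRUE
is.nan(NA)     # but NA is not NaN
## [1] FALSE
is.finite(c(1, Inf, -Inf, NaN, NA))
## [1]  TRUE FALSE FALSE FALSE FALSE
```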

1.7.4 NULL

Additionally, there is a null object in R, represented by the symbol NULL. (The symbol NULL always points to the same object.) NULL is often used as an argument in functions to mean that no value was assigned to the argument. Additionally, some functions may return NULL. Note that NULL is not the same as NA, Inf, -Inf, or NaN.

x <- NULL
for(i in 1:5) x <- c(x,i)
x
## [1] 1 2 3 4 5
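The difference between NULL and NA is easiest to see by comparing lengths: NULL is an empty object, while NA is a value that occupies a position in a vector. A brief sketch:

```r
is.null(NULL)
## [1] TRUE
length(NULL)   # NULL has no elements
## [1] 0
length(NA)     # NA is a vector of length one
## [1] 1
c(1, NULL, 2)  # NULL disappears when combined
## [1] 1 2
c(1, NA, 2)    # NA keeps its place
## [1]  1 NA  2
```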

1.8 Coercion

When you call a function with an argument of the wrong type, R will try to coerce values to a different type so that the function will work. There are two types of coercion that occur automatically in R: coercion with formal objects and coercion with built-in types.

  • With generic functions, R will look for a suitable method. If no exact match exists, R will search for a coercion method that converts the object to a type for which a suitable method does exist.

  • Additionally, R will automatically convert between built-in object types when appropriate. R will convert from more specific types to more general types.

x <- c(1, 2, 3, 4, 5)
x
## [1] 1 2 3 4 5
typeof(x)
## [1] "double"
class(x)
## [1] "numeric"
x[2] <- "hat"
x
## [1] "1"   "hat" "3"   "4"   "5"
typeof(x)
## [1] "character"
class(x)
## [1] "character"

Here is an overview of the coercion rules:

  • Logical values are converted to numbers: TRUE is converted to 1 and FALSE to 0.
  • Values are converted to the simplest type required to represent all information.
  • The ordering is roughly logical < integer < numeric < complex < character < list.
  • Objects of type raw are not converted to other types.
  • Object attributes are dropped when an object is coerced from one type to another.
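The ordering above can be checked directly by combining values of two different types with c() and asking for the type of the result. This is only an illustrative sketch of the rules just listed:

```r
typeof(c(TRUE, 2L))      # logical + integer
## [1] "integer"
typeof(c(2L, 3.5))       # integer + numeric
## [1] "double"
typeof(c(3.5, 1i))       # numeric + complex
## [1] "complex"
typeof(c(1i, "a"))       # complex + character
## [1] "character"
typeof(c("a", list(1)))  # character + list
## [1] "list"
TRUE + TRUE              # logical values used as numbers
## [1] 2
```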

2 R syntax

It is possible to write almost any R expression as a function call. However, it’s confusing reading lots of embedded function calls, so R provides some special syntax to make code for common operations more readable.

2.1 Constants

Constants are the basic building blocks for data objects in R: numbers, character values, and symbols.

2.1.1 Numeric vectors

Numbers are interpreted literally in R.

The sequence operator a:b will return a vector of integers between a and b. To combine an arbitrary set of numbers into a vector, use the c() function:

v <- c(173,12,1.12312,-93)

R allows a lot of flexibility when entering numbers. However, there is a limit to the size and precision of numbers that R can represent:

(2^1023 + 1) == 2^1023 # limits of precision
## [1] TRUE
2^1024 # limits of size
## [1] Inf

R also supports complex numbers. Complex values are written as real_part+imaginary_parti. For example:

0+1i ^ 2
## [1] -1+0i
sqrt(-1+0i)
## [1] 0+1i
exp(0+1i * pi)
## [1] -1+0i

Note that the function sqrt() returns a value of the same type as its input; it will return the value 0+1i when passed -1+0i but will return an NaN value when just passed the numeric value -1:

sqrt(-1)
## [1] NaN
## Warning message:
## In sqrt(-1) : NaNs produced

2.1.2 Character vectors

A character object contains all of the text between a pair of quotes.

"hello"
## [1] "hello"
'hello'
## [1] "hello"
identical("\"hello\"",'"hello"')
## [1] TRUE
identical('\'hello\'',"'hello'")
## [1] TRUE
numbers <- c("one","two","three","four","five")
numbers
## [1] "one"   "two"   "three" "four"  "five"

2.1.3 Symbols

An important class of constants is symbols. A symbol is an object in R that refers to another object; a symbol is the name of a variable in R. For example, let’s assign the numeric value 1 to the symbol x:

x <- 1
  • A symbol that begins with a letter and contains only letters, digits, periods, and underscores may be used directly in R statements. Here are a few examples of symbol names that can be typed without escape characters:
x <- 0
x1 <- 1
X1 <- 2
x1
## [1] 1
X1
## [1] 2
x1.1 <- 3
x1.1_1 <- 4
  • Some symbols contain special syntax. In order to refer to these objects, you enclose them in backquotes. For example, to get help on the assignment operator (<-), you would use a command like this:
?`<-`
  • If you really wanted to, you could use backquotes to define a symbol that contains special characters or starts with a number:
`1+2=3` <- "hello"
`1+2=3`
## [1] "hello"
  • Not all words are valid as symbols; some words are reserved in R. Specifically, you can’t use if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_, ..., ..1, ..2, ..3, ..4, ..5, ..6, ..7, ..8, or ..9.

  • You can redefine primitive functions that are not on this list. For example,

c
## function (...)  .Primitive("c")
c <- 1
c
## [1] 1

Even after you redefine the symbol c, you can continue to use the “combine” function c() as before:

v <- c(1,2,3)

2.2 Operators

Many functions in R can be written as operators. An operator is a function that takes one or two arguments and can be written without parentheses.

One familiar set of operators is binary operators for arithmetic. R supports arithmetic operations:

1 + 19 # addition
## [1] 20
5 * 4 # multiplication
## [1] 20

R also includes notation for other mathematical operations, including modulus, exponents, and integer division:

41 %% 21 # modulus
## [1] 20
20 ^ 1 # exponents
## [1] 20
21 %/% 2  # integer division
## [1] 10

You can define your own binary operators. User-defined binary operators consist of a string of characters between two “%” characters. For example,

`%myop%` <- function(a, b) {2*a + 2*b}
1 %myop% 1
## [1] 4
1 %myop% 2
## [1] 6

Some language constructs are also binary operators. For example, assignment, indexing, and function calls are binary operators:

# assignment is a binary operator
# the left side is a symbol, the right is a value
x <- c(1,2,3,4,5)

# indexing is a binary operator too
# the left side is a symbol, the right is an index
x[3]
## [1] 3
# a function call is also a binary operator
# the left side is a symbol pointing to the function
# the right side is the list of arguments
max(1,2)
## [1] 2

There are also unary operators that take only one argument. Here are two familiar examples:

-7 # negation is a unary operator
## [1] -7
 ?`?` # ? (for help) is also a unary operator

2.2.1 Order of operations

In order to resolve ambiguity, operators in R are always interpreted in the same order. Here is a summary of the precedence rules:

  • Function calls and grouping expressions
  • Index and lookup operators
  • Arithmetic
  • Comparison
  • Formulas
  • Assignment
  • Help

Table: Operator precedence.

Operators (in order of priority)   Description
( {                                Function calls and grouping expressions (respectively)
:: :::                             Access variables in a namespace
$ @                                Component / slot extraction
[ [[                               Indexing
^                                  Exponentiation (right to left)
- +                                Unary minus and plus
:                                  Sequence operator
%any%                              Special operators
* /                                Multiply, divide
+ -                                (Binary) add, subtract
< > <= >= == !=                    Ordering and comparison
!                                  Negation
& &&                               And
| ||                               Or
~                                  As in formulas
-> ->>                             Rightward assignment
<- <<-                             Assignment (right to left)
=                                  Assignment (right to left)
?                                  Help (unary and binary)

For a current list of built-in operators and their precedence, see the help file for syntax.
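A few of these rules are easy to get wrong in practice, especially the position of unary minus and the sequence operator. The following sketch shows how R parses some common cases:

```r
-2^2       # ^ binds tighter than unary minus: -(2^2)
## [1] -4
(-2)^2
## [1] 4
-1:3       # unary minus binds tighter than the sequence operator
## [1] -1  0  1  2  3
1:3 * 2    # the sequence operator binds tighter than multiplication
## [1] 2 4 6
1:(3 * 2)
## [1] 1 2 3 4 5 6
```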

Assignments

Most assignments that we’ve seen so far simply assign an object to a symbol. For example:

x <- 1
y <- list(shoes="loafers", hat="Yankees cap", shirt="white")
z <- function(a,b,c) {a ^ b / c}
v <- c(1,2,3,4,5,6,7,8)

There is an alternative type of assignment statement in R that acts differently: assignments with a function on the lefthand side of the assignment operator. These statements replace an object with a new object that has slightly different properties. Here are a few examples:

dim(v) <- c(2,4)
v[2,2] <- 10

z <- function(a,b,c) {a ^ b / c}
formals(z) <- alist(a=1,b=2,c=3)

There is a little bit of magic going on behind the scenes. An assignment statement of the form:

fun(sym) <- val

is really syntactic sugar for a function of the form:

`fun<-`(sym,val)

Each of these functions replaces the object associated with sym in the current environment. By convention, fun refers to a property of the object represented by sym. If you write a method with the name method_name<-, then R will allow you to place method_name on the lefthand side of an assignment statement.
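As an illustration of this convention, here is a sketch of a user-defined replacement function. The name second is hypothetical; R requires only that the function be called `second<-` and that its last argument be named value:

```r
# `second<-` is a hypothetical replacement function that sets element 2
`second<-` <- function(x, value) {
  x[2] <- value
  x            # a replacement function must return the modified object
}
y <- c(1, 2, 3)
second(y) <- 10      # sugar for: y <- `second<-`(y, value = 10)
y
## [1]  1 10  3
```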

2.3 Expressions

R provides different constructs for grouping together expressions: semicolons, parentheses, and curly braces.

2.3.1 Separating Expressions

You can write a series of expressions on separate lines:

x <- 1
y <- 2
z <- 3

Alternatively, you can place them on the same line, separated by semicolons:

x <- 1; y <- 2; z <- 3

2.3.2 Parentheses

The parentheses notation returns the result of evaluating the expression inside the parentheses:

    (expression)

The operator has the same precedence as a function call. In fact, grouping a set of expressions inside parentheses is equivalent to evaluating a function of one argument that just returns its argument:

2 * (5 + 1)
## [1] 12
f <- function (x) x # equivalent expression
2 * f(5 + 1)
## [1] 12

Grouping expressions with parentheses can be used to override the default order of operations. For example:

2 * 5 + 1
## [1] 11
2 * (5 + 1)
## [1] 12

2.3.3 Curly braces

Curly braces are used to evaluate a series of expressions (separated by new lines or semicolons) and return only the last expression:

    {expression_1; expression_2; ... expression_n}

Often, curly braces are used to group a set of operations in the body of a function:

f <- function() {x <- 1; y <- 2; x + y}
f()
## [1] 3

However, curly braces can also be used as expressions in other contexts:

{x <- 1; y <- 2; x + y}
## [1] 3

The contents of the curly braces are evaluated inside the current environment; a new environment is created by a function call but not by the use of curly braces:

# when evaluated in a function, u and v are assigned
# only inside the function environment
remove(list = ls())
f <- function() {u <- 1; v <- 2; u + v}
f()
## [1] 3
u
## Error: object 'u' not found
v
## Error: object 'v' not found
# when evaluated outside the function, u and v are
# assigned in the current environment
{u <- 1; v <- 2; u + v}
## [1] 3

2.4 Control structures

Nearly every operation in R can be written as a function, but it isn’t always convenient to do so. Therefore, R provides special syntax that you can use in common program structures. We’ve already described two important sets of constructions: operators and grouping brackets. This section describes a few other key language structures and explains what they do.

2.4.1 Conditional statements

Conditional statements take the form:

    if (condition) true_expression else false_expression

or, alternatively:

    if (condition) expression

Because the expressions expression, true_expression, and false_expression are not always evaluated, the function if has the type special:

typeof(`if`)
## [1] "special"

Here are a few examples of conditional statements:

if (FALSE) "this will not be printed"
if (FALSE) "this will not be printed" else "this will be printed"
## [1] "this will be printed"
x <- 1
if (is(x, "numeric")) x/2 else print("x is not numeric")
## [1] 0.5

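In R, conditional statements are not vector operations: the condition must be a single logical value. In versions of R before 4.2.0, if the condition was a vector of more than one logical value, only the first element was used and a warning was issued; since R 4.2.0, such a condition is an error. For example (output from an older version of R):

x <- 10
y <- c(8, 10, 12, 3, 17)
if (x < y) x else y
## [1]  8 10 12  3 17
## Warning message:
## In if (x < y) x else y :
##   the condition has length > 1 and only the first element will be used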

If you would like a vector operation, use the ifelse function instead:

a <- c("a","a","a","a","a")
b <- c("b","b","b","b","b")
ifelse(c(TRUE,FALSE,TRUE,FALSE,TRUE),a,b)
## [1] "a" "b" "a" "b" "a"
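Applied to the comparison from the previous example, ifelse evaluates the condition element by element. The sketch below also shows that, for this particular case, the built-in pmin function gives the same result:

```r
x <- 10
y <- c(8, 10, 12, 3, 17)
ifelse(x < y, x, y)  # element-wise choice between x and y
## [1]  8 10 10  3 10
pmin(x, y)           # parallel minimum, equivalent here
## [1]  8 10 10  3 10
```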

2.4.2 Loops

There are three different looping constructs in R. Simplest is repeat, which just repeats the same expression:

      repeat expression

To stop repeating the expression, you can use the keyword break. To skip to the next iteration in a loop, you can use the command next.

  • As an example, the following R code prints out multiples of 5 up to 25:
i <- 5
repeat {if (i > 25) break else {print(i); i <- i + 5;}}
## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25

If you do not include a break command, the R code will be an infinite loop. (This can be useful for creating an interactive application.)

  • Another useful construction is while loops, which repeat an expression while a condition is true:
    while (condition) expression

As a simple example, let’s rewrite the example above using a while loop:

i <- 5
while (i <= 25) {print(i); i <- i + 5}
## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25

You can also use break and next inside while loops. The break statement is used to stop iterating through a loop. The next statement skips to the next loop iteration without evaluating the remaining expressions in the loop body.

  • Finally, R provides for loops, which iterate through each item in a vector (or a list):
    for (var in list) expression

Let’s use the same example for a for loop:

for (i in seq(from=5,to=25,by=5)) print(i)
## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25

You can also use break and next inside for loops.
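As a small sketch of both keywords in a single for loop, the following code skips odd numbers with next and stops with break once the counter exceeds 6. The variable name evens is just an illustrative choice:

```r
evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next  # skip odd numbers
  if (i > 6) break       # stop the loop entirely
  evens <- c(evens, i)
}
evens
## [1] 2 4 6
```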

There are two important properties of looping statements to remember. First, results are not printed inside a loop unless you explicitly call the print function. For example:

for (i in seq(from=5,to=25,by=5)) i

Second, the variable var that is set in a for loop is changed in the calling environment:

i <- 1
for (i in seq(from=5,to=25,by=5)) i
i
## [1] 25

Like conditional statements, the looping functions repeat, while, and for have type special, because expression is not necessarily evaluated.

Examples

Find the limit of the following sequence:

\(x_{n+1} = (x_n+2/x_n)/2\).

Using a for loop:

x0 <- 1
for (i in 1:100) {
  x0 <- (x0 + 2/x0)/2
}
x0
## [1] 1.414214

Using a while loop:

x0 <- 1; x1 <- 0;
while (abs(x1-x0) > 1e-8) {
  x1 <- x0
  x0 <- (x1 + 2/x1)/2
}
x0
## [1] 1.414214

Using a repeat loop:

x0 <- 1
repeat {
  x1 <- x0
  x0 <- (x1 + 2/x1)/2
  if (abs(x1-x0) < 1e-8) break
}
x0
## [1] 1.414214
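A limit x of this sequence must satisfy x = (x + 2/x)/2, which simplifies to x^2 = 2, so the loops above should converge to sqrt(2). The following sketch wraps the iteration in a function and checks that claim; the function name iterate_sqrt2 and its arguments are hypothetical, chosen for illustration:

```r
# iterate_sqrt2 is a hypothetical helper, not from the original text
iterate_sqrt2 <- function(x0 = 1, tol = 1e-8, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    x1 <- (x0 + 2 / x0) / 2
    if (abs(x1 - x0) < tol) return(x1)  # converged
    x0 <- x1
  }
  x0  # fall back to the last iterate if max_iter is reached
}
iterate_sqrt2()
## [1] 1.414214
all.equal(iterate_sqrt2(), sqrt(2))
## [1] TRUE
```

Capping the number of iterations guards against an endless loop if the tolerance is never reached.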

2.4.3 Switch

Technically speaking, switch is just another function, but its semantics are close to those of control structures of other programming languages.

  • The syntax is
switch(value, ...)
  • If value is a number between 1 and the number of alternatives in ..., then the corresponding alternative is evaluated and the result returned. If value is too large or too small, NULL is returned.
x <- 3
switch(x, 2+2, mean(1:10), rnorm(5))
## [1]  0.2241614  0.3303874 -0.4828706 -0.4151718  1.3656318
switch(2, 2+2, mean(1:10), rnorm(5))
## [1] 5.5
switch(6, 2+2, mean(1:10), rnorm(5))
  • If value is a character vector then the element of ... with a name that exactly matches value is evaluated. If there is no match a single unnamed argument will be used as a default. If no default is specified, NULL is returned.
y <- "fruit"
switch(y, fruit = "banana", vegetable = "broccoli", "Neither")
## [1] "banana"
y <- "meat"
 switch(y, fruit = "banana", vegetable = "broccoli", "Neither")
## [1] "Neither"
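When switch is used with a character value, an alternative with an empty right-hand side falls through to the next non-empty alternative, which lets several names share one result. The sketch below wraps this in a hypothetical function food_group (the name and categories are illustrative only):

```r
# food_group is a hypothetical helper for illustration
food_group <- function(item) {
  switch(item,
         apple = ,               # falls through to the next alternative
         banana = "fruit",
         broccoli = "vegetable",
         "unknown")              # unnamed default
}
food_group("apple")
## [1] "fruit"
food_group("broccoli")
## [1] "vegetable"
food_group("steak")
## [1] "unknown"
is.null(switch(6, 2+2, mean(1:10), rnorm(5)))  # out-of-range number
## [1] TRUE
```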

Examples

Loop over data frame rows

Imagine that you are interested in the days where the stock price of Apple rises above 116. If it goes above this value, you want to print out the current date and stock price.

# Define stock
date <- seq(from = as.Date("2016-12-01"), to = as.Date("2016-12-30"), by = "days")
date <- date[-c(3,4,10,11,17,18,24,25,26)]
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30,
           115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52,
           117.26, 116.76, 116.73, 115.82)
stock <- data.frame(date = date, apple = apple)

# Loop over stock rows
for (row in 1:nrow(stock)) {
    price <- stock[row, "apple"]
    date  <- stock[row, "date"]

    if(price > 116) {
        print(paste("On", date, 
                    "the stock price was", price))
    } else {
        print(paste("The date:", date, 
                    "is not an important day!"))
    }
}
## [1] "The date: 2016-12-01 is not an important day!"
## [1] "The date: 2016-12-02 is not an important day!"
## [1] "The date: 2016-12-05 is not an important day!"
## [1] "The date: 2016-12-06 is not an important day!"
## [1] "The date: 2016-12-07 is not an important day!"
## [1] "The date: 2016-12-08 is not an important day!"
## [1] "The date: 2016-12-09 is not an important day!"
## [1] "The date: 2016-12-12 is not an important day!"
## [1] "The date: 2016-12-13 is not an important day!"
## [1] "The date: 2016-12-14 is not an important day!"
## [1] "The date: 2016-12-15 is not an important day!"
## [1] "The date: 2016-12-16 is not an important day!"
## [1] "On 2016-12-19 the stock price was 116.64"
## [1] "On 2016-12-20 the stock price was 116.95"
## [1] "On 2016-12-21 the stock price was 117.06"
## [1] "On 2016-12-22 the stock price was 116.29"
## [1] "On 2016-12-23 the stock price was 116.52"
## [1] "On 2016-12-27 the stock price was 117.26"
## [1] "On 2016-12-28 the stock price was 116.76"
## [1] "On 2016-12-29 the stock price was 116.73"
## [1] "The date: 2016-12-30 is not an important day!"
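The loop above is useful for practicing control structures, but the same selection can also be written without a loop by indexing the data frame with a logical vector. This is a sketch of that alternative, rebuilding the same data so that it runs on its own; the variable name high is an illustrative choice:

```r
# Rebuild the stock data from the example above
date <- seq(from = as.Date("2016-12-01"), to = as.Date("2016-12-30"), by = "days")
date <- date[-c(3,4,10,11,17,18,24,25,26)]
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30,
           115.19, 115.19, 115.82, 115.97, 116.64, 116.95, 117.06, 116.29, 116.52,
           117.26, 116.76, 116.73, 115.82)
stock <- data.frame(date = date, apple = apple)

# Keep only the rows where the price is above 116
high <- stock[stock$apple > 116, ]
nrow(high)
## [1] 8
# paste() is vectorized, so one call builds all the messages
msgs <- paste("On", high$date, "the stock price was", high$apple)
msgs[1]
## [1] "On 2016-12-19 the stock price was 116.64"
```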

2.5 Accessing data structures

R has some specialized syntax for accessing data structures. You can fetch a single item from a structure, or multiple items (possibly as a multidimensional array) using R’s index notation. You can fetch items by location within a data structure or by name.

2.5.1 Data structure operators

The following table shows the operators in R used for accessing objects in a data structure.

Table: Data structure access notation.

Syntax   Objects          Description
x[i]     Vectors, lists   Returns objects from object x, described by i. i may be an integer vector, character vector (of object names), or logical vector. Does not allow partial matches. When used with lists, returns a list. When used with vectors, returns a vector.
x[[i]]   Vectors, lists   Returns a single element of x, matching i. i may be an integer or character vector of length 1. Allows partial matches (with exact=FALSE option).
x$n      Lists            Returns the object with name n from object x.
x@n      S4 objects       Returns the element stored in the slot named n.

Although the single-bracket notation and double-bracket notation look very similar, there are three important differences. First, double brackets always return a single element, while single brackets may return multiple elements. Second, when elements are referred to by name (as opposed to by index), single brackets only match named objects exactly, while double brackets allow partial matches (with the exact=FALSE option). Finally, when used with lists, the single-bracket notation returns a list, but the double-bracket notation returns the element itself. I’ll explain how to use this notation below.

2.5.2 Indexing by integer vector

The most familiar way to look up an element in R is by numeric vector. For example,

v <- 100:119
v[5]
## [1] 104
v[1:5]
## [1] 100 101 102 103 104
v[c(1,6,11,16)]
## [1] 100 105 110 115

As a special case, you can use the double-bracket notation to reference a single element:

v[[3]]
## [1] 102

The double-bracket notation works the same as the single-bracket notation in this case.

We can use negative integers to return a vector consisting of all elements except the specified elements:

# exclude elements 1:15 (by specifying indexes -1 to -15)
v[-15:-1]
## [1] 115 116 117 118 119

The same notation applies to lists:

obs <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
obs[1:3]
## $a
## [1] 1
## 
## $b
## [1] 2
## 
## $c
## [1] 3
obs[-7:-1]
## $h
## [1] 8
## 
## $i
## [1] 9
## 
## $j
## [1] 10

We can use this notation to extract parts of multidimensional data structures:

m <- matrix(data=c(101:112),nrow=3,ncol=4)
m
##      [,1] [,2] [,3] [,4]
## [1,]  101  104  107  110
## [2,]  102  105  108  111
## [3,]  103  106  109  112
m[3]
## [1] 103
m[3,4]
## [1] 112
m[1:2,1:2]
##      [,1] [,2]
## [1,]  101  104
## [2,]  102  105

If you omit a vector specifying a set of indices for a dimension, then elements for all indices are returned:

m[1:2,]
##      [,1] [,2] [,3] [,4]
## [1,]  101  104  107  110
## [2,]  102  105  108  111
m[3:4]
## [1] 103 104
m[,3:4]
##      [,1] [,2]
## [1,]  107  110
## [2,]  108  111
## [3,]  109  112

When selecting a subset, R will automatically coerce the result to the most appropriate number of dimensions. If you select a subset of elements that corresponds to a matrix, R will return a matrix object; if you select a subset that corresponds to only a vector, R will return a vector object. To disable this behavior, you can use the drop=FALSE option:

a <- array(data=c(101:124),dim=c(2,3,4))
class(a[1,1,])
## [1] "integer"
class(a[1,,])
## [1] "matrix" "array"
class(a[1:2,1:2,1:2])
## [1] "array"
class(a[1,1,1,drop=FALSE])
## [1] "array"
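The effect of drop=FALSE is perhaps easiest to see on a matrix by comparing the dimensions of the result. A short sketch:

```r
m <- matrix(101:112, nrow = 3, ncol = 4)
dim(m[1, ])                # dimensions are dropped: a plain vector
## NULL
dim(m[1, , drop = FALSE])  # still a 1 x 4 matrix
## [1] 1 4
```

Keeping the dimensions is useful in code that expects a matrix regardless of how many rows are selected.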

It is also possible to replace elements in a vector, matrix, or array using the same notation:

m[1] <- 1000
m
##      [,1] [,2] [,3] [,4]
## [1,] 1000  104  107  110
## [2,]  102  105  108  111
## [3,]  103  106  109  112
m[1:2,1:2] <- matrix(c(1001:1004),nrow=2,ncol=2)
m
##      [,1] [,2] [,3] [,4]
## [1,] 1001 1003  107  110
## [2,] 1002 1004  108  111
## [3,]  103  106  109  112

It is even possible to extend a data structure using this notation. A special NA element is used to represent values that are not defined:

v <- 1:12
v[15] <- 15
v
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 NA NA 15

A data structure can also be indexed by a factor; the factor is interpreted as an integer vector.

2.5.3 Indexing by logical vector

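As an alternative to indexing by an integer vector, you can also index through a logical vector. (In the examples below, v still holds the 15-element vector from the previous section, including its two NA values, which is why NA appears in the results.) For example,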

rep(c(TRUE,FALSE),10)
##  [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
## [12] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
v[rep(c(TRUE,FALSE),10)]
##  [1]  1  3  5  7  9 11 NA 15 NA NA
v[(v==103)]
## [1] NA NA
v[(v %% 3 == 0)]
## [1]  3  6  9 12 NA NA 15
v[c(TRUE,FALSE,FALSE)]
## [1]  1  4  7 10 NA
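The NA values in these results come from NA entries in the logical index. If you want only the elements for which the condition is definitely TRUE, one option is to wrap the condition in which(), which returns the positions of the TRUE values and ignores NA. A sketch, recreating the same vector:

```r
v <- c(1:12, NA, NA, 15)
which(v %% 3 == 0)     # positions where the condition is TRUE
## [1]  3  6  9 12 15
v[which(v %% 3 == 0)]  # no NA in the result
## [1]  3  6  9 12 15
```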

As above, the same notation applies to lists:

obs <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
obs[(obs > 7)]
## $h
## [1] 8
## 
## $i
## [1] 9
## 
## $j
## [1] 10

2.5.4 Indexing by name

With lists, each element may be assigned a name. You can index an element by name using the $ notation:

obs <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
obs$j
## [1] 10
obs[c("a","b","c")]
## $a
## [1] 1
## 
## $b
## [1] 2
## 
## $c
## [1] 3

A list can also be indexed by name using the double-bracket notation when selecting a single element. It is even possible to index by partial name using the exact=FALSE option:

dairy <- list(milk="1 gallon", butter="1 pound", eggs=12)
dairy$milk
## [1] "1 gallon"
dairy[["milk"]]
## [1] "1 gallon"
dairy[["mil"]]
## NULL
dairy[["mil",exact=FALSE]]
## [1] "1 gallon"
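The difference between single and double brackets on a list, described in section 2.5.1, can be seen directly by checking the class of each result. A brief sketch using the same list:

```r
dairy <- list(milk = "1 gallon", butter = "1 pound", eggs = 12)
class(dairy["eggs"])    # single brackets: a list of length 1
## [1] "list"
class(dairy[["eggs"]])  # double brackets: the element itself
## [1] "numeric"
dairy[["eggs"]] + 1     # so arithmetic works on the extracted element
## [1] 13
```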

Sometimes, an object is a list of lists. You can also use the double-bracket notation to reference an element in this type of data structure. To do this, use a vector as an argument. R will iterate through the elements in the vector, referencing sublists:

fruit <- list(apples=6, oranges=3, bananas=10)
dairy <- list(milk="1 gallon", butter="1 pound", eggs=12)
shopping.list <- list (dairy = dairy, fruit = fruit)
shopping.list
## $dairy
## $dairy$milk
## [1] "1 gallon"
## 
## $dairy$butter
## [1] "1 pound"
## 
## $dairy$eggs
## [1] 12
## 
## 
## $fruit
## $fruit$apples
## [1] 6
## 
## $fruit$oranges
## [1] 3
## 
## $fruit$bananas
## [1] 10
shopping.list[[c("dairy", "milk")]]
## [1] "1 gallon"
shopping.list[[c(1,2)]]
## [1] "1 pound"

2.6 R Code style standards

Standards for code style aren’t the same as syntax, although the two are related. It is usually wise to be careful about code style to maximize the readability of your code, making it easier for you and others to maintain.

Here, I’ve tried to stick to Google’s R Style Guide, which is available at http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html. Here is a summary of its suggestions:

  • Indentation

    Indent lines with two spaces, not tabs. If code is inside parentheses, indent to the innermost parentheses.

  • Spacing

    Use only single spaces. Add spaces between binary operators and operands. Do not add spaces between a function name and the argument list. Add a single space between items in a list, after each comma.

  • Blocks

    Don’t place an opening brace (“{”) on its own line. Do place a closing brace (“}”) on its own line. Indent inner blocks (by two spaces).

  • Semicolons

    Omit semicolons at the end of lines when they are optional.

  • Naming

    Name objects with lowercase words, separated by periods. For function names, capitalize the name of each word that is joined together, with no periods. Try to make function names verbs.
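As an illustration, here is a short sketch of code laid out according to the suggestions summarized above. The function and variable names are hypothetical, chosen only to show the naming and spacing conventions:

```r
# Function name: capitalized words, no periods, verb-like
ComputeMean <- function(values) {
  total <- sum(values)      # two-space indent, spaces around operators
  total / length(values)
}

# Object name: lowercase words separated by periods
sample.values <- c(1, 2, 3, 4)
ComputeMean(sample.values)
## [1] 2.5
```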

3 Exercises

  • Ex1. Produce a 200*10 matrix m0 with each element normally distributed with mean 1 and variance 2. Using for or while, transform m0 into a new matrix m1 of the same size, each column of which has sample mean 0 and sample variance 1.

  • Ex2. Read data from an xlsx file. Calculate the sum and mean for each column and each row. Treat each column as a random variable and calculate the correlations between columns. Then write the correlations to a new xlsx file. The data can be downloaded here.

4 References

  • R Language Definition
  • Kabacoff, R. I. (2011). “R in Action”. Manning Publications Co.
  • Baeza, S. (2015). “R For Beginners”. CreateSpace Independent Publishing Platform.
  • Adler, J. (2010). “R in a Nutshell: A Desktop Quick Reference”. O’Reilly Media, Inc.