This chapter gives an overview of the R language, designed to help you understand R code and write your own.
R code is composed of a series of expressions. Examples of expressions in R include assignment statements, conditional statements, and arithmetic expressions. Here are a few examples of expressions:
x <- 1
if (1 > 2) "yes" else "no"
## [1] "no"
127 %% 10
## [1] 7
Expressions are composed of objects and functions. You may separate expressions with new lines or with semicolons. For example, here is a series of expressions separated by semicolons:
"this expression will be printed"; 7 + 13; exp(0+1i*pi)
## [1] "this expression will be printed"
## [1] 20
## [1] -1+0i
All R code manipulates objects. Examples of objects in R include numeric vectors, character vectors, lists, and functions. Here are some examples of objects:
c(1,2,3,4,5) # a numerical vector (with five elements)
## [1] 1 2 3 4 5
"This is an object too" # a character vector (with one element)
## [1] "This is an object too"
list(c(1,2,3,4,5),"This is an object too", " this is a list") # a list
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] "This is an object too"
##
## [[3]]
## [1] " this is a list"
function(x,y) {x + y} # a function
## function(x,y) {x + y}
Formally, variable names in R are called symbols. When you assign an object to a variable name, you are actually assigning the object to a symbol in the current environment. For example, the statement:
x <- 1
assigns the object 1 to the symbol x in the current environment.
A function is an object in R that takes some input objects (called the arguments of the function) and returns an output object. All work in R is done by functions. Every statement in R (setting variables, doing arithmetic, repeating code in a loop) can be written as a function. Here are a few more examples of R syntax and the corresponding function calls:
apples <- 3 # pretty assignment
apples
## [1] 3
`<-`(apples,3) # functional form of assignment
apples
## [1] 3
`<-`(oranges,4) # another assignment statement, so that we can compare apples and oranges
oranges
## [1] 4
apples + oranges # pretty arithmetic expression
## [1] 7
`+`(apples,oranges) # functional form of arithmetic expression
## [1] 7
# pretty form of if-then statement
if (apples > oranges) "apples are better" else "oranges are better"
## [1] "oranges are better"
# functional form of if-then statement
`if`(apples > oranges,"apples are better","oranges are better")
## [1] "oranges are better"
x <- c("apple","orange","banana","pear")
x[2] # pretty form of vector reference
## [1] "orange"
`[`(x,2) # functional form of vector reference
## [1] "orange"
In assignment statements, most objects are immutable. R will copy the object, not just the reference to the object. For example:
u <- list(1)
v <- u
u[[1]] <- "hat"
u## [[1]]
## [1] "hat"
v## [[1]]
## [1] 1
This is also true in function calls. Consider the following function,
f <- function(x,i) {x[i] = 4}
w <- c(10, 11, 12, 13)
f(w,1)
w## [1] 10 11 12 13
The vector w is copied when it is passed to the function, so it is not modified by the function. The value x is modified inside the context of the function.
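To keep a modification made inside a function, the usual approach is to return the modified copy and assign it in the caller. The following is a minimal sketch; the name set_element is hypothetical and not part of base R.

```r
# Hypothetical helper: returns the modified copy so the caller can keep it.
set_element <- function(x, i, value) {
  x[i] <- value  # changes only the local copy of x
  x              # return the modified copy
}

w <- c(10, 11, 12, 13)
w2 <- set_element(w, 1, 4)
w   # the original is unchanged: 10 11 12 13
w2  # the returned copy holds the change: 4 11 12 13
```

Writing `w <- set_element(w, 1, 4)` would rebind the symbol w to the modified copy.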
In the last few sections, most examples of objects were objects that stored data: vectors, lists, and other data structures. However, everything in R is an object: functions, symbols, and even R expressions.
For example, function names in R are really symbol objects that point to function objects. (That relationship is, in turn, stored in an environment object.) You can assign a symbol to refer to a numeric object and then change the symbol to refer to a function:
x <- 1
x## [1] 1
x(2)## Error in x(2): could not find function "x"
x <- function(i) i^2
x## function(i) i^2
x(2)## [1] 4
There are a few special values that are used in R.
NA
In R, NA values are used to represent missing values. (NA stands for “not available.”) You may encounter NA values in text loaded into R (to represent missing values) or in data loaded from databases (to replace NULL values). Extending a vector beyond its defined values also fills the new positions with NA:
v <- c(1,2,3)
v## [1] 1 2 3
length(v) <- 4
v## [1] 1 2 3 NA
Inf and -Inf
If a computation results in a number that is too big, R will return Inf for a positive number and -Inf for a negative number (meaning positive and negative infinity, respectively):
2 ^ 1024## [1] Inf
-2 ^ 1024## [1] -Inf
This is also the value returned when you divide by 0:
1 / 0## [1] Inf
NaN
Sometimes, a computation will produce a result that makes little sense. In these cases, R will often return NaN (meaning “not a number”):
Inf - Inf## [1] NaN
0 / 0## [1] NaN
NULL
Additionally, there is a null object in R, represented by the symbol NULL. (The symbol NULL always points to the same object.) NULL is often used as an argument in functions to mean that no value was assigned to the argument. Additionally, some functions may return NULL. Note that NULL is not the same as NA, Inf, -Inf, or NaN.
x <- NULL
for(i in 1:5) x <- c(x,i)
x## [1] 1 2 3 4 5
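Because these special values behave differently in comparisons, it is usually safer to test for them with the dedicated base R predicates. The sketch below uses a made-up vector named vals to show how the predicates differ.

```r
vals <- c(1, NA, NaN, Inf)

na.flags     <- is.na(vals)      # FALSE TRUE TRUE FALSE (NaN also counts as missing)
nan.flags    <- is.nan(vals)     # FALSE FALSE TRUE FALSE
finite.flags <- is.finite(vals)  # TRUE FALSE FALSE FALSE

is.null(NULL)  # TRUE
length(NULL)   # 0: NULL has no elements
length(NA)     # 1: NA is a value occupying one position

mean(c(1, 2, NA))                # NA: missing values propagate
mean(c(1, 2, NA), na.rm = TRUE)  # 1.5: missing values are dropped first
```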
When you call a function with an argument of the wrong type, R will try to coerce values to a different type so that the function will work. There are two types of coercion that occur automatically in R: coercion with formal objects and coercion with built-in types.
With generic functions, R will look for a suitable method. If no exact match exists, R will search for a coercion method that converts the object to a type for which a suitable method does exist.
Additionally, R will automatically convert between built-in object types when appropriate. R will convert from more specific types to more general types.
x <- c(1, 2, 3, 4, 5)
x## [1] 1 2 3 4 5
typeof(x)## [1] "double"
class(x)## [1] "numeric"
x[2] <- "hat"
x## [1] "1" "hat" "3" "4" "5"
typeof(x)## [1] "character"
class(x)## [1] "character"
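The same widening can be observed by combining values of different types with c(). This is a small sketch for illustration; each result is stored in a made-up variable name so that it can be inspected.

```r
t1 <- typeof(c(TRUE, 1L))      # "integer": logical is widened to integer
t2 <- typeof(c(1L, 2.5))       # "double": integer is widened to double
t3 <- typeof(c(2.5, 1+0i))     # "complex"
t4 <- typeof(c(1+0i, "a"))     # "character"
t5 <- typeof(c("a", list(1)))  # "list"

# Logical values used in arithmetic count as 1 (TRUE) and 0 (FALSE).
n.true <- sum(c(TRUE, FALSE, TRUE))  # 2
```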
Here is an overview of the coercion rules:
Logical values are converted to numbers: TRUE is converted to 1 and FALSE to 0.
The ordering is roughly logical < integer < numeric < complex < character < list.
Objects of type raw are not converted to other types.
It is possible to write almost any R expression as a function call. However, it’s confusing reading lots of embedded function calls, so R provides some special syntax to make code for common operations more readable.
Constants are the basic building blocks for data objects in R: numbers, character values, and symbols.
Numbers are interpreted literally in R.
The sequence operator a:b will return a vector of integers between a and b. To combine an arbitrary set of numbers into a vector, use the c() function:
v <- c(173,12,1.12312,-93)
R allows a lot of flexibility when entering numbers. However, there is a limit to the size and precision of numbers that R can represent:
(2^1023 + 1) == 2^1023 # limits of precision
## [1] TRUE
2^1024 # limits of size
## [1] Inf
R also supports complex numbers. Complex values are written as real_part+imaginary_parti. For example:
0+1i ^ 2## [1] -1+0i
sqrt(-1+0i)## [1] 0+1i
exp(0+1i * pi)## [1] -1+0i
Note that the function sqrt() returns a value of the same type as its input; it will return the value 0+1i when passed -1+0i but will return a NaN value (with a warning) when passed just the numeric value -1:
sqrt(-1)
## [1] NaN
## Warning message:
## In sqrt(-1) : NaNs produced
A character object contains all of the text between a pair of quotes. You can use either single or double quotes, and a quote character inside the text can be escaped with a backslash:
"hello"## [1] "hello"
'hello'## [1] "hello"
identical("\"hello\"",'"hello"')## [1] TRUE
identical('\'hello\'',"'hello'")## [1] TRUE
numbers <- c("one","two","three","four","five")
numbers## [1] "one" "two" "three" "four" "five"
An important class of constants is symbols. A symbol is an object in R that refers to another object; a symbol is the name of a variable in R. For example, let’s assign the numeric value 1 to the symbol x:
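x <- 1
To refer to an operator or other nonstandard name as a symbol, enclose it in backquotes. For example, to get help on the assignment operator (<-), you would use a command like this:
?`<-`
Backquotes also let you assign to a name that would otherwise not be valid:
`1+2=3` <- "hello"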
`1+2=3`## [1] "hello"
Not all words are valid as symbols; some words are reserved in R. Specifically, you can’t use if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_, ..., ..1, ..2, ..3, ..4, ..5, ..6, ..7, ..8, or ..9.
You can redefine primitive functions that are not on this list. For example,
c## function (...) .Primitive("c")
c <- 1
c## [1] 1
Even after you redefine the symbol c, you can continue to use the “combine” function c() as before:
v <- c(1,2,3)
Many functions in R can be written as operators. An operator is a function that takes one or two arguments and can be written without parentheses.
One familiar set of operators is binary operators for arithmetic. R supports arithmetic operations:
1 + 19 # addition
## [1] 20
5 * 4 # multiplication
## [1] 20
R also includes notation for other mathematical operations, including modulus, exponents, and integer division:
41 %% 21 # modulus
## [1] 20
20 ^ 1 # exponents
## [1] 20
21 %/% 2 # integer division
## [1] 10
You can define your own binary operators. User-defined binary operators consist of a string of characters between two “%” characters. For example,
`%myop%` <- function(a, b) {2*a + 2*b}
1 %myop% 1## [1] 4
1 %myop% 2## [1] 6
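As a further illustration, here is a sketch of a user-defined operator for joining strings. The operator name %+% is an arbitrary choice for this example and is not defined in base R.

```r
# Hypothetical string-concatenation operator built on base paste0().
`%+%` <- function(a, b) paste0(a, b)

# %any% operators associate left to right, so this is
# ("apples" %+% " and ") %+% "oranges".
phrase <- "apples" %+% " and " %+% "oranges"
phrase
```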
Some language constructs are also binary operators. For example, assignment, indexing, and function calls are binary operators:
# assignment is a binary operator
# the left side is a symbol, the right is a value
x <- c(1,2,3,4,5)
# indexing is a binary operator too
# the left side is a symbol, the right is an index
x[3]## [1] 3
# a function call is also a binary operator
# the left side is a symbol pointing to the function
# the right side are the arguments
max(1,2)## [1] 2
There are also unary operators that take only one variable. Here are two familiar examples:
-7 # negation is a unary operator
## [1] -7
?`?` # ? (for help) is also a unary operator
In order to resolve ambiguity, operators in R are always interpreted in the same order. Here is a summary of the precedence rules:
Table Operator precedence.
| Operators (in order of priority) | Description |
|---|---|
| ( { | Function calls and grouping expressions (respectively) |
| [ [[ | Indexing |
| :: ::: | Access variables in a namespace |
| $ @ | Component / slot extraction |
| ^ | Exponentiation (right to left) |
| - + | Unary minus and plus |
| : | Sequence operator |
| %any% | Special operators |
| * / | Multiply, divide |
| + - | (Binary) add, subtract |
| < > <= >= == != | Ordering and comparison |
| ! | Negation |
| & && | And |
| \| \|\| | Or |
| ~ | As in formulas |
| -> ->> | Rightward assignment |
| <- <<- | Assignment (right to left) |
| = | Assignment (right to left) |
| ? | Help (unary and binary) |
For a current list of built-in operators and their precedence, see the help file for syntax.
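A few short cases show how these precedence rules affect everyday code. This is only an illustrative sketch; the variable names are made up for the example.

```r
neg.square <- -2^2      # -4: ^ binds tighter than unary minus
pos.square <- (-2)^2    #  4: parentheses override the default order

n <- 5
shifted <- 1:n - 1      # 0 1 2 3 4: the sequence is built first, then 1 is subtracted
shorter <- 1:(n - 1)    # 1 2 3 4

tower <- 2^3^2          # 512: exponentiation groups right to left, 2^(3^2)
flag <- !TRUE & FALSE   # FALSE: parsed as (!TRUE) & FALSE
```

When in doubt, adding parentheses costs nothing and makes the intended grouping explicit.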
R provides different constructs for grouping together expressions: semicolons, parentheses, and curly braces.
You can write a series of expressions on separate lines:
x <- 1
y <- 2
z <- 3
Alternatively, you can place them on the same line, separated by semicolons:
x <- 1; y <- 2; z <- 3
The parentheses notation returns the result of evaluating the expression inside the parentheses:
(expression)
The parentheses operator has the same precedence as a function call. In fact, grouping an expression inside parentheses is equivalent to evaluating a function of one argument that just returns its argument:
2 * (5 + 1)## [1] 12
f <- function (x) x # equivalent expression
2 * f(5 + 1)## [1] 12
Grouping expressions with parentheses can be used to override the default order of operations. For example:
2 * 5 + 1## [1] 11
2 * (5 + 1)## [1] 12
Curly braces are used to evaluate a series of expressions (separated by new lines or semicolons) and return only the last expression:
{expression_1; expression_2; ... expression_n}
Often, curly braces are used to group a set of operations in the body of a function:
f <- function() {x <- 1; y <- 2; x + y}
f()## [1] 3
However, curly braces can also be used as expressions in other contexts:
{x <- 1; y <- 2; x + y}## [1] 3
The contents of the curly braces are evaluated inside the current environment; a new environment is created by a function call but not by the use of curly braces:
# when evaluated in a function, u and v are assigned
# only inside the function environment
remove(list = ls())
f <- function() {u <- 1; v <- 2; u + v}
f()
## [1] 3
u
## Error: object "u" not found
v
## Error: object "v" not found
# when evaluated outside the function, u and v are
# assigned in the current environment
{u <- 1; v <- 2; u + v}## [1] 3
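If you want the grouping of curly braces together with a fresh environment, but without defining a named function, base R provides local(). The sketch below assumes that no object named scratch.value already exists in the session.

```r
result <- local({
  scratch.value <- 10  # exists only inside the environment created by local()
  scratch.value * 2
})

result                            # 20
leaked <- exists("scratch.value") # FALSE under the assumption stated above
```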
Nearly every operation in R can be written as a function, but it isn’t always convenient to do so. Therefore, R provides special syntax that you can use in common program structures. We’ve already described two important sets of constructions: operators and grouping brackets. This section describes a few other key language structures and explains what they do.
Conditional statements take the form:
if (condition) true_expression else false_expression
or, alternatively:
if (condition) expression
Because the expressions expression, true_expression, and false_expression are not always evaluated, the function if has the type special:
typeof(`if`)## [1] "special"
Here are a few examples of conditional statements:
if (FALSE) "this will not be printed"
if (FALSE) "this will not be printed" else "this will be printed"## [1] "this will be printed"
x <- 1
if (is(x, "numeric")) x/2 else print("x is not numeric")## [1] 0.5
In R, conditional statements are not vector operations: the condition must be a single logical value. In versions of R before 4.2.0, if the condition was a vector of more than one logical value, only the first item was used and a warning was issued; since R 4.2.0, this is an error. For example, in an older version of R:
x <- 10
y <- c(8, 10, 12, 3, 17)
if (x < y) x else y
## [1]  8 10 12  3 17
## Warning message:
## In if (x < y) x else y :
##   the condition has length > 1 and only the first element will be used
If you would like a vector operation, use the ifelse function instead:
a <- c("a","a","a","a","a")
b <- c("b","b","b","b","b")
ifelse(c(TRUE,FALSE,TRUE,FALSE,TRUE),a,b)## [1] "a" "b" "a" "b" "a"
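For comparison, here is a sketch of the vectorized form applied to a numeric condition, using the same values x = 10 and y = c(8, 10, 12, 3, 17). The variable name smaller is chosen only for this example.

```r
x <- 10
y <- c(8, 10, 12, 3, 17)

# Element by element: take x where x < y, otherwise take y.
smaller <- ifelse(x < y, x, y)
smaller  # 8 10 10 3 10

# For this particular task, base R's pmin() gives the same values.
same.as.pmin <- isTRUE(all.equal(smaller, pmin(x, y)))
```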
There are three different looping constructs in R. Simplest is repeat, which just repeats the same expression:
repeat expression
To stop repeating the expression, you can use the keyword break. To skip to the next iteration in a loop, you can use the command next.
i <- 5
repeat {if (i > 25) break else {print(i); i <- i + 5;}}## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25
If you do not include a break command, the R code will be an infinite loop. (This can be useful for creating an interactive application.)
Another looping construct is the while loop, which repeats an expression while a condition is true:
while (condition) expression
As a simple example, let’s rewrite the example above using a while loop:
i <- 5
while (i <= 25) {print(i); i <- i + 5}## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25
You can also use break and next inside while loops. The break statement is used to stop iterating through a loop. The next statement skips to the next loop iteration without evaluating the remaining expressions in the loop body.
Finally, R provides for loops, which iterate through each item in a vector or list:
for (var in list) expression
Let’s use the same example for a for loop:
for (i in seq(from=5,to=25,by=5)) print(i)## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25
You can also use break and next inside for loops.
There are two important properties of looping statements to remember. First, results are not printed inside a loop unless you explicitly call the print function. For example:
for (i in seq(from=5,to=25,by=5)) i
Second, the variable var that is set in a for loop is changed in the calling environment:
i <- 1
for (i in seq(from=5,to=25,by=5)) i
i## [1] 25
Like conditional statements, the looping functions repeat, while, and for have type special, because expression is not necessarily evaluated.
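The following sketch shows next and break working together in a for loop. The vector name odds and the cutoff of 7 are arbitrary choices for the illustration.

```r
odds <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next  # skip even numbers and go to the next iteration
  if (i > 7) break       # leave the loop entirely once i exceeds 7
  odds <- c(odds, i)
}
odds  # 1 3 5 7
```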
Find the limit of the following sequence:
\(x_{n+1} = (x_n+2/x_n)/2\).
Using a for loop:
x0 <- 1
for (i in 1:100) {
x0 <- (x0 + 2/x0)/2
}
x0## [1] 1.414214
Using a while loop:
x0 <- 1; x1 <- 0;
while (abs(x1-x0) > 1e-8) {
x1 = x0
x0 <- (x1 + 2/x1)/2
}
x0## [1] 1.414214
Using a repeat loop:
x0 <- 1
repeat {
x1 = x0
x0 <- (x1 + 2/x1)/2
if (abs(x1-x0) < 1e-8) break
}
x0## [1] 1.414214
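The value 1.414214 can be checked by hand. Assuming the sequence converges to a positive limit x (the iteration starts at 1 and stays positive), the limit must be a fixed point of the recurrence:

```latex
\[
x = \frac{1}{2}\left(x + \frac{2}{x}\right)
\;\Longrightarrow\; 2x = x + \frac{2}{x}
\;\Longrightarrow\; x^2 = 2
\;\Longrightarrow\; x = \sqrt{2} \approx 1.414214 .
\]
```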
Technically speaking, switch is just another function, but its semantics are close to those of control structures of other programming languages.
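switch (statement, list)
Here, the elements of list may be named. First, statement is evaluated and the result, value, is obtained. If value is a number between 1 and the length of list, then the corresponding element of list is evaluated and the result returned. If value is too large or too small, NULL is returned.
x <- 3
switch(x, 2+2, mean(1:10), rnorm(5))
## [1] 0.2241614 0.3303874 -0.4828706 -0.4151718 1.3656318
switch(2, 2+2, mean(1:10), rnorm(5))
## [1] 5.5
switch(6, 2+2, mean(1:10), rnorm(5))
If value is a character vector, then the element of ... with a name that exactly matches value is evaluated. If there is no match, a single unnamed argument will be used as a default. If no default is specified, NULL is returned.
y <- "fruit"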
switch(y, fruit = "banana", vegetable = "broccoli", "Neither")## [1] "banana"
y <- "meat"
switch(y, fruit = "banana", vegetable = "broccoli", "Neither")## [1] "Neither"
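A common use of the character form is dispatching on an option inside a function. The sketch below is illustrative only: the function name center and its type argument are hypothetical, and the unnamed default stops with an error instead of returning a value.

```r
# Hypothetical helper that picks a summary statistic by name.
center <- function(x, type) {
  switch(type,
         mean = mean(x),
         median = median(x),
         stop("unknown type: ", type))  # default: evaluated only when nothing matches
}

center(c(1, 2, 10), "mean")    # 4.333333
center(c(1, 2, 10), "median")  # 2
```

Because only the matching argument is evaluated, the call to stop() runs only when type matches neither name.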
R has some specialized syntax for accessing data structures. You can fetch a single item from a structure, or multiple items (possibly as a multidimensional array) using R’s index notation. You can fetch items by location within a data structure or by name.
The following table shows the operators in R used for accessing objects in a data structure.
Table Data structure access notation.
| Syntax | Objects | Description |
|---|---|---|
| x[i] | Vectors, lists | Returns objects from object x, described by i. i may be an integer vector, character vector (of object names), or logical vector. Does not allow partial matches. When used with lists, returns a list. When used with vectors, returns a vector. |
| x[[i]] | Vectors, lists | Returns a single element of x, matching i. i may be an integer or character vector of length 1. Allows partial matches (with exact=FALSE option). |
| x$n | Lists | Returns object with name n from object x |
| x@n | S4 objects | Returns element stored in slot named n |
Although the single-bracket notation and double-bracket notation look very similar, there are three important differences. First, double brackets always return a single element, while single brackets may return multiple elements. Second, when elements are referred to by name (as opposed to by index), single brackets match names only exactly, while double brackets allow partial matches (with the exact=FALSE option). Finally, when used with lists, the single-bracket notation returns a list, but the double-bracket notation returns the element itself. I’ll explain how to use this notation below.
The most familiar way to look up an element in R is by numeric vector. For example,
v <- 100:119
v[5]## [1] 104
v[1:5]## [1] 100 101 102 103 104
v[c(1,6,11,16)]## [1] 100 105 110 115
As a special case, you can use the double-bracket notation to reference a single element:
v[[3]]## [1] 102
The double-bracket notation works the same as the single-bracket notation in this case.
We can use negative integers to return a vector consisting of all elements except the specified elements:
# exclude elements 1:15 (by specifying indexes -1 to -15)
v[-15:-1]## [1] 115 116 117 118 119
The same notation applies to lists:
obs <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
obs[1:3]## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 3
obs[-7:-1]## $h
## [1] 8
##
## $i
## [1] 9
##
## $j
## [1] 10
We can use this notation to extract parts of multidimensional data structures:
m <- matrix(data=c(101:112),nrow=3,ncol=4)
m## [,1] [,2] [,3] [,4]
## [1,] 101 104 107 110
## [2,] 102 105 108 111
## [3,] 103 106 109 112
m[3]## [1] 103
m[3,4]## [1] 112
m[1:2,1:2]## [,1] [,2]
## [1,] 101 104
## [2,] 102 105
If you omit a vector specifying a set of indices for a dimension, then elements for all indices are returned:
m[1:2,]## [,1] [,2] [,3] [,4]
## [1,] 101 104 107 110
## [2,] 102 105 108 111
m[3:4]## [1] 103 104
m[,3:4]## [,1] [,2]
## [1,] 107 110
## [2,] 108 111
## [3,] 109 112
When selecting a subset, R will automatically coerce the result to the most appropriate number of dimensions. If you select a subset of elements that corresponds to a matrix, R will return a matrix object; if you select a subset that corresponds to only a vector, R will return a vector object. To disable this behavior, you can use the drop=FALSE option:
a <- array(data=c(101:124),dim=c(2,3,4))
class(a[1,1,])## [1] "integer"
class(a[1,,])## [1] "matrix"
class(a[1:2,1:2,1:2])## [1] "array"
class(a[1,1,1,drop=FALSE])## [1] "array"
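The effect of drop=FALSE is easiest to see by checking dimensions. This is a small sketch using a separate matrix named mat so that the objects defined above are left untouched.

```r
mat <- matrix(101:112, nrow = 3, ncol = 4)

row.vec <- mat[1, ]                # dimensions dropped: a plain vector of length 4
row.mat <- mat[1, , drop = FALSE]  # dimensions kept: a 1 x 4 matrix

dim(row.vec)  # NULL
dim(row.mat)  # 1 4
```

Keeping the dimensions can matter when later code expects a matrix, for example when calling nrow() or multiplying with %*%.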
It is also possible to replace elements in a vector, matrix, or array using the same notation:
m[1] <- 1000
m## [,1] [,2] [,3] [,4]
## [1,] 1000 104 107 110
## [2,] 102 105 108 111
## [3,] 103 106 109 112
m[1:2,1:2] <- matrix(c(1001:1004),nrow=2,ncol=2)
m## [,1] [,2] [,3] [,4]
## [1,] 1001 1003 107 110
## [2,] 1002 1004 108 111
## [3,] 103 106 109 112
It is even possible to extend a data structure using this notation. A special NA element is used to represent values that are not defined:
v <- 1:12
v[15] <- 15
v## [1] 1 2 3 4 5 6 7 8 9 10 11 12 NA NA 15
A data structure can also be indexed by a factor; the factor is interpreted as an integer vector.
As an alternative to indexing by an integer vector, you can also index through a logical vector. For example,
rep(c(TRUE,FALSE),10)## [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
## [12] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
v[rep(c(TRUE,FALSE),10)]## [1] 1 3 5 7 9 11 NA 15 NA NA
v[(v==103)]## [1] NA NA
v[(v %% 3 == 0)]## [1] 3 6 9 12 NA NA 15
v[c(TRUE,FALSE,FALSE)]## [1] 1 4 7 10 NA
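As the output above shows, an NA in a logical index produces an NA in the result. If that is not wanted, one option is to convert the logical vector to positions with which(), which drops NA entries. The sketch below rebuilds the same values under the made-up name vals.

```r
vals <- c(1:12, NA, NA, 15)

with.na    <- vals[vals %% 3 == 0]                # 3 6 9 12 NA NA 15
via.which  <- vals[which(vals %% 3 == 0)]         # 3 6 9 12 15
via.filter <- vals[!is.na(vals) & vals %% 3 == 0] # 3 6 9 12 15
```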
As above, the same notation applies to lists:
obs <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
obs[(obs > 7)]## $h
## [1] 8
##
## $i
## [1] 9
##
## $j
## [1] 10
With lists, each element may be assigned a name. You can index an element by name using the $ notation:
obs <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
obs$j## [1] 10
obs[c("a","b","c")]## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## [1] 3
A list can also be indexed by name using the double-bracket notation when selecting a single element. It is even possible to index by partial name using the exact=FALSE option:
dairy <- list(milk="1 gallon", butter="1 pound", eggs=12)
dairy$milk## [1] "1 gallon"
dairy[["milk"]]## [1] "1 gallon"
dairy[["mil"]]## NULL
dairy[["mil",exact=FALSE]]## [1] "1 gallon"
Sometimes, an object is a list of lists. You can also use the double-bracket notation to reference an element in this type of data structure. To do this, use a vector as an argument. R will iterate through the elements in the vector, referencing sublists:
fruit <- list(apples=6, oranges=3, bananas=10)
dairy <- list(milk="1 gallon", butter="1 pound", eggs=12)
shopping.list <- list (dairy = dairy, fruit = fruit)
shopping.list## $dairy
## $dairy$milk
## [1] "1 gallon"
##
## $dairy$butter
## [1] "1 pound"
##
## $dairy$eggs
## [1] 12
##
##
## $fruit
## $fruit$apples
## [1] 6
##
## $fruit$oranges
## [1] 3
##
## $fruit$bananas
## [1] 10
shopping.list[[c("dairy", "milk")]]## [1] "1 gallon"
shopping.list[[c(1,2)]]## [1] "1 pound"
Standards for code style aren’t the same as syntax, although they are sort of related. It is usually wise to be careful about code style to maximize the readability of your code, making it easier for you and others to maintain.
Here, I’ve tried to stick to Google’s R Style Guide, which is available at http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html. Here is a summary of its suggestions:
Indentation
Indent lines with two spaces, not tabs. If code is inside parentheses, indent to the innermost parentheses.
Spacing
Use only single spaces. Add spaces between binary operators and operands. Do not add spaces between a function name and the argument list. Add a single space between items in a list, after each comma.
Blocks
Don’t place an opening brace (“{”) on its own line. Do place a closing brace (“}”) on its own line. Indent inner blocks (by two spaces).
Semicolons
Omit semicolons at the end of lines when they are optional.
Naming
Name objects with lowercase words, separated by periods. For function names, capitalize the name of each word that is joined together, with no periods. Try to make function names verbs.
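To make these suggestions concrete, here is a short sketch written to follow them. The function ScaleToUnitRange and its argument names are invented for this illustration.

```r
# Function name: capitalized words, no periods, reads as a verb.
# Object names: lowercase words separated by periods.
ScaleToUnitRange <- function(input.values) {
  value.range <- max(input.values) - min(input.values)
  if (value.range == 0) {
    # All values are equal, so map everything to 0.
    return(rep(0, length(input.values)))
  }
  (input.values - min(input.values)) / value.range
}

scaled.values <- ScaleToUnitRange(c(2, 4, 6))
scaled.values  # 0.0 0.5 1.0
```

Note the two-space indentation, the spaces around binary operators, the opening brace kept on the same line, and the absence of trailing semicolons.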
Ex1. Produce a 200*10 matrix m0 with each element normally distributed with mean 1 and variance 2. Using for or while, transform m0 into a new matrix m1 of the same size, each column of which has sample mean 0 and sample variance 1.
Ex2. Read data from an xlsx file. Calculate the sum and mean of each column and each row. Treat each column as a random variable, and calculate the correlations between columns. Then write the correlation matrix to a new xlsx file. The data can be downloaded here.