Jason Freels
16 December 2016
Describe the types of elements used to create R objects
Define and create the primary classes of R objects
Vectors
Matrices
Data frames
Lists
Functions
Introduce more complex objects
Elements aren't really objects in R, but are single-valued vectors
However, the term 'element' is helpful to distinguish between types of objects in a vector
Elements Have One of Six Atomic modes
Function calls on objects containing elements with different atomic modes coerce the "higher" mode elements to the "lowest" mode present, where "lowest" means the most general mode in the order logical, integer, numeric, complex, character
2 < "george"
[1] TRUE
2 > "george"
[1] FALSE
-2 < "-3"
[1] TRUE
-2 < FALSE
[1] TRUE
as.complex(-2)
[1] -2+0i
as.character(-2)
[1] "-2"
as.logical(-2) ### Only 0 returns as FALSE
[1] TRUE
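The coercion rule can be checked directly with class( ). This is a minimal sketch; the variable names are hypothetical and are not part of the original slides.

```r
# Mixing modes in c( ) coerces every element to the most general mode present
logical_and_numeric   <- c(TRUE, 2)         # logical is coerced to numeric
numeric_and_complex   <- c(2, 1+0i)         # numeric is coerced to complex
numeric_and_character <- c(2, "george")     # numeric is coerced to character

class(logical_and_numeric)     # "numeric"
class(numeric_and_complex)     # "complex"
class(numeric_and_character)   # "character"

# In a comparison the number is converted to a string first,
# so 2 < "george" compares "2" with "george" as text
identical(2 < "george", "2" < "george")   # TRUE
```

Because the comparison is made on text, its result depends on the sort order of characters rather than on numeric size.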
2 == 3
[1] FALSE
2 > 3
[1] FALSE
2 <= 3
[1] TRUE
2 < 3 | 2 > 3 ### Is EITHER 2<3 OR 2>3 true?
[1] TRUE
2 < 3 & 2 > 3 ### Are BOTH 2<3 AND 2>3 true?
[1] FALSE
sqrt(4)
[1] 2
exp(1)
[1] 2.718282
log(1)
[1] 0
log10(1)
[1] 0
ceiling(pi)
[1] 4
floor(pi)
[1] 3
factorial(5)
[1] 120
choose(4,2)
[1] 6
Scalars are interpreted as vectors containing a single element
Vectors can be created using one of four functions
a:b creates a vector of integer-differenced values \(\in [a,b]\)
c( ) concatenates various elements or vectors together
rep( ) repeats elements or patterns of elements
seq(m,n,o) generates a number sequence between \(m\) and \(n\) in increments of \(o\)
"Binding" elements of different modes into a vector will coerce all of the elements into the lowest atomic mode
c( ) Coerces elements with various atomic modes into a vector object
Each element in the vector will have the same atomic mode - whichever mode is lowest among the elements being concatenated
A <- c(1,2,3,4,5) ; A
[1] 1 2 3 4 5
B <- c(6,7,8,9,10) ; B
[1] 6 7 8 9 10
C <- c(A,B,"George") ; C ### Coerces the numbers in A & B to characters
[1] "1" "2" "3" "4" "5" "6" "7"
[8] "8" "9" "10" "George"
D <- rep(1,10); D
[1] 1 1 1 1 1 1 1 1 1 1
E <- rep(c(1,2,3,4),5); E
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
G <- rep(c(1,2,3,4), each=5); G
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
H <- seq(1,10) ; H
[1] 1 2 3 4 5 6 7 8 9 10
I <- seq(4,20,by=4) ; I
[1] 4 8 12 16 20
J <- seq(1,2, by=.1) ; J
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
K <- seq(10,2, by=-0.5) ; K
[1] 10.0 9.5 9.0 8.5 8.0 7.5 7.0 6.5 6.0 5.5 5.0 4.5 4.0 3.5
[15] 3.0 2.5 2.0
L <- seq(10,2, length=7) ; L
[1] 10.000000 8.666667 7.333333 6.000000 4.666667 3.333333 2.000000
Let's look at the structure of the vector J defined previously
To do this we can use the structure function str( )
str(J)
 num [1:11] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
num - shows that this is a numeric vector
[1:11] - shows that J has 1 dimension with values in positions 1-11
As will be shown when discussing matrices, the dimensions of an \(11 \times 11\) matrix are expressed as [1:11, 1:11]
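The difference between one and two dimensions can be seen by comparing a vector with a matrix. This is a small sketch; J_example and M_example are hypothetical names used only for illustration.

```r
# A vector has no dim attribute; a matrix has one entry per dimension
J_example <- seq(1, 2, by = 0.1)                 # same values as J
M_example <- matrix(0, nrow = 11, ncol = 11)     # an 11 x 11 matrix of zeros

length(J_example)   # 11
dim(J_example)      # NULL
dim(M_example)      # 11 11
str(M_example)      # reports the dimensions as [1:11, 1:11]
```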
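A < B | A > B ### Element-wise OR
[1] TRUE TRUE TRUE TRUE TRUE
A < B || A > B ### Uses only the first elements in older versions of R; R >= 4.3 raises an error for vectors longer than 1
[1] TRUE
A < B & A > B ### Element-wise AND
[1] FALSE FALSE FALSE FALSE FALSE
A < B && A > B ### Same caveat as || above
[1] FALSE
A + B
[1] 7 9 11 13 15
A * B ## Element-wise multiplication
[1] 6 14 24 36 50
A%*%B ## Matrix multiplication (for two vectors, the inner product)
     [,1]
[1,]  130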
round(sqrt(A),digits = 3)
[1] 1.000 1.414 1.732 2.000 2.236
round(exp(A),digits = 2)
[1] 2.72 7.39 20.09 54.60 148.41
round(log(A),digits = 3)
[1] 0.000 0.693 1.099 1.386 1.609
round(log10(A),digits=3)
[1] 0.000 0.301 0.477 0.602 0.699
sum(A)
[1] 15
cumsum(A)
[1] 1 3 6 10 15
prod(B)
[1] 30240
cumprod(B)
[1] 6 42 336 3024 30240
t(A) ### Returns the transpose of A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
t(t(A))
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
abs(-B/2)
[1] 3.0 3.5 4.0 4.5 5.0
table(c(A,B/2)) ### displays how many times each unique value is observed
  1   2   3 3.5   4 4.5   5
  1   1   2   1   2   1   2
A ### Recall the value of the vector A defined earlier
[1] 1 2 3 4 5
A[3] ## Returns the 3rd element of A
[1] 3
A[1] <- 0.1 ; A ## Assigns the 1st element of A as 0.1
[1] 0.1 2.0 3.0 4.0 5.0
A[-1] <- 4 ; A ## Assigns all but the 1st element of A as 4
[1] 0.1 4.0 4.0 4.0 4.0
A < 0.5
[1] TRUE FALSE FALSE FALSE FALSE
A[4] == 2
[1] FALSE
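The difference between element-wise and matrix multiplication, and a way to reduce a logical vector to a single value, can be sketched as follows. The vectors u and v are hypothetical stand-ins for the original values of A and B.

```r
u <- c(1, 2, 3, 4, 5)     # stand-in for the original A
v <- c(6, 7, 8, 9, 10)    # stand-in for B

u * v                # element-wise product: 6 14 24 36 50
sum(u * v)           # 130, the inner product as a plain number
drop(u %*% v)        # 130, the same value taken from the 1 x 1 matrix

# any( ) and all( ) reduce a logical vector to a single TRUE/FALSE
any(u < v | u > v)   # TRUE
all(u < v & u > v)   # FALSE
```

Using any( ) or all( ) makes the intent explicit when a single logical value is needed from a vector comparison.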
In R, matrices are created using one of three methods
The matrix( ) function
The edit( ) function
Combining vectors
Matrices are atomic, i.e. every element has the same atomic mode
Creating matrices from elements or vectors with different atomic modes will coerce every element to the lowest mode
The matrix( ) function is used to create a matrix object with the following arguments
data - Values to include in the matrix
ncol - Number of columns
nrow - Number of rows
byrow - Fill the matrix by rows or by columns?
dimnames - Names applied to row and column headers
mat <- matrix(data = 1:9,
              ncol = 3,
              nrow = 3,
              byrow = TRUE,
              dimnames = list(c('A','B','C'),
                              c('D','E','F')))
mat
  D E F
A 1 2 3
B 4 5 6
C 7 8 9
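The effect of the byrow argument is easiest to see by filling the same data both ways. This is an illustrative sketch; by_row and by_col are hypothetical names.

```r
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)    # fill across rows
by_col <- matrix(1:6, nrow = 2, byrow = FALSE)   # fill down columns (the default)

by_row[1, ]   # 1 2 3
by_col[1, ]   # 1 3 5

# Filling by row gives the transpose of filling the transposed shape by column
identical(by_row, t(matrix(1:6, nrow = 3)))   # TRUE
```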
edit( )
This interactive 'GUI' method can be very fast, but requires user input
Start by creating a \(1 \times 1\) numeric matrix - the value you use doesn't matter
mat2 <- matrix(1)
Then, use edit() to bring up a spreadsheet-style editor window
Make whatever changes you like to the matrix, then hit
Note
mat2 <- edit(mat2)
Vectors
Note the use of c( ) to ensure that vec1, vec2, vec3, vec4 are all part of the argument data, and not considered as four separate arguments
vec1 <- 1:4
vec2 <- 5:8
vec3 <- 9:12
vec4 <- 13:16
vec.mat <- matrix(data = c(vec1, vec2, vec3, vec4),
                  ncol = 4)
vec.mat
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
Matrices can also be built with rbind & cbind to merge vectors as rows or columns
Note that rbind and cbind will coerce every element to the same atomic mode
rbind(vec1, vec2, vec3, vec4)
     [,1] [,2] [,3] [,4]
vec1    1    2    3    4
vec2    5    6    7    8
vec3    9   10   11   12
vec4   13   14   15   16
cbind(vec1, vec2, vec3, vec4)
     vec1 vec2 vec3 vec4
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
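The coercion performed by rbind and cbind becomes visible when the vectors have different modes. This is a minimal sketch; nums, chars and bound are hypothetical names.

```r
nums  <- 1:3                 # integer vector
chars <- c("a", "b", "c")    # character vector

bound <- rbind(nums, chars)  # a 2 x 3 matrix, so a single mode is required

mode(bound)       # "character"
dim(bound)        # 2 3
bound["nums", ]   # "1" "2" "3" - the integers are now text
```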
Matrix operations that are often of interest include
Returning the diagonal elements of a matrix
Computing the determinant of a matrix
Finding the inverse of a matrix
Finding the transpose of a matrix
Computing the Eigenvalues and Eigenvectors
To demonstrate matrix operations, let's first create a \(5 \times 5\) matrix of random integers in \([1,50]\)
mat1 <- matrix(sample(1:50,size = 25),
               nrow = 5,
               byrow = TRUE)
mat1
     [,1] [,2] [,3] [,4] [,5]
[1,]   33   14   25   19   49
[2,]    2   37   13   47   36
[3,]   44   21   39    4   46
[4,]    8    9   10   27    7
[5,]   12   15   18   11   17
diag(mat1) ### Diagonal elements of `mat1`
[1] 33 37 39 27 17
det(mat1) ### Determinant of `mat1`
[1] 2390956
solve(mat1) ### Returns the inverse of `mat1` if it exists
[,1] [,2] [,3] [,4] [,5]
[1,] -0.07517830 0.036648102 0.13213627 0.070335046 -0.247424043
[2,] -0.09558018 0.060913710 0.10881254 0.006456413 -0.150590391
[3,] 0.06416847 -0.066029655 -0.11758058 -0.028874224 0.284919923
[4,] 0.01485264 -0.005037316 -0.01811660 0.037442763 0.001460504
[5,] 0.05984886 -0.006443448 -0.05306413 -0.049000065 0.063724719
t(mat1) ### Returns the transpose of `mat1`
[,1] [,2] [,3] [,4] [,5]
[1,] 33 2 44 8 12
[2,] 14 37 21 9 15
[3,] 25 13 39 10 18
[4,] 19 47 4 27 11
[5,] 49 36 46 7 17
eigen(mat1) ### Returns the eigenvalues and eigenvectors of `mat1`
$values
[1] 108.921417+0.000000i 30.900944+0.000000i 18.157668+0.000000i
[4] -2.490014+5.737797i -2.490014-5.737797i
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5278529+0i 0.31792088+0i -0.3921790+0i -0.50972262+0.24878917i
[2,] -0.4090286+0i -0.81783168+0i 0.8181752+0i -0.38372140+0.29275659i
[3,] -0.6561359+0i 0.47854674+0i -0.1987081+0i 0.57631369+0.00000000i
[4,] -0.2012046+0i -0.03029619+0i -0.3498562+0i 0.04229242-0.07909292i
[5,] -0.2882175+0i -0.01236186+0i 0.1220688+0i 0.13925026-0.29285803i
[,5]
[1,] -0.50972262-0.24878917i
[2,] -0.38372140-0.29275659i
[3,] 0.57631369+0.00000000i
[4,] 0.04229242+0.07909292i
[5,] 0.13925026+0.29285803i
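Because mat1 is random, its results change from run to run. The sketch below uses a small fixed matrix, a hypothetical example not taken from the slides, so the same operations can be checked by hand.

```r
m <- matrix(c(2, 1,
              1, 3), nrow = 2)   # a fixed, symmetric 2 x 2 matrix

m_inv <- solve(m)                # inverse of m

det(m)                                      # 5, i.e. 2*3 - 1*1
all.equal(m %*% m_inv, diag(2))             # TRUE: m times its inverse is the identity
all.equal(t(t(m)), m)                       # TRUE: transposing twice returns m

ev <- eigen(m)$values
all.equal(sum(ev), sum(diag(m)))            # TRUE: eigenvalues sum to the trace
all.equal(prod(ev), det(m))                 # TRUE: eigenvalues multiply to the determinant
```

all.equal( ) is used instead of == because these results are floating-point numbers and may differ by tiny rounding errors.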
Matrix elements can be accessed by specifying their [row,column] values
Matrix elements can also be accessed by specifying a single [element.index]
Matrix rows/columns can be accessed by specifying [row,] or [,column]
mat1[4,3]
[1] 10
mat1[5,5]
[1] 17
mat1[7]
[1] 37
mat1[24]
[1] 7
mat1[1,] ### Returns the first row of mat1
[1] 33 14 25 19 49
mat1[,3] ### Returns the third column of mat1
[1] 25 13 39 10 18
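A single element index counts down the columns, which is why mat1[7] is the element in row 2, column 2. The relationship can be sketched with a hypothetical matrix m whose values equal their own positions.

```r
m <- matrix(1:20, nrow = 4)   # 4 x 5 matrix; element k sits at linear position k

i <- 3                        # row
j <- 2                        # column
linear_index <- i + (j - 1) * nrow(m)   # 3 + 1 * 4 = 7

m[i, j]            # 7
m[linear_index]    # 7
identical(m[i, j], m[linear_index])   # TRUE
```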
The syntax for function calls on arrays is similar to what was described for vectors and matrices
Arrays are atomic
Data frames are NOT atomic - but are comprised of atomic column vectors
Data frames are like matrices - but each column can have a different atomic mode
Data frames are primarily created by using the function data.frame()
NEVER use rbind or cbind to create a data.frame
Recall that rbind & cbind coerce every element in every vector to the lowest atomic mode
age <- c(23, 35, 19) ### Numeric vector
sex <- c("Male", "Female", "Yes") ### Character vector
job <- c(TRUE, TRUE, FALSE) ### Logical vector
data.frame(age,sex,job, row.names = c("Jim", "Joe", "Ray"))
    age    sex   job
Jim  23   Male  TRUE
Joe  35 Female  TRUE
Ray  19    Yes FALSE
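The warning about cbind can be checked by building the same columns both ways. This is a sketch under the assumption that the columns are plain vectors; the name column and the object names are hypothetical.

```r
age  <- c(23, 35, 19)              # numeric
job  <- c(TRUE, TRUE, FALSE)       # logical
name <- c("Jim", "Joe", "Ray")     # character (hypothetical column)

as_matrix <- cbind(age, job, name)                               # coerced to one mode
as_frame  <- data.frame(age, job, name, stringsAsFactors = FALSE) # modes preserved

mode(as_matrix)           # "character"
class(as_frame$age)       # "numeric"
class(as_frame$job)       # "logical"
class(as_frame$name)      # "character"
```

stringsAsFactors = FALSE is set explicitly so that the character column stays character in older versions of R as well.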
Data frames can also be created by loading local or online files
read.table loads delimited text data (whitespace-delimited by default, which includes tabs) from .txt files (NotePad)
read.csv creates a data.frame from .csv files (installs w/ base R)
read_excel creates a data.frame from .xls and .xlsx files (requires the readxl package)
read.csv("http://www.fdic.gov/bank/individual/failed/banklist.csv", header = T)
R installs with several data sets, such as mtcars
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
class(mtcars)
[1] "data.frame"
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
The $ operator is used to call data frame columns by name, as in mtcars$cyl
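The $ operator is one of several equivalent ways to extract a column. The sketch below compares them using the built-in mtcars data set.

```r
# Three ways to extract the cyl column as a plain vector
identical(mtcars$cyl, mtcars[["cyl"]])   # TRUE
identical(mtcars$cyl, mtcars[, "cyl"])   # TRUE

# Single brackets with only a column name keep the data frame structure
class(mtcars["cyl"])                     # "data.frame"
class(mtcars$cyl)                        # "numeric"
```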
Think of lists as a suitcase - they can be used to store everything
Vectors
Matrices
Data Frames
Functions
Even other lists
Example: Function outputs - you write a script that inputs the cars data set & creates several outputs
The data set itself (a data.frame or matrix)
Some summary statistics of the data set summary(cars)
The maximum value of the log-likelihood function \(\left(\mathcal{L}\right)\), a \(1\times 1\) vector
You can assign each of these objects to a list
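A sketch of such a list is shown below. The slides do not say which model produces the log-likelihood, so the linear model of dist on speed is an assumption made only for illustration, as are the names fit and results.

```r
# Hypothetical model, used only to obtain a log-likelihood value
fit <- lm(dist ~ speed, data = cars)

results <- list(data   = cars,                      # the data set itself
                stats  = summary(cars),             # summary statistics
                loglik = as.numeric(logLik(fit)))   # a single number

names(results)           # "data" "stats" "loglik"
class(results$data)      # "data.frame"
class(results$stats)     # "table"
length(results$loglik)   # 1
```

Naming the list components lets each object be retrieved later with the $ operator.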
Concatenating numeric, logical and character elements results in an atomic-character vector
The vector is atomic because every element has the same atomic mode
Recall that c( ) coerces every element to the lowest atomic mode that will ensure homogeneity across the entire vector
V <- c(3,TRUE,"george") ; V
[1] "3" "TRUE" "george"
Lists preserve the atomic mode of each object stored in the list
And because lists can hold entire objects, the structure of each object stored in the list is preserved
list1 <- list(3,TRUE,"george") ; list1
[[1]]
[1] 3
[[2]]
[1] TRUE
[[3]]
[1] "george"
list2 <- list(head(mtcars),"george", A) ; list2
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
[[2]]
[1] "george"
[[3]]
[1] 0.1 4.0 4.0 4.0 4.0
str(list2)
List of 3
$ :'data.frame': 6 obs. of 11 variables:
..$ mpg : num [1:6] 21 21 22.8 21.4 18.7 18.1
..$ cyl : num [1:6] 6 6 4 6 8 6
..$ disp: num [1:6] 160 160 108 258 360 225
..$ hp : num [1:6] 110 110 93 110 175 105
..$ drat: num [1:6] 3.9 3.9 3.85 3.08 3.15 2.76
..$ wt : num [1:6] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:6] 16.5 17 18.6 19.4 17 ...
..$ vs : num [1:6] 0 0 1 1 0 1
..$ am : num [1:6] 1 1 1 0 0 0
..$ gear: num [1:6] 4 4 4 3 3 3
..$ carb: num [1:6] 4 4 1 1 2 1
$ : chr "george"
$ : num [1:5] 0.1 4 4 4 4
Objects assigned to a list may be accessed by using double brackets [[ ]]
list1[[1]]
[1] 3
Components of objects stored in a list may also be accessed by
Using [[ ]] to call the vector, matrix, list, or data frame inside the list
Then using [ ], [[ ]], or $ to call the desired component of the object inside the list
list2[[3]][1]
[1] 0.1
list2[[1]]$mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1
Functions convert input objects into either output objects or plots
Functions can take any object as an argument, even other functions
Functions operate on distinct classes of objects (or functions)
Functions can return objects of the same class as the input objects or create a completely new class of object
Functions are comprised of
The function symbol used to call the function
The formal argument(s)
The informal argument(s)
The function body defining the operations to be performed
The code below creates an example function called foo( ), where...
foo \(\hspace{10pt}\) - is the function symbol used to call the function
x \(\hspace{18pt}\) - is a formal argument
a,b \(\hspace{10pt}\) - are informal arguments
a*x+b \(\hspace{1pt}\) - is the function body defining the operations to be performed
foo <- function(x) {   ### Values inside ( ) are interpreted as function arguments
  a <- 1 ; b <- 2      ### Values separated by ";" are interpreted as separate lines
  c <- a*x+b
  return(c)
}                      ### Values inside { } are interpreted as the function body
The function foo is evaluated by specifying a value for the formal argument
In general, formal arguments can be numeric, character, or even other functions
For the function foo, the formal argument x must be numeric, either a vector or a matrix
Values are not specified for the informal arguments a and b since they were defined in the body of the function
foo(2)
[1] 4
A problem arises, however, if I want to return the value of either a or b - an error is produced
This error results from the 'lexical scoping rules' on which R was built
Lexical scoping defines 'where' each object is defined and how we can interact with it
b ### Error: object "b" not found
When the function foo( ) was created, three things happened
A new environment was created
The body of the function foo( ) was assigned to this new environment
The symbol foo was assigned to the parent environment of this new environment
The lexical scoping rules define how R searches for a requested value
First, R searches the current environment for the requested value
If the value is not found in the current environment, R then searches the parent environment to the current environment
If necessary, R continues to search successive parent environments until the Global Environment is reached
If the value isn't found in the Global Environment, R moves on through its parents (the attached packages and the base environment) until it reaches the Empty Environment
Upon reaching the Empty Environment, R stops searching and returns an error that the value cannot be found
Thus, in the case of the function foo( ), a and b could not be found because...
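The search path can be inspected by walking from the Global Environment up through each parent. This is a minimal sketch with hypothetical variable names; the environments listed between the first and last entries depend on which packages are attached in the session.

```r
env   <- globalenv()
chain <- character(0)

# Record each environment's name, stopping once the Empty Environment is reached
repeat {
  chain <- c(chain, environmentName(env))
  if (identical(env, emptyenv())) break
  env <- parent.env(env)
}

chain[1]                # "R_GlobalEnv"
chain[length(chain)]    # "R_EmptyEnv"
"base" %in% chain       # TRUE: the base environment is on the path
```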
They are local variables, assigned in the body of foo
Our request to find a and b was made in the parent environment to the body of foo( ), where only the symbol foo is available
First, let's determine the current environment
environment( )
<environment: R_GlobalEnv>
Now, let's see in which environment the symbol foo is defined
This should be the same environment as the one listed above
BTW, if you refresh this presentation, this environment will be different
environment(foo)
<environment: R_GlobalEnv>
The body of the function is defined in its own environment that is a child to the environment listed above
This can be seen by examining the structure of foo( )
str(foo)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 8 8 1 8 1 1 8
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x000000001b9623b8>
When R/RStudio is opened the current environment will be the Global environment
Each new function created will have its own enclosing environment - only the function symbol can be accessed from the Global environment
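One way to look at these environments directly is to have a function report where it is running. In this sketch, where_am_i is a hypothetical function; it shows that the local environment is created when the function is called and that its parent is the environment in which the function was defined.

```r
where_am_i <- function() {
  call_env <- environment()                  # the local environment of this call
  list(call_env = call_env,
       parent   = parent.env(call_env))      # where R looks next
}

first  <- where_am_i()
second <- where_am_i()

# The parent of the local environment is the environment where the function was defined
identical(first$parent, environment(where_am_i))   # TRUE

# Each call gets a fresh local environment
identical(first$call_env, second$call_env)         # FALSE
```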
Can a and b be made available in the current environment?
Yes, there are at least three ways to do this
Define a and b outside of the body of foo( )
Use return( ) to ensure foo( ) returns a and b every time it is run
Use the deep-assignment operator <<-
Each of these methods is shown below
a <- 1 ; b <- 2 # Moving the informal arguments outside the function
foo <- function(x) {
a*x+b
}
foo(2) ; a ; b
[1] 4
[1] 1
[1] 2
foo <- function(x) {
a <- 1 ; b <- 2
c <- a*x+b
return(list(a,b,c))
}
foo(2)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 4
foo2 <- function(x) {
  a <<- 1 ; b <<- 2   ### Assigns a and b outside the body of foo2
  c <- a*x+b
  return(c)
}
foo2(2) ; a ; b
[1] 4
[1] 1
[1] 2
R arrives at the same result regardless of where a and b are defined
This is due to the search procedure defined by the lexical scoping rules
R searches each successive parent environment until it finds a and b or the search reaches the empty environment
The empty environment sits at the end of this chain, beyond the global environment, the attached packages, and the base environment
return( ) allows multiple function outputs
<<- assigns in a parent environment
The deep-assignment operator assigns a value in a parent environment rather than in the current one
Use <<- carefully: it can overwrite existing objects or mask base R values, giving unexpected side effects
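A more contained use of <<- is to update a variable that lives in an enclosing function rather than in the global environment. The counter below is a hypothetical example, not taken from the slides.

```r
make_counter <- function() {
  count <- 0                 # lives in the environment of make_counter( )
  function() {
    count <<- count + 1      # updates count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2
```

Because count is found in the enclosing function, the assignment never reaches the global environment, which keeps the side effect local to the counter.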
What value do you think will be returned for bar(2)?
Don't scroll down until you've decided on an answer
a<-1
b<-2
foo<-function(x) {
a*x+b
}
bar<-function(x){
a<-2
b<-1
foo(x)
}
bar(2)
The correct answer is 4
Rationale
foo cannot access the values of a and b defined within bar
The only values R can find for a and b are those defined in the global environment
The a and b values defined in bar are therefore ignored
bar(2)
[1] 4
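If the intent is for bar to control the values of a and b, one option is to pass them as arguments instead of relying on the search through environments. This is a sketch; foo_explicit and bar_explicit are hypothetical names.

```r
# a and b become formal arguments with default values
foo_explicit <- function(x, a = 1, b = 2) {
  a * x + b
}

# bar_explicit passes its own values for a and b
bar_explicit <- function(x) {
  foo_explicit(x, a = 2, b = 1)
}

foo_explicit(2)   # 4, using the defaults: 1*2 + 2
bar_explicit(2)   # 5, using the values passed in: 2*2 + 1
```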
Thanks for sticking with it!
Review this presentation as needed and finally...check out Special Functions