R is an object-oriented language.
It can work with variables, data, functions, results, etc. They are stored in active computer memory as “objects”
Classes are kinds of objects
R has five “atomic” classes: logical, integer, numeric, complex, and character
Objects have attributes
Attributes are “metadata” or data about the data
attributes(): access the attributes of R objects
names(): access the names of an object
dim(): access the dimensions of matrices and arrays
class(): access the class of an object
length(): access the length of an object
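As a minimal sketch of the accessor functions listed above, the block below inspects two small example objects. The names `scores` and `grid` are made up for illustration and do not come from these notes.

```r
# Example objects (hypothetical names): a named vector and a matrix
scores <- c(a = 1.5, b = 2.5, c = 3.5)
grid <- matrix(1:6, nrow = 2)

attr.scores <- attributes(scores)  # a list holding the names attribute
nm <- names(scores)                # the element names
d <- dim(grid)                     # number of rows, number of columns
cl <- class(scores)                # class of the vector
len <- length(grid)                # total number of elements in the matrix
```

Note that length() counts every element of a matrix, because a matrix is stored as a vector with a dim attribute.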
There are different naming conventions:
Lower case separated by a dot:
Lower case separated by underscore:
Lower-Upper case:
Upper-Upper Case
Avoid using names identical to R functions
Object names are case sensitive - BE CONSISTENT
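The four conventions might look like the sketch below. All object names here are invented for illustration, and reading “Lower-Upper” as camel case and “Upper-Upper” as Pascal case is an assumption.

```r
# Hypothetical names showing each convention
student.name <- "Ada"   # lower case separated by a dot
student_name <- "Ada"   # lower case separated by an underscore
studentName  <- "Ada"   # lower-upper case (assumed to mean camel case)
StudentName  <- "Ada"   # upper-upper case (assumed to mean Pascal case)

# Names are case sensitive: score and Score are two different objects
score <- 10
Score <- 20
same.object <- identical(score, Score)   # FALSE
```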
In R, the most basic & fundamental type of object is the vector
Elements within one vector have to be the same “atomic” class
By default, numbers are stored as double-precision real numbers (class “numeric”)
x <- 1 #Stores 1 as a double in x
To explicitly store an integer, add the suffix L
x <- 1L
Inf represents infinity and allows us to represent values like 1/0
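A short sketch to check the difference between numeric and integer storage, and how Inf behaves. The variable names are illustrative only.

```r
num.class <- class(1)       # "numeric": stored as a double by default
int.class <- class(1L)      # "integer": the L suffix forces integer storage
num.type <- typeof(1)       # "double"

pos.inf <- 1/0              # Inf
neg.inf <- -1/0             # -Inf
finite.check <- is.finite(pos.inf)   # FALSE
```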
Individual numbers are one-element vectors
Like numbers, individual character strings are one-element vectors of character
R has many functions to manipulate strings. Many deal with putting strings together or taking them apart
u <- "abc" #A one-element vector
u <- paste("abc", "de", "f") #combines the strings together
u
## [1] "abc de f"
To create a vector, use the c() function. “c” stands for concatenate
x <- c(0.5, 0.6, 0.7) #numeric
x <- c(TRUE, TRUE, FALSE) #logical
x <- c("0.5", "0.6", "0.7") #character
x <- vector("numeric", length=10) #creates a numeric vector of length 10, filled with 0
is.na(10) #tests whether objects are or contain missing values, NA
## [1] FALSE
is.nan(10) #tests whether objects are or contain values that aren't numbers (Not a Number), NaN
## [1] FALSE
sqrt(-1)
## Warning in sqrt(-1): NaNs produced
## [1] NaN
Inf-Inf
## [1] NaN
NaN values count as NA (is.na(NaN) is TRUE), but the converse is not true (is.nan(NA) is FALSE)
x <- c(1,3,5,NA)
is.na(x)
## [1] FALSE FALSE FALSE TRUE
x <- c(1,3,NA,NaN)
is.nan(x)
## [1] FALSE FALSE FALSE TRUE
NULL
is.null() tests whether an object is NULL, a special R object. NULL is counted as non-existent
length(NA)
## [1] 1
length(NaN)
## [1] 1
length(NULL)
## [1] 0
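The difference between NULL and NA can be sketched as follows. The variable names are chosen for this example only.

```r
v <- c(1, 2, 3)

no.names <- is.null(names(v))      # TRUE: no names have been set on v
na.is.null <- is.null(NA)          # FALSE: NA is a missing value, not NULL
n.kept <- length(c(1, NULL, 2))    # 2: NULL contributes no element to a vector
```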
The elements of a vector can optionally be given names with names()
x <- c(1,2,3)
names(x)
## NULL
names(x) <- c("KS", "MO", "IL")
names(x)
## [1] "KS" "MO" "IL"
#Names can be used to call the attached values
x["KS"]
## KS
## 1
#Names can be removed with NULL
names(x) <- NULL
x
## [1] 1 2 3
Elements of a vector must be from the same atomic class. When we mix object classes in a single vector, coercion occurs. R will convert all the elements in the vector to the same type. This is called “implicit coercion.”
Implicit Coercion order: logical -> integer -> numeric -> complex -> character
You can coerce against this order explicitly using the “as.*()” functions, such as as.numeric()
y <- c(1.7, "a") #will become the "character" class
y <- c(TRUE, 2) #will become the "numeric" class
y <- c("a", TRUE) #will become the "character" class
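The resulting classes can be confirmed with class(). This is a small sketch with illustrative variable names.

```r
cls.1 <- class(c(1.7, "a"))    # "character"
cls.2 <- class(c(TRUE, 2))     # "numeric"
cls.3 <- class(c("a", TRUE))   # "character"
cls.4 <- class(c(TRUE, 2L))    # "integer": logical is promoted to integer

# The values themselves are converted too
as.num <- c(TRUE, 2)           # TRUE becomes 1
as.chr <- c("a", TRUE)         # TRUE becomes the string "TRUE"
```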
You can also explicitly coerce objects. Strange things can happen when we force one basic data type into another:
x <- 0:6
class(x)
## [1] "integer"
as.numeric(x) #force x to be numeric
## [1] 0 1 2 3 4 5 6
as.logical(x) #force x to be logical (0 will be false)
## [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
as.character(x) #force x to be a character - the numbers will be stored as text
## [1] "0" "1" "2" "3" "4" "5" "6"
x <- c("a", "b")
as.numeric(x) #force x to be numeric, but these values are incompatible
## Warning: NAs introduced by coercion
## [1] NA NA
An operator performs a basic operation, for example the addition operator “+”
Numbers are “one-element” vectors
2+3 #Adds two one-element vectors
## [1] 5
a <- 2 #This is a vector
b <- 3 #This is also a vector
a+b
## [1] 5
Adding, multiplying, or dividing two vectors gives an element-wise result
c(1,2,4) + c(5,0,-1) #1+5, 2+0, 4+(-1)
## [1] 6 2 3
c(1,2,4) * c(5,0,-1) #1*5, 2*0, 4*(-1)
## [1] 5 0 -4
c(1,2,4) / c(5,0,-1)
## [1] 0.2 Inf -4.0
When applying an operation to two vectors that requires them to be the same length, R recycles or repeats the shorter vector, until it is long enough to match the longer one
c(1,2,3) + 1 #will add 1 to 1, 2, and 3
## [1] 2 3 4
c(1,2,4) + c(6, 0, 9, 20, 22) #will add 1+6, 2+0, 4+9, 1+20, 2+22
## Warning in c(1, 2, 4) + c(6, 0, 9, 20, 22): longer object length is not a
## multiple of shorter object length
## [1] 7 2 13 21 24
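When the longer length is an exact multiple of the shorter one, recycling happens silently, with no warning. A minimal sketch with illustrative names:

```r
short <- c(1, 2)
long <- c(10, 20, 30, 40)

# short is recycled as 1, 2, 1, 2 to match the length of long
recycled.sum <- short + long   # 11 22 31 42
```

Because no warning is raised in this case, it can be worth checking vector lengths with length() before combining them.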
Indexing Vectors: “giving an address,” or forming a sub-vector by picking elements of a given vector for specific indices
y <- c(1.2, 3.9, 0.4, 0.12)
y[c(1,3)] #pull the values at index 1 and index 3
## [1] 1.2 0.4
v <- 3:4
y[v] #pull the values at index 3 and index 4
## [1] 0.40 0.12
y[c(1,3,1)] #pull the values at index 1, index 3, and index 1
## [1] 1.2 0.4 1.2
y[-1] #EXCLUDE the value at index 1
## [1] 3.90 0.40 0.12
Vectors can be created using the colon operator
1:3 #create a vector containing 1:3
## [1] 1 2 3
i <- 2
1:i-1 #The colon is evaluated before the minus: this creates the vector 1:2, then subtracts 1 from each element
## [1] 0 1
1:(i-1) #1:(2-1), or 1:1, vector 1
## [1] 1
Or using seq()
This is very important
seq(from=12, to=30, by=3)
## [1] 12 15 18 21 24 27 30
seq(from=12, to=13, by=0.1)
## [1] 12.0 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9 13.0
rep() can be used to repeat a constant or a vector to build longer vectors
rep(8,4)
## [1] 8 8 8 8
rep(c(5,12,13),3)
## [1] 5 12 13 5 12 13 5 12 13
rep(c(5,12,13), each=2)
## [1] 5 5 12 12 13 13
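A few further variations on seq() and rep(), shown as a sketch. The arguments length.out and times are standard, but the variable names are made up for this example.

```r
# Ask for a number of points instead of a step size
five.points <- seq(0, 1, length.out = 5)          # 0.00 0.25 0.50 0.75 1.00

# Repeat each element a different number of times
uneven <- rep(c("a", "b"), times = c(2, 3))       # "a" "a" "b" "b" "b"

# seq_len(n) is safer than 1:n when n may be 0
n <- 0
colon.len <- length(1:n)         # 2, because 1:0 counts down: 1 0
seq.len <- length(seq_len(n))    # 0
```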
any() and all() report whether any or all of the elements of a logical vector are TRUE
x <- 1:10 #x is 1 - 10
x > 8 #x is greater than 8
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
any(x>8) #is x ever greater than 8?
## [1] TRUE
all(x>8) #is x always greater than 8?
## [1] FALSE
all(x>0) #is x always greater than 0?
## [1] TRUE
Suppose we have a function f()
We want to apply it to all elements in a vector x
In many cases, we can accomplish this by simply calling f() on x
This process is called a vectorized operation - it’s simple and fast
u <- c(5,2,8)
v <- c(1,3,9)
u>v #will apply to each value in the vector
## [1] TRUE FALSE FALSE
u[1]>v[1] #Is the first element of u greater than the first element of v?
## [1] TRUE
v[3]<u[2] #Is the third element of v smaller than the second element of u?
## [1] FALSE
Some functions are also vectorized
sqrt(1:9)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
## [9] 3.000000
Filtering allows us to extract the elements of a vector that satisfy certain conditions
In the following examples, we generate a Boolean vector first, then use the Boolean vector to filter elements in the original vector
z <- c(5,2,-3,8)
w<-z[z*z>8] #vector z where z*z is greater than 8
w
## [1] 5 -3 8
x <- 1:5
x[x>3] <- 0 #assign 0 to all values where x is greater than 3
x
## [1] 1 2 3 0 0
subset() returns the values that satisfy the condition and drops NA values
which() returns the indices of the values that satisfy the condition
x <- c(1:5, NA, 12)
x
## [1] 1 2 3 4 5 NA 12
x[x>5] #Elements of x where x is greater than 5
## [1] NA 12
subset(x, x>5) #Notice how the NA value is handled differently when using subset
## [1] 12
which(x>3) #Which returns the INDEX
## [1] 4 5 7
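Since which() skips NA, it can also be used inside the brackets to filter without carrying NA along. A short sketch, reusing the same example vector:

```r
x <- c(1:5, NA, 12)

with.na <- x[x > 5]              # NA 12: the NA comparison yields NA
without.na <- x[which(x > 5)]    # 12: which() keeps only the TRUE positions
n.match <- sum(x > 5, na.rm = TRUE)   # 1: count of values greater than 5
```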
Test if two vectors are equal using “==”
x <- 1:3
y <- c(1,3,4)
x==y #checks each vector value independently
## [1] TRUE FALSE FALSE
all(x==y)
## [1] FALSE
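Besides all(x==y), base R offers identical() and all.equal() for whole-object comparison. The sketch below uses illustrative names to show how they differ.

```r
a <- 1:3            # integer vector
b <- c(1, 2, 3)     # numeric (double) vector

elementwise <- all(a == b)             # TRUE: the values match
strict <- identical(a, b)              # FALSE: the storage types differ
tolerant <- isTRUE(all.equal(a, b))    # TRUE: equal up to numeric tolerance

# Floating-point arithmetic is a common reason to prefer all.equal()
exact <- (0.1 + 0.2) == 0.3                   # FALSE
close <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE
```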
Matrices are vectors with an additional dimension attribute describing their size
An R matrix corresponds to a mathematical matrix
x <- matrix(nrow = 3, ncol = 2)
dim(x) #dimension of x
## [1] 3 2
attributes(x) #what do I know about x?
## $dim
## [1] 3 2
class(x) #what is the class of x?
## [1] "matrix" "array"
x #print the matrix (no values were supplied, so it is filled with NA)
## [,1] [,2]
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
Matrices can be created with vectors by adding a dimension attribute
Matrices are constructed column-wise, starting in the upper left corner and running down the columns
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
dim(x) <- c(2,5) #Assign x the dimensions 2x5
x
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
m.values.2 <- seq(5,45, by=5) #Values can also be assigned with sequence
m.values.2
## [1] 5 10 15 20 25 30 35 40 45
dim(m.values.2) <- c(3,3)
m.values.2
## [,1] [,2] [,3]
## [1,] 5 20 35
## [2,] 10 25 40
## [3,] 15 30 45
dim(x) #check the dimensions of a matrix
## [1] 2 5
nrow(x) #check the number of rows in a matrix
## [1] 2
ncol(x) #check the number of columns in a matrix
## [1] 5
Matrices can be created by column-binding (cbind()) or row-binding (rbind()):
x <- 1:3
y <- 10:12
cbind(x,y) #bind x to column 1 and y to column 2
## x y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
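For comparison, row-binding the same two vectors could look like the sketch below. The result name `r` is chosen for this example.

```r
x <- 1:3
y <- 10:12

r <- rbind(x, y)          # bind x to row 1 and y to row 2
r.dim <- dim(r)           # 2 rows, 3 columns
r.names <- rownames(r)    # the vector names become row names
```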
Matrices are indexed using double subscripting
You can extract a submatrix from a matrix
x <- 1:10
dim(x) <- c(2,5)
x
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
x[1,2] #Pull value at row 1, column 2
## [1] 3
x[1:2, 3:4] #Pull rows 1-2 and columns 3-4
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
x.1 <- x[1:2, 3:4] #These values can be saved as a new matrix
Common Linear Algebra Operation:
Matrix multiplication
Matrix addition
Etc
y <- matrix(1:4, nrow=2)
y
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
y*y #element-wise multiplication, NOT matrix multiplication
## [,1] [,2]
## [1,] 1 9
## [2,] 4 16
y%*%y #matrix multiplication: use the %*% operator
## [,1] [,2]
## [1,] 7 15
## [2,] 10 22
y+y #matrix addition
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
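The difference between * and %*% becomes clearer with non-square matrices. This is a sketch with made-up names: %*% needs the inner dimensions to agree, while * needs identical dimensions.

```r
a <- matrix(1:6, nrow = 2)   # 2 x 3
b <- matrix(1:6, nrow = 3)   # 3 x 2

p <- a %*% b                 # matrix product, 2 x 2
p.dim <- dim(p)

# Element-wise multiplication fails here because the shapes differ
elementwise.fails <- inherits(try(a * b, silent = TRUE), "try-error")
```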
A matrix can be transposed using t()
z <- matrix(1:6, nrow=3, ncol=2)
z
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
t(z) #transpose z
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
Sometimes, subsetting a row from a matrix may not work as expected
This seems natural, but when you expect to get a 1 by k matrix you instead get a vector of length k with no dimensions. This could ruin a computation
In this example, z[2,] is not a 1 by 2 matrix as we expected. It is returned as a vector
z <- matrix(1:6, nrow=3, ncol=2)
z
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
z[2,] #This creates a vector, not a matrix
## [1] 2 5
z[2,,drop=FALSE] #This is corrected with drop
## [,1] [,2]
## [1,] 2 5
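Checking dim() makes the difference explicit. A minimal sketch with illustrative variable names:

```r
z <- matrix(1:6, nrow = 3, ncol = 2)

dropped <- z[2, ]                 # plain vector
kept <- z[2, , drop = FALSE]      # still a matrix

dropped.dim <- dim(dropped)       # NULL: the dim attribute is gone
kept.dim <- dim(kept)             # 1 2
```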
The apply function family is one of the most famous features in R:
apply()
lapply()
sapply()
tapply()
etc
We will show how to use apply() to compute the mean of each column of a matrix
apply(X, MARGIN, FUN)
X: an array, including a matrix.
MARGIN: a vector giving the subscripts which the function will be applied over. For a matrix:
1 indicates rows
2 indicates columns
c(1, 2) indicates rows and columns.
Where X has named dimnames, it can be a character vector selecting dimension names.
FUN: the function to be applied. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.
z <- matrix(1:6, nrow=3, ncol=2)
z
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
apply(z,2,mean) #apply to matrix z, columns, the function mean
## [1] 2 5
f <- function(x)
{
mean(x)/2
}
z <- matrix(1:6, nrow=3, ncol=2)
z
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
apply(z,2,f) #apply to matrix z, columns, the function f
## [1] 1.0 2.5
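apply() can also work over rows, and extra arguments after FUN are passed on to the function. The sketch below uses a small matrix containing an NA, which is an example made up for this illustration.

```r
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3, ncol = 2)

row.sums <- apply(m, 1, sum)                    # MARGIN = 1: one sum per row
col.means <- apply(m, 2, mean, na.rm = TRUE)    # na.rm = TRUE is passed to mean()
```

For simple cases like these, rowSums() and colMeans() give the same results.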
Arrays are vectors too
Arrays are vectors with one or more additional dimensions
y<- c(1:20)
y
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
dim(y) <- c(2,5,2) #Can be made with dim(). 2 rows, 5 columns, 2 levels
y
## , , 1
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 11 13 15 17 19
## [2,] 12 14 16 18 20
y<- c(1:20)
y
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
y <- array(y,c(2,5,2)) #can be made with array(). 2 rows, 5 columns, 2 levels
y
## , , 1
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 11 13 15 17 19
## [2,] 12 14 16 18 20
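Arrays are indexed with one subscript per dimension, in the same way as matrices. A short sketch with illustrative names:

```r
arr <- array(1:20, c(2, 5, 2))

one.value <- arr[1, 3, 2]            # row 1, column 3, level 2
first.level <- arr[, , 1]            # the whole first level, a 2 x 5 matrix
level.sums <- apply(arr, 3, sum)     # MARGIN = 3: one sum per level
```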
rm(list=ls()) #clear all previous objects in the environment
stu.name <- c("John", "Kelly", "Arav","Mahi","List","Mary","Xing","Josh", "Kim", "Dev", "Linda")
midterm.score <- c(72,71,83,86,79,90,85,92,74,89,NA)
final.score <- c(85,81,94,72,80,79,90,92,70,91,NA)
#class of each vector
class(stu.name)
## [1] "character"
class(midterm.score)
## [1] "numeric"
#basic operations
mean(midterm.score) #will output NA because of NA
## [1] NA
max(0)
## [1] 0
min(0)
## [1] 0
#ignore the NA in a dataset
mean(midterm.score, na.rm=TRUE)
## [1] 82.1
#Another way of handling NAs
is.na(midterm.score)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
table(is.na(midterm.score)) #How many NA's are there in a dataset
##
## FALSE TRUE
## 10 1
keep.tf <- !is.na(midterm.score) #! is the logical NOT operator
keep.tf #all the values where there is not an NA are going to be true
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
mean(midterm.score[keep.tf]) #mean of the midterm score at the index where keep.tf is true
## [1] 82.1
#calculate the course grade by creating the matrix using midterm and final score
length(stu.name)
## [1] 11
all.score <- matrix(nrow=2,ncol=length(stu.name))
all.score[1,] <- midterm.score #insert midterm scores into the first row
all.score[2,] <- final.score #insert final scores into the second row
all.score
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] 72 71 83 86 79 90 85 92 74 89 NA
## [2,] 85 81 94 72 80 79 90 92 70 91 NA
#calculate the course grade
apply(all.score,2,mean)
## [1] 78.5 76.0 88.5 79.0 79.5 84.5 87.5 92.0 72.0 90.0 NA
course.grade <- apply(all.score,2,mean) #hold the values in course grade
names(course.grade) <- stu.name #assign the student names as names for the course grade values
course.grade["Arav"]
## Arav
## 88.5
course.grade[course.grade>90]
## Josh <NA>
## 92 NA
course.grade[course.grade<80]
## John Kelly Mahi List Kim <NA>
## 78.5 76.0 79.0 79.5 72.0 NA
#Remove the NA student from the table
na.omit(course.grade)
## John Kelly Arav Mahi List Mary Xing Josh Kim Dev
## 78.5 76.0 88.5 79.0 79.5 84.5 87.5 92.0 72.0 90.0
## attr(,"na.action")
## Linda
## 11
## attr(,"class")
## [1] "omit"
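To filter grades without the NA entry showing up in the result, one option is to wrap the condition in which() or to drop missing values first. The sketch below uses a small made-up data set (the names Ann, Bo, and Cy are hypothetical) instead of the class data above.

```r
# Hypothetical scores for three students; Cy has no scores recorded
midterm <- c(Ann = 72, Bo = 92, Cy = NA)
final <- c(Ann = 85, Bo = 92, Cy = NA)

grade <- (midterm + final) / 2        # vectorized average, names are kept

top <- grade[which(grade > 90)]       # which() skips the NA comparison
clean <- grade[!is.na(grade)]         # drops NA without adding extra attributes
```

Unlike na.omit(), logical indexing with !is.na() returns a plain named vector with no na.action attribute attached.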