R Objects

R is an object-oriented language.

Variables, data, functions, results, etc. are stored in active computer memory as “objects”


Classes

Classes are kinds of objects

R has five “atomic” classes:

  1. character
  2. numeric
  3. integer
  4. logical
  5. complex (complex numbers with a real and an imaginary part, e.g. 1+2i)
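
As a quick check of the five classes listed above, class() can be called on one literal of each kind. The literal values are arbitrary and chosen only for illustration:

```r
class("abc")  #character
## [1] "character"
class(1.5)    #numeric
## [1] "numeric"
class(2L)     #integer (note the L suffix)
## [1] "integer"
class(TRUE)   #logical
## [1] "logical"
class(1+2i)   #complex
## [1] "complex"
```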

Ways to Store Objects

Objects have attributes

Attributes are “metadata” or data about the data
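
A minimal sketch of what attributes look like in practice. The object names x and m are placeholders for this illustration. Names and dimensions are two common attributes, and attributes() lists whatever is attached to an object:

```r
x <- c(KS = 1, MO = 2) #a vector with a names attribute
attributes(x)
## $names
## [1] "KS" "MO"
m <- matrix(1:4, nrow = 2) #a matrix carries a dim attribute
attributes(m)
## $dim
## [1] 2 2
```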


Naming Conventions

There are different naming conventions:

Lower case separated by a dot:

Lower case separated by underscore:

Lower-Upper case (lowerCamelCase):

Upper-Upper case (UpperCamelCase):

Avoid using names identical to R functions

Object names are case sensitive - BE CONSISTENT
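
The conventions above might look like this in code. All object names here are hypothetical and exist only to illustrate each style:

```r
midterm.score <- 72 #lower case separated by a dot
midterm_score <- 72 #lower case separated by an underscore
midtermScore  <- 72 #Lower-Upper case
MidtermScore  <- 72 #Upper-Upper case

#Names are case sensitive: score and Score are two different objects
score <- 1
Score <- 2
score == Score
## [1] FALSE
```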


Vectors

In R, the most basic & fundamental type of object is the vector

Elements within one vector have to be the same “atomic” class


Numbers

By default, numbers are stored as numeric objects (double-precision real numbers)

x <- 1 #Stores 1 as a double-precision numeric

To explicitly store an integer, add the suffix L

x <- 1L

Inf represents infinity and allows us to represent values like 1/0

Individual numbers are one-element vectors
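
The points above can be verified directly. This is a small sketch, and the variable names are arbitrary:

```r
x <- 1
y <- 1L
class(x) #no suffix: numeric (double precision)
## [1] "numeric"
class(y) #L suffix: integer
## [1] "integer"
1/0 #division by zero gives Inf rather than an error
## [1] Inf
length(x) #a single number is a vector of length 1
## [1] 1
```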


Character Strings

Like numbers, individual character strings are one-element vectors of character

R has many functions to manipulate strings. Many deal with putting strings together or taking them apart

u <- "abc" #A one-element vector
u <- paste("abc", "de", "f") #combines the strings, separated by a space by default
u
## [1] "abc de f"

Creating Vectors

To create a vector, use the c() function. “c” stands for concatenate

x <- c(0.5, 0.6, 0.7) #numeric
x <- c(TRUE, TRUE, FALSE) #logical
x <- c("0.5", "0.6", "0.7") #character
x <- vector("numeric", length=10) #creates a numeric vector of ten 0s; the assignment itself prints nothing

Missing Values

is.na(10) #tests whether objects are or contain missing values, NA
## [1] FALSE
is.nan(10) #tests whether objects are or contain values that aren't numbers (Not a Number), NaN
## [1] FALSE
sqrt(-1)
## Warning in sqrt(-1): NaNs produced
## [1] NaN
Inf-Inf
## [1] NaN


NaN values also count as NA for is.na(), but the converse is not true: is.nan() returns FALSE for NA

x <- c(1,3,5,NA)
is.na(x)
## [1] FALSE FALSE FALSE  TRUE
x <- c(1,3,NA,NaN)
is.nan(x)
## [1] FALSE FALSE FALSE  TRUE
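
To see the asymmetry side by side, both tests can be run on the same vector. This sketch reuses the values from the example above:

```r
x <- c(1, 3, NA, NaN)
is.na(x) #TRUE for both NA and NaN
## [1] FALSE FALSE  TRUE  TRUE
is.nan(x) #TRUE only for NaN
## [1] FALSE FALSE FALSE  TRUE
```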


Null

is.null() tests whether an object is NULL, a special R object. NULL is counted as non-existent, so its length is 0

length(NA)
## [1] 1
length(NaN)
## [1] 1
length(NULL)
## [1] 0
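
A short sketch of is.null() in use. The vector x is just an example object:

```r
x <- c(1, 2, 3)
is.null(names(x)) #no names have been assigned, so names(x) is NULL
## [1] TRUE
is.null(NA) #NA is a missing value, not NULL
## [1] FALSE
c(1, NULL, 2) #NULL contributes nothing when combined
## [1] 1 2
```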

Vector Names

The elements of a vector can optionally be given names with names()

x <- c(1,2,3)
names(x)
## NULL
names(x) <- c("KS", "MO", "IL")
names(x)
## [1] "KS" "MO" "IL"
#Names can be used to call the attached values
x["KS"] 
## KS 
##  1
#Names can be removed with NULL
names(x) <- NULL
x
## [1] 1 2 3

Coercion

Elements of a vector must be from the same atomic class. When we mix object classes in a single vector, coercion occurs: R converts all the elements of the vector to the same type. This is called “implicit coercion.”

Implicit Coercion order: logical -> integer -> numeric -> complex -> character

You can force a conversion against this order using the “as.” functions, such as as.numeric() or as.logical()

y <- c(1.7, "a") #will become the "character" class
y <- c(TRUE, 2) #will become the "numeric" class
y <- c("a", TRUE) #will become the "character" class
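
The resulting classes can be confirmed with class(). This sketch also adds one extra case, mixing a logical with an integer, to show the first step of the coercion order:

```r
class(c(1.7, "a")) #numeric + character -> character
## [1] "character"
class(c(TRUE, 2)) #logical + numeric -> numeric
## [1] "numeric"
c(TRUE, 2) #TRUE becomes 1
## [1] 1 2
class(c(TRUE, 2L)) #logical + integer -> integer
## [1] "integer"
c("a", TRUE) #TRUE becomes the string "TRUE"
## [1] "a"    "TRUE"
```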

You can also explicitly coerce objects. Strange things can happen when we force one basic data type into another:

x <- 0:6
class(x)
## [1] "integer"
as.numeric(x) #force x to be numeric
## [1] 0 1 2 3 4 5 6
as.logical(x) #force x to be logical (0 will be false)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
as.character(x) #force x to be a character - the numbers will be stored as text
## [1] "0" "1" "2" "3" "4" "5" "6"
x <- c("a", "b")
as.numeric(x) #force x to be numeric, but these values are incompatible
## Warning: NAs introduced by coercion
## [1] NA NA

Vector Operations


Arithmetic and Logical Operations

An operator performs a basic operation on its operands, for example the addition operator “+”

Numbers are “one-element” vectors

2+3 #Adds two one-element vectors
## [1] 5
a <- 2 #This is a vector
b <- 3 #This is also a vector
a+b
## [1] 5

Adding, multiplying, or dividing two vectors gives an element-wise result

c(1,2,4) + c(5,0,-1) #1+5, 2+0, 4+(-1)
## [1] 6 2 3
c(1,2,4) * c(5,0,-1) #1*5, 2*0, 4*(-1)
## [1]  5  0 -4
c(1,2,4) / c(5,0,-1)
## [1]  0.2  Inf -4.0

Vector Recycling

When applying an operation to two vectors that requires them to be the same length, R recycles or repeats the shorter vector, until it is long enough to match the longer one

c(1,2,3) + 1 #will add 1 to 1, 2, and 3
## [1] 2 3 4
c(1,2,4) + c(6, 0, 9, 20, 22) #will add 1+6, 2+0, 4+9, 1+20, 2+22
## Warning in c(1, 2, 4) + c(6, 0, 9, 20, 22): longer object length is not a
## multiple of shorter object length
## [1]  7  2 13 21 24
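
For contrast, when the longer length is an exact multiple of the shorter one, recycling happens silently with no warning. The values below are arbitrary:

```r
c(1, 2, 3, 4) + c(10, 20) #will add 1+10, 2+20, 3+10, 4+20
## [1] 11 22 13 24
c(1, 2, 3, 4) * 2 #the single value 2 is recycled four times
## [1] 2 4 6 8
```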

Vector Indexing

Indexing a vector means “giving an address”: forming a sub-vector by picking the elements at specific indices

y <- c(1.2, 3.9, 0.4, 0.12)

y[c(1,3)] #pull the values at index 1 and index 3
## [1] 1.2 0.4
v <- 3:4
y[v] #pull the values at index 3 and index 4
## [1] 0.40 0.12
y[c(1,3,1)] #pull the values at index 1, index 3, and index 1
## [1] 1.2 0.4 1.2
y[-1] #EXCLUDE the value at index 1
## [1] 3.90 0.40 0.12
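
A few more indexing forms, sketched on the same vector y: excluding several positions at once, indexing with a logical vector, and picking the last element with length():

```r
y <- c(1.2, 3.9, 0.4, 0.12)
y[-c(1, 3)] #EXCLUDE the values at index 1 and index 3
## [1] 3.90 0.12
y[c(TRUE, FALSE, TRUE, FALSE)] #keep the positions marked TRUE
## [1] 1.2 0.4
y[length(y)] #pull the last value
## [1] 0.12
```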

Create Vectors

Vectors can be created using the colon operator

1:3 #create the vector 1, 2, 3
## [1] 1 2 3
i <- 2

1:i-1 #The colon is evaluated before the minus: this creates the vector 1:2, then subtracts 1 from each element
## [1] 0 1
1:(i-1) #1:(2-1), or 1:1, the one-element vector 1
## [1] 1


Or using seq()

This is very important

seq(from=12, to=30, by=3)
## [1] 12 15 18 21 24 27 30
seq(from=12, to=13, by=0.1)
##  [1] 12.0 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9 13.0
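
Besides a step size, seq() can also take the desired number of elements through its length.out argument, or count downwards with a negative by. A brief sketch with arbitrary endpoints:

```r
seq(from=0, to=1, length.out=5) #five evenly spaced values
## [1] 0.00 0.25 0.50 0.75 1.00
seq(from=10, to=1, by=-3) #a decreasing sequence
## [1] 10  7  4  1
```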


rep() repeats a value, or a whole vector, to build a longer vector

rep(8,4)
## [1] 8 8 8 8
rep(c(5,12,13),3)
## [1]  5 12 13  5 12 13  5 12 13
rep(c(5,12,13), each=2)
## [1]  5  5 12 12 13 13

Using all() and any()

For a logical vector, any() and all() report whether any or all of its elements are TRUE

x <- 1:10 #x is 1 - 10
x > 8 #x is greater than 8
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
any(x>8) #is x ever greater than 8?
## [1] TRUE
all(x>8) #is x always greater than 8?
## [1] FALSE
all(x>0) #is x always greater than 0?
## [1] TRUE

Vectorized Operation

Suppose we have a function f()

We want to apply it to every element of a vector x

In many cases, we can accomplish this by simply calling f() on x

This is called a vectorized operation - it’s simple and fast

u <- c(5,2,8)
v <- c(1,3,9)
u>v #will apply to each value in the vector
## [1]  TRUE FALSE FALSE
u[1]>v[1] #Is the first element of u greater than the first element of v?
## [1] TRUE
v[3]<u[2] #Is the third element of v smaller than the second element of u?
## [1] FALSE


Some functions are also vectorized

sqrt(1:9)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
## [9] 3.000000
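
The same holds for many user-defined functions. In this sketch, f() is a hypothetical function built only from vectorized operators, so calling it on a vector applies it to every element:

```r
f <- function(x)
{
  x^2 + 1 #^ and + are both vectorized
}
x <- c(1, 2, 3)
f(x) #one call handles all three elements
## [1]  2  5 10
```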

Filtering

Filtering allows us to extract the elements of a vector that satisfy certain conditions

In the following examples, we generate a Boolean vector first, then use the Boolean vector to filter elements in the original vector

z <- c(5,2,-3,8)

w<-z[z*z>8] #vector z where z*z is greater than 8
w
## [1]  5 -3  8
x <- 1:5
x[x>3] <- 0 #assign 0 to all values where x is greater than 3
x
## [1] 1 2 3 0 0

Filtering with subset() and which()

subset() returns the values that satisfy the condition and drops NA values.

which() returns the indices of the values that satisfy the condition

x <- c(1:5, NA, 12)
x
## [1]  1  2  3  4  5 NA 12
x[x>5] #Elements of x where x is greater than 5
## [1] NA 12
subset(x, x>5) #Notice how the NA value is handled differently when using subset
## [1] 12
which(x>3) #Which returns the INDEX
## [1] 4 5 7
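
Because which() skips NA, its result can be used as an index to filter without picking up the missing value. A sketch on the same x:

```r
x <- c(1:5, NA, 12)
which(x>5) #only index 7 qualifies; the NA at index 6 is skipped
## [1] 7
x[which(x>5)] #same result as subset(x, x>5)
## [1] 12
```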

Testing Equality

Test if two vectors are equal using “==”

x <- 1:3
y <- c(1,3,4)
x==y #checks each vector value independently
## [1]  TRUE FALSE FALSE
all(x==y)
## [1] FALSE
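
Note that “==” compares values only. As a sketch of the difference, identical() also compares the type, so an integer vector and a numeric vector holding the same values are equal but not identical:

```r
x <- 1:3 #integer
y <- c(1,2,3) #numeric
all(x==y) #the values match
## [1] TRUE
identical(x,y) #the types differ
## [1] FALSE
```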

Matrices and Arrays


Matrices

Matrices are vectors with an additional dimension attribute describing their size

An R matrix corresponds to a mathematical matrix

x <- matrix(nrow = 3, ncol = 2)
dim(x) #dimension of x
## [1] 3 2
attributes(x) #what do I know about x?
## $dim
## [1] 3 2
class(x) #what is the class of x?
## [1] "matrix" "array"
x #print the matrix (no data was supplied, so it is filled with NA)
##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA

Create Matrices with dim()

Matrices can be created with vectors by adding a dimension attribute

Matrices are constructed column-wise, starting in the upper left corner and running down the columns

x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
dim(x) <- c(2,5) #Assign x the dimensions 2x5
x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
m.values.2 <- seq(5,45, by=5) #Values can also be assigned with sequence
m.values.2
## [1]  5 10 15 20 25 30 35 40 45
dim(m.values.2) <- c(3,3)
m.values.2
##      [,1] [,2] [,3]
## [1,]    5   20   35
## [2,]   10   25   40
## [3,]   15   30   45
dim(x) #check the dimensions of a matrix
## [1] 2 5
nrow(x) #check the number of rows in a matrix
## [1] 2
ncol(x) #check the number of columns in a matrix
## [1] 5

Create Matrices with cbind() and rbind()

Matrices can be created by column-binding (cbind()) or row-binding (rbind()):

x <- 1:3
y <- 10:12
cbind(x,y) #bind x to column 1 and y to column 2
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
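
For completeness, here is the row-binding counterpart on the same two vectors. Each vector becomes a row instead of a column:

```r
x <- 1:3
y <- 10:12
rbind(x,y) #bind x to row 1 and y to row 2
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
```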

Index Matrices

Matrices are indexed using double subscripting

You can extract a submatrix from a matrix

x <- 1:10
dim(x) <- c(2,5)
x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
x[1,2] #Pull value at row 1, column 2
## [1] 3
x[1:2, 3:4] #Pull rows 1-2 and columns 3-4
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
x.1 <- x[1:2, 3:4] #These values can be saved as a new matrix

Performing Linear Algebra

Common linear algebra operations:

y <- matrix(1:4, nrow=2)
y
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
y*y #element-wise product, NOT matrix multiplication
##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16
y%*%y #matrix multiplication: use this for linear algebra
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
y+y #matrix addition
##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8
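
Another common operation is matrix inversion with solve(). This is only a sketch on the same 2 by 2 matrix. The name y.inv is a placeholder, and all.equal() is used for the comparison because floating-point results are rarely exact:

```r
y <- matrix(1:4, nrow=2)
y.inv <- solve(y) #inverse of y
all.equal(y %*% y.inv, diag(2)) #y times its inverse is the identity matrix
## [1] TRUE
```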


A matrix can be transposed by using t()

z <- matrix(1:6, nrow=3, ncol=2)
z
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
t(z) #transpose z
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Matrix Subsetting

Sometimes, subsetting a row from a matrix may not work as expected

Extracting a single row seems natural, but when you expect a 1 by k matrix, R returns a vector of length k with no dimension attribute. This could ruin a later matrix computation

In this example, z[2,] is not a 1 by 2 matrix as we expected. It is displayed as a vector

z <- matrix(1:6, nrow=3, ncol=2)
z
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
z[2,] #This creates a vector, not a matrix
## [1] 2 5
z[2,,drop=FALSE] #This is corrected with drop
##      [,1] [,2]
## [1,]    2    5
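
The difference can be confirmed by inspecting the dimension attribute of each result, sketched here on the same z:

```r
z <- matrix(1:6, nrow=3, ncol=2)
dim(z[2,]) #a plain vector has no dimensions
## NULL
dim(z[2,,drop=FALSE]) #still a 1 by 2 matrix
## [1] 1 2
is.matrix(z[2,])
## [1] FALSE
```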

Use apply()

The apply function family is one of the most famous features in R.

We will show how to use apply() to compute the mean of each column of a matrix


apply(X, MARGIN, FUN)

X: an array, including a matrix.

MARGIN: a vector giving the subscripts which the function will be applied over. For a matrix, 1 indicates rows, 2 indicates columns, and c(1, 2) indicates rows and columns.

FUN: the function to be applied. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.

z <- matrix(1:6, nrow=3, ncol=2)
z
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
apply(z,2,mean) #apply to matrix z, columns, the function mean
## [1] 2 5
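
Changing MARGIN to 1 applies the function over rows instead. A sketch on the same matrix z:

```r
z <- matrix(1:6, nrow=3, ncol=2)
apply(z,1,mean) #apply to matrix z, rows, the function mean
## [1] 2.5 3.5 4.5
apply(z,1,sum) #apply to matrix z, rows, the function sum
## [1] 5 7 9
```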

apply() with user-defined functions

f <- function(x)
{
  mean(x)/2
}
z <- matrix(1:6, nrow=3, ncol=2)
z
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
apply(z,2,f) #apply to matrix z, columns, the function f
## [1] 1.0 2.5

Arrays

Arrays are vectors too

Arrays extend matrices with one or more additional dimensions

y<- c(1:20)
y
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
dim(y) <- c(2,5,2) #Can be made with dim(). 2 rows, 5 columns, 2 levels
y
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   11   13   15   17   19
## [2,]   12   14   16   18   20
y<- c(1:20)
y
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
y <- array(y,c(2,5,2)) #can be made with array(). 2 rows, 5 columns, 2 levels
y
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   11   13   15   17   19
## [2,]   12   14   16   18   20
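
Arrays are indexed with one subscript per dimension. A short sketch on the same 2 by 5 by 2 array:

```r
y <- array(1:20, c(2,5,2))
dim(y)
## [1] 2 5 2
y[1,2,2] #row 1, column 2, level 2
## [1] 13
y[2,,2] #all of row 2 in level 2
## [1] 12 14 16 18 20
```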

Demonstration Code

rm(list=ls()) #clear all previous objects in the environment
stu.name <- c("John", "Kelly", "Arav","Mahi","List","Mary","Xing","Josh", "Kim", "Dev", "Linda")

midterm.score <- c(72,71,83,86,79,90,85,92,74,89,NA)

final.score <- c(85,81,94,72,80,79,90,92,70,91,NA)

#class of each vector
class(stu.name)
## [1] "character"
class(midterm.score)
## [1] "numeric"
#basic operations
mean(midterm.score) #will output NA because of NA
## [1] NA
max(0)
## [1] 0
min(0)
## [1] 0
#ignore the NA in a dataset
mean(midterm.score, na.rm=TRUE)
## [1] 82.1
#Another way of handling NAs
is.na(midterm.score)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
table(is.na(midterm.score)) #How many NA's are there in a dataset
## 
## FALSE  TRUE 
##    10     1
keep.tf <- !is.na(midterm.score) #! is not
keep.tf #all the values where there is not an NA are going to be true
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
mean(midterm.score[keep.tf]) #mean of the midterm score at the index where keep.tf is true
## [1] 82.1
#calculate the course grade by creating the matrix using midterm and final score
length(stu.name)
## [1] 11
all.score <- matrix(nrow=2,ncol=length(stu.name))
all.score[1,] <- midterm.score #insert midterm scores into the first row
all.score[2,] <- final.score #insert final scores into the second row
all.score
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,]   72   71   83   86   79   90   85   92   74    89    NA
## [2,]   85   81   94   72   80   79   90   92   70    91    NA
#calculate the course grade
apply(all.score,2,mean)
##  [1] 78.5 76.0 88.5 79.0 79.5 84.5 87.5 92.0 72.0 90.0   NA
course.grade <- apply(all.score,2,mean) #hold the values in course grade

names(course.grade) <- stu.name #assign the student names as names for the course grade values

course.grade["Arav"]
## Arav 
## 88.5
course.grade[course.grade>90]
## Josh <NA> 
##   92   NA
course.grade[course.grade<80]
##  John Kelly  Mahi  List   Kim  <NA> 
##  78.5  76.0  79.0  79.5  72.0    NA
#Remove the NA student from the table
na.omit(course.grade)
##  John Kelly  Arav  Mahi  List  Mary  Xing  Josh   Kim   Dev 
##  78.5  76.0  88.5  79.0  79.5  84.5  87.5  92.0  72.0  90.0 
## attr(,"na.action")
## Linda 
##    11 
## attr(,"class")
## [1] "omit"
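
As an alternative to na.omit(), the NA can be kept out of a filtered result by wrapping the condition in which(). This sketch uses a shortened, hand-typed copy of the course grades (three students only), so the block runs on its own:

```r
course.grade <- c(John=78.5, Josh=92, Linda=NA) #shortened copy of the grades above
course.grade[course.grade>90] #the NA student is carried along
## Josh <NA> 
##   92   NA
course.grade[which(course.grade>90)] #which() skips the NA
## Josh 
##   92
```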