R Objects

R is an object-oriented language.

It works with variables, data, functions, results, etc. They are stored in the computer's active memory as “objects”

Classes

Classes are kinds of objects

Ex: 1 and 100 are both numeric objects

R has five “atomic” classes:

character
numeric
integer
logical
complex (numbers with a real and an imaginary part, such as 1+2i)
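
A quick way to see the five classes is to ask class() about one value of each kind. This is a minimal sketch; the variable names are illustrative and not part of the original notes:

```r
#One value of each atomic class (variable names are illustrative)
chr <- "hello"   #character
num <- 3.14      #numeric (double precision)
int <- 7L        #integer, note the L suffix
lgl <- TRUE      #logical
cpx <- 2+3i      #complex: real part 2, imaginary part 3

class(chr)
class(num)
class(int)
class(lgl)
class(cpx)
```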

Ways to Store Objects

Vectors
Matrices
Arrays
Lists
Dataframes
Factors
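
As a rough sketch, each of these structures can be created with a base R function. All values and variable names below are made up for illustration:

```r
#One small example of each storage structure (all values are made up)
v <- c(1, 2, 3)                           #vector
m <- matrix(1:6, nrow=2)                  #matrix: 2 rows, 3 columns
a <- array(1:24, dim=c(2,3,4))            #array: 3 dimensions
l <- list(1, "a", TRUE)                   #list: elements may differ in class
d <- data.frame(id=1:2, name=c("a","b"))  #data frame: columns of equal length
f <- factor(c("low","high","low"))        #factor: categorical values

class(l)
class(d)
class(f)
```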

Objects have attributes

Attributes are “metadata” or data about the data

attributes(): access the attributes of R objects
names(): access the names of an object
dim(): access the dimensions of matrices and arrays
class(): access the class of an object
length(): access the length of an object
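
The accessor functions above can be tried on a small named vector and a small matrix. This is only a sketch, and the objects v and m are hypothetical:

```r
#A small named vector and a small matrix (values are illustrative)
v <- c(a=1, b=2, c=3)
m <- matrix(1:6, nrow=2)

names(v)       #the names: "a" "b" "c"
length(v)      #3
class(v)       #"numeric"
dim(m)         #2 3
attributes(m)  #a list holding the dim attribute
```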

Naming Conventions

There are different naming conventions:

Lower case separated by a dot:

my.dat
new.color

Lower case separated by underscore:

my_dat
new_color

Lower-Upper case:

myDat
newColor

Upper-Upper Case

MyDat
NewColor

Avoid using names identical to R functions

min()
max()
var()
sd()

Object names are case sensitive - BE CONSISTENT
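
A short sketch of both points, using made-up names. Names that differ only in case are separate objects, and a variable named after a function is legal but easy to misread:

```r
#Names differing only in case refer to different objects
myDat <- 1
mydat <- 2
myDat == mydat   #FALSE

#Reusing a function name for data is legal but confusing
max <- 10        #'max' is now also a number in the workspace
max(3, 7)        #R still finds the function when it is called
rm(max)          #remove the confusing variable again
```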

Vectors

In R, the most basic & fundamental type of object is the vector

Elements within one vector have to be the same “atomic” class

Numbers

By default, numbers are stored as numeric objects, that is, as double-precision real numbers

x <- 1 #Stores 1.00 in x

To store a number explicitly as an integer, add the suffix L

x <- 1L

Inf represents infinity and allows us to represent values like 1/0

Individual numbers are one-element vectors
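
The points above can be checked directly in the console. A minimal sketch:

```r
x <- 1
class(x)        #"numeric"
y <- 1L
class(y)        #"integer"

1/0             #Inf
-1/0            #-Inf
is.finite(1/0)  #FALSE

length(x)       #1, a single number is a one-element vector
```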

Character Strings

Like numbers, individual character strings are one-element vectors of character

R has many functions to manipulate strings. Many deal with putting strings together or taking them apart

u <- "abc" #A one-element vector
u <- paste("abc", "de", "f") #combines the strings, separated by a space by default
u

## [1] "abc de f"

Creating Vectors

To create a vector, use the c() function. “c” stands for concatenate

x <- c(0.5, 0.6, 0.7) #numeric
x <- c(TRUE, TRUE, FALSE) #logical
x <- c("0.5", "0.6", "0.7") #character
x <- vector("numeric", length=10) #creates a numeric vector of length 10, filled with zeros

Missing Values

is.na(10) #tests whether objects are or contain missing values, NA

## [1] FALSE

is.nan(10) #tests whether objects are or contain values that aren't numbers (Not a Number), NaN

## [1] FALSE

sqrt(-1)

## Warning in sqrt(-1): NaNs produced

## [1] NaN

Inf-Inf

## [1] NaN

NaN values are NA, but the converse is not true

x <- c(1,3,5,NA)
is.na(x)

## [1] FALSE FALSE FALSE  TRUE

x <- c(1,3,NA,NaN)
is.nan(x)

## [1] FALSE FALSE FALSE  TRUE
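
The asymmetry between NA and NaN can be seen by running both tests on the same vector. A small sketch with made-up values:

```r
x <- c(1, NA, NaN)
is.na(x)    #FALSE TRUE TRUE: both NA and NaN count as missing
is.nan(x)   #FALSE FALSE TRUE: only NaN is "not a number"
```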

Null

is.null() tests whether an object is NULL, a special R object. NULL is treated as non-existent, so unlike NA and NaN it has length 0

length(NA)

## [1] 1

length(NaN)

## [1] 1

length(NULL)

## [1] 0
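
A brief sketch of testing for NULL with is.null(), and of how NULL disappears when placed inside a vector:

```r
is.null(NULL)       #TRUE
is.null(NA)         #FALSE, NA is a value that occupies one slot

x <- c(1, NULL, 3)  #NULL contributes nothing to the vector
length(x)           #2
```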

Vector Names

The elements of a vector can optionally be given names with names()

x <- c(1,2,3)
names(x)

## NULL

names(x) <- c("KS", "MO", "IL")
names(x)

## [1] "KS" "MO" "IL"

#Names can be used to call the attached values
x["KS"]

## KS 
##  1

#Names can be removed with NULL
names(x) <- NULL
x

## [1] 1 2 3

Coercion

Elements of a vector must be from the same atomic class. When we mix object classes in a single vector, coercion occurs: R converts all the elements of the vector to the same class. This is called “implicit coercion.”

Implicit Coercion order: logical -> integer -> numeric -> complex -> character

You can convert against this order using the as.*() functions, such as as.numeric() or as.logical()

y <- c(1.7, "a") #will become the "character" class
y <- c(TRUE, 2) #will become the "numeric" class
y <- c("a", TRUE) #will become the "character" class
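
To confirm the resulting classes, class() can be called on each mixed vector. A minimal sketch:

```r
class(c(1.7, "a"))   #"character"
class(c(TRUE, 2))    #"numeric"
class(c("a", TRUE))  #"character"

c(TRUE, 2)           #TRUE is converted to the number 1
c("a", TRUE)         #TRUE is converted to the string "TRUE"
```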

You can also explicitly coerce objects. Strange things can happen when we force one basic data type into another:

x <- 0:6
class(x)

## [1] "integer"

as.numeric(x) #force x to be numeric

## [1] 0 1 2 3 4 5 6

as.logical(x) #force x to be logical (0 will be false)

## [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

as.character(x) #force x to be a character - the numbers will be stored as text

## [1] "0" "1" "2" "3" "4" "5" "6"

x <- c("a", "b")
as.numeric(x) #force x to be numeric, but these values are incompatible

## Warning: NAs introduced by coercion

## [1] NA NA

Vector Operations

Arithmetic and Logical Operations

An operator performs a basic operation. For example, the addition operator is “+”

Numbers are “one-element” vectors

2+3 #2 and 3 are each one-element vectors

## [1] 5

a <- 2 #This is a vector
b <- 3 #This is also a vector
a+b

## [1] 5

Adding, multiplying, or dividing two vectors gives an element-wise result

c(1,2,4) + c(5,0,-1) #1+5, 2+0, 4+(-1)

## [1] 6 2 3

c(1,2,4) * c(5,0,-1) #1*5, 2*0, 4*(-1)

## [1]  5  0 -4

c(1,2,4) / c(5,0,-1)

## [1]  0.2  Inf -4.0

Vector Recycling

When applying an operation to two vectors that requires them to be the same length, R recycles or repeats the shorter vector, until it is long enough to match the longer one

c(1,2,3) + 1 #will add 1 to 1, 2, and 3

## [1] 2 3 4

c(1,2,4) + c(6, 0, 9, 20, 22) #will add 1+6, 2+0, 4+9, 1+20, 2+22

## Warning in c(1, 2, 4) + c(6, 0, 9, 20, 22): longer object length is not a
## multiple of shorter object length

## [1]  7  2 13 21 24

Vector Indexing

Indexing Vectors: “giving an address,” or forming a sub-vector by picking elements of a given vector for specific indices

y <- c(1.2, 3.9, 0.4, 0.12)

y[c(1,3)] #pull the values at index 1 and index 3

## [1] 1.2 0.4

v <- 3:4
y[v] #pull the values at index 3 and index 4

## [1] 0.40 0.12

y[c(1,3,1)] #pull the values at index 1, index 3, and index 1

## [1] 1.2 0.4 1.2

y[-1] #EXCLUDE the value at index 1

## [1] 3.90 0.40 0.12

Create Vectors

Vectors can be created using the colon operator

1:3 #create a vector containing 1:3

## [1] 1 2 3

i <- 2

1:i-1 #The colon is evaluated before the minus, so this creates the vector 1:2 and subtracts 1 from each element

## [1] 0 1

1:(i-1) #1:(2-1), or 1:1, vector 1

## [1] 1

Or using seq()

This function is very important because it lets you set the start, the end, and the step size of a sequence

seq(from=12, to=30, by=3)

## [1] 12 15 18 21 24 27 30

seq(from=12, to=13, by=0.1)

##  [1] 12.0 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9 13.0

rep() can be used to repeat a value or a vector to build longer vectors

rep(8,4)

## [1] 8 8 8 8

rep(c(5,12,13),3)

## [1]  5 12 13  5 12 13  5 12 13

rep(c(5,12,13), each=2)

## [1]  5  5 12 12 13 13

Using all() and any()

For a logical vector, all() and any() report whether all or any of its elements are TRUE

x <- 1:10 #x is 1 - 10
x > 8 #x is greater than 8

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

any(x>8) #is x ever greater than 8?

## [1] TRUE

all(x>8) #is x always greater than 8?

## [1] FALSE

all(x>0) #is x always greater than 0?

## [1] TRUE

Vectorized Operation

Suppose we have a function f()

We want to apply it to all elements of a vector x

In many cases, we can accomplish this by simply calling f() on x

This process is called a vectorized operation - it’s simple and fast

u <- c(5,2,8)
v <- c(1,3,9)
u>v #will apply to each value in the vector

## [1]  TRUE FALSE FALSE

u[1]>v[1] #Is the first element of u greater than the first element of v?

## [1] TRUE

v[3]<u[2] #Is the third element of v smaller than the second element of u?

## [1] FALSE

Some functions are also vectorized

sqrt(1:9)

## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
## [9] 3.000000
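
To make the f() discussion concrete, here is a sketch with a hypothetical f built only from vectorized operations. The explicit loop is included just for comparison and gives the same result:

```r
#f is a hypothetical function built only from vectorized operations
f <- function(x) {
  x^2 + 1
}

x <- c(5, 2, 8)
f(x)             #one call handles every element: 26 5 65

#The same result with an explicit loop, for comparison
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- f(x[i])
}
out
```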

Filtering

Filtering allows us to extract the elements of a vector that satisfy certain conditions

In the following examples, we generate a Boolean vector first, then use the Boolean vector to filter elements in the original vector

z <- c(5,2,-3,8)

w<-z[z*z>8] #vector z where z*z is greater than 8
w

## [1]  5 -3  8

x <- 1:5
x[x>3] <- 0 #assign 0 to all values where x is greater than 3
x

## [1] 1 2 3 0 0

Filtering with subset() and which()

subset() returns the values that satisfy the condition.

which() returns the indices of the values that satisfy the condition

x <- c(1:5, NA, 12)
x

## [1]  1  2  3  4  5 NA 12

x[x>5] #Elements of x where x is greater than 5

## [1] NA 12

subset(x, x>5) #Notice how the NA value is handled differently when using subset

## [1] 12

which(x>3) #Which returns the INDEX

## [1] 4 5 7

Testing Equality

Test if two vectors are equal using “==”

x <- 1:3
y <- c(1,3,4)
x==y #checks each vector value independently

## [1]  TRUE FALSE FALSE

all(x==y)

## [1] FALSE

Matrices and Arrays

Matrices

Matrices are vectors with an additional dimension attribute describing their size

An R matrix corresponds to a mathematical matrix

x <- matrix(nrow = 3, ncol = 2)
dim(x) #dimension of x

## [1] 3 2

attributes(x) #what do I know about x?

## $dim
## [1] 3 2

class(x) #what is the class of x?

## [1] "matrix" "array"

x #print the matrix (currently filled with NA)

##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA

Create Matrices with dim()

Matrices can be created from vectors by adding a dimension attribute

Matrices are constructed column-wise, starting in the upper left corner and running down the columns

x <- 1:10
x

##  [1]  1  2  3  4  5  6  7  8  9 10

dim(x) <- c(2,5) #Assign x the dimensions 2x5
x

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

m.values.2 <- seq(5,45, by=5) #Values can also be assigned with sequence
m.values.2

## [1]  5 10 15 20 25 30 35 40 45

dim(m.values.2) <- c(3,3)
m.values.2

##      [,1] [,2] [,3]
## [1,]    5   20   35
## [2,]   10   25   40
## [3,]   15   30   45

dim(x) #check the dimensions of a matrix

## [1] 2 5

nrow(x) #check the number of rows in a matrix

## [1] 2

ncol(x) #check the number of columns in a matrix

## [1] 5
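
The column-wise fill order also applies to matrix(). As a sketch, the byrow argument of matrix() switches to filling across the rows instead; the names m.col and m.row are illustrative:

```r
m.col <- matrix(1:6, nrow=2)              #default: fill down the columns
m.row <- matrix(1:6, nrow=2, byrow=TRUE)  #fill across the rows instead

m.col[1,]  #first row is 1 3 5
m.row[1,]  #first row is 1 2 3
```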

Create Matrices with cbind() and rbind()

Matrices can be created by column-binding (cbind()) or row-binding (rbind()):

x <- 1:3
y <- 10:12
cbind(x,y) #bind x to column 1 and y to column 2

##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
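
Row-binding works the same way with the same two vectors. A minimal sketch, where r is an illustrative name:

```r
x <- 1:3
y <- 10:12
r <- rbind(x, y)  #x becomes row 1 and y becomes row 2
r
dim(r)            #2 3
```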

Index Matrices

Matrices are indexed using double subscripting

You can extract a submatrix from a matrix

x <- 1:10
dim(x) <- c(2,5)
x

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10

x[1,2] #Pull value at row 1, column 2

## [1] 3

x[1:2, 3:4] #Pull rows 1-2 and columns 3-4

##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8

x.1 <- x[1:2, 3:4] #These values can be saved as a new matrix

Performing Linear Algebra

Common Linear Algebra Operations:

Matrix multiplication
Matrix addition
Etc

y <- matrix(1:4, nrow=2)
y

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

y*y #element-wise multiplication, NOT matrix multiplication - don't use this for a matrix product

##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16

y%*%y #matrix multiplication - use this :)

##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22

y+y #matrix addition

##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8

A matrix can be transposed by using t()

z <- matrix(1:6, nrow=3, ncol=2)
z

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

t(z) #transpose z

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Matrix Subsetting

Sometimes, subsetting a row from a matrix may not work as expected

Selecting a single row seems natural, but when you expect a 1 by k matrix you instead get a vector of length k with no dimensions. This could ruin a later computation

In this example, z[2,] is not a 1 by 2 matrix as we expected. It is returned as a vector

z <- matrix(1:6, nrow=3, ncol=2)
z

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

z[2,] #This creates a vector, not a matrix

## [1] 2 5

z[2,,drop=FALSE] #drop=FALSE keeps the result as a 1 by 2 matrix

##      [,1] [,2]
## [1,]    2    5

Use apply()

The apply function family is one of the most famous features in R:

apply()
lapply()
sapply()
tapply()
etc

We will show how to use apply() to compute the mean of each column of a matrix

apply(X, MARGIN, FUN)

X: an array, including a matrix.

MARGIN: a vector giving the subscripts which the function will be applied over. For a matrix:

1 indicates rows
2 indicates columns
c(1, 2) indicates rows and columns.
Where X has named dimnames, it can be a character vector selecting dimension names.

FUN: the function to be applied. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.

z <- matrix(1:6, nrow=3, ncol=2)
z

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

apply(z,2,mean) #apply to matrix z, columns, the function mean

## [1] 2 5
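
For comparison, setting MARGIN to 1 applies the function to each row instead of each column. A minimal sketch on the same matrix:

```r
z <- matrix(1:6, nrow=3, ncol=2)
apply(z, 1, mean)  #row means: 2.5 3.5 4.5
apply(z, 1, sum)   #row sums: 5 7 9
```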

apply() with user-defined functions

f <- function(x)
{
  mean(x)/2
}
z <- matrix(1:6, nrow=3, ncol=2)
z

##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

apply(z,2,f) #apply to matrix z, columns, the function f

## [1] 1.0 2.5
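
The other members of the apply family follow a similar pattern. The sketch below uses made-up data (scores, values, groups) to show lapply(), sapply(), and tapply() in their simplest form:

```r
#Made-up data for illustration
scores <- list(a=c(1,2,3), b=c(4,5,6))

lapply(scores, mean)  #returns a list: $a is 2, $b is 5
sapply(scores, mean)  #simplifies to a named vector: a is 2, b is 5

#tapply() applies a function within groups
values <- c(1, 2, 3, 4)
groups <- c("x", "y", "x", "y")
tapply(values, groups, sum)  #x is 1+3 = 4, y is 2+4 = 6
```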

Arrays

Arrays are vectors too

Arrays have one or more additional dimensions beyond the two of a matrix

y<- c(1:20)
y

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

dim(y) <- c(2,5,2) #Can be made with dim(). 2 rows, 5 columns, 2 levels
y

## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   11   13   15   17   19
## [2,]   12   14   16   18   20

y<- c(1:20)
y

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

y <- array(y,c(2,5,2)) #can be made with array(). 2 rows, 5 columns, 2 levels
y

## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   11   13   15   17   19
## [2,]   12   14   16   18   20
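
Arrays are indexed with one subscript per dimension, in the same style as matrices. A minimal sketch on the same 2 by 5 by 2 array:

```r
y <- array(1:20, c(2,5,2))
dim(y)     #2 5 2
y[1,2,2]   #row 1, column 2, level 2: 13
y[,,1]     #the whole first level, a 2 by 5 matrix
```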

Demonstration Code

rm(list=ls()) #clear all previous objects in the environment

stu.name <- c("John", "Kelly", "Arav","Mahi","List","Mary","Xing","Josh", "Kim", "Dev", "Linda")

midterm.score <- c(72,71,83,86,79,90,85,92,74,89,NA)

final.score <- c(85,81,94,72,80,79,90,92,70,91,NA)

#class of each vector
class(stu.name)

## [1] "character"

class(midterm.score)

## [1] "numeric"

#basic operations
mean(midterm.score) #will output NA because the vector contains an NA

## [1] NA

max(0)

## [1] 0

min(0)

## [1] 0

#ignore the NA in a dataset
mean(midterm.score, na.rm=TRUE)

## [1] 82.1

#Another way of handling NAs
is.na(midterm.score)

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

table(is.na(midterm.score)) #How many NA's are there in a dataset

## 
## FALSE  TRUE 
##    10     1

keep.tf <- !is.na(midterm.score) #! is not
keep.tf #all the values where there is not an NA are going to be true

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

mean(midterm.score[keep.tf]) #mean of the midterm score at the index where keep.tf is true

## [1] 82.1

#calculate the course grade by creating the matrix using midterm and final score
length(stu.name)

## [1] 11

all.score <- matrix(nrow=2,ncol=length(stu.name))
all.score[1,] <- midterm.score #insert midterm scores into the first row
all.score[2,] <- final.score #insert final scores into the second row
all.score

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,]   72   71   83   86   79   90   85   92   74    89    NA
## [2,]   85   81   94   72   80   79   90   92   70    91    NA

#calculate the course grade
apply(all.score,2,mean)

##  [1] 78.5 76.0 88.5 79.0 79.5 84.5 87.5 92.0 72.0 90.0   NA

course.grade <- apply(all.score,2,mean) #hold the values in course grade

names(course.grade) <- stu.name #assign the student names as names for the course grade values

course.grade["Arav"]

## Arav 
## 88.5

course.grade[course.grade>90]

## Josh <NA> 
##   92   NA

course.grade[course.grade<80]

##  John Kelly  Mahi  List   Kim  <NA> 
##  78.5  76.0  79.0  79.5  72.0    NA

#Remove the NA student from the table
na.omit(course.grade)

##  John Kelly  Arav  Mahi  List  Mary  Xing  Josh   Kim   Dev 
##  78.5  76.0  88.5  79.0  79.5  84.5  87.5  92.0  72.0  90.0 
## attr(,"na.action")
## Linda 
##    11 
## attr(,"class")
## [1] "omit"

Module 2: Materials

Kayla Foht

2025-06-03
