R OBJECTS: Elements

In R Elements Have an Atomic Structure

Elements aren't really objects in R, but are single-valued vectors
However, the term 'element' is helpful to distinguish between types of objects in a vector
Elements Have One of Six Atomic modes
Function calls on objects containing elements with different atomic modes coerce every element to the most general mode present, called the "lowest" mode here: the one furthest to the right in the ordering below

raw $\rightarrow$ logical $\rightarrow$ integer $\rightarrow$ numeric $\rightarrow$ complex $\rightarrow$ character

2 < "george"

[1] TRUE

2 > "george"

[1] FALSE

-2 < "-3"

[1] TRUE

-2 < FALSE

[1] TRUE
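A minimal sketch of the coercion ordering above, using class( ) to inspect the mode that results when two elements of different modes are combined. The variable names are illustrative only, and the string comparison assumes a locale in which the digit characters sort in their usual order.

```r
# Combine pairs of elements with different modes and inspect the result
lgl_int <- class(c(TRUE, 2L))      # logical + integer
int_num <- class(c(2L, 2.5))       # integer + numeric
num_cpl <- class(c(2.5, 1i))       # numeric + complex
cpl_chr <- class(c(1i, "george"))  # complex + character
c(lgl_int, int_num, num_cpl, cpl_chr)

# Comparing a number with a string compares them as strings,
# so 2 is compared as "2" and "2" sorts after "10"
num_vs_chr <- 2 < "10"
num_vs_chr
```

The last line is why comparisons such as -2 < "-3" can give surprising answers: the comparison is alphabetical, not numeric.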

Coercing Elements to Other Atomic Modes

In many cases you can coerce elements and objects to other atomic modes using the as.*( ) family of functions

as.complex(-2)

[1] -2+0i

as.character(-2)

[1] "-2"

as.logical(-2) ### Only 0 returns as FALSE

[1] TRUE
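Coercion does not always preserve information. The sketch below, with illustrative variable names, shows a few cases where the result is truncated or becomes NA.

```r
# Coercion toward a less general mode can lose information
int_val <- as.integer(3.9)                          # truncates toward zero
num_val <- as.numeric("3.5")                        # a string holding a number converts cleanly
bad_val <- suppressWarnings(as.numeric("george"))   # not a number, so NA (normally with a warning)
lgl_val <- as.logical(c(0, 1, -2))                  # only 0 becomes FALSE

int_val ; num_val ; bad_val ; lgl_val
```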

Logical operators applied to numeric elements

2 == 3

[1] FALSE

2 > 3

[1] FALSE

2 <= 3

[1] TRUE

2 < 3 | 2 > 3  ### Is EITHER 2<3 OR 2>3 true?

[1] TRUE

2 < 3 & 2 > 3  ### Are BOTH 2<3 AND 2>3 true?

[1] FALSE

Mathematical Functions on Numeric Elements

sqrt(4)

[1] 2

exp(1)

[1] 2.718282

log(1)

[1] 0

log10(1)

[1] 0

ceiling(pi)

[1] 4

floor(pi)

[1] 3

factorial(5)

[1] 120

choose(4,2)

[1] 6

R Objects: Vectors

Vectors are the Base Data Structure in R

Scalars are interpreted as vectors containing a single element
Vectors can be created using one of four functions
- a:b creates a vector of values from $a$ to $b$ in steps of 1
- c( ) concatenates various elements or vectors together
- rep( ) repeats elements or patterns of elements
- seq(m,n,o) generates a number sequence between $m$ and $n$ in increments of $o$
"Binding" elements of different modes into a vector will coerce all of the elements into the lowest atomic mode
Functions can be applied to entire vectors - without loops
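As a small illustration of applying a function to a whole vector, the sketch below computes square roots once with an explicit loop and once in vectorized form. The vector x and the result names are made up for this example.

```r
x <- c(1, 4, 9, 16)

# Loop version: fill a result vector one element at a time
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized version: the function is applied to every element at once
roots_vec <- sqrt(x)

identical(roots_loop, roots_vec)
```

Both versions give the same values; the vectorized call is shorter and is the usual idiom in R.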

The Concatenate Function - c( )

c( ) combines elements with various atomic modes into a single vector object
Each element in the vector will have same atomic mode - whichever mode is lowest among the elements being concatenated

A <- c(1,2,3,4,5) ; A

[1] 1 2 3 4 5

B <- c(6,7,8,9,10) ; B

[1]  6  7  8  9 10

C <- c(A,B,"George") ; C  ### Coerces the numbers in A & B to characters

 [1] "1"      "2"      "3"      "4"      "5"      "6"      "7"     
 [8] "8"      "9"      "10"     "George"

The Repeat Function - rep( )

Vector formed by repeating an element or pattern of elements

D <- rep(1,10); D

 [1] 1 1 1 1 1 1 1 1 1 1

E <- rep(c(1,2,3,4),5); E

 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

G <- rep(c(1,2,3,4), each=5); G

 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

The Sequence Function - seq( )

Vector formed by sequencing elements, in a specified interval or to a specified length

H <- seq(1,10) ; H

 [1]  1  2  3  4  5  6  7  8  9 10

I <- seq(4,20,by=4) ; I

[1]  4  8 12 16 20

J <- seq(1,2, by=.1) ; J

 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

K <- seq(10,2, by=-0.5) ; K

 [1] 10.0  9.5  9.0  8.5  8.0  7.5  7.0  6.5  6.0  5.5  5.0  4.5  4.0  3.5
[15]  3.0  2.5  2.0

L <- seq(10,2, length=7) ; L

[1] 10.000000  8.666667  7.333333  6.000000  4.666667  3.333333  2.000000
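A few related helpers, sketched with illustrative names. Note that the full name of the argument used above is length.out; writing length = 7 works because R matches argument names partially.

```r
# length.out is the full argument name
s1 <- seq(10, 2, length.out = 7)

# seq_len(n) gives 1, 2, ..., n and returns an empty vector when n is 0
s2 <- seq_len(5)
s3 <- seq_len(0)

# seq_along(x) gives the positions of x
s4 <- seq_along(c("a", "b", "c"))

s1 ; s2 ; s3 ; s4
```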

Vector Structure

Let's look at the structure of the vector J defined previously
To do this we can use the structure function str( )
Remember, str( ) is every R user's best friend

str(J)

 num [1:11] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...

num - shows that this is a numeric vector
[1:11] - shows that J has 1 dimension with values in positions 1-11
As will be shown when discussing matrices, the dimensions of an $11 \times 11$ matrix are expressed as [1:11, 1:11]

Logical Functions on Vectors

Is either $(\overline{A}<\overline{B})$ or $(\overline{A}>\overline{B})$ true?

A < B | A > B

[1] TRUE TRUE TRUE TRUE TRUE

A[1] < B[1] || A[1] > B[1]  ### || compares single values; operands longer than 1 are an error in R >= 4.3

[1] TRUE

Are both $(\overline{A}<\overline{B})$ and $(\overline{A}>\overline{B})$ true?

A < B & A > B

[1] FALSE FALSE FALSE FALSE FALSE

A[1] < B[1] && A[1] > B[1]  ### && compares single values; operands longer than 1 are an error in R >= 4.3

[1] FALSE
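To reduce an element-wise comparison to a single TRUE or FALSE, any( ) and all( ) are the usual tools. A minimal sketch, re-creating A and B with the values defined earlier:

```r
A <- c(1, 2, 3, 4, 5)
B <- c(6, 7, 8, 9, 10)

any_less <- any(A < B)          # is at least one comparison TRUE?
all_less <- all(A < B)          # are all comparisons TRUE?
both     <- all(A < B & A > B)  # can every element be both smaller and larger?

any_less ; all_less ; both
```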

Mathematical Functions on Vectors

A + B

[1]  7  9 11 13 15

A * B ## Element-wise multiplication

[1]  6 14 24 36 50

A%*%B ##  Matrix multiplication (here, the inner product of A and B)

     [,1]
[1,]  130

round(sqrt(A),digits = 3)

[1] 1.000 1.414 1.732 2.000 2.236

round(exp(A),digits = 2)

[1]   2.72   7.39  20.09  54.60 148.41

round(log(A),digits = 3)

[1] 0.000 0.693 1.099 1.386 1.609

round(log10(A),digits=3)

[1] 0.000 0.301 0.477 0.602 0.699

sum(A)

[1] 15

cumsum(A)

[1]  1  3  6 10 15

prod(B)

[1] 30240

cumprod(B)

[1]     6    42   336  3024 30240

t(A) ### Returns the transpose of A

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5

t(t(A))

     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5

abs(-B/2)

[1] 3.0 3.5 4.0 4.5 5.0

table(c(A,B/2)) ### displays how many times each unique value is observed


  1   2   3 3.5   4 4.5   5 
  1   1   2   1   2   1   2

Accessing And Assigning Vector Elements Using Brackets

A ### Recall the value of the vector A defined earlier

[1] 1 2 3 4 5

A[3] ## Returns the 3rd element of A

[1] 3

A[1] <- 0.1 ; A ## Assigns the 1st element of A as 0.1

[1] 0.1 2.0 3.0 4.0 5.0

A[-1] <- 4 ; A ## Assigns all but the 1st element of A as 4

[1] 0.1 4.0 4.0 4.0 4.0

A < 0.5

[1]  TRUE FALSE FALSE FALSE FALSE

A[4] == 2

[1] FALSE
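Brackets also accept a logical condition, which selects or assigns every element that meets it. A short sketch on a separate vector v (an illustrative name) holding the same values as A at this point:

```r
v <- c(0.1, 4, 4, 4, 4)

small     <- v[v < 0.5]      # keep only the elements where the condition is TRUE
positions <- which(v == 4)   # positions of the elements equal to 4
v[v == 4] <- 0               # assign to every element that meets a condition

small ; positions ; v
```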

R OBJECTS: Matrices

Matrices combine Atomic Vectors in a 2-dimensional Framework

In R, matrices are created using one of three methods
- 1. Define the dimensions of a matrix and add elements later
- 2. Adding elements interactively with edit( )
- 3. Merge or bind vectors together (must be of equal length)
Matrices are atomic, i.e. every element has the same atomic mode
Creating matrices from elements or vectors with different atomic modes will coerce every element to the lowest mode

Building Matrices 1 - Specifying Individual Elements

The matrix( ) function is used to create a matrix object with the following arguments
- data - Values to include in the matrix
- ncol - Number of columns
- nrow - Number of rows
- byrow - Fill the matrix by rows or by columns?
- dimnames - Names applied to row and column headers

mat <- matrix(data = 1:9, 
              ncol = 3, 
              nrow = 3, 
              byrow = TRUE, 
              dimnames = list(c('A','B','C'),
                              c('D','E','F'))) 
mat
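The effect of byrow is easiest to see on a small matrix. The sketch below fills the same six values both ways; the object names are illustrative.

```r
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fills 1 2 3 across the first row
by_col <- matrix(1:6, nrow = 2, byrow = FALSE)  # fills 1 2 down the first column (the default)

by_row[1, ]
by_col[1, ]
dim(by_row)
```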

Building matrices 2 - interactively with `edit( )`

This interactive 'GUI' method can be very fast, but requires user input
Start by creating a $1 \times 1$ numeric matrix - the value you use doesn't matter

mat2 <- matrix(1)

Then, use edit() to bring up a spreadsheet-style editor window
Make whatever changes you like to the matrix, then hit file $\rightarrow$ close
Note: you must re-assign the matrix object or your changes will not be saved

mat2 <- edit(mat2)

Building Matrices 3 - Merging Vectors

Vectors of the same length can be merged to build matrices
Note the use of c( ) to ensure that vec1, vec2, vec3, vec4 are all part of the argument data, and not considered as four separate arguments

vec1 <-  1:4
vec2 <-  5:8
vec3 <-  9:12
vec4 <- 13:16
vec.mat <- matrix(data = c(vec1, vec2, vec3, vec4), 
                  ncol = 4) 
vec.mat

     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16

Matrices can also be built with rbind & cbind to merge vectors as rows or columns
Note that rbind and cbind will coerce every element to the same atomic mode

rbind(vec1, vec2, vec3, vec4)

     [,1] [,2] [,3] [,4]
vec1    1    2    3    4
vec2    5    6    7    8
vec3    9   10   11   12
vec4   13   14   15   16

cbind(vec1, vec2, vec3, vec4)

     vec1 vec2 vec3 vec4
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
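The coercion mentioned above can be checked directly. In this sketch (with made-up vectors nums and chars), binding a numeric and a character vector produces a character matrix:

```r
nums  <- c(1, 2, 3)
chars <- c("a", "b", "c")

mixed <- cbind(nums, chars)     # a matrix holds one mode, so the numbers become strings
mixed_mode <- mode(mixed)
nums_col   <- mixed[, "nums"]   # the former numbers, now stored as character strings

mixed_mode
nums_col
```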

Numerical operations on matrices

Matrix operations that are often of interest include
- Returning the diagonal elements of a matrix
- Computing the determinant of a matrix
- Finding the inverse of a matrix
- Finding the transpose of a matrix
- Computing the Eigenvalues and Eigenvectors
To demonstrate matrix operations, let's first create a $5 \times 5$ matrix of random integers in $[1,50]$

mat1 <- matrix(sample(1:50,size = 25), 
               nrow = 5, 
               byrow = TRUE) 
mat1

     [,1] [,2] [,3] [,4] [,5]
[1,]   33   14   25   19   49
[2,]    2   37   13   47   36
[3,]   44   21   39    4   46
[4,]    8    9   10   27    7
[5,]   12   15   18   11   17

diag(mat1)  ### Diagonal elements of `mat1`

[1] 33 37 39 27 17

det(mat1)   ### Determinant of `mat1`

[1] 2390956

solve(mat1) ### Returns the inverse of `mat1` if it exists

            [,1]         [,2]        [,3]         [,4]         [,5]
[1,] -0.07517830  0.036648102  0.13213627  0.070335046 -0.247424043
[2,] -0.09558018  0.060913710  0.10881254  0.006456413 -0.150590391
[3,]  0.06416847 -0.066029655 -0.11758058 -0.028874224  0.284919923
[4,]  0.01485264 -0.005037316 -0.01811660  0.037442763  0.001460504
[5,]  0.05984886 -0.006443448 -0.05306413 -0.049000065  0.063724719

t(mat1) ### Returns the transpose of `mat1`

     [,1] [,2] [,3] [,4] [,5]
[1,]   33    2   44    8   12
[2,]   14   37   21    9   15
[3,]   25   13   39   10   18
[4,]   19   47    4   27   11
[5,]   49   36   46    7   17

eigen(mat1) ### Returns the eigenvalues and eigenvectors of `mat1`

$values
[1] 108.921417+0.000000i  30.900944+0.000000i  18.157668+0.000000i
[4]  -2.490014+5.737797i  -2.490014-5.737797i

$vectors
              [,1]           [,2]          [,3]                    [,4]
[1,] -0.5278529+0i  0.31792088+0i -0.3921790+0i -0.50972262+0.24878917i
[2,] -0.4090286+0i -0.81783168+0i  0.8181752+0i -0.38372140+0.29275659i
[3,] -0.6561359+0i  0.47854674+0i -0.1987081+0i  0.57631369+0.00000000i
[4,] -0.2012046+0i -0.03029619+0i -0.3498562+0i  0.04229242-0.07909292i
[5,] -0.2882175+0i -0.01236186+0i  0.1220688+0i  0.13925026-0.29285803i
                        [,5]
[1,] -0.50972262-0.24878917i
[2,] -0.38372140-0.29275659i
[3,]  0.57631369+0.00000000i
[4,]  0.04229242+0.07909292i
[5,]  0.13925026+0.29285803i
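Because mat1 is random, its values change on every run. The sketch below uses a small fixed matrix m (an assumption for illustration) so the operations can be checked by hand, and verifies that a matrix times its inverse gives the identity matrix.

```r
m <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

m_det <- det(m)        # 2*3 - 1*1 = 5
m_inv <- solve(m)      # the inverse exists because the determinant is non-zero

# Multiplying a matrix by its inverse should give the identity matrix,
# up to floating-point rounding, so compare with all.equal( ) rather than ==
check <- all.equal(m %*% m_inv, diag(2))

# m is symmetric, so its eigenvalues are real; they sum to the trace (2 + 3)
ev <- eigen(m)$values

m_det ; check ; sum(ev)
```

The complex eigenvalues shown for mat1 arise because that matrix is not symmetric.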

Accessing & Assigning matrix components

Matrix elements can be accessed by specifying their [row,column] values
Matrix elements can also be accessed by specifying a single [element.index]
Matrix rows/columns can be accessed by specifying [row,] or [,column]

mat1[4,3]

[1] 10

mat1[5,5]

[1] 17

mat1[7]

[1] 37

mat1[24]

[1] 7

mat1[1,] ### Returns the first row of mat1

[1] 33 14 25 19 49

mat1[,3] ### Returns the third column of mat1

[1] 25 13 39 10 18

R OBJECTS: Arrays

Arrays are element Repositories With More than Two Dimensions

The syntax for function calls on arrays is similar to what was described for vectors and matrices
Arrays are atomic
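A minimal sketch of a three-dimensional array, built with array( ) and indexed with one position per dimension. The name arr is illustrative.

```r
arr <- array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 "layers"

dim(arr)
one_element <- arr[1, 2, 3]   # one element, indexed as [row, column, layer]
first_layer <- arr[, , 1]     # the first layer is an ordinary 2 x 3 matrix

one_element
first_layer
```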

R OBJECTS: Data Frames

Data Frames Are The Most Common Data Structure In R

Data frames are NOT atomic - but are comprised of atomic column vectors
Data frames are like matrices - but each column can have a different atomic mode

Creating data frames

Data frames are primarily created by using the function data.frame()
- NEVER use rbind or cbind to create a data.frame
- Recall that rbind & cbind coerce every element in every vector to the lowest atomic mode

age<-c(23, 35, 19)              ### Numeric vector
sex<-c("Male", "Female", "Yes") ### Character vector
job<-c(TRUE, TRUE, FALSE)       ### Logical vector
data.frame(age,sex,job, row.names = c("Jim", "Joe", "Ray"))

    age    sex   job
Jim  23   Male  TRUE
Joe  35 Female  TRUE
Ray  19    Yes FALSE

Data frames can also be created by loading local or online files
- read.table loads delimited text data from .txt files (whitespace-delimited by default; use sep = "\t" for tab-delimited files)
- read.csv creates a data.frame from .csv files (installs w/ base R)
- read_excel creates a data.frame from .xls and .xlsx files (requires the readxl package)

read.csv("http://www.fdic.gov/bank/individual/failed/banklist.csv", header = T)
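Reading a file can be tried without a network connection by first writing a small data frame to a temporary file. This is only a sketch: the data frame people and the temporary path are made up for the example.

```r
people <- data.frame(age = c(23, 35, 19),
                     job = c(TRUE, TRUE, FALSE))

tmp <- tempfile(fileext = ".csv")          # hypothetical temporary file path
write.csv(people, tmp, row.names = FALSE)  # write the data frame to disk

reloaded <- read.csv(tmp, header = TRUE)   # read it back as a data frame
str(reloaded)
unlink(tmp)                                # remove the temporary file
```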

Accessing data frame components

R installs with several data sets, such as mtcars

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

class(mtcars)

[1] "data.frame"

Working With Data Frames

str(mtcars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

The $ operator is used to call data frame columns, e.g. mtcars$cyl
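A short sketch of column and row access on mtcars, with illustrative result names. Double brackets give the same column as $, and a logical condition inside single brackets selects rows.

```r
cyl_dollar  <- mtcars$cyl        # column by name with $
cyl_bracket <- mtcars[["cyl"]]   # the same column with double brackets
identical(cyl_dollar, cyl_bracket)

four_cyl <- mtcars[mtcars$cyl == 4, ]  # rows where cyl equals 4, all columns
nrow(four_cyl)
```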

R OBJECTS: Lists

Lists Are The Most General & Powerful Object Type in R

Think of lists as a suitcase - they can be used to store everything
- Vectors
- Matrices
- Data Frames
- Functions
- Even other lists

Example: Function outputs - you write a script that takes the cars data set as input & creates several outputs

- The data set itself (a data.frame or matrix)
- Some summary statistics of the data set, e.g. summary(cars)
- The maximum value of the log-likelihood function $\left(\mathcal{L}\right)$, a vector of length 1

You can assign each of these objects to a list
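One way this could look is sketched below. The helper analyze_cars is hypothetical, and the linear model dist ~ speed is an assumption made only so that there is a log-likelihood to store.

```r
# Hypothetical helper that bundles several outputs into one list
analyze_cars <- function(data = cars) {
  fit <- lm(dist ~ speed, data = data)       # assumed model, for illustration only
  list(data    = data,                       # the data set itself
       summary = summary(data),              # summary statistics
       loglik  = as.numeric(logLik(fit)))    # maximized log-likelihood, length 1
}

out <- analyze_cars()
names(out)
out$loglik
```

Naming the list components makes each output retrievable later with $.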

Atomic Vectors & Lists

Concatenating numeric, logical and character elements results in an atomic-character vector
- The vector is atomic because every element has the same atomic mode
- Recall that c( ) coerces every element to the lowest atomic mode that will ensure homogeneity across the entire vector

V <- c(3,TRUE,"george") ; V

[1] "3"      "TRUE"   "george"

Lists preserve the atomic mode of each object stored in the list
And because lists can hold entire objects, the structure of each object stored in the list is preserved

list1 <- list(3,TRUE,"george") ; list1

[[1]]
[1] 3

[[2]]
[1] TRUE

[[3]]
[1] "george"

list2 <- list(head(mtcars),"george", A) ; list2

[[1]]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

[[2]]
[1] "george"

[[3]]
[1] 0.1 4.0 4.0 4.0 4.0

str(list2)

List of 3
 $ :'data.frame':   6 obs. of  11 variables:
  ..$ mpg : num [1:6] 21 21 22.8 21.4 18.7 18.1
  ..$ cyl : num [1:6] 6 6 4 6 8 6
  ..$ disp: num [1:6] 160 160 108 258 360 225
  ..$ hp  : num [1:6] 110 110 93 110 175 105
  ..$ drat: num [1:6] 3.9 3.9 3.85 3.08 3.15 2.76
  ..$ wt  : num [1:6] 2.62 2.88 2.32 3.21 3.44 ...
  ..$ qsec: num [1:6] 16.5 17 18.6 19.4 17 ...
  ..$ vs  : num [1:6] 0 0 1 1 0 1
  ..$ am  : num [1:6] 1 1 1 0 0 0
  ..$ gear: num [1:6] 4 4 4 3 3 3
  ..$ carb: num [1:6] 4 4 1 1 2 1
 $ : chr "george"
 $ : num [1:5] 0.1 4 4 4 4

Accessing & Assigning List Components

Objects assigned to a list may be accessed by using double brackets [[ ]]

 list1[[1]]

[1] 3

Accessing & Assigning Components Of Objects Inside A List

Components of objects stored in a list may also be accessed by
- 1. Using double brackets [[ ]] to call the vector, matrix, list, or data frame inside the list
- 2. Then using either [ ], [[ ]], or $ to call the desired component of the object inside the list

 list2[[3]][1]

[1] 0.1

 list2[[1]]$mpg

[1] 21.0 21.0 22.8 21.4 18.7 18.1
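When list components are named, they can also be reached by name. The sketch below, using a made-up list called named, also shows the difference between single and double brackets.

```r
named <- list(number = 3, flag = TRUE, name = "george")

named$name          # access by name with $
named[["number"]]   # access by name with double brackets

sub_list <- named["number"]    # single brackets return a list of length 1
element  <- named[["number"]]  # double brackets return the stored object itself
class(sub_list)
class(element)
```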

R OBJECTS: Functions

One Object to Rule Them All

Functions convert input objects into either output objects or plots
- Functions can take any object as an argument, even other functions
- Functions operate on distinct classes of objects (or functions)
- Functions can return objects of the same class as the input objects or create a completely new class of object

Parts of an R Function

Functions are comprised of four basic parts
- The function symbol used to call the function
- The formal argument(s)
- The informal argument(s), i.e. local variables defined inside the function body
- The function body defining the operations to be performed

Creating and examining a function

The code below creates an example function called foo( ), where...
- foo $\hspace{10pt}$ - is the function symbol used to call the function
- x $\hspace{18pt}$ - is a formal argument
- a,b $\hspace{10pt}$ - are informal arguments
- a*x+b $\hspace{1pt}$ - is the function body defining the operations to be performed

foo <- function(x) { ### Values inside ( ) are interpreted as function arguments
  
  a <- 1 ; 
  b <- 2      ### Values separated by ";" are interpreted as separate lines 
  c <- a*x+b
  
  return(c)
}                  ### Values inside { } are interpreted as the function body

The function foo is evaluated by specifying a value for the formal argument
- In general, formal arguments can be numeric, character, or even other functions
- For the function foo, the formal argument x must be a numeric vector or matrix
- Values are not specified for the informal arguments a and b since they are defined in the body of the function

foo(2)

[1] 4
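The parts of a function can be inspected with formals( ) and body( ). This sketch re-creates foo( ) as defined above and also calls it on a vector; the result names are illustrative.

```r
foo <- function(x) {
  a <- 1 ; b <- 2
  c <- a*x+b
  return(c)
}

arg_names <- names(formals(foo))  # the formal argument(s)
fn_body   <- body(foo)            # the function body, stored as an unevaluated call
vec_out   <- foo(c(1, 2, 3))      # x may be a vector: the arithmetic is element-wise

arg_names
vec_out
```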

Lexical Scoping Rules

A problem arises, however, if I want to return the value of either a or b - an error is produced
- This error results from the 'lexical scoping rules' on which R was built
- Lexical scoping defines 'where' each object is defined and how we can interact with it

b  ### Error: object "b" not found

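When the function foo( ) is created and then called, three things happen
- The symbol foo is assigned in the current environment, which becomes the enclosing (parent) environment of foo( )
- Each call to foo( ) creates a new environment whose parent is that enclosing environment
- The body of the function foo( ) is evaluated in this new environment, so a and b exist only there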
The lexical scoping rules define how R searches for a requested value
- First, R searches the current environment for the requested value
- If the value is not found in the current environment, R then searches the parent environment to the current environment
- If necessary, R continues to search successive parent environments until the Global Environment is reached
- If the value isn't found in the Global Environment, R moves on through its parents (the attached packages and the base environment) until it reaches the Empty Environment
- Upon reaching the Empty Environment, R stops searching and returns an error that the value cannot be found
Thus, in the case of the function foo( ), a and b could not be found because...
- They are local variables, assigned in the body of foo
- Our request to find a and b was made in the parent environment to the body of foo( ), where only the symbol foo is available
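The chain of parent environments can be inspected directly. In a typical session the attached packages and the base environment sit between the Global Environment and the Empty Environment, so the printed chain will vary with the packages that are loaded. A minimal sketch:

```r
# Walk from the Global Environment through each parent until the
# Empty Environment is reached, recording the name of each stop
env   <- globalenv()
chain <- character(0)
while (!identical(env, emptyenv())) {
  chain <- c(chain, environmentName(env))
  env   <- parent.env(env)
}
chain
```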

Lexical Scoping Example

Let's demonstrate where `foo` lives

First, let's determine what is the current environment
Note that this entire presentation has its own environment

environment( )

<environment: R_GlobalEnv>

Now, let's see in which environment the symbol foo is defined
This should be the same environment as the one listed above
BTW, if you refresh this presentation, this environment will be different

environment(foo)

<environment: R_GlobalEnv>

The body of the function is evaluated in its own environment, created each time foo( ) is called, that is a child of the environment listed above
The structure of foo( ), including its formal argument and source reference, can be examined with str( )

str(foo)

function (x)  
 - attr(*, "srcref")=Class 'srcref'  atomic [1:8] 1 8 8 1 8 1 1 8
  .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x000000001b9623b8>

What about when you're not in a presentation?

When R/RStudio is opened the current environment will be the Global environment
Each call to a function creates its own environment - from the Global environment only the function symbol can be accessed, not the variables created inside the function

Are there ways to make `a` and `b` available in the current environment?

Yes, there are at least three ways to do this
- 1. Define a and b outside of the body of foo( )
- 2. Use return( ) to ensure foo( ) returns a and b every time it is run
- 3. Use the deep-assignment operator <<-
Each of these methods is shown below

a <- 1 ; b <- 2 # Moving the informal arguments outside the function
foo <- function(x) { 
        a*x+b
}
foo(2) ; a ; b

[1] 4

[1] 1

[1] 2

foo <- function(x) { 
    a <- 1 ; b <- 2
    c <- a*x+b
    return(list(a,b,c))
}
foo(2)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 4

foo2 <- function(x) { 
    a <<- 1 ; b <<- 2
    c <- a*x+b
    return(c)
}
foo2(2) ; a ; b

[1] 4

[1] 1

[1] 2

R Environments

R arrives at the same result regardless of where a and b are defined

This is due to the search procedure defined by the lexical scoping rules

R first searches the environment created by the function call for a and b
If not found there, R searches the function's enclosing environment, then each successive parent environment
This continues until R finds a and b or the search reaches the empty environment

The empty environment is the last ancestor of the global environment, reached after the attached packages and the base environment

`return( )` Allows Multiple Function Outputs When They Are Wrapped in a List

Use `<<-` to Assign in Parent Environment

The deep-assignment operator assigns to a variable in a parent environment instead of creating a local one

Use carefully: <<- can overwrite objects defined outside the function, giving unexpected side effects
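A common, contained use of <<- is a function that remembers a value between calls. In this sketch the names make_counter and counter are hypothetical; count lives in the environment created by make_counter( ), not in the Global Environment.

```r
make_counter <- function() {
  count <- 0                 # lives in the environment created by make_counter( )
  function() {
    count <<- count + 1      # updates count in the parent environment, not a local copy
    count
  }
}

counter <- make_counter()
counter()
counter()
third <- counter()
third
```

Because <<- finds count in the enclosing environment first, nothing in the Global Environment is changed.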

Lexical Scoping Rules Example - 2

Look at the code chunk below

What value do you think will be returned for bar(2)?
Don't scroll down until you've decided on an answer

a<-1
b<-2

foo<-function(x) {
    a*x+b
}

bar<-function(x){
  a<-2
  b<-1
  foo(x)
}

bar(2)

What value did you choose?

The correct answer is 4
Rationale
- foo cannot access the values of a and b defined within bar
- The only values R can find for a and b are those defined in the global environment
- The a and b values defined in bar are therefore ignored

bar(2)

[1] 4
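For contrast, here is a sketch in which the helper is defined inside the calling function. Because lookup follows where a function is defined rather than where it is called, the inner function now sees the local a and b. The names bar_nested and foo_inner are hypothetical.

```r
a <- 1
b <- 2

bar_nested <- function(x) {
  a <- 2
  b <- 1
  foo_inner <- function(x) {   # defined inside bar_nested, so its enclosing
    a*x+b                      # environment holds a = 2 and b = 1
  }
  foo_inner(x)
}

nested_result <- bar_nested(2)   # evaluates 2*2 + 1
nested_result
```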

Introduction to R

Environments and Objects

OVERVIEW

In this presentation...