Introduction to R

Timothy Tickle
October 17th, 2015

What is RStudio?

  • IDE Interactive Development Environment
  • Console + Help + Figures + Project Management
  • Let's use it!

Crunching Numbers

  • 5 + 2
  • 5 - 2
  • 5 * 2
  • 5 / 2
  • 5 ^ 2
  • 5 ** 2
  • 5 %% 2
  • 5 %/% 2
  • addition
  • subtraction
  • multiplication
  • division
  • exponentiation
  • exponentiation
  • modulus ( remainder )
  • integer division (divisor )

Try it yourself

  • 1. 5 plus 4 then times 2
  • 1. 5 plus 7 then divided by 2
  • 1. 4 minus 2 then to the power of 2
  • 1. 5 plus 5 then modulus 5

Question Cat

Adding Logic to Your Code

  • 5 < x
  • 5 <= x
  • 5 > x
  • 5 >= x
  • 5 == x
  • 5 != x
  • !x
  • True || False
  • True && False
  • isTrue( x )
  • less than
  • less than or equal to
  • greater than
  • greater than or equal to
  • equal to
  • not equal to
  • Logical NOT
  • Logical OR
  • Logical AND
  • Test for True

Warning for Precision

Angry Cat

Warning for Precision

sqrt(4) * sqrt(4) == 4
[1] TRUE
sqrt(2) * sqrt(2) == 2
[1] FALSE
all.equal( sqrt(2) * sqrt(2), 2)
[1] TRUE

Storing Data / Making Variables

The rules:

  • Letters, numbers, dots, underscores
  • Must start with a letter or a dot not followed by a number
  • No reserve words, No spaces
x <- 5
x
[1] 5
x * 2
[1] 10

Storing Data / Making Variables

x <- 2
x <- x + 1
x
[1] 3

Storing Data / Making Variables

x <- 2
x <- x + 1
y <- 4
x * y
[1] 12

Basic Data Types

  • integer
  • numeric
  • character (Strings)
  • logical
  • factor
  • ordered factor
  • Missing Values and others
  • 1
  • 1.0
  • “hello” 'hello' “a” 'a'
  • TRUE FALSE
  • factor(“GroupOne”)
  • factor( c(“d”,“o”,“g”), ordered = TRUE)
  • NA, NaN, -Inf, Inf
  • class()

Group Similar Data with Vectors

Vector - Single collection of the same data mode

c(1,2,3,4,5,6,7)
[1] 1 2 3 4 5 6 7
5:9
[1] 5 6 7 8 9
9:1
[1] 9 8 7 6 5 4 3 2 1

Character Vectors

c( "a", "a", "a", "a", "a" )
[1] "a" "a" "a" "a" "a"
rep( "a", 5 )
[1] "a" "a" "a" "a" "a"
c( "Cats","are","amazing" )
[1] "Cats"    "are"     "amazing"

Logical Vectors

c( TRUE, FALSE, TRUE, TRUE, FALSE )
[1]  TRUE FALSE  TRUE  TRUE FALSE

Factor Vector

factor( c( "Cats","are","still", "amazing" ) )
[1] Cats    are     still   amazing
Levels: amazing are Cats still

Vectors Must Be of One Data Mode

c( 1, "2", FALSE)
[1] "1"     "2"     "FALSE"
c( 1, FALSE )
[1] 1 0

"Confused Cat"

Combining Vectors

x <- 1:4
y <- 5:10
c( x, y )
 [1]  1  2  3  4  5  6  7  8  9 10
c( 1:4, 5:10 )
 [1]  1  2  3  4  5  6  7  8  9 10

Selecting Vector Elements

  • One element
x <- 1:4
x[ 2 ]
[1] 2
  • A slice of a vector
x <- 1:10
x[ 4:7 ]
[1] 4 5 6 7

Selecting Vector Elements

  • Multiple elements ( not contiguous )
x <- c( "a", "b", "c", "d", "e", "f" )
x[ c(5,3,1) ]
[1] "e" "c" "a"
  • Removing elements
x <- 5:1
x[ -1 ]
[1] 4 3 2 1

Selecting Vector Elements

  • Using logical vector
# Start with vector from 1 - 10
x <- 1:10
# Get indices of even elements
y <- x%%2 == 0
# Pull out those even element by index
x[y]
[1]  2  4  6  8 10

2-Dimensional Vectors are Matrices

matrix( 1:20, nrow = 5, ncol = 4 )
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Indexing Matrices

  • matrix[ r, c ]
boring.matrix <- matrix( 1:20, nrow = 5, ncol = 4 )
dim( boring.matrix )
[1] 5 4
boring.matrix[ ,1 ]
[1] 1 2 3 4 5
boring.matrix[ 2, ]
[1]  2  7 12 17

Indexing Matrices

  • matrix[ r, c ]
boring.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
boring.matrix[ 2, 1 ]
[1] 2

Updating Data in Matrices

boring.matrix[ 2, 1 ] <- 99 
boring.matrix
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]   99    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Matrix Operations

  • Transpose
boring.matrix <- matrix(1:9, nrow = 3)
boring.matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
t(boring.matrix)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Matrix Operations

  • Adding
boring.matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
boring.matrix + 1
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10

Matrix Operations

  • Adding
boring.matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
boring.matrix + boring.matrix
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18

Matrix Operations

  • Multiplying
boring.matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
boring.matrix * 2
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18

Matrix Operations

  • Multiplying
boring.matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
boring.matrix * boring.matrix
     [,1] [,2] [,3]
[1,]    1   16   49
[2,]    4   25   64
[3,]    9   36   81

Matrix Operations

  • Linear Algebra
boring.matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
boring.matrix %*% boring.matrix
     [,1] [,2] [,3]
[1,]   30   66  102
[2,]   36   81  126
[3,]   42   96  150

Naming Matrices

colnames( boring.matrix ) <- c( "c1","c2","c3" )
rownames( boring.matrix ) <- c( "r1", "r2", "r3" )
boring.matrix
   c1 c2 c3
r1  1  4  7
r2  2  5  8
r3  3  6  9
boring.matrix["r1",]
c1 c2 c3 
 1  4  7 

Relax, Break!

Happy cat

Data Frames: Matrices of Many Data Types

x <- 11:16
y <- seq(0,1,.2)
z <- c( "one", "two", "three", "four", "five", "six" )
a <- factor( z )

Data Frames: Matrices of Many Data Types

  • Data frames are column major
data.frame(x,y,z,a)
   x   y     z     a
1 11 0.0   one   one
2 12 0.2   two   two
3 13 0.4 three three
4 14 0.6  four  four
5 15 0.8  five  five
6 16 1.0   six   six
test.dataframe <- data.frame(x,y,z,a)

Data Frames are Column Major

test.dataframe
   x   y     z     a
1 11 0.0   one   one
2 12 0.2   two   two
3 13 0.4 three three
4 14 0.6  four  four
5 15 0.8  five  five
6 16 1.0   six   six
class( test.dataframe[3] )
[1] "data.frame"

Data Frames Hold Different Data Modes

class( test.dataframe[[1]] )
[1] "integer"
class( test.dataframe[[2]] )
[1] "numeric"
class( test.dataframe[[3]] )
[1] "factor"

Warning!

  • Data frames can change your data modes

Mad Cat

Warning!

LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
[18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
class( LETTERS )
[1] "character"
data.mode.df <- data.frame( LETTERS )
class( data.mode.df[[ 1 ]] )
[1] "factor"

Combining Data Frames

mini.frame.one <- data.frame( "one" = 1:4 )
mini.frame.two <- data.frame( "two" = 6:9 )

Combining Data Frames

  • binding columns with common row names
cbind( mini.frame.one, mini.frame.two )
  one two
1   1   6
2   2   7
3   3   8
4   4   9
  • rbind for binding rows ( with common column names )

Updating Data Frames

test.dataframe
   x   y     z     a
1 11 0.0   one   one
2 12 0.2   two   two
3 13 0.4 three three
4 14 0.6  four  four
5 15 0.8  five  five
6 16 1.0   six   six

Updating Data Frames

test.dataframe[[1]] = 21:26
test.dataframe
   x   y     z     a
1 21 0.0   one   one
2 22 0.2   two   two
3 23 0.4 three three
4 24 0.6  four  four
5 25 0.8  five  five
6 26 1.0   six   six

Updating Data Frames

test.dataframe[[3,1]] = 111
test.dataframe
    x   y     z     a
1  21 0.0   one   one
2  22 0.2   two   two
3 111 0.4 three three
4  24 0.6  four  four
5  25 0.8  five  five
6  26 1.0   six   six

Lists are Filing Cabinets

So I have a person with:

  • 5 medical measurements, 10 self-reported measurements, no children, Two parent names
[1]  1.3  1.6  3.2  9.8 10.2
 [1] 13  6  4  7  6  5  8  9  7  4
[1] FALSE
[1] "Parent1.name" "Parent2.name"

Lists are Filing Cabinets

measurements <- c( 1.3, 1.6, 3.2, 9.8, 10.2 )
self.reporting <- c( 13, 6, 4, 7, 6, 5, 8, 9, 7, 4 )
children <- FALSE
parents <- c( "Parent1.name", "Parent2.name" )

Lists are Filing Cabinets

my.person <- list( measurements, self.reporting, children, parents )
my.person
[[1]]
[1]  1.3  1.6  3.2  9.8 10.2

[[2]]
 [1] 13  6  4  7  6  5  8  9  7  4

[[3]]
[1] FALSE

[[4]]
[1] "Parent1.name" "Parent2.name"

Lists are Filing Cabinets

  • Single bracket accessing
my.person[4]
[[1]]
[1] "Parent1.name" "Parent2.name"
my.person[1:2]
[[1]]
[1]  1.3  1.6  3.2  9.8 10.2

[[2]]
 [1] 13  6  4  7  6  5  8  9  7  4

Lists are Filing Cabinets

  • Double bracket accessing
my.person[[1]]
[1]  1.3  1.6  3.2  9.8 10.2

Lists are Filing Cabinets

  • Access by name
my.person <- list( measure = measurements, self.measure = self.reporting, child = children, parents = parents )
my.person
$measure
[1]  1.3  1.6  3.2  9.8 10.2

$self.measure
 [1] 13  6  4  7  6  5  8  9  7  4

$child
[1] FALSE

$parents
[1] "Parent1.name" "Parent2.name"

Lists are Filing Cabinets

  • Access by name
my.person$parents
[1] "Parent1.name" "Parent2.name"

A Data Object for Every Occasion

What data type would you use?

  • Basic Data Type
  • Vector
  • Matrix
  • Data Frame
  • List

Data Structures the Others

Oops, I may have missed somethings…

  • Named Vectors
  • Table
  • Raw
  • S3 objects
  • S4 objects

Control Structures

  • Running code conditionally
  • Always use brackets
x = 1
if( x < 5){
  print( "Mew." )
}
[1] "Mew."

Control Structures

  • Multiple Conditional operation
x = 10
if( x < 3 ){
  print( "Less than three!")
} else {
  print( "Greater than or equal to three!")
}
[1] "Greater than or equal to three!"

Control Structures

  • Multiple Conditional operation
x = 3
if( x < 3 ){
  print( "Less than three!" )
} else if( x > 3 ) {
  print( "Greater than three!")
} else {
  print( "Equal to three." )
}
[1] "Equal to three."

Control Structures

  • For loops
measurements <- 1:10
for( value in measurements ){
  print( value * 10 )
}
[1] 10
[1] 20
[1] 30
[1] 40
[1] 50
[1] 60
[1] 70
[1] 80
[1] 90
[1] 100

Control Structures

  • For loops
  • Avoid as much as possible
  • Nested for loops are possible but slower
print( measurements * 10 )
 [1]  10  20  30  40  50  60  70  80  90 100
  • First: builtin functions
  • Second: vectorized methods
  • Third: spin your own loop

Control Structures

Switches run code based on a key word

measures = rlnorm(1000)
centrality = "Mean"
#centrality = "Median"
#centrality = "Mew"
switch( centrality,
  Mean = mean( measures ),
  Median = median( measures ),
  stop("Dave, I don't understand.")
)
[1] 1.661554

Control Structures

  • Vectorized ifelse operation
  • This is NOT equivalent to if else
measures = 1:10
ifelse( measures < 5, 0, 1)
 [1] 0 0 0 0 1 1 1 1 1 1

Control Structures the Others

  • Yeah, forgot about…
  • while loops

Control Structures the R Way

Boss Cat

Control Structures the R Way

You should avoid loops as much as possible.

  • builtin functions
   c1 c2 c3
r1  1  4  7
r2  2  5  8
r3  3  6  9
colSums( boring.matrix )
min( boring.matrix )
max( boring.matrix )

Control Structures the R Way

You should avoid loops as much as possible.

  • Vector indexing
x = 11:20
x < 14
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x[ x < 14 ]
[1] 11 12 13

Control Structures the R Way

You should avoid loops as much as possible.

  • Vector indexing
x
 [1] 11 12 13 14 15 16 17 18 19 20
which( x < 14 )
[1] 1 2 3
  • How can we leverage this to impute NAs ?

Imputing NAs ... Like a Boss

x = 1:10
x[ c(1,3,5,7) ] <- NA
x
 [1] NA  2 NA  4 NA  6 NA  8  9 10
x[ x == NA ] <- mean( x, na.rm = TRUE )
  • ???

Control Structures the R Way

You should avoid loops as much as possible.

  • apply, sapply, tapply, mapply
  • matrix[r,c]
  • matrix[1,2]
colSums( boring.matrix )
c1 c2 c3 
 6 15 24 
apply( boring.matrix, 2, sum )
c1 c2 c3 
 6 15 24 

Control Structures the R Way

You should avoid loops as much as possible.

  • apply, sapply, tapply, mapply
  • matrix[r,c]
  • matrix[1,2]
rowSums( boring.matrix )
r1 r2 r3 
12 15 18 
apply( boring.matrix, 1, sum )
r1 r2 r3 
12 15 18 

Reusing Code with Functions

  • Never copy paste code
  • Modularity is important to maintaining code
  • Modularity is everywhere
  • Builtin functions, custom functions, sourced scripts, proper libraries

Reusing Code with Functions

arithmetic.means <- function( values.to.measure ){
  measure.mean = sum( values.to.measure ) / length( values.to.measure )
  return( measure.mean )
}

measurements <- 1:10
arithmetic.means( measurements )
[1] 5.5
mean( measurements )
[1] 5.5

Make Your First Function

  • Name the function “critical.cat”
  • Should expect a number
  • If the number is even should print “Mew”
  • If the number is odd, should print “Eww”
  • Make a comment using '#'
  • Returns the number minus 1

Make Your First Function

critical.cat <- function( number ){
  # Making functions like a boss
  if( number %% 2 == 0 ){
    print( "Mew" )
  } else {
    print( "Eww")
  }
  return( number - 1 )
}

Questions on Functions?

Questions

Reading Tables

  • Most likely the hardest task in R
  • Mostly because of human error…I mean collaboration.

Working with other peoples data

Reading Tables

  • read.table
  • read.csv
  • read.delim
new.df = read.table( "data/super_fun.txt" )
dim( new.df )
[1] 3 6
head( new.df )
new.df

Reading Tables

new.df = read.table( "data/not_so_fun.txt" )
dim( new.df )
[1] 4 7
head( new.df )
     V1    V2    V3    V4    V5    V6    V7
1    ID col_1 col_2 col_3 col_4 col_5 col_6
2 row_1    11    12    13    14    15    16
3 row_2    21    22    23    24    25    26
4 row_3    31    32    33    34    35    36

Reading Tables

Surprised Cat

Writing Tables

  • write.table
  • write.csv
write.table( boring.matrix, "data/boring_matrix.txt")
write.csv( test.dataframe, "data/test_dataframe.csv", quote=FALSE)

Importing Excel

  • install.packages(“xlsx”)
  • library( xlsx )
library(xlsx)
read.xlsx( "data/super_fun.xlsx", 1 )
    NA. col_1 col_2 col_3 col_4 col_5 col_6
1 row_1    11    12    13    14    15    16
2 row_2    21    22    23    24    25    26
3 row_3    31    32    33    34    35    36

Getting Help

  • ? mean
  • help.search( “mean” )
  • find( “lowess”)
  • apropos( “lm” )
  • CRAN
  • Vignettes
  • Manuals

Questions?

"win"