What's Covered

   


Conditionals and Control Flow

Relational Operators

  • These are used to compare objects
  • They are common to every language and are a basis of programming
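
A minimal sketch of the relational operators on scalars and on a matrix. The variable names and the values are made up for illustration.

```r
# Scalars: each comparison returns a single TRUE or FALSE
3 == 3          # TRUE
"a" != "b"      # TRUE
5 >= 7          # FALSE

# Vectors and matrices are compared element by element
views <- matrix(c(16, 9, 13, 5, 2, 17), nrow = 2, byrow = TRUE)  # made-up data
popular <- views > 10   # logical matrix with the same shape as views
popular

# TRUE counts as 1, so sum() counts how many elements matched
n_popular <- sum(popular)
n_popular       # 3
```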

Compare matrices

##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
## [1,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
##       [,1] [,2] [,3]  [,4] [,5]  [,6] [,7]
## [1,] FALSE TRUE TRUE  TRUE TRUE FALSE TRUE
## [2,] FALSE TRUE TRUE FALSE TRUE  TRUE TRUE
## [1] 2

Conditional Statements

  • Used to execute statements depending on the result of a relational expression
  • This is fundamental to programming and exists in every language
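
As a rough illustration, an if / else if / else chain might look like this. The variable num_views, the thresholds, and the middle message are assumptions chosen for the example.

```r
num_views <- 14  # hypothetical value

# Conditions are checked top to bottom; only the first TRUE branch runs
if (num_views > 15) {
  msg <- "You're popular!"
} else if (num_views > 10) {
  msg <- "Your number of views is average"
} else {
  msg <- "Try to be more visible!"
}
print(msg)
```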

Loops

While loop

  • it runs as long as the while condition is true
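
A small sketch of a while loop. The starting speed of 64 and the step of 7 are assumptions; with these values the message prints five times and speed ends at 29.

```r
speed <- 64  # assumed starting value

while (speed > 30) {
  print("Slow down!")
  speed <- speed - 7  # assumed step; without this update the loop never ends
}
speed  # 29
```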

Write a while loop

## [1] "Slow down!"
## [1] "Slow down!"
## [1] "Slow down!"
## [1] "Slow down!"
## [1] "Slow down!"
## [1] 29

Throw in more conditionals

## [1] "Your speed is 64"
## [1] "Slow down big time!"
## [1] "Your speed is 53"
## [1] "Slow down big time!"
## [1] "Your speed is 42"
## [1] "Slow down!"
## [1] "Your speed is 36"
## [1] "Slow down!"

Build a while loop from scratch

## [1] 3
## [1] 6
## [1] 9
## [1] 12
## [1] 15
## [1] 18
## [1] 21
## [1] 24

For loop

  • Runs once for each element in the sequence
    • The sequence can be a range like 1:5 or the elements of a list, like a bunch of names
    • This is fundamental to programming in any language
  • However, I actually never use loops in R
    • I always modify data arrays with array functions. They are WAY faster
    • The apply functions covered later are one way to do it.
    • dplyr is the best way. All the looping happens in compiled code so it's super fast.
    • If you are looping on an array in R, you should probably rethink your approach.
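
To make the comparison concrete, here is a sketch of the same computation written as a for loop and as a vectorized expression. The vector linkedin holds sample view counts and the doubling is an arbitrary operation picked for the example.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)  # sample daily view counts

# Loop version: visit each index and fill a preallocated result
doubled_loop <- numeric(length(linkedin))
for (i in seq_along(linkedin)) {
  doubled_loop[i] <- linkedin[i] * 2
}

# Vectorized version: the arithmetic is applied to the whole vector at once
doubled_vec <- linkedin * 2

same <- identical(doubled_loop, doubled_vec)
same  # TRUE
```

Both versions give the same answer; the vectorized one is shorter and leaves the looping to R's internals.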

Loop over a vector

## [1] 16
## [1] 9
## [1] 13
## [1] 5
## [1] 2
## [1] 17
## [1] 14
## [1] 16
## [1] 9
## [1] 13
## [1] 5
## [1] 2
## [1] 17
## [1] 14

Loop over a list

## [1] 8405837
## [1] "Manhattan"     "Bronx"         "Brooklyn"      "Queens"       
## [5] "Staten Island"
## [1] FALSE
## [1] 8405837
## [1] "Manhattan"     "Bronx"         "Brooklyn"      "Queens"       
## [5] "Staten Island"
## [1] FALSE

Loop over a matrix

## [1] "On row 1 and column 1 the board contains O"
## [1] "On row 1 and column 2 the board contains NA"
## [1] "On row 1 and column 3 the board contains X"
## [1] "On row 2 and column 1 the board contains NA"
## [1] "On row 2 and column 2 the board contains O"
## [1] "On row 2 and column 3 the board contains O"
## [1] "On row 3 and column 1 the board contains X"
## [1] "On row 3 and column 2 the board contains NA"
## [1] "On row 3 and column 3 the board contains X"

Mix it up with control flow

## [1] "You're popular!"
## [1] 16
## [1] "Be more visible!"
## [1] 9
## [1] "You're popular!"
## [1] 13
## [1] "Be more visible!"
## [1] 5
## [1] "Be more visible!"
## [1] 2
## [1] "You're popular!"
## [1] 17
## [1] "You're popular!"
## [1] 14

Next, you break it

## [1] "You're popular!"
## [1] 16
## [1] "Be more visible!"
## [1] 9
## [1] "You're popular!"
## [1] 13
## [1] "Be more visible!"
## [1] 5
## [1] "Be more visible!"
## [1] "This is too embarrassing!"
## [1] "You're popular!"
## [1] "This is ridiculous, I'm outta here!"

Build a loop from scratch

##  [1] "R" "'" "s" " " "i" "n" "t" "e" "r" "n" "a" "l" "s" " " "a" "r" "e" " " "i"
## [20] "r" "r" "e" "f" "u" "t" "a" "b" "l" "y" " " "i" "n" "t" "r" "i" "g" "u" "i"
## [39] "n" "g"
## [1] 5

   


Functions

Introduction to Functions

  • You have already been using these a bunch.

Required, or optional?

  • There are required and optional arguments to a function
  • Some (or actually most) of the optional arguments have a default value
    • this will be set automatically
    • for example, read.table has header = FALSE by default

The beginning of the read.table function definition looks like this:

read.table(file, header = FALSE, sep = "", quote = "\"'", ...

In the read.table() function:

  • file is required
  • header, sep, and quote are optional arguments
    • header defaults to FALSE
    • sep defaults to an empty string ""
    • quote defaults to "\"'", so both " and ' are treated as quote characters
  • You do not have to write all the argument names
    • read.table("myfile.txt", TRUE, "-") will work.
  • But the order matters if you use that shortcut
    • read.table("myfile.txt", "-", TRUE) will throw an error.
  • You can use any order if you specify all the names and values
    • read.table("myfile.txt", sep = "-", header = TRUE) will work.
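
A runnable sketch of positional versus named arguments. It writes a tiny dash-separated file to a temporary path first; the file contents and the variable names are made up for the example.

```r
# Create a small throwaway file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("name-year", "gauss-1777", "bayes-1702"), path)

# Positional: the 2nd and 3rd arguments are matched to header and sep
by_position <- read.table(path, TRUE, "-")

# Named: the order no longer matters
by_name <- read.table(path, sep = "-", header = TRUE)

unlink(path)  # clean up the temporary file
by_name
```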

Writing Functions

Function scoping

  • Variables defined inside a function are not available outside of that function
    • Nor are the variable names given to the input
    • Calling x or y outside of this function would fail
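
A short sketch of function scoping. The function add_pair and the variable total_inside are hypothetical names used only for this illustration.

```r
add_pair <- function(x, y) {
  total_inside <- x + y   # exists only while the function is running
  total_inside
}

result <- add_pair(2, 3)
result  # 5

# The local variable did not leak into the calling environment
leaked <- exists("total_inside", inherits = FALSE)
leaked  # FALSE
```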

R passes arguments by value

  • In other words, a function won’t change the original variable passed in
## [1] 15
## [1] 5
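
One function that would print those two values is sketched here. The name triple and the starting value 5 are assumptions, not necessarily the original exercise code.

```r
triple <- function(x) {
  x <- 3 * x   # changes the function's local copy only
  x
}

a <- 5
tripled <- triple(a)
tripled  # 15
a        # still 5
```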

R you functional? (2)

## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] 33
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] 33

R Packages

  • Many great functions are available through packages
  • The base package is installed with R and loaded when you start R
  • The others you need to load yourself when you want them
  • I think the reason R is so useful today is because it has amazing packages
    • Especially the ones made by Hadley Wickham or the folks at RStudio
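
A minimal sketch of attaching a package and checking the search path. It uses tools, which ships with R, so nothing has to be installed; the choice of package is arbitrary.

```r
# search() lists the environments R looks through, in order
search()

library(tools)   # attach the package to the search path
attached <- "package:tools" %in% search()
attached  # TRUE

# Packages from CRAN must be installed once before library() can find them:
# install.packages("ggplot2")   # not run here; needs network access
```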

Load an R Package

##  [1] ".GlobalEnv"        "package:codetools" "package:shiny"    
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  "package:methods"  
## [10] "Autoloads"         "package:base"

##  [1] ".GlobalEnv"        "package:ggplot2"   "package:codetools"
##  [4] "package:shiny"     "package:stats"     "package:graphics" 
##  [7] "package:grDevices" "package:utils"     "package:datasets" 
## [10] "package:methods"   "Autoloads"         "package:base"

The apply family

lapply

  • applies a function to each element of a list/vector
  • less code than writing out a whole loop
  • also faster
  • always returns a list
    • wrap in unlist() if you want a vector
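
A sketch of lapply with a built-in function and with an anonymous function. The input strings are an assumption chosen to resemble the names and years in the output that follows.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# strsplit() returns a list with one character vector per input string
split_up <- strsplit(pioneers, split = ":")

# Built-in function applied to each list element
lowered <- lapply(split_up, tolower)

# Anonymous function: keep only the first element (the name)
names_only <- lapply(lowered, function(x) x[1])

# lapply always returns a list; unlist() flattens it into a vector
name_vec <- unlist(names_only)
name_vec
```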

Use lapply with a built-in R function

## [[1]]
## [1] "GAUSS" "1777" 
## 
## [[2]]
## [1] "BAYES" "1702" 
## 
## [[3]]
## [1] "PASCAL" "1623"  
## 
## [[4]]
## [1] "PEARSON" "1857"
## [[1]]
## [1] "gauss" "1777" 
## 
## [[2]]
## [1] "bayes" "1702" 
## 
## [[3]]
## [1] "pascal" "1623"  
## 
## [[4]]
## [1] "pearson" "1857"
## List of 4
##  $ : chr [1:2] "gauss" "1777"
##  $ : chr [1:2] "bayes" "1702"
##  $ : chr [1:2] "pascal" "1623"
##  $ : chr [1:2] "pearson" "1857"
## [[1]]
## [1] "gauss" "1777" 
## 
## [[2]]
## [1] "bayes" "1702" 
## 
## [[3]]
## [1] "pascal" "1623"  
## 
## [[4]]
## [1] "pearson" "1857"

lapply and anonymous functions

## [[1]]
## [1] "gauss"
## 
## [[2]]
## [1] "bayes"
## 
## [[3]]
## [1] "pascal"
## 
## [[4]]
## [1] "pearson"
## [[1]]
## [1] "1777"
## 
## [[2]]
## [1] "1702"
## 
## [[3]]
## [1] "1623"
## 
## [[4]]
## [1] "1857"

Apply functions that return NULL

##  num 1
##  chr "a"
##  logi TRUE
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## List of 3
##  $ : num 1
##  $ : chr "a"
##  $ : logi TRUE
##  logi TRUE

sapply

  • Stands for simplified apply
  • Like lapply, but returns a vector if it can
  • It names the result by default when the input is a character vector or already has names
  • If each call returns several values of the same length, it returns a matrix
  • If it can’t return a vector or matrix it will return a list
  • Be careful with this if you expect a certain data type returned in a program
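
The three possible return shapes can be sketched like this. The list temps is sample data and the helper functions are written just for the example.

```r
temps <- list(c(3, 7, 9, 6, -1), c(6, 9, 12, 13, 5), c(4, 8, 3, -1, -3))  # sample data

# One value per element: simplified to a numeric vector
mins <- sapply(temps, min)

# Two values per element: simplified to a matrix, one column per element
extremes <- sapply(temps, function(x) c(min = min(x), max = max(x)))

# Results of different lengths: cannot be simplified, so a list comes back
below_zero <- sapply(temps, function(x) x[x < 0])

class(mins)        # "numeric"
dim(extremes)      # 2 3
class(below_zero)  # "list"
```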

How to use sapply

## [[1]]
## [1] -1
## 
## [[2]]
## [1] 5
## 
## [[3]]
## [1] -3
## 
## [[4]]
## [1] -2
## 
## [[5]]
## [1] 2
## 
## [[6]]
## [1] -3
## 
## [[7]]
## [1] 1
## [1] -1  5 -3 -2  2 -3  1
## [[1]]
## [1] 9
## 
## [[2]]
## [1] 13
## 
## [[3]]
## [1] 8
## 
## [[4]]
## [1] 7
## 
## [[5]]
## [1] 9
## 
## [[6]]
## [1] 9
## 
## [[7]]
## [1] 9
## [1]  9 13  8  7  9  9  9

sapply with function returning vector

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]   -1    5   -3   -2    2   -3    1
## [2,]    9   13    8    7    9    9    9
## [[1]]
## [1] -1  9
## 
## [[2]]
## [1]  5 13
## 
## [[3]]
## [1] -3  8
## 
## [[4]]
## [1] -2  7
## 
## [[5]]
## [1] 2 9
## 
## [[6]]
## [1] -3  9
## 
## [[7]]
## [1] 1 9

sapply can’t simplify, now what?

## [1] -1 -1 -3
## [[1]]
## [1] -1
## 
## [[2]]
## numeric(0)
## 
## [[3]]
## [1] -1 -3
## 
## [[4]]
## [1] -2
## 
## [[5]]
## numeric(0)
## 
## [[6]]
## [1] -3
## 
## [[7]]
## numeric(0)
## [[1]]
## [1] -1
## 
## [[2]]
## numeric(0)
## 
## [[3]]
## [1] -1 -3
## 
## [[4]]
## [1] -2
## 
## [[5]]
## numeric(0)
## 
## [[6]]
## [1] -3
## 
## [[7]]
## numeric(0)
## [1] TRUE

sapply with functions that return NULL

## The average temperature is 4.8 
## The average temperature is 9 
## The average temperature is 2.2 
## The average temperature is 2.4 
## The average temperature is 5.4 
## The average temperature is 4.6 
## The average temperature is 4.6
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
## 
## [[6]]
## NULL
## 
## [[7]]
## NULL
## The average temperature is 4.8 
## The average temperature is 9 
## The average temperature is 2.2 
## The average temperature is 2.4 
## The average temperature is 5.4 
## The average temperature is 4.6 
## The average temperature is 4.6
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
## 
## [[6]]
## NULL
## 
## [[7]]
## NULL

Reverse engineering sapply

  • this uses an anonymous function
  • the result will have 3 rows and 2 columns
##           [,1]       [,2]
## min  0.4230623 0.01313717
## mean 0.7350797 0.46165963
## max  0.9431425 0.74547888

vapply

  • this is safer than sapply
    • because sapply can return a vector or a list (if result lengths differ)
  • you tell it what type and length each result should have (the FUN.VALUE argument)
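
A sketch of how vapply enforces the return type. The data in temps and the helper basics are assumptions made for the illustration.

```r
temps <- list(c(3, 7, 9, 6, -1), c(6, 9, 12, 13, 5), c(4, 8, 3, -1, -3))  # sample data

# Hypothetical helper returning three named numbers
basics <- function(x) c(min = min(x), mean = mean(x), max = max(x))

# FUN.VALUE = numeric(3): every call must return a numeric vector of length 3
summary_mat <- vapply(temps, basics, FUN.VALUE = numeric(3))
summary_mat

# A mismatch is an error instead of a silently different return type
mismatch <- tryCatch(
  vapply(temps, basics, FUN.VALUE = numeric(2)),
  error = function(e) "vapply stopped: result does not match FUN.VALUE"
)
mismatch
```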

Use vapply

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] -1.0    5 -3.0 -2.0  2.0 -3.0  1.0
## [2,]  4.8    9  2.2  2.4  5.4  4.6  4.6
## [3,]  9.0   13  8.0  7.0  9.0  9.0  9.0

Use vapply (2)

##        [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## min    -1.0    5 -3.0 -2.0  2.0 -3.0  1.0
## mean    4.8    9  2.2  2.4  5.4  4.6  4.6
## median  6.0    9  3.0  2.0  5.0  5.0  4.0
## max     9.0   13  8.0  7.0  9.0  9.0  9.0

From sapply to vapply

## [[1]]
## [1]  3  7  9  6 -1
## 
## [[2]]
## [1]  6  9 12 13  5
## 
## [[3]]
## [1]  4  8  3 -1 -3
## 
## [[4]]
## [1]  1  4  7  2 -2
## 
## [[5]]
## [1] 5 7 9 4 2
## 
## [[6]]
## [1] -3  5  8  9  4
## 
## [[7]]
## [1] 3 6 9 4 1
## [1]  9 13  8  7  9  9  9
## [1]  9 13  8  7  9  9  9
## [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
## [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
## [1] "Pretty cold!"  "Not too cold!" "Pretty cold!"  "Pretty cold!" 
## [5] "Not too cold!" "Pretty cold!"  "Pretty cold!"
## [1] "Pretty cold!"  "Not too cold!" "Pretty cold!"  "Pretty cold!" 
## [5] "Not too cold!" "Pretty cold!"  "Pretty cold!"

   


Utilities

Useful Functions

The class video goes through some useful functions. I’ll list them here. There are many more than this.

  • abs()
    • calc the absolute value of a vector
  • sum()
    • calc the sum of a vector
  • mean()
    • calc the mean value of a vector
  • seq()
    • create a sequence
  • rep()
    • replicate the elements of a vector
  • sort()
    • sort a vector
  • str()
    • see the structure of an object
  • is.*(), as.*()
    • check your data type or change it
  • append(), rev()
    • append and reverse vectors
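
A quick tour of the functions listed above, using a made-up vector x.

```r
x <- c(1.5, -2, 3, -4.25)  # made-up values

abs_total <- sum(abs(x))          # 1.5 + 2 + 3 + 4.25 = 10.75
avg       <- mean(x)              # arithmetic mean of x
steps     <- seq(1, 10, by = 3)   # 1 4 7 10
repeated  <- rep(c(1, 3), times = 2)      # 1 3 1 3
ordered   <- sort(x, decreasing = TRUE)   # largest first
reversed  <- rev(append(x, 10))   # add 10 at the end, then reverse

is.numeric(x)      # TRUE
as.character(x)    # convert the numbers to strings
str(x)             # compact description of the object
```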

Find the error (2)

## [1] 9
##  [1] 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7 1 3 5 7

Beat Gauss using R

##   [1]   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46  49  52
##  [19]  55  58  61  64  67  70  73  76  79  82  85  88  91  94  97 100 103 106
##  [37] 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151 154 157 160
##  [55] 163 166 169 172 175 178 181 184 187 190 193 196 199 202 205 208 211 214
##  [73] 217 220 223 226 229 232 235 238 241 244 247 250 253 256 259 262 265 268
##  [91] 271 274 277 280 283 286 289 292 295 298 301 304 307 310 313 316 319 322
## [109] 325 328 331 334 337 340 343 346 349 352 355 358 361 364 367 370 373 376
## [127] 379 382 385 388 391 394 397 400 403 406 409 412 415 418 421 424 427 430
## [145] 433 436 439 442 445 448 451 454 457 460 463 466 469 472 475 478 481 484
## [163] 487 490 493 496 499
##  [1] 1200 1193 1186 1179 1172 1165 1158 1151 1144 1137 1130 1123 1116 1109 1102
## [16] 1095 1088 1081 1074 1067 1060 1053 1046 1039 1032 1025 1018 1011 1004  997
## [31]  990  983  976  969  962  955  948  941  934  927  920  913  906
## [1] 87029

Regular Expressions

  • These are just sequences of characters and meta-characters that can match a pattern
  • They are used for
    • pattern existence
    • pattern replacement
    • pattern extraction
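
Each of the three uses can be sketched with base R functions. The email addresses and the patterns are sample choices for the illustration.

```r
emails <- c("john.doe@datacamp.edu", "invalid.edu", "quant@datacamp.edu")  # sample data

# Existence: an @, then anything, then ".edu" at the end of the string
has_edu <- grepl("@.*\\.edu$", emails)

# Replacement: sub() changes the first match in each string, gsub() every match
first_dot <- sub("\\.", "_", emails)
all_dots  <- gsub("\\.", "_", emails)

# Extraction: the @ and everything after it; strings without a match are dropped
at_parts <- regmatches(emails, regexpr("@.*$", emails))

has_edu    # TRUE FALSE TRUE
at_parts   # "@datacamp.edu" "@datacamp.edu"
```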

sub and gsub

## [1] "john.doedatacamp.edu"     "education@world.gov"     
## [3] "dalai.lama@peace.org"     "invalid.edu"             
## [5] "quantdatacamp.edu"        "cookie.monster@sesame.tv"
## [1] "john.doe@datacamp.edu"    "education@world.gov"     
## [3] "dalai.lama@peace.org"     "invalid.edu"             
## [5] "quant@datacamp.edu"       "cookie.monster@sesame.tv"

Times and Dates

Time is of the essence

##        spring        summer          fall        winter 
## "20-Mar-2015" "25-Jun-2015" "23-Sep-2015" "22-Dec-2015"
##            spring            summer              fall            winter 
##     "March 1, 15"      "June 1, 15" "September 1, 15"  "December 1, 15"
##  Named chr [1:4] "20-Mar-2015" "25-Jun-2015" "23-Sep-2015" "22-Dec-2015"
##  - attr(*, "names")= chr [1:4] "spring" "summer" "fall" "winter"
##  Date[1:4], format: "2015-03-20" "2015-06-25" "2015-09-23" "2015-12-22"
##  Named chr [1:4] "March 1, 15" "June 1, 15" "September 1, 15" ...
##  - attr(*, "names")= chr [1:4] "spring" "summer" "fall" "winter"
##  Date[1:4], format: "2015-03-01" "2015-06-01" "2015-09-01" "2015-12-01"
## Time difference of 24 days

   


The End

  • Woof, that's a lot!
    • And that's just a sample of some key programming topics in R
  • There's a lot more to learn in each of those topics
    • Especially the functions and packages
    • There are a lot of key packages to work with dataframes or times
    • There are packages for everything really
  • This doc is useful to me as a reference
    • Now I just need to see it again a few times to really get it
    • And I can look back here if I ever need a little refresher in the future