HMC R Bootcamp 2016

Jeho Park
June 1, 2016

DAY 2

Where We Left Off

  • Wrap-up Exercise (subsetting)
  • Git/GitHub tryout (homework)

Wrap-up exercise (from Day 1)

Basics
1) Try installing a new library (lmtest) from Console
2) Try installing another library (ggplot2) from Package pane

Challenges
3) From the hflights dataset, create a subsample of the flights whose ArrDelay is greater than 10 hours (600 min) and name it hflightsSub.
4) Set all of the extreme delays in ArrDelay (more than 300 minutes) in hflightsSub to NA.
5) Convert the data frame d back to a matrix named m1 and compare the sizes of m and m1.

And the Next

Agenda (Day 2)

  • Module 2: Working with Data (Part 2)
    • Graphics (base graphics)
    • Manipulating graphs
  • Module 3: Managing R Project
    • R workspace, working directory
    • R packages
  • Module 4: Programming in R
    • Functions
    • Loops and vectorization
    • If-else
    • Debugging
  • Module 5: Getting help and beyond

Module 2: Working with data (Part 2)

  • Graphics (base graphics)
  • Manipulating graphs

Graphics (base graphics)

  • Creating a graph
attach(mtcars) # Attach mtcars to search path
plot(wt, mpg) # notice objects are called by their names, not mtcars$wt
plot(wt, mpg, 
     main = "Regression of MPG on Weight",
     xlab = "Weight", 
     ylab = "MPG")
plot(wt, mpg, ann = FALSE) 
  • Changing/adding the details afterwards
abline(h=25) # a reference line
abline(lm(mpg~wt)) # look at the argument, what's lm?
title(main = "Regression of MPG on Weight", xlab = "Weight", ylab = "MPG")
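
The question in the abline() comment above can be explored with a small sketch. It assumes only the built-in mtcars data; the names fit and coefs are illustrative, and pdf(NULL) is used only so the sketch runs without opening a plot window.

```r
# A minimal sketch: lm() fits a linear model, and abline() can draw its line.
pdf(NULL)                           # null device: draws without creating a file
fit <- lm(mpg ~ wt, data = mtcars)  # regress mpg on wt; no attach() needed
coefs <- coef(fit)                  # named vector: "(Intercept)" and "wt"
plot(mpg ~ wt, data = mtcars,       # formula interface: wt on x, mpg on y
     main = "Regression of MPG on Weight",
     xlab = "Weight", ylab = "MPG")
abline(fit)                         # takes intercept and slope from the fit
invisible(dev.off())                # close the device
```

Passing data = mtcars keeps the column names short without attaching the data frame to the search path.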

Manipulating graphs (base package)

  • par() customizes many features of graphs such as fonts, colors, axes, and titles
  • par(optionname=value, optionname=value, …)
par()              # view current settings
orig_par <- par(no.readonly = TRUE)  # save current settings (only those that can be reset)
par(col.lab="red") # red x and y labels 
plot(wt, mpg)      # create a plot with these new settings 
par(orig_par)      # restore original settings
plot(wt, mpg)
plot(wt, mpg, col.lab="red") # change settings within plot()
?par # see all the options
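
A lighter-weight variant of the save/restore pattern above, as a sketch: when par() is called with new values, it invisibly returns the previous values of just those parameters. The names old, during, and after are illustrative, and pdf(NULL) is used only to avoid opening a plot window.

```r
# Sketch: restore only the parameters that were changed.
pdf(NULL)                     # null device, no file is written
old <- par(col.lab = "red")   # returns the previous col.lab in a list
during <- par("col.lab")      # query the setting now in effect
plot(mtcars$wt, mtcars$mpg)   # this plot would have red axis labels
par(old)                      # put the previous value back
after <- par("col.lab")       # query again after restoring
invisible(dev.off())
```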

Wrap-up exercise

You may use a smaller version of the hflights dataset,
e.g., hflightsSub <- hflights[sample(1:nrow(hflights), 10000, replace = FALSE), ]

Basics
1) Plot a histogram of the flight delays with negative delays set to zero, censoring delay times at a maximum of 60 minutes.
2) Plot the arrival delay against the departure delay as a scatterplot, and give it a main title and axis labels.
3) Output it as a PDF and see if you'd be comfortable with including it in a report/paper.
4) Make a boxplot of the departure delay as a function of the day of week.
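
For the PDF step, the general pattern can be sketched with the built-in mtcars data rather than the exercise data. The file name and the 6 x 4 inch size are assumptions chosen for illustration; the file is written to R's temporary directory.

```r
# Sketch: open a PDF device, draw, then close the device to write the file.
pdf_path <- file.path(tempdir(), "mpg-vs-weight.pdf")  # hypothetical file name
pdf(pdf_path, width = 6, height = 4)  # width and height are in inches
plot(mtcars$wt, mtcars$mpg,
     main = "MPG vs. Weight", xlab = "Weight", ylab = "MPG")
invisible(dev.off())                  # the file is complete only after this
file.exists(pdf_path)
```

Text size is fixed in points, so shrinking width and height makes labels look large relative to the plot, and enlarging them makes labels look small.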

Managing R Project

  • R Workspace
    • Current R working environment including any user-defined objects
    • Save and load workspace
  save.image("r-bootcamp.Rdata") # save workspace  
  rm(list=ls()) # remove all objects
  load("r-bootcamp.Rdata") # bring the workspace back
  save.image() # by default it saves the workspace to .RData in the working directory
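
save.image() is a shortcut for calling save() on every object in the workspace. As a sketch of the same round trip with only selected objects, the example below uses save() and load(); the object names and the temporary file are assumptions for illustration.

```r
# Sketch: save two objects, remove them, and bring them back.
scores <- c(90, 85, 77)
label <- "day2"
rdata_file <- tempfile(fileext = ".RData")  # temporary file, not a project file
save(scores, label, file = rdata_file)      # write just these two objects
rm(scores, label)                           # they are gone now
restored <- load(rdata_file)                # load() returns the restored names
restored
```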

Managing R Project (cont.)

  • R working directory
    • the folder where R reads and saves files by default
    • getwd() and setwd()
curr_wd <- getwd() # returns absolute path to the working directory
setwd("data") # change working directory to data folder
setwd(file.path('~', 'Desktop'))
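
One pattern worth knowing, shown as a sketch: setwd() invisibly returns the previous working directory, so a temporary change is easy to undo. The temporary directory and the file name scratch.txt are assumptions for illustration.

```r
# Sketch: change the working directory temporarily, then restore it.
start_wd <- getwd()
old_wd <- setwd(tempdir())            # returns the directory we just left
invisible(file.create("scratch.txt")) # relative path resolves in the new wd
created <- file.exists(file.path(tempdir(), "scratch.txt"))
setwd(old_wd)                         # back to where we started
unlink(file.path(tempdir(), "scratch.txt"))  # clean up the scratch file
```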

Managing R Project (cont.)

R Packages

  • Installing packages
  • Loading and attaching packages
    • library() vs. require()
  • Package namespace
require("datasets") # load/attach datasets
ls('package:datasets') 
airmiles # airmiles object in datasets package
airmiles <- 0 # Oops! overwritten?
datasets::airmiles # package namespace
rm(airmiles) # removes user defined object airmiles
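
The practical difference between library() and require() shows up when a package is missing. The sketch below uses a made-up package name, notARealPackage123, which is assumed not to be installed.

```r
# Sketch: how library() and require() react to a missing package.
pkg <- "notARealPackage123"  # hypothetical name, assumed not installed

# require() warns and returns FALSE, so the script keeps running
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library() stops with an error, caught here so the sketch can continue
outcome <- tryCatch({
  library(pkg, character.only = TRUE)
  "attached"
}, error = function(e) "error")

loaded
outcome
```

Because library() fails immediately, it is usually the safer choice at the top of a script; require() suits code that checks the returned value.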

Wrap-up Exercise

Basics
1) Figure out what your current working directory is.
2) Change your working directory to another folder.

Challenges
3) Make a plot with the airline data. Save it as a PDF on the Desktop. Now set the width and height arguments to be very small and see how that affects the resulting PDF. Do the same with width and height set to be very large.

Solutions and Discussions

Module 3: Managing R Project

Break

Take a break!

Module 4: Programming in R

  • Functions
  • Loops and vectorization
  • if-else
  • Debugging

User-defined Functions

  • Modularize your code by encapsulating a set of operations
  • Eliminate redundancy
  • Increase readability

User-defined Functions (cont)

mult_fun <- function(a = 1, b = 1) {
  return(a*b)
}

mult_fun  # show the function's code
mult_fun(2,3) # function call
mult_fun() # would this be an error?
  • A function returns the value of the last expression evaluated (return() is optional)
  • A function returns only one object; use a list to return multiple objects
  • Every operation in R is a function call
x <- 10; y <- 20
x + y
`+`(x, y)
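
The point about returning multiple objects can be sketched as follows. The function describe_vec and its element names are hypothetical, chosen only for illustration.

```r
# Sketch: bundle several results in one named list.
describe_vec <- function(x) {
  # the list is the last expression evaluated, so it is the return value
  list(n = length(x), total = sum(x), average = mean(x))
}

stats <- describe_vec(c(2, 4, 6))
stats$n        # number of elements
stats$total    # sum of the elements
stats$average  # mean of the elements
```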

Loops

  • For loops
for(i in 1:10) {
  print(i)
}
  • While loops
i <- 0
while(i < 5) {
  i <- i + 1 
  print(i)
}

Vectorization

########## a slow loop, updating the matrix one element at a time
set.seed(42);
m=1000; n=1000;
mymat <- replicate(m, rnorm(n)) # create matrix of normal random numbers
system.time(
  for (i in 1:m) {
    for (j in 1:n) {
      mymat[i,j] <- mymat[i,j] + 10*sin(0.75*pi)
    }
  }
)
#### vectorized version
set.seed(42);
m=1000; n=1000;
mymat1 <- replicate(m, rnorm(n))
system.time(
  mymat1 <- mymat1 + 10*sin(0.75*pi)
)
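
As a sanity check on the comparison above, the sketch below confirms on a small matrix that the loop and the vectorized expression give the same result. It also shows what "growing" an object in a loop means, next to the preallocated alternative. All names and sizes here are assumptions for illustration.

```r
# Sketch: the element-wise loop and the vectorized form agree.
set.seed(42)
base_mat <- matrix(rnorm(50 * 40), nrow = 50, ncol = 40)
shift <- 10 * sin(0.75 * pi)

looped <- base_mat
for (i in seq_len(nrow(looped))) {
  for (j in seq_len(ncol(looped))) {
    looped[i, j] <- looped[i, j] + shift
  }
}
vectorized <- base_mat + shift   # one call applied to every element
all.equal(looped, vectorized)

# "Growing" data: the vector is copied and extended on every iteration
grown <- c()
for (i in 1:5) grown <- c(grown, i^2)

# Preallocating: create the full-length vector once, then fill it in
prealloc <- numeric(5)
for (i in 1:5) prealloc[i] <- i^2
```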

If-else statement

if (condition1) {
  # do this when condition1 == TRUE
} else if (condition2) {
  # do this when condition2 == TRUE
} else {
  # else do this
}
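
A concrete instance of the template above, as a sketch. The function classify_delay and its cut-offs are hypothetical. if() tests a single value; for a whole vector, the vectorized ifelse() is the usual tool.

```r
# Sketch: classify one delay value (in minutes) with if / else if / else.
classify_delay <- function(delay_min) {
  if (delay_min <= 0) {
    "on time or early"
  } else if (delay_min <= 60) {
    "minor delay"
  } else {
    "major delay"
  }
}

one_flight <- classify_delay(30)

# ifelse() evaluates the condition element by element
many_flights <- ifelse(c(-5, 30, 120) > 60, "major", "other")
```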

Debugging R Code

Good Coding Style Practices

  • Always give meaningful names
    • fit-models.R instead of foo.r
    • day_1 instead of d1
  • Spacing around all infix operators (=, +, -, *, etc.)
    • Good: average <- mean(feet / 12 + inches, na.rm = TRUE)
    • Bad: average<-mean(feet/12+inches,na.rm=TRUE)
  • Always indent the code inside curly braces.
if (y < 0 && debug) {
  message("Y is negative")
} else {
  message("Y is not negative")
}

Wrap-up Exercise

Basics
1) Write an R function that will take an input vector and set any negative values in the vector to zero.

Challenges
2) Write an R function that will take an input vector and set any value below a threshold to be the value of threshold. Optionally, the function should instead set values above a threshold to the value of the threshold.

3) Augment your function so that it checks that the input is a numeric vector and return an error if not. (See the help information for stop().)

Solutions and discussions

1)

2)

3)
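
One possible set of answers, offered as a sketch rather than the official solutions. The function names, the argument names, and the default threshold of 0 are all assumptions.

```r
# 1) Set negative values to zero using logical subsetting.
zero_negatives <- function(x) {
  x[x < 0] <- 0
  x
}

# 2) and 3) Clamp values at a threshold, with an input check.
#    above = FALSE raises values below the threshold up to it;
#    above = TRUE lowers values above the threshold down to it.
apply_threshold <- function(x, threshold = 0, above = FALSE) {
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  if (above) {
    x[x > threshold] <- threshold
  } else {
    x[x < threshold] <- threshold
  }
  x
}

zero_negatives(c(-1, 2, -3))
apply_threshold(c(1, 5, 10), threshold = 4)
apply_threshold(c(1, 5, 10), threshold = 4, above = TRUE)
```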

Module 5: Getting Help and Beyond