Jeho Park
June 1, 2016
HMC R Bootcamp 2016
DAY 2
Basics
1) Try installing a new package (lmtest) from the Console.
2) Try installing another package (ggplot2) from the Packages pane.
Challenges
3) From the hflights dataset, create a subsample of flights whose ArrDelay is greater than 10 hours (600 min) and name it hflightsSub.
4) Set all of the extreme delays in ArrDelay (more than 300 minutes) in hflightsSub to NA.
5) Convert the data frame, d, back to a matrix named m1 and compare the size of m and m1
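One possible way to approach challenges 3 to 5, sketched on a small made-up data frame rather than the real hflights data. The `toy` data frame and its values are assumptions for illustration; only the ArrDelay column name comes from the exercise.

```r
# Toy stand-in for hflights; the values are invented for illustration
toy <- data.frame(ArrDelay = c(5, 720, NA, 310, 650, -3),
                  DepDelay = c(0, 700, 12, 305, 640, 1))

# which() drops NA comparisons, so rows with a missing ArrDelay are excluded
toySub <- toy[which(toy$ArrDelay > 600), ]

# Set extreme values to NA in a copy of the full toy data
toyNA <- toy
toyNA$ArrDelay[which(toyNA$ArrDelay > 300)] <- NA

# Data frame -> matrix; dim() reports rows and columns
m1 <- as.matrix(toy)
dim(m1)
```

Using which() rather than a bare logical index is a design choice here: a logical index containing NA would return extra all-NA rows.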
attach(mtcars) # Attach mtcars to search path
plot(wt, mpg) # notice objects are called by their names, not mtcars$wt
plot(wt, mpg,
     main = "Regression of MPG on Weight",
     xlab = "Weight",
     ylab = "MPG")
plot(wt, mpg, ann = FALSE)
abline(h=25) # a reference line
abline(lm(mpg~wt)) # look at the argument, what's lm?
title(main = "Regression of MPG on Weight", xlab = "Weight", ylab = "MPG")
par() # view current settings
orig_par <- par(no.readonly = TRUE) # save current settings (only those that can be restored)
par(col.lab="red") # red x and y labels
plot(wt, mpg) # create a plot with these new settings
par(orig_par) # restore original settings
plot(wt, mpg)
plot(wt, mpg, col.lab="red") # change settings within plot()
?par # see all the options
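An alternative save-and-restore pattern, as a minimal sketch: when par() is used to set a parameter, it invisibly returns the previous value, which can be passed back to par() later. The pdf(NULL) call is an assumption added so the sketch runs without a display; drop it to draw on screen.

```r
pdf(NULL)                      # null device so the sketch runs without a display (assumption)
before <- par("col.lab")       # label colour currently in effect
old <- par(col.lab = "red")    # par() invisibly returns the old value(s)
plot(mtcars$wt, mtcars$mpg)    # axis labels are drawn in red
par(old)                       # restore only what was changed
after <- par("col.lab")
dev.off()
```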
You may use a smaller version of the hflights dataset,
e.g., hflightsSub <- hflights[sample(1:nrow(hflights), 10000, replace = FALSE), ]
Basics
1) Plot a histogram of the flight delays with negative delays set to zero, censoring delay times at a maximum of 60 minutes.
2) Plot the arrival delay against the departure delay as a scatterplot, and give it a main title and axis labels.
3) Output it as a PDF and see if you'd be comfortable with including it in a report/paper.
4) Make a boxplot of the departure delay as a function of the day of week.
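A sketch of the censoring and boxplot steps on simulated numbers. The `delay` and `dow` vectors are invented stand-ins for the hflights columns, and pdf(NULL) is used only so the code runs without a display.

```r
set.seed(1)
# Simulated delays in minutes and days of week; stand-ins for the real columns (assumption)
delay <- round(rnorm(500, mean = 10, sd = 40))
dow   <- sample(1:7, 500, replace = TRUE)

# Negative delays become 0, anything above 60 becomes 60
delay_cens <- pmin(pmax(delay, 0), 60)

pdf(NULL)                      # null device; drop this line to draw on screen
hist(delay_cens, main = "Censored delays", xlab = "Delay (min)")
boxplot(delay ~ dow, xlab = "Day of week", ylab = "Delay (min)")
dev.off()
```

pmax() and pmin() work element by element, so no loop is needed for the censoring.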
save.image("r-bootcamp.Rdata") # save workspace
rm(list=ls()) # remove all objects
load("r-bootcamp.Rdata") # bring the workspace back
save.image() # by default it saves the workspace to .RData
curr_wd <- getwd() # returns absolute path to the working directory
setwd("data") # change working directory to data folder
setwd(file.path('~', 'Desktop'))
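A small sketch of switching directories and coming back. setwd() invisibly returns the directory that was current before the change; tempdir() is used here only as a folder that is guaranteed to exist (an assumption for illustration).

```r
start_wd <- getwd()            # where we are now
old_wd <- setwd(tempdir())     # setwd() invisibly returns the previous directory
getwd()                        # now inside the session's temporary folder
setwd(old_wd)                  # go back
identical(getwd(), start_wd)
```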
R Packages
require("datasets") # load/attach datasets
ls('package:datasets')
airmiles # airmiles object in datasets package
airmiles <- 0 # Oops! overwritten?
datasets::airmiles # package namespace
rm(airmiles) # removes user defined object airmiles
Basics
1) Figure out what your current working directory is.
2) Change your working directory to another folder.
Challenges
3) Make a plot with the airline data and save it as a PDF on the Desktop. Then set the width and height arguments to very small values and see how that affects the resulting PDF. Do the same with very large values of width and height.
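A sketch of writing the same plot to two PDFs of different sizes. It uses the built-in airmiles series as a stand-in for the airline data and writes to tempdir() instead of the Desktop; both choices, and the file names, are assumptions for illustration.

```r
small_pdf <- file.path(tempdir(), "airmiles-small.pdf")
large_pdf <- file.path(tempdir(), "airmiles-large.pdf")

pdf(small_pdf, width = 3, height = 3)     # sizes are in inches
plot(airmiles, main = "US airline passenger miles")
dev.off()

pdf(large_pdf, width = 20, height = 20)
plot(airmiles, main = "US airline passenger miles")
dev.off()

file.exists(c(small_pdf, large_pdf))
```

Text size stays the same in both files, so labels look oversized in the small PDF and tiny in the large one. With much smaller sizes, plot() may stop with a "figure margins too large" error.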
Module 3: Managing R Project
Take a break!
mult_fun <- function(a = 1, b = 1) {
  return(a*b)
}
mult_fun # show the function's code
mult_fun(2,3) # function call
mult_fun() # would this be an error?
x <- 10; y <- 20
x + y
`+`(x, y)
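Because operators are ordinary functions, they can be passed to other functions. A brief sketch; the names `plus_ten` and `total` are made up for the example.

```r
x <- 10; y <- 20
identical(x + y, `+`(x, y))          # the operator and the function call agree

plus_ten <- sapply(1:3, `+`, 10)     # add 10 to each element
total <- Reduce(`+`, 1:4)            # ((1 + 2) + 3) + 4
plus_ten
total
```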
for (i in 1:10) {
  print(i)
}
i <- 0
while (i < 5) {
  i <- i + 1
  print(i)
}
########## a bad loop, with 'growing' data
set.seed(42)
m <- 1000; n <- 1000
mymat <- replicate(m, rnorm(n)) # create matrix of normal random numbers
system.time(
  for (i in 1:m) {
    for (j in 1:n) {
      mymat[i,j] <- mymat[i,j] + 10*sin(0.75*pi)
    }
  }
)
#### vectorized version
set.seed(42)
m <- 1000; n <- 1000
mymat1 <- replicate(m, rnorm(n))
system.time(
  mymat1 <- mymat1 + 10*sin(0.75*pi)
)
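A sketch that checks, on a small matrix, that an element-by-element loop and the vectorized form give the same result. It then contrasts growing a vector inside a loop with filling a preallocated one. All object names here are made up for the example.

```r
set.seed(42)
small <- matrix(rnorm(20), nrow = 4)     # small matrix so the loop is quick
shift <- 10 * sin(0.75 * pi)

looped <- small
for (i in seq_len(nrow(small))) {
  for (j in seq_len(ncol(small))) {
    looped[i, j] <- small[i, j] + shift
  }
}
vectorized <- small + shift
all.equal(looped, vectorized)

# Growing a result one element at a time versus preallocating it
grown <- c()
for (k in 1:5) grown <- c(grown, k^2)    # builds a new, longer vector on every pass
prealloc <- numeric(5)
for (k in 1:5) prealloc[k] <- k^2        # fills an existing vector
identical(grown, prealloc)
```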
if (condition1) {
  # do this when condition1 == TRUE
} else if (condition2) {
  # do this when condition2 == TRUE
} else {
  # else do this
}
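A concrete instance of the template, as a minimal sketch. describe_sign() is a hypothetical helper, not part of the original material, and it assumes a single non-missing number as input.

```r
# describe_sign() is a hypothetical helper used only for illustration
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

describe_sign(-3)
describe_sign(0)
describe_sign(7)
```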
Stopping on a line
Read https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio
# assumes y (numeric) and debug (logical) are already defined
if (debug) {
  if (y < 0) {
    message("Y is negative")
  } else {
    message("Y is not negative")
  }
}
Basics
1) Write an R function that will take an input vector and set any negative values in the vector to zero.
Challenges
2) Write an R function that will take an input vector and set any value below a threshold to the value of the threshold. Optionally, the function should instead set values above the threshold to the value of the threshold.
3) Augment your function so that it checks that the input is a numeric vector and returns an error if not. (See the help information for stop().)
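One possible shape for a solution, offered as a sketch rather than the intended answer. The function name clamp_threshold() and its argument names are assumptions made up for this example.

```r
# clamp_threshold() is a hypothetical name for one possible solution
clamp_threshold <- function(x, threshold = 0, above = FALSE) {
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  if (above) {
    pmin(x, threshold)   # values above the threshold are pulled down to it
  } else {
    pmax(x, threshold)   # values below the threshold are raised to it
  }
}

clamp_threshold(c(-2, 0, 3))                                # negatives become 0
clamp_threshold(c(1, 5, 10), threshold = 4)                 # values below 4 become 4
clamp_threshold(c(1, 5, 10), threshold = 4, above = TRUE)   # values above 4 become 4
```

With the default threshold of 0 the same function also covers exercise 1.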