11 February 2016

What's the point?

The statistical language R has many advantages: it

  • provides a complete system for carrying out and communicating data science
  • is incredibly versatile (free, lots of user-contributed packages)
  • is easy to learn (a good first language?)
  • is widely used and growing in popularity

This talk will present some of the ways I've used R as a teaching tool and give you some hands-on examples to work with.

Who am I?

  • Instructor in Earth Sciences at Penn State DuBois
  • Soon to be a Scientific Programmer at main campus
  • Things I like: rocks, writing code, teaching people to write code
  • Author/editor of a free textbook on R programming
  • You can find me at http://sites.psu.edu/papplegate/

What I used to do

Where I am now

A day in the life (photo by Steve Harmic)

Something I made (leanpub.com/raes)

Each chapter in the book includes

  • an introduction to the statistical and/or Earth science topics being taught
  • an explanation of an existing piece of R code
  • an exercise that often involves modifying the existing code
  • questions about the exercise (useful for homework or classroom discussion)
  • a bibliography including links to additional reading on the chapter's subject

Sea level: From emissions to increased flooding risks

A sample exercise, part 1

Sea level data from Jevrejeva et al. (2008).

Data from Jevrejeva et al. (2008). How much will sea level rise in the future?

What this looks like in R

# Make a directory, go get a data file from
# the Internet, and put the file in the directory.  
dir.create("data", showWarnings = FALSE) # no warning if "data" already exists
download.file("http://www.psmsl.org/products/reconstructions/gslGRL2008.txt", 
              "data/gslGRL2008.txt", method = "curl")

# Read in the data file.  
sl.data <- read.table("data/gslGRL2008.txt", skip = 14)

# Plot the data.  
plot(sl.data[, 1], sl.data[, 2], # time, sea level
     type = "l", # plot a curve (not points)
     xlab = "Time (years)", # x-axis label
     ylab = "Sea level anomaly (mm)", # y-axis label
     xlim = c(1700, 2100), # x-axis extent
     ylim = c(-250, 500)) # y-axis extent

A sample exercise, part 2

Sea level data from Jevrejeva et al. (2008).

Fitting a polynomial to the data gives us an idea about future changes, but how sure can we be about this answer?
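A minimal sketch of the fitting step, under stated assumptions: the slides do not say which polynomial order was used, so a second-order fit is assumed here, and the data frame `toy` is a synthetic stand-in (a quadratic trend plus noise), not the Jevrejeva et al. record. All variable names are hypothetical.

```r
# Synthetic stand-in for the sea level record (NOT the real data):
# a quadratic trend plus random noise, in mm, for 1700-2002.
set.seed(1)
toy <- data.frame(year = 1700:2002)
toy$sl <- 0.005 * (toy$year - 1800)^2 + rnorm(nrow(toy), sd = 10)

# Fit a second-order polynomial in time by least squares.
# (The polynomial order is an assumption made for this sketch.)
fit <- lm(sl ~ poly(year, 2), data = toy)

# Extrapolate the fitted curve to the year 2100.
future <- data.frame(year = 2003:2100)
projection <- predict(fit, newdata = future)

# Plot the data and the extrapolated fit.
plot(toy$year, toy$sl, type = "l",
     xlab = "Time (years)", ylab = "Sea level anomaly (mm)",
     xlim = c(1700, 2100), ylim = range(toy$sl, projection))
lines(future$year, projection, col = "red")
```

With the real data, `toy$year` and `toy$sl` would be replaced by the first two columns of `sl.data`.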

A sample exercise, part 3

We create replicates of the data with similar statistical properties by bootstrapping and re-fit the polynomial to each replicate.
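One way to sketch this step is a residual bootstrap: resample the residuals of the original fit with replacement, add them back to the fitted curve, and re-fit. This is an assumption about the method, not necessarily what the exercise does, and it ignores any autocorrelation in the real record. The data frame `toy`, the quadratic order, and `n.boot` are all hypothetical choices for illustration.

```r
# Synthetic stand-in for the sea level record (NOT the real data).
set.seed(42)
toy <- data.frame(year = 1700:2002)
toy$sl <- 0.005 * (toy$year - 1800)^2 + rnorm(nrow(toy), sd = 10)

# Original fit: keep the fitted curve and the residuals around it.
fit <- lm(sl ~ poly(year, 2), data = toy)
best.fit <- fitted(fit)
resids <- residuals(fit)

# For each replicate, add resampled residuals to the fitted curve,
# re-fit the polynomial, and store the projected sea level in 2100.
n.boot <- 200
sl.2100 <- numeric(n.boot)
for (i in seq_len(n.boot)) {
  boot.data <- data.frame(year = toy$year,
                          sl = best.fit + sample(resids, replace = TRUE))
  boot.fit <- lm(sl ~ poly(year, 2), data = boot.data)
  sl.2100[i] <- predict(boot.fit, newdata = data.frame(year = 2100))
}

# Spread of the projections for 2100 across replicates.
range(sl.2100)
```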

A sample exercise, part 4

Fitting polynomials to many bootstrap replicates gives a cone of uncertainty for possible future sea level rise. This method is too simple for professional use, but it makes a good classroom exercise.
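The cone can be sketched by projecting every bootstrap fit forward and taking percentiles across replicates for each future year. As before, this is an illustration under assumptions: `toy` is synthetic stand-in data, the fit is quadratic, the bootstrap resamples residuals, and the 5th and 95th percentiles and the 500 replicates are arbitrary choices.

```r
# Synthetic stand-in for the sea level record (NOT the real data).
set.seed(1)
toy <- data.frame(year = 1700:2002)
toy$sl <- 0.005 * (toy$year - 1800)^2 + rnorm(nrow(toy), sd = 10)
fit <- lm(sl ~ poly(year, 2), data = toy)
future <- data.frame(year = 2003:2100)

# One column of projected sea level per bootstrap replicate.
boot.proj <- replicate(500, {
  boot <- data.frame(year = toy$year,
                     sl = fitted(fit) +
                       sample(residuals(fit), replace = TRUE))
  predict(lm(sl ~ poly(year, 2), data = boot), newdata = future)
})

# 5th and 95th percentiles across replicates, for each future year.
cone <- apply(boot.proj, 1, quantile, probs = c(0.05, 0.95))

# Plot the data with the lower and upper edges of the cone.
plot(toy$year, toy$sl, type = "l",
     xlab = "Time (years)", ylab = "Sea level anomaly (mm)",
     xlim = c(1700, 2100), ylim = range(toy$sl, cone))
lines(future$year, cone[1, ], lty = 2)
lines(future$year, cone[2, ], lty = 2)
```

The dashed lines spread apart with distance from the data, which is the widening cone described on this slide.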

A complete system for data science

  • Has a clearly best IDE, RStudio
  • Has excellent plotting capabilities, extended by ggplot2
  • Has a system for integrating text, pictures, code, and code output into pretty documents (R Markdown)
  • Has a system for making simple Web apps from R code (shiny)

Incredibly versatile

  • Free and open-source
  • Works on all major platforms
  • Over 4,000 user-contributed packages on CRAN
  • Package installation is trivial (e.g., install.packages("ggplot2"))

Easy to learn (a good first language?)

  • Interpreted, not compiled – quick feedback on syntax errors
  • Can make plots out of the box
  • Excellent user community (Stack Overflow)
  • Lots of tutorial materials out there; here are some good ones

Widely used and growing in popularity

How you can get started using R

  • Download and install R
  • Download and install RStudio
  • Run the following commands from within RStudio:
install.packages("swirl")
library(swirl)
install_from_swirl("R Programming Alt")
swirl()

So, how are you feeling?

  • I've never used R before, but I'm sold! How do I get started?
    • Fantastic! Go download and install R and RStudio. Install and run swirl using the instructions on the preceding slide.
  • I've used R before, but I'm curious about what you're doing with it.
    • Thanks for your interest! Check out our textbook at leanpub.com/raes and maybe try Exercise #1.
  • I'm interested, but I don't want to install anything right now.