R Resources

Scott Chamberlain
2013-07-30

Links

This presentation can be viewed at

http://bit.ly/rresources

You can use/modify/etc. the code behind this presentation at

http://bit.ly/rresources_code

R can be your entire workflow

  • Data
  • Manipulation
  • Visualization
  • Analysis
  • Writing

Where do I start?!

  • Install R from here
  • Don't know what a function does?

?plot or ??plot for fuzzy search

or

help("plot")
help(package="ggplot2")

or just type the name of a function (without parentheses) to see its code

plot
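A few other base R helpers are useful when exploring an unfamiliar function. This is a minimal sketch; read.csv and mean are only illustrative choices, and read_funs is a name made up for this example.

```r
# List the arguments a function accepts without opening its help page
args(read.csv)

# Find objects on the search path whose names match a pattern
read_funs <- apropos("^read\\.")
print(read_funs)

# Run the examples from a function's help page
example(mean)
```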

More help please!

More help please! - continued

These are the best places to ask questions, and where you will get the fastest response to a specific query (search the R tag with “[r]”)

Googling R? Use “CRAN” instead of “R” (CRAN = Comprehensive R Archive Network)

  • Good for very broad searches, but you often just find stuff on the R help list or on one of the above sites

More on StackOverflow and related sites

Some helpful tips:

  • Do your homework. Search around the internet, and SO itself to make sure your question hasn't been answered already.
  • Reproducible examples. Abstract your specific problem to a very simple case, include all data and code needed to reproduce your problem.
  • Learn Markdown (aka MD). A very simple markup language to embed links, highlight code, add emphasis, etc.
    • MD is used to write stuff in SO, & many other places
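One way to follow the reproducible-example tip is to share data with dput(), which prints R code that recreates an object. The sketch below uses a slice of the built-in iris data as a stand-in for your own data; the object names (small, pasted, rebuilt) are made up for illustration.

```r
# A small slice of a built-in dataset stands in for "your data"
small <- head(iris[, c("Sepal.Length", "Species")], 3)

# dput() prints R code that recreates the object;
# paste that output into your question
dput(small)

# Anyone can rebuild the data by evaluating the pasted text
pasted <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
all.equal(small, rebuilt)

# Include your R version and loaded packages as well
sessionInfo()
```

Keeping the data to a handful of rows makes the question easier to read and the problem quicker to reproduce.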

Development versions of R packages

Install the devtools package

install.packages("devtools")

Install a package from Github

library(devtools)
install_github("tidyverse/ggplot2")

Similar functions exist for other sources, e.g. install_bitbucket() for Bitbucket and install_local() for a local copy

IDEs

def: Integrated development environments

These can make R easier if you are a beginner by bringing all the pieces together (plots, code, help), and autocompleting text for you, etc.

Highly recommend RStudio because

  • Cross-platform
  • Free
  • Server versions (can run in a browser = great for teaching, running on another computer)

Tasks in R

  • Getting data

  • Manipulating data

  • Visualization

  • Analysis

  • Writing

Getting local data

  • CSV files best
read.csv("mycoolfile.csv")
  • Can import from XLS/XLSX too
install.packages("gdata")
library(gdata)
read.xls("mycoolfile.xls", sheet="Sheet1")
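To try the CSV route without a file of your own, the sketch below writes a few rows of the built-in mtcars data to a temporary file and reads them back. The names csv_path and cars are made up for this example.

```r
# Write a small data frame to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 5), csv_path, row.names = TRUE)

# Read it back; row.names = 1 uses the first column as row names
cars <- read.csv(csv_path, row.names = 1)

# Always inspect what you imported
str(cars)
head(cars)
```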

Data on the web

Um, why would I do this?

Getting data directly in R allows for reproducible workflows = data + analysis + visualizations + writing (hint: see R pkg knitr)

Data/taxonomy/etc. constantly changing = makes sense to query for newest data

rOpenSci at http://ropensci.org/

  • We are building bridges between data on the web and R
  • GBIF, Dryad, ITIS, NCBI, Genbank, eLife, US National Phenology Network, PLOS literature, etc.

Manipulating data

Definitely learn tools for the split-apply-combine strategy

  • plyr: split objects apart, do some operation on each piece, and summarise
  • reshape2: melt and cast data.frames & other R objects

plyr

library(plyr)
head(iris)[, c(1:2, 5)]
  Sepal.Length Sepal.Width Species
1          5.1         3.5  setosa
2          4.9         3.0  setosa
3          4.7         3.2  setosa
4          4.6         3.1  setosa
5          5.0         3.6  setosa
6          5.4         3.9  setosa
ddply(iris, .(Species), colwise(mean))[, 1:3]
     Species Sepal.Length Sepal.Width
1     setosa        5.006       3.428
2 versicolor        5.936       2.770
3  virginica        6.588       2.974
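To see what split-apply-combine means step by step, here is a base R sketch of the same per-species means, with no extra packages. It is an illustration of the strategy, not how plyr is implemented; the names pieces, means and result are made up for this example.

```r
# Split: one data frame of the four numeric columns per species
pieces <- split(iris[, 1:4], iris$Species)

# Apply: compute column means for each piece
means <- lapply(pieces, colMeans)

# Combine: bind the per-species results into one matrix
result <- do.call(rbind, means)
round(result[, 1:2], 3)
```

plyr wraps these three steps in a single call and returns a data.frame rather than a matrix.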

reshape2

library(reshape2)
head(iris)[1:3,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
iris_m <- melt(iris)
head(iris_m)[1:3,1:3]
  Species     variable value
1  setosa Sepal.Length   5.1
2  setosa Sepal.Length   4.9
3  setosa Sepal.Length   4.7
dcast(iris_m, Species ~ variable)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa           50          50           50          50
2 versicolor           50          50           50          50
3  virginica           50          50           50          50
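Note that the table of 50s is a count, not a summary of the measurements: each Species/variable cell holds 50 values, and when dcast has to collapse several values without being told how, it defaults to length. A sketch of asking for the mean instead (iris_means is a name made up for this example):

```r
library(reshape2)

# Long format: one row per (Species, variable, value) combination
iris_m <- melt(iris, id.vars = "Species")

# Aggregate the 50 values in each cell with mean instead of counting them
iris_means <- dcast(iris_m, Species ~ variable, fun.aggregate = mean)
iris_means
```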

Visualizations - base plots

plot(hp ~ mpg, data=mtcars, cex=3, cex.axis=2, cex.lab=2)


Visualizations - ggplot2 (learn it)

library(ggplot2)
ggplot(mtcars, aes(mpg, hp)) + 
  geom_point(size=4) +
  theme_grey(base_size=20)


ggplot2 - but why learn it? This ->

library(ggplot2)
ggplot(mtcars, aes(mpg, hp, colour=gear)) +
  geom_point(size=4) +
  facet_wrap(~carb) +
  theme_grey(base_size=20)


Analysis

Way too much to cover here; analyses can have a lot of intricacies.

Pulling it all together - Writing

Highly recommend learning knitr

knitr: Mix text w/ code = reproducible documents

Bonus: it's integrated into RStudio

You can combine LaTeX or Markdown with your code

Word users: try Markdown with knitr first; it has a shallower learning curve than LaTeX.

Run library(knitr), then in the RStudio toolbar choose New File, then R Markdown, to get started
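Outside RStudio, the same idea can be tried from the console. The sketch below writes a tiny R Markdown document to a temporary file and knits it to Markdown; the file names and document text are made up for this example, and the chunk fence is built with strrep() only to keep the listing readable.

```r
library(knitr)

# Contents of a tiny R Markdown document: prose plus one R code chunk
fence <- strrep("`", 3)
rmd_lines <- c(
  "# Horsepower and fuel economy",
  "",
  "The chunk below runs when the document is knit.",
  "",
  paste0(fence, "{r}"),
  "cor(mtcars$hp, mtcars$mpg)",
  fence
)

# Write the document to a temporary .Rmd file (hypothetical file name)
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# knit() runs the chunk and writes a Markdown file with code and output mixed in
md_path <- knit(rmd_path, output = file.path(tempdir(), "example.md"), quiet = TRUE)
cat(readLines(md_path), sep = "\n")
```

The resulting Markdown file contains the prose, the code and its printed result together, which is what makes the document reproducible.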

Various other resources