The Possibilities R Endless

Created on November 3rd, 2014

Purpose: This webpage was created to give people that are interested in using R an idea of some of the power and versatility this open source programming language offers. If you’d like to try the code yourself, the examples below can be copied and pasted in your R console and run. Note that it is best to run through the code from the top of this webpage to the bottom, because some of the functions rely on libraries that are loaded at specific times throughout the document.


Topic One: Simulate data

I will begin by simulating some data that represent outmigrating juvenile salmon.
Some fish left early in the season and others left later.
I want the fish lengths to be normally distrubuted and for the fish that left earlier to be smaller.

#making some fake data using rnorm (I used round() so that I didn't have soo many decimal places)
early.juvs <- round(rnorm(n = 100,mean = 80,sd = 15),1)
late.juvs <- round(rnorm(n = 100,mean = 100,sd = 10),1)
juv.lengths <- c(early.juvs,late.juvs)

#rep() repeats items however many times you want
type <- rep(c("early","late"),each=100)

#putting all that stuff into a data.frame
tmp <- data.frame(type,
                  juv.lengths,stringsAsFactors = F)

#now looking at the data
head(tmp)
##    type juv.lengths
## 1 early        92.3
## 2 early        55.7
## 3 early        68.0
## 4 early        67.0
## 5 early        93.2
## 6 early        63.8
table(tmp$type)
## 
## early  late 
##   100   100

Let’s have a look at the data we just simulated. I use a library called ggplot2 for most of my graphics.

#install.packages("ggplot2")
library(ggplot2)
ggplot(tmp, aes(x=juv.lengths))+
  facet_wrap(~type, ncol=1)+
  geom_histogram(binwidth=5)

Now I want to simulate data for 100 sampling seasons.

out <- NULL
years <- 2000:2099
head(years)
## [1] 2000 2001 2002 2003 2004 2005
for(i in 1:length(years)){
  #making some fake data
  early.juvs <- round(rnorm(n = 100,mean = 80,sd = 15),1)
  late.juvs <- round(rnorm(n = 100,mean = 100,sd = 10),1)
  juv.lengths <- c(early.juvs,late.juvs)
  type <- rep(c("early","late"),each=100)
  
  #putting it into a data.frame, adding the year, and adding it to a object
  tmp <- data.frame(type,juv.lengths,stringsAsFactors = F)
  tmp$year <- years[i]
  out <- rbind(out,tmp)
}

head(out)
##    type juv.lengths year
## 1 early        87.9 2000
## 2 early        97.0 2000
## 3 early        59.1 2000
## 4 early        78.3 2000
## 5 early        94.6 2000
## 6 early        71.0 2000
dim(out)
## [1] 20000     3
#Adding a graphic at the end
ggplot(tmp, aes(x=juv.lengths))+
  facet_wrap(~type, ncol=1)+
  geom_histogram(binwidth=5)

Moving on to statistically analyzing our data.

Topic Two: Statistics

#are they different in size?
t.test(juv.lengths~type,data=out)
## 
##  Welch Two Sample t-test
## 
## data:  juv.lengths by type
## t = -109.8604, df = 17497.36, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -20.08488 -19.38074
## sample estimates:
## mean in group early  mean in group late 
##            80.28659           100.01940
#did year have an effect?
anova(lm(juv.lengths~ type + year, data=out))
## Analysis of Variance Table
## 
## Response: juv.lengths
##              Df  Sum Sq Mean Sq    F value Pr(>F)    
## type          1 1946919 1946919 12068.6988 <2e-16 ***
## year          1       1       1     0.0075 0.9309    
## Residuals 19997 3225910     161                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There are many statistical functions in R and in libraries written by others.
There are the typical stats for normally distributed data - anova(), lm(), t.test().
Here are some for non-normal data: kruskal.test(), wilcox.test(). But there are many more!

Here are some examples of other types of stats commonly done in R - with a focus on GLMs and mixed-effect regression:
http://www.ats.ucla.edu/stat/dae/
If you don’t see the stats you commonly use, just Google the stats you want and most pop up!

Topic Three: Personalized functions

You can make your own functions in R too!

Here is an example using the code I just wrote to simulate data. My goal is to create a function that will simulate juvenile length data from early and late outmigrating fish. I want to be able to change the sample size, means, and standard devaiations to whatever I want. All of this can can be done by defining variables in a function()

length.tester <- function(n=100, mean1=100, mean2=80, sd1=15, sd2=20){
  #making some fake data
  early.juvs <- rnorm(n = n,mean = mean1, sd = sd1)
  late.juvs <- rnorm(n = n,mean = mean2, sd = sd2)
  juv.lengths <- c(early.juvs,late.juvs)
  type <- rep(c("early","late"),each=n)
  
  #putting it into a data.frame
  tmp <- data.frame(type,juv.lengths,stringsAsFactors = F)
  return(tmp)
}

my.data <- length.tester(n = 100,mean1 = 97,mean2 = 95,sd1 = 13,sd2 = 15)
head(my.data)
##    type juv.lengths
## 1 early    81.57565
## 2 early   112.85256
## 3 early    97.12794
## 4 early   111.61760
## 5 early    87.21090
## 6 early    89.56074

Topic Four: Graphics

There are many types of graphics R can make using default functions.
Here is an example using our simulated data.

head(tmp)
##    type juv.lengths year
## 1 early        67.6 2099
## 2 early        85.7 2099
## 3 early        92.0 2099
## 4 early        54.8 2099
## 5 early        71.0 2099
## 6 early        89.1 2099
boxplot(juv.lengths~type,data=tmp,
        ylab= "Juvenile fish length (cm)",
        xlab= "Time caught in the season")

But I prefer a library called ggplot2.

To install a package from the command line use install.packages().
####Remember to remove the # sign before install.packages() - otherwise R just ignores whatever is after the hashtag.

#install.packages("ggplot2")

I already loaded the ggplot library earlier so I do not need to do it again here.

ggplot(tmp, aes(x=type,y=juv.lengths))+
  geom_boxplot()+
  labs(x="Time caught in the season",y="Juvenile fish length (cm)")+
  theme_bw()

Here are some more fun plots in ggplot!
Note that in the examples below I am using two default datasets that get preloaded with ggplot2.

ggplot(movies, aes(x=rating))+
  geom_histogram(aes(fill = ..count..)) +
  scale_fill_gradient("Count", low = "green", high = "red")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

ggplot(diamonds, aes(x=carat, y=price, color=cut))+
  geom_point(size=3,alpha=0.5)

#Making some more fake data
df <- data.frame(x = rnorm(n = 10000,mean = 5,sd = 3),
                 y = rnorm(n = 10000,mean = 4,sd = 3))

ggplot(df, aes(x=x, y=y))+
  geom_bin2d()+
  scale_fill_gradient("Count", low = "blue", high = "red")

I am going to load the library below to use one of its preloaded datasets in a graphic.

#install.packages("rrcov")
library(rrcov)
## Warning: package 'rrcov' was built under R version 3.1.2
## Loading required package: robustbase
## Loading required package: pcaPP
## Scalable Robust Estimators with High Breakdown Point (version 1.3-4)
data(fish)
head(fish)
##   Weight Length1 Length2 Length3 Height Width Species
## 1    242    23.2    25.4    30.0   38.4  13.4       1
## 2    290    24.0    26.3    31.2   40.0  13.8       1
## 3    340    23.9    26.5    31.1   39.8  15.1       1
## 4    363    26.3    29.0    33.5   38.0  13.3       1
## 5    430    26.5    29.0    34.0   36.6  15.1       1
## 6    450    26.8    29.7    34.7   39.2  14.2       1
#changing the generic names to common names
fish$Species[fish$Species == 1] <- "Bream"
fish$Species[fish$Species == 2] <- "Whitewish"
fish$Species[fish$Species == 3] <- "Roach"
fish$Species[fish$Species == 4] <- "Parkki"
fish$Species[fish$Species == 5] <- "Smelt"
fish$Species[fish$Species == 6] <- "Pike"
fish$Species[fish$Species == 7] <- "Perch"

#removing all Smelt observations (so there are only 6 types of fish)
fish <- fish[fish$Species != "Smelt",]

ggplot(fish, aes(x=Length1, y=Weight, color= Species))+
  facet_wrap(~Species, ncol=2)+
  geom_point(size=3)+
  labs(x="Length (cm)",y="Weight (g)",color="Species")+
  theme_bw()+
  theme(axis.text = element_text(size=18, color ="black"),
        axis.title = element_text(size=22))
## Warning: Removed 1 rows containing missing values (geom_point).

Note there are many more examples with great documentation at the website

http://docs.ggplot2.org/current/

Here I will use another library to make 3-D graphics.
These RGL plots will not show up in this webpage yet, this task is currently in development, but if you run the code in R it will work.

library(rgl)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
plot3d(mtcars,
       box=FALSE,
       col="steelblue",
       type="s",
       size=3,alpha=0.75)

Here is one example with simulated data. Note the use of three types of simulated data distributions.

x <- rnorm(1000)
y <- rpois(1000,2)
z <- runif(1000)
plot3d(x, y, z, cex=1.5, size=1, type="s", col=rainbow(1000) )

Topic Five: Maps

R also has several libraries that are used to make maps. Here is an example using crimes data in the US.

#Getting some data and putting the format that I want using al libary called reshape2
library(reshape2)
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)
crimesm <- melt(crimes, id = 1)
head(crimesm)
##        state variable value
## 1    alabama   Murder  13.2
## 2     alaska   Murder  10.0
## 3    arizona   Murder   8.1
## 4   arkansas   Murder   8.8
## 5 california   Murder   9.0
## 6   colorado   Murder   7.9
#Getting some map data in library(maps)
library(maps)

#getting just the states
states_map <- map_data("state")

#now the plot
ggplot(crimes, aes(map_id = state)) + 
  geom_map(aes(fill = Murder), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat)

last_plot() + coord_map()

We can brake down the stats by different types of crimes and urban population.

ggplot(crimesm, aes(map_id = state)) +
  geom_map(aes(fill = value), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  facet_wrap( ~ variable)

Here is an example with New Zealand.

nz <- map_data("nz")
nzmap <- ggplot(nz, aes(x=long, y=lat, group=group)) +
  geom_polygon(fill="white", colour="black")+
  theme_bw()

# Use cartesian coordinates
nzmap

# Here are two other projections
nzmap + coord_map("cylindrical")

Here is one example using the states.

states <- map_data("state")
usamap <- ggplot(states, aes(x=long, y=lat, group=group)) +
  geom_polygon(fill="white", colour="black")

# See ?mapproject for coordinate systems and their parameters
usamap + coord_map("gilbert")

And another with a the world map, using geom_path instead of geom_polygon.

world <- map_data("world")
worldmap <- ggplot(world, aes(x=long, y=lat, group=group)) +
  geom_path() +
  scale_y_continuous(breaks=(-2:2) * 30) +
  scale_x_continuous(breaks=(-4:4) * 45) +
  theme_bw()

# Orthographic projection with default orientation (looking down at North pole)
worldmap + coord_map("ortho")

# Looking up up at South Pole
worldmap + coord_map("ortho", orientation=c(-90, 0, 0))

# Centered on New York (currently has issues with closing polygons)
worldmap + coord_map("ortho", orientation=c(41, -74, 0))

#getting the map data for the wold
map.world <- map_data(map = "world")
head(map.world)
##        long      lat group order region subregion
## 1 -133.3664 58.42416     1     1 Canada      <NA>
## 2 -132.2681 57.16308     1     2 Canada      <NA>
## 3 -132.0498 56.98610     1     3 Canada      <NA>
## 4 -131.8797 56.74001     1     4 Canada      <NA>
## 5 -130.2492 56.09945     1     5 Canada      <NA>
## 6 -130.0131 55.91169     1     6 Canada      <NA>
#plotting it out
ggplot(map.world, aes(x = long, y = lat, group = group))+
  geom_polygon()

Note that you can search for Google Maps too using library(ggmaps).

library(ggmap)

First you need to search for the map use ‘?getmap’ to get more information on the topic.

map <- get_map(
  # google search string
  location = "Hatfield Marine Science Center, Newport, Oregon" ,
  # larger is closer
  zoom = 13,
  maptype = "hybrid" # map type
  )
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hatfield+Marine+Science+Center,+Newport,+Oregon&zoom=13&size=%20640x640&scale=%202&maptype=hybrid&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hatfield+Marine+Science+Center,+Newport,+Oregon&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
#now plotting it
ggmap(map)+
  labs(y= "Latitude", x="Longitude")

Now let’s say we sampled in the bay - here I will simulate some.

samples <- data.frame(lat = rnorm(100,mean = 44.620, sd = 0.003),
                      long = rnorm(100,mean = -124.015, sd = 0.003))
head(samples)
##        lat      long
## 1 44.61888 -124.0165
## 2 44.62003 -124.0147
## 3 44.61928 -124.0095
## 4 44.62062 -124.0126
## 5 44.62205 -124.0143
## 6 44.61553 -124.0149
#we can plot those locations using geom_point()
ggmap(map)+
  geom_point(data=samples, aes(x=long,y=lat), color="Orange", size=3)+
  labs(y= "Latitude", x="Longitude")

Note there are several maptypes, here is another example.

map2 <- get_map(
  # google search string
  location = "Hatfield Marine Science Center, Newport, Oregon" ,
  # larger is closer
  zoom = 13,
  # map type
  maptype = 'terrain' )
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hatfield+Marine+Science+Center,+Newport,+Oregon&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hatfield+Marine+Science+Center,+Newport,+Oregon&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
#make contour plots and add annotation
ggmap(map2)+
  geom_density2d(data=samples, aes(x=long,y=lat))+
  labs(y= "Latitude", x="Longitude")+
  geom_rect(xmin=-124.047, ymin=44.620,
            xmax=-124.041, ymax=44.624,
            color="Black",fill="white", alpha=0,size=2)

Here is an example of plotting copepod data that Jessica Luo made

library("stringr")
library("animation")
library("matlab",quietly = T)
## 
## Attaching package: 'matlab'
## 
## The following objects are masked from 'package:rrcov':
## 
##     ones, repmat, zeros
## 
## The following object is masked from 'package:stats':
## 
##     reshape
## 
## The following objects are masked from 'package:utils':
## 
##     find, fix
## 
## The following object is masked from 'package:base':
## 
##     sum
library("oce")
library("ggplot2")
library("plyr")
library("reshape2")
library("gridExtra",quietly = T)
## Warning: package 'gridExtra' was built under R version 3.1.2
# load in coastline data from oce package
data(coastlineWorld)
coastline.world=data.frame(lon=coastlineWorld[["longitude"]],lat=coastlineWorld[["latitude"]]) 

# a little function to plot some global biomass data. Must give function the variable to be plotted, the dataset, and the title of the plot
plotting <- function(var, data, title){
  print(var)
  p <- ggplot(mapping=aes(x=lon, y=lat)) + 
    geom_path(data=coastline.world) + 
    geom_point(aes_string(colour=var), size=3, data=data) + theme_bw() + scale_color_gradientn("Biomass, mg C m-3", colours=jet.colors(16), na.value=NA) + labs(title=paste(title, var, sep=" "))  
  return(p)
}

oopt <- animation::ani.options(interval=5)

# function to animate the plotting using the animate package
FUN <- function(var, data, title){
  lapply(var, function(i){
    print(plotting(i, data, title))
    animation::ani.pause()
  })
}


# read in gridded mesozooplankton data from the COPEPOD database
# download zipped file
temp <- tempfile()
fileurl <- "http://www.st.nmfs.noaa.gov/plankton/2012/copepod-2012__biomass-fields.zip"
download.file(fileurl, temp)
files_to_extract <- c("copepod-2012__biomass-fields/data/copepod-2012__cmass-m00-qtr.csv", 
                      "copepod-2012__biomass-fields/data/copepod-2012__cmass-m13-qtr.csv",
                      "copepod-2012__biomass-fields/data/copepod-2012__cmass-m14-qtr.csv",
                      "copepod-2012__biomass-fields/data/copepod-2012__cmass-m15-qtr.csv",
                      "copepod-2012__biomass-fields/data/copepod-2012__cmass-m16-qtr.csv")

unzip(temp, files = files_to_extract) # set exdir = "." # default
unlink(temp) # deletes temporary file, leaves the actual extracted data in place for now

# lists the files within the extracted folder
files <- list.files("copepod-2012__biomass-fields/data/", pattern="copepod-2012", full=T)
# uses adply to read all the csv files within the listed file addresses
cope <- adply(files, 1, read.csv, stringsAsFactors=F)

# deletes unzipped files from working directory
unlink("copepod-2012__biomass-fields/", recursive=T) 

cope <- cope[,2:6]
names(cope) <- c("lon", "lat", "n", "biomass.mgCm3", "fileid") 
cope$fileid <- str_trim(cope$fileid, side="both")

# cast
cope <- dcast(cope, lon+lat~fileid, value.var="biomass.mgCm3")
# set names
cope_names <- c("Winter", "Spring", "Summer", "Fall")
names(cope) <- c("lon", "lat", "Annual", cope_names)

# Plotting
# winter
pw <- plotting(cope_names[1], data=cope, title="Biomass of Copepods in")
## [1] "Winter"
# spring
ps <- plotting(cope_names[2], data=cope, title="Biomass of Copepods in")
## [1] "Spring"
# summer
psu <- plotting(cope_names[3], data=cope, title="Biomass of Copepods in")
## [1] "Summer"
# fall
pf <- plotting(cope_names[4], data=cope, title="Biomass of Copepods in")
## [1] "Fall"
require("gridExtra")
# using grid.arrange from the gridExtra package to plot the ggplot plots in a 2x2 grid
grid.arrange(pw, ps, psu, pf, nrow=2)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).

# IF YOU WANTED ANIMATION INSTEAD....
# call the animation function - remove the hashtag
#FUN(var=cope_names, data=cope, title="Biomass of Copepods in")

Topic Six: Connecting to the web

Finally some of the newest and most exciting, at least in my opinion, libraries in R have to do with integrating with the web.

Unfortunately these libraries do not allow me to embed the examples into a RMarkdown file
so please visit the websites at the top of the webpage for examples or email me for the code I used.

The first example will use the library shiny.

library(shiny)
#runApp("Iris",display.mode = "showcase")

We can do the same thing with 3D graphics.

library(shinyRGL)
#runApp("CROOS", display.mode = "showcase")

Finally, here is a example of a web graphic that is interactive with your cursor via the plot.ly.

library(plotly)
## Loading required package: RCurl
## Warning: package 'RCurl' was built under R version 3.1.2
## Loading required package: bitops
## Warning: package 'bitops' was built under R version 3.1.2
## Loading required package: RJSONIO
#finally a plot.ly example
nmmaps<-read.csv("http://zevross.com/blog/wp-content/uploads/2014/08/chicago-nmmaps.csv", as.is=T)
nmmaps$date<-as.Date(nmmaps$date)
nmmaps<-nmmaps[nmmaps$date>as.Date("1996-12-31"),]
nmmaps$year<-substring(nmmaps$date,1,4)
head(nmmaps)
##      city       date death temp dewpoint      pm10        o3 time season
## 3654 chic 1997-01-01   137 36.0    37.50 13.052268  5.659256 3654 winter
## 3655 chic 1997-01-02   123 45.0    47.25 41.948600  5.525417 3655 winter
## 3656 chic 1997-01-03   127 40.0    38.00 27.041751  6.288548 3656 winter
## 3657 chic 1997-01-04   146 51.5    45.50 25.072573  7.537758 3657 winter
## 3658 chic 1997-01-05   102 27.0    11.25 15.343121 20.760798 3658 winter
## 3659 chic 1997-01-06   127 17.0     5.75  9.364655 14.940874 3659 winter
##      year
## 3654 1997
## 3655 1997
## 3656 1997
## 3657 1997
## 3658 1997
## 3659 1997
ggplot(nmmaps, aes(date, temp, color=factor(season)))+  geom_point() +   
  scale_color_manual(values=c("dodgerblue4", "darkolivegreen4",  
                              "darkorchid3", "goldenrod1"))

#note that USERNAME and KEY are things that are specific and need to change when you run this code - note remove the hashtags at the beginning of the next two lines too
#py <- plotly(username="USERNAME", key="KEY")
#response<-py$ggplotly()
url<-"https://plot.ly/~nick.sard/5"
plotly_iframe <- paste("<iframe scrolling='no' seamless='seamless' src='", url, 
    "/800/600' width='800' height='600'></iframe>", sep = "")

Thanks for checking out the webpage!

The end!