I will begin by simulating some data that represent outmigrating juvenile salmon.
Some fish left early in the season and others left later.
I want the fish lengths to be normally distrubuted and for the fish that left earlier to be smaller.
#making some fake data using rnorm (I used round() so that I didn't have soo many decimal places)
early.juvs <- round(rnorm(n = 100,mean = 80,sd = 15),1)
late.juvs <- round(rnorm(n = 100,mean = 100,sd = 10),1)
juv.lengths <- c(early.juvs,late.juvs)
#rep() repeats items however many times you want
type <- rep(c("early","late"),each=100)
#putting all that stuff into a data.frame
tmp <- data.frame(type,
juv.lengths,stringsAsFactors = F)
#now looking at the data
head(tmp)
## type juv.lengths
## 1 early 92.3
## 2 early 55.7
## 3 early 68.0
## 4 early 67.0
## 5 early 93.2
## 6 early 63.8
table(tmp$type)
##
## early late
## 100 100
Let’s have a look at the data we just simulated. I use a library called ggplot2 for most of my graphics.
#install.packages("ggplot2")
library(ggplot2)
ggplot(tmp, aes(x=juv.lengths))+
facet_wrap(~type, ncol=1)+
geom_histogram(binwidth=5)
Now I want to simulate data for 100 sampling seasons.
out <- NULL
years <- 2000:2099
head(years)
## [1] 2000 2001 2002 2003 2004 2005
for(i in 1:length(years)){
#making some fake data
early.juvs <- round(rnorm(n = 100,mean = 80,sd = 15),1)
late.juvs <- round(rnorm(n = 100,mean = 100,sd = 10),1)
juv.lengths <- c(early.juvs,late.juvs)
type <- rep(c("early","late"),each=100)
#putting it into a data.frame, adding the year, and adding it to a object
tmp <- data.frame(type,juv.lengths,stringsAsFactors = F)
tmp$year <- years[i]
out <- rbind(out,tmp)
}
head(out)
## type juv.lengths year
## 1 early 87.9 2000
## 2 early 97.0 2000
## 3 early 59.1 2000
## 4 early 78.3 2000
## 5 early 94.6 2000
## 6 early 71.0 2000
dim(out)
## [1] 20000 3
#Adding a graphic at the end
ggplot(tmp, aes(x=juv.lengths))+
facet_wrap(~type, ncol=1)+
geom_histogram(binwidth=5)
Moving on to statistically analyzing our data.
#are they different in size?
t.test(juv.lengths~type,data=out)
##
## Welch Two Sample t-test
##
## data: juv.lengths by type
## t = -109.8604, df = 17497.36, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -20.08488 -19.38074
## sample estimates:
## mean in group early mean in group late
## 80.28659 100.01940
#did year have an effect?
anova(lm(juv.lengths~ type + year, data=out))
## Analysis of Variance Table
##
## Response: juv.lengths
## Df Sum Sq Mean Sq F value Pr(>F)
## type 1 1946919 1946919 12068.6988 <2e-16 ***
## year 1 1 1 0.0075 0.9309
## Residuals 19997 3225910 161
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are many statistical functions in R and in libraries written by others.
There are the typical stats for normally distributed data - anova(), lm(), t.test().
Here are some for non-normal data: kruskal.test(), wilcox.test(). But there are many more!
Here are some examples of other types of stats commonly done in R - with a focus on GLMs and mixed-effect regression:
http://www.ats.ucla.edu/stat/dae/
If you don’t see the stats you commonly use, just Google the stats you want and most pop up!
You can make your own functions in R too!
Here is an example using the code I just wrote to simulate data. My goal is to create a function that will simulate juvenile length data from early and late outmigrating fish. I want to be able to change the sample size, means, and standard devaiations to whatever I want. All of this can can be done by defining variables in a function()
length.tester <- function(n=100, mean1=100, mean2=80, sd1=15, sd2=20){
#making some fake data
early.juvs <- rnorm(n = n,mean = mean1, sd = sd1)
late.juvs <- rnorm(n = n,mean = mean2, sd = sd2)
juv.lengths <- c(early.juvs,late.juvs)
type <- rep(c("early","late"),each=n)
#putting it into a data.frame
tmp <- data.frame(type,juv.lengths,stringsAsFactors = F)
return(tmp)
}
my.data <- length.tester(n = 100,mean1 = 97,mean2 = 95,sd1 = 13,sd2 = 15)
head(my.data)
## type juv.lengths
## 1 early 81.57565
## 2 early 112.85256
## 3 early 97.12794
## 4 early 111.61760
## 5 early 87.21090
## 6 early 89.56074
There are many types of graphics R can make using default functions.
Here is an example using our simulated data.
head(tmp)
## type juv.lengths year
## 1 early 67.6 2099
## 2 early 85.7 2099
## 3 early 92.0 2099
## 4 early 54.8 2099
## 5 early 71.0 2099
## 6 early 89.1 2099
boxplot(juv.lengths~type,data=tmp,
ylab= "Juvenile fish length (cm)",
xlab= "Time caught in the season")
But I prefer a library called ggplot2.
To install a package from the command line use install.packages().
####Remember to remove the # sign before install.packages() - otherwise R just ignores whatever is after the hashtag.
#install.packages("ggplot2")
I already loaded the ggplot library earlier so I do not need to do it again here.
ggplot(tmp, aes(x=type,y=juv.lengths))+
geom_boxplot()+
labs(x="Time caught in the season",y="Juvenile fish length (cm)")+
theme_bw()
Here are some more fun plots in ggplot!
Note that in the examples below I am using two default datasets that get preloaded with ggplot2.
ggplot(movies, aes(x=rating))+
geom_histogram(aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "green", high = "red")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
ggplot(diamonds, aes(x=carat, y=price, color=cut))+
geom_point(size=3,alpha=0.5)
#Making some more fake data
df <- data.frame(x = rnorm(n = 10000,mean = 5,sd = 3),
y = rnorm(n = 10000,mean = 4,sd = 3))
ggplot(df, aes(x=x, y=y))+
geom_bin2d()+
scale_fill_gradient("Count", low = "blue", high = "red")
I am going to load the library below to use one of its preloaded datasets in a graphic.
#install.packages("rrcov")
library(rrcov)
## Warning: package 'rrcov' was built under R version 3.1.2
## Loading required package: robustbase
## Loading required package: pcaPP
## Scalable Robust Estimators with High Breakdown Point (version 1.3-4)
data(fish)
head(fish)
## Weight Length1 Length2 Length3 Height Width Species
## 1 242 23.2 25.4 30.0 38.4 13.4 1
## 2 290 24.0 26.3 31.2 40.0 13.8 1
## 3 340 23.9 26.5 31.1 39.8 15.1 1
## 4 363 26.3 29.0 33.5 38.0 13.3 1
## 5 430 26.5 29.0 34.0 36.6 15.1 1
## 6 450 26.8 29.7 34.7 39.2 14.2 1
#changing the generic names to common names
fish$Species[fish$Species == 1] <- "Bream"
fish$Species[fish$Species == 2] <- "Whitewish"
fish$Species[fish$Species == 3] <- "Roach"
fish$Species[fish$Species == 4] <- "Parkki"
fish$Species[fish$Species == 5] <- "Smelt"
fish$Species[fish$Species == 6] <- "Pike"
fish$Species[fish$Species == 7] <- "Perch"
#removing all Smelt observations (so there are only 6 types of fish)
fish <- fish[fish$Species != "Smelt",]
ggplot(fish, aes(x=Length1, y=Weight, color= Species))+
facet_wrap(~Species, ncol=2)+
geom_point(size=3)+
labs(x="Length (cm)",y="Weight (g)",color="Species")+
theme_bw()+
theme(axis.text = element_text(size=18, color ="black"),
axis.title = element_text(size=22))
## Warning: Removed 1 rows containing missing values (geom_point).
http://docs.ggplot2.org/current/
Here I will use another library to make 3-D graphics.
These RGL plots will not show up in this webpage yet, this task is currently in development, but if you run the code in R it will work.
library(rgl)
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
plot3d(mtcars,
box=FALSE,
col="steelblue",
type="s",
size=3,alpha=0.75)
Here is one example with simulated data. Note the use of three types of simulated data distributions.
x <- rnorm(1000)
y <- rpois(1000,2)
z <- runif(1000)
plot3d(x, y, z, cex=1.5, size=1, type="s", col=rainbow(1000) )
R also has several libraries that are used to make maps. Here is an example using crimes data in the US.
#Getting some data and putting the format that I want using al libary called reshape2
library(reshape2)
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)
crimesm <- melt(crimes, id = 1)
head(crimesm)
## state variable value
## 1 alabama Murder 13.2
## 2 alaska Murder 10.0
## 3 arizona Murder 8.1
## 4 arkansas Murder 8.8
## 5 california Murder 9.0
## 6 colorado Murder 7.9
#Getting some map data in library(maps)
library(maps)
#getting just the states
states_map <- map_data("state")
#now the plot
ggplot(crimes, aes(map_id = state)) +
geom_map(aes(fill = Murder), map = states_map) +
expand_limits(x = states_map$long, y = states_map$lat)
last_plot() + coord_map()
We can brake down the stats by different types of crimes and urban population.
ggplot(crimesm, aes(map_id = state)) +
geom_map(aes(fill = value), map = states_map) +
expand_limits(x = states_map$long, y = states_map$lat) +
facet_wrap( ~ variable)
Here is an example with New Zealand.
nz <- map_data("nz")
nzmap <- ggplot(nz, aes(x=long, y=lat, group=group)) +
geom_polygon(fill="white", colour="black")+
theme_bw()
# Use cartesian coordinates
nzmap
# Here are two other projections
nzmap + coord_map("cylindrical")
Here is one example using the states.
states <- map_data("state")
usamap <- ggplot(states, aes(x=long, y=lat, group=group)) +
geom_polygon(fill="white", colour="black")
# See ?mapproject for coordinate systems and their parameters
usamap + coord_map("gilbert")
And another with a the world map, using geom_path instead of geom_polygon.
world <- map_data("world")
worldmap <- ggplot(world, aes(x=long, y=lat, group=group)) +
geom_path() +
scale_y_continuous(breaks=(-2:2) * 30) +
scale_x_continuous(breaks=(-4:4) * 45) +
theme_bw()
# Orthographic projection with default orientation (looking down at North pole)
worldmap + coord_map("ortho")
# Looking up up at South Pole
worldmap + coord_map("ortho", orientation=c(-90, 0, 0))
# Centered on New York (currently has issues with closing polygons)
worldmap + coord_map("ortho", orientation=c(41, -74, 0))
#getting the map data for the wold
map.world <- map_data(map = "world")
head(map.world)
## long lat group order region subregion
## 1 -133.3664 58.42416 1 1 Canada <NA>
## 2 -132.2681 57.16308 1 2 Canada <NA>
## 3 -132.0498 56.98610 1 3 Canada <NA>
## 4 -131.8797 56.74001 1 4 Canada <NA>
## 5 -130.2492 56.09945 1 5 Canada <NA>
## 6 -130.0131 55.91169 1 6 Canada <NA>
#plotting it out
ggplot(map.world, aes(x = long, y = lat, group = group))+
geom_polygon()
Note that you can search for Google Maps too using library(ggmaps).
library(ggmap)
First you need to search for the map use ‘?getmap’ to get more information on the topic.
map <- get_map(
# google search string
location = "Hatfield Marine Science Center, Newport, Oregon" ,
# larger is closer
zoom = 13,
maptype = "hybrid" # map type
)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hatfield+Marine+Science+Center,+Newport,+Oregon&zoom=13&size=%20640x640&scale=%202&maptype=hybrid&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hatfield+Marine+Science+Center,+Newport,+Oregon&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
#now plotting it
ggmap(map)+
labs(y= "Latitude", x="Longitude")
Now let’s say we sampled in the bay - here I will simulate some.
samples <- data.frame(lat = rnorm(100,mean = 44.620, sd = 0.003),
long = rnorm(100,mean = -124.015, sd = 0.003))
head(samples)
## lat long
## 1 44.61888 -124.0165
## 2 44.62003 -124.0147
## 3 44.61928 -124.0095
## 4 44.62062 -124.0126
## 5 44.62205 -124.0143
## 6 44.61553 -124.0149
#we can plot those locations using geom_point()
ggmap(map)+
geom_point(data=samples, aes(x=long,y=lat), color="Orange", size=3)+
labs(y= "Latitude", x="Longitude")
Note there are several maptypes, here is another example.
map2 <- get_map(
# google search string
location = "Hatfield Marine Science Center, Newport, Oregon" ,
# larger is closer
zoom = 13,
# map type
maptype = 'terrain' )
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hatfield+Marine+Science+Center,+Newport,+Oregon&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hatfield+Marine+Science+Center,+Newport,+Oregon&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
#make contour plots and add annotation
ggmap(map2)+
geom_density2d(data=samples, aes(x=long,y=lat))+
labs(y= "Latitude", x="Longitude")+
geom_rect(xmin=-124.047, ymin=44.620,
xmax=-124.041, ymax=44.624,
color="Black",fill="white", alpha=0,size=2)
Here is an example of plotting copepod data that Jessica Luo made
library("stringr")
library("animation")
library("matlab",quietly = T)
##
## Attaching package: 'matlab'
##
## The following objects are masked from 'package:rrcov':
##
## ones, repmat, zeros
##
## The following object is masked from 'package:stats':
##
## reshape
##
## The following objects are masked from 'package:utils':
##
## find, fix
##
## The following object is masked from 'package:base':
##
## sum
library("oce")
library("ggplot2")
library("plyr")
library("reshape2")
library("gridExtra",quietly = T)
## Warning: package 'gridExtra' was built under R version 3.1.2
# load in coastline data from oce package
data(coastlineWorld)
coastline.world=data.frame(lon=coastlineWorld[["longitude"]],lat=coastlineWorld[["latitude"]])
# a little function to plot some global biomass data. Must give function the variable to be plotted, the dataset, and the title of the plot
plotting <- function(var, data, title){
print(var)
p <- ggplot(mapping=aes(x=lon, y=lat)) +
geom_path(data=coastline.world) +
geom_point(aes_string(colour=var), size=3, data=data) + theme_bw() + scale_color_gradientn("Biomass, mg C m-3", colours=jet.colors(16), na.value=NA) + labs(title=paste(title, var, sep=" "))
return(p)
}
oopt <- animation::ani.options(interval=5)
# function to animate the plotting using the animate package
FUN <- function(var, data, title){
lapply(var, function(i){
print(plotting(i, data, title))
animation::ani.pause()
})
}
# read in gridded mesozooplankton data from the COPEPOD database
# download zipped file
temp <- tempfile()
fileurl <- "http://www.st.nmfs.noaa.gov/plankton/2012/copepod-2012__biomass-fields.zip"
download.file(fileurl, temp)
files_to_extract <- c("copepod-2012__biomass-fields/data/copepod-2012__cmass-m00-qtr.csv",
"copepod-2012__biomass-fields/data/copepod-2012__cmass-m13-qtr.csv",
"copepod-2012__biomass-fields/data/copepod-2012__cmass-m14-qtr.csv",
"copepod-2012__biomass-fields/data/copepod-2012__cmass-m15-qtr.csv",
"copepod-2012__biomass-fields/data/copepod-2012__cmass-m16-qtr.csv")
unzip(temp, files = files_to_extract) # set exdir = "." # default
unlink(temp) # deletes temporary file, leaves the actual extracted data in place for now
# lists the files within the extracted folder
files <- list.files("copepod-2012__biomass-fields/data/", pattern="copepod-2012", full=T)
# uses adply to read all the csv files within the listed file addresses
cope <- adply(files, 1, read.csv, stringsAsFactors=F)
# deletes unzipped files from working directory
unlink("copepod-2012__biomass-fields/", recursive=T)
cope <- cope[,2:6]
names(cope) <- c("lon", "lat", "n", "biomass.mgCm3", "fileid")
cope$fileid <- str_trim(cope$fileid, side="both")
# cast
cope <- dcast(cope, lon+lat~fileid, value.var="biomass.mgCm3")
# set names
cope_names <- c("Winter", "Spring", "Summer", "Fall")
names(cope) <- c("lon", "lat", "Annual", cope_names)
# Plotting
# winter
pw <- plotting(cope_names[1], data=cope, title="Biomass of Copepods in")
## [1] "Winter"
# spring
ps <- plotting(cope_names[2], data=cope, title="Biomass of Copepods in")
## [1] "Spring"
# summer
psu <- plotting(cope_names[3], data=cope, title="Biomass of Copepods in")
## [1] "Summer"
# fall
pf <- plotting(cope_names[4], data=cope, title="Biomass of Copepods in")
## [1] "Fall"
require("gridExtra")
# using grid.arrange from the gridExtra package to plot the ggplot plots in a 2x2 grid
grid.arrange(pw, ps, psu, pf, nrow=2)
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
# IF YOU WANTED ANIMATION INSTEAD....
# call the animation function - remove the hashtag
#FUN(var=cope_names, data=cope, title="Biomass of Copepods in")
Finally some of the newest and most exciting, at least in my opinion, libraries in R have to do with integrating with the web.
Unfortunately these libraries do not allow me to embed the examples into a RMarkdown file
so please visit the websites at the top of the webpage for examples or email me for the code I used.
The first example will use the library shiny.
library(shiny)
#runApp("Iris",display.mode = "showcase")
We can do the same thing with 3D graphics.
library(shinyRGL)
#runApp("CROOS", display.mode = "showcase")
Finally, here is a example of a web graphic that is interactive with your cursor via the plot.ly.
library(plotly)
## Loading required package: RCurl
## Warning: package 'RCurl' was built under R version 3.1.2
## Loading required package: bitops
## Warning: package 'bitops' was built under R version 3.1.2
## Loading required package: RJSONIO
#finally a plot.ly example
nmmaps<-read.csv("http://zevross.com/blog/wp-content/uploads/2014/08/chicago-nmmaps.csv", as.is=T)
nmmaps$date<-as.Date(nmmaps$date)
nmmaps<-nmmaps[nmmaps$date>as.Date("1996-12-31"),]
nmmaps$year<-substring(nmmaps$date,1,4)
head(nmmaps)
## city date death temp dewpoint pm10 o3 time season
## 3654 chic 1997-01-01 137 36.0 37.50 13.052268 5.659256 3654 winter
## 3655 chic 1997-01-02 123 45.0 47.25 41.948600 5.525417 3655 winter
## 3656 chic 1997-01-03 127 40.0 38.00 27.041751 6.288548 3656 winter
## 3657 chic 1997-01-04 146 51.5 45.50 25.072573 7.537758 3657 winter
## 3658 chic 1997-01-05 102 27.0 11.25 15.343121 20.760798 3658 winter
## 3659 chic 1997-01-06 127 17.0 5.75 9.364655 14.940874 3659 winter
## year
## 3654 1997
## 3655 1997
## 3656 1997
## 3657 1997
## 3658 1997
## 3659 1997
ggplot(nmmaps, aes(date, temp, color=factor(season)))+ geom_point() +
scale_color_manual(values=c("dodgerblue4", "darkolivegreen4",
"darkorchid3", "goldenrod1"))
#note that USERNAME and KEY are things that are specific and need to change when you run this code - note remove the hashtags at the beginning of the next two lines too
#py <- plotly(username="USERNAME", key="KEY")
#response<-py$ggplotly()
url<-"https://plot.ly/~nick.sard/5"
plotly_iframe <- paste("<iframe scrolling='no' seamless='seamless' src='", url,
"/800/600' width='800' height='600'></iframe>", sep = "")