In part 1 of this tutorial we tidied up some data from the IDEMdata package. In this part of the tutorial we will use that data to create precipitation graphs. And in part 3 we will use functions and loops to automate the creation of these graphs.
We will be using the wide.precip.df data frame to create the precipitation graphs. See part 1 for instructions on how this data frame was created. Use the data() function to load the data from the IDEMdata package.
library(devtools)
install_github("InDEM/IDEMdata")
library(IDEMdata)
data(wide_precip)
For the graphs, we will first use the base barplot() function to display precipitation levels over time, with the chemistry sampling values overlayed as points. Then we will use the lattice and ggplot2 packages to display the same data in alternative ways.
For the base graphics plot, we first display the precipitation data as a bar plot. We also save the plot as a variable to retain information about its dimensions.
# set the margins of the plot, but save the default settings
op <- par(mar=c(5, 4, 4, 6) + 0.1)
# remove duplicate days
precip <- unique(wide.precip.df[, c("date", "precipitation")])
b.plot <- barplot(height = precip$precipitation, names.arg = precip$date,
xlab="Date", ylab="Precipitation (in)")
We will be using this plot as a back drop for plotting the watershed sampling data as points. The first plot will be percent saturation for station LMG-05-0009.
station.1 <- wide.precip.df[wide.precip.df$station == "LMG-05-0009" &
!is.na(wide.precip.df$station), ]
In order to add points to a bar plot, we’ll use the x locations that are found in the b.plot variable.
# subset the x-axis locations of the bars by using the julian day of the samples
bar.x <- b.plot[station.1$jday]
# plot a new figure on top of the current plot
par(new = TRUE)
plot(x = bar.x, y = station.1$sat, xlab = "", ylab = "", pch = 16,
ylim = c(0.5*min(station.1$sat, na.rm = TRUE), 2*max(station.1$sat, na.rm = TRUE)),
axes = FALSE, col = "red")
# add another y-axis
axis(4, col = "red", col.axis = "red")
# add a label for the new axis
mtext("% Saturation", side = 4, line=3, cex = par("cex.lab"), col = "red")
# add a title to the plot
title(main="Site 1")
Finally, we add a legend.
legend("topright", # places a legend at the appropriate place
legend = c("Precip","% Sat"), # puts text in the legend
lty=c(1, 0), pch = c(NA, 19), # gives the legend appropriate symbols
col=c("black","red")) # appropriate colors
# reset graphing parameters to original default settings
par(op)
Now we’ll make another graph, this time using E. coli data.
op <- par(mar=c(5, 4, 4, 6) + 0.1)
b.plot <- barplot(height = precip$precipitation, names.arg = precip$date,
xlab="Date", ylab="Precipitation (in)")
bar.x <- b.plot[station.1$jday]
par(new = TRUE)
plot(x = bar.x, y = station.1$e_coli, xlab = "", ylab = "", pch = 16,
ylim = c(0.5*min(station.1$e_coli, na.rm = TRUE), 2*max(station.1$e_coli, na.rm = TRUE)),
axes = FALSE, col = "red")
axis(4, col = "red", col.axis = "red")
# add a dashed line to reference the water quality standard
abline(h=125, lty=2, col="red")
mtext("E. coli (MPN/100mL)", side = 4, line=3, cex = par("cex.lab"), col = "red")
title(main="Site 1")
legend("topright", legend = c("Precip","E. coli", "WQS"),
lty=c(1, 0, 2), pch = c(NA, 19, NA), col=c("black","red", "red"))
par(op)
lattice is another very popular graphics package. For an introduction and examples see here and here. We’ll also use another package called latticeExtra which will make it easier to add the second y-axis to our package
Here’s a first attempt at a lattice graph:
library(lattice)
library(latticeExtra)
# a bar plot of the precipitation data
rain <- xyplot(precipitation ~ date, data = wide.precip.df, type = "h",
ylab = "Precipitation (in)")
# a scatter plot of the saturation data
sat <- xyplot(sat ~ date, data = station.1, type = "p", ylab = "% Saturation",
ylim = c(0, 2*max(station.1$sat, na.rm = TRUE)))
# display both on the same graph
doubleYScale(rain, sat, add.ylab2 = TRUE, under = TRUE,
text = c( "Precipitation (in)", "% Saturation"), columns = 2,
type = c("l", "p"))
We can change the format of the plot to look like our base bar plot by using the update() function.
update(trellis.last.object(),
par.settings = simpleTheme(col = c("black", "red"), pch = c(NA, 16)))
The advantage of the lattice package is that it allows you to easily split up the data by conditioning. We’ll create some variables that will allow us to split up our data by time periods.
# create a data set for one station in a format for lattice conditioning
lat.data <- merge(precip, station.1[, c("date", "sat")], all = TRUE)
# subset down to 2013 data
lat.data.2013 <- lat.data[substring(as.character(lat.data$date), 1, 4) == "2013", ]
# create a time variable in the POSIXlt format
time <- as.POSIXlt(lat.data.2013$date)
# create a month factor variable and arrange the factors in the proper order
month <- factor(months(time), levels = month.name[4:12])
# create a day-of-the-month variable
mday <- time$mday
# add the month and mday variables to the data frame
lat.data.2013 <- data.frame(lat.data.2013, month, mday)
Now we’ll creat two lattice plots, save them as variables, and display them together using the doubleYScale() function.
rain <- xyplot(precipitation ~ mday | month, data = lat.data.2013, type = "h",
ylab = "Precipitation (in)", layout = c(3, 3), as.table = TRUE,
xlab = "2013")
sat <- xyplot(sat ~ mday | month, data = lat.data.2013, type = "p",
ylab = "% Saturation", ylim = c(0, 2*max(station.1$sat, na.rm = TRUE)))
doubley <- doubleYScale(rain, sat, add.ylab2 = TRUE, under = TRUE,
text = c( "Precipitation (in)", "% Saturation"),
column = 2, type = c("l", "p"))
update(doubley,
par.settings = simpleTheme(col = c("black", "red"), pch = c(NA, 16)))
The ggplot2 package is also useful for displaying relationships between variables by conditioning. However, it’s not possible to add two y-axes to the same plot using ggplot2. So we will create two cumulative variables that will give us information about rainfall in days prior to the date of the field samples. One variable will be the sum of rainfall on the same day and one day prior to the sample. Another variable will be the sum of the same day, one day and 2 days prior to the sample.
# add precipitation column to a one day lagged copy of itself
precip_2day <- lat.data$precipitation +
c(0, lat.data$precipitation)[1:length(lat.data$precipitation)]
# add precipitation column to a one day and a two day lagged copy of itself
precip_3day <- lat.data$precipitation +
c(0, lat.data$precipitation)[1:length(lat.data$precipitation)] +
c(0, 0, lat.data$precipitation)[1:length(lat.data$precipitation)]
# add the cumulative precipitation variables to the data frame
gg.data <- data.frame(lat.data, precip_2day, precip_3day)
Now we reshape this wide data frame into a long format and plot it.
library(reshape2)
long.gg.data <- melt(gg.data, id.vars = c("date", "sat"))
library(ggplot2)
# assign x variable, y variable, and the conditioning variable
ggplot(long.gg.data, aes(x = date, y = sat, size = value)) +
# break the plot up by the number of factors in the 'variable' column
facet_grid(variable ~ .) +
# add the points, set their color, and adjust the size used for conveying precipitation
geom_point(color = "steelblue") + scale_size_continuous(range = c(3,8)) +
# add labels
ylab("% Saturation") + xlab("Date") + labs(size = "Precip. (in)") +
# set the color of the title panel for each facet
theme(strip.background = element_rect(fill="lemonchiffon1"))
In part 3 of this tutorial we learn how to automate the creation of these graphs using functions and loops.
This tutorial was created using RStudio’s R Markdown. The code can be found on GitHub.