Exercise 2: R and rainfall analysis

There are many helpful R tutorials that you can find by googling “R tutorial”. Here is one example: http://ww2.coastal.edu/kingw/statistics/R-tutorials/index.html

There are also lots of R blogs and pages for hydrologists, including: http://abouthydrology.blogspot.it/2012/08/r-resources-for-hydrologists.html These links may go out of date quickly, so just google search for R hydrology, etc.

I. Introductory commands

Some important commands are listed below. The . indicate blanks where you would fill in missing info.

?...                      # Loads help files for a specific command   
?getwd                  # Loads help files for "getwd"   
??getwd               # Searches for the word "getwd" in all the help files   
setwd(.)                # Sets the working directory to .   
getwd()                 # Shows the current working directory   
dir()                     # List the files in the working directory   
ls()                      #  List the variables in your R workspace   

# "#" Indicates a comment.  The text . is ignored   
install.packages(".")           # Gets user-created packages from online repositories   
library(.)                      #  Loads packages that you've installed

II. Running commands:

Commands can be run from either the prompt in the “Console” window by typing them in directly at the “>”“, or from a script window by putting the cursor on a line and clicking”Run" (the Run icon has a green arrow pointing to the word Run). I prefer to use scripts for most commands, because you can re-run them and save them for later.

Example 1: Starting R, change directories

Start R studio (“Start”>“RStudio”)

First, put the name of your class directory into a variable “mydir”.

mydir = "Z:" #  This puts the text for your directory in the variable mydir
mydir  # I always type the name of the variable after creating it to see if R did what I expect.

## [1] "Z:"

Now, go to your directory using setwd(), and list the contents:

setwd(mydir)
dir()

Example 2: Creating a script

Click “File>New File > RScript” Put the commands from the two boxes above into the Rscript window, highlight the text and click >Run. You can also run the lines by clicking on the first line, then click “>Run” repeatedly to run each line.

III. Data structures in R

A. Vectors and matrices

Vectors and matrices contain only one class of data (either numbers or text, but not both).
A vector is created using “c”:

x = c(10,20,30)
x  # Again, I like to type the variable after creating it to check its contents

## [1] 10 20 30

y = c("Joe","Harry","Larry")
y

## [1] "Joe"   "Harry" "Larry"

What happens if you type x = c(1,“Joe”)?

Find out the class of each object using class():

class(x)

## [1] "numeric"

class(y)

## [1] "character"

You can refer to elements of a vector or matrix with [ ]. What do each of the following do?

x[2]
x/5
x[2]/5
y[3]
paste(y[1],y[2],sep=" ")
paste0(y[1],y[2])
x2 = as.character(x)
x3 = as.numeric(x2)

Boolean operators. What do each of the following do?

x>15
x[x>15]
which(x>15)
length(x<15)
length(x[x<15])

B. Data frames and Reading in data

Data frames are perhaps the most widely used data format in R, especially for data analysis. They can contain a variety of data types, including text and numbers. Data frames also have names for each of their columns.
To experiment with this, we will load some precipitation data and plot it.

1. Load the data from a published google sheet using read.csv.

Here I have published some data in a google sheet to the web using File -> Published to the web, csv option. Doing this gives you a url that can be referenced in R to read in the data.

#fname = "lakeside_precip_hourly.txt"   # Name of the file to be loaded.
#loadname = paste(indir,fname,sep="")  # Creates a string with both directory and file name.
#lakeside.precip.hourly = read.table(loadname,skip=7)  #  Loads the data. 

lakeside.url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vSe7Pagem3LgS0-vO98FWHZTSDyjAWvzmm60752kEmwI5UvqPD-AZlSEj-mY0ZAlb-fWd63-CDook9M/pub?gid=0&single=true&output=csv"
lakeside.precip.hourly = read.csv(url(lakeside.url))

See how the functions read.csv() and url() work. Type:

?read.csv
?read.url

Now, look at the variable x that you just loaded:

lakeside.precip.hourly

But this prints out all of x–too much! To just see the first 6 lines of x:

head(lakeside.precip.hourly)

##          Date.time Precip.inches
## 1 1967-04-30 19:00             0
## 2 1967-04-30 20:00             0
## 3 1967-04-30 21:00             0
## 4 1967-04-30 22:00             0
## 5 1967-04-30 23:00             0
## 6 1967-05-01 00:00             0

2. Look at subsets of the rainfall data.

What do each of the following commands do?:

names(lakeside.precip.hourly)
lakeside.precip.hourly[,1]
lakeside.precip.hourly[,2]
lakeside.precip.hourly[1:10,]
lakewide.precip.hourly[1:10,"Precip.inches"]

C. Dates in R.

1. Date and time variables can be created using “strptime”, which creates something called a POSIX date-time object.

Date.time.POSIX = strptime(lakeside.precip.hourly$Date.time,format="%Y-%m-%d %H:%M")
head(Date.time.POSIX)

## [1] "1967-04-30 19:00:00 PDT" "1967-04-30 20:00:00 PDT"
## [3] "1967-04-30 21:00:00 PDT" "1967-04-30 22:00:00 PDT"
## [5] "1967-04-30 23:00:00 PDT" "1967-05-01 00:00:00 PDT"

Notice that date.time has the time zone associated with it. The computer guesses the time zone, but you can also specify the time zone with tz (if needed, see ?strptime).

III. Plotting data

Check out the plot command in help using ?plot

A. Plot the time series of rainfall using your date.time object and the rainfall column of x:

plot(Date.time.POSIX,lakeside.precip.hourly$Precip.inches,type="l")  # "l" means plot it as a "line"

Now change the axis labels:

plot(Date.time.POSIX,lakeside.precip.hourly$Precip.inches,type="l",xlab="",ylab="Hourly precipitation at Lakeside, inches")

IV. Time series management

A. Aggregate your houly data to daily and plot it.

1. Use the aggregate function

dates = format(Date.time.POSIX,"%Y-%m-%d")  # What does this do?
lakeside.precip.daily = aggregate(lakeside.precip.hourly$Precip.inches,by=list(dates),FUN=sum)
names(lakeside.precip.daily)  # Are these descriptive names?

## [1] "Group.1" "x"

  #  No, so replace with better names:
names(lakeside.precip.daily) = c("Date.text","Precip.inches")

#  The date is now of class "text", which R cannot read as a Date.  So, use as.Date to convert to a date object:
lakeside.precip.daily$Date = as.Date(lakeside.precip.daily$Date.text,"%Y-%m-%d")

2. Plot the time series of daily precipitation.

See the plot command in III.A. Your output should look like this:

3. Sort the precipitation data to find the largest daily events.

daily.sorted = sort(lakeside.precip.daily$Precip.inches,decreasing=TRUE)
daily.sorted[1:20]

##  [1] 3.822 3.316 3.239 3.097 3.090 2.813 2.650 2.490 2.489 2.438 2.427
## [12] 2.423 2.369 2.356 2.285 2.280 2.242 2.230 2.181 2.179

B. Time series management: hydroTSM

The package “hydroTSM” allows you to explore and summarize hydrological time series data.

1. First, install the package, which you only need to do once on a given computer:

install.packages("hydroTSM")

Now load the library:

library(hydroTSM)

2. Generate time series object and plot

You first need to convert your time series into a “zoo” class, which is a time series object.

lakeside.daily.ts = zoo(lakeside.precip.daily$Precip.inches,order.by=lakeside.precip.daily$Date)
head(lakeside.daily.ts)

## 1967-04-30 1967-05-01 1967-05-02 1967-05-03 1967-05-04 1967-05-05 
##          0          0          0          0          0          0

plot(lakeside.daily.ts,xlab="",ylab="Daily rainfall, inches")

See how many days have good data using the “dwi” function (use ?dwi to see what it does)

dwi(lakeside.daily.ts,out.unit="years")

## 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 
##  246  366  365  365  365  366  365  365  365  366  365  365  365  366  365 
## 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 
##  365  365  366  365  365  365  366  365  365  365  366  365  365  365  366 
## 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 
##  365  365  365  366  365  365  365  366  365  365  365  366  365    1

Are there any years you should leave out of the analysis?

3. hydroplot

Try the tool “hydroplot”:

dev.new()  # Generates a new plot window.
hydroplot(lakeside.daily.ts,var.type="Precipitation",var.unit="inches",ylab="Rainfall")

4. Monthly precipitation

m <- daily2monthly(lakeside.daily.ts, FUN=sum, na.rm=TRUE)
plot(m,xlab="",ylab="Monthly precipitation, inches")

C. Annual (water year) precipitation

Unfortunately, hydroTSM doesn’t have a way to summarize on water year. For that, we have to generate a list of dates we want to define the water year, label the time series with those water years, and sum the precipitation values for those labels (yikes!).

Note: I found the solution below at http://stackoverflow.com/questions/22073881/hydrological-year-time-series after googling “R hydroTSM october monthly2annual”)

breaks <- seq(as.Date("1967-10-01"), length=44, by="year")  
  # Length is the number of years you want the breaks for.
# labels.wy = seq(1968,length=43)  # Why do the labels start in 1968?
years.breaks = as.numeric(format(breaks,"%Y"))
labels.wy = years.breaks[2:length(breaks)]  # Take all but the first year in the breaks.  Why?
lakeside.precip.daily$hydroYear <- cut(lakeside.precip.daily$Date, breaks,labels=labels.wy)

Check to make sure the hydroYears were assigned correctly by looking at the first 600 lines:

lakeside.precip.daily[1:600,]

Now calculate the sum of precipitation by hydroYear using “aggregate”:

x.annual.wy = aggregate(lakeside.precip.daily$Precip.inches,by=list(lakeside.precip.daily$hydroYear),FUN=sum)
names(x.annual.wy) = c("Year","Rainfall.inches")
# x.daily$Group.1 is a "factor" class, which is annoying and doesn't plot right.
# Generate a vector of numerical values of year:
years = as.numeric(as.character(x.annual.wy$Year))
  # why do we need as.character?  What happens if you don't use "as.character"?
plot(years,x.annual.wy$Rainfall.inches,type="l",ylab="Annual precip, inches",xlab="Water year")

V. Rainfall IDF grids from NWS

Grids with storm depts for different times and intervals are available from the National Weather service using the package “rainfreq”.

A. Load and view rainfall frequency maps.

1. Install and load “rainfreq” (see earlier package installations for example).

2. Use “extract_freq” to download the 24 hour, 10-year storm for California.

First, look at what the function does and how to use it:

?extract_freq

Now use it, and assign the result to a raster, names “sw_24h_10yr”, and plot it. Note: The units are thousandths of an inch (e.g. 2000 = 2 inches)

sw_24h_10yr = extract_freq(region_name="sw",storm_RP=10,storm_duration="24h")
plot(sw_24h_10yr)

print(sw_24h_10yr)

## class       : RasterLayer 
## dimensions  : 1397, 2612, 3648964  (nrow, ncol, ncell)
## resolution  : 0.0083333, 0.0083333  (x, y)
## extent      : -124.7584, -102.9918, 31.32503, 42.96665  (xmin, xmax, ymin, ymax)
## coord. ref. : NA 
## data source : in memory
## names       : layer 
## values      : 825, 14526  (min, max)

B. Clearing variables

When exiting R, a window will appear asking if you want to save the R Workspace. ** Uncheck that box, or your variables will come up the next time you open R, which can be a problem.

If you accidentally saved your workspace and want to clear the variables, first look at the variables in your workspace:

ls()

#  If you want to then remove them:
rm(list=ls())

If you want to keep just some variables, you can develop a vector of variables you want to keep, and just remove the others:

variables.in.my.workspace = ls()
variables.I.want.to.keep = c("sw_24h_10yr", "x.annual.wy","x2")

variables.to.clear = variables.in.my.workspace[!(variables.in.my.workspace %in% variables.I.want.to.keep)]
rm(list=variables.to.clear)
  # the ! means "not"

C. Graduate students and intrepid undergrads: Extract IDF values over your watershed

1. Load your watershed boundary

indir.shp = "Z:"  #  The directory that has your watershed boundary in it.
fname.shp = "mywatershed_in_NLCD_projection"  #  The name of your watesrshed boundary

The command readOGR (rgdal library) reads in a shapefile and its projection information.

library(rgdal)
wshed.shp.ogr = readOGR(dsn=indir.shp,fname.shp) # dsn is the directory containing the shpfile.

## OGR data source with driver: ESRI Shapefile 
## Source: "G:\large_datasets\USA\california\sdcounty\hydrography_dataset", layer: "los_coches_wshed_NLCDprojection"
## with 1 features
## It has 7 fields
## Integer64 fields read as strings:  OBJECTID_1 OBJECTID Id gridcode

2. Define the projection of the NWS data and project the watershed boundary into the NWS projection.

nws.crs = "+proj=longlat +datum=WGS84"
wshed.proj <- spTransform(wshed.shp.ogr, nws.crs)

3. Now plot the two:

plot(sw_24h_10yr)
plot(wshed.proj,add=TRUE)

We can’t see our tiny watershed, so set the extent of the map to the extent of wshed.proj:

wshed.extent = extent(wshed.proj)
plot(sw_24h_10yr, ext=wshed.extent)
plot(wshed.proj,add=TRUE)

4. Extract the cell values over your watershed:

mywshed.24hr10yr = mean(as.numeric(unlist(extract(sw_24h_10yr,wshed.proj,method="simple"))))/1000

What object class is returned by extract(sw_24h_10yr,wshed.proj,method=“simple”)? What does unlist() do? What class is returned by unlist, and why do we have to do as.numeric? How would you find the min and max in addition to the mean?