There are many helpful R tutorials that you can find by googling “R tutorial”. Here is one example: http://ww2.coastal.edu/kingw/statistics/R-tutorials/index.html
There are also lots of R blogs and pages for hydrologists, including: http://abouthydrology.blogspot.it/2012/08/r-resources-for-hydrologists.html These links may go out of date quickly, so just google search for R hydrology, etc.
Some important commands are listed below. The . indicate blanks where you would fill in missing info.
?... # Loads help files for a specific command
?getwd # Loads help files for "getwd"
??getwd # Searches for the word "getwd" in all the help files
setwd(.) # Sets the working directory to .
getwd() # Shows the current working directory
dir() # List the files in the working directory
ls() # List the variables in your R workspace
# "#" Indicates a comment. The text . is ignored
install.packages(".") # Gets user-created packages from online repositories
library(.) # Loads packages that you've installed
Commands can be run from either the prompt in the “Console” window by typing them in directly at the “>”“, or from a script window by putting the cursor on a line and clicking”Run" (the Run icon has a green arrow pointing to the word Run). I prefer to use scripts for most commands, because you can re-run them and save them for later.
Start R studio (“Start”>“RStudio”)
First, put the name of your class directory into a variable “mydir”.
mydir = "Z:" # This puts the text for your directory in the variable mydir
mydir # I always type the name of the variable after creating it to see if R did what I expect.
## [1] "Z:"
Now, go to your directory using setwd(), and list the contents:
setwd(mydir)
dir()
Click “File>New File > RScript” Put the commands from the two boxes above into the Rscript window, highlight the text and click >Run. You can also run the lines by clicking on the first line, then click “>Run” repeatedly to run each line.
Vectors and matrices contain only one class of data (either numbers or text, but not both).
A vector is created using “c”:
x = c(10,20,30)
x # Again, I like to type the variable after creating it to check its contents
## [1] 10 20 30
y = c("Joe","Harry","Larry")
y
## [1] "Joe" "Harry" "Larry"
What happens if you type x = c(1,“Joe”)?
Find out the class of each object using class():
class(x)
## [1] "numeric"
class(y)
## [1] "character"
You can refer to elements of a vector or matrix with [ ]. What do each of the following do?
x[2]
x/5
x[2]/5
y[3]
paste(y[1],y[2],sep=" ")
paste0(y[1],y[2])
x2 = as.character(x)
x3 = as.numeric(x2)
Boolean operators. What do each of the following do?
x>15
x[x>15]
which(x>15)
length(x<15)
length(x[x<15])
Data frames are perhaps the most widely used data format in R, especially for data analysis. They can contain a variety of data types, including text and numbers. Data frames also have names for each of their columns.
To experiment with this, we will load some precipitation data and plot it.
Here I have published some data in a google sheet to the web using File -> Published to the web, csv option. Doing this gives you a url that can be referenced in R to read in the data.
#fname = "lakeside_precip_hourly.txt" # Name of the file to be loaded.
#loadname = paste(indir,fname,sep="") # Creates a string with both directory and file name.
#lakeside.precip.hourly = read.table(loadname,skip=7) # Loads the data.
lakeside.url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vSe7Pagem3LgS0-vO98FWHZTSDyjAWvzmm60752kEmwI5UvqPD-AZlSEj-mY0ZAlb-fWd63-CDook9M/pub?gid=0&single=true&output=csv"
lakeside.precip.hourly = read.csv(url(lakeside.url))
See how the functions read.csv() and url() work. Type:
?read.csv
?read.url
Now, look at the variable x that you just loaded:
lakeside.precip.hourly
But this prints out all of x–too much! To just see the first 6 lines of x:
head(lakeside.precip.hourly)
## Date.time Precip.inches
## 1 1967-04-30 19:00 0
## 2 1967-04-30 20:00 0
## 3 1967-04-30 21:00 0
## 4 1967-04-30 22:00 0
## 5 1967-04-30 23:00 0
## 6 1967-05-01 00:00 0
What do each of the following commands do?:
names(lakeside.precip.hourly)
lakeside.precip.hourly[,1]
lakeside.precip.hourly[,2]
lakeside.precip.hourly[1:10,]
lakewide.precip.hourly[1:10,"Precip.inches"]
Date.time.POSIX = strptime(lakeside.precip.hourly$Date.time,format="%Y-%m-%d %H:%M")
head(Date.time.POSIX)
## [1] "1967-04-30 19:00:00 PDT" "1967-04-30 20:00:00 PDT"
## [3] "1967-04-30 21:00:00 PDT" "1967-04-30 22:00:00 PDT"
## [5] "1967-04-30 23:00:00 PDT" "1967-05-01 00:00:00 PDT"
Notice that date.time has the time zone associated with it. The computer guesses the time zone, but you can also specify the time zone with tz (if needed, see ?strptime).
Check out the plot command in help using ?plot
plot(Date.time.POSIX,lakeside.precip.hourly$Precip.inches,type="l") # "l" means plot it as a "line"
Now change the axis labels:
plot(Date.time.POSIX,lakeside.precip.hourly$Precip.inches,type="l",xlab="",ylab="Hourly precipitation at Lakeside, inches")
dates = format(Date.time.POSIX,"%Y-%m-%d") # What does this do?
lakeside.precip.daily = aggregate(lakeside.precip.hourly$Precip.inches,by=list(dates),FUN=sum)
names(lakeside.precip.daily) # Are these descriptive names?
## [1] "Group.1" "x"
# No, so replace with better names:
names(lakeside.precip.daily) = c("Date.text","Precip.inches")
# The date is now of class "text", which R cannot read as a Date. So, use as.Date to convert to a date object:
lakeside.precip.daily$Date = as.Date(lakeside.precip.daily$Date.text,"%Y-%m-%d")
See the plot command in III.A. Your output should look like this:
daily.sorted = sort(lakeside.precip.daily$Precip.inches,decreasing=TRUE)
daily.sorted[1:20]
## [1] 3.822 3.316 3.239 3.097 3.090 2.813 2.650 2.490 2.489 2.438 2.427
## [12] 2.423 2.369 2.356 2.285 2.280 2.242 2.230 2.181 2.179
The package “hydroTSM” allows you to explore and summarize hydrological time series data.
install.packages("hydroTSM")
Now load the library:
library(hydroTSM)
You first need to convert your time series into a “zoo” class, which is a time series object.
lakeside.daily.ts = zoo(lakeside.precip.daily$Precip.inches,order.by=lakeside.precip.daily$Date)
head(lakeside.daily.ts)
## 1967-04-30 1967-05-01 1967-05-02 1967-05-03 1967-05-04 1967-05-05
## 0 0 0 0 0 0
plot(lakeside.daily.ts,xlab="",ylab="Daily rainfall, inches")
See how many days have good data using the “dwi” function (use ?dwi to see what it does)
dwi(lakeside.daily.ts,out.unit="years")
## 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981
## 246 366 365 365 365 366 365 365 365 366 365 365 365 366 365
## 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996
## 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366
## 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
## 365 365 365 366 365 365 365 366 365 365 365 366 365 1
Are there any years you should leave out of the analysis?
Try the tool “hydroplot”:
dev.new() # Generates a new plot window.
hydroplot(lakeside.daily.ts,var.type="Precipitation",var.unit="inches",ylab="Rainfall")
m <- daily2monthly(lakeside.daily.ts, FUN=sum, na.rm=TRUE)
plot(m,xlab="",ylab="Monthly precipitation, inches")
Unfortunately, hydroTSM doesn’t have a way to summarize on water year. For that, we have to generate a list of dates we want to define the water year, label the time series with those water years, and sum the precipitation values for those labels (yikes!).
Note: I found the solution below at http://stackoverflow.com/questions/22073881/hydrological-year-time-series after googling “R hydroTSM october monthly2annual”)
breaks <- seq(as.Date("1967-10-01"), length=44, by="year")
# Length is the number of years you want the breaks for.
# labels.wy = seq(1968,length=43) # Why do the labels start in 1968?
years.breaks = as.numeric(format(breaks,"%Y"))
labels.wy = years.breaks[2:length(breaks)] # Take all but the first year in the breaks. Why?
lakeside.precip.daily$hydroYear <- cut(lakeside.precip.daily$Date, breaks,labels=labels.wy)
Check to make sure the hydroYears were assigned correctly by looking at the first 600 lines:
lakeside.precip.daily[1:600,]
Now calculate the sum of precipitation by hydroYear using “aggregate”:
x.annual.wy = aggregate(lakeside.precip.daily$Precip.inches,by=list(lakeside.precip.daily$hydroYear),FUN=sum)
names(x.annual.wy) = c("Year","Rainfall.inches")
# x.daily$Group.1 is a "factor" class, which is annoying and doesn't plot right.
# Generate a vector of numerical values of year:
years = as.numeric(as.character(x.annual.wy$Year))
# why do we need as.character? What happens if you don't use "as.character"?
plot(years,x.annual.wy$Rainfall.inches,type="l",ylab="Annual precip, inches",xlab="Water year")
Grids with storm depts for different times and intervals are available from the National Weather service using the package “rainfreq”.
First, look at what the function does and how to use it:
?extract_freq
Now use it, and assign the result to a raster, names “sw_24h_10yr”, and plot it. Note: The units are thousandths of an inch (e.g. 2000 = 2 inches)
sw_24h_10yr = extract_freq(region_name="sw",storm_RP=10,storm_duration="24h")
plot(sw_24h_10yr)
print(sw_24h_10yr)
## class : RasterLayer
## dimensions : 1397, 2612, 3648964 (nrow, ncol, ncell)
## resolution : 0.0083333, 0.0083333 (x, y)
## extent : -124.7584, -102.9918, 31.32503, 42.96665 (xmin, xmax, ymin, ymax)
## coord. ref. : NA
## data source : in memory
## names : layer
## values : 825, 14526 (min, max)
When exiting R, a window will appear asking if you want to save the R Workspace. ** Uncheck that box, or your variables will come up the next time you open R, which can be a problem.
If you accidentally saved your workspace and want to clear the variables, first look at the variables in your workspace:
ls()
# If you want to then remove them:
rm(list=ls())
If you want to keep just some variables, you can develop a vector of variables you want to keep, and just remove the others:
variables.in.my.workspace = ls()
variables.I.want.to.keep = c("sw_24h_10yr", "x.annual.wy","x2")
variables.to.clear = variables.in.my.workspace[!(variables.in.my.workspace %in% variables.I.want.to.keep)]
rm(list=variables.to.clear)
# the ! means "not"
indir.shp = "Z:" # The directory that has your watershed boundary in it.
fname.shp = "mywatershed_in_NLCD_projection" # The name of your watesrshed boundary
The command readOGR (rgdal library) reads in a shapefile and its projection information.
library(rgdal)
wshed.shp.ogr = readOGR(dsn=indir.shp,fname.shp) # dsn is the directory containing the shpfile.
## OGR data source with driver: ESRI Shapefile
## Source: "G:\large_datasets\USA\california\sdcounty\hydrography_dataset", layer: "los_coches_wshed_NLCDprojection"
## with 1 features
## It has 7 fields
## Integer64 fields read as strings: OBJECTID_1 OBJECTID Id gridcode
nws.crs = "+proj=longlat +datum=WGS84"
wshed.proj <- spTransform(wshed.shp.ogr, nws.crs)
plot(sw_24h_10yr)
plot(wshed.proj,add=TRUE)
We can’t see our tiny watershed, so set the extent of the map to the extent of wshed.proj:
wshed.extent = extent(wshed.proj)
plot(sw_24h_10yr, ext=wshed.extent)
plot(wshed.proj,add=TRUE)
mywshed.24hr10yr = mean(as.numeric(unlist(extract(sw_24h_10yr,wshed.proj,method="simple"))))/1000
What object class is returned by extract(sw_24h_10yr,wshed.proj,method=“simple”)? What does unlist() do? What class is returned by unlist, and why do we have to do as.numeric? How would you find the min and max in addition to the mean?