GEOG 576 EX3 Shapefile projection and daymet grids

A. DAYMET

DAYMET is one of several precipitation and climate grids for North America, including US, Canada, and Mexico. See the metadata and access at https://daymet.ornl.gov/ Data are at 1km resolution, daily, from 1980-present (with a few months lag).

DAYMET data come as NETCDF grids, which are handled by R’s raster package.

B. In this Exercise, you’ll:

read in a watershed boundary shapefile (shpfile),
project it into the projection of the precipitation data
load precipitation grids and calculate the mean precipitation on the watershed boundary
plot time series of the extracted precipitation data

C. Key functions for this exercise include

1. “for loops”

Here is a simple example “for loop”. What does it do?

x = c("Harry","Moe","Larry")
for (i in 1:length(x)){
  print(paste(x[i],"says Hi."))
}

## [1] "Harry says Hi."
## [1] "Moe says Hi."
## [1] "Larry says Hi."

Skilled R programmers often use “apply” for loops. I haven’t mastered apply, so I often resort to “for loops”.

2. grep

The “grep” command finds the elements in a vector that contain a given string of characters:

#  What do these commands do?:
grep("Harry",x)
grep("Harry",x,invert=TRUE)

x[grep("Harry",x)]
x[grep("Harry",x,invert=TRUE)]

3. Find data with overlapping years with intersect and %in%

You have two data.frames with a list of years–how can you find where they intersect? Below I present one solution. There are others!

# Generate 2 data.frames for two "stations" s1 and s2 with fake, overlapping time series
s1.years = seq(1990,2000)
s2.years = seq(1995,2008)
s1.data = data.frame(Year=s1.years,Rainfall=runif(length(s1.years)))  # What does runif do?
s2.data = data.frame(Year=s2.years,Rainfall=runif(length(s2.years)))

# Find what years intersect both datasets
intersect.years = intersect(s1.data$Year,s2.data$Year)
print(intersect.years)

#  Now get subsets of the data that correspond to those years
s1.sub = s1.data[s1.data$Year %in% intersect.years,]
s2.sub = s2.data[s2.data$Year %in% intersect.years,]

# Print the results to make sure it worked.
print(s1.sub)
print(s2.sub)

4 if else statements

These are useful if you want to control the commands based on another variable

youre.happy.and.you.know.it = "YES"
if (youre.happy.and.you.know.it == "YES") {
  print("Clap your hands")
  } else {
  print("Take a break")
}

## [1] "Clap your hands"

D. Exercise tasks.

0. Load the necessary libraries.

Note: If you get the error:

Error in library(raster) : there is no package called ‘raster’

Then you have to install.packages(“raster”) first.

library(raster)
library(rgdal)

1. Set the directory of one sample daymet precipitation raster

indir.daymet = "Y:/rainfall/DAYMET/tars_2015/11012_2015/"
  # 11012 is the tile for san diego, and 2015 is the year.

2. Load and plot the raster dataset, just the first band (each prcp.nc file has 365 layers)

r = raster(paste0(indir.daymet,"prcp.nc"),band=1)  # loads layer 1
r  # Look at what's in r

## class       : RasterLayer 
## band        : 1  (of  365  bands)
## dimensions  : 247, 183, 45201  (nrow, ncol, ncell)
## resolution  : 1000, 1000  (x, y)
## extent      : -1601750, -1418750, -978500, -731500  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=lcc +lon_0=-100 +lat_0=42.5 +x_0=0 +y_0=0 +lat_1=25 +lat_2=60 +ellps=WGS84 
## data source : G:\large_datasets\USA\california\sdcounty\climate\daymet\V3\tars_2015\11012_2015\prcp.nc 
## names       : daily.total.precipitation 
## z-value     : 2015-01-01 
## zvar        : prcp

plot(r)

3. Retrieve the projection (“coordinate reference system”, or “crs”) for the daymet layer.

daymet.crs = r@crs  # the @crs accesses the "slot" of the crs
daymet.crs  # look at daymet.crs

## CRS arguments:
##  +proj=lcc +lon_0=-100 +lat_0=42.5 +x_0=0 +y_0=0 +lat_1=25
## +lat_2=60 +ellps=WGS84

4. Set the directory and filename of the watershed boundary.

Note that the indir.shp does not have a “/” at the end, like in other definitions of indir. This is a random quirk of rgdal.

indir.shp = << your directory to your watershed boundary >>
  #  Should look like "Z:/g576-10/ex1"
  #  !!! note the "/" (not "\"").
  #  !!! note there is no "/" at the end of the indir.shp string (unlike our other uses of indir strings)
fname.shp = << your watersehd boundary name >>

5. Load the watershed boundary shp using readOGR

readOGR is in the rgdal library.
Unlike the maptools library and its readShapePoly function, readOGR also reads in projection information. This is handy, because projection info in R is difficult to format.

wshed.shp.ogr = readOGR(dsn=indir.shp,fname.shp) # dsn is the directory containing the shpfile, which you put in the variable "indir.shp" above.

## OGR data source with driver: ESRI Shapefile 
## Source: "G:/large_datasets/USA/california/sdcounty/hydrography_dataset", layer: "los_coches_wshed_NLCDprojection"
## with 1 features
## It has 7 fields
## Integer64 fields read as strings:  OBJECTID_1 OBJECTID Id gridcode

6. Project watershed boundary into the projection of the DAYMET data.

wshed.proj <- spTransform(wshed.shp.ogr, daymet.crs)

7. Plot the watershed boundary on top of the daymet file to make sure they line up.

plot(wshed.proj,add=TRUE)

8. Calculate mean precip for a single layer

p.mean = raster::extract(r,wshed.proj,fun=mean)
  # the "raster::" means that R should use the extract function in the raster package.
  #  Normally, you don't need to do this, but the zoo library also has a function "extract" 
  # and R doesn't know which one to use.
p.mean

##           [,1]
## [1,] 0.2903226

9. Now, load all 365 days at once.

b = brick(paste0(indir.daymet,"prcp.nc"))  # "stack" also works
b

## class       : RasterBrick 
## dimensions  : 247, 183, 45201, 365  (nrow, ncol, ncell, nlayers)
## resolution  : 1000, 1000  (x, y)
## extent      : -1601750, -1418750, -978500, -731500  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=lcc +lon_0=-100 +lat_0=42.5 +x_0=0 +y_0=0 +lat_1=25 +lat_2=60 +ellps=WGS84 
## data source : G:\large_datasets\USA\california\sdcounty\climate\daymet\V3\tars_2015\11012_2015\prcp.nc 
## names       : X2015.01.01, X2015.01.02, X2015.01.03, X2015.01.04, X2015.01.05, X2015.01.06, X2015.01.07, X2015.01.08, X2015.01.09, X2015.01.10, X2015.01.11, X2015.01.12, X2015.01.13, X2015.01.14, X2015.01.15, ... 
## Date        : 2015-01-01, 2015-12-31 (min, max)
## varname     : prcp

10. Calculate the mean precip on the watershed boundary with “extract”

prcp.365days = raster::extract(b,wshed.proj,fun=mean)
prcp.365days[1,1:20]  # Print the first 20 columns of prcp.365days

## X2015.01.01 X2015.01.02 X2015.01.03 X2015.01.04 X2015.01.05 X2015.01.06 
##   0.2903226   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000 
## X2015.01.07 X2015.01.08 X2015.01.09 X2015.01.10 X2015.01.11 X2015.01.12 
##   0.0000000   0.0000000   0.0000000   0.0000000  10.9032258  16.7096774 
## X2015.01.13 X2015.01.14 X2015.01.15 X2015.01.16 X2015.01.17 X2015.01.18 
##   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000   0.0000000 
## X2015.01.19 X2015.01.20 
##   0.0000000   0.0000000

11. Get the time series of dates from the names of prcp.365days

date.strings = colnames(prcp.365days)
dates.string.sub = substr(date.strings,2,nchar(date.strings))  # Cut off the "X" to get just the date
dates.string.sub[1:20]  # Print first 20 values of dates.string.sub

##  [1] "2015.01.01" "2015.01.02" "2015.01.03" "2015.01.04" "2015.01.05"
##  [6] "2015.01.06" "2015.01.07" "2015.01.08" "2015.01.09" "2015.01.10"
## [11] "2015.01.11" "2015.01.12" "2015.01.13" "2015.01.14" "2015.01.15"
## [16] "2015.01.16" "2015.01.17" "2015.01.18" "2015.01.19" "2015.01.20"

dates = as.Date(dates.string.sub,format="%Y.%m.%d")  # as.Date turns a string into a date
dates[1:20]

##  [1] "2015-01-01" "2015-01-02" "2015-01-03" "2015-01-04" "2015-01-05"
##  [6] "2015-01-06" "2015-01-07" "2015-01-08" "2015-01-09" "2015-01-10"
## [11] "2015-01-11" "2015-01-12" "2015-01-13" "2015-01-14" "2015-01-15"
## [16] "2015-01-16" "2015-01-17" "2015-01-18" "2015-01-19" "2015-01-20"

plot(dates,prcp.365days,type="l")

12. Add labels to the x and y axes (I’m not showing the code!).

13. Turn the data into a time series using “zoo”

The vectors of data are ok for plotting, but for summarizing to monthly or annual, a time series object (zoo) is better.

library(zoo)
prcp.zoo = zoo(x=as.numeric(prcp.365days), order.by=dates) 
    # as.numeric changes prcp.365days from a matrix to a vector
prcp.zoo[1:20]

## 2015-01-01 2015-01-02 2015-01-03 2015-01-04 2015-01-05 2015-01-06 
##  0.2903226  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000 
## 2015-01-07 2015-01-08 2015-01-09 2015-01-10 2015-01-11 2015-01-12 
##  0.0000000  0.0000000  0.0000000  0.0000000 10.9032258 16.7096774 
## 2015-01-13 2015-01-14 2015-01-15 2015-01-16 2015-01-17 2015-01-18 
##  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000 
## 2015-01-19 2015-01-20 
##  0.0000000  0.0000000

plot(prcp.zoo,ylab="Daily total precipitation, mm")

14. Calculate monthly precip from the zoo object.

Note: you could also do this using the hydroTSM tools from Exercise 2. Here I’m showing you another way to aggregate to monthly.

library(xts)
prcp.monthly = apply.monthly(prcp.zoo,sum)
plot(prcp.monthly,ylab="Monthly precipitation in 2015, mm")
legend("topleft",legend="Monthly rainfall",lty=c(1),pch=NA)  # pch=NA means you're not plotting a point on the line

Is there something misleading about this plot (e.g. where the months plot)? How could you fix this?)

15. Do a loop for all years

First get the list of all directories that have the daymet data:

indir.daymet.base = "Y:/rainfall/DAYMET/"
daymet.directories = dir(indir.daymet.base,pattern = "11012",recursive=TRUE,full.names=TRUE,include.dirs=TRUE)
  # the 11012 is the tile for San Diego.

What does “recursive” do in dir? What does “full.names” do? To find out, try the command with each set to FALSE and see what “directories” looks like.

“directories” includes all directories and files, but we want just the directories that have the prcp.nc grids. Use the grep (“find”) function, with “invert=TRUE” to exclude filenames, identified by the characters “.tar”:

dir.sub = daymet.directories[grep("\\.tar",daymet.directories,invert=TRUE)]
  # grep searches for the string (here, "\\tar.") in the vector "directories".  
  # invert=TRUE means it returns only those elements in directories that do *not* contain the strong "\\.tar"

Now, loop through the dir.sub to extract daymet data for each year, and put into a data frame:

for (i in 1:length(dir.sub)){
  dir.to.load = dir.sub[i]
  b = brick(paste0(dir.to.load,"/prcp.nc"))
  prcp.365days = raster::extract(b,wshed.proj,fun=mean)
  date.strings = colnames(prcp.365days)
  dates.string.sub = substr(date.strings,2,nchar(date.strings))  # Cut off the "X" to get just the date
  dates = as.Date(dates.string.sub,format="%Y.%m.%d")
  
  #  Make a dataframe object that has the date and precipitation values
  #  This gets rid of the awkward matrix names in prcp.365days.
  outdata = data.frame(Date=dates,Precip.mm=as.vector(prcp.365days))
  # The if and else statements put all the data into "prcp.all".
    #  The "if" looks to see if there is already a prcp.all data.frame.
    #  If there isn't, it creates one and puts the data for this year into it.
    #  If there isn't, it appends the data for this year onto the previous years.
  if (exists("prcp.all")){
    prcp.all = rbind(prcp.all,outdata)
  } else {
    prcp.all = outdata
  }
  
  #  During the loop run, we may be curious about how much longer it will take.  You add something to print out where the loop is:
  print(paste(i," of ",length(dir.sub)))
  flush.console()  # This makes the print statement happen while the loop is running, instead of a the end.
}

## [1] "1  of  36"
## [1] "2  of  36"
## [1] "3  of  36"
## [1] "4  of  36"
## [1] "5  of  36"
## [1] "6  of  36"
## [1] "7  of  36"
## [1] "8  of  36"
## [1] "9  of  36"
## [1] "10  of  36"
## [1] "11  of  36"
## [1] "12  of  36"
## [1] "13  of  36"
## [1] "14  of  36"
## [1] "15  of  36"
## [1] "16  of  36"
## [1] "17  of  36"
## [1] "18  of  36"
## [1] "19  of  36"
## [1] "20  of  36"
## [1] "21  of  36"
## [1] "22  of  36"
## [1] "23  of  36"
## [1] "24  of  36"
## [1] "25  of  36"
## [1] "26  of  36"
## [1] "27  of  36"
## [1] "28  of  36"
## [1] "29  of  36"
## [1] "30  of  36"
## [1] "31  of  36"
## [1] "32  of  36"
## [1] "33  of  36"
## [1] "34  of  36"
## [1] "35  of  36"
## [1] "36  of  36"

Now, plot the time series of precipitation (I’m not providing code for it this time!)

Finally, convert prcp.all to a zoo time series object and calculate monthly precipitation, and plot it (I’m not showing the code!)

16. Compare monthly precipitation with monthly total potential evapotranspiration (PET)

The California Irrigation Management Information System (CIMIS) has a shpfile with a map of ETo zones for California. The shapefile is in the class data directory under shape_files.

Find out which zone your watershed is in using the ETo map in ArcGIS. If it crosses more than one zone, just use the dominant zone for this exercise.

Get the monthly mean ETo for your zone by loading the shpefile in R:

indir.ETo = "Y:/spatial_data/potential_ET/"
  # Note that this time indir includes the / at the end.

library(maptools)
fname.ETo = "original_zones.shp"
ETo.shp = readShapePoly(paste0(indir.ETo,fname.ETo))

See what data is in the shapefile:

slotNames(ETo.shp)
ETo.shp@data  # Displays the data in ETo.shp.
names(ETo.shp@data)  # Gives the names of the columns in the data in ETo.shp
ETo.shp@data[114,]  # Returns just line 114, which corresponds to South Coast Marine to Desert Transition.

Find the record number that corresponds with the region description you found in the CIMIS ETo shapefile. I searched visually for “SOUTH COAST TO DESERT TRANSITION” IN ETo.shp@data

ETo.monthly = ETo.shp@data[114,2:13]  #  In meters per month
  #  The 2:13 extracts just the ETo data (m/monthY without the description.
ETo.monthly.mm = ETo.monthly*1000
ETo.monthly.mm.df = data.frame(Months=seq(1:12),ETo.mm=as.numeric(t(ETo.monthly.mm[1,])))

Now, find the long-term mean monthly precipitation (daymet) in your watershed, and plot it on top of the ETo data.

The hydroTSM package has a convenient function for calculating the long-term mean of a daily or other time series:

monthlyfunction.

Look up this function in R using ?, apply it to your time series, and plot the results including a legend.

library(hydroTSM)
precip.month.avg = monthlyfunction(prcp.monthly,FUN=mean)
#  Turn it into a data.frame so you can plot ETo on top.
precip.month.avg.df = data.frame(precip.month.avg)

plot(ETo.monthly.mm.df$Months,ETo.monthly.mm.df$ETo.mm,type="l",xlab="Month",ylab="Monthly ETo or precipitation, mm",ylim=c(0,250))
lines(seq(1:12),precip.month.avg,col="grey")
legend("topleft",c("ETo","Precipitation"),col=c("black","grey"),lty=c(1,1))

In what months do you expect runoff from your watershed if the monthly precipitation-ETo balance was the only factor governing runoff?

E. Geoknife. Graduate students and intrepid undergraduates.

The new Geoknife package lets you extract data from a large number of datasets using a polygon boundary. First, install geoknife from an online repository:

install.packages("geoknife", 
                 repos = c("http://owi.usgs.gov/R"),
                 dependencies = TRUE)

Then load the library:

library(geoknife)

Project your watershed into geographic projection, which is used by many geoknife datasets, and convert to a “simple geometry” object.

wshed.proj <- spTransform(wshed.shp.ogr, "+proj=longlat +datum=WGS84")
wshed.simple = simplegeom(wshed.proj)

The “fabric” is the raster dataset. The “stencil” is the polygon. The “knife” is the math performed to give you a final dataset (e.g. mean, max, etc, where default is mean).

Define the fabric, and find out what the time range is and what variables it has:

fabric.daymet = webdata("daymet")
query(fabric.daymet,'times')
query(fabric.prism, 'variables')

Now, define the time range you want to extract the data for, and extract it. It can take a while to extract daily data, so here I provide an example for one year:

times(fabric.daymet) <- c('2004-01-01','2004-12-31')
wshed.daymet.precip = result(geoknife(wshed.simple, fabric.daymet, wait = TRUE))

To see what other datasets are available in geoknife:

webdatasets = query('webdata')  #  Print it to look at what's in it
webdatasets.title = title(webdatasets)  # Print to look at what's in it

If you wanted to extract data from, for example, the California basin characterization model:

webdatasets.california.index = grep("California",webdatasets.title)
print(webdatasets.title[webdatasets.california.index])

## [1] "Future California Basin Characterization Model Downscaled Climate and Hydrology"    
## [2] "Historical California Basin Characterization Model Downscaled Climate and Hydrology"

fabric.CABCM = webdata(webdatasets[webdatasets.california.index[2]]) # Sets the fabric as the 2nd entry in the list...

Then you can set time ranges and extract data from this dataset using the methods in Part D.