GEOG 576. Summarize land use in a watershed boundary.

This Exercise will introduce you to reading vector and raster data into R, and tabulating raster areas using a vector (watershed boundary). For more information on spatial analysis in R, see: https://www.paulamoraga.com/book-spatial/spatial-data-in-r.html

Focus on sf and stars (the new raster package): https://r-spatial.org/book/07-Introsf.html

0. Open R, and click on File > New File > R Script

For each line of code below, enter it into the R script.
Then, put the cursor on the line, and press “>Run” button in the upper right corner of script window.

1. Install the libraries (aka packages) you need.

You only have to “install.packages” once. After you have run this, you can comment it out with “#”

Note: Until 2023, R used the rgdal and regeos packages for spatial data. That has been changed to the “sf” package.

install.packages("raster")
install.packages("sf")
install.packages("htmlTable")

You need to call “library” every time you start a new R session.

library(stars)

## Warning: package 'stars' was built under R version 4.2.3

## Warning: package 'sf' was built under R version 4.2.3

library(sf)
library(htmlTable)

## Warning: package 'htmlTable' was built under R version 4.2.3

To see what each of these libraries do, do a google search on, e.g. “R cran raster package”

2. Loading data.

Set the input directory and filename for the watershed boundary shapefile. * This is the watershed boundary of the shapefile you delineated in Exercise 2. Note that there is no .shp extension on the shapefile name (fname.shp below)

indir.shp = "<your directory that has the shp files>"
  # This is the directory where you saved your watershed boundary from Exercise 2.

  # IMPORTANT:  this string cannot have a "/" at the end, so:
  #  indir.shp = "C:/myfiles/GEOG576" 
  #  *not*
  #  indir.shp = "C:/myfiles/GEOG576/"

   # Note that the directory has to have all of the files associated with your los penasquitos shapefile, including shp, shx, etc...
fname.shp = "< name you gave to the los penasquitos shapefile >"
  # Note...do not include the "shp" extension.  For example, if your los pen shapefile name is "trentwshed.shp", then fname.shp will be "trentwshed".

Now read in the watershed boundary shapefile. st_read is a function in the library sf. Use ? to find out more about the function:

?st_read

Now use st_read to read in your watershed boundary

wshed.shp = st_read(dsn=indir.shp,layer=fname.shp)  # The "=" inside the function call are optional.

## Reading layer `lospen_wshed_10m_utm11' from data source 
##   `K:\My Drive\Gdrive\mydocuments\SDSU_classes\past\2020_2021_701_576\2020F_576_701_498\576\hw\data\spatial\DEM\biggs_output' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 3 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 488586.1 ymin: 3641729 xmax: 504566.1 ymax: 3651339
## Projected CRS: WGS 84 / UTM zone 11N

Set the input directory and filename, and read in the land cover data.

Check the coordinate reference system (aka projection) of the watershed shapefile: See what the crs function does:

?st_crs

Now check the crs of your watershed boundary:

st_crs(wshed.shp)

## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 11N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 11N",
##     BASEGEOGCRS["WGS 84",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 11N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-117,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",32611]]

Load the tif file of land cover that you used from Exercise 2.

indir.lc = <the directory where you saved the NLCD tif file from Exercise 2>
fname.lc = "<the file name of the NLCD.tif file>"  e.g. "NLCD_san_diego_county_utm11.tif"
lc = read_stars(paste0(indir.lc,fname.lc))

Note: If this doesn’t load for some reason, try re-downloading the NLCD data from here: https://drive.google.com/file/d/1Cp0UxPXv7S6ikMbtStxmKXasRRzs70Ol/view?usp=sharing

Check the crs. Are they the same?

st_crs(lc)

## Coordinate Reference System:
##   User input: WGS 84 / UTM zone 11N 
##   wkt:
## PROJCRS["WGS 84 / UTM zone 11N",
##     BASEGEOGCRS["WGS 84",
##         ENSEMBLE["World Geodetic System 1984 ensemble",
##             MEMBER["World Geodetic System 1984 (Transit)"],
##             MEMBER["World Geodetic System 1984 (G730)"],
##             MEMBER["World Geodetic System 1984 (G873)"],
##             MEMBER["World Geodetic System 1984 (G1150)"],
##             MEMBER["World Geodetic System 1984 (G1674)"],
##             MEMBER["World Geodetic System 1984 (G1762)"],
##             MEMBER["World Geodetic System 1984 (G2139)"],
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[2.0]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["UTM zone 11N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-117,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["Between 120°W and 114°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Alberta; British Columbia (BC); Northwest Territories (NWT); Nunavut. Mexico. United States (USA)."],
##         BBOX[0,-120,84,-114]],
##     ID["EPSG",32611]]

3. Plot the two maps.

plot(lc)

## downsample set to 4

plot(wshed.shp,add=TRUE)

## Warning in plot.sf(wshed.shp, add = TRUE): ignoring all but the first attribute

Can you see the watershed boundary on your map?

4. Zoom into the extent of the watershed boundary.

and plot. What happens if you use plot(wshed.shp, add=TRUE) instead?

image(lc,extent=wshed.shp)
#plot(lc,extent=wshed.shp)
plot(st_geometry(wshed.shp[1]),add=TRUE, )

5. Extract the cell values that fall in the watershed boundary.

Extract, then look at what you extracted.

### Things that didn't work:

#lc.extract = extract(lc,wshed.shp,method="simple")  # Gets cell values for all cells in shp
#print(lc.extract[[1]][1:100])

#lc.extract.2 = aggregate(lc,wshed.shp[1],length)  # Super slow
#library(exactextractr)
#lc.extract.3 = st_extract(lc,wshed.shp)
#lc.nogeom = st_set_geometry(lc.extract.3,NULL)

# try converting stars to sf object, then tabulate

# Worked:
  # Subset the stars object (lc):
   # from "Cropping" in https://r-spatial.org/book/07-Introsf.html
lc.shp = lc[wshed.shp]
plot(lc.shp)

#table(lc)

Now, summarize the cropped stars object using “table”:

lc.table = table(lc.shp)  #  Creates a table of counts from the cell list
print(lc.table)

## NLCD_san_diego_county_utm11.tif
##    11    21    22    23    24    31    42    43    52    71    81    82    90 
##     5 18847 20625 27535  3750    70    39   111 59552  3676   205     1   428 
##    95 
##    21

6. Assign land cover class names to the table.

These codes correspond to the numbers in lc.table. The lookup table is at https://www.mrlc.gov/data/legends/national-land-cover-database-2016-nlcd2016-legend. I’ve typed them in here by hand.

lc.key = data.frame(Code=c(11,21,22,23,24,31,42,43,52,71,81,82,90,95), Name=c("Open Water","Developed, Open Space","Developed, Low Intensity","Developed, Medium Intensity","Developed, High Intensity","Barren","Evergreen Forest","Mixed Forest","Shrub/Scrub","Grassland","Pasture","Crops","Woody Wetlands","Emergent wetlands"))

#  Convert the table to a data frame.  This makes it easier to work with.
lc.df = data.frame(lc.table)
names(lc.df) = c("Code","Cells")
# Merge the lc dataframe with the lc.key:
lc.df.w.names = merge(lc.df,lc.key,by="Code")  # Merges two dfs on a single common field; here, "Code".
print(lc.df.w.names)

##    Code Cells                        Name
## 1    11     5                  Open Water
## 2    21 18847       Developed, Open Space
## 3    22 20625    Developed, Low Intensity
## 4    23 27535 Developed, Medium Intensity
## 5    24  3750   Developed, High Intensity
## 6    31    70                      Barren
## 7    42    39            Evergreen Forest
## 8    43   111                Mixed Forest
## 9    52 59552                 Shrub/Scrub
## 10   71  3676                   Grassland
## 11   81   205                     Pasture
## 12   82     1                       Crops
## 13   90   428              Woody Wetlands
## 14   95    21           Emergent wetlands

8. Put the results into a table.

htmlTable(lc.df.w.names)

	Code	Cells	Name
1	11	5	Open Water
2	21	18847	Developed, Open Space
3	22	20625	Developed, Low Intensity
4	23	27535	Developed, Medium Intensity
5	24	3750	Developed, High Intensity
6	31	70	Barren
7	42	39	Evergreen Forest
8	43	111	Mixed Forest
9	52	59552	Shrub/Scrub
10	71	3676	Grassland
11	81	205	Pasture
12	82	1	Crops
13	90	428	Woody Wetlands
14	95	21	Emergent wetlands

9. Calculate areas and percentage cover.

This is more meaningful than cell counts.

lc.df.w.names$Area.km2 = (lc.df.w.names$Cells)*30*30/1E6  #  Area in km2
lc.df.w.names$Area.percent = 100*lc.df.w.names$Cells/sum(lc.df.w.names$Cells)
print(lc.df.w.names)

##    Code Cells                        Name Area.km2 Area.percent
## 1    11     5                  Open Water   0.0045 3.707411e-03
## 2    21 18847       Developed, Open Space  16.9623 1.397472e+01
## 3    22 20625    Developed, Low Intensity  18.5625 1.529307e+01
## 4    23 27535 Developed, Medium Intensity  24.7815 2.041671e+01
## 5    24  3750   Developed, High Intensity   3.3750 2.780558e+00
## 6    31    70                      Barren   0.0630 5.190376e-02
## 7    42    39            Evergreen Forest   0.0351 2.891781e-02
## 8    43   111                Mixed Forest   0.0999 8.230453e-02
## 9    52 59552                 Shrub/Scrub  53.5968 4.415675e+01
## 10   71  3676                   Grassland   3.3084 2.725689e+00
## 11   81   205                     Pasture   0.1845 1.520039e-01
## 12   82     1                       Crops   0.0009 7.414822e-04
## 13   90   428              Woody Wetlands   0.3852 3.173544e-01
## 14   95    21           Emergent wetlands   0.0189 1.557113e-02

10. Round the numbers off.

lc.df.w.names$Area.km2 = round((lc.df.w.names$Cells)*30*30/1E6,digits=2)  #  Area in km2
lc.df.w.names$Area.percent = round(100*lc.df.w.names$Cells/sum(lc.df.w.names$Cells),digits=1)
print(lc.df.w.names)

##    Code Cells                        Name Area.km2 Area.percent
## 1    11     5                  Open Water     0.00          0.0
## 2    21 18847       Developed, Open Space    16.96         14.0
## 3    22 20625    Developed, Low Intensity    18.56         15.3
## 4    23 27535 Developed, Medium Intensity    24.78         20.4
## 5    24  3750   Developed, High Intensity     3.38          2.8
## 6    31    70                      Barren     0.06          0.1
## 7    42    39            Evergreen Forest     0.04          0.0
## 8    43   111                Mixed Forest     0.10          0.1
## 9    52 59552                 Shrub/Scrub    53.60         44.2
## 10   71  3676                   Grassland     3.31          2.7
## 11   81   205                     Pasture     0.18          0.2
## 12   82     1                       Crops     0.00          0.0
## 13   90   428              Woody Wetlands     0.39          0.3
## 14   95    21           Emergent wetlands     0.02          0.0

11. Make sure the area total matches the USGS watershed area.

sum(lc.df.w.names$Area.km2)

## [1] 121.38

How close is this to the USGS-reported drainage area?

12. Sort on Area.

This will give the land cover classes with the greatest cover fraction on top. You can do this with the “order” function.

lc.df.sort = lc.df.w.names[order(lc.df.w.names$Area.km2,decreasing=TRUE),]

13. Create the output table–select just the columns you want.

lc.out = lc.df.sort[,c("Name","Area.km2","Area.percent")]
htmlTable(lc.out)

	Name	Area.km2	Area.percent
9	Shrub/Scrub	53.6	44.2
4	Developed, Medium Intensity	24.78	20.4
3	Developed, Low Intensity	18.56	15.3
2	Developed, Open Space	16.96	14
5	Developed, High Intensity	3.38	2.8
10	Grassland	3.31	2.7
13	Woody Wetlands	0.39	0.3
11	Pasture	0.18	0.2
8	Mixed Forest	0.1	0.1
6	Barren	0.06	0.1
7	Evergreen Forest	0.04	0
14	Emergent wetlands	0.02	0
1	Open Water	0	0
12	Crops	0	0

14. Adjust the number of digits to print.

You might notice that your data has differing numbers of significant digits (e.g. “0s” are left off in the printing). To fix this, you have to convert the numbers to strings (as.character) and specify the format.

lc.out$Area.km2 = as.character(format(lc.out$Area.km2,digits=3))
lc.out$Area.percent = as.character(format(lc.out$Area.percent,digits=1))
htmlTable(lc.out)

	Name	Area.km2	Area.percent
9	Shrub/Scrub	53.60	44.2
4	Developed, Medium Intensity	24.78	20.4
3	Developed, Low Intensity	18.56	15.3
2	Developed, Open Space	16.96	14.0
5	Developed, High Intensity	3.38	2.8
10	Grassland	3.31	2.7
13	Woody Wetlands	0.39	0.3
11	Pasture	0.18	0.2
8	Mixed Forest	0.10	0.1
6	Barren	0.06	0.1
7	Evergreen Forest	0.04	0.0
14	Emergent wetlands	0.02	0.0
1	Open Water	0.00	0.0
12	Crops	0.00	0.0

15. Add a caption to your table and suppress printing of the row number.

outtable = htmlTable(lc.out,
          caption="Table 1.  Land use in the Los Penasquitos watershed.",
          rnames=FALSE)
outtable

Name	Area.km2	Area.percent
Table 1. Land use in the Los Penasquitos watershed.
Shrub/Scrub	53.60	44.2
Developed, Medium Intensity	24.78	20.4
Developed, Low Intensity	18.56	15.3
Developed, Open Space	16.96	14.0
Developed, High Intensity	3.38	2.8
Grassland	3.31	2.7
Woody Wetlands	0.39	0.3
Pasture	0.18	0.2
Mixed Forest	0.10	0.1
Barren	0.06	0.1
Evergreen Forest	0.04	0.0
Emergent wetlands	0.02	0.0
Open Water	0.00	0.0
Crops	0.00	0.0

16. Save your table to a document.

outdir = "some output directory"
outfile = "EXC_Land_Use_Table1.hmtl"
setwd(outdir)
sink(outfile)
print(outtable,type="html",useViewer=FALSE)
sink()