This Exercise will introduce you to reading vector and raster data into R, and tabulating raster areas using a vector (watershed boundary). For more information on spatial analysis in R, see: https://www.paulamoraga.com/book-spatial/spatial-data-in-r.html
Focus on sf and stars (the new raster package): https://r-spatial.org/book/07-Introsf.html
For each line of code below, enter it into the R script.
Then, put the cursor on the line, and press “>Run” button in the
upper right corner of script window.
You only have to “install.packages” once. After you have run this, you can comment it out with “#”
Note: Until 2023, R used the rgdal and regeos packages for spatial data. That has been changed to the “sf” package.
install.packages("raster")
install.packages("sf")
install.packages("htmlTable")
You need to call “library” every time you start a new R session.
library(stars)
## Warning: package 'stars' was built under R version 4.2.3
## Warning: package 'sf' was built under R version 4.2.3
library(sf)
library(htmlTable)
## Warning: package 'htmlTable' was built under R version 4.2.3
To see what each of these libraries do, do a google search on, e.g. “R cran raster package”
Set the input directory and filename for the watershed boundary shapefile. * This is the watershed boundary of the shapefile you delineated in Exercise 2. Note that there is no .shp extension on the shapefile name (fname.shp below)
indir.shp = "<your directory that has the shp files>"
# This is the directory where you saved your watershed boundary from Exercise 2.
# IMPORTANT: this string cannot have a "/" at the end, so:
# indir.shp = "C:/myfiles/GEOG576"
# *not*
# indir.shp = "C:/myfiles/GEOG576/"
# Note that the directory has to have all of the files associated with your los penasquitos shapefile, including shp, shx, etc...
fname.shp = "< name you gave to the los penasquitos shapefile >"
# Note...do not include the "shp" extension. For example, if your los pen shapefile name is "trentwshed.shp", then fname.shp will be "trentwshed".
Now read in the watershed boundary shapefile. st_read is a function in the library sf. Use ? to find out more about the function:
?st_read
Now use st_read to read in your watershed boundary
wshed.shp = st_read(dsn=indir.shp,layer=fname.shp) # The "=" inside the function call are optional.
## Reading layer `lospen_wshed_10m_utm11' from data source
## `K:\My Drive\Gdrive\mydocuments\SDSU_classes\past\2020_2021_701_576\2020F_576_701_498\576\hw\data\spatial\DEM\biggs_output'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 3 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 488586.1 ymin: 3641729 xmax: 504566.1 ymax: 3651339
## Projected CRS: WGS 84 / UTM zone 11N
Set the input directory and filename, and read in the land cover data.
Check the coordinate reference system (aka projection) of the watershed shapefile: See what the crs function does:
?st_crs
Now check the crs of your watershed boundary:
st_crs(wshed.shp)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 11N
## wkt:
## PROJCRS["WGS 84 / UTM zone 11N",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 11N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-117,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## ID["EPSG",32611]]
Load the tif file of land cover that you used from Exercise 2.
indir.lc = <the directory where you saved the NLCD tif file from Exercise 2>
fname.lc = "<the file name of the NLCD.tif file>" e.g. "NLCD_san_diego_county_utm11.tif"
lc = read_stars(paste0(indir.lc,fname.lc))
Note: If this doesn’t load for some reason, try re-downloading the NLCD data from here: https://drive.google.com/file/d/1Cp0UxPXv7S6ikMbtStxmKXasRRzs70Ol/view?usp=sharing
Check the crs. Are they the same?
st_crs(lc)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 11N
## wkt:
## PROJCRS["WGS 84 / UTM zone 11N",
## BASEGEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 11N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-117,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["Between 120°W and 114°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Alberta; British Columbia (BC); Northwest Territories (NWT); Nunavut. Mexico. United States (USA)."],
## BBOX[0,-120,84,-114]],
## ID["EPSG",32611]]
plot(lc)
## downsample set to 4
plot(wshed.shp,add=TRUE)
## Warning in plot.sf(wshed.shp, add = TRUE): ignoring all but the first attribute
Can you see the watershed boundary on your map?
and plot. What happens if you use plot(wshed.shp, add=TRUE) instead?
image(lc,extent=wshed.shp)
#plot(lc,extent=wshed.shp)
plot(st_geometry(wshed.shp[1]),add=TRUE, )
Extract, then look at what you extracted.
### Things that didn't work:
#lc.extract = extract(lc,wshed.shp,method="simple") # Gets cell values for all cells in shp
#print(lc.extract[[1]][1:100])
#lc.extract.2 = aggregate(lc,wshed.shp[1],length) # Super slow
#library(exactextractr)
#lc.extract.3 = st_extract(lc,wshed.shp)
#lc.nogeom = st_set_geometry(lc.extract.3,NULL)
# try converting stars to sf object, then tabulate
# Worked:
# Subset the stars object (lc):
# from "Cropping" in https://r-spatial.org/book/07-Introsf.html
lc.shp = lc[wshed.shp]
plot(lc.shp)
#table(lc)
Now, summarize the cropped stars object using “table”:
lc.table = table(lc.shp) # Creates a table of counts from the cell list
print(lc.table)
## NLCD_san_diego_county_utm11.tif
## 11 21 22 23 24 31 42 43 52 71 81 82 90
## 5 18847 20625 27535 3750 70 39 111 59552 3676 205 1 428
## 95
## 21
These codes correspond to the numbers in lc.table. The lookup table is at https://www.mrlc.gov/data/legends/national-land-cover-database-2016-nlcd2016-legend. I’ve typed them in here by hand.
lc.key = data.frame(Code=c(11,21,22,23,24,31,42,43,52,71,81,82,90,95), Name=c("Open Water","Developed, Open Space","Developed, Low Intensity","Developed, Medium Intensity","Developed, High Intensity","Barren","Evergreen Forest","Mixed Forest","Shrub/Scrub","Grassland","Pasture","Crops","Woody Wetlands","Emergent wetlands"))
# Convert the table to a data frame. This makes it easier to work with.
lc.df = data.frame(lc.table)
names(lc.df) = c("Code","Cells")
# Merge the lc dataframe with the lc.key:
lc.df.w.names = merge(lc.df,lc.key,by="Code") # Merges two dfs on a single common field; here, "Code".
print(lc.df.w.names)
## Code Cells Name
## 1 11 5 Open Water
## 2 21 18847 Developed, Open Space
## 3 22 20625 Developed, Low Intensity
## 4 23 27535 Developed, Medium Intensity
## 5 24 3750 Developed, High Intensity
## 6 31 70 Barren
## 7 42 39 Evergreen Forest
## 8 43 111 Mixed Forest
## 9 52 59552 Shrub/Scrub
## 10 71 3676 Grassland
## 11 81 205 Pasture
## 12 82 1 Crops
## 13 90 428 Woody Wetlands
## 14 95 21 Emergent wetlands
htmlTable(lc.df.w.names)
Code | Cells | Name | |
---|---|---|---|
1 | 11 | 5 | Open Water |
2 | 21 | 18847 | Developed, Open Space |
3 | 22 | 20625 | Developed, Low Intensity |
4 | 23 | 27535 | Developed, Medium Intensity |
5 | 24 | 3750 | Developed, High Intensity |
6 | 31 | 70 | Barren |
7 | 42 | 39 | Evergreen Forest |
8 | 43 | 111 | Mixed Forest |
9 | 52 | 59552 | Shrub/Scrub |
10 | 71 | 3676 | Grassland |
11 | 81 | 205 | Pasture |
12 | 82 | 1 | Crops |
13 | 90 | 428 | Woody Wetlands |
14 | 95 | 21 | Emergent wetlands |
This is more meaningful than cell counts.
lc.df.w.names$Area.km2 = (lc.df.w.names$Cells)*30*30/1E6 # Area in km2
lc.df.w.names$Area.percent = 100*lc.df.w.names$Cells/sum(lc.df.w.names$Cells)
print(lc.df.w.names)
## Code Cells Name Area.km2 Area.percent
## 1 11 5 Open Water 0.0045 3.707411e-03
## 2 21 18847 Developed, Open Space 16.9623 1.397472e+01
## 3 22 20625 Developed, Low Intensity 18.5625 1.529307e+01
## 4 23 27535 Developed, Medium Intensity 24.7815 2.041671e+01
## 5 24 3750 Developed, High Intensity 3.3750 2.780558e+00
## 6 31 70 Barren 0.0630 5.190376e-02
## 7 42 39 Evergreen Forest 0.0351 2.891781e-02
## 8 43 111 Mixed Forest 0.0999 8.230453e-02
## 9 52 59552 Shrub/Scrub 53.5968 4.415675e+01
## 10 71 3676 Grassland 3.3084 2.725689e+00
## 11 81 205 Pasture 0.1845 1.520039e-01
## 12 82 1 Crops 0.0009 7.414822e-04
## 13 90 428 Woody Wetlands 0.3852 3.173544e-01
## 14 95 21 Emergent wetlands 0.0189 1.557113e-02
lc.df.w.names$Area.km2 = round((lc.df.w.names$Cells)*30*30/1E6,digits=2) # Area in km2
lc.df.w.names$Area.percent = round(100*lc.df.w.names$Cells/sum(lc.df.w.names$Cells),digits=1)
print(lc.df.w.names)
## Code Cells Name Area.km2 Area.percent
## 1 11 5 Open Water 0.00 0.0
## 2 21 18847 Developed, Open Space 16.96 14.0
## 3 22 20625 Developed, Low Intensity 18.56 15.3
## 4 23 27535 Developed, Medium Intensity 24.78 20.4
## 5 24 3750 Developed, High Intensity 3.38 2.8
## 6 31 70 Barren 0.06 0.1
## 7 42 39 Evergreen Forest 0.04 0.0
## 8 43 111 Mixed Forest 0.10 0.1
## 9 52 59552 Shrub/Scrub 53.60 44.2
## 10 71 3676 Grassland 3.31 2.7
## 11 81 205 Pasture 0.18 0.2
## 12 82 1 Crops 0.00 0.0
## 13 90 428 Woody Wetlands 0.39 0.3
## 14 95 21 Emergent wetlands 0.02 0.0
sum(lc.df.w.names$Area.km2)
## [1] 121.38
How close is this to the USGS-reported drainage area?
This will give the land cover classes with the greatest cover fraction on top. You can do this with the “order” function.
lc.df.sort = lc.df.w.names[order(lc.df.w.names$Area.km2,decreasing=TRUE),]
lc.out = lc.df.sort[,c("Name","Area.km2","Area.percent")]
htmlTable(lc.out)
Name | Area.km2 | Area.percent | |
---|---|---|---|
9 | Shrub/Scrub | 53.6 | 44.2 |
4 | Developed, Medium Intensity | 24.78 | 20.4 |
3 | Developed, Low Intensity | 18.56 | 15.3 |
2 | Developed, Open Space | 16.96 | 14 |
5 | Developed, High Intensity | 3.38 | 2.8 |
10 | Grassland | 3.31 | 2.7 |
13 | Woody Wetlands | 0.39 | 0.3 |
11 | Pasture | 0.18 | 0.2 |
8 | Mixed Forest | 0.1 | 0.1 |
6 | Barren | 0.06 | 0.1 |
7 | Evergreen Forest | 0.04 | 0 |
14 | Emergent wetlands | 0.02 | 0 |
1 | Open Water | 0 | 0 |
12 | Crops | 0 | 0 |
You might notice that your data has differing numbers of significant digits (e.g. “0s” are left off in the printing). To fix this, you have to convert the numbers to strings (as.character) and specify the format.
lc.out$Area.km2 = as.character(format(lc.out$Area.km2,digits=3))
lc.out$Area.percent = as.character(format(lc.out$Area.percent,digits=1))
htmlTable(lc.out)
Name | Area.km2 | Area.percent | |
---|---|---|---|
9 | Shrub/Scrub | 53.60 | 44.2 |
4 | Developed, Medium Intensity | 24.78 | 20.4 |
3 | Developed, Low Intensity | 18.56 | 15.3 |
2 | Developed, Open Space | 16.96 | 14.0 |
5 | Developed, High Intensity | 3.38 | 2.8 |
10 | Grassland | 3.31 | 2.7 |
13 | Woody Wetlands | 0.39 | 0.3 |
11 | Pasture | 0.18 | 0.2 |
8 | Mixed Forest | 0.10 | 0.1 |
6 | Barren | 0.06 | 0.1 |
7 | Evergreen Forest | 0.04 | 0.0 |
14 | Emergent wetlands | 0.02 | 0.0 |
1 | Open Water | 0.00 | 0.0 |
12 | Crops | 0.00 | 0.0 |
outtable = htmlTable(lc.out,
caption="Table 1. Land use in the Los Penasquitos watershed.",
rnames=FALSE)
outtable
Table 1. Land use in the Los Penasquitos watershed. | ||
Name | Area.km2 | Area.percent |
---|---|---|
Shrub/Scrub | 53.60 | 44.2 |
Developed, Medium Intensity | 24.78 | 20.4 |
Developed, Low Intensity | 18.56 | 15.3 |
Developed, Open Space | 16.96 | 14.0 |
Developed, High Intensity | 3.38 | 2.8 |
Grassland | 3.31 | 2.7 |
Woody Wetlands | 0.39 | 0.3 |
Pasture | 0.18 | 0.2 |
Mixed Forest | 0.10 | 0.1 |
Barren | 0.06 | 0.1 |
Evergreen Forest | 0.04 | 0.0 |
Emergent wetlands | 0.02 | 0.0 |
Open Water | 0.00 | 0.0 |
Crops | 0.00 | 0.0 |
outdir = "some output directory"
outfile = "EXC_Land_Use_Table1.hmtl"
setwd(outdir)
sink(outfile)
print(outtable,type="html",useViewer=FALSE)
sink()