This exercise accompanies the lessons in Environmental Data Analytics (ENV872L) on spatial analysis.
<DarpanBarua>_A09_SpatialAnalysis.Rmd (replacing
<DarpanBarua> with your first and last name).here() command to display the current
project directory#1. Below imports libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(leaflet)
library(here)
## here() starts at /home/guest/EDA_Spring2025
install.packages("mapview")
## Installing package into '/home/guest/R/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
library(mapview)
#2. Below shows the current project directory
here()
## [1] "/home/guest/EDA_Spring2025"
In this exercise, we will be exploring stream gage height data in
Nebraska corresponding to floods occurring there in 2019. First, we will
import from the US Counties shapefile we’ve used in lab lessons,
filtering it this time for just Nebraska counties. Nebraska’s state FIPS
code is 31 (as North Carolina’s was 37).
cb_2018_us_county_20m.shp shapefile into an sf
dataframe, filtering records for Nebraska counties (State FIPS =
31)mapview or
ggplot)#3. Read in Counties shapefile into an sf dataframe, filtering for just NE counties
# Read and filter shapefile
county_shapefile <- st_read(here("Data", "cb_2018_us_county_20m.shp")) %>%
filter(STATEFP == "31")
## Reading layer `cb_2018_us_county_20m' from data source
## `/home/guest/EDA_Spring2025/Data/cb_2018_us_county_20m.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
#4. Reveal the CRS of the counties features
st_crs(county_shapefile)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
#5. Plot the data
mapview(county_shapefile)
ANSWER: 4269. Geographic — it uses angular units (latitude and longitude in degrees).NAD83 (North American Datum 1983) is the datum associated with the CRS.
Next we’ll read in some USGS/NWIS gage location data added to the
Data/Raw folder. These are in the
NWIS_SiteInfo_NE_RAW.csv file.(See
NWIS_SiteInfo_NE_RAW.README.txt for more info on this
dataset.)
Read the NWIS_SiteInfo_NE_RAW.csv file into a
standard dataframe, being sure to set the site_no field as
well as other character columns as a factor.
Display the structure of this dataset.
#7. Read in gage locations csv as a dataframe
gage_data <- read_csv(
here("Data", "Raw", "NWIS_SiteInfo_NE_RAW.csv"),
col_types = cols(
.default = col_character(),
dec_lat_va = col_double(),
dec_long_va = col_double()
)
)
#8. Display the structure of the dataframe
str(gage_data)
## spc_tbl_ [175 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ site_no : chr [1:175] "06453600" "06454100" "06461500" "06463500" ...
## $ station_nm : chr [1:175] "Ponca Creek at Verdel, Nebr." "Niobrara River at Agate, Nebr." "Niobrara River near Sparks, Nebr." "Long Pine Creek near Riverview, Nebr." ...
## $ site_tp_cd : chr [1:175] "ST" "ST" "ST" "ST" ...
## $ dec_lat_va : num [1:175] 42.8 42.4 42.9 42.7 42.8 ...
## $ dec_long_va : num [1:175] -98.2 -103.8 -100.4 -99.7 -99.3 ...
## $ coord_acy_cd : chr [1:175] "S" "S" "S" "S" ...
## $ dec_coord_datum_cd: chr [1:175] "NAD83" "NAD83" "NAD83" "NAD83" ...
## - attr(*, "spec")=
## .. cols(
## .. .default = col_character(),
## .. site_no = col_character(),
## .. station_nm = col_character(),
## .. site_tp_cd = col_character(),
## .. dec_lat_va = col_double(),
## .. dec_long_va = col_double(),
## .. coord_acy_cd = col_character(),
## .. dec_coord_datum_cd = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Convert the dataframe to an sf dataframe. * Note: These data use the same coordinate reference system as the counties dataset
Display the structure of the resulting sf dataframe
#10. Convert to an sf object
gage_sf <- gage_data %>%
st_as_sf(coords = c("dec_long_va", "dec_lat_va"), crs = 4269)
#11. Display the structure
class(gage_sf)
## [1] "sf" "tbl_df" "tbl" "data.frame"
st_crs(gage_sf)
## Coordinate Reference System:
## User input: EPSG:4269
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Geodesy."],
## AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands."],
## BBOX[14.92,167.65,86.46,-47.74]],
## ID["EPSG",4269]]
mapview(gage_sf)
ANSWER: ‘geometry’ the new field - is the special column that stores the spatial information (i.e., point coordinates for each gage). ‘dec_lat_va’ and ‘dec_long_va’ no longer appear because we turned these 2 numeric columns into spatial coordinates and stored them in ‘gemoetry’.
ggplot to plot the county and gage location
datasets.#13. Plot the gage locations atop the county features
library(ggplot2)
my_theme <- theme_bw() +
theme(
plot.background = element_rect(
fill = 'linen',
color = 'Black'
),
legend.background = element_rect(fill = 'linen'),
axis.line = element_line(
linewidth = 1,
color = 'Black'
),
axis.text = element_text(
family = 'serif'
),
plot.title = element_text(
face = 'bold',
color = 'Black',
family = 'serif'
),
panel.background = element_rect(
fill = "linen",
colour = "linen",
linewidth = 0.5,
linetype = "solid"
)
)
# Below sets my_theme as the default theme for all plots
theme_set(my_theme)
ggplot() +
geom_sf(data = county_shapefile, fill = "gray90", color = "black") +
geom_sf(data = gage_sf, color = "blue", size = 2, alpha = 0.7) +
labs(
title = "NWIS Gage Locations in Nebraska",
subtitle = "Darpan Barua")
Lastly, we want to attach some gage height data to our site
locations. I’ve constructed a csv file listing many of the Nebraska gage
sites, by station name and site number along with stream gage heights
(in meters) recorded during the recent flood event. This file is titled
NWIS_SiteFlowData_NE_RAW.csv and is found in the Data/Raw
folder.
NWIS_SiteFlowData_NE_RAW.csv dataset in as a
dataframe
site_no and station_nm can both/either
serve as joining attributes#14. Read the site flow data into a data frame
gage_height <- read_csv(
here("Data", "Raw", "NWIS_SiteFlowData_NE_RAW.csv"),
col_types = cols(
site_no = col_character(),
station_nm = col_character(),
date = col_datetime(format = ""),
gage_ht = col_double()
)
)
#15. Show the structure of the dataframe
str(gage_height)
## spc_tbl_ [152 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ site_no : chr [1:152] "06453600" "06453620" "06461500" "06461500" ...
## $ station_nm: chr [1:152] "Ponca Creek at Verdel, Nebr." "Missouri River blw Ponca Creek nr Verdel, Nebr." "Niobrara River near Sparks, Nebr." "Niobrara River near Sparks, Nebr." ...
## $ date : POSIXct[1:152], format: "2019-03-20 14:15:00" "2019-03-20 14:15:00" ...
## $ gage_ht : num [1:152] 12.58 19.96 3.61 3.63 11.38 ...
## - attr(*, "spec")=
## .. cols(
## .. site_no = col_character(),
## .. station_nm = col_character(),
## .. date = col_datetime(format = ""),
## .. gage_ht = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
#16. Join the flow data to our NWIS gage location spatial dataframe
gage_joined <- inner_join(
gage_sf,
gage_height,
by = "site_no"
)
#17. Show the column names in the resulting spatial dataframe
colnames(gage_joined)
## [1] "site_no" "station_nm.x" "site_tp_cd"
## [4] "coord_acy_cd" "dec_coord_datum_cd" "geometry"
## [7] "station_nm.y" "date" "gage_ht"
#18. Show the dimensions of this joined dataset
dim(gage_joined)
## [1] 136 9
Now we can examine where the flooding appears most acute by
visualizing gage heights spatially. 19. Plot the gage sites on top of
counties (using mapview, ggplot, or
leaflet) * Show the magnitude of gage height by color,
shape, other visualization technique.
#Map the points, sized by gage height
mapview(county_shapefile) +
mapview(gage_joined, zcol = "gage_ht", cex = "gage_ht")
Up next we will do some spatial analysis with our data. To prepare for this, we should transform our data into a projected coordinate system. We’ll choose UTM Zone 14N (EPGS = 32614).
mapview or ggplot, plot the data so
that each can be seen as different colors#20 Transform the counties and gage location datasets to UTM Zone 14
# Below transforms the counties dataset
county_proj <- st_transform(county_shapefile, crs = 32614)
# Below transforms the gage site dataset (with gage height info)
gage_proj <- st_transform(gage_joined, crs = 32614)
#21 Plot the data
mapview(county_proj, color = "black", alpha = 0.2) +
mapview(gage_proj, zcol = "gage_ht", cex = "gage_ht")
Now let’s zoom into a particular county and examine the gages located there. 22. Select Saunders county from your projected county sf dataframe 23. Select the gage sites falling within that county to a new spatial dataframe 24. Select the gage sites within 15km of the county to a new spatial dataframe 25. Create a plot showing (each symbolized distinctly): * all Nebraska counties, * the selected county, * the gage sites in that county, * and the gage sites within 15 km of the county
#22 Select the county
saunders <- county_proj %>%
filter(NAME == "Saunders")
#23 Spatially select gages within the selected county
saunders_gages <- st_intersection(gage_proj, saunders)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
#24 Spatially select gages within 15 k of the selected county
saunders_buffer <- st_buffer(saunders, dist = 15000)
saunders_buffer_gages <- gage_proj[saunders_buffer, ]
#25 Plot
mapview(county_proj, color = "gray") +
mapview(saunders, color = "blue", alpha = 0.3) +
mapview(saunders_buffer_gages, color = "orange", col.regions = "orange", cex = 5) +
mapview(saunders_gages, color = "red", col.regions = "red", cex = 7)