OVERVIEW

This exercise accompanies the lessons in Environmental Data Analytics (ENV872L) on spatial analysis.

Directions

  1. Rename this file <DarpanBarua>_A09_SpatialAnalysis.Rmd (replacing <DarpanBarua> with your first and last name).
  2. Change “Student Name” on line 3 (above) with your name.
  3. Use the lesson as a guide. It contains code that can be modified to complete the assignment.
  4. Work through the steps, creating code and output that fulfill each instruction.
  5. Be sure to answer the questions in this assignment document. Space for your answers is provided in this document and is indicated by the “>” character. If you need a second paragraph be sure to start the first line with “>”. You should notice that the answer is highlighted in green by RStudio.
  6. When you have completed the assignment, Knit the text and code into a single HTML file.

DATA WRANGLING

Set up your session

  1. Import libraries: tidyverse, sf, leaflet, here, and mapview
  2. Execute the here() command to display the current project directory
#1. Below imports libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(leaflet)
library(here)
## here() starts at /home/guest/EDA_Spring2025
install.packages("mapview")
## Installing package into '/home/guest/R/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
library(mapview)

#2. Below shows the current project directory
here()
## [1] "/home/guest/EDA_Spring2025"

Read (and filter) county features into an sf dataframe and plot

In this exercise, we will be exploring stream gage height data in Nebraska corresponding to floods occurring there in 2019. First, we will import from the US Counties shapefile we’ve used in lab lessons, filtering it this time for just Nebraska counties. Nebraska’s state FIPS code is 31 (as North Carolina’s was 37).

  1. Read the cb_2018_us_county_20m.shp shapefile into an sf dataframe, filtering records for Nebraska counties (State FIPS = 31)
  2. Reveal the dataset’s coordinate reference system
  3. Plot the records as a map (using mapview or ggplot)
#3. Read in Counties shapefile into an sf dataframe, filtering for just NE counties

# Read and filter shapefile
county_shapefile <- st_read(here("Data", "cb_2018_us_county_20m.shp")) %>%
  filter(STATEFP == "31")
## Reading layer `cb_2018_us_county_20m' from data source 
##   `/home/guest/EDA_Spring2025/Data/cb_2018_us_county_20m.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS:  NAD83
#4. Reveal the CRS of the counties features

st_crs(county_shapefile)
## Coordinate Reference System:
##   User input: NAD83 
##   wkt:
## GEOGCRS["NAD83",
##     DATUM["North American Datum 1983",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4269]]
#5. Plot the data

mapview(county_shapefile)
  1. What is the EPSG code of the Counties dataset? Is this a geographic or a projected coordinate reference system? (In other words, does this CRS use angular or planar coordinate units?) To what datum is this CRS associated? (Tip: lookup the EPSG code on https://epsg.io or https://spatialreference.org)

ANSWER: 4269. Geographic — it uses angular units (latitude and longitude in degrees).NAD83 (North American Datum 1983) is the datum associated with the CRS.

Read in gage locations csv as a dataframe, then display the column names it contains

Next we’ll read in some USGS/NWIS gage location data added to the Data/Raw folder. These are in the NWIS_SiteInfo_NE_RAW.csv file.(See NWIS_SiteInfo_NE_RAW.README.txt for more info on this dataset.)

  1. Read the NWIS_SiteInfo_NE_RAW.csv file into a standard dataframe, being sure to set the site_no field as well as other character columns as a factor.

  2. Display the structure of this dataset.

#7. Read in gage locations csv as a dataframe

gage_data <- read_csv(
  here("Data", "Raw", "NWIS_SiteInfo_NE_RAW.csv"),
  col_types = cols(
    .default = col_character(),
    dec_lat_va = col_double(),
    dec_long_va = col_double()
  )
)



#8. Display the structure of the dataframe
str(gage_data)
## spc_tbl_ [175 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ site_no           : chr [1:175] "06453600" "06454100" "06461500" "06463500" ...
##  $ station_nm        : chr [1:175] "Ponca Creek at Verdel, Nebr." "Niobrara River at Agate, Nebr." "Niobrara River near Sparks, Nebr." "Long Pine Creek near Riverview, Nebr." ...
##  $ site_tp_cd        : chr [1:175] "ST" "ST" "ST" "ST" ...
##  $ dec_lat_va        : num [1:175] 42.8 42.4 42.9 42.7 42.8 ...
##  $ dec_long_va       : num [1:175] -98.2 -103.8 -100.4 -99.7 -99.3 ...
##  $ coord_acy_cd      : chr [1:175] "S" "S" "S" "S" ...
##  $ dec_coord_datum_cd: chr [1:175] "NAD83" "NAD83" "NAD83" "NAD83" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   .default = col_character(),
##   ..   site_no = col_character(),
##   ..   station_nm = col_character(),
##   ..   site_tp_cd = col_character(),
##   ..   dec_lat_va = col_double(),
##   ..   dec_long_va = col_double(),
##   ..   coord_acy_cd = col_character(),
##   ..   dec_coord_datum_cd = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
  1. What columns in the dataset contain the x and y coordinate values, respectively?
    > ANSWER: ‘dec_long_va’ and ‘dec_lat_va’

Convert the dataframe to a spatial features (“sf”) dataframe

  1. Convert the dataframe to an sf dataframe. * Note: These data use the same coordinate reference system as the counties dataset

  2. Display the structure of the resulting sf dataframe

#10. Convert to an sf object

gage_sf <- gage_data %>%
  st_as_sf(coords = c("dec_long_va", "dec_lat_va"), crs = 4269)


#11. Display the structure

class(gage_sf)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
st_crs(gage_sf)
## Coordinate Reference System:
##   User input: EPSG:4269 
##   wkt:
## GEOGCRS["NAD83",
##     DATUM["North American Datum 1983",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Geodesy."],
##         AREA["North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands."],
##         BBOX[14.92,167.65,86.46,-47.74]],
##     ID["EPSG",4269]]
mapview(gage_sf)
  1. What new field(s) appear in the sf dataframe created? What field(s), if any, disappeared?

ANSWER: ‘geometry’ the new field - is the special column that stores the spatial information (i.e., point coordinates for each gage). ‘dec_lat_va’ and ‘dec_long_va’ no longer appear because we turned these 2 numeric columns into spatial coordinates and stored them in ‘gemoetry’.

Plot the gage locations on top of the counties

  1. Use ggplot to plot the county and gage location datasets.
  • Be sure the datasets are displayed in different colors
  • Title your plot “NWIS Gage Locations in Nebraska”
  • Subtitle your plot with your name
#13. Plot the gage locations atop the county features


library(ggplot2)

my_theme <- theme_bw() +
  theme(
    plot.background = element_rect(
      fill = 'linen',
      color = 'Black'
    ),
    legend.background = element_rect(fill = 'linen'),
    axis.line = element_line(
      linewidth = 1,
      color = 'Black'
    ),
    axis.text = element_text(
      family = 'serif'
    ),
    plot.title = element_text(
      
      face = 'bold',
      color = 'Black',
      family = 'serif'
    ),
    panel.background = element_rect(
      fill = "linen",
      colour = "linen",
      linewidth = 0.5,
      linetype = "solid"
    )
  )

# Below sets my_theme as the default theme for all plots
theme_set(my_theme)


ggplot() +
  geom_sf(data = county_shapefile, fill = "gray90", color = "black") +
  geom_sf(data = gage_sf, color = "blue", size = 2, alpha = 0.7) +
  labs(
    title = "NWIS Gage Locations in Nebraska",
    subtitle = "Darpan Barua")

Read in the gage height data and join the site location data to it.

Lastly, we want to attach some gage height data to our site locations. I’ve constructed a csv file listing many of the Nebraska gage sites, by station name and site number along with stream gage heights (in meters) recorded during the recent flood event. This file is titled NWIS_SiteFlowData_NE_RAW.csv and is found in the Data/Raw folder.

  1. Read the NWIS_SiteFlowData_NE_RAW.csv dataset in as a dataframe
    • Pay attention to which fields should be imported as factors!
  2. Show the structure of the dataframe.
  3. Join our site information (already imported above) to these gage height data
    • The site_no and station_nm can both/either serve as joining attributes
    • Construct this join so that the result only includes records features where both tables have data (N=136)
  4. Show the column names of this resulting spatial dataframe
  5. Show the dimensions of the resulting joined dataframe
#14. Read the site flow data into a data frame

gage_height <- read_csv(
  here("Data", "Raw", "NWIS_SiteFlowData_NE_RAW.csv"),
  col_types = cols(
    site_no = col_character(),
    station_nm = col_character(),
    date = col_datetime(format = ""),  
    gage_ht = col_double()
  )
)


#15. Show the structure of the dataframe 
str(gage_height)
## spc_tbl_ [152 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ site_no   : chr [1:152] "06453600" "06453620" "06461500" "06461500" ...
##  $ station_nm: chr [1:152] "Ponca Creek at Verdel, Nebr." "Missouri River blw Ponca Creek nr Verdel, Nebr." "Niobrara River near Sparks, Nebr." "Niobrara River near Sparks, Nebr." ...
##  $ date      : POSIXct[1:152], format: "2019-03-20 14:15:00" "2019-03-20 14:15:00" ...
##  $ gage_ht   : num [1:152] 12.58 19.96 3.61 3.63 11.38 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   site_no = col_character(),
##   ..   station_nm = col_character(),
##   ..   date = col_datetime(format = ""),
##   ..   gage_ht = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
#16. Join the flow data to our NWIS gage location spatial dataframe

gage_joined <- inner_join(
  gage_sf,
  gage_height,
  by = "site_no"
)

#17. Show the column names in the resulting spatial dataframe

colnames(gage_joined)
## [1] "site_no"            "station_nm.x"       "site_tp_cd"        
## [4] "coord_acy_cd"       "dec_coord_datum_cd" "geometry"          
## [7] "station_nm.y"       "date"               "gage_ht"
#18. Show the dimensions of this joined dataset

dim(gage_joined)
## [1] 136   9

Map the pattern of gage height data

Now we can examine where the flooding appears most acute by visualizing gage heights spatially. 19. Plot the gage sites on top of counties (using mapview, ggplot, or leaflet) * Show the magnitude of gage height by color, shape, other visualization technique.

#Map the points, sized by gage height
mapview(county_shapefile) +
  mapview(gage_joined, zcol = "gage_ht", cex = "gage_ht")

SPATIAL ANALYSIS

Up next we will do some spatial analysis with our data. To prepare for this, we should transform our data into a projected coordinate system. We’ll choose UTM Zone 14N (EPGS = 32614).

Transform the counties and gage site datasets to UTM Zone 14N

  1. Transform the counties and gage sf datasets to UTM Zone 14N (EPGS = 32614).
  2. Using mapview or ggplot, plot the data so that each can be seen as different colors
#20 Transform the counties and gage location datasets to UTM Zone 14

# Below transforms the counties dataset
county_proj <- st_transform(county_shapefile, crs = 32614)

# Below transforms the gage site dataset (with gage height info)
gage_proj <- st_transform(gage_joined, crs = 32614)

#21 Plot the data

mapview(county_proj, color = "black", alpha = 0.2) +
  mapview(gage_proj, zcol = "gage_ht", cex = "gage_ht")

Select the gages falling within a given county

Now let’s zoom into a particular county and examine the gages located there. 22. Select Saunders county from your projected county sf dataframe 23. Select the gage sites falling within that county to a new spatial dataframe 24. Select the gage sites within 15km of the county to a new spatial dataframe 25. Create a plot showing (each symbolized distinctly): * all Nebraska counties, * the selected county, * the gage sites in that county, * and the gage sites within 15 km of the county

#22 Select the county

saunders <- county_proj %>%
  filter(NAME == "Saunders")


#23 Spatially select gages within the selected county

saunders_gages <- st_intersection(gage_proj, saunders)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
#24 Spatially select gages within 15 k of the selected county
saunders_buffer <- st_buffer(saunders, dist = 15000)
saunders_buffer_gages <- gage_proj[saunders_buffer, ]


#25 Plot

mapview(county_proj, color = "gray") +
  mapview(saunders, color = "blue", alpha = 0.3) +
  mapview(saunders_buffer_gages, color = "orange", col.regions = "orange", cex = 5) +
  mapview(saunders_gages, color = "red", col.regions = "red", cex = 7)