Chapter 9 Combining Raster and Vector Data

12/7/2021

Chapter Link: https://bookdown.org/mcwimberly/gdswr-book/combining-raster-and-vector-data-1-gridded-meteorological-data.html

1 - Gridded Meteorological Data

This chapter focuses on the analysis of gridded meteorological and climate data using the PRISM dataset. We will look at the problem of summarizing mean annual precipitation for every county in the United States.

The Libraries Used

library(tidyverse) # Provides an efficient, fast, and well-documented workflow for general data modeling, wrangling, and visualization tasks.

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(sf) # Designed to bring spatial analysis in R in line with these other systems.

## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1

library(raster) # Implements basic and high-level functions for raster data and for vector data operations such as intersections.

## Loading required package: sp

## 
## Attaching package: 'raster'

## The following object is masked from 'package:dplyr':
## 
##     select

## The following object is masked from 'package:tidyr':
## 
##     extract

library(prism)# Allows special formatting of column names such as superscripts.

## Warning: package 'prism' was built under R version 4.1.2

## Be sure to set the download folder using `prism_set_dl_dir()`.

prism_set_dl_dir("~/prismtmp") # Be sure to set the download folder using `prism_set_dl_dir()`!!!!

9.1 ZONAL STATISTICS

county <- st_read("cb_2018_us_county_20m.shp") # The STATEFP and GEOID fields are initially read in as factors.

## Reading layer `cb_2018_us_county_20m' from data source 
##   `C:\Users\Da Dell\Documents\My SugarSync\School\GIS\Final Project\gdswr_chapter9_data\cb_2018_us_county_20m.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS:  NAD83

county <- county %>%
  mutate(state = as.numeric(as.character(STATEFP)), # The STATEFP and GEOID fields are converted.
         fips = as.numeric(as.character(GEOID))) %>%
  filter(state != 2, state != 15, state < 60) # The dataset is filtered down to the 48 conterminous United States.

st_crs(county) # Checks the coordinate reference system and note that the data are geographic (longitude and latitude) with a NAD83 datum.

## Coordinate Reference System:
##   User input: NAD83 
##   wkt:
## GEOGCRS["NAD83",
##     DATUM["North American Datum 1983",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4269]]

ggplot(data = county) +
  geom_sf(fill = NA) # Plots out the data creating a basic & unfilled map of the US counties.

The precipitation data to summarize comes from the PRISM dataset (http://www.prism.oregonstate.edu). Is a widely-used interpolated climate dataset for the United States that combines data from meteorological stations with other ancillary datasets such as elevation.

options(prism.path = ".") # Is a prism package that automates data downloading and importing as a raster object.
get_prism_normals(type = 'ppt', resolution = '4km', annual = T, keepZip = TRUE) # Downloads the 30-year climatology of annual precipitation for the conterminous US.

## 
  |                                                                            
  |                                                                      |   0%

## 
##  already exists. Skipping downloading.

## 
  |                                                                            
  |======================================================================| 100%

prism_archive_ls() # Allows the user to view the list of downloaded packages.

##  [1] "PRISM_ppt_30yr_normal_4kmM2_annual_bil"
##  [2] "PRISM_ppt_stable_4kmM3_201801_bil"     
##  [3] "PRISM_ppt_stable_4kmM3_201802_bil"     
##  [4] "PRISM_ppt_stable_4kmM3_201803_bil"     
##  [5] "PRISM_ppt_stable_4kmM3_201804_bil"     
##  [6] "PRISM_ppt_stable_4kmM3_201805_bil"     
##  [7] "PRISM_ppt_stable_4kmM3_201806_bil"     
##  [8] "PRISM_ppt_stable_4kmM3_201807_bil"     
##  [9] "PRISM_ppt_stable_4kmM3_201808_bil"     
## [10] "PRISM_ppt_stable_4kmM3_201809_bil"     
## [11] "PRISM_ppt_stable_4kmM3_201810_bil"     
## [12] "PRISM_ppt_stable_4kmM3_201811_bil"     
## [13] "PRISM_ppt_stable_4kmM3_201812_bil"

prism_p30 <- pd_stack(prism_archive_ls()[1]) # Used to incorporate one or more of these dataset into a raster object.

cnty_ras <- rasterize(county, prism_p30, field = "fips") # Converts the vector polygon dataset of counties to a raster dataset, in which each raster cell is coded with the 5-digit FIPS code of the corresponding county.
cnty_p30 <- zonal(prism_p30, cnty_ras, fun = "mean") # Runs zonal statistics by specifying (1) the raster layer to summarize (prism_p30), (2) the raster layer with the zones to use for summarizing (cnty_ras), and (3) the function to use for summarizing ("mean").

cnty_p30 <- data.frame(cnty_p30) # The output matrix is converted into a data frame.
summary(cnty_p30) # Returns the data. However, there is a problem with the output. There are fewer counties in the summary data table than there are counties in the polygon file.

##       zone           value        
##  Min.   : 1001   Min.   :  83.68  
##  1st Qu.:19041   1st Qu.: 751.83  
##  Median :29205   Median :1054.33  
##  Mean   :30625   Mean   : 991.52  
##  3rd Qu.:45087   3rd Qu.:1221.14  
##  Max.   :56045   Max.   :2960.78

dim(cnty_p30) # Returns the dimension of the data frame.

## [1] 3101    2

dim(county) # Returns the dimension of the data frame.

## [1] 3108   12

setdiff(county$fips, cnty_p30$zone) # Indicates which elements of data frame county$fip are not existent in data frame cnty_p30$zone

## [1] 51580 51685 51610 51570 51683 51600 51678

9.2 ZONE SIZE AND RASTER SPATIAL RESOLUTION

cnty_join1 <- left_join(county, cnty_p30, by = c("fips" = "zone")) # Joins the zonal summaries to the county polygons.

ggplot(data = cnty_join1) +
  geom_sf(aes(fill = value)) +
  scale_fill_continuous(name = "Precip (mm)") # Generates a national map, which there are no obvious counties with missing data with a legend for the precipitation.

Note that all the missing counties have a state FIPS code of 51, which is Virginia

va_join1 <- filter(cnty_join1, STATEFP == "51") # Zooms to Virginia by selecting the counties in the state so we can query a non-spatial data frame.

ggplot(data = va_join1) +
  geom_sf(aes(fill = value)) +
  scale_fill_continuous(name = "Precip (mm)") # Generates a map of Virginia with a legend for the precipitation.

Virginia is unique among states in that it has a number of small, independent cities that are governed separately from the surrounding county (akin to Washington D.C.) and are therefore assigned their own county FIPS codes. We will create another map to find where missing data is located.

ggplot(data = va_join1) +
  geom_sf(aes(fill = value)) +
  coord_sf(xlim = c(-77.6, -77.0), ylim = c(38.6, 39.1)) +
  scale_fill_continuous(name = "Precip (mm)") # Generates a map of a portion of Northern  Virginia with a legend for the precipitation.

Now we can see several small cities with missing precipitation data. When we rasterized the counties to a 4 km grid to match the precipitation data, we sampled the center point of each 4 km grid cell and assigned it the FIPS code of the county that it fell within. However, if a county is very small or narrow, it is possible that no grid center point will fall within its boundaries. In this case, the county ends up excluded from the zonal summary. The most straightforward way to deal with this is to use a finer sampling grid to generate the zonal statistics.

prism_p30_1km <- disaggregate(prism_p30, fact = 4, method = "bilinear") # disaggregate() reduces the cell size of the PRISM data by a factor of four - changing the cell size to approximately 1 km. Bilinear resampling is used to conduct linear interpolation between the center points of the 4 km grid cells to estimate values at the 1 km resolution. 
cnty_ras_1km <- rasterize(county, prism_p30_1km, field = "fips") # Uses the new 1 km grid to rasterize the county dataset.

cnty_p30_1km <- zonal(prism_p30_1km, cnty_ras_1km, fun = "mean")  # Runs zonal statistics by specifying (1) the raster layer to summarize (prism_p30_1km), (2) the raster layer with the zones to use for summarizing (cnty_ras_1km), and (3) the function to use for summarizing ("mean").

cnty_p30_1km <- data.frame(cnty_p30_1km) # cnty_p30_1km is converted into a data frame.
summary(cnty_p30_1km) # Returns the data.

##       zone            mean       
##  Min.   : 1001   Min.   :  83.2  
##  1st Qu.:19045   1st Qu.: 751.2  
##  Median :29212   Median :1054.8  
##  Mean   :30672   Mean   : 991.5  
##  3rd Qu.:46008   3rd Qu.:1221.4  
##  Max.   :56045   Max.   :2943.2

dim(cnty_p30_1km) # Returns the dimension of the data frame.

## [1] 3108    2

dim(county) # Returns the dimension of the data frame.

## [1] 3108   12

setdiff(county$fips, cnty_p30_1km$zone) # Indicates which elements of data frame county$fip are not existent in data frame cnty_p30_1km$zone.

## numeric(0)

Now the zonal summary table contains the same number of records as the county file. By checking the zoomed-in map of Virginia, we can see that all of the counties, including the tiny independent cities, now have precipitation values.

cnty_join2 <- left_join(county, cnty_p30_1km, 
                        by = c("fips" = "zone")) # Joins the zonal summaries to the county polygons.

va_join2 <- filter(cnty_join2, STATEFP == "51") # Zooms to Virginia by selecting the counties in the state so we can query a non-spatial data frame.

ggplot(data = va_join2) +
  geom_sf(aes(fill = mean)) +
  scale_fill_continuous(name = "Precip (mm)") +
  coord_sf(xlim = c(-77.6, -77.0), ylim = c(38.6, 39.1)) # Generates a map of a portion of Northern  Virginia with a legend for the precipitation.

9.3 EXTRACTING VALUES WITH POINT DATA

Let’s say that we would like to check the accuracy of the PRISM climate grids by comparing them with weather station data. In this example, we will look at monthly values for 2018.

get_prism_monthlys(type = 'ppt', years = 2018,  mon=1:12, keepZip = TRUE) # Downloads the monthly climatology of 2018 precipitation for the conterminous US.

## 
  |                                                                            
  |                                                                      |   0%

## 
## PRISM_ppt_stable_4kmM3_201801_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |======                                                                |   8%

## 
## PRISM_ppt_stable_4kmM3_201802_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |============                                                          |  17%

## 
## PRISM_ppt_stable_4kmM3_201803_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |==================                                                    |  25%

## 
## PRISM_ppt_stable_4kmM3_201804_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |=======================                                               |  33%

## 
## PRISM_ppt_stable_4kmM3_201805_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |=============================                                         |  42%

## 
## PRISM_ppt_stable_4kmM3_201806_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |===================================                                   |  50%

## 
## PRISM_ppt_stable_4kmM3_201807_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |=========================================                             |  58%

## 
## PRISM_ppt_stable_4kmM3_201808_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |===============================================                       |  67%

## 
## PRISM_ppt_stable_4kmM3_201809_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |====================================================                  |  75%

## 
## PRISM_ppt_stable_4kmM3_201810_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |==========================================================            |  83%

## 
## PRISM_ppt_stable_4kmM3_201811_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |================================================================      |  92%

## 
## PRISM_ppt_stable_4kmM3_201812_bil.zip already exists. Skipping downloading.

## 
  |                                                                            
  |======================================================================| 100%

datalist <- prism_archive_ls() # Allows the user to view the list of downloaded packages.
prism_prec_2018 <- pd_stack(datalist[2:13]) # Used to incorporate one or more of these dataset into a raster object.

mesosm <- read_csv("mesodata_small.csv") # Reads .csv file.

## Rows: 240 Columns: 9

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (1): STID
## dbl  (7): MONTH, YEAR, TMAX, TMIN, HMAX, HMIN, RAIN
## date (1): DATE

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

geo_coords <- read_csv("geoinfo.csv") # Reads .csv file.

## Rows: 143 Columns: 12

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): stid, name, city, cdir, cnty, cdiv, clas
## dbl (5): stnm, rang, lat, lon, elev

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

geo_coords <- st_as_sf(geo_coords, coords = c("lon", "lat")) # Converts to a sf object
mystations <- unique(mesosm$STID) # Returns a vector, data frame or array but with duplicate elements/rows removed.
station_pts <- geo_coords %>%
  filter(stid %in% mystations) # Filters data to only contains the locations of the four weather stations included in the mesosm dataset.

prism_samp <- extract(prism_prec_2018, station_pts, factors = T, df = T) # Obtains the raster data associated with each of the four points.

names(prism_samp) <- c("ID", month.abb) # Gets and sets the names of prism_samp.
prism_samp <- bind_cols(prism_samp, station_pts) # Combines the columns of prism_samp and station_pts into the prism_samp data frame.

prism_long <- prism_samp %>%
  pivot_longer(Jan:Dec, names_to = "mnth_name", values_to = "PPrec") %>%
  mutate(month = match(mnth_name, month.abb), PPrec_in = PPrec * 0.0393701) # The data is pivoted into a long format.

compare_dat <- mesosm %>%
  filter(YEAR == 2018) %>%
  inner_join(prism_long, by = c("STID" = "stid", "MONTH" = "month")) %>%
  dplyr::select(STID, MONTH, RAIN, PPrec_in)  # The data is joined to the mesosm data frame (which contains the station measurements) by month and station ID. The critical variables are the monthly rainfall measured at the mesonet stations (RAIN), and the monthly rainfall estimates from the PRISM dataset converted to inches to match the Mesonet data (“Pprec_in”).

ggplot(data = compare_dat) +
  geom_point(aes(x = RAIN, y = PPrec_in, color = STID)) +
  scale_color_discrete(name = "Station ID") +
  xlab("Mesonet rainfall (in)") +
  ylab("PRISM rainfall (in)") # Generates a plot w/ X-axis = Mesonet rainfall (in), Y-axis = PRISM rainfall (in), and a a legend for the Station IDs.

The tight relationship looks pretty impressive. However, keep in mind that the Oklahoma mesonet data is one of many sources of station data that are used to developed the interpolated PRISM dataset. So the two sources of data are not entirely independent. The following code generates some more summaries of the relationships between these precipitation estimates.

rain_lm <- lm(PPrec_in ~ RAIN, data = compare_dat) # Carries out a linear regression. The linear regression has a high R-squared and a slope that is slightly less than one.
summary(rain_lm) # Returns the data.

## 
## Call:
## lm(formula = PPrec_in ~ RAIN, data = compare_dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.58056 -0.20460 -0.06542  0.20472  1.40651 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.25288    0.10994    2.30    0.026 *  
## RAIN         0.92353    0.02497   36.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4625 on 46 degrees of freedom
## Multiple R-squared:  0.9675, Adjusted R-squared:  0.9668 
## F-statistic:  1368 on 1 and 46 DF,  p-value: < 2.2e-16

rain_sum <- compare_dat %>%
  summarize(RMSE = sqrt(mean((PPrec_in - RAIN)^2)),
            MAE = mean(abs(PPrec_in - RAIN)),
            ME = mean(PPrec_in - RAIN)) # Summarizes the statistics of the root mean squared error (RMSE), mean absolute error (MAE), and the mean error (ME).
rain_sum # Calls tibble rain_sum

## # A tibble: 1 x 3
##    RMSE   MAE      ME
##   <dbl> <dbl>   <dbl>
## 1 0.497 0.270 -0.0147