Lecture 18 - Geospatial Data Part Two

Load Libraries

In this class, we will study the second basic geospatial data type - raster. Below are the packages that need to be installed and loaded.

library(sf)           
library(terra)          
library(tidyverse)

Textbook link:https://bookdown.org/mcwimberly/gdswr-book/raster-geospatial-data---continuous.html

Recall: Geospatial data types

Vector: point, lines, polygons referenced by geospatial information
Raster: geospatial information by pixel or equal-spaced grids

Raster data type

Raster data type is useful to visualize and study characteristics such as

temperature
terrain
recipitation

and other weather or geographic conditions that continuously span the area.

Importing Raster Data

A raster object can be created by calling the rast() function and specifying an external image file as an argument.

In this example, a dataset of land surface temperature (LST) measured by the MODIS sensor on board the Terra satellite is imported into R as a raster object.

MODIS data, along with many other types of satellite remote sensing products, can be obtained from the United State Geological Survey’s Land Processes Distributed Active Archive Center (https://lpdaac.usgs.gov/).

lst <- rast("~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/MOD11A2_2017-07-12.LST_Day_1km.tif")

`SpatRaster` class

The resulting object belongs to the SpatRaster class. Invoking the print() function for a raster object provides information about the dimensions of the grid, cell size, geographic location, and other details.

class(lst)

## [1] "SpatRaster"
## attr(,"package")
## [1] "terra"

lst

## class       : SpatRaster 
## dimensions  : 1110, 3902, 1  (nrow, ncol, nlyr)
## resolution  : 0.009009009, 0.009009009  (x, y)
## extent      : -104.4326, -69.27943, 30, 40  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : MOD11A2_2017-07-12.LST_Day_1km.tif 
## name        : MOD11A2_2017-07-12.LST_Day_1km

resolution: the size of each pixel in x (latitude) and y (longitude).
extent: the range of this image
coord.ref.: the coordinate reference system, which is used to locate the actual location on earth from the coordinates

Raster values

In each layer of raster data, there stores a value correspoding to a characteristic of each pixel. In this example, the value is a measured temperature.

The summary() function provides information about the statistical distribution of raster values, with the size argument specifying the number of randomly sampled pixels to include in the summary.

summary(lst, size = 1e6)

##  MOD11A2_2017.07.12.LST_Day_1km
##  Min.   :    0                 
##  1st Qu.:    0                 
##  Median :14981                 
##  Mean   : 8321                 
##  3rd Qu.:15148                 
##  Max.   :16396

Helper functions

There are a number of helper functions, shown below, that can be used to extract specific characteristics of a SpatRaster object. The ncell(), nrow(), ncol(), and nlyr() functions return the numbers of grid cells, grid rows, grid columns, and raster layers in the dataset. The res() function returns the grid cell size.

ncell(lst)

## [1] 4331220

nrow(lst)

## [1] 1110

ncol(lst)

## [1] 3902

nlyr(lst)

## [1] 1

res(lst)

## [1] 0.009009009 0.009009009

The `ext()` function

The ext() function returns a SpatExtent object that contains the geographic coordinates of the raster extent. SpatExtent objects can be used to specify the extent of new raster objects or to crop rasters to a new extent.

lst_ext <- ext(lst)
class(lst_ext)

## [1] "SpatExtent"
## attr(,"package")
## [1] "terra"

lst_ext[1:4]

##       xmin       xmax       ymin       ymax 
## -104.43258  -69.27943   30.00000   40.00000

Rename layers

Each layer in a SpatRaster object has a name. By default, this name is the same as the data file that was used to create the raster. Because file names are often long and difficult to interpret, it is useful to specify more readable layer names. Layer names can be extracted and assigned using the names() function.

names(lst)

## [1] "MOD11A2_2017-07-12.LST_Day_1km"

names(lst)<-c("temperature")
names(lst)

## [1] "temperature"

Update raster values

In the LST dataset, missing data are coded as zeroes, and the raw digital numbers must be modified by a scaling factor of 0.02 to convert them into degrees Kelvin. The code below uses the ifel() function, which is analogous to the ifelse() function in base R, to replace zero values in the LST dataset with NA values.

lst <- ifel(lst == 0, NA, lst)
lst_c <- lst * 0.02 - 273.15
summary(lst_c, size = 1e6)

##   temperature    
##  Min.   : 9.1    
##  1st Qu.:28.0    
##  Median :29.6    
##  Mean   :30.3    
##  3rd Qu.:31.5    
##  Max.   :54.8    
##  NA's   :452091

The `global()` function

The global() function can be used to generate statistical summaries of the pixels in a raster dataset. As demonstrated in Chapter 1, the na.rm argument must be TRUE to ignore missing data in the calculations.

global(lst_c, fun = "mean", na.rm=T)

##                 mean
## temperature 30.28687

global(lst_c, fun = "min", na.rm=T)

##              min
## temperature 1.67

global(lst_c, fun = "max", na.rm=T)

##               max
## temperature 54.77

global(lst_c, fun = "sd", na.rm=T)

##                   sd
## temperature 3.652403

Write a raster file

Raster objects can be exported using the writeRaster() function. The format of the exported image is specified using the format argument. Common output formats are GTiff (GeoTiff), ascii (ESRI ASCII text ), CDF, (NetCDF), and HFA (ERDAS Imagine). The overwrite=TRUE argument will replace existing files with the same name.

writeRaster(lst_c, 
            filename = "MOD11A2_2017-07-12.LST_Day_1km_DegC.tif", 
            filetype="GTiff", overwrite=TRUE )

Visualize Raster map

The ggplot() function can be used to make maps with raster data, as well as vector data. However, ggplot() only works with inputs in the form of a data frame and therefore cannot be used to directly map a SpatRaster object.

In most situations, data frames are very inefficient for storing raster data because

In a data frame, we are effectively storing a vector representation of the raster dataset where each cell is stored as a separate point feature with its own x and y coordinates.
In contrast, raster cells are all the same size and arranged on a regular grid, it is not necessary to record the x and y coordinates of every cell. If the coordinates of just one cell (usually called the origin) are known, then the coordinates of any other cell can be determined automatically based on its relative position in the grid.

Fortunately, raster objects can be converted to data frames with one row per cell, columns for the x and y coordinates, and columns for one or more attribute that are associated with each cell.

A raster to dataframe function introduced by the textbook

Converting raster data to a data frame is straightforward, but requires several lines of code. To automate the process, a custom function can be used.

rasterdf <- function(x, aggregate = 1) {
  resampleFactor <- aggregate        
  inputRaster <- x    
  inCols <- ncol(inputRaster)
  inRows <- nrow(inputRaster)
  # Compute numbers of columns and rows in the resampled raster
  resampledRaster <- rast(ncol=(inCols / resampleFactor), 
                          nrow=(inRows / resampleFactor),
                          crs = crs(inputRaster))
  # Match to the extent of the original raster
  ext(resampledRaster) <- ext(inputRaster)
  # Resample data on the new raster
  y <- resample(inputRaster,resampledRaster,method='near')
  # Extract cell coordinates into a data frame
  coords <- xyFromCell(y, seq_len(ncell(y)))
  # Extract layer names
  dat <- stack(values(y, dataframe = TRUE))
  # Add names - 'value' for data, 'variable' for different
  # layer names in a multilayer raster
  names(dat) <- c('value', 'variable')
  dat <- cbind(coords, dat)
  dat
}

Convert raster data to dataframe

lst_df <- rasterdf(lst_c, aggregate = 3)
summary(lst_df)

##        x                 y             value               variable     
##  Min.   :-104.42   Min.   :30.01   Min.   :11.23    temperature:481370  
##  1st Qu.: -95.64   1st Qu.:32.50   1st Qu.:28.05                        
##  Median : -86.86   Median :35.00   Median :29.55                        
##  Mean   : -86.86   Mean   :35.00   Mean   :30.29                        
##  3rd Qu.: -78.07   3rd Qu.:37.50   3rd Qu.:31.53                        
##  Max.   : -69.29   Max.   :39.99   Max.   :54.77                        
##                                    NA's   :217276

Visualize data with `geom_raster()`

The geom_raster() function is used with the x and y arguments assigned to the corresponding columns in lst_df. The fill argument is the value column from lst_df, which contains the LST values.

ggplot(data = lst_df) +
  geom_raster(aes(x = x, 
                  y = y, 
                  fill = value)) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x = "longitude", 
       y = "latitude") +
  theme(legend.position = "bottom")

Plot the vector map on top of raster data

The temperature map in the previous page looks nice but where is the plotting range exactly located? We can plot a US map to see that.

us_map <- st_read("~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/conterminous_us_states.shp", quiet = T)

ggplot() +
  geom_raster(data = lst_df, aes(x = x, 
                  y = y, 
                  fill = value), alpha = 0.5) + 
  geom_sf(data = us_map, color = "grey50", fill = NA) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x = "longitude", 
       y = "latitude")

Cropping raster data

The terra package includes functions for a modifying the geometry of raster datasets. To clip out a smaller portion of the LST dataset, we can use the crop() function.

In this example, a bounding box of geographic coordinates (latitude and longitude) is specified to create a SpatExtent object and crop the raster to the U.S. state of Georgia.

clipext <- ext(-86, -80.5, 30, 35.5)
class(clipext)

## [1] "SpatExtent"
## attr(,"package")
## [1] "terra"

lst_clip <- crop(lst_c, clipext)
lst_clip_df <- rasterdf(lst_clip)

Let’s also read the vector map for Georgia state.

ga_sf <- st_read(dsn = "~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/GA_SHP.shp", quiet = TRUE)

Plotting temperature data around Georgia

ggplot() +  
  geom_raster(data = lst_clip_df, 
              aes(x = x, 
                  y = y, 
                  fill = value)) +
  geom_sf(data=ga_sf, 
          color = "grey50", 
          fill = NA, size = 0.5) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x = "longitude", 
       y = "latitude")

Multiple-layer raster

The next examples will use forcings data from the North American Land Data Assimilation System (NLDAS).

These are gridded meterological data that can be obtained through the National Aeronautics and Space Administration’s Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/).

The data have a grid cell size of 0.125 x 0.125 degrees and are stored as GeoTIFF files and can be imported as SpatRaster objects in the same way as the LST dataset.

path = "~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/"
temp2012 <- rast(paste0(path,"NLDAS_FORA0125_M.A201207.002.grb.temp.tif"))
temp2013 <- rast(paste0(path,"NLDAS_FORA0125_M.A201307.002.grb.temp.tif"))
temp2014 <- rast(paste0(path,"NLDAS_FORA0125_M.A201407.002.grb.temp.tif"))
temp2015 <- rast(paste0(path,"NLDAS_FORA0125_M.A201507.002.grb.temp.tif"))
temp2016 <- rast(paste0(path,"NLDAS_FORA0125_M.A201607.002.grb.temp.tif"))
temp2017 <- rast(paste0(path,"NLDAS_FORA0125_M.A201707.002.grb.temp.tif"))

Create multiple-layer raster

Multiple SpatRaster objects can be combined to produce multi-layer raster datasets. These objects are frequently used for space-time data because they can store rasters from different dates.

tempstack <- c(temp2012, temp2013, temp2014, 
               temp2015, temp2016, temp2017)
names(tempstack) <- c("July 2012", "July 2013", "July 2014", 
                      "July 2015", "July 2016", "July 2017")

The code above creates two multi-layer SpatRaster objects, one containing July temperature data from 2012-2017 and one containing July precipitation data from 2012-2017. The names() function is used to specify the layer names for each object.

Summary of multiple-layer raster data

When the global() function is applied to multi-layer rasters, each layer in the SpatRaster object is summarized individually.

global(tempstack, stat = "mean", na.rm=T)

##               mean
## July 2012 297.8852
## July 2013 296.1879
## July 2014 295.7905
## July 2015 296.2655
## July 2016 296.5719
## July 2017 296.7618

Update multiple-layer raster values

From these summaries, is apparent that the temperature units are in Kelvin. As was done with the MODIS LST data, these data can be converted to Celsius using a simple R expression. The expression is automatically applied to all the layers in the multi-layer SpatRaster object.

tempstack <- tempstack - 273.15
global(tempstack, stat = "mean", na.rm=T)

##               mean
## July 2012 24.73522
## July 2013 23.03793
## July 2014 22.64051
## July 2015 23.11549
## July 2016 23.42190
## July 2017 23.61176

Convert multiple-layer raster to dataframe

Multi-layer rasters can be mapped using ggplot(). First, the SpatRaster object is converted to a data frame using the custom rasterdf() function, and a dataset of U.S. state boundaries is imported for use as an overlay.

tempstack_df <- rasterdf(tempstack)
summary(tempstack_df)

##        x                 y             value             variable     
##  Min.   :-124.94   Min.   :25.06   Min.   : 3.09    July 2012:103936  
##  1st Qu.:-110.47   1st Qu.:32.03   1st Qu.:19.59    July 2013:103936  
##  Median : -96.00   Median :39.00   Median :23.53    July 2014:103936  
##  Mean   : -96.00   Mean   :39.00   Mean   :23.43    July 2015:103936  
##  3rd Qu.: -81.53   3rd Qu.:45.97   3rd Qu.:27.62    July 2016:103936  
##  Max.   : -67.06   Max.   :52.94   Max.   :40.18    July 2017:103936  
##                                    NA's   :140982

Visualize multiple-layer raster

Then, ggplot() is used with facet_wrap() to display a composite plot with one year mapped in each facet. The variable column, which contains the names of the different raster layers, is specified in the facets argument. The temperature data are displayed using a yellow to red color ramp with a vector dataset of state boundaries overlaid to provide a spatial reference (Figure 6.3).

states_sf <- read_sf(paste0(path,"conterminous_us_states.shp"), quiet = TRUE)

ggplot() +
  geom_raster(data = tempstack_df, 
              aes(x = x, 
                  y = y, 
                  fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA,
          color = "grey50", 
          size = 0.25) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  facet_wrap(facets = vars(variable), ncol = 2) + 
  labs(title = "Mean July Temperature",
       x = "longitude", 
       y = "latitude")

Computations on Raster Objects

Many statistical summary functions have methods for raster objects. When used with multi-layer rasters, these functions will be applied to each pixel to summarize the data across all layers.

For example, the following code generates a mean temperature for each pixel using the six years of data stored in different layers.

Note the difference compared to the summaries calculated with global(), which were summarized across all pixels for each layer instead of across all layers for each pixel.

meantemp <- mean(tempstack)

meantemp_df <- rasterdf(meantemp)

ggplot() +
  geom_raster(data = meantemp_df, aes(x = x, 
                                      y = y, 
                                      fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA, 
          color = "grey50", 
          size = 0.25) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "Mean July Temperature, 2012-2017",
       x = "longitude", y = "latitude") +
  theme(legend.position = "bottom")

Another example

The following code subtracts each of the six annual datasets in the multi-layer raster from the mean temperature layer to generate maps of temperature anomalies.

tempanom <- tempstack - meantemp
names(tempanom) <- names(tempstack)

The anomalies are displayed using a bicolor ramp generated with the scale_fill_gradients2() function. This approach is justified because the anomalies have a breakpoint at zero. Positive (warm) anomalies are displayed as red, and negative (cold) anomalies are displayed as blue.

tempanom_df <- rasterdf(tempanom)

ggplot() +
  geom_raster(data = tempanom_df, aes(x = x, 
                                      y = y, 
                                      fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA,
          color = "grey50", 
          size = 0.25) +
  scale_fill_gradient2(name = "Degrees C", 
                       low = "blue", 
                       mid = "lightyellow", 
                       high = "red") +
  coord_sf(expand = TRUE) +
  facet_wrap(facets = vars(variable), ncol = 2) + 
  theme_void() +
  theme(strip.text.x = element_text(size=12, face="bold"))

Change color palette

One may also change the palette with what we learned from last class:

library(RColorBrewer) 

ggplot() +
  geom_raster(data = tempanom_df, aes(x = x, 
                                      y = y, 
                                      fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA,
          color = "grey50", 
          size = 0.25) +
    scale_fill_distiller(name = "Degree C", 
                       palette = "Spectral") +
  coord_sf(expand = TRUE) +
  facet_wrap(facets = vars(variable), ncol = 2) + 
  theme_void() +
  theme(strip.text.x = element_text(size=12, face="bold"))

Quick plotting and summarizing functions

The terra package has methods for many of the base R plotting functions. This means that a SpatRaster object can be provided as an input to these functions, and R will be able to generate an appropriate map or graph. Below are a few examples.

plot(tempstack[[1]])

plot(tempstack)

hist(tempstack)

Lab Homework (Required)

Convert the clipped LST raster from Celsius to Fahrenheit and generate an updated version of the Georgia LST map.
Generate a new raster where each pixel contains the standard deviation of the NLDAS July Temperature values from 2012-2017. Map the result to explore where temperatures were most variable over this period.

(Optional) Raster Geospatial Data - Discrete

To use raster data to visualize terrain, we need to study the following chapter of the textbook:

https://bookdown.org/mcwimberly/gdswr-book/raster-geospatial-data---discrete.html

Lecture 18 - Geospatial Data Part Two

Miao Yu

2024-03-20

Load Libraries

Recall: Geospatial data types

Raster data type

Importing Raster Data

`SpatRaster` class

Raster values

Helper functions

The `ext()` function

Rename layers

Update raster values

The `global()` function

Write a raster file

Visualize Raster map

A raster to dataframe function introduced by the textbook

Convert raster data to dataframe

Visualize data with `geom_raster()`

Plot the vector map on top of raster data

Cropping raster data

Plotting temperature data around Georgia

Multiple-layer raster

Create multiple-layer raster

Summary of multiple-layer raster data

Update multiple-layer raster values

Convert multiple-layer raster to dataframe

Visualize multiple-layer raster

Computations on Raster Objects

Another example

Change color palette

Quick plotting and summarizing functions

Lab Homework (Required)

(Optional) Raster Geospatial Data - Discrete

Lecture 18 - Geospatial Data Part Two

Miao Yu

2024-03-20

Load Libraries

Recall: Geospatial data types

Raster data type

Importing Raster Data

SpatRaster class

Raster values

Helper functions

The ext() function

Rename layers

Update raster values

The global() function

Write a raster file

Visualize Raster map

A raster to dataframe function introduced by the textbook

Convert raster data to dataframe

Visualize data with geom_raster()

Plot the vector map on top of raster data

Cropping raster data

Plotting temperature data around Georgia

Multiple-layer raster

Create multiple-layer raster

Summary of multiple-layer raster data

Update multiple-layer raster values

Convert multiple-layer raster to dataframe

Visualize multiple-layer raster

Computations on Raster Objects

Another example

Change color palette

Quick plotting and summarizing functions

Lab Homework (Required)

(Optional) Raster Geospatial Data - Discrete

`SpatRaster` class

The `ext()` function

The `global()` function

Visualize data with `geom_raster()`