Load Libraries

In this class, we will study the second basic geospatial data type - raster. Below are the packages that need to be installed and loaded.


library(sf)           
library(terra)          
library(tidyverse)  


Textbook link:https://bookdown.org/mcwimberly/gdswr-book/raster-geospatial-data---continuous.html

Recall: Geospatial data types


  • Vector: point, lines, polygons referenced by geospatial information
  • Raster: geospatial information by pixel or equal-spaced grids


Raster data type


Raster data type is useful to visualize and study characteristics such as

  • temperature
  • terrain
  • recipitation

and other weather or geographic conditions that continuously span the area.

Importing Raster Data


A raster object can be created by calling the rast() function and specifying an external image file as an argument.

In this example, a dataset of land surface temperature (LST) measured by the MODIS sensor on board the Terra satellite is imported into R as a raster object.

MODIS data, along with many other types of satellite remote sensing products, can be obtained from the United State Geological Survey’s Land Processes Distributed Active Archive Center (https://lpdaac.usgs.gov/).


lst <- rast("~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/MOD11A2_2017-07-12.LST_Day_1km.tif")

SpatRaster class


The resulting object belongs to the SpatRaster class. Invoking the print() function for a raster object provides information about the dimensions of the grid, cell size, geographic location, and other details.

class(lst)
## [1] "SpatRaster"
## attr(,"package")
## [1] "terra"
lst
## class       : SpatRaster 
## dimensions  : 1110, 3902, 1  (nrow, ncol, nlyr)
## resolution  : 0.009009009, 0.009009009  (x, y)
## extent      : -104.4326, -69.27943, 30, 40  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : MOD11A2_2017-07-12.LST_Day_1km.tif 
## name        : MOD11A2_2017-07-12.LST_Day_1km


  • resolution: the size of each pixel in x (latitude) and y (longitude).

  • extent: the range of this image

  • coord.ref.: the coordinate reference system, which is used to locate the actual location on earth from the coordinates

Raster values


In each layer of raster data, there stores a value correspoding to a characteristic of each pixel. In this example, the value is a measured temperature.

The summary() function provides information about the statistical distribution of raster values, with the size argument specifying the number of randomly sampled pixels to include in the summary.

summary(lst, size = 1e6)
##  MOD11A2_2017.07.12.LST_Day_1km
##  Min.   :    0                 
##  1st Qu.:    0                 
##  Median :14981                 
##  Mean   : 8321                 
##  3rd Qu.:15148                 
##  Max.   :16396

Helper functions


There are a number of helper functions, shown below, that can be used to extract specific characteristics of a SpatRaster object. The ncell(), nrow(), ncol(), and nlyr() functions return the numbers of grid cells, grid rows, grid columns, and raster layers in the dataset. The res() function returns the grid cell size.


ncell(lst)
## [1] 4331220
nrow(lst)
## [1] 1110
ncol(lst)
## [1] 3902
nlyr(lst)
## [1] 1
res(lst)
## [1] 0.009009009 0.009009009

The ext() function


The ext() function returns a SpatExtent object that contains the geographic coordinates of the raster extent. SpatExtent objects can be used to specify the extent of new raster objects or to crop rasters to a new extent.


lst_ext <- ext(lst)
class(lst_ext)
## [1] "SpatExtent"
## attr(,"package")
## [1] "terra"
lst_ext[1:4]
##       xmin       xmax       ymin       ymax 
## -104.43258  -69.27943   30.00000   40.00000

Rename layers


Each layer in a SpatRaster object has a name. By default, this name is the same as the data file that was used to create the raster. Because file names are often long and difficult to interpret, it is useful to specify more readable layer names. Layer names can be extracted and assigned using the names() function.

names(lst)
## [1] "MOD11A2_2017-07-12.LST_Day_1km"
names(lst)<-c("temperature")
names(lst)
## [1] "temperature"

Update raster values


In the LST dataset, missing data are coded as zeroes, and the raw digital numbers must be modified by a scaling factor of 0.02 to convert them into degrees Kelvin. The code below uses the ifel() function, which is analogous to the ifelse() function in base R, to replace zero values in the LST dataset with NA values.

lst <- ifel(lst == 0, NA, lst)
lst_c <- lst * 0.02 - 273.15
summary(lst_c, size = 1e6)
##   temperature    
##  Min.   : 9.1    
##  1st Qu.:28.0    
##  Median :29.6    
##  Mean   :30.3    
##  3rd Qu.:31.5    
##  Max.   :54.8    
##  NA's   :452091

The global() function


The global() function can be used to generate statistical summaries of the pixels in a raster dataset. As demonstrated in Chapter 1, the na.rm argument must be TRUE to ignore missing data in the calculations.

global(lst_c, fun = "mean", na.rm=T)
##                 mean
## temperature 30.28687
global(lst_c, fun = "min", na.rm=T)
##              min
## temperature 1.67
global(lst_c, fun = "max", na.rm=T)
##               max
## temperature 54.77
global(lst_c, fun = "sd", na.rm=T)
##                   sd
## temperature 3.652403

Write a raster file


Raster objects can be exported using the writeRaster() function. The format of the exported image is specified using the format argument. Common output formats are GTiff (GeoTiff), ascii (ESRI ASCII text ), CDF, (NetCDF), and HFA (ERDAS Imagine). The overwrite=TRUE argument will replace existing files with the same name.

writeRaster(lst_c, 
            filename = "MOD11A2_2017-07-12.LST_Day_1km_DegC.tif", 
            filetype="GTiff", overwrite=TRUE )

Visualize Raster map


The ggplot() function can be used to make maps with raster data, as well as vector data. However, ggplot() only works with inputs in the form of a data frame and therefore cannot be used to directly map a SpatRaster object.

In most situations, data frames are very inefficient for storing raster data because

  • In a data frame, we are effectively storing a vector representation of the raster dataset where each cell is stored as a separate point feature with its own x and y coordinates.
  • In contrast, raster cells are all the same size and arranged on a regular grid, it is not necessary to record the x and y coordinates of every cell. If the coordinates of just one cell (usually called the origin) are known, then the coordinates of any other cell can be determined automatically based on its relative position in the grid.

Fortunately, raster objects can be converted to data frames with one row per cell, columns for the x and y coordinates, and columns for one or more attribute that are associated with each cell.

A raster to dataframe function introduced by the textbook


Converting raster data to a data frame is straightforward, but requires several lines of code. To automate the process, a custom function can be used.

rasterdf <- function(x, aggregate = 1) {
  resampleFactor <- aggregate        
  inputRaster <- x    
  inCols <- ncol(inputRaster)
  inRows <- nrow(inputRaster)
  # Compute numbers of columns and rows in the resampled raster
  resampledRaster <- rast(ncol=(inCols / resampleFactor), 
                          nrow=(inRows / resampleFactor),
                          crs = crs(inputRaster))
  # Match to the extent of the original raster
  ext(resampledRaster) <- ext(inputRaster)
  # Resample data on the new raster
  y <- resample(inputRaster,resampledRaster,method='near')
  # Extract cell coordinates into a data frame
  coords <- xyFromCell(y, seq_len(ncell(y)))
  # Extract layer names
  dat <- stack(values(y, dataframe = TRUE))
  # Add names - 'value' for data, 'variable' for different
  # layer names in a multilayer raster
  names(dat) <- c('value', 'variable')
  dat <- cbind(coords, dat)
  dat
}

Convert raster data to dataframe


lst_df <- rasterdf(lst_c, aggregate = 3)
summary(lst_df)
##        x                 y             value               variable     
##  Min.   :-104.42   Min.   :30.01   Min.   :11.23    temperature:481370  
##  1st Qu.: -95.64   1st Qu.:32.50   1st Qu.:28.05                        
##  Median : -86.86   Median :35.00   Median :29.55                        
##  Mean   : -86.86   Mean   :35.00   Mean   :30.29                        
##  3rd Qu.: -78.07   3rd Qu.:37.50   3rd Qu.:31.53                        
##  Max.   : -69.29   Max.   :39.99   Max.   :54.77                        
##                                    NA's   :217276

Visualize data with geom_raster()


The geom_raster() function is used with the x and y arguments assigned to the corresponding columns in lst_df. The fill argument is the value column from lst_df, which contains the LST values.


ggplot(data = lst_df) +
  geom_raster(aes(x = x, 
                  y = y, 
                  fill = value)) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x = "longitude", 
       y = "latitude") +
  theme(legend.position = "bottom")

Plot the vector map on top of raster data


The temperature map in the previous page looks nice but where is the plotting range exactly located? We can plot a US map to see that.

us_map <- st_read("~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/conterminous_us_states.shp", quiet = T)

ggplot() +
  geom_raster(data = lst_df, aes(x = x, 
                  y = y, 
                  fill = value), alpha = 0.5) + 
  geom_sf(data = us_map, color = "grey50", fill = NA) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x = "longitude", 
       y = "latitude")

Cropping raster data


The terra package includes functions for a modifying the geometry of raster datasets. To clip out a smaller portion of the LST dataset, we can use the crop() function.

In this example, a bounding box of geographic coordinates (latitude and longitude) is specified to create a SpatExtent object and crop the raster to the U.S. state of Georgia.

clipext <- ext(-86, -80.5, 30, 35.5)
class(clipext)
## [1] "SpatExtent"
## attr(,"package")
## [1] "terra"
lst_clip <- crop(lst_c, clipext)
lst_clip_df <- rasterdf(lst_clip)

Let’s also read the vector map for Georgia state.

ga_sf <- st_read(dsn = "~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/GA_SHP.shp", quiet = TRUE)

Plotting temperature data around Georgia


ggplot() +  
  geom_raster(data = lst_clip_df, 
              aes(x = x, 
                  y = y, 
                  fill = value)) +
  geom_sf(data=ga_sf, 
          color = "grey50", 
          fill = NA, size = 0.5) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x = "longitude", 
       y = "latitude")

Multiple-layer raster


The next examples will use forcings data from the North American Land Data Assimilation System (NLDAS).

These are gridded meterological data that can be obtained through the National Aeronautics and Space Administration’s Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/).

The data have a grid cell size of 0.125 x 0.125 degrees and are stored as GeoTIFF files and can be imported as SpatRaster objects in the same way as the LST dataset.

path = "~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/"
temp2012 <- rast(paste0(path,"NLDAS_FORA0125_M.A201207.002.grb.temp.tif"))
temp2013 <- rast(paste0(path,"NLDAS_FORA0125_M.A201307.002.grb.temp.tif"))
temp2014 <- rast(paste0(path,"NLDAS_FORA0125_M.A201407.002.grb.temp.tif"))
temp2015 <- rast(paste0(path,"NLDAS_FORA0125_M.A201507.002.grb.temp.tif"))
temp2016 <- rast(paste0(path,"NLDAS_FORA0125_M.A201607.002.grb.temp.tif"))
temp2017 <- rast(paste0(path,"NLDAS_FORA0125_M.A201707.002.grb.temp.tif"))

Create multiple-layer raster


Multiple SpatRaster objects can be combined to produce multi-layer raster datasets. These objects are frequently used for space-time data because they can store rasters from different dates.

tempstack <- c(temp2012, temp2013, temp2014, 
               temp2015, temp2016, temp2017)
names(tempstack) <- c("July 2012", "July 2013", "July 2014", 
                      "July 2015", "July 2016", "July 2017")

The code above creates two multi-layer SpatRaster objects, one containing July temperature data from 2012-2017 and one containing July precipitation data from 2012-2017. The names() function is used to specify the layer names for each object.

Summary of multiple-layer raster data


When the global() function is applied to multi-layer rasters, each layer in the SpatRaster object is summarized individually.

global(tempstack, stat = "mean", na.rm=T)
##               mean
## July 2012 297.8852
## July 2013 296.1879
## July 2014 295.7905
## July 2015 296.2655
## July 2016 296.5719
## July 2017 296.7618

Update multiple-layer raster values


From these summaries, is apparent that the temperature units are in Kelvin. As was done with the MODIS LST data, these data can be converted to Celsius using a simple R expression. The expression is automatically applied to all the layers in the multi-layer SpatRaster object.

tempstack <- tempstack - 273.15
global(tempstack, stat = "mean", na.rm=T)
##               mean
## July 2012 24.73522
## July 2013 23.03793
## July 2014 22.64051
## July 2015 23.11549
## July 2016 23.42190
## July 2017 23.61176

Convert multiple-layer raster to dataframe


Multi-layer rasters can be mapped using ggplot(). First, the SpatRaster object is converted to a data frame using the custom rasterdf() function, and a dataset of U.S. state boundaries is imported for use as an overlay.

tempstack_df <- rasterdf(tempstack)
summary(tempstack_df)
##        x                 y             value             variable     
##  Min.   :-124.94   Min.   :25.06   Min.   : 3.09    July 2012:103936  
##  1st Qu.:-110.47   1st Qu.:32.03   1st Qu.:19.59    July 2013:103936  
##  Median : -96.00   Median :39.00   Median :23.53    July 2014:103936  
##  Mean   : -96.00   Mean   :39.00   Mean   :23.43    July 2015:103936  
##  3rd Qu.: -81.53   3rd Qu.:45.97   3rd Qu.:27.62    July 2016:103936  
##  Max.   : -67.06   Max.   :52.94   Max.   :40.18    July 2017:103936  
##                                    NA's   :140982

Visualize multiple-layer raster


Then, ggplot() is used with facet_wrap() to display a composite plot with one year mapped in each facet. The variable column, which contains the names of the different raster layers, is specified in the facets argument. The temperature data are displayed using a yellow to red color ramp with a vector dataset of state boundaries overlaid to provide a spatial reference (Figure 6.3).

states_sf <- read_sf(paste0(path,"conterminous_us_states.shp"), quiet = TRUE)

ggplot() +
  geom_raster(data = tempstack_df, 
              aes(x = x, 
                  y = y, 
                  fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA,
          color = "grey50", 
          size = 0.25) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  facet_wrap(facets = vars(variable), ncol = 2) + 
  labs(title = "Mean July Temperature",
       x = "longitude", 
       y = "latitude")

Computations on Raster Objects


Many statistical summary functions have methods for raster objects. When used with multi-layer rasters, these functions will be applied to each pixel to summarize the data across all layers.

For example, the following code generates a mean temperature for each pixel using the six years of data stored in different layers.

Note the difference compared to the summaries calculated with global(), which were summarized across all pixels for each layer instead of across all layers for each pixel.

meantemp <- mean(tempstack)

meantemp_df <- rasterdf(meantemp)

ggplot() +
  geom_raster(data = meantemp_df, aes(x = x, 
                                      y = y, 
                                      fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA, 
          color = "grey50", 
          size = 0.25) +
  scale_fill_gradient(name = "Degrees C", 
                      low = "yellow", 
                      high = "red") +
  coord_sf(expand = FALSE) +
  labs(title = "Mean July Temperature, 2012-2017",
       x = "longitude", y = "latitude") +
  theme(legend.position = "bottom")

Another example


The following code subtracts each of the six annual datasets in the multi-layer raster from the mean temperature layer to generate maps of temperature anomalies.

tempanom <- tempstack - meantemp
names(tempanom) <- names(tempstack)

The anomalies are displayed using a bicolor ramp generated with the scale_fill_gradients2() function. This approach is justified because the anomalies have a breakpoint at zero. Positive (warm) anomalies are displayed as red, and negative (cold) anomalies are displayed as blue.

tempanom_df <- rasterdf(tempanom)

ggplot() +
  geom_raster(data = tempanom_df, aes(x = x, 
                                      y = y, 
                                      fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA,
          color = "grey50", 
          size = 0.25) +
  scale_fill_gradient2(name = "Degrees C", 
                       low = "blue", 
                       mid = "lightyellow", 
                       high = "red") +
  coord_sf(expand = TRUE) +
  facet_wrap(facets = vars(variable), ncol = 2) + 
  theme_void() +
  theme(strip.text.x = element_text(size=12, face="bold"))

Change color palette


One may also change the palette with what we learned from last class:

library(RColorBrewer) 

ggplot() +
  geom_raster(data = tempanom_df, aes(x = x, 
                                      y = y, 
                                      fill = value)) +
  geom_sf(data = states_sf, 
          fill = NA,
          color = "grey50", 
          size = 0.25) +
    scale_fill_distiller(name = "Degree C", 
                       palette = "Spectral") +
  coord_sf(expand = TRUE) +
  facet_wrap(facets = vars(variable), ncol = 2) + 
  theme_void() +
  theme(strip.text.x = element_text(size=12, face="bold"))

Quick plotting and summarizing functions


The terra package has methods for many of the base R plotting functions. This means that a SpatRaster object can be provided as an input to these functions, and R will be able to generate an appropriate map or graph. Below are a few examples.

plot(tempstack[[1]])

plot(tempstack)

hist(tempstack)

Lab Homework (Required)


  • Convert the clipped LST raster from Celsius to Fahrenheit and generate an updated version of the Georgia LST map.

  • Generate a new raster where each pixel contains the standard deviation of the NLDAS July Temperature values from 2012-2017. Map the result to explore where temperatures were most variable over this period.

(Optional) Raster Geospatial Data - Discrete


To use raster data to visualize terrain, we need to study the following chapter of the textbook:

https://bookdown.org/mcwimberly/gdswr-book/raster-geospatial-data---discrete.html