Data Visualization and Geospatial Analysis With R

Large-scale Geospatial Analysis (2023)


Taking examples from global satellite data in gridded/raster format, we will demonstrate several common geospatial operations like projections, resampling, cropping, masking, spatial extraction etc. using rasters, shapefiles, and spatial data frames. For a seamless analysis across different datatypes and platforms, conversion from/to different data formats like data frames, matrices, rasters, and structured data like NetCDF will be discussed. Advanced topics will include working with data cubes (SpatRaster), layer-wise operations on data cubes, cell-wise operations on raster time series by implementing user-defined functions with global, app, tapp and lapp. Parallel implementation of custom functions will be demonstrated for large-scale datasets.


Course level: Advanced. Prior experience in R required.


Materials: For this exercise, we will use global dataset from two NASA satellites- MODIS for NDVI and SMAP for surface (~5cm) soil moisture. We will use global aridity estimates (from [Global Aridity and PET Database] (https://cgiarcsi.community/data/global-aridity-and-pet-database/) as a general climate classification (Hyper-arid, arid, semi-arid, sub-humid and humid).

The codes are tested on R version 4.2.3- Shortstop Beagle and version 4.1.2 - Bird Hippie. To update, see section 5.2.


CHAPTER 0. Import libraries and sample dataset


We will begin by loading necessary libraries. The sample dataset for this excercise can be downloaded manually from GitHub by accessing https://github.com/Vinit-Sehgal/SampleData


Alternatively, use the following code to download and extract the sample data from GitHub repository.

Install all necessary packages (Run once).

###############################################################
#~~~ Load required libraries
lib_names=c("terra", "tidyterra", "cetcolor", "scico", "tmap",    
            "gifski", "lubridate","Rcpp",
            "raster","ggplot2","unikn","mapview",
            "gridExtra","rgdal","fields",
            "RColorBrewer","ncdf4","rasterVis",
            "rcartocolor","pacman","purrr","moments","tictoc", 
            "sf", "sp", "exactextractr","readxl", 
            "snow","future.apply","parallel")
# Load necessary packages
invisible(suppressMessages
          (suppressWarnings
            (lapply
              (lib_names, require, character.only = T))))

# An easy way to load multiple packages is through pacman::p_load
# pacman::p_load("raster","ggplot2","unikn","mapview",
#             "gridExtra","rgdal","fields",
#             "RColorBrewer","ncdf4","rasterVis", "moments", "tictoc", "tibble")

# Update packages if they are already installed
# update.package(ask = FALSE)

Note: The legacy R spatial infrastructure packages - maptools, rgdal and rgeos have been archived by CRAN from October 16, 2023; these retired packages will continue to be available as source packages on https://cran.r-project.org/src/contrib/Archive but won’t undergo any further development.
We will now download the workshop repository, which contains all data we will use for this exercise.

###############################################################
#~~~ Import sample data from GitHub repository
download.file(url = "https://github.com/Vinit-Sehgal/SampleData/archive/master.zip",
destfile = "SampleData-master.zip")    # Download ".Zip"

# Unzip the downloaded .zip file
unzip(zipfile = "SampleData-master.zip")
# getwd()                           # Current working directory
list.files("./SampleData-master")   # List folder contents
##  [1] "CMIP_land"                               
##  [2] "functions"                               
##  [3] "images"                                  
##  [4] "Largescale_geospatial_analysis_2022.html"
##  [5] "location_points.xlsx"                    
##  [6] "ne_10m_coastline"                        
##  [7] "raster_files"                            
##  [8] "README.md"                               
##  [9] "sample_pdfs"                             
## [10] "SMAP_L3_USA.nc"                          
## [11] "SMAPL4_H5"                               
## [12] "SMOS_nc"                                 
## [13] "USA_states"                              
## [14] "Workbook_DVGAR21-Part1.html"             
## [15] "Workbook_DVGAR21-Part2.html"

CHAPTER 1. Raster and shapefile visualization


A geographic information system, or GIS refers to a platform which can map, analyzes and manipulate geographically referenced dataset. A geographically referenced data (or geo-referenced data) is a spatial dataset which can be related to a point on Earth with the help of geographic coordinates. Types of geo-referenced spatial data include: rasters (grids of regularly sized pixels) and vectors (polygons, lines, points).



A quick and helpful review of spatial data can be found here: https://spatialvision.com.au/blog-raster-and-vector-data-in-gis/

1.1. Plotting raster data

In this section, we will plot global raster data of surface (~5 cm) soil moisture from SMAP. In this process we will explore functions from terra, and tidyterra packages.

Let’s start by importing the necessary packages.

# For raster operations
library(terra)

# For plotting operations
library(tidyterra) 
library(tmap)
library(ggplot2)
library(mapview)  

# For Perceptually Uniform Colour palettes
library(cetcolor)
library(scico)
 
# Import SMAP soil moisture raster from the downloaded folder
sm=terra::rast("./SampleData-master/raster_files/SMAP_SM.tif")

Once we have imported the Spatraster using rast() function from terra package, let’s now check its attributes. Notice the dimensions, resolution, extent, crs i.e. coordinate reference system and values. Note that the cell of one raster layer can only hold a single value. The value might be numeric or categorical!

print(sm)
## class       : SpatRaster 
## dimensions  : 456, 964, 1  (nrow, ncol, nlyr)
## resolution  : 0.373444, 0.373444  (x, y)
## extent      : -180, 180, -85.24595, 85.0445  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : SMAP_SM.tif 
## name        :    SMAP_SM 
## min value   : 0.01999998 
## max value   : 0.87667608
# Try:
# dim(sm)   # Dimension (nrow, ncol, nlyr) of the raster
# terra::res(sm)   # X-Y resolution of the raster
# terra::ext(sm)   # Spatial extent of the raster
# terra::crs(sm)   # Coordinate reference system

Now let’s plot the raster using terra::plot. Interactive plots can be made by using mapview function.

# Basic Raster plot
terra::plot(sm, main = "Soil Moisture")

What are the different features of the above plot?

Expert Note: terra’s functionality is largely the same as the more mature raster package (created by the same developer, Robert Hijmans), but are usually more computationally efficient than raster equivalents. However, one can seamlessly translate between the two types of object to ensure backwards compatibility with older scripts and packages, for example, with the functions raster(), stack(), and brick() in the raster package.

1.2. Customizing terra plot options

1.2.1 Scientific Color palettes

We will generate custom color palettes for better visualization. We will demonstrate the usage of cetcolor and scico package which provide access to the perceptually uniform and colour-blindness friendly palettes.


  1. You can select CET colormaps from: https://cran.r-project.org/web/packages/cetcolor/vignettes/cet_color_schemes.html
  2. You can select scico colormaps from: https://github.com/thomasp85/scico
  3. R Color Brewer is also a great resource for popular colormaps: https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3
# Make custom color palette
library(unikn) 

#~~ A) User defined color palette using scico library
mypal1 = scico(20, alpha = 1.0, direction =  -1, palette = "vik")
unikn::seecol(mypal1)

#Check: scico_palette_names() for available palettes!
# Try combinations of alpha=0.5, direction =1, and various different color palette

#~~ B) User defined color palette using cetcolor library
mypal2 = cetcolor::cet_pal(20, name = "r2")  
unikn::seecol(mypal2)

# Or reverse color pal
mypal2 = rev(cetcolor::cet_pal(20, name = "r2") ) 
unikn::seecol(mypal2)

1.2.2 Customize terra plots

There is a long list of customizable operations available while plotting rasters in R. Let’s play with some of these options. We will start with basic plot from terra, and then venture into more powerful tmap and tidyterra packages. Try horizontal=TRUE, interpolate=FALSE, change xlim=c(-180, 180) with asp=1, or try legend.shrink=0.4.

sm=rast("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data

terra::plot(sm,
            main = "Scientific Plot of Raster",
            
            #Color options
            col = mypal2,                    # User Defined Color Palette
            breaks = seq(0, 1, by=0.1),      # Sequence from 0-1 with 0.1 increment
            colNA = "lightgray",             # Color of cells with NA values
            
            # Axis options      
            axes=TRUE,                       # Plot axes: TRUE/ FALSE
            xlim=c(-180, 180),               # X-axis limit
            ylim=c(-90, 90),                 # Y-axis limit
            xlab="Longitude",                # X-axis label
            ylab="Latitue",                  # Y-axis label
            
            # Legend options      
            legend=TRUE,                     # Plot legend: TRUE/ FALSE
            
            # Miscellaneous
            mar = c(3.1, 3.1, 2.1, 7.1),     #Margins
            grid = FALSE                     #Add grid lines
        )

#Plotting through tmap:
tmap_mode("plot")  #Setting tmap mode: Static plots by "plot", Interactive plots by"view"

tmap_SM = tm_shape(sm)+
  tm_grid(alpha = 0.2)+
  tm_raster(alpha = 0.7, palette = mypal2, 
            style = "pretty", title = "Volumetric Soil Moisture")+
  tm_layout(legend.position = c("left", "bottom"))+
  tm_xlab("Longitude")+ tm_ylab("Latitude")

tmap_SM

To convert the static plot into an interactive map we will use mapview package which which is compatible with rasterbrick.

# Interactive plot
library(mapview)
library(raster)

mapview(brick(sm),            # Convert SpatRaster to RasterBrick
        col.regions = mypal2, # Color palette 
        at=seq(0, 0.8, 0.1)   # Breaks
        )

1.3. Plotting raster data using tidyterra

tidyterra is a package that add common methods from the tidyverse for SpatRaster and SpatVectors objects created with the {terra} package. It also adds specific geom_spat*() functions for plotting these kind of objects with {ggplot2}.

Note on Performance: tidyterra is conceived as a user-friendly wrapper of {terra} using the {tidyverse} methods and verbs. This approach therefore has a cost in terms of performance.

library(tidyterra)
library(ggplot2)

ggplot() +
  geom_spatraster(data = sm) +
  scale_fill_gradientn(colors=mypal2,                               # Use user-defined colormap
                       name = "SM",                                 # Name of the colorbar
                       na.value = "transparent",                    # transparent NA cells
                       labels=(c("0", "0.2", "0.4", "0.6", "0.8")), # Labels of colorbar
                       breaks=seq(0,0.8,by=0.2),                    # Set breaks of colorbar
                       limits=c(0,0.8))+
  theme_void()  # Try different themes: theme_bw(), theme_gray(), theme_minimal()

What if we are interested in a particular region and not the entire globe? We can plot the map for a specific extent (CONUS, in this case) by changing the range of coord_sfoption. We will also use a different theme: theme_bw. Try xlim = c(114,153) and ylim = c(-43,-11)!

sm_conus= ggplot() +
  geom_spatraster(data = sm) +
  scale_fill_gradientn(colors=mypal2,                               # Use user-defined colormap
                       name = "SM",                                 # Name of the colorbar
                       na.value = "transparent",                    # transparent NA cells
                       labels=(c("0", "0.2", "0.4", "0.6", "0.8")), # Labels of colorbar
                       breaks=seq(0,0.8,by=0.2),                    # Set breaks of colorbar
                       limits=c(0,0.8)) +
  coord_sf(xlim = c(-125,-67),                                      # Add extent for CONUS
              ylim = c(24,50))+               
  theme_bw()                                                        # Try black-and-white theme. 

print(sm_conus)

tmap provides the easiest way of manipulating the legends from continuous to discrete by just adding the style of color scale desired by the user.

#We will manipulate the existing tmap_SM plot by adding style parameter:
tmap_SM = tm_shape(sm)+
  tm_grid(alpha = 0.2)+
  tm_raster(alpha = 0.7, style = "pretty", 
            palette = mypal2,  title = "Volumetric Soil Moisture") +
  tm_layout(legend.position = c("left","bottom"), inner.margins = 0)+           #Adjust the legend position
   tm_xlab("Longitude")+ tm_ylab("Latitude")  

tmap_SM

#Using breaks would give more control on scale discretization
tm_shape(sm)+
  tm_grid(alpha = 0.2)+
  tm_raster(alpha = 0.7, breaks = c(0.00, 0.25, 0.50, 0.75, 1.00), 
            palette = mypal2,  title = "Volumetric Soil Moisture") +
  tm_layout(legend.outside = TRUE,
            legend.outside.position = c("right","top"),
            inner.margins = 0)+
  tm_xlab("Longitude")+ tm_ylab("Latitude")

# Saving the plot to disk:
tmap_save(tmap_SM,                    # tmap object
          filename = "sm_world.png",  # filename including extension
          dpi = 300,                  # dots per inch
          height = 5,
          width = 5,
          units = "in")               #units for width and height

1.4. Plotting vector data

Importing and plotting shapefiles is equally easy in R. We will import the shapefile of the updated global IPCC climate reference regions (https://doi.org/10.5194/essd-12-2959-2020) as Simple Feature (sf) Object. We will also use global coastline shapefile from the web for plotting.

Note: Even though terra provides vect() function to handle vector data, sf package is most suitable and powerful for manipulating and plotting purposes.

###############################################################
#~~~ PART 1.4.1: Importing and visualizing shapefiles
library(sf)  

# Import the shapefile of global IPCC climate reference regions (only for land) 
IPCC_shp = read_sf("./SampleData-master/CMIP_land/CMIP_land.shp")

# View attribute table of the shapefile
IPCC_shp # Notice the attributes look like a data frame
## Simple feature collection with 41 features and 4 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -168 ymin: -56 xmax: 180 ymax: 85
## Geodetic CRS:  WGS 84
## # A tibble: 41 × 5
##    V1              V2                V3       V4                        geometry
##    <chr>           <chr>             <chr> <dbl>                   <POLYGON [°]>
##  1 ARCTIC          Greenland/Iceland GIC       1 ((-10 58, -10.43956 58, -10.87…
##  2 NORTH-AMERICA   N.E.Canada        NEC       2 ((-55 50, -55.4386 50, -55.877…
##  3 NORTH-AMERICA   C.North-America   CNA       3 ((-90 50, -90 49.5614, -90 49.…
##  4 NORTH-AMERICA   E.North-America   ENA       4 ((-70 25, -70.43478 25, -70.86…
##  5 NORTH-AMERICA   N.W.North-America NWN       5 ((-105 50, -105.4386 50, -105.…
##  6 NORTH-AMERICA   W.North-America   WNA       6 ((-130 50, -129.5614 50, -129.…
##  7 CENTRAL-AMERICA N.Central-America NCA       7 ((-90 25, -90.37179 24.76923, …
##  8 CENTRAL-AMERICA S.Central-America SCA       8 ((-75 12, -75.28 11.67333, -75…
##  9 CENTRAL-AMERICA Caribbean         CAR       9 ((-75 12, -75.32609 12.28261, …
## 10 SOUTH-AMERICA   N.W.South-America NWS      10 ((-75 12, -74.57143 12, -74.14…
## # ℹ 31 more rows
# Load global coastline shapefile 
coastlines = read_sf("./SampleData-master/ne_10m_coastline/ne_10m_coastline.shp")

# Alternatively, download global coastlines from the web 
# NOTE: May not work if the online server is down
# download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_coastline.zip?version=4.0.1",destfile = 'ne_110m_coastline.zip')
# # Unzip the downloaded file
# unzip(zipfile = "ne_110m_coastline.zip",exdir = 'ne-coastlines-110m')

# Plot both sf objects using tmap:
tm_shape(IPCC_shp)+
  tm_borders()+            # Add IPCC land regions in blue color
  tm_shape(coastlines)+
  tm_sf()                  # Add global coastline

#Subset shapefile for Eastern North-America
terra::plot(IPCC_shp[4,][1], main="Polygon for Eastern North-America")

# Combine terra plots with overlaying shapefiles
tmap_SM + 
  tm_shape(IPCC_shp)+
  tm_borders()+            
  tm_shape(coastlines)+
  tm_sf()

###############################################################
#~~~ PART 1.4.2: Add spatial point to shapefile/ raster

#~~ Create a spatial point for College Station, Texas
college_station = st_sfc(
                  st_point(x = c(-96.33, 30.62), dim = "XY"), # Lat-long as spatial points 
                  crs = "EPSG:4326")  # Coordinate system: More details in the next part

#~~ Create map by adding all the layers
tm_shape(IPCC_shp[c(3,4,6,7),])+                       # Selected regions from 'IPCC_shp'
  tm_borders(col = "black",lwd = 1, lty = "solid")+    # Border color
  tm_fill(col = "lightgrey")+                          # Fill color
  tm_shape(coastlines)+                                # Add coastline
  tm_sf(col = "maroon")+                               # Change color of coastline
  tm_shape(college_station)+                           # Add spatial point to the map
  tm_dots(size = 2, col = "blue")                      # Customize point

Note: Each layer (here 3) needs to be added as with tm_shape()!

1.5. Reprojection of rasters using terra::project

A coordinate reference system (CRS) is used to relate locations on Earth (which is a 3-D spheroid) to a 2-D projected map using coordinates (for example latitude and longitude). Projected CRSs are usually expressed in Easting and Northing (x and y) values corresponding to long-lat values in Geographic CRS. A good description of coordinate reference systems and their importance can be found here: https://docs.qgis.org/3.4/en/docs/gentle_gis_introduction/coordinate_reference_systems.html https://datacarpentry.org/organization-geospatial/03-crs/



In R, the coordinate reference systems or CRS are commonly specified in EPSG (European Petroleum Survey Group) or PROJ4 format (See: https://epsg.io/).
The Spatraster reprojection process is done with project() from the terra package.

# Importing SMAP soil moisture data
sm=rast("./SampleData-master/raster_files/SMAP_SM.tif") 

#~~ Projection 1: NAD83 (EPSG: 4269)

sm_proj1 = terra::project(sm, "epsg:4269")

terra::plot(sm_proj1, 
             main = "NAD83",    # Title of the plot
             col = mypal2,      # Colormap for the plot
             axes = FALSE,      # Disable axes
             box = FALSE,       # Disable box around the plots
             asp = NA,          # No fixed aspect ratio; asp=NA fills plot to window
             legend=FALSE)      # Disable legend

#~~ Projection 2: World Robinson projection (ESRI:54030)

sm_proj2 = terra::project(sm, "ESRI:54030")

terra::plot(sm_proj2, 
             main = "Robinson", # Title of the plot
             col = mypal2,      # Colormap for the plot
             axes = FALSE,      # Disable axes
             box = FALSE,       # Disable box around the plots
             asp = NA,          # No fixed aspect ratio; asp=NA fills plot to window
             legend=FALSE)      # Disable legend

Other than the projections demonstrated, try the following:

"+init=epsg:3857" For Mercator (EPSG: 3857): Used in Google Maps, Open Street Maps, etc.
"+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs" for World Robinson Projection
"+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs" for World Mercator projection

We will use a custom function provided with the material to plot the global raster in Robinson projection.

#~~~Projection 2: Robinson projection

WorldSHP=terra::vect(spData::world)

RobinsonPlot <- ggplot() +
  geom_spatraster(data = sm)+                   # Plot SpatRaster layer               
  geom_spatvector(data = WorldSHP, 
                  fill = "transparent") +       # Add world political map
  ggtitle("Robinson Projection") +              # Add title
  scale_fill_gradientn(colors=mypal2,           # Use user-defined colormap
                       name = "Soil Moisture",  # Name of the colorbar
                       na.value = "transparent",# Set color for NA values
                       lim=c(0,0.8))+           # Z axis limit
  theme_minimal()+                              # Select theme. Try 'theme_void'
  theme(plot.title = element_text(hjust =0.5),  # Place title in the middle of the plot
        text = element_text(size = 12))+        # Adjust plot text size for visibility
  coord_sf(crs = "ESRI:54030",                  # Reproject to World Robinson
           xlim = c(-152,152)*100000,    
           ylim = c(-55,90)*100000)

print(RobinsonPlot)

Now let’s plot the robison plot using tmap:

WorldSHP = st_as_sf(WorldSHP)           # Convert 'WorldSHP' to simple feature

tm_shape(WorldSHP,                      # Initiate shapefile       
         projection = 'ESRI:54030',     # Set projection: World Robinson
         ylim = c(-65, 90)*100000,      # Set y-limit
         xlim = c(-152,152)*100000,     # Set x-limit
         raster.warp = TRUE)+              
  tm_sf()+                              # Plot shapefile 
  tm_shape(sm,                          # Add raster file
           projection = 'ESRI:54030',   # Set projection 
           raster.warp = FALSE) +
  tm_raster( palette = mypal2,          # Set color map for raster
             title = "Soil Moisture")+  # Add plot title
  tm_layout(main.title = "Surface Soil Moisture",
            main.title.fontfamily = "Times",               # Set text font
            legend.show = T,                               # Show legend= T/F
            legend.outside = T,                            
            legend.outside.position = c("right", "top"),   # Legend position
            frame = FALSE,                                 # Add plot frame
            earth.boundary.color = "grey",                 # Boundary color
            earth.boundary.lwd = 2,                        # Boundary linewidth
            fontfamily = "Times")+                         # Text font
  tm_graticules(alpha = 0.2,                               # Add lat-long graticules
                labels.inside.frame = FALSE,  
                col = "lightgrey", n.x = 4, n.y = 3)

Useful references: More excellent examples on making maps in R can be found here: https://bookdown.org/nicohahn/making_maps_with_r5/docs/introduction.html. Quintessential resource for reference on charts and plots in R: https://www.r-graph-gallery.com/index.html.


CHAPTER 2. Geospatial operations on raster/vector data


2.1. Raster resampling

We will demonstrate how to change the resolution of Spatraster files using terra::aggregate (fine to coarse), terra::disagg (coarse to fine), and resampling values from one raster to another using terra::resample function. We will use global aridity and soil moisture Spatrasters for this purpose.

# Importing SMAP soil moisture data
sm=rast("./SampleData-master/raster_files/SMAP_SM.tif") 

# Original resoluton of raster for reference
res(sm)
## [1] 0.373444 0.373444
#~~ Aggregate raster to coarser resolution
SMcoarse = terra::aggregate(sm,           # Soil moisture raster
                            fact = 10,    # Aggregate by x 10
                            fun = mean)   # Function used to aggregate values
res(SMcoarse)
## [1] 3.73444 3.73444
#~~ Disaggregate raster to finer resolution
SMfine = terra::disagg(sm, 
                   fact=3, 
                   method='bilinear')
res(SMfine)
## [1] 0.1244813 0.1244813
#~~ Raster resampling
# Import global aridity raster
aridity=rast("./SampleData-master/raster_files/aridity_36km.tif") 

# Plot aridity map
terra::plot(aridity, col=mypal2, main= "Aridity Spatraster")

# Resample aridity raster to coarse resolution  
aridityResamp=terra::resample(aridity,      # Original raster
                       SMcoarse,     # Target resolution raster
                       method='ngb') # bilinear or ngb (nearest neighbor) 

# Plot resampled aridity map
terra::plot(aridityResamp, col=mypal2, main= "Aridity at Coarser Resolution")

2.2. Raster summary statistics

Arithmetic operations a.k.a. arith-generic (+, -, *, /, ^, %%, %/%) on Spatrasters closely resemble simple vector-like operations. More details on arith-generic can be found here: https://rdrr.io/cran/terra/man/arith-generic.html.
We will use global function to apply summary statistics and user-defined operations on cells of a raster.

# Simple arithmetic operations
sm2=sm*2
print(sm2) # Try sm2=sm*10, or sm2=sm^2 and see the difference in sm2 values
## class       : SpatRaster 
## dimensions  : 456, 964, 1  (nrow, ncol, nlyr)
## resolution  : 0.373444, 0.373444  (x, y)
## extent      : -180, 180, -85.24595, 85.0445  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## varname     : SMAP_SM 
## name        :    SMAP_SM 
## min value   : 0.03999996 
## max value   : 1.75335217
# Summary statistics
global(sm, mean, na.rm = T)
##             mean
## SMAP_SM 0.209402
global(sm, sd, na.rm = T)
##                sd
## SMAP_SM 0.1434444
global(sm, quantile, probs = c(0.25, 0.75), na.rm = T)
##               X25.      X75.
## SMAP_SM 0.09521707 0.2922016
# User-defined statistics by defining own function
quant_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
  quantile(x, probs = c(0.25, 0.75), na.rm=TRUE)
} 
global(sm, quant_fun)   # 25th, and 75th percentile of each layer
##               X25.      X75.
## SMAP_SM 0.09521707 0.2922016

Note: With a multi-layered raster object, global will summarize each layer separately.

2.3. Summarizing rasters using shapefles

Let’s explore using a spatial polygon/shapefile for summarizing a raster (in this case, global SMAP soil moisture) by using extract function from the terra library. We will also transform global aridity raster to a polygon using as.polygons and st_as_sf functions to find the mean soil moisture values for each aridity class.

First, we will use the IPCC shapefile to summarize the soil moisture raster.

###############################################################
#~~~ PART 2.3.1: Using shapefile to summarize a raster
sm_IPCC_df=terra::extract(sm,        # Spatraster to be summarized
                          vect(IPCC_shp),  # Shapefile/ polygon to summarize the raster
                          #df=TRUE,   # Gives the summary statistics as a dataframe
                          fun=mean,  # Desired statistic: mean, sum, min and max 
                          na.rm=TRUE)# Ignore NA values? TRUE=yes! 

head(sm_IPCC_df)
##   ID   SMAP_SM
## 1  1 0.2545766
## 2  2 0.3839087
## 3  3 0.2345457
## 4  4 0.3788003
## 5  5 0.2383580
## 6  6 0.1475145
###############################################################
#~~~ PART 2.3.2: Extract cell values for each region 
sm_IPCC_list=terra::extract(sm,       # Raster to be summarized
                          vect(IPCC_shp),   # Shapefile/ polygon to summarize the raster
                          df=FALSE,   # Returns a list
                          fun=NULL,   # fun=NULL will output cell values within each region
                          na.rm=TRUE) # Ignore NA values? yes! 

# Apply function on cell values for each region
library(tidyverse)

sm_IPCC_list %>% 
  as_tibble() %>% 
  group_by(ID) %>% 
  summarise(mean_SM = mean(SMAP_SM, na.rm =T))
## # A tibble: 41 × 2
##       ID mean_SM
##    <dbl>   <dbl>
##  1     1   0.255
##  2     2   0.384
##  3     3   0.235
##  4     4   0.379
##  5     5   0.238
##  6     6   0.148
##  7     7   0.110
##  8     8   0.336
##  9     9   0.399
## 10    10   0.320
## # ℹ 31 more rows
#~~ Try user defined function
myfun=function (y){return(mean(y, na.rm=TRUE))}    # User defined function for calculating means

sm_IPCC_list %>% 
  as_tibble() %>% 
  group_by(ID) %>% 
  summarise(mean_SM = myfun(SMAP_SM))  
## # A tibble: 41 × 2
##       ID mean_SM
##    <dbl>   <dbl>
##  1     1   0.255
##  2     2   0.384
##  3     3   0.235
##  4     4   0.379
##  5     5   0.238
##  6     6   0.148
##  7     7   0.110
##  8     8   0.336
##  9     9   0.399
## 10    10   0.320
## # ℹ 31 more rows
# Is this the same as the previous result?

2.4. DIY: Summarize raster using classified raster

In the next example, we will convert global aridity raster into a polygon based on aridity classification using as.polygons and st_as_sf functions.
Global aridity raster has 5 classes with 5 indicating humid and 1 indicating hyper-arid climate. We will use this polygon to extract values from the Spatraster and summarize soil moisture for each aridity class.

###############################################################
#~~~ PART 2.4.1: Convert a raster to a shapefile
aridity=rast("./SampleData-master/raster_files/aridity_36km.tif") #Global aridity

# Convert raster to shapefile
arid_poly=st_as_sf(as.polygons(aridity))   # Convert SpatRaster to polygon and then to sf

# Plot aridity polygon
terra::plot(arid_poly, 
     col=arid_poly$aridity_36km)  # Colors based on aridity values (i.e. 1,2,3,4,5)

Summarize values of SMAP soil moisture raster for aridity classes:

sm_arid_df=terra::extract(sm,        # Raster to be summarized
                          vect(arid_poly), # Shapefile/ polygon to summarize the raster
                          #df=TRUE,   # Gives the summary statistics as a dataframe
                          fun=mean,  # Desired statistic: mean, sum, min and max 
                          na.rm=TRUE)# Ignore NA values? yes! 

# Lets plot the climate-wise mean of surface soil moisture
{plot(sm_arid_df,     
     xaxt = "n",              # Disable x-tick labels
     xlab="Aridity",          # X axis label
     ylab="Soil moisture",    # Y axis label
     type="b",                # line type
     col="blue",              # Line color
     main="Climate-wise mean of surface soil moisture")
axis(1, at=1:5, labels=c("Hyper-arid", "Arid", "Semi-Arid","Sub-humid","Humid"))}


CHAPTER 3. Data-cubes or SpatRaster


Whats is a Spatraster?

A SpatRaster represents multi-layer (multi-variable) raster data. It always stores a number of fundamental parameters describing its geometry. These include the number of columns and rows, the spatial extent, and the Coordinate Reference System.


In addition, a SpatRaster can store information about the file in which the raster cell values are stored (equivalent to Raster Stack). Or, if there is no such file, a SpatRaster can hold the cell values in memory (equivalent to Raster Brick from the older raster package).

Note: Raster operations on cell values stored in memory are usually faster but computationally intensive.

3.1. Create, subset and visualize SpatRaster

We will create multilayer SpatRaster for NDVI and SMAP SM from the sample dataset and demonstrate several applications and operations. Working with SpatRaster is very similar to working with regular arrays or lists.

###############################################################
#~~~ PART 3.1.1: Create and plot NDVI SpatRaster
library(terra)

# Location of the NDVI raster files
ndvi_path="./SampleData-master/raster_files/NDVI/" #Specify location of NDVI rasters

# List of all NDVI rasters
ras_path=list.files(ndvi_path,pattern='*.tif',full.names=TRUE)
head(ras_path)
## [1] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-01-01.tif"
## [2] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-01-17.tif"
## [3] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-02-02.tif"
## [4] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-02-18.tif"
## [5] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-03-05.tif"
## [6] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-03-21.tif"

Let’s import NDVI rasters from the location and store as a 3D data cube. We will use lapply, map, and for loop using addlayer function to import rasters from paths stored ras_list. We will also learn how to add shapefiles to the background of all plots made using RasterBrick/Stack.

# Method 1: Use lapply to create raster layer list from the raster location
ras_list = lapply(paste(ras_path, sep = ""), rast)
#This a list of 23 raster objects stored as individual elements.

#Convert raster layer lists to data cube/Spatraster 
ras_stack = rast(ras_list)     # Stacking all rasters as a data cube!!! 
# This a multi-layer (23 layers in this case) SpatRaster Object.

#inMemory() reports whether the raster data is stored in memory or on disk.
inMemory(ras_stack[[1]]) 
## [1] FALSE
# Method 2: Using pipes to create raster layers from the raster location
library(tidyr) # For piping
ras_list = ras_path %>%  purrr::map(~ rast(.x))  # Import the raster
ras_stack = rast(ras_list)  # Convert RasterStack to RasterBrick

# Method 3: Using for loop to create raster layers from the raster location
ras_stack=rast()
for (nRun in 1:length(ras_path)){
  ras_stack=c(ras_stack,rast(ras_path[[nRun]]))
}

# Check dimension of data cube
dim(ras_stack) #[x: y: z]- 23 raster layers with 456 x 964 cells
## [1] 456 964  23
# Number of layers in raster brick
nlyr(ras_stack)
## [1] 23
# Check variable names 
names(ras_stack)
##  [1] "NDVI_resamp_2016-01-01" "NDVI_resamp_2016-01-17" "NDVI_resamp_2016-02-02"
##  [4] "NDVI_resamp_2016-02-18" "NDVI_resamp_2016-03-05" "NDVI_resamp_2016-03-21"
##  [7] "NDVI_resamp_2016-04-06" "NDVI_resamp_2016-04-22" "NDVI_resamp_2016-05-08"
## [10] "NDVI_resamp_2016-05-24" "NDVI_resamp_2016-06-09" "NDVI_resamp_2016-06-25"
## [13] "NDVI_resamp_2016-07-11" "NDVI_resamp_2016-07-27" "NDVI_resamp_2016-08-12"
## [16] "NDVI_resamp_2016-08-28" "NDVI_resamp_2016-09-13" "NDVI_resamp_2016-09-29"
## [19] "NDVI_resamp_2016-10-15" "NDVI_resamp_2016-10-31" "NDVI_resamp_2016-11-16"
## [22] "NDVI_resamp_2016-12-02" "NDVI_resamp_2016-12-18"
# Subset raster stack/brick (notice the double [[]] bracket and similarity to lists)
sub_ras_stack=ras_stack[[c(1,3,5,10,12)]] #Select 1st, 3rd, 5th, 10th and 12th layers

#Try subsetting with dates:
Season<-str_subset(string = names(ras_stack), pattern = c("20..-0[3/4/5]")) 
ras_stack[[Season]]
## class       : SpatRaster 
## dimensions  : 456, 964, 6  (nrow, ncol, nlyr)
## resolution  : 0.373444, 0.373444  (x, y)
## extent      : -180, 180, -85.24595, 85.0445  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## sources     : NDVI_resamp_2016-03-05.tif  
##               NDVI_resamp_2016-03-21.tif  
##               NDVI_resamp_2016-04-06.tif  
##               ... and 3 more source(s)
## names       : NDVI_~03-05, NDVI_~03-21, NDVI_~04-06, NDVI_~04-22, NDVI_~05-08, NDVI_~05-24 
## min values  :  -0.2000000,   -0.175517,  -0.1699783,  -0.1848017,   -0.199075,  -0.1863334 
## max values  :   0.9168695,    0.909500,   0.9042087,   0.9123000,    0.944600,   0.8917526
#~~ Plot first 4 elements of NDVI SpatRaster
# Function to add shapefile to a raster plot
addCoastlines=function(){
  plot(vect(coastlines), add=TRUE)   #convert 'coastlines' to vector format
  } 

terra::plot(sub_ras_stack[[1:4]],  # Select raster layers to plot
     col=mypal2,                   # Specify colormap
     asp=NA,                       # Aspect ratio= NA
     fun=addCoastlines)            # Add coastline to each layer

Expert Note: For memory intensive rasters to brick formation, the “for” loop method may be better to avoid memory bottleneck.

3.2. Geospatial operations on SpatRaster

In this section, we will demonstrate application of several operations on SpatRaster for e.g. extracting time series at user-defined locations, spatial data extraction using spatial polygons/raster, crop and masking of SpatRaster.

3.2.1. Time-series extraction from SpatRaster

Raster extraction is the process of identifying and returning the values associated with a ‘target’ raster at specific locations, based on a vector object. The results depend on the type of vector object used (points, lines or polygons) and arguments passed to the terra::extract() function.

We will demonstrate two ways of extracting time series information from SpatRaster.

Method 1: Using terra::extract function to extract time series data for specific location(s)

###############################################################
#~~~~ Method 1.1: Extract values for a single location

# User defined lat and long
Long=-96.33; Lat=30.62 
# Creating a spatial vector object from the location coords
college_station = vect(SpatialPoints(cbind(Long, Lat)))


# Extract time series for the location
ndvi_val=terra::extract(ras_stack,
                 college_station,    # lat-long as spatial locations
                 method='bilinear')  # or "simple" for nearest neighbor

# Create a dataframe using the dates (derived from raster layer names) and extracted values
ndvi_ts=data.frame(Time=c(1:nlyr(ras_stack)), # Sequence of retrieval time
                   NDVI=as.numeric(ndvi_val[,-1]))     #NDVI values
# Try changing Time to as.Date(substr(colnames(ndvi_val),13,22), format = "%Y.%m.%d")

# Plot NDVI time series extracted from raster brick/stack
plot(ndvi_ts, type="l", col="maroon", ylim=c(0.2,0.8))

Let us now extract values for multiple locations.

#~~~~ Method 1.2:  Extract values for multiple locations
# Import sample locations from contrasting hydroclimate
library(readxl)
loc= read_excel("./SampleData-master/location_points.xlsx")
print(loc)
## # A tibble: 3 × 4
##   Aridity   State     Longitude Latitude
##   <chr>     <chr>         <dbl>    <dbl>
## 1 Humid     Louisiana     -92.7     34.3
## 2 Arid      Nevada       -116.      38.7
## 3 Semi-arid Kansas        -99.8     38.8
# Extract time series using "raster::extract"
loc_ndvi=terra::extract(ras_stack,
                         #2-column matrix or data.frame with lat-long
                         loc[,3:4],   
                         # Use bilinear interpolation (or simple) option
                         method='bilinear')

The rows in loc_ndvi variable corresponds to the points specified by the user, namely humid, arid and semi-arid locations. Let’s plot the extracted NDVI for three locations to look for patterns (influence on climate on vegetation).

# Create a new data frame with dates of retrieval and NDVI for different hydroclimates
library(lubridate)
ndvi_locs = data.frame(Date=ymd(substr(colnames(ndvi_val[,-1]),13,22)),
                      Humid = as.numeric(loc_ndvi[1,-1]),    #Location 1
                      Arid = as.numeric(loc_ndvi[2,-1]),     #Location 2
                      SemiArid = as.numeric(loc_ndvi[3,-1])) #Location 3

# Convert data frame in "long" format for plotting using ggplot
library(tidyr)
df_long=ndvi_locs %>% gather(Climate,NDVI,-Date)
#"-" sign indicates decreasing order
head(df_long,n=4)
##         Date Climate      NDVI
## 1 2016-01-01   Humid 0.5919486
## 2 2016-01-17   Humid 0.5762772
## 3 2016-02-02   Humid 0.5570874
## 4 2016-02-18   Humid 0.5752835
# Plot NDVI for different locations (hydroclimates). Do we see any pattern?? 
library(ggplot2)
l = ggplot(df_long,               # Dataframe with input Dataset 
          aes(x=Date,             # X-variable
              y=NDVI,             # Y-variable
              group=Climate)) +   # Group by climate 
  geom_line(aes(color=Climate))+  # Line color based on climate variable
  geom_point(aes(color=Climate))+ # Point color based on climate variable 
  theme_classic() 
print(l)

Method 2: Use rowFromY or rowFromX to find row and column number of the raster for the location.

###############################################################
#~~~~ Method 2: Retrieve values using cell row and column number
row = rowFromY(ras_stack[[1]], Lat)    # Gives row number for the selected Lat
col = colFromX(ras_stack[[1]], Long)   # Gives column number for the selected Long
ras_stack[row,col][1:5]                # First five elements of data cube for selected x-y
##   NDVI_resamp_2016-01-01 NDVI_resamp_2016-01-17 NDVI_resamp_2016-02-02
## 1              0.5614925              0.5193725              0.4958409
##   NDVI_resamp_2016-02-18 NDVI_resamp_2016-03-05
## 1              0.5731566              0.6259534

3.2.2. Spatial extraction/summary from SpatRaster

Let’s try extracting values of SpatRaster (for all layers simultaneously) based on polygon features using extract function.

#~~~~ Method 1: Extract values based on spatial polygons
IPCC_shp = read_sf("./SampleData-master/CMIP_land/CMIP_land.shp")

# Calculates the 'mean' of all cells within each feature of 'IPCC_shp' for each layers
ndvi_sp = terra::extract(ras_stack,   # Data cube
                         vect(IPCC_shp),    # Shapefile for feature reference 
                         fun=mean, 
                         na.rm=T, 
                         #df=TRUE, 
                         method='bilinear')

head(ndvi_sp,n=3)[1:5]
##   ID NDVI_resamp_2016-01-01 NDVI_resamp_2016-01-17 NDVI_resamp_2016-02-02
## 1  1            -0.08325353            -0.05983509            -0.04461097
## 2  2             0.02765715             0.02737234             0.03172858
## 3  3             0.21922899             0.22114161             0.21826435
##   NDVI_resamp_2016-02-18
## 1            -0.03795020
## 2             0.03084664
## 3             0.27331317

We can also use a classified raster like aridity data we previously used, to extract values corresponding to each class of values using lapply and selective filtering. Note: The two rasters must have same extent and resolution.

#~~~~ Method 2: Extract values based on another raster
aridity = rast("./SampleData-master/raster_files/aridity_36km.tif")

# Create an empty list to store extracted feature 
climate_ndvi = list()
# Extracts the time series of NDVI for pixels for each climate region
climate_ndvi = lapply(list(1,2,3,4,5),function(x) (na.omit((ras_stack[aridity==x]))))

# Calculate and store mean NDVI for each climate region
climate_ndvi_mean = lapply(list(1,2,3,4,5),function(x) (mean(ras_stack[aridity==x], na.rm=TRUE)))

plot(unlist(climate_ndvi_mean),
     type = "b", 
     ylab = "Climate_NDVI_Mean", 
     xlab = "Aridity Index")

3.2.3. Crop and mask SpatRaster

We will now crop and mask SpatRaster using shapefile from the disk or from imported shapefile from spData::us_states. crop function clips the raster to the extent provided. mask removes the area outside the provided shapefile. Alternatively, we can import shapefile from disk using read_sf function.

# Import polygons for polygons for USA at level 1 i.e. state from disk

state_shapefile = read_sf("./SampleData-master/USA_states/cb_2018_us_state_5m.shp")

# Alternatively, use dataset from 'spData' package 'spData::us_states'

# Print variable names
names(state_shapefile)
##  [1] "STATEFP"  "STATENS"  "AFFGEOID" "GEOID"    "STUSPS"   "NAME"    
##  [7] "LSAD"     "ALAND"    "AWATER"   "geometry"
# Print state/ territory names
print(state_shapefile$NAME)
##  [1] "Nebraska"                                    
##  [2] "Washington"                                  
##  [3] "New Mexico"                                  
##  [4] "South Dakota"                                
##  [5] "Texas"                                       
##  [6] "California"                                  
##  [7] "Kentucky"                                    
##  [8] "Ohio"                                        
##  [9] "Alabama"                                     
## [10] "Georgia"                                     
## [11] "Wisconsin"                                   
## [12] "Oregon"                                      
## [13] "Pennsylvania"                                
## [14] "Mississippi"                                 
## [15] "Missouri"                                    
## [16] "North Carolina"                              
## [17] "Oklahoma"                                    
## [18] "West Virginia"                               
## [19] "New York"                                    
## [20] "Indiana"                                     
## [21] "Kansas"                                      
## [22] "Idaho"                                       
## [23] "Nevada"                                      
## [24] "Vermont"                                     
## [25] "Montana"                                     
## [26] "Minnesota"                                   
## [27] "North Dakota"                                
## [28] "Hawaii"                                      
## [29] "Arizona"                                     
## [30] "Delaware"                                    
## [31] "Rhode Island"                                
## [32] "Colorado"                                    
## [33] "Utah"                                        
## [34] "Virginia"                                    
## [35] "Wyoming"                                     
## [36] "Louisiana"                                   
## [37] "Michigan"                                    
## [38] "Massachusetts"                               
## [39] "Florida"                                     
## [40] "United States Virgin Islands"                
## [41] "Connecticut"                                 
## [42] "New Jersey"                                  
## [43] "Maryland"                                    
## [44] "South Carolina"                              
## [45] "Maine"                                       
## [46] "New Hampshire"                               
## [47] "District of Columbia"                        
## [48] "Guam"                                        
## [49] "Commonwealth of the Northern Mariana Islands"
## [50] "American Samoa"                              
## [51] "Iowa"                                        
## [52] "Puerto Rico"                                 
## [53] "Arkansas"                                    
## [54] "Tennessee"                                   
## [55] "Illinois"                                    
## [56] "Alaska"
# Exclude states outside of CONUS 
conus = state_shapefile[!(state_shapefile$NAME %in% c("Alaska","Hawaii","American Samoa",
                        "United States Virgin Islands","Guam", "Puerto Rico",
                        "Commonwealth of the Northern Mariana Islands")),] 

# plot CONUS as a vector using terra package
terra::plot(vect(conus)) 

# Crop SpatRaster using USA polygon
usa_crop = crop(ras_stack, ext(conus))       # Crop raster to CONUS extent

# Plot cropped raster
plot(usa_crop[[1:4]], col=mypal2)

# Mask SpatRaster using USA polygon
ndvi_mask_usa = terra::mask(usa_crop, vect(conus))    # Mask raster to CONUS 

# Mask SpatRaster using USA polygon
plot(ndvi_mask_usa[[1:4]], 
     col=mypal2, 
     fun=function(){plot(vect(conus), add=TRUE)} # Add background states
)

Let’s crop raster for selected US states.

# Mask raster for selected states
states = c('Colorado','Texas','New Mexico')

# Subset the selected states from CONUS shapefile
state_plot = state_shapefile[(state_shapefile$NAME %in% states),] 

# Raster operation
states_trim = crop(ras_stack, ext(state_plot))                # Crop raster
ndvi_mask_states = terra::mask(states_trim, vect(state_plot)) # Mask

# Plot raster and shapefile
plot(ndvi_mask_states[[1]], 
     col=mypal2, 
     fun=function(){plot(vect(conus), add=TRUE)} # Add background states
     )

3.3. Layer-wise operations on SpatRaster

We will demonstrate common layer-wise arithmetic operations, summary statistics and data transformation on SpatRaster.


Arithmetic operations a.k.a arith-generic (+, -, *, /, ^, %%, %/%) on SpatRaster closely resemble simple vector-like operations. More details on arith-generic can be found here: https://rdrr.io/cran/terra/man/arith-generic.html. For layer-wise summary statistics and user-defined operations, we will use global function.

Let’s try some layer-wise cell-statistics:

###############################################################
#~~~ PART 3.3.1: Layer-wise cell-statistics 
# Layer-wise Mean
global(sub_ras_stack, mean, na.rm= T) # Mean of each raster layer. Try modal, median, min etc. 
##                             mean
## NDVI_resamp_2016-01-01 0.2840846
## NDVI_resamp_2016-02-02 0.2948298
## NDVI_resamp_2016-03-05 0.3057182
## NDVI_resamp_2016-05-24 0.4618550
## NDVI_resamp_2016-06-25 0.4997354
# Layer-wise quantiles
global(sub_ras_stack, quantile, probs=c(.25,.75), na.rm= T)
##                              X25.      X75.
## NDVI_resamp_2016-01-01 0.07188153 0.5135894
## NDVI_resamp_2016-02-02 0.08442930 0.5177374
## NDVI_resamp_2016-03-05 0.09917563 0.5113073
## NDVI_resamp_2016-05-24 0.23671686 0.6695234
## NDVI_resamp_2016-06-25 0.26000642 0.7303487
# User-defined statistics by defining own function
quant_fun = function(x) {quantile(x, probs = c(0.25, 0.75), na.rm=TRUE)} 
global(sub_ras_stack, quant_fun) # 25th, and 75th percentile of each layer
##                              X25.      X75.
## NDVI_resamp_2016-01-01 0.07188153 0.5135894
## NDVI_resamp_2016-02-02 0.08442930 0.5177374
## NDVI_resamp_2016-03-05 0.09917563 0.5113073
## NDVI_resamp_2016-05-24 0.23671686 0.6695234
## NDVI_resamp_2016-06-25 0.26000642 0.7303487
# Custom function for mean, variance and skewness
my_fun = function(x){ 
  meanVal=mean(x, na.rm=TRUE)              # Mean 
  varVal=var(x, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  names(output)=c("Mean", "Var","Skew")    # Rename output variables
  return(output)                           # Return output
} 

global(sub_ras_stack, my_fun) # Mean, Variance and skewness of each layer
##                             Mean        Var       Skew
## NDVI_resamp_2016-01-01 0.2840846 0.07224034  0.5409973
## NDVI_resamp_2016-02-02 0.2948298 0.06677124  0.5075244
## NDVI_resamp_2016-03-05 0.3057182 0.06171894  0.5018285
## NDVI_resamp_2016-05-24 0.4618550 0.05944947 -0.2721941
## NDVI_resamp_2016-06-25 0.4997354 0.06587261 -0.3392214
#You can also use summary() to retrieve common descriptive statistics
summary(sub_ras_stack)
##  NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05
##  Min.   :-0.17          Min.   :-0.19          Min.   :-0.16         
##  1st Qu.: 0.07          1st Qu.: 0.08          1st Qu.: 0.10         
##  Median : 0.19          Median : 0.22          Median : 0.23         
##  Mean   : 0.28          Mean   : 0.29          Mean   : 0.31         
##  3rd Qu.: 0.51          3rd Qu.: 0.52          3rd Qu.: 0.51         
##  Max.   : 0.92          Max.   : 0.90          Max.   : 0.90         
##  NA's   :77736          NA's   :77094          NA's   :77106         
##  NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25
##  Min.   :-0.18          Min.   :-0.19         
##  1st Qu.: 0.24          1st Qu.: 0.26         
##  Median : 0.50          Median : 0.55         
##  Mean   : 0.46          Mean   : 0.50         
##  3rd Qu.: 0.67          3rd Qu.: 0.73         
##  Max.   : 0.89          Max.   : 0.91         
##  NA's   :77100          NA's   :77091

Global cell-statistics can be calculated by converting SpatRaster to a vector and then using regular statistics functions. We will use as.vector() to convert rasters to vector array before arithmetic operation.

###############################################################
#~~~ PART 3.3.2: Global cell-statistics

# By vectorizing the SpatRaster and finding statistics
min_val=min(as.vector(sub_ras_stack), na.rm=T) 
max_val=max(as.vector(sub_ras_stack), na.rm=T)
print(c(min_val,max_val))
## [1] -0.20000  0.95065
# Another example of statistics of vectorized SpatRaster
quant=quantile(as.vector(sub_ras_stack), c(0.01,0.99), na.rm=TRUE) #1st and 99th percentiles
print(quant)
##          1%         99% 
## -0.05663428  0.84776928

Layer-wise arithmetic operations on SpatRaster:

###############################################################
#~~~ PART 3.3.3: Layer-wise operations on SpatRaster

# Arithmetic operations on SpatRaster are same as lists
add = sub_ras_stack+10                  # Add a number to raster layers
mult = sub_ras_stack*5                  # Multiply a number to raster layers
subset_mult = sub_ras_stack[[1:3]]*10   # Multiply a number to a subset of raster layers

# Data filtering based on cell-value
filter_stack = sub_ras_stack            # Create a RasterBrick to operate on
filter_stack[filter_stack<0.5] = NA     # Assign NA to any value less than 0.5

# Let's plot the filtered rasters
plot(filter_stack, 
     col=mypal2,  # Color pal  
     asp=NA,      # Aspect ratio: NA, fill to plot space
     nc=2,        # Number of columns to arrange plots
    fun=function(){plot(vect(coastlines), add=TRUE)} # Add background map
)                           

# Layer-wise Log-transformation
log_ras_stack=log(sub_ras_stack) 

# Normalize raster layers to [0,1] based on min and max of raster brick/stack
norm_stk=(sub_ras_stack-min_val)/(max_val-min_val) # Notice that the values are between [0,1]

# Plot in Robinson projection
mypal3 = cetcolor::cet_pal(20, name = "l5")
unikn::seecol(mypal3)

WorldSHP = st_as_sf(spData::world)

norm_NDVI= tm_shape(WorldSHP, projection = 'ESRI:54030', 
                    ylim = c(-65, 90)*100000, 
                    xlim = c(-152,152)*100000, 
                    raster.warp = T)+
  tm_sf()+
  tm_shape(norm_stk[[2:5]], projection = 'ESRI:54030', raster.warp = FALSE) +
  tm_raster(palette = rev(mypal3), title = "Normalized NDVI", style = "cont")+
  tm_layout(main.title = "NDVI",
            main.title.fontfamily = "Times",
            legend.show = T,
            legend.outside = T,
            legend.outside.position = c("right", "top"),
            frame = FALSE, 
            #earth.boundary = c(-160, -65, 160, 88),
            earth.boundary.color = "grey",
            earth.boundary.lwd = 2,
            fontfamily = "Times")+
  tm_graticules(alpha = 0.2,                               # Add lat-long graticules
                labels.inside.frame = FALSE,  
                col = "lightgrey", n.x = 4, n.y = 3)+
  tm_facets(ncol = 2)

print(norm_NDVI)

3.4. Cell-wise operations with app, lapp, tapp

We will now carry-out cell-wise operations on SpatRaster using the app(), tapp() and lapp() functions. These functions carry operations on each cell of a SpatRaster for all layers and are generally used to summarize the values of multiple layers into one layer. They are preferable in the presence of large raster datasets. Additionally, they allow you to save an output file directly.


3.4.1. app: Cell-wise operation on all layers

The app() function applies a function to each cell of a raster and is used to summarize (e.g., calculating the sum) the values of multiple layers into one layer.

#Let's look at the help section for app()
?terra::app

# Calculate mean of each grid cell across all layers
mean_ras = app(sub_ras_stack, fun=mean, na.rm = T)

# Calculate sum of each grid cell across all layers
sum_ras = app(sub_ras_stack, fun=sum, na.rm = T)

#~~ A user-defined function for mean, variance and skewness
my_fun = function(x){ 
  meanVal=mean(x, na.rm=TRUE)              # Mean 
  varVal=var(x, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  names(output)=c("Mean", "Var","Skew")    # Rename output variables
  return(output)                           # Return output
} 

# Apply user function to each cell across all layers
stat_ras = app(sub_ras_stack, fun=my_fun)

# Plot statistics
plot(stat_ras, 
     col=mypal2,  # Color pal  
     asp=NA,      # Aspect ratio: NA, fill to plot space
     nc=2,        # Number of columns to arrange plots
     fun=function(){plot(vect(coastlines), add=TRUE)} # Add background map
)

DIY:Try plotting the same using tmap.

3.4.2. tapp: Cell-wise operation on layer groups

tapp() is an extension of app(), allowing us to select a subset of layers for which we want to perform a certain operation. Let’s have the first two layers as group 1 and the next three as group 2. Function will be applied to each group separately and 2 layers of output will be generated.

#Let's look at the help section for app()
?terra::tapp

#The layers are combined based on indexing.
stat_ras = terra::tapp(sub_ras_stack,
                     index=c(1,1,2,2,2),
                     fun= mean)

# Try other functions: "sum", "mean", "median", "modal", "which", "which.min", "which.max", "min", "max", "prod", "any", "all", "sd", "first".

names(stat_ras) = c("Mean_of_rasters_1_to_2", "Mean_of_rasters_3_to_5")
# Two layers are formed, one for each group of indices

# Lets plot the two output rasters
plot(stat_ras, 
     col=mypal2,  # Color pal  
     asp=NA,      # Aspect ratio: NA, fill to plot space
     nc=2,        # Number of columns to arrange plots
     fun=function(){plot(vect(coastlines), add=TRUE)} # Add background map
)

DIY: Try plotting the same using tidyterra.

3.4.3. lapp: layers as function arguments

The lapp() function allows to apply a function to each cell using layers as arguments.

#Let's look at the help section for app()
?terra::lapp

#User defined function for finding difference
diff_fun = function(a, b){ return(a-b) }

diff_rast = lapp(sub_ras_stack[[c(4, 2)]], fun = diff_fun)

#Plot NDVI difference
plot(diff_rast, 
     col=mypal2,  # Color pal  
     asp=NA,      # Aspect ratio: NA, fill to plot space
     nc=2,        # Number of columns to arrange plots
     fun=function(){plot(vect(coastlines), add=TRUE)} # Add background map
)


CHAPTER 4. Parallel computation for geospatial analysis


Many computations in R can be made faster by the use of parallel computation. Generally, parallel computation is the simultaneous execution of different pieces of a larger computation across multiple computing processors or cores. The basic idea is that if you can execute a computation in X seconds on a single processor, then you should be able to execute it in X/n seconds on n processors.

Such a speed-up might not possible because of overhead and various barriers to splitting up a problem into n pieces, but it is often possible to come close in simple problems. For more details read: https://www.linkedin.com/pulse/thinking-parallel-high-performance-computing-hpc-debasish-mishra.

library(parallel)

SMAPBrk=rast("./SampleData-master/SMAP_L3_USA.nc")

plot(mean(SMAPBrk, na.rm=TRUE), asp=NA)

4.1. Cellwise implimentation of functions

4.1.1. Apply custom function to pixel time series

Once we have imported the NetCDF file as SpatRaster, we wil apply a slightly modified version of previously used function my_fun for calculating mean, variance and skewness for time series data for each cell in parallel. We will use terra::app function to apply my_fun on SpatRaster in parallel.


Expert Note: For seamless implementation of function in parallel mode, care must be taken that all necessary are accessible to ALL cores and error exceptions are handles appropriately. We will modify my_fun slightly to highlight what it means in practice.


  1. We will convert input x to a numeric array
  2. We will remove NA values from dataset before calculation
  3. We will use minSamp to fix minimum sample counts for calculation
  4. We will use tryCatch to handle error exceptions

The basic rules to avoid errors: (a) checking that inputs are correct, (b) avoiding non-standard evaluation, and (c) avoiding functions that can return different types of output.

#~~ We will make some changes in the custom function for mean, variance and skewness
minSamp = 50                           # Minimum assured samples for statistics

my_fun = function(x, minSamp, na.rm=TRUE){    
  smTS=as.numeric(as.vector(x))     # Convert dataset to numeric array
  smTS=as.numeric(na.omit(smTS))    # Omit NA values 
  
  # Implement function with trycatch for catching exception 
  tryCatch(if(length(smTS)>minSamp) {      # Apply minimum sample filter
  
  ######## OPERATION BEGINS #############    
  meanVal=mean(smTS, na.rm=TRUE)              # Mean 
  varVal=var(smTS, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(smTS, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)            # Combine all statistics
  return(output)                              # Return output
  ######## OPERATION ENDS #############    

  } else {
    return(rep(NA,3))                         # If conditions !=TRUE, return array with NA
  },error =function(e){return(rep(NA, 3))})   # If Error== TRUE, return array with NA
}


# Apply function to all grids in parallel
tic()
stat_brk = app(SMAPBrk, 
               my_fun, 
               minSamp = 50,                                          # Minimum assured samples for statistics
               cores =parallel::detectCores(logical = FALSE) - 1)     # Leave one core for housekeeping

# Beware while using detectCores(). 
# The argument logical = FALSE returns the number of physical cores.
# logical = TRUE returns the number of available hardware threads. 


names(stat_brk)=c("Mean", "Variance", "Skewness")      # Add layer names
toc()
## 6.19 sec elapsed
plot(stat_brk, col=mypal2)

4.2.1. Best practices for large-scale operations

Error handling is the art of debugging unexpected problems in your code. One easy solution when looping through customized functions is to include print() messages after each major operation which can help indicate where the error might be happening. Furthermore when dealing with large spatial data:

  1. Try parallel operation on a smaller region before submitting large jobs to HPRC. Pixel-wise implementation of the function can help identify errors in the code. Convert the cropped region into a data frame and apply function to time series of each cell. If your code throws error, troubleshoot carefully for the series which generates the error.
library(terra)
e <- ext( c(-110,-108, 35,37) )   # Sample 2X2 degree domain
p <- as.polygons(e)
crs(p) <- "EPSG:4326"

# Use this polygon to crop and mask the larger SpatRaster
  1. Use tryCatch carefully as it may suppress legitimate errors as well, generating spurious results. Test the codes for smaller region without tryCatch to test the robustness of your codes.

Expert Note: Parallel computing may have some overheads upon creation and closing of clusters. A significant improvement in computing times using parallel techniques would be visible for large jobs.


4.2. Layerwise implimentation of functions

4.2.1. Data cubes as lists

We will convert Spatraster to a list of rasters and then we will apply my_fun to each element of the list in parallel using future_lapply.

Beware while using detectCores(). The argument logical = FALSE returns the number of physical cores and logical = TRUE returns the number of available hardware threads.

# Convert Spatraster to a list of rasters
rasList=as.list(SMAPBrk[[1:10]])           #What will happen if we pass rast(rasList)?
length(rasList)
## [1] 10
my_fun = function(x){                
  x=as.numeric(values(x))                  # Create vector of numeric values of SpatRaster
  meanVal=base::mean(x, na.rm=TRUE)        # Mean 
  varVal=stats::var(x, na.rm=TRUE)         # Variance
  skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  return(output)                           # Return output
} 

# Test the function for one raster
my_fun(rasList[[1]])
## [1] NaN  NA NaN
# Apply function in parallel for all layers
library(parallel) 
library(future.apply) 
library(future)
library(tictoc)

# A multisession future: employ max core-1 for processing
future::plan(multicore, workers = detectCores(logical = FALSE) - 1)

# Deploy function in parallel 
tic()
outStat= future_lapply(rasList, my_fun)
toc()
## 27.25 sec elapsed
# Check output for one layer
# outStat[[2]]

4.2.2. Blockwise summary of feature extracted data

In this section we will use a shapefile to extract cell values from a SpatRaster as a list using exact_extract. Summary statistics will be calculated in parallel using my_fun for dataset for each feature.




Expert Note: Function exactextractr::exact_extract is faster and more suited for large applications compared to terra::extract. Although both perform similar operation with little changes in output format


#~ Extract feature data as data frame
library(exactextractr)
library(sf)
library(sp)

featureData=exact_extract(SMAPBrk,   # Raster brick 
                st_as_sf(conus),     # Convert shapefile to sf (simple feature)
                force_df = FALSE,    # Output as a data.frame?
                include_xy = FALSE,  # Include cell lat-long in output?
                fun = NULL,          # Specify the function to apply for each feature extracted data
                progress = TRUE)     # Progressbar
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |======================================================================| 100%
length(featureData) # Same as feature count in CONUS? i.e. nrow(conus) 
## [1] 49
# Lets try out data for Texas
which(conus$NAME=="Texas")  # Find feature number for Texas
## [1] 5
# View(featureData[[5]])    # View the extracted data frame
nrow(featureData[[5]])      # No. pixels within selected feature
## [1] 694

Each row in featureData[[5]] is the time series of cell values which fall within the boundary of feature number 5, i.e. Texas. Since exact_extract function provides coverage_fraction for each pixel in the output, we will make some minor change in the my_fun function to remove this variable before calculating the statistics.

# Extract SM time series for first pixel by removing percentage fraction
cellTS=as.numeric(featureData[[5]][1,1:nlyr(SMAPBrk)])

# Plot time time series for the selected feature
plot(cellTS, type="l", xlab="Time", ylab="Soil moisture")

#~~ We will make another small change in the custom function for mean, variance and skewness
minSamp=50   # Minimum assured samples for statistics

my_fun = function(x, na.rm=TRUE){    
  xDF=data.frame(x)                  # Convert list to data frame
  xDF=xDF[ , !(names(xDF) %in% 'coverage_fraction')] # Remove coverage_fraction column
  xData=as.vector(as.matrix(xDF))    # Convert data.frame to 1-D matrix
  smTS=as.numeric(na.omit(xData))    # Omit NA values                   
  
  # Implement function with trycatch for catching exception 
  tryCatch(if(length(smTS)>minSamp) {      # Apply minimum sample filter
  
  ######## OPERATION BEGINS #############    
  meanVal=mean(smTS, na.rm=TRUE)              # Mean 
  varVal=var(smTS, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(smTS, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  return(output)                           # Return output
  ######## OPERATION ENDS #############    

  } else {
    return(rep(NA,3))   # If conditions !=TRUE, return array with NA
  },error =function(e){return(rep(NA, 3))}) # If Error== TRUE, return array with NA
}

Let’s apply my_fun to extracted data for each feature.

# Test the function for one block
my_fun(featureData[[5]])
## [1] 0.1727930 0.0095579 0.9876028
# Apply function in parallel for all layers
library(parallel) 
library(snow)
library(future.apply) 
library(future)

future::plan(multisession, workers = detectCores(logical = FALSE) - 1)
outStat= future_lapply(featureData, my_fun)

# Test output for one feature
outStat[[5]]  # Is this the same as before?
## [1] 0.1727930 0.0095579 0.9876028
# Extract each summary stats for all features from the output list  
FeatureMean=sapply(outStat,"[[",1)  # Extract mean for all features
FeatureVar=sapply(outStat,"[[",2)   # Extract variance for all features
FeatureSkew=sapply(outStat,"[[",3)  # Extract skewness for all features

# Let's place mean statistics as an attribute to the shapefile
conus$meanSM=FeatureMean

# Plot mean soil moisture map for CONUS 
library(rcartocolor)
library(ggplot2)
library(sf)
library(sp)

mean_map=ggplot() + 
  geom_sf(data = st_as_sf(conus), # CONUS shp as sf object (simple feature)
          aes(fill = meanSM)) +   # Plot fill color= mean soil moisture
  scale_fill_carto_c(palette = "BluYl",     # Using carto color palette
                     name = "Mean SM",      # Legend name
                     na.value = "#e9e9e9",  # Fill values for NA 
                     direction = 1)+        # To invert color, use -1
  coord_sf(crs = 2163)+   # Reprojecting polygon 4326 or 3083 
  theme_void() +          # Plot theme. Try: theme_bw
  theme(legend.position = c(0.2, 0.1),  
        legend.direction = "horizontal",
        legend.key.width = unit(5, "mm"),
        legend.key.height = unit(4, "mm"))
mean_map


CHAPTER 5. Supplementary material


5.1. Exporting SpatRaster to NetCDF Data

When working on a project, you will often need to use the same SpatRaster over and over again. Hence, its convenient to store the model outputs as NetCDFs which can be imported as SpatRaster later for further analysis. Here we will look at the method to export SpatRaster to NetCDF.


Expert Note: a) NetCDF write speeds are significantly slower on cloud platforms like GoogleDrive or OneDrive. Hence, its recommended to export NetCDF files to local disk for faster write speeds. b) Before writing NetCDF file, convert your list of SpatRaster (if any) to one Multi-layer SpatRaster for faster write speed.


library(ncdf4)

rb=rast("./SampleData-master/SMAP_L3_USA.nc")
r=rb[[1]] # raster taken from a first layer of a stack
xy<-xyFromCell(r,1:length(r)) #matrix of logitudes (x) and latitudes(y)
lon=unique(xy[,1])
lat=unique(xy[,2])
 

# first we write the netcdf file to disk
terra::writeCDF(rb,                               #SpatRaster
         file.path(tempdir(), "test.nc"),         #Output filename
         overwrite=TRUE,
         varname="soil_moisture",
         unit="[m3/m3]",
         longname="SMAPL3-V7 SM, 2-day interpolated, 36KM",
         zname='time')

5.2. Updating R using RStudio

The operations in this tutorial are based on R version 4.1.1- Kick Things and version 4.0.3 - Bunny-Wunnies Freak Out. If necessary, update R from Rstudio using the updateR function from the installr package.

install.packages("installr")
library(installr)
updateR(keep_install_file=TRUE)

5.3. Changing Temp files for large storage

When processing large rasters/ vectors, you may run out of storage space due to large size of temporary files being generated during processing. Changing temp directory to an external disk with larger storage helps in such case. Follow this discussion for more details: https://stackoverflow.com/questions/17107206/change-temporary-directory.

5.4. Resampling with rgdalwarp

When processing high resolution rasters, you may run out of storage space when using the terra::resample(). One way to get around it is by using rgdalwarp(). It is an R wrapper for the ‘gdalwarp’ function that is part of the Geospatial Data Abstraction Library (GDAL), and provides utility for reprojecting rasters into any valid projection. For more details read: https://www.rdocumentation.org/packages/gdalUtils/versions/2.0.3.2/topics/gdalwarp.
The following is an example for resampling MODIS ET (500m) data to 25km resolution.

library(rgdal); library(gdalUtils)

###Resampling with gdalwarp():

#Run the code for the year:

for (year_id in year_first:year_last) {

 #Create a date/doy list:

  Filenames_tif <- list.files("~/.../MODIS_ET", full.names = F)
  Files_subset <- str_subset(Filenames_tif, str_c("_doy",year_id))
  
  #Extract Dates from file names
  doy_list <- str_extract(Files_subset, str_sub(Files_subset, 30, 32))
  
  for (doyRun in seq_along(doy_list)) {
   #Resample the data
   Resampled_Raster<- gdalwarp(
     #source file name
     srcfile = str_c('~/.../MCD15A2H.061_Lai_500m_doy',year_id,doy_list[doyRun],'_aid0001.tif'), 
     #destination file name
     dstfile = str_c('~/.../MCD15A2H.061_Lai_25km_doy',year_id,doy_list[doyRun],'_aid0001.tif'),
     srcnodata = c("249":"255"),  #set nodata values
     tr = c(0.25, 0.25),          #output file resolution: Resampling to 0.25x0.25 degrees (~25km)
     te = c(-180,-90,180,90),     #georeferenced extents of output file
     overwrite=TRUE, 
     output_Raster = TRUE)
   }
}

Follow Part 1: Data Visualization with R for a refresher on ggplot2, wordcloud, OfficeR and text mining.


Data Visualization and Geospatial Analysis With R (2023)
Vinit Sehgal | Louisiana State University
Debasish Mishra | Texas A&M University
Correspondence: (https://orcid.org/0000-0002-8837-5864)