Data Visualization and Geospatial Analysis With R

Large-scale Geospatial Analysis (2021)


Taking examples from global satellite data in gridded/raster format, we will demonstrate several common geospatial operations, such as projection, resampling, cropping, masking, and spatial extraction, using rasters, shapefiles, and spatial data frames. For seamless analysis across different data types and platforms, we will discuss conversion to and from formats such as data frames, matrices, rasters, and structured data like NetCDF. Advanced topics will include working with data cubes (RasterStack/RasterBrick), layer-wise operations on data cubes, and cell-wise operations on raster time series by implementing user-defined functions with stackApply and calc. Parallel implementation of custom functions will be demonstrated for large-scale datasets.


Course level: Advanced. Prior experience in R required.


Materials: For this exercise, we will use global datasets from two NASA satellites: MODIS for NDVI and SMAP for surface (~5 cm) soil moisture. We will use global aridity estimates (from the [Global Aridity and PET Database](https://cgiarcsi.community/data/global-aridity-and-pet-database/)) as a general climate classification (hyper-arid, arid, semi-arid, sub-humid, and humid).

The code has been tested on R version 4.1.1 (Kick Things) and version 4.0.3 (Bunny-Wunnies Freak Out). To update, see section 5.3.


CHAPTER 0. Import libraries and sample dataset


We will begin by loading the necessary libraries. The sample dataset for this exercise can be downloaded manually from GitHub by accessing https://github.com/Vinit-Sehgal/SampleData


Alternatively, use the following code to download and extract the sample data from the GitHub repository.

Install all necessary packages (Run once).

###############################################################
#~~~ Load required libraries
lib_names=c("raster","ggplot2","unikn","mapview",
            "gridExtra","rgdal","fields",
            "RColorBrewer","ncdf4","rasterVis",
            "rcartocolor","pacman","purrr","moments","tictoc", 
            "sf", "sp", "exactextractr","readxl", 
            "snow","future.apply","parallel")
invisible(suppressMessages
          (suppressWarnings
            (lapply
              (lib_names, install.packages))))
# Load necessary packages
invisible(suppressMessages
          (suppressWarnings
            (lapply
              (lib_names,require, character.only = T))))

# An easy way to load multiple packages is through pacman::p_load
# pacman::p_load("raster","ggplot2","unikn","mapview",
#             "gridExtra","rgdal","fields",
#             "RColorBrewer","ncdf4","rasterVis", "moments", "tictoc")

We will now download the workshop repository, which contains all the data we will use in this exercise.

###############################################################
#~~~ Import sample data from GitHub repository
download.file(url = "https://github.com/Vinit-Sehgal/SampleData/archive/master.zip",
              destfile = "SampleData-master.zip")  # Download ".zip" archive

# Unzip the downloaded .zip file
unzip(zipfile = "SampleData-master.zip")
# getwd()                           # Current working directory
list.files("./SampleData-master")   # list folder contents
##  [1] "aridity_36km.tif"               "CMIP_land"                     
##  [3] "desktop.ini"                    "functions"                     
##  [5] "GADM_2.8_USA_adm1.rds"          "HUC4"                          
##  [7] "images"                         "location_points.xlsx"          
##  [9] "mat2ras.R"                      "ne_10m_coastline"              
## [11] "precip.V1.0.2020.nc"            "raster_files"                  
## [13] "README.md"                      "robin_ras.R"                   
## [15] "sample_pdfs"                    "SM_RE06_MIR_CDF3SA_20181225.nc"
## [17] "SMAP_L3_USA.nc"                 "SMAP_SM.tif"                   
## [19] "SMAPL4_H5"                      "SMOS_nc"                       
## [21] "SPL3SMP_2020.nc"                "TreeRingData.csv"              
## [23] "USA_states"                     "Workbook_DVGAR-Part1.html"     
## [25] "Workbook_DVGAR-Part2.html"

CHAPTER 1. Raster and shapefile visualization


A geographic information system, or GIS, refers to a platform which can map, analyze, and manipulate geographically referenced datasets. Geographically referenced (or geo-referenced) data are spatial datasets which can be related to locations on Earth with the help of geographic coordinates. Types of geo-referenced spatial data include rasters (grids of regularly sized pixels) and vectors (polygons, lines, points).



A quick and helpful review of spatial data can be found here: https://spatialvision.com.au/blog-raster-and-vector-data-in-gis/
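To make the raster/vector distinction concrete, here is a minimal sketch (all object names are illustrative) that builds a toy raster and a single vector point from scratch and overlays them:

# Build a toy 10 x 10 global raster and fill it with random values
library(raster)
toy_ras = raster(nrows=10, ncols=10,
                 xmn=-180, xmx=180, ymn=-90, ymx=90)
values(toy_ras) = runif(ncell(toy_ras))

# A vector feature: a single spatial point (College Station, TX)
toy_pt = SpatialPoints(cbind(-96.33, 30.62))

plot(toy_ras)           # Raster: values on a regular grid of pixels
points(toy_pt, pch=16)  # Vector: discrete geometry overlaid on the raster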

1.1. Plotting raster data

In this section, we will plot global raster data of surface (~5 cm) soil moisture from SMAP. In the process, we will explore functions from the raster, ggplot2, and mapview packages.

Let’s start by importing the necessary packages.

# For raster operations
library(raster)  
library(mapview) 
# Libraries for fancy color palettes
library(RColorBrewer)
library(unikn)
library(rcartocolor)
 
# Import SMAP soil moisture raster from the downloaded folder
sm=raster("./SampleData-master/raster_files/SMAP_SM.tif") 

Once we have imported the raster, let’s check its attributes. Notice the dimensions, resolution, extent, crs (i.e., coordinate reference system), and values.

print(sm)
## class      : RasterLayer 
## dimensions : 456, 964, 439584  (nrow, ncol, ncell)
## resolution : 0.373444, 0.373444  (x, y)
## extent     : -180, 180, -85.24595, 85.0445  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : G:/My Drive/TAMU/Teaching/DataVisWorkshop/DataVisWorkshop2021/PrepFolderDataVizard2021/Part2_2021/SampleData-master/raster_files/SMAP_SM.tif 
## names      : SMAP_SM 
## values     : 0.01999998, 0.8766761  (min, max)
# Try 
# extent(sm)   # Geographic extent of the raster
# res(sm)      # X-Y resolution of the raster
# crs(sm)      # coordinate reference system

Now let’s plot the raster using raster::plot. Interactive plots can be made using the mapview function.

# Raster plot
plot(sm, main="Soil Moisture")

1.2. Customizing raster plot options

1.2.1 Color palettes

We will generate custom color palettes for better visualization. We will demonstrate the use of the RColorBrewer (https://colorbrewer2.org; https://rdrr.io/cran/RColorBrewer/man/ColorBrewer.html), unikn (https://github.com/hneth/unikn), and rcartocolor (https://carto.com/carto-colors/) libraries.

For more on colors in R, follow: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

# Make custom color palette
#~~ A) User defined color palette using unikn library
mypal=unikn::seecol(pal = usecol(c("brown","orange","yellow","darkgreen","cyan","darkblue","black"), n = 20)) 

#~~ B) User defined color palette using carto library
mypal = carto_pal(20, "Fall") 
unikn::seecol(pal=mypal, n=20) # View the new color palettes

#~~ C) Use "brewer.pal" function from RColorBrewer 
mypal= brewer.pal(11,"Spectral")    # Max colors available in the palette=11 
mypal= colorRampPalette(mypal)(20)  # Expand color palette to 20
unikn::seecol(pal=mypal, n=20)      # View the new color palettes

1.2.2 Customize raster plots

There is a long list of customizable operations available while plotting rasters in R. Let’s play with some of these options. Try horizontal=TRUE, interpolate=FALSE, change xlim=c(-180, 180) with asp=1, or try legend.shrink=0.4.

sm=raster("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data

raster::plot(sm, 
     main="Raster: New fancy plot", # Title of the plot
     # Color options      
     col=brewer.pal(10,"Spectral"), # Custom colormap 
     lab.breaks=seq(0, 1, by=0.1),  # Sequence from 0-1 with 0.1 increment
     breaks=seq(0, 1, by=0.1),      # Sequence from 0-1 with 0.1 increment
     colNA="lightgray",             # Color of cells with NA values
     
     # Axis options      
     axes=TRUE,              # Plot axes: TRUE/ FALSE
     xlim=c(-180, 180),      # X-axis limit
     ylim=c(-90, 90),        # Y-axis limit
     zlim=c(0,1),            # Z-axis limit
     xlab="Longitude",       # X-axis label
     ylab="Latitue",         # Y-axis label
     
     # Legend options      
     legend=TRUE,            # Plot legend: TRUE/ FALSE
     horizontal = FALSE,     # Plot legend horizontally: TRUE/ FALSE
     legend.args = list(text='v/v', side=2,
                        font=2, cex=0.8), # Legend text and properties
     legend.width = 1,       # Legend width
     legend.shrink=1,        # Legend height; 1= plot height
     
     # Miscellaneous
     box=TRUE,               # Disable box around the plots: TRUE/ FALSE
     asp = NA,               # Aspect ratio; asp=NA fills plot to window
     interpolate=TRUE        # Interpolate output plot
)

# To add grid to the plot
# grid()

Let us try some interactive map options for raster data using the mapview package. Try plotting the map with and without the at option.

# Interactive plot
mapview(sm,                  # Raster data
        col.regions = mypal, # Color palette 
        at=seq(0, 0.8, 0.1) # Breaks
        )
# mapview(sm, col.regions = mypal)

1.3. Plotting raster data using ggplot2

ggplot2 has several options to make publication-ready plots in R. We will use geom_tile to display raster data. As geom_tile requires data in long format (Lat, Long, Z), we will first convert the soil moisture raster to a spatial pixel data frame. We will use the scale_fill_gradientn option for the colormap, as it gives the convenience of defining labels, breaks, and limits for the plotted data.

library(ggplot2)

#~~~ PART 1.3.1: Transformation of raster to x-y-z format
sm_df_smap = as.data.frame(as(sm, "SpatialPixelsDataFrame")) # Spatial dataframe of SM data
colnames(sm_df_smap) = c("SM", "Long", "Lat") # Rename the columns of the dataframe
head(sm_df_smap)                              # Checkout the spatial dataframe
##           SM      Long      Lat
## 1 0.06211190 -92.05394 78.88268
## 2 0.05005269 -89.06639 78.88268
## 3 0.06211190 -92.05394 78.50923
## 4 0.05005269 -89.06639 78.50923
## 5 0.06211190 -92.05394 78.13579
## 6 0.05005269 -89.06639 78.13579
sm_world=ggplot() +                             # Initialize the plot
  geom_tile(data=sm_df_smap,                    # Add soil moisture data
            aes(x=Long, y=Lat, fill=SM))+       # Provide X, Y and Z data
   scale_fill_gradientn(colors=mypal,           # Use user-defined colormap
   name = "SM",                                 # Name of the colorbar
   na.value = "transparent",                    # transparent NA cells
   labels=(c("0", "0.2", "0.4", "0.6", "0.8")), # Labels of colorbar
   breaks=seq(0,0.8,by=0.2),                    # Set breaks of colorbar
   limits=c(0,0.8))+                            # Set limits of colorbar
  coord_equal()+                                # Aspect ratio of plot
  xlab("Longitude")+                            # X-axis label
  ylab("Latitude")+                             # Y-axis label
  borders("world",                              # Add global landmass boundaries
          colour="gray43",                      # Fill light-gray color to the landmass
          fill="transparent")+                  # Transparent background
  borders("state",                              # Add US state borders
          colour="gray43",                      # Use light-gray color
          fill="transparent")                   # Use transparent background      
print(sm_world)

We can plot the map for a specific extent (CONUS, in this case) by changing the range of the coord_fixed option. We will also use a different theme: theme_bw. Try xlim = c(114,153) and ylim = c(-43,-11)!

sm_conus=ggplot() +                               # Initialize the plot
  geom_tile(data=sm_df_smap,                    # Add soil moisture data
            aes(x=Long, y=Lat, fill=SM))+       # Provide X, Y and Z data
   scale_fill_gradientn(colors=mypal,           # Use user-defined colormap
   name = "SM",                                   # Name of the colorbar
   na.value = "transparent",                      # transparent NA cells
   labels = (c("0", "0.2", "0.4", "0.6", "0.8")), # Labels of colorbar
   breaks = seq(0,0.8,by =0.2),                   # Set breaks of colorbar
   limits = c(0,0.8))+                            # Set limits of colorbar
  xlab("Longitude")+            # X-axis label
  ylab("Latitude")+             # Y-axis label
  borders("world",              # Add global landmass boundaries
          colour ="gray43",     # Fill light-gray color to the landmass
          fill ="transparent")+ # Transparent background
  borders("state",              # Add US state borders
          colour = "gray43",          # Use light-gray color
          fill = "transparent")+      # Use transparent background
  coord_fixed(xlim = c(-125,-67),     # Add extent for CONUS
              ylim = c(24,50))+               
  theme_bw()                          # Try black-and-white theme. 

print(sm_conus)
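As a quick sketch of the suggested extent (which roughly covers Australia), we can swap in the new limits; note that ggplot2 will report that the new coordinate system replaces the existing one:

sm_aus = sm_conus +
  coord_fixed(xlim = c(114,153),  # Suggested longitude limits
              ylim = c(-43,-11))  # Suggested latitude limits
print(sm_aus)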

For a discrete color map, we will specify the breaks in the data and use scale_fill_manual for a discrete color palette.

brk=seq(0,0.8, by=0.1)  # Breakpoints in the data

sm_conus=ggplot() +                   # Initialize the plot
       geom_tile(data=sm_df_smap,     # Dataframe used for the plot
        aes(x=Long, 
            y=Lat, 
            fill=cut(SM,breaks=brk)), # Specify X, Y and fill (z)
       alpha=1)+                      # Transparency of the plot layer  
       scale_fill_manual(values = brewer.pal(length(brk)-1,"Spectral"), # Discrete palette 
         na.value="transparent",      # Set transparent fill values for NA cells
         name="SM",                   # Name of the legend bar
         labels=levels(cut(sm_df_smap$SM, breaks=brk))) + # Legend labels: cut data using breaks
  xlab("Longitude")+              # X-axis label
  ylab("Latitude")+               # Y-axis label
  borders("world",                # Add global landmass boundaries
          colour = "gray43",      # Fill light-gray color to the landmass
          fill = "transparent")+  # Transparent background
  borders("state",                # Add US state borders
          colour = "gray43",      # Use light-gray color
          fill = "transparent")+  # Use transparent background
  coord_fixed(xlim = c(-125,-67), # Add extent for CONUS
              ylim = c(24,50))+               
  theme_bw()                      # Try black-and-white theme. 

print(sm_conus)

# Saving the plot to disk:
ggsave(plot = sm_conus,          # ggplot object to be exported
       filename ="sm_conus.png", # Export file name
       bg = "white",           # Background color
       units="in", dpi=300,    # Export dim unit & Dots per inch (resolution)
       width=5, height=5)      # Height, width

1.4. Plotting vector data

Importing and plotting shapefiles is equally easy in R. We will import the shapefile of the updated global IPCC climate reference regions (https://doi.org/10.5194/essd-12-2959-2020) as an OpenGIS Simple Features Reference (OGR). We will also use a global coastline shapefile from the web for plotting.

###############################################################
#~~~ PART 1.4.1: Importing and visualizing shapefiles
library(rgdal)  # For manipulating shapefiles
library(raster) # For plotting raster and shapefile data

# Import the shapefile of global IPCC climate reference regions (only for land) 
IPCC_shp = rgdal::readOGR("./SampleData-master/CMIP_land/CMIP_land.shp",verbose = FALSE)
# View attribute table of the shapefile
head(IPCC_shp@data) # Notice the attributes look like a data frame
##              V1                V2  V3 V4
## 0        ARCTIC Greenland/Iceland GIC  1
## 1 NORTH-AMERICA        N.E.Canada NEC  2
## 2 NORTH-AMERICA   C.North-America CNA  3
## 3 NORTH-AMERICA   E.North-America ENA  4
## 4 NORTH-AMERICA N.W.North-America NWN  5
## 5 NORTH-AMERICA   W.North-America WNA  6
# Load global coastline shapefile 
coastlines = rgdal::readOGR("./SampleData-master/ne_10m_coastline/ne_10m_coastline.shp",verbose = FALSE)

# Alternatively, download global coastlines from the web 
# NOTE: May not work if the online server is down
# download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_coastline.zip?version=4.0.1",destfile = 'ne_110m_coastline.zip')
# # Unzip the downloaded file
# unzip(zipfile = "ne_110m_coastline.zip",exdir = 'ne-coastlines-110m')

# Plot shapefiles with overlays
raster::plot(IPCC_shp)             # Plot IPCC land regions
raster::plot(coastlines, add=TRUE) # Add global coastline

# Subset shapefile for Eastern North-America
raster::plot(IPCC_shp[4,], main="Polygon for Eastern North-America")

# Combine a raster plot with overlaying shapefiles
plot(sm, 
     col = brewer.pal(10,"Spectral"),
     lab.breaks = seq(0, 1, by=0.1),  # Sequence from 0-1 with 0.1 increment
     breaks = seq(0, 1, by=0.1))      # Global SMAP soil moisture
raster::plot(IPCC_shp, add=TRUE)   # Overlay IPCC land regions
raster::plot(coastlines, add=TRUE) # Add global coastline

###############################################################
#~~~ PART 1.4.2: Add spatial point to shapefile/ raster
#~~ Make base map
raster::plot(IPCC_shp[c(3,4,6,7),], # IPCC land regions for Contiguous US.
             col = "lightgray",     # Background color
             border = "black")      # Border color

#~~ Add global coastline
raster::plot(coastlines, 
             col= "maroon", 
             add=TRUE)  
#~~ Add spatial point to the plot
Long=-96.33; Lat=30.62                 # Lat-Long of College Station, TX
points(SpatialPoints(cbind(Long,Lat)), # Lat-Long as Spatial Points
       col="blue", pch=16, cex=1.2)    # Shape, size and color of point

1.5. Reprojection of rasters using projectRaster

A coordinate reference system (CRS) is used to relate locations on Earth (which is a 3-D spheroid) to a 2-D projected map using coordinates (for example, latitude and longitude). A good description of coordinate reference systems and their importance can be found here: https://docs.qgis.org/3.4/en/docs/gentle_gis_introduction/coordinate_reference_systems.html and https://datacarpentry.org/organization-geospatial/03-crs/



In R, the coordinate reference systems or CRS are commonly specified in EPSG (European Petroleum Survey Group) or PROJ4 format (See: https://epsg.io/).
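As a quick sketch, the same CRS (here, WGS84) can be written either way, and crs reports the CRS of an existing raster:

# Two equivalent ways of specifying WGS84
crs_epsg  = CRS("+init=epsg:4326")                      # EPSG code
crs_proj4 = CRS("+proj=longlat +datum=WGS84 +no_defs")  # PROJ4 string

crs(sm)  # Inspect the CRS of the soil moisture raster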

# Importing SMAP soil moisture data
sm=raster("./SampleData-master/raster_files/SMAP_SM.tif") 

#~~ Projection 1: NAD83 (EPSG: 4269)
sm_proj1 = projectRaster(sm, crs = CRS("+init=epsg:4269"))
raster::plot(sm_proj1, 
             main = "NAD83",    # Title of the plot
             col = mypal,       # Colormap for the plot
             axes = FALSE,      # Disable axes
             box = FALSE,       # Disable box around the plots
             asp = NA,          # No fixed aspect ratio; asp=NA fills plot to window
             legend=FALSE,      # Disable legend
             interpolate=TRUE)  # Interpolate output raster

#~~ Projection 2: Azimuthal equidistant projection centered on Lat=0 and Lon=0
test_proj="+proj=aeqd +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
sm_proj2 = projectRaster(sm, crs = test_proj)
plot(sm_proj2, main="Azimuthal", col=mypal, axes=FALSE, 
     box=FALSE, asp=NA, legend=FALSE, interpolate=TRUE)

Other than the projections demonstrated, try the following (a compact sketch follows this list):

"+init=epsg:3857" For Mercator (EPSG: 3857): Used in Google Maps, Open Street Maps, etc.
"+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs" for World Robinson Projection
"+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs" for World Mercator projection

We will use a custom function provided with the material to plot the global raster in Robinson projection.

#~~~ Plot a fancier re-projected maps in Robinson projection

#~ User-defined function for plotting
source("./SampleData-master/functions/robin_ras.R") 
robin_plot=robin_ras(sm,                            # Raster to plot
                     map_col=rev(mypal),            # Colormap
                     lim=c(0,0.8),                  # Set min and max z-limits
                     brk=seq(0,0.8,0.2),            # Set breaks in colormap
                     name="Global Surface Soil Moisture from SMAP")   # Name of the plot
robin_plot                          # print the reprojected plot

Useful references: More excellent examples of making maps in R can be found here: https://bookdown.org/nicohahn/making_maps_with_r5/docs/introduction.html. A quintessential reference for charts and plots in R: https://www.r-graph-gallery.com/index.html


CHAPTER 2. Geospatial operations on raster/vector data


2.1. Raster resampling

We will demonstrate how to change the resolution of raster files using raster::aggregate (fine to coarse) and raster::disaggregate (coarse to fine), and how to resample values from one raster to another using the raster::resample function. We will use the global aridity and soil moisture rasters for this purpose.

# Original resolution of raster for reference
res(sm)
## [1] 0.373444 0.373444
#~~ Aggregate raster to coarser resolution
SMcoarse = raster::aggregate(sm,    # Soil moisture raster
                    fact=10,        # Aggregate by a factor of 10
                    fun=mean)       # Function used to aggregate values
plot(SMcoarse)

res(SMcoarse)
## [1] 3.73444 3.73444
#~~ Disaggregate raster to finer resolution
SMfine = raster::disaggregate(sm,     # Soil moisture raster
                   fact=3,            # Disaggregate by a factor of 3
                   method='bilinear') # Interpolation method: 'bilinear' or ''
res(SMfine)
## [1] 0.1244813 0.1244813
#~~ Raster resampling
# Import global aridity raster
aridity=raster("./SampleData-master/raster_files/aridity_36km.tif") 
# Plot aridity map
plot(aridity, 
     col=mypal, 
     interpolate=TRUE, 
     lab.breaks=c(0:5), 
     breaks=c(0:5),
     main="Global aridity map [1=driest, 5= wettest]") 

# Resample aridity raster to coarse resolution  
aridityResamp=raster::resample(aridity,       # Original raster
                               SMcoarse,      # Target resolution raster
                               method='ngb')  # 'bilinear' or 'ngb' (nearest neighbor)
plot(aridityResamp, 
     col=mypal, 
     lab.breaks=c(0:5), 
     breaks=c(0:5),
     main="Resampled aridity to coarser resolution") 

2.2. Raster summary statistics

Arithmetic operations, a.k.a. Arith-methods (+, -, *, /, ^, %%, %/%), on rasters closely resemble simple vector-like operations. More details on Arith-methods can be found here: https://rdrr.io/cran/raster/man/Arith-methods.html. We will use the cellStats function to apply summary statistics and user-defined operations on the cells of a raster.

# Simple arithmetic operations
sm2=sm*2
print(sm2) # Try sm2=sm*10, or sm2=sm^2 and see the difference in sm2 values
## class      : RasterLayer 
## dimensions : 456, 964, 439584  (nrow, ncol, ncell)
## resolution : 0.373444, 0.373444  (x, y)
## extent     : -180, 180, -85.24595, 85.0445  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : SMAP_SM 
## values     : 0.03999996, 1.753352  (min, max)
# Summary statistics
quantile(sm,probs=c(.25,.75))
##        25%        75% 
## 0.09521707 0.29220164
# We can use cellStats function to apply user defined operation on rasters. 
cellStats(sm, mean) # Try modal, median, min etc. 
## [1] 0.209402
# User-defined statistics by defining own function
quant_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
  quantile(x, probs = c(0.25, 0.75), na.rm=TRUE)
} 
cellStats(sm, quant_fun) # 25th, and 75th percentile of each layer
##        25%        75% 
## 0.09521707 0.29220164

2.3. Summarizing rasters using shapefiles

Let’s explore using a spatial polygon/shapefile to summarize a raster (in this case, global SMAP soil moisture) with the extract function from the raster library. We will also transform the global aridity raster to a polygon using the rasterToPolygons function to find the mean soil moisture values for each aridity class.

First, we will use the IPCC shapefile to summarize the soil moisture raster.

###############################################################
#~~~ PART 2.3.1: Using shapefile to summarize a raster
sm_IPCC_df=raster::extract(sm,       # Raster to be summarized
                          IPCC_shp,  # Shapefile/ polygon to summarize the raster
                          df=TRUE,   # Gives the summary statistics as a dataframe
                          fun=mean,  # Desired statistic
                          na.rm=TRUE)# Ignore NA values? TRUE=yes! 

head(sm_IPCC_df)
##   ID   SMAP_SM
## 1  1 0.2545766
## 2  2 0.3839087
## 3  3 0.2345457
## 4  4 0.3788003
## 5  5 0.2383580
## 6  6 0.1475145
###############################################################
#~~~ PART 2.3.2: Extract cell values for each region 
sm_IPCC_list=raster::extract(sm,        # Raster to be summarized
                          IPCC_shp,   # Shapefile/ polygon to summarize the raster
                          df=FALSE,   # Returns a list
                          fun=NULL,   # fun=NULL will output cell values within each region
                          na.rm=TRUE) # Ignore NA values? yes! 

# Apply function on cell values for each region
sm_IPCC_mean=lapply(sm_IPCC_list,mean)                       # Returns a list of regional means 
sm_IPCC_mean=purrr::map(sm_IPCC_list,~ mean(.x, na.rm=TRUE)) # Returns a list of regional means

#~~ Try user defined function
myfun=function (y){return(mean(y, na.rm=TRUE))}    # User defined function for calculating means
#~ Implement function using lapply and map
sm_IPCC_mean=lapply(sm_IPCC_list,myfun)           # Returns a list of regional means 
sm_IPCC_mean=purrr::map(sm_IPCC_list,~ myfun(.x)) # Returns a list of regional means 
sm_IPCC_mean=unlist(sm_IPCC_mean)                 # Unlist to return a vector 

head(sm_IPCC_mean) # Is this the same as the previous result?
## [1] 0.2545766 0.3839087 0.2345457 0.3788003 0.2383580 0.1475145
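For large or numerous polygons, raster::extract can be slow. A faster, area-weighted alternative is the exactextractr package (installed earlier); a minimal sketch, which should give regional means comparable to the results above:

library(exactextractr)
library(sf)
# exact_extract expects an sf object, so convert the shapefile first
sm_IPCC_exact = exact_extract(sm,                 # Raster to be summarized
                              st_as_sf(IPCC_shp), # Polygons as an sf object
                              'mean')             # Built-in summary operation
head(sm_IPCC_exact)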

2.4. DIY: Summarize raster using classified raster

In the next example, we will convert the global aridity raster into a polygon based on aridity classification using the rasterToPolygons function. The global aridity raster has 5 classes, with 5 indicating humid and 1 indicating hyper-arid climate. We will use this polygon to extract values from the raster and summarize soil moisture for each aridity class.

###############################################################
#~~~ PART 2.4.1: Convert a raster to a shapefile
aridity=raster("./SampleData-master/raster_files/aridity_36km.tif") #Global aridity

# Convert raster to shapefile
# NOTE: rgeos package is needed to "dissolve" raster features to polygon
# install.packages("rgeos")
library(rgeos)
arid_poly=raster::rasterToPolygons(aridity, # Input raster
                           na.rm=TRUE,      # Ignore NA values
                           digits=12,       # No. of digits to round the coordinates
                           dissolve=TRUE)   # FALSE will give boundaries for each cell

plot(arid_poly, col=arid_poly$aridity_36km) # Plot aridity polygon

Summarize values of SMAP soil moisture raster for aridity classes:

sm_arid_df=raster::extract(sm,       # Raster to be summarized
                          arid_poly, # Shapefile/ polygon to summarize the raster
                          df=TRUE,   # Gives the summary statistics as a dataframe
                          fun=mean,  # Desired statistic
                          na.rm=TRUE)# Ignore NA values? yes! 

# Let's plot the climate-wise mean of surface soil moisture
plot(sm_arid_df,xaxt = "n",
     xlab="Aridity", ylab="Soil moisture",
     type="b", 
     col="blue", 
     main="Climate-wise mean of surface soil moisture")
axis(1, at=1:5, labels=c("Hyper-arid", "Arid", "Semi-Arid","Sub-humid","Humid"))


CHAPTER 3. Data-cubes: RasterStack/RasterBrick


What is a RasterStack/RasterBrick?

A RasterStack/ RasterBrick (from library raster) is a collection of RasterLayer objects with the same spatial extent and resolution.

A RasterBrick is typically held entirely in memory, so it takes longer to create a RasterBrick than a RasterStack. However, various raster operations are significantly faster on a RasterBrick than on a RasterStack.

3.1. Create, subset and visualize RasterStack/ RasterBrick

We will create multilayer raster stacks/bricks for NDVI and SMAP SM from the sample dataset and demonstrate several applications and operations. Working with a raster stack/brick is very similar to working with regular arrays or lists.

###############################################################
#~~~ PART 3.1.1: Create and plot NDVI RasterBrick/ Stack 
library(raster)
library(fields)
library(unikn)

# Location of the NDVI raster files
ndvi_path="./SampleData-master/raster_files/NDVI/" #Specify location of NDVI rasters

# List of all NDVI rasters
ras_list=list.files(ndvi_path,pattern='*.tif',full.names=TRUE)
head(ras_list)
## [1] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-01-01.tif"
## [2] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-01-17.tif"
## [3] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-02-02.tif"
## [4] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-02-18.tif"
## [5] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-03-05.tif"
## [6] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-03-21.tif"

Let’s import NDVI rasters from this location and store them as a RasterBrick. We will use lapply, purrr::map (with pipes), and a for loop with the addLayer function to import rasters from the paths stored in ras_list. We will also learn how to add shapefiles to the background of all plots made using a RasterBrick/Stack.


Expert Note: For memory-intensive raster-to-brick conversion, the “for” loop method may be better to avoid memory bottlenecks.


# Method 1: Use lapply to create raster layers from the raster location
ras_stack = lapply(ras_list, raster)
ras_brick = brick(ras_stack)     # Working on a RasterBrick is faster than stacks!!! 

# Method 2: Using pipes to create raster layers from the raster location
library(tidyr) # For piping
ras_stack = ras_list %>%  purrr::map(~ raster(.x))  # Import the raster
ras_brick = brick(ras_stack)  # Convert RasterStack to RasterBrick

# Method 3: Using for loop to create raster layers from the raster location
ras_stack=stack()
for (nRun in 1:length(ras_list)){
  ras_stack=addLayer(ras_stack,raster(ras_list[[nRun]]))
}
ras_brick = brick(ras_stack)  # Convert RasterStack to RasterBrick
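#~~ Quick check of the memory behavior noted above (a sketch; assumes
#~~ "ras_stack" from Method 3 and the tictoc package loaded earlier)
library(tictoc)
inMemory(ras_stack[[1]])  # Typically FALSE: stack layers point to files on disk
inMemory(ras_brick)       # TRUE: brick values are held in memory
tic("Mean of RasterStack"); invisible(mean(ras_stack, na.rm=TRUE)); toc()
tic("Mean of RasterBrick"); invisible(mean(ras_brick, na.rm=TRUE)); toc()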

# Check dimension of RasterBrick
dim(ras_brick) #[x: y: z]- 23 raster layers with 456 x 964 cells
## [1] 456 964  23
# Number of layers in raster brick
nlayers(ras_brick)
## [1] 23
# Check variable names 
names(ras_brick)
##  [1] "NDVI_resamp_2016.01.01" "NDVI_resamp_2016.01.17" "NDVI_resamp_2016.02.02"
##  [4] "NDVI_resamp_2016.02.18" "NDVI_resamp_2016.03.05" "NDVI_resamp_2016.03.21"
##  [7] "NDVI_resamp_2016.04.06" "NDVI_resamp_2016.04.22" "NDVI_resamp_2016.05.08"
## [10] "NDVI_resamp_2016.05.24" "NDVI_resamp_2016.06.09" "NDVI_resamp_2016.06.25"
## [13] "NDVI_resamp_2016.07.11" "NDVI_resamp_2016.07.27" "NDVI_resamp_2016.08.12"
## [16] "NDVI_resamp_2016.08.28" "NDVI_resamp_2016.09.13" "NDVI_resamp_2016.09.29"
## [19] "NDVI_resamp_2016.10.15" "NDVI_resamp_2016.10.31" "NDVI_resamp_2016.11.16"
## [22] "NDVI_resamp_2016.12.02" "NDVI_resamp_2016.12.18"
# Subset raster stack/brick (notice the double [[]] bracket and similarity to lists)
sub_ras_brick=ras_brick[[c(1,3,5,10,12)]] #Select 1st, 3rd, 5th, 10th and 12th layers

# Plot first 4 elements of NDVI raster brick
addCoastlines=function(){plot(coastlines, add=TRUE)} #Function to add shapefile to a raster plot

plot(sub_ras_brick[[1:4]],         # Select raster layers to plot
     col=brewer.pal(10,"Spectral"),# Specify colormap
     interpolate=TRUE,             # Interpolate the output map
     asp=NA,                       # Aspect ratio= NA
     addfun=addCoastlines)         # Add coastline to each layer

DIY: This time, let’s use levelplot from the rasterVis library to plot RasterBrick layers.

mapTheme = rasterTheme(region = brewer.pal(10,"Spectral")) # Set raster theme (color etc.)
my.at=seq(-1,1,0.1)                               # Set colorkey/ colorbar sequence
myColorkey = list(at=my.at, space="right")        # Set colorkey/colorbar to the right

plt= rasterVis::levelplot(sub_ras_brick[[1:4]],   # Select first 4 raster layers to plot
               margin = F,          
               pretty=TRUE,             
               layout=c(2,2),           # Layout of the plots
               row.values=2,            # Number of rows
               par.settings = mapTheme, # Add user define theme
               colorkey=myColorkey,     # Add user-defined colorkey
               main="NDVI")
# Add coastline to the plots 
# NOTE: We use "layer" function from "latticeExtra" package. 
# "ggplot2" has a "layer" function too, but with different application.
plt=plt + latticeExtra::layer(sp.lines(coastlines, col="black", lwd=0.1)) 
print(plt)

3.2. Geospatial operations on RasterStack/ RasterBrick

In this section, we will demonstrate several operations on a RasterStack/RasterBrick, e.g., extracting time series at user-defined locations, spatial data extraction using spatial polygons/rasters, and cropping and masking of a raster brick/stack.

3.2.1. Time-series extraction from RasterBrick/Stack

We will demonstrate three ways of extracting time series information from raster stack/brick.

Method 1: By converting the raster brick to a spatial data frame using the as.data.frame or raster::rasterToPoints functions

###############################################################
#~~~~ Method 1.1: Convert raster brick to spatial data frame
spdf = as(ras_brick, "SpatialPixelsDataFrame")
df_spdf = as.data.frame(spdf) # "x" and "y" in the df are long and lat
df_spdf[1:3,1:6] # Notice values of all raster layers are extracted for each cell
##   NDVI_resamp_2016.01.01 NDVI_resamp_2016.01.17 NDVI_resamp_2016.02.02
## 1                     NA            -0.09864961            -0.06309947
## 2                     NA            -0.11487599            -0.07173216
## 3                     NA            -0.09752892            -0.07198682
##   NDVI_resamp_2016.02.18 NDVI_resamp_2016.03.05 NDVI_resamp_2016.03.21
## 1            -0.03692058            -0.03954601            -0.03983472
## 2            -0.04443348            -0.04195234            -0.03798087
## 3            -0.04526810            -0.04376680            -0.03974375
###############################################################
#~~~~ Method 1.2: Use rasterToPoints function
r.pts = rasterToPoints(ras_brick, spatial=TRUE)
df_spdf = data.frame(long=coordinates(r.pts)[,1],
                         lat=coordinates(r.pts)[,2],
                         r.pts@data)
df_spdf[1:3,1:6] # Notice values of all raster layers are extracted for each cell
##        long      lat NDVI_resamp_2016.01.01 NDVI_resamp_2016.01.17
## 1 -167.4896 66.18558                     NA            -0.09864961
## 2 -167.1162 66.18558                     NA            -0.11487599
## 3 -166.7427 66.18558                     NA            -0.09752892
##   NDVI_resamp_2016.02.02 NDVI_resamp_2016.02.18
## 1            -0.06309947            -0.03692058
## 2            -0.07173216            -0.04443348
## 3            -0.07198682            -0.04526810

Method 2: Using the raster::extract function to extract time series data for specific location(s)

###############################################################
#~~~~ Method 2.1: Extract values for a single location

# User defined lat and long
Long=-96.33; Lat=30.62 

# Extract time series for the location
ndvi_val=raster::extract(ras_brick,
                 SpatialPoints(cbind(Long,Lat)), # lat-long as spatial locations
                 method='bilinear')              # or "simple" for nearest neighbor

# Create a dataframe using the dates (derived from raster layer names) and extracted values
ndvi_ts=data.frame(Time=c(1:nlayers(ras_brick)), # Sequence of retrieval time
                   NDVI=as.vector(ndvi_val))     #NDVI values
# Try changing Time to as.Date(substr(colnames(ndvi_val),13,22), format = "%Y.%m.%d")

# Plot NDVI time series extracted from raster brick/stack
plot(ndvi_ts, type="l", col="maroon", ylim=c(0.2,0.8))

Let us use Method 2 (described above) for multiple locations.

#~~~~ Method 2.2:  Extract values for multiple locations
# Import sample locations from contrasting hydroclimate
library(readxl)
loc= read_excel("./SampleData-master/location_points.xlsx")
print(loc)
## # A tibble: 3 x 4
##   Aridity   State     Longitude Latitude
##   <chr>     <chr>         <dbl>    <dbl>
## 1 Humid     Louisiana     -92.7     34.3
## 2 Arid      Nevada       -116.      38.7
## 3 Semi-arid Kansas        -99.8     38.8
# Extract time series using "raster::extract"
loc_ndvi=raster::extract(ras_brick,
                         #Convert lat -long to spatial locations
                         SpatialPoints(cbind(loc$Longitude,loc$Latitude)),
                         # Use bilinear interpolation (or simple) option
                         method='bilinear')

The rows in the loc_ndvi variable correspond to the points specified by the user, namely the humid, arid, and semi-arid locations. Let’s plot the extracted NDVI for the three locations to look for patterns (the influence of climate on vegetation).

# Create a new data frame with dates of retrieval and NDVI for different hydroclimates
ndvi_locs = data.frame(
  Date = as.Date(substr(colnames(ndvi_val),13,22), format = "%Y.%m.%d"),
  Humid = as.vector(loc_ndvi[1,]),    # Location 1
  Arid = as.vector(loc_ndvi[2,]),     # Location 2
  SemiArid = as.vector(loc_ndvi[3,])) # Location 3

# Convert data frame in "long" format for plotting using ggplot
library(tidyr)
df_long=ndvi_locs %>% gather(Climate,NDVI,-Date)
# The "-" sign excludes the Date column from gathering
head(df_long,n=4)
##         Date Climate      NDVI
## 1 2016-01-01   Humid 0.5919486
## 2 2016-01-17   Humid 0.5762772
## 3 2016-02-02   Humid 0.5570874
## 4 2016-02-18   Humid 0.5752835
# Plot NDVI for different locations (hydroclimates). Do we see any pattern?? 
library(ggplot2)
l = ggplot(df_long,              # Dataframe with input Dataset 
          aes(x=Date,            # X-variable
              y=NDVI,            # Y-variable
              group=Climate)) +  # Group by climate 
  geom_line(aes(color=Climate))+ # Line color based on climate variable
  geom_point(aes(color=Climate)) # Point color based on climate variable 
print(l)

Method 3: Use rowFromY and colFromX to find the row and column numbers of the raster for the location.

###############################################################
#~~~~ Method 3: Retrieve values using cell row and column number
row = rowFromY(ras_brick[[1]], Lat)
col = colFromX(ras_brick[[1]], Long)
ras_brick[row,col][1:5]
## [1] 0.5614925 0.5193725 0.4958409 0.5731566 0.6259534
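An equivalent shortcut is to look up the cell index directly with cellFromXY; a small sketch:

cell = cellFromXY(ras_brick, cbind(Long, Lat)) # Cell number for the point
ras_brick[cell][1:5]                           # Same values as above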

3.2.2. Spatial extraction/summary from RasterBrick/Stack

Let’s try extracting values of the RasterBrick (for all layers simultaneously) based on polygon features using the extract function.

#~~~~ Method 1: Extract values based on spatial polygons
IPCC_shp = rgdal::readOGR("./SampleData-master/CMIP_land/CMIP_land.shp",verbose = FALSE)

ndvi_sp = raster::extract(ras_brick,IPCC_shp, fun=mean, na.rm=T, df=TRUE, method='bilinear')
head(ndvi_sp,n=3)[1:5]
##   ID NDVI_resamp_2016.01.01 NDVI_resamp_2016.01.17 NDVI_resamp_2016.02.02
## 1  1            -0.08325353            -0.05983509            -0.04461097
## 2  2             0.02765715             0.02737234             0.03172858
## 3  3             0.21922899             0.22114161             0.21826435
##   NDVI_resamp_2016.02.18
## 1            -0.03795020
## 2             0.03084664
## 3             0.27331317

We can also use a classified raster, like the aridity data we used previously, to extract values corresponding to each class using lapply and selective filtering. The two rasters must have the same extent and resolution.

#~~~~ Method 2: Extract values based on another raster
aridity = raster("./SampleData-master/raster_files/aridity_36km.tif")
climate_ndvi = list()
# Extracts the time series of NDVI for pixels for each climate region
climate_ndvi = lapply(list(1,2,3,4,5),function(x) (na.omit((ras_brick[aridity==x]))))

# Calculate and store mean NDVI for each climate region
climate_ndvi_mean = lapply(list(1,2,3,4,5),function(x) (mean(ras_brick[aridity==x], na.rm=TRUE)))

plot(unlist(climate_ndvi_mean),type = "b")

3.2.3. Crop and mask RasterBrick/RasterStack

We will now crop and mask the raster brick using a shapefile imported from disk (with the rgdal::readOGR function) or downloaded using raster::getData. The crop function clips the raster to the extent provided, while mask removes the area outside the provided shapefile.

# Import polygons for USA at level 1 (i.e., state) from disk
library(raster)
state_shapefile = rgdal::readOGR("./SampleData-master/USA_states/cb_2018_us_state_5m.shp",verbose = FALSE)

# Alternatively, try downloading level 1 (i.e. state level) USA polygons from GADM dataset online
# state_shapefile = raster::getData("GADM",country="USA",level=1) # Depends host server availability 

# Print variable names
names(state_shapefile)
## [1] "STATEFP"  "STATENS"  "AFFGEOID" "GEOID"    "STUSPS"   "NAME"     "LSAD"    
## [8] "ALAND"    "AWATER"
# Print state/ territory names
print(state_shapefile$NAME)
##  [1] "Nebraska"                                    
##  [2] "Washington"                                  
##  [3] "New Mexico"                                  
##  [4] "South Dakota"                                
##  [5] "Texas"                                       
##  [6] "California"                                  
##  [7] "Kentucky"                                    
##  [8] "Ohio"                                        
##  [9] "Alabama"                                     
## [10] "Georgia"                                     
## [11] "Wisconsin"                                   
## [12] "Oregon"                                      
## [13] "Pennsylvania"                                
## [14] "Mississippi"                                 
## [15] "Missouri"                                    
## [16] "North Carolina"                              
## [17] "Oklahoma"                                    
## [18] "West Virginia"                               
## [19] "New York"                                    
## [20] "Indiana"                                     
## [21] "Kansas"                                      
## [22] "Idaho"                                       
## [23] "Nevada"                                      
## [24] "Vermont"                                     
## [25] "Montana"                                     
## [26] "Minnesota"                                   
## [27] "North Dakota"                                
## [28] "Hawaii"                                      
## [29] "Arizona"                                     
## [30] "Delaware"                                    
## [31] "Rhode Island"                                
## [32] "Colorado"                                    
## [33] "Utah"                                        
## [34] "Virginia"                                    
## [35] "Wyoming"                                     
## [36] "Louisiana"                                   
## [37] "Michigan"                                    
## [38] "Massachusetts"                               
## [39] "Florida"                                     
## [40] "United States Virgin Islands"                
## [41] "Connecticut"                                 
## [42] "New Jersey"                                  
## [43] "Maryland"                                    
## [44] "South Carolina"                              
## [45] "Maine"                                       
## [46] "New Hampshire"                               
## [47] "District of Columbia"                        
## [48] "Guam"                                        
## [49] "Commonwealth of the Northern Mariana Islands"
## [50] "American Samoa"                              
## [51] "Iowa"                                        
## [52] "Puerto Rico"                                 
## [53] "Arkansas"                                    
## [54] "Tennessee"                                   
## [55] "Illinois"                                    
## [56] "Alaska"
# Exclude states outside of CONUS 
conus = state_shapefile[!(state_shapefile$NAME %in% c("Alaska","Hawaii","American Samoa",
                        "United States Virgin Islands","Guam", "Puerto Rico",
                        "Commonwealth of the Northern Mariana Islands")),] 
plot(conus)

# Crop rasters using USA polygon
usa_crop = crop(ras_brick, extent(conus))       # Crop
plot(usa_crop[[1:4]],                           # Cropped rasters   
     col=mypal,                                 # Color pal.
     addfun=function(){plot(conus, add=TRUE)}   # Add shapefile to the plots
     )

# Mask rasters using USA polygon
ndvi_mask_usa = raster::mask(usa_crop,conus)    # Mask
plot(ndvi_mask_usa[[1:4]],                      # Masked rasters   
     col=mypal,                                 # Color pal.
     addfun=function(){plot(conus, add=TRUE)}   # Add shapefile to the plots
     )

Let’s crop the raster for selected states.

# Mask raster for selected states
states = c('Colorado','Texas','New Mexico')
# Subset shapefile for selected attributes 
state_plot = state_shapefile[(state_shapefile$NAME %in% states),]  
states_trim = crop(ras_brick, extent(state_plot))       # Crop raster
ndvi_mask_states = raster::mask(states_trim,state_plot) # Mask

# Plot raster and shapefile
plot(ndvi_mask_states[[1]],col=mypal);
plot(state_plot, add=TRUE)

3.3. Layer-wise operations on RasterStack/ RasterBrick

We will demonstrate some layer-wise arithmetic operations, summary statistics, and data transformations on RasterStacks/RasterBricks.

Arithmetic operations, a.k.a. Arith-methods (+, -, *, /, ^, %%, %/%), on RasterStacks/RasterBricks closely resemble simple vector-like operations. More details on Arith-methods can be found here: https://rdrr.io/cran/raster/man/Arith-methods.html. For layer-wise summary statistics and user-defined operations, we will use the cellStats function.

Let’s try some layer-wise cell-statistics:

###############################################################
#~~~ PART 3.3.1: Layer-wise cell-statistics 
# Layer-wise Mean
cellStats(sub_ras_brick, mean) # Mean of each raster layer. Try modal, median, min etc. 
## NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05 
##              0.2840846              0.2948298              0.3057182 
## NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25 
##              0.4618550              0.4997354
# Layer-wise quantiles
quantile(sub_ras_brick,probs=c(.25,.75))
##                               25%       75%
## NDVI_resamp_2016.01.01 0.07200430 0.5136818
## NDVI_resamp_2016.02.02 0.09046099 0.5258916
## NDVI_resamp_2016.03.05 0.10357170 0.5185570
## NDVI_resamp_2016.05.24 0.23308019 0.6722358
## NDVI_resamp_2016.06.25 0.25382688 0.7282458
# User-defined statistics by defining own function
quant_fun = function(x, na.rm=TRUE) # Remember to add "na.rm" option
                    {quantile(x, probs = c(0.25, 0.75), na.rm=TRUE)} 
cellStats(sub_ras_brick, quant_fun) # 25th, and 75th percentile of each layer
##     NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05
## 25%             0.07188153              0.0844293             0.09917563
## 75%             0.51358944              0.5177374             0.51130731
##     NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25
## 25%              0.2367169              0.2600064
## 75%              0.6695234              0.7303487
# Custom function for mean, variance and skewness
my_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
  meanVal=mean(x, na.rm=TRUE)              # Mean 
  varVal=var(x, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  return(output)                           # Return output
} 
cellStats(sub_ras_brick, my_fun) # Mean, Variance and skewness of each layer
##      NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05
## [1,]             0.28408458             0.29482983             0.30571816
## [2,]             0.07224034             0.06677124             0.06171894
## [3,]             0.54099728             0.50752435             0.50182848
##      NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25
## [1,]             0.46185503             0.49973536
## [2,]             0.05944947             0.06587261
## [3,]            -0.27219413            -0.33922138

Global cell-statistics can be calculated by converting the raster stack to a vector and then using regular statistics functions.

###############################################################
#~~~ PART 3.3.2: Global cell-statistics

min_val=min(raster::minValue(sub_ras_brick))
max_val=max(raster::maxValue(sub_ras_brick))
print(c(min_val,max_val))
## [1] -0.20000  0.95065
# Or by vectorizing the RasterBrick and finding statistics
min_val=min(as.vector(sub_ras_brick), na.rm=T) 
max_val=max(as.vector(sub_ras_brick), na.rm=T)
print(c(min_val,max_val))
## [1] -0.20000  0.95065
# Another example of statistics of vectorized RasterBrick
quant=quantile(as.vector(sub_ras_brick), c(0.01,0.99), na.rm=TRUE) #1st and 99th percentiles
print(quant)
##          1%         99% 
## -0.05663428  0.84776928

Layer-wise arithmetic operations on RasterStack/RasterBrick:

###############################################################
#~~~ PART 3.3.3: Layer-wise operations on RasterStack/RasterBrick

# Arithmetic operations on RasterBricks/RasterStacks are the same as on lists
add = sub_ras_brick+10                  # Add a number to all raster layers
mult = sub_ras_brick*5                  # Multiply all raster layers by a number
subset_mult = sub_ras_brick[[1:3]]*10   # Multiply a subset of raster layers by a number

# Data filtering based on cell-value
filter_brick = sub_ras_brick            # Create a RasterBrick to operate on
filter_brick[filter_brick<0.5] = NA     # Assign NA to any value less than 0.5

# Let's plot the filtered rasters
plot(filter_brick, 
     asp=NA,                         # Aspect ratio. NA= No white space
     colNA="gray95",                 # Background color
     nr=2,                           # Number of rows to arrange the plots
     addfun= addCoastlines)          # Add coastlines to each plot                   

# Layer-wise Log-transformation
log_ras_brick=log(sub_ras_brick) 

# Normalize raster layers to [0,1] based on min and max of raster brick/stack
norm_brk=(sub_ras_brick-min_val)/(max_val-min_val) # Notice that the values are between [0,1]

# Plot in Robinson projection
source("./SampleData-master/functions/robin_ras.R")  # User defined function
ndvi_robin_plot = robin_ras(norm_brk[[1]],           # Select first raster
                     map_col=rev(mypal),             # Colormap
                     lim=c(0,1),                     # Set min and max z-limits
                     brk=c(0, 0.2, 0.4, 0.6, 0.8, 1),# Set breaks in colormap
                     name="Normalized NDVI")         # Name of the plot
ndvi_robin_plot # print the reprojected plot

3.4. Cell-wise operations with stackApply and calc

We will now carry out cell-wise operations on the raster stack/brick. These functions operate on each cell of a RasterStack/RasterBrick across all layers.

While we can use some common summary statistical functions (mean, median, min, max, etc.) directly, stackApply and calc will be used for more complex operations.

# Mean of all cells across layers
mean_ras = mean(sub_ras_brick, na.rm=TRUE)  #Try modal, median, min etc.

# Approximate NA values by interpolating between Layers
approx_brick = approxNA(sub_ras_brick,method="linear") #Try "constant"

3.4.1. stackApply

What if we have a more complex function to apply to the cells? To apply user-defined functions, raster::stackApply and raster::calc come in handy.


Expert Note: raster::stackApply is used when the output of the function is a vector of size 1. The result of using raster::stackApply on a RasterStack/RasterBrick would be a single raster. We can apply raster::stackApply function on subsets of original RasterStack/RasterBrick by using the option indices. The output is saved as a RasterStack, with each index class generating its own output layer.


raster::calc is more versatile and can handle functions with outputs of length > 1. raster::calc can also be easily parallelized with the help of the raster::clusterR function. Let’s see what this means!

#~~ A user-defined function (here, returning just the mean)
my_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
  meanVal=mean(x, na.rm=TRUE)              # Mean 
  return(meanVal)                          # Return output
} 

#~~ Using "stackApply" to apply "quant_fun"
# "Indices" specifies the group/subset of layers on which to apply the "fun". 
# Same index for ALL layers will mean function applies to ALL layers. 

stat_ras = raster::stackApply(sub_ras_brick, 
          fun=my_fun,                             # User defined function
          indices=rep(1, nlayers(sub_ras_brick))) # function applied to All layers

# Let's plot the output raster
plot(stat_ras,             # Layers to plot
     asp=NA,               # Aspect ratio. NA= No white space
     colNA="gray95",       # Background color
     nr=2,                 # Number of rows to arrange the plots
     addfun=addCoastlines) # Add coastlines to each plot 

# Now, let's have the first two layers as group 1 and the next three as group 2.
# Function will be applied to each group separately and 2 layers of output will be generated. 
stat_ras = raster::stackApply(sub_ras_brick, 
                             fun=my_fun,
                             indices=c(1,1,2,2,2)) 
names(stat_ras) = c("Mean_of_rasters_1_to_2", "Mean_of_rasters_3_to_5")
# Two layers are formed, one for each group of indices

# Let's plot the two output rasters
plot(stat_ras,            # Layers to plot
     asp=NA,               # Aspect ratio. NA= No white space
     colNA="gray95",       # Background color
     nr=2,                 # Number of rows to arrange the plots
     addfun=addCoastlines) # Add coastlines to each plot            
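The indices argument becomes more useful with real groupings. A sketch, assuming the full ras_brick from earlier in this chapter, that groups the NDVI layers by month parsed from the layer names:

# Parse dates from layer names, e.g., "NDVI_resamp_2016.01.01"
dates  = as.Date(substr(names(ras_brick), 13, 22), format = "%Y.%m.%d")
months = as.numeric(format(dates, "%m"))   # One index (month) per layer

monthly_mean = raster::stackApply(ras_brick,
                                  indices = months,  # Group layers by month
                                  fun = mean)
nlayers(monthly_mean)  # One output layer per month present in the data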

3.4.2. calc

raster::calc can handle a modified user-defined function with several outputs. Let’s see what this means. We will change my_fun slightly to calculate the mean, variance, and skewness of the data (length of the output = 3).

#~~ Custom function for mean, variance and skewness
my_fun = function(x, na.rm=TRUE){     # Note: "na.rm" has a default, so calc need not pass it
  meanVal=mean(x, na.rm=TRUE)              # Mean 
  varVal=var(x, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  return(output)                           # Return output
} 

# Using "calc" function for user defined operations 
stat_brk=calc(sub_ras_brick,      # RasterStack/ RasterBrick
               fun = my_fun,      # Function to be used
               progress='text')   # Progress bar

nlayers(stat_brk)   # Notice the number of output layers = 3
## [1] 3
# Name corresponding layers
names(stat_brk)=c("Mean", "Variance", "Skewness") 
# Plot output raster
plot(stat_brk, col=mypal)


CHAPTER 4. Parallel computation for geospatial analysis


This would be a good time to introduce ourselves to the NetCDF (Network Common Data Form) data format. Most large-scale climate/satellite datasets are commonly provided in NetCDF or HDF (Hierarchical Data Format), which are machine-independent data formats for the creation, access, and sharing of array-oriented scientific data. In our context, NetCDF or HDF files contain 3-D gridded geospatial datasets with information on lat-long, extent, resolution, and time. These files can be imported in R as RasterStacks/Bricks.
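Before importing, it can help to peek at the file’s metadata with the ncdf4 package (loaded earlier); a minimal sketch that only reads names from the file:

library(ncdf4)
nc = nc_open("./SampleData-master/SMAP_L3_USA.nc")  # Open a connection
names(nc$var)   # Variable names stored in the file
names(nc$dim)   # Dimension names (e.g., lon, lat, time)
nc_close(nc)    # Close the connection when done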

SMAPBrk=brick("./SampleData-master/SMAP_L3_USA.nc")
plot(mean(SMAPBrk, na.rm=TRUE), asp=NA)

4.1. Cell-wise implementation of functions

4.1.1. Apply custom function to pixel time series

Once we have imported the NetCDF file as a raster brick, we will apply a slightly modified version of the previously used function my_fun to calculate the mean, variance, and skewness of the time series for each cell in parallel. We will use the raster::clusterR function to apply my_fun to the RasterBrick in parallel.


Expert Note: For seamless implementation of a function in parallel mode, care must be taken that all necessary variables and packages are accessible to ALL cores and that error exceptions are handled appropriately. We will modify my_fun slightly to highlight what this means in practice.


  1. We will convert input x to a numeric array
  2. We will remove NA values from dataset before calculation
  3. We will use minSamp to fix minimum sample counts for calculation
  4. We will use tryCatch to handle error exceptions
#~~ We will make some changes in the custom function for mean, variance and skewness
minSamp=50   # Minimum required sample count for statistics

my_fun = function(x, na.rm=TRUE){    
  smTS=as.numeric(as.vector(x))     # Convert dataset to numeric array
  smTS=as.numeric(na.omit(smTS))    # Omit NA values 
  
  # Implement function with tryCatch to handle exceptions 
  tryCatch(if(length(smTS)>minSamp) {      # Apply minimum sample filter
  
  ######## OPERATION BEGINS #############    
  meanVal=mean(smTS, na.rm=TRUE)              # Mean 
  varVal=var(smTS, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(smTS, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  return(output)                           # Return output
  ######## OPERATION ENDS #############    

  } else {
    return(rep(NA,3))   # If sample-size condition is not met, return NA array
  },error =function(e){return(rep(NA, 3))}) # If an error occurs, return NA array
}
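
As a quick sanity check before going parallel, the modified function can be tested on synthetic vectors (hypothetical inputs, not part of the workshop data):

set.seed(1)
my_fun(rnorm(100))  # Enough samples: returns mean, variance and skewness
my_fun(rnorm(10))   # Fewer than minSamp samples: returns NA, NA, NA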


# Apply function to all grids in parallel
library(tictoc)
library(parallel)
numCores=detectCores()-1  # Always leave one core free for housekeeping tasks
beginCluster(n=numCores)           # Begin cluster with n nodes
tic()
StatOut=clusterR(SMAPBrk,          # RasterBrick
            calc,                  # Apply *raster::calc* in parallel
            args=list(fun=my_fun), # Arguments passed to calc (our function)
            export=c("minSamp"),   # Export necessary variables to all cores
            progress='text')       # Progress bar as text
##   |======================================================================| 100%
toc()
## 3.4 sec elapsed
endCluster()                       # End open cluster 
names(StatOut)=c("Mean", "Variance", "Skewness")

# Apply the function using calc (serial) to see the difference in computing time
tic();
StatOutCalc=calc(SMAPBrk,my_fun)
toc()
## 3.86 sec elapsed
names(StatOutCalc)=c("Mean", "Variance", "Skewness")
plot(StatOutCalc)

4.1.2. Best practices for large-scale operations

  1. Try the parallel operation on a smaller region before submitting large jobs. Pixel-wise implementation of the function can help identify errors in the code: convert the cropped region into a data frame and apply the function to the time series of each cell. If your code throws an error, troubleshoot the series which generates it.
# Crop a small region (2x2 degrees)
e = extent( c(-100, -98, 43, 45)) #Xmin, Xmax, Ymin, Ymax
smapCrp=raster::crop(SMAPBrk, e)  # Try "my_fun" over "smapCrp" with clusterR as shown above

# Pixel-wise test (for troubleshooting error sources)
smDF=raster::as.data.frame(smapCrp)

parList=list()
for (i in 1:nrow(smDF)){
   print(i)
   parList[[i]]=my_fun(smDF[i,])
}
  2. Use tryCatch carefully, as it may suppress legitimate errors and generate spurious results. Test the code for a smaller region without tryCatch to check its robustness.
  3. Use global assignment <<- instead of <- or = for variables called in a function nested within the function used with clusterR.

Expert Note: Parallel computing has some overhead for creating and closing clusters. A significant improvement in computing time from parallel techniques will therefore only be visible for large jobs.
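
For instance, a minimal sketch of this trade-off using the cropped region smapCrp from above (timings will vary by machine):

tic()
beginCluster(n=2)                        # Cluster start-up cost
cropStat=clusterR(smapCrp, calc,
                  args=list(fun=my_fun),
                  export=c("minSamp"))
endCluster()                             # Cluster shut-down cost
toc()                                    # For small jobs, overhead may exceed the speed-up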


4.2. Layer-wise implementation of functions

4.2.1. RasterBrick as a list

We will convert the RasterBrick to a list of rasters and then apply my_fun to each element of the list in parallel using future_lapply.

# Convert RasterBrick to a list of rasters
rasList=as.list(SMAPBrk)
length(rasList)
## [1] 1188
# Test the function for one raster
my_fun(rasList[[100]])
## [1] 0.144378162 0.006185985 1.656570393
# Apply function in parallel for all layers
library(parallel) 
library(future.apply) 
plan(multisession, workers = detectCores() - 1) # multisession replaces the deprecated multiprocess
outStat= future_lapply(rasList, my_fun)

# Test output for one layer
outStat[[100]]
## [1] 0.144378162 0.006185985 1.656570393

Expert Note: Special options can be used to remove the memory bottleneck when using future_lapply.

# Allocate ~2 GB memory for exported globals 
# 2000 MB*(1024^2) = 2097152000 bytes
options(future.globals.maxSize= 2097152000) 
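
The same limit can also be expressed via a target size in MB; the budget below is illustrative:

# options(future.globals.maxSize = 2500*(1024^2))  # Hypothetical ~2.5 GB budget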

4.2.2. Block-wise summary of feature-extracted data

In this section, we will use a shapefile to extract cell values from a RasterBrick as a list using exact_extract. Summary statistics will then be calculated in parallel by applying my_fun to the extracted data for each feature.


Expert Note: The function exactextractr::exact_extract is faster and better suited for large applications than raster::extract, although both perform a similar operation with slight differences in output format.
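
For reference, the raster::extract equivalent is sketched below; it is commented out as it can be slow on the full brick:

# ext_list = raster::extract(SMAPBrk, conus)  # List of cell-value matrices, one per feature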


#~ Extract feature data as data frame
library(exactextractr)
library(sf)
library(sp)
featureData=exact_extract(SMAPBrk,   # Raster brick 
                st_as_sf(conus),     # Convert shapefile to sf (simple feature)
                force_df = FALSE,    # Output as a data.frame?
                include_xy = FALSE,  # Include cell lat-long in output?
                fun = NULL,          # Specify any function to apply for each feature extracted data
                progress = TRUE)     # Progressbar
##   |======================================================================| 100%
length(featureData) # Same as feature count in CONUS? i.e. nrow(conus) 
## [1] 49
# Let's try out the data for Texas
which(conus$NAME=="Texas")  # Find feature number for Texas
## [1] 5
# View(featureData[[5]])    # View the extracted data frame
nrow(featureData[[5]])      # No. pixels within selected feature
## [1] 694

Each row in featureData[[5]] is the time series of cell values which fall within the boundary of feature number 5, i.e. Texas. Since the exact_extract function provides coverage_fraction for each pixel in the output, we will make a minor change in my_fun to remove this variable before calculating the statistics.
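
A quick look at the extracted data frame shows this extra column (a minimal sketch):

names(featureData[[5]])[1:3]              # First few layer columns
head(featureData[[5]]$coverage_fraction)  # Fraction of each cell inside the polygon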

# Plot SM time series for the first pixel, excluding the coverage_fraction column
cellTS=as.numeric(featureData[[5]][1,1:nlayers(SMAPBrk)])
#
plot(cellTS, type="l", xlab="Time", ylab="Soil moisture")

#~~ We will make another small change in the custom function for mean, variance and skewness
minSamp=50   # Minimum required sample count for statistics

my_fun = function(x, na.rm=TRUE){    
  xDF=data.frame(x)                  # Ensure input is a data frame
  xDF=xDF[ , !(names(xDF) %in% 'coverage_fraction')] # Remove coverage_fraction column
  xData=as.vector(as.matrix(xDF))    # Convert data.frame to 1-D matrix
  smTS=as.numeric(na.omit(xData))    # Omit NA values                   
  
  # Implement function with tryCatch to handle exceptions 
  tryCatch(if(length(smTS)>minSamp) {      # Apply minimum sample filter
  
  ######## OPERATION BEGINS #############    
  meanVal=mean(smTS, na.rm=TRUE)              # Mean 
  varVal=var(smTS, na.rm=TRUE)                # Variance
  skewVal=moments::skewness(smTS, na.rm=TRUE) # Skewness
  output=c(meanVal,varVal,skewVal)         # Combine all statistics
  return(output)                           # Return output
  ######## OPERATION ENDS #############    

  } else {
    return(rep(NA,3))   # If sample-size condition is not met, return NA array
  },error =function(e){return(rep(NA, 3))}) # If an error occurs, return NA array
}

Let’s apply my_fun to extracted data for each feature.

# Test the function for one block
my_fun(featureData[[5]])
## [1] 0.1727930 0.0095579 0.9876028
# Apply function in parallel for all layers
library(parallel) 
library(snow)
library(future.apply) 
plan(multisession, workers = detectCores() - 1) # multisession replaces the deprecated multiprocess
outStat= future_lapply(featureData, my_fun)

# Test output for one feature
outStat[[5]]  # Is this the same as before?
## [1] 0.1727930 0.0095579 0.9876028
# Extract each summary stats for all features from the output list  
FeatureMean=sapply(outStat,"[[",1)  # Extract mean for all features
FeatureVar=sapply(outStat,"[[",2)   # Extract variance for all features
FeatureSkew=sapply(outStat,"[[",3)  # Extract skewness for all features

# Let's place mean statistics as an attribute to the shapefile
conus$meanSM=FeatureMean

# Plot mean soil moisture map for CONUS 
library(rcartocolor)
library(ggplot2)
library(sf)
library(sp)

mean_map=ggplot() + 
  geom_sf(data = st_as_sf(conus), # CONUS shp as sf object (simple feature)
          aes(fill = meanSM)) +   # Plot fill color= mean soil moisture
  scale_fill_carto_c(palette = "BluYl",     # Using carto color palette
                     name = "Mean SM",      # Legend name
                     na.value = "#e9e9e9",  # Fill values for NA 
                     direction = 1)+        # To invert color, use -1
  coord_sf(crs = 2163)+   # Reproject polygon. Try: 4326 or 3083 
  theme_void() +          # Plot theme. Try: theme_bw
  theme(legend.position = c(0.2, 0.1),  
        legend.direction = "horizontal",
        legend.key.width = unit(5, "mm"),
        legend.key.height = unit(4, "mm"))
mean_map
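
To export the map, ggplot2::ggsave can be used; the file name below is illustrative:

# ggsave("CONUS_mean_SM.png", mean_map, width=6, height=4, dpi=300)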

DIY: Variable extraction from netCDF using ncdf4 package.

# Extract Time variable from NetCDF file
library(ncdf4)
nc = nc_open("./SampleData-master/SMAP_L3_USA.nc")
names(nc$var) # Variable names
names(nc$dim) # Dimension attributes
# Extract time variable
retrieval_date=ncvar_get(nc,"Time")
head(retrieval_date)
# Time to datetime format
retrieval_date=as.Date(retrieval_date, origin="1970-01-01") 
nc_close(nc) # Close connection to NC file
# Add Retrieval dates as layer names in SMAPBrk
names(SMAPBrk)=retrieval_date

plot(SMAPBrk[[c(1,10,20,27)]])

CHAPTER 5. Supplementary material




5.1. Exporting RasterBrick to NetCDF Data

RasterBricks are significantly faster in computation than RasterStacks. Hence, it is convenient to store model outputs as NetCDF files, which can later be imported as RasterBricks for further analysis. Here we will look at the method to export a RasterBrick to NetCDF. The memory hack provided below is not always mandatory, but it can be helpful when working with heavy datasets.


Expert Note: a) NetCDF write speeds are significantly slower on cloud-synced platforms like Google Drive or OneDrive. Hence, it is recommended to export NetCDF files to a local disk for faster write speeds. b) Before writing a NetCDF file, convert your RasterStack (if any) to a RasterBrick for faster write speed.
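
For point (b), the conversion is a one-liner; my_stack below is a placeholder for your RasterStack:

# my_brick = raster::brick(my_stack)  # RasterStack to RasterBrick before writing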


rb=brick("./SampleData-master/SMAP_L3_USA.nc")
r=rb[[1]]                     # Raster taken from the first layer of the brick
xy<-xyFromCell(r,1:ncell(r))  # Matrix of longitudes (x) and latitudes (y)
lon=unique(xy[,1])
lat=unique(xy[,2])

#~ hack raster internal function
cs_orig <- raster:::.chunk
cs_hack <- function(x) getOption("rasterChunkSize")
assignInNamespace(".chunk", cs_hack, ns = "raster")
rasterOptions(memfrac = 0.9,
              maxmemory=10e+11,
              chunksize =10e+10,
              todisk=TRUE)

# First, write the NetCDF file to disk
writeRaster(rb,
            "C:/Users/vinit/Documents/SMAP_L3_myExport.nc",
            overwrite=TRUE,
            format="CDF",
            varname="soil_moisture",
            varunit="[m3/m3]",
            longname="SMAPL3-V7 SM, 2-day interpolated, 36KM",
            xname="Longitude",
            yname="Latitude",
            zname='Time',
            zunit='Date of retrieval',
            progress="text")

#### Adding lat, long and time to the netCDF
# Note: we open with write=TRUE so that we can modify the file
library(ncdf4)
nc = nc_open('C:/Users/vinit/Documents/SMAP_L3_myExport.nc', write = TRUE)

export_dates=seq(as.Date("2015-03-31"),as.Date("2021-09-30"),by=2)
# put additional attributes into dimension and data variables
zvals =as.Date(export_dates) # Add zvals as dates
ncdf4::ncvar_put(nc,'Time',zvals)
ncdf4::ncvar_put(nc, "Longitude", lon)
ncdf4::ncvar_put(nc, "Latitude", lat)
nc_close(nc)

#~~ undo the hack
assignInNamespace(".chunk", cs_orig, ns = "raster")
rasterOptions(default = TRUE)
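
As a quick check, the exported file can be re-imported and plotted (path assumed from the writeRaster call above):

# chk = brick("C:/Users/vinit/Documents/SMAP_L3_myExport.nc", varname="soil_moisture")
# plot(chk[[1]], asp=NA)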

5.2. Raster manipulation using terra

terra is a new package for manipulating geographic (spatial) data, which improves upon the raster package in terms of speed. In this workshop, we focus on the raster package due to the availability of better legacy resources online.

In most cases, the use of functions from the terra package is similar to raster. To make raster plots using terra, we use terra::plot. Raster objects in the terra package are imported using the rast function. Equivalent terra function names for every raster function can be found in Section XXIII of the following document:
https://www.rdocumentation.org/packages/terra/versions/1.2-5/topics/terra-package
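
Objects can also be converted between the two packages; a minimal sketch of the raster-to-terra direction:

r_old = raster::raster("./SampleData-master/raster_files/SMAP_SM.tif")  # RasterLayer
r_new = terra::rast(r_old)   # Convert RasterLayer to terra SpatRaster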

As with the raster package, there are numerous customizable options for raster plots in terra. Try type="interval", or horiz=TRUE, or change cex to 1. Use the breaks argument only when type is "interval" or "classes".

# Import raster from the downloaded folder using rast
sm=terra::rast("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data
# Plot raster using the plot function from the terra package
terra::plot(sm, main="Soil Moisture")


smRast=terra::rast("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data

terra::plot(smRast, 
     main="Terra: New fancy plot",         # Title of the plot
     # Color options      
     col=brewer.pal(10,"Spectral"), # Custom colormap 
     colNA="lightgray",             # Color of cells with NA values
     
     # Axis options      
     axes=TRUE,              # Plot axes: TRUE/ FALSE
     range=c(0,1),           # Z-axis limit. Raster pkg uses zlim

     # Legend options  
     type="continuous",  # continuous, interval, classes
     legend='right',     # Legend position (topright,bottomleft, right etc)
     #breaks=seq(0, 1, by=0.2),  # Sequence from 0-1 with 0.2 increment
     plg=list(horiz=FALSE,   # Plot legend horizontally: TRUE/ FALSE
              cex=0.7,       # Legend text size 
              title = 'v/v', # Legend text
              ncol=2),       # No. of columns of legend labels
     # Miscellaneous
     asp = NA,         # Aspect ratio; asp=NA fills plot to window
     smooth=TRUE       # Interpolate output plot
)
# To add grid to the plot
# grid(col="darkgray")

5.3. Updating R using RStudio

The operations in this tutorial are based on R version 4.1.1 - Kick Things and version 4.0.3 - Bunny-Wunnies Freak Out. If necessary, update R from RStudio using the updateR function from the installr package.

install.packages("installr")
library(installr)
updateR(keep_install_file=TRUE)

Follow Part 1: Data Visualization with R for a refresher on ggplot2, wordcloud, OfficeR and text mining.


Data Visualization and Geospatial Analysis With R (2021)
Vinit Sehgal | Leah Kocian | Shubham Jain | Alan Lewis
Texas A&M University
Correspondence: (https://orcid.org/0000-0002-8837-5864)