Data Visualization and Geospatial Analysis With R
Large-scale Geospatial Analysis (2021)
Taking examples from global satellite data in gridded/ raster format
, we will demonstrate several common geospatial operations like projections
, resampling
, cropping
, masking
, spatial extraction
etc. using rasters, shapefiles, and spatial data frames. For a seamless analysis across different datatypes and platforms, conversion from/to different data formats like data frames, matrices, rasters, and structured data like NetCDF
will be discussed. Advanced topics will include working with data cubes (RasterStack
/ RasterBrick
), layer-wise operations on data cubes, cell-wise operations on raster time series by implementing user-defined functions with stackApply
and calc
. Parallel implementation of custom functions will be demonstrated for large-scale datasets.
Course level: Advanced. Prior experience in R required.
Materials: For this excercise, we will use global dataset from two NASA
satellites- MODIS for NDVI
and SMAP for surface (~5cm) soil moisture
. We will use global aridity estimates (from [Global Aridity and PET Database] (https://cgiarcsi.community/data/global-aridity-and-pet-database/) as a general climate
classification (Hyper-arid, arid, semi-arid, sub-humid and humid).
The codes are tested on R version 4.1.1- Kick Things
and version 4.0.3 - Bunny-Wunnies Freak Out
. To update, see section 5.3.
CHAPTER 0. Import libraries and sample dataset
We will begin by loading necessary libraries. The sample dataset for this excercise can be downloaded manually from
GitHub
by accessing https://github.com/Vinit-Sehgal/SampleData
Alternatively, use the following code to download and extract the sample data from GitHub
repository.
Install all necessary packages (Run once).
###############################################################
#~~~ Load required libraries
lib_names=c("raster","ggplot2","unikn","mapview",
"gridExtra","rgdal","fields",
"RColorBrewer","ncdf4","rasterVis",
"rcartocolor","pacman","purrr","moments","tictoc",
"sf", "sp", "exactextractr","readxl",
"snow","future.apply","parallel")
invisible(suppressMessages
(suppressWarnings
(lapply
(lib_names,install.packages, character.only = T))))
# Load necessary packages
invisible(suppressMessages
(suppressWarnings
(lapply
(lib_names,require, character.only = T))))
# An easy way to load multiple packages is through pacman::p_load
# pacman::p_load("raster","ggplot2","unikn","mapview",
# "gridExtra","rgdal","fields",
# "RColorBrewer","ncdf4","rasterVis", "moments", "tictoc")
We will now download the workshop repository, which contains all data we will use for this exercise.
###############################################################
#~~~ Import sample data from GitHub repository
download.file(url = "https://github.com/Vinit-Sehgal/SampleData/archive/master.zip",
destfile = "SampleData-master.zip") # Download ".Zip"
# Unzip the downloaded .zip file
unzip(zipfile = "SampleData-master.zip")
# getwd() # Current working directory
list.files("./SampleData-master") # list folder contents
## [1] "aridity_36km.tif" "CMIP_land"
## [3] "desktop.ini" "functions"
## [5] "GADM_2.8_USA_adm1.rds" "HUC4"
## [7] "images" "location_points.xlsx"
## [9] "mat2ras.R" "ne_10m_coastline"
## [11] "precip.V1.0.2020.nc" "raster_files"
## [13] "README.md" "robin_ras.R"
## [15] "sample_pdfs" "SM_RE06_MIR_CDF3SA_20181225.nc"
## [17] "SMAP_L3_USA.nc" "SMAP_SM.tif"
## [19] "SMAPL4_H5" "SMOS_nc"
## [21] "SPL3SMP_2020.nc" "TreeRingData.csv"
## [23] "USA_states" "Workbook_DVGAR-Part1.html"
## [25] "Workbook_DVGAR-Part2.html"
CHAPTER 1. Raster and shapefile visualization
A geographic information system
, or GIS
refers to a platform which can map, analyzes and manipulate geographically referenced dataset. A geographically referenced data (or geo-referenced data) is a spatial dataset which can be related to a point on Earth with the help of geographic coordinates. Types of geo-referenced spatial data include: rasters (grids of regularly sized pixels) and vectors (polygons, lines, points).
A quick and helpful review of spatial data can be found here: https://spatialvision.com.au/blog-raster-and-vector-data-in-gis/
1.1. Plotting raster data
In this section, we will plot global raster data of surface (~5 cm) soil moisture from SMAP. In this proess we will explore functions from raster
, ggplot2
and mapview
packages.
Let’s start by importing the necessary packages.
# For raster operations
library(raster)
library(mapview)
# Libraries for fancy color palettes
library(RColorBrewer)
library(unikn)
library(rcartocolor)
# Import SMAP soil moisture raster from the downloaded folder
sm=raster("./SampleData-master/raster_files/SMAP_SM.tif")
Once we have imported the raster. Let’s check its attributes. Notice the dimensions
, resolution
, extent
, crs
i.e. coordinate reference system and values
.
## class : RasterLayer
## dimensions : 456, 964, 439584 (nrow, ncol, ncell)
## resolution : 0.373444, 0.373444 (x, y)
## extent : -180, 180, -85.24595, 85.0445 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : G:/My Drive/TAMU/Teaching/DataVisWorkshop/DataVisWorkshop2021/PrepFolderDataVizard2021/Part2_2021/SampleData-master/raster_files/SMAP_SM.tif
## names : SMAP_SM
## values : 0.01999998, 0.8766761 (min, max)
# Try
# extent(sm) # Geographic extent of the raster
# res(sm) # X-Y resolution of the raster
# crs(sm) # coordinate reference system
Now let’s plot the raster using raster::plot
. Interactive plots can be made by using mapview
function.
1.2. Customizing raster
plot options
1.2.1 Color palettes
We will generate custom color palettes for better visualization. We will demonstrate use of RColorBrewer
(https://colorbrewer2.org; https://rdrr.io/cran/RColorBrewer/man/ColorBrewer.html), unikn
(https://github.com/hneth/unikn) and rcartocolor
(https://carto.com/carto-colors/) libraries.
For more on colors in R, follow: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
# Make custom color palette
#~~ A) User defined color palette using unikn library
mypal=unikn::seecol(pal = usecol(c("brown","orange","yellow","darkgreen","cyan","darkblue","black"), n = 20))
#~~ B) User defined color palette using carto library
mypal = carto_pal(20, "Fall")
unikn::seecol(pal=mypal, n=20) # View the new color palettes
#~~ C) Use "brewer.pal" function from RColorBrewer
mypal= brewer.pal(11,"Spectral") # Max colors available in the palette=11
mypal= colorRampPalette(mypal)(20) # Expand color palette to 20
unikn::seecol(pal=mypal, n=20) # View the new color palettes
1.2.2 Customize raster
plots
There is a long list of customizable operations available while plotting rasters in R. Let’s play with some of these options. Try horizontal=TRUE
, interpolate=FALSE
, change xlim=c(-180, 180)
with asp=1
, or try legend.shrink=0.4
.
sm=raster("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data
raster::plot(sm,
main="Raster: New fancy plot", # Title of the plot
# Color options
col=brewer.pal(10,"Spectral"), # Custom colormap
lab.breaks=seq(0, 1, by=0.1), # Sequence from 0-1 with 0.1 increment
breaks=seq(0, 1, by=0.1), # Sequence from 0-1 with 0.1 increment
colNA="lightgray", # Color of cells with NA values
# Axis options
axes=TRUE, # Plot axes: TRUE/ FALSE
xlim=c(-180, 180), # X-axis limit
ylim=c(-90, 90), # Y-axis limit
zlim=c(0,1), # Z-axis limit
xlab="Longitude", # X-axis label
ylab="Latitue", # Y-axis label
# Legend options
legend=TRUE, # Plot legend: TRUE/ FALSE
horizontal = FALSE, # Plot legend horizontally: TRUE/ FALSE
legend.args = list(text='v/v', side=2,
font=2, cex=0.8), # Legend text and properties
legend.width = 1, # Legend width
legend.shrink=1, # Legend height; 1= plot height
# Miscellaneous
box=TRUE, # Disable box around the plots: TRUE/ FALSE
asp = NA, # Aspect ratio; asp=NA fills plot to window
interpolate=TRUE # Interpolate output plot
)
Let us try some interactive map options for raster data using mapview
package. Try plotting the map with and without at
option.
1.3. Plotting raster data using ggplot2
ggplot2
has several options to make publication-ready plots in R. We will use geom_tile
to display raster data. As geom_tile
requires data in the long format (Lat, Long, Z), we will first convert soil moisture raster to a spatial pixel dataframe
. We will use the option scale_fill_gradientn
for colormap as it gives the convenience of defining labels, breaks and limits for the plotted data.
library(ggplot2)
#~~~ 1.2.1. Transformation of raster to x-y-z format.
sm_df_smap = as.data.frame(as(sm, "SpatialPixelsDataFrame")) # Spatial dataframe of SM data
colnames(sm_df_smap) = c("SM", "Long", "Lat") # Rename the columns of the dataframe
head(sm_df_smap) # Checkout the spatial dataframe
## SM Long Lat
## 1 0.06211190 -92.05394 78.88268
## 2 0.05005269 -89.06639 78.88268
## 3 0.06211190 -92.05394 78.50923
## 4 0.05005269 -89.06639 78.50923
## 5 0.06211190 -92.05394 78.13579
## 6 0.05005269 -89.06639 78.13579
sm_world=ggplot() + # Initialize the plot
geom_tile(data=sm_df_smap, # Add soil moisture data
aes(x=Long, y=Lat, fill=SM))+ # Provide X, Y and Z data
scale_fill_gradientn(colors=mypal, # Use user-defined colormap
name = "SM", # Name of the colorbar
na.value = "transparent", # transparent NA cells
labels=(c("0", "0.2", "0.4", "0.6", "0.8")), # Labels of colorbar
breaks=seq(0,0.8,by=0.2), # Set breaks of colorbar
limits=c(0,0.8))+ # Set limits of colorbar
coord_equal()+ # Aspect ratio of plot
xlab("Longitude")+ # X-axis label
ylab("Latitude")+ # Y-axis label
borders("world", # Add global landmass boundaries
colour="gray43", # Fill light-gray color to the landmass
fill="transparent")+ # Transparent background
borders("state", # Add US state borders
colour="gray43", # Use light-gray color
fill="transparent") # Use transparent background
print(sm_world)
We can plot the map for a specific extent (CONUS, in this case) by changing the range of coord_fixed
option. We will also use a different theme: theme_bw
. Try xlim = c(114,153)
and ylim = c(-43,-11)
!
sm_conus=ggplot() + # Initialize the plot
geom_tile(data=sm_df_smap, # Add soil moisture data
aes(x=Long, y=Lat, fill=SM))+ # Provide X, Y and Z data
scale_fill_gradientn(colors=mypal, # Use user-defined colormap
name = "SM", # Name of the colorbar
na.value = "transparent", # transparent NA cells
labels = (c("0", "0.2", "0.4", "0.6", "0.8")), # Labels of colorbar
breaks = seq(0,0.8,by =0.2), # Set breaks of colorbar
limits = c(0,0.8))+ # Set limits of colorbar
xlab("Longitude")+ # X-axis label
ylab("Latitude")+ # Y-axis label
borders("world", # Add global landmass boundaries
colour ="gray43", # Fill light-gray color to the landmass
fill ="transparent")+ # Transparent background
borders("state", # Add US state borders
colour = "gray43", # Use light-gray color
fill = "transparent")+ # Use transparent background
coord_fixed(xlim = c(-125,-67), # Add extent for CONUS
ylim = c(24,50))+
theme_bw() # Try black-and-white theme.
print(sm_conus)
For a discrete color map, we will specify the breaks in the data and use scale_fill_manual
for a discrete color palette.
brk=seq(0,0.8, by=0.1) # Breakpoints in the data
sm_conus=ggplot() + # Initialize the plot
geom_tile(data=sm_df_smap, # Dataframe used for the plot
aes(x=Long,
y=Lat,
fill=cut(SM,breaks=brk)), # Specify X, Y and fill (z)
alpha=1)+ # Transparency of the plot layer
scale_fill_manual(values = brewer.pal(length(brk)-1,"Spectral"), # Discrete palette
na.value="transparent", # Set transparent fill values for NA cells
name="SM", # Name of the legend bar
labels=levels(cut(sm_df_smap$SM, breaks=brk))) + # Legend labels Cut data using breaks
xlab("Longitude")+ # X-axis label
ylab("Latitude")+ # Y-axis label
borders("world", # Add global landmass boundaries
colour = "gray43", # Fill light-gray color to the landmass
fill = "transparent")+ # Transparent background
borders("state", # Add US state borders
colour = "gray43", # Use light-gray color
fill = "transparent")+ # Use transparent background
coord_fixed(xlim = c(-125,-67), # Add extent for CONUS
ylim = c(24,50))+
theme_bw() # Try black-and-white theme.
print(sm_conus)
1.4. Plotting vector data
Importing and plotting shapefiles is equally easy in R. We will import the shapefile of the updated global IPCC climate reference regions
(https://doi.org/10.5194/essd-12-2959-2020) as OpenGIS Simple Features Reference (OGR
). We will also use global coastline
shapefile from the web for plotting.
###############################################################
#~~~ PART 1.4.1: Importing and visualizing shapefiles
library(rgdal) # For manipulating shapefiles
library(raster) # For plotting raster and shapefile data
# Import the shapefile of global IPCC climate reference regions (only for land)
IPCC_shp = rgdal::readOGR("./SampleData-master/CMIP_land/CMIP_land.shp",verbose = FALSE)
# View attribute table of the shapefile
head(IPCC_shp@data) # Notice the attributes look like a data frame
## V1 V2 V3 V4
## 0 ARCTIC Greenland/Iceland GIC 1
## 1 NORTH-AMERICA N.E.Canada NEC 2
## 2 NORTH-AMERICA C.North-America CNA 3
## 3 NORTH-AMERICA E.North-America ENA 4
## 4 NORTH-AMERICA N.W.North-America NWN 5
## 5 NORTH-AMERICA W.North-America WNA 6
# Load global coastline shapefile
coastlines = rgdal::readOGR("./SampleData-master/ne_10m_coastline/ne_10m_coastline.shp",verbose = FALSE)
# Alternatively, download global coastlines from the web
# NOTE: May not work if the online server is down
# download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_coastline.zip?version=4.0.1",destfile = 'ne_110m_coastline.zip')
# # Unzip the downloaded file
# unzip(zipfile = "ne_110m_coastline.zip",exdir = 'ne-coastlines-110m')
# Plot a raster with overlaying shapefiles
raster::plot(IPCC_shp) # Add IPCC land regions in blue color
raster::plot(coastlines, add=TRUE) # Add global coastline
#Subset shapefile for Eastern North-America
raster::plot(IPCC_shp[4,], main="Polygon for Eastern North-America")
# Combine raster plots with overlaying shapefiles
plot(sm,
col = brewer.pal(10,"Spectral"),
lab.breaks = seq(0, 1, by=0.1), # Sequence from 0-0.8 with 0.1 increment
breaks = seq(0, 1, by=0.1)) # Global SMAP soil moisture
raster::plot(IPCC_shp, add=TRUE) # Add IPCC land regions in blue color
raster::plot(coastlines, add=TRUE) # Add global coastline
###############################################################
#~~~ PART 1.4.2: Add spatial point to shapefile/ raster
#~~ Make base map
raster::plot(IPCC_shp[c(3,4,6,7),], # IPCC land regions for Contiguous US.
col = "lightgray", # Background color
border = "black") # Border color
#~~ Add global coastline
raster::plot(coastlines,
col= "maroon",
add=TRUE)
#~~ Add spatial point to the plot
Long=-96.33; Lat=30.62 # Lat- Long of College Station, TX
points(SpatialPoints(cbind(Long,Lat)), # Lat-Long as Spatial Points
col="blue", pch=16, cex=1.2) # Shape, size and color of point
1.5. Reprojection of rasters using projectRaster
A coordinate reference system (CRS) is used to relate locations on Earth (which is a 3-D spheroid) to a 2-D projected map using coordinates (for example latitude and longitude). A good description of coordinate reference systems and their importance can be found here: https://docs.qgis.org/3.4/en/docs/gentle_gis_introduction/coordinate_reference_systems.html https://datacarpentry.org/organization-geospatial/03-crs/
In R, the coordinate reference systems or CRS
are commonly specified in EPSG
(European Petroleum Survey Group) or PROJ4 format (See: https://epsg.io/).
# Importing SMAP soil moisture data
sm=raster("./SampleData-master/raster_files/SMAP_SM.tif")
#~~ Projection 1: NAD83 (EPSG: 4269)
sm_proj1 = projectRaster(sm, crs = CRS("+init=epsg:4267"));
raster::plot(sm_proj1,
main = "NAD83", # Title of the plot
col = mypal, # Colormap for the plot
axes = FALSE, # Disable axes
box = FALSE, # Disable box around the plots
asp = NA, # No fixed aspect ratio; asp=NA fills plot to window
legend=FALSE, # Disable legend
interpolate=TRUE) # Interpolate output raster
#~~ Projection 2: Azimuthal equidistant projection centered on Lat=0 and Lon=0
test_proj="+proj=aeqd +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
sm_proj4 = projectRaster(sm, crs = test_proj);
plot(sm_proj4, main="Azimuthal", col=mypal, axes=FALSE,
box=FALSE,asp = NA,legend=FALSE,interpolate=TRUE)
Other than the projections demonstrated, try the following:
"+init=epsg:3857" For Mercator (EPSG: 3857): Used in Google Maps, Open Street Maps, etc.
"+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs" for World Robinson Projection
"+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs" for World Mercator projection
We will use a custom function provided with the material to plot the global raster in Robinson projection.
#~~~ Plot a fancier re-projected maps in Robinson projection
#~ User-defined function for plotting
source("./SampleData-master/functions/robin_ras.R")
robin_plot=robin_ras(sm, # Raster to plot
map_col=rev(mypal), # Colormap
lim=c(0,0.8), # Set min and max z-limits
brk=seq(0,0.8,0.2), # Set breaks in colormap
name="Global Surface Soil Moisture from SMAP") # Name of the plot
robin_plot # print the reprojected plot
Useful references: More excellent examples on making maps in R can be found here: https://bookdown.org/nicohahn/making_maps_with_r5/docs/introduction.html. Quintessential resource for reference on charts and plots in R: https://www.r-graph-gallery.com/index.html
CHAPTER 2. Geospatial operations on raster/vector data
2.1. Raster resampling
We will demonstrate how to change the resolution of raster files using raster::aggregate
(fine to coarse), raster::disaggregate
(coarse to fine), and resampling values from one raster to another using raster::resample
function. We will use global aridity and soil moisture rasters for this purpose.
## [1] 0.373444 0.373444
#~~ Aggregate raster to coarser resolution
SMcoarse = raster::aggregate(sm, # Soil moisture raster
fact=10, # Aggregate by x 2
fun=mean) # Function used to aggregate values
plot(SMcoarse)
## [1] 3.73444 3.73444
#~~ Disaggregate raster to finer resolution
SMfine = raster::disaggregate(sm, # Soil moisture raster
fact=3,
method='bilinear') # Interpolation method. Option:""
res(SMfine)
## [1] 0.1244813 0.1244813
#~~ Raster resampling
# Import global aridity raster
aridity=raster("./SampleData-master/raster_files/aridity_36km.tif")
# Plot aridity map
plot(aridity,
col=mypal,
interpolate=TRUE,
lab.breaks=c(0:5),
breaks=c(0:5),
main="Global aridity map [1=driest, 5= wettest]")
# Resample aridity raster to coarse resolution
aridityResamp=raster::resample(aridity, # Original raster
SMcoarse, # Target resolution raster
method='ngb') # bilinear or ngb (nearest neighbor)
plot(aridityResamp,
col=mypal,
lab.breaks=c(0:5),
breaks=c(0:5),
main="Resampled aridity to coarser resolution")
2.2. Raster summary statistics
Arithmetic operations a.k.a Arith-methods (+, -, *, /, ^, %%, %/%) on rasters closely resemble simple vector-like operations. More details on Arith-methods can be found here: https://rdrr.io/cran/raster/man/Arith-methods.html. We will use cellStats
function to apply summary statistics and user-defined operations on cells of a raster.
# Simple arithmetic operations
sm2=sm*2
print(sm2) # Try sm2=sm*10, or sm2=sm^2 and see the difference in sm2 values
## class : RasterLayer
## dimensions : 456, 964, 439584 (nrow, ncol, ncell)
## resolution : 0.373444, 0.373444 (x, y)
## extent : -180, 180, -85.24595, 85.0445 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : SMAP_SM
## values : 0.03999996, 1.753352 (min, max)
## 25% 75%
## 0.09521707 0.29220164
# We can use cellStats function to apply user defined operation on rasters.
cellStats(sm, mean) # Try modal, median, min etc.
## [1] 0.209402
# User-defined statistics by defining own function
quant_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
quantile(x, probs = c(0.25, 0.75), na.rm=TRUE)
}
cellStats(sm, quant_fun) # 25th, and 75th percentile of each layer
## 25% 75%
## 0.09521707 0.29220164
2.3. Summarizing rasters using shapefles
Let’s explore using a spatial polygon/shapefile for summarizing a raster (in this case, global SMAP soil moisture) by using extract
function from the raster
library. We will also transform global aridity raster to a polygon using rasterToPolygons
function to find the mean soil moisture values for each aridity class.
First, we will use the IPCC shapefile to summarize the soil moisture raster.
###############################################################
#~~~ PART 2.3.1: Using shapefile to summarize a raster
sm_IPCC_df=raster::extract(sm, # Raster to be summarized
IPCC_shp, # Shapefile/ polygon to summarize the raster
df=TRUE, # Gives the summary statistics as a dataframe
fun=mean, # Desired statistic
na.rm=TRUE)# Ignore NA values? TRUE=yes!
head(sm_IPCC_df)
## ID SMAP_SM
## 1 1 0.2545766
## 2 2 0.3839087
## 3 3 0.2345457
## 4 4 0.3788003
## 5 5 0.2383580
## 6 6 0.1475145
###############################################################
#~~~ PART 2.3.2: Extract cell values for each region
sm_IPCC_list=raster::extract(sm, # Raster to be summarized
IPCC_shp, # Shapefile/ polygon to summarize the raster
df=FALSE, # Returns a list
fun=NULL, # fun=NULL will output cell values within each region
na.rm=TRUE) # Ignore NA values? yes!
# Apply function on cell values for each region
sm_IPCC_mean=lapply(sm_IPCC_list,mean) # Returns a list of regional means
sm_IPCC_mean=purrr::map(sm_IPCC_list,~ mean(.x, na.rm=TRUE)) # Returns a list of regional means
#~~ Try user defined function
myfun=function (y){return(mean(y, na.rm=TRUE))} # User defined function for calculating means
#~ Implement function using lapply and map
sm_IPCC_mean=lapply(sm_IPCC_list,myfun) # Returns a list of regional means
sm_IPCC_mean=purrr::map(sm_IPCC_list,~ myfun(.x)) # Returns a list of regional means
sm_IPCC_mean=unlist(sm_IPCC_mean) # Unlist to return a vector
head(sm_IPCC_mean) # Is this the same as the previous result?
## [1] 0.2545766 0.3839087 0.2345457 0.3788003 0.2383580 0.1475145
2.4. DIY: Summarize raster using classified raster
In the next example, we will convert global aridity raster into a polygon based on aridity classification using rasterToPolygons
function. Global aridity raster has 5 classes with 5 indicating humid and 1 indicating hyper-arid climate. We will use this polygon to extract values from the raster and summarize soil moisture for each aridity class.
###############################################################
#~~~ PART 2.2.3: Convert a raster to a shapefile
aridity=raster("./SampleData-master/raster_files/aridity_36km.tif") #Global aridity
# Convert raster to shapefile
# NOTE: rgeos package is needed to "dissolve" raster features to polygon
# install.packages("rgeos")
library(rgeos)
arid_poly=raster::rasterToPolygons(aridity, # Input raster
na.rm=TRUE, # Ignore NA values
digits=12, # No. of digits to round the coordinates
dissolve=TRUE) # FALSE will give boundaries for each cell
plot(arid_poly, col=arid_poly$aridity_36km) # Plot aridity polygon
Summarize values of SMAP soil moisture raster for aridity classes:
sm_arid_df=raster::extract(sm, # Raster to be summarized
arid_poly, # Shapefile/ polygon to summarize the raster
df=TRUE, # Gives the summary statistics as a dataframe
fun=mean, # Desired statistic
na.rm=TRUE)# Ignore NA values? yes!
# Lets plot the climate-wise mean of surface soil moisture
plot(sm_arid_df,xaxt = "n",
xlab="Aridity", ylab="Soil moisture",
type="b",
col="blue",
main="Climate-wise mean of surface soil moisture")
axis(1, at=1:5, labels=c("Hyper-arid", "Arid", "Semi-Arid","Sub-humid","Humid"))
CHAPTER 3. Data-cubes: RasterStack/RasterBrick
Whats is a RasterStack/ RasterBrick
?
raster
) is a collection of RasterLayer objects with the same spatial extent and resolution.
Raster brick exist entirely in memory takes longer time to create a raster brick compared to Raster Stack. However, various raster operations are significantly faster for RasterBrick over RasterStacks.
3.1. Create, subset and visualize RasterStack/ RasterBrick
We will create multilayer raster stacks/ bricks for NDVI
and SMAP SM
from the sample dataset and demonstrate several applications and operations. Working with Raster stack/brick are very similar to working with regular arrays or lists.
###############################################################
#~~~ PART 3.1.1: Create and plot NDVI RasterBrick/ Stack
library(raster)
library(fields)
library(unikn)
# Location of the NDVI raster files
ndvi_path="./SampleData-master/raster_files/NDVI/" #Specify location of NDVI rasters
# List of all NDVI rasters
ras_list=list.files(ndvi_path,pattern='*.tif',full.names=TRUE)
head(ras_list)
## [1] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-01-01.tif"
## [2] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-01-17.tif"
## [3] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-02-02.tif"
## [4] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-02-18.tif"
## [5] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-03-05.tif"
## [6] "./SampleData-master/raster_files/NDVI/NDVI_resamp_2016-03-21.tif"
Let’s import NDVI rasters from the location and store as a RasterBrick. We will use lapply
, map
, and for
loop using addlayer
function to import rasters from paths stored ras_list
. We will also learn how to add shapefiles to the background of all plots made using RasterBrick/Stack.
Expert Note: For memory intensive rasters to brick formation, the “for” loop method may be better to avoid memory bottleneck.
# Method 1: Use lapply to create raster layers from the raster location
ras_stack = lapply(paste(ras_list, sep = ""), raster)
ras_brick = brick(ras_stack) # Working on a RasterBrick is faster than stacks!!!
# Method 2: Using pipes to create raster layers from the raster location
library(tidyr) # For piping
ras_stack = ras_list %>% purrr::map(~ raster(.x)) # Import the raster
ras_brick = brick(ras_stack) # Convert RasterStack to RasterBrick
# Method 3: Using for loop to create raster layers from the raster location
ras_stack=stack()
for (nRun in 1:length(ras_list)){
ras_stack=addLayer(ras_stack,raster(ras_list[[nRun]]))
}
ras_brick = brick(ras_stack) # Convert RasterStack to RasterBrick
# Check dimension of RasterBrick
dim(ras_brick) #[x: y: z]- 23 raster layers with 456 x 964 cells
## [1] 456 964 23
## [1] 23
## [1] "NDVI_resamp_2016.01.01" "NDVI_resamp_2016.01.17" "NDVI_resamp_2016.02.02"
## [4] "NDVI_resamp_2016.02.18" "NDVI_resamp_2016.03.05" "NDVI_resamp_2016.03.21"
## [7] "NDVI_resamp_2016.04.06" "NDVI_resamp_2016.04.22" "NDVI_resamp_2016.05.08"
## [10] "NDVI_resamp_2016.05.24" "NDVI_resamp_2016.06.09" "NDVI_resamp_2016.06.25"
## [13] "NDVI_resamp_2016.07.11" "NDVI_resamp_2016.07.27" "NDVI_resamp_2016.08.12"
## [16] "NDVI_resamp_2016.08.28" "NDVI_resamp_2016.09.13" "NDVI_resamp_2016.09.29"
## [19] "NDVI_resamp_2016.10.15" "NDVI_resamp_2016.10.31" "NDVI_resamp_2016.11.16"
## [22] "NDVI_resamp_2016.12.02" "NDVI_resamp_2016.12.18"
# Subset raster stack/brick (notice the double [[]] bracket and similarity to lists)
sub_ras_brick=ras_brick[[c(1,3,5,10,12)]] #Select 1st, 3rd, 5th, 10th and 12th layers
# Plot first 4 elements of NDVI raster brick
addCoastlines=function(){plot(coastlines, add=TRUE)} #Function to add shapefile to a raster plot
plot(sub_ras_brick[[1:4]], # Select raster layers to plot
col=brewer.pal(10,"Spectral"),# Specify colormap
interpolate=TRUE, # Interpolate the output map
asp=NA, # Aspect ratio= NA
addfun=addCoastlines) # Add coastline to each layer
DIY: This time, let’s use levelplot
from rasterVis
library to plot RasterBrick
layers.
mapTheme = rasterTheme(region = brewer.pal(10,"Spectral")) # Set raster theme (color etc.)
my.at=seq(-1,1,0.1) # Set colorkey/ colorbar sequence
myColorkey = list(at=my.at, space="right") # Set colorkey/colorbar to the right
plt= rasterVis::levelplot(sub_ras_brick[[1:4]], # Select first 4 raster layers to plot
margin = F,
pretty=TRUE,
layout=c(2,2), # Layout of the plots
row.values=2, # Number of rows
par.settings = mapTheme, # Add user define theme
colorkey=myColorkey, # Add user defined
main="NDVI")
# Add coastline to the plots
# NOTE: We use "layer" function from "latticeExtra" package.
# "ggplot2" has a "layer" function too, but with different application.
plt=plt + latticeExtra::layer(sp.lines(coastlines, col="black", lwd=0.1))
print(plt)
3.2. Geospatial operations on RasterStack/ RasterBrick
In this section, we will demonstrate application of several operations on RasterStack/RasterBrick for e.g. extracting time series at user-defined locations, spatial data extraction using spatial polygons/raster, crop and masking of raster brick/stack.
3.2.1. Time-series extraction from RasterBrick/Stack
We will demonstrate three ways of extracting time series information from raster stack/brick.
Method 1: By converting raster brick to a spatial data frame using as.data.frame
or raster::rasterToPoints
functions
###############################################################
#~~~~ Method 1.1: Convert raster brick to spatial data frame
spdf = as(ras_brick, "SpatialPixelsDataFrame")
df_spdf = as.data.frame(spdf) # "x" and "y" in the df are long and lat
df_spdf[1:3,1:6] # Notice values of all raster layers are extracted for each cell
## NDVI_resamp_2016.01.01 NDVI_resamp_2016.01.17 NDVI_resamp_2016.02.02
## 1 NA -0.09864961 -0.06309947
## 2 NA -0.11487599 -0.07173216
## 3 NA -0.09752892 -0.07198682
## NDVI_resamp_2016.02.18 NDVI_resamp_2016.03.05 NDVI_resamp_2016.03.21
## 1 -0.03692058 -0.03954601 -0.03983472
## 2 -0.04443348 -0.04195234 -0.03798087
## 3 -0.04526810 -0.04376680 -0.03974375
###############################################################
#~~~~ Method 1.2: Use rasterToPoints function
r.pts = rasterToPoints(ras_brick, spatial=TRUE)
df_spdf = data.frame(long=coordinates(r.pts)[,1],
lat=coordinates(r.pts)[,2],
r.pts@data)
df_spdf[1:3,1:6] # Notice values of all raster layers are extracted for each cell
## long lat NDVI_resamp_2016.01.01 NDVI_resamp_2016.01.17
## 1 -167.4896 66.18558 NA -0.09864961
## 2 -167.1162 66.18558 NA -0.11487599
## 3 -166.7427 66.18558 NA -0.09752892
## NDVI_resamp_2016.02.02 NDVI_resamp_2016.02.18
## 1 -0.06309947 -0.03692058
## 2 -0.07173216 -0.04443348
## 3 -0.07198682 -0.04526810
Method 2: Using raster::extract
function to extract time series data for specific location(s)
###############################################################
#~~~~ Method 2.1: Extract values for a single location
# User defined lat and long
Long=-96.33; Lat=30.62
# Extract time series for the location
ndvi_val=raster::extract(ras_brick,
SpatialPoints(cbind(Long,Lat)), # lat-long as spatial locations
method='bilinear') # or "simple" for nearest neighbor
# Create a dataframe using the dates (derived from raster layer names) and extracted values
ndvi_ts=data.frame(Time=c(1:nlayers(ras_brick)), # Sequence of retrieval time
NDVI=as.vector(ndvi_val)) #NDVI values
# Try changing Time to as.Date(substr(colnames(ndvi_val),13,22), format = "%Y.%m.%d")
# Plot NDVI time series extracted from raster brick/stack
plot(ndvi_ts, type="l", col="maroon", ylim=c(0.2,0.8))
Let us use Method 2 (described above) for multiple locations.
#~~~~ Method 2.2: Extract values for multiple locations
# Import sample locations from contrasting hydroclimate
library(readxl)
loc= read_excel("./SampleData-master/location_points.xlsx")
print(loc)
## # A tibble: 3 x 4
## Aridity State Longitude Latitude
## <chr> <chr> <dbl> <dbl>
## 1 Humid Louisiana -92.7 34.3
## 2 Arid Nevada -116. 38.7
## 3 Semi-arid Kansas -99.8 38.8
# Extract time series using "raster::extract"
loc_ndvi=raster::extract(ras_brick,
#Convert lat -long to spatial locations
SpatialPoints(cbind(loc$Longitude,loc$Latitude)),
# Use bilinear interpolation (or simple) option
method='bilinear')
The rows in loc_ndvi
variable corresponds to the points specified by the user, namely humid, arid and semi-arid locations. Let’s plot the extracted NDVI for three locations to look for patterns (influence on climate on vegetation).
# Create a new data frame with dates of retrieval and NDVI for different hydroclimates
ndvi_locs = data.frame(Date=as.Date(substr(colnames(ndvi_val),13,22),format = "%Y.%m.%d"),
Humid = as.vector(loc_ndvi[1,]), #Location 1
Arid = as.vector(loc_ndvi[2,]), #Location 2
SemiArid = as.vector(loc_ndvi[3,])) #Location 3
# Convert data frame in "long" format for plotting using ggplot
library(tidyr)
df_long=ndvi_locs %>% gather(Climate,NDVI,-Date)
#"-" sign indicates decreasing order
head(df_long,n=4)
## Date Climate NDVI
## 1 2016-01-01 Humid 0.5919486
## 2 2016-01-17 Humid 0.5762772
## 3 2016-02-02 Humid 0.5570874
## 4 2016-02-18 Humid 0.5752835
# Plot NDVI for different locations (hydroclimates). Do we see any pattern??
library(ggplot2)
l = ggplot(df_long, # Dataframe with input Dataset
aes(x=Date, # X-variable
y=NDVI, # Y-variable
group=Climate)) + # Group by climate
geom_line(aes(color=Climate))+ # Line color based on climate variable
geom_point(aes(color=Climate)) # Point color based on climate variable
print(l)
Method 3: Use rowFromY
or rowFromX
to find row and column number of the raster for the location.
###############################################################
#~~~~ Method 3: Retrieve values using cell row and column number
row = rowFromY(ras_brick[[1]], Lat)
col = colFromX(ras_brick[[1]], Long)
ras_brick[row,col][1:5]
## [1] 0.5614925 0.5193725 0.4958409 0.5731566 0.6259534
3.2.2. Spatial extraction/summary from RasterBrick/Stack
Let’s try extracting values of RasterBrick (for all layers simultaneously) based on polygon features using extract
function.
#~~~~ Method 1: Extract values based on spatial polygons
IPCC_shp = rgdal::readOGR("./SampleData-master/CMIP_land/CMIP_land.shp",verbose = FALSE)
ndvi_sp = raster::extract(ras_brick,IPCC_shp, fun=mean, na.rm=T, df=TRUE, method='bilinear')
head(ndvi_sp,n=3)[1:5]
## ID NDVI_resamp_2016.01.01 NDVI_resamp_2016.01.17 NDVI_resamp_2016.02.02
## 1 1 -0.08325353 -0.05983509 -0.04461097
## 2 2 0.02765715 0.02737234 0.03172858
## 3 3 0.21922899 0.22114161 0.21826435
## NDVI_resamp_2016.02.18
## 1 -0.03795020
## 2 0.03084664
## 3 0.27331317
We can also use a classified raster like aridity data we previously used, to extract values corresponding to each class of values using lapply
and selective filtering. The two rasters must have same extent and resolution.
#~~~~ Method 2: Extract values based on another raster
aridity = raster("./SampleData-master/raster_files/aridity_36km.tif")
climate_ndvi = list()
# Extracts the time series of NDVI for pixels for each climate region
climate_ndvi = lapply(list(1,2,3,4,5),function(x) (na.omit((ras_brick[aridity==x]))))
# Calculate and store mean NDVI for each climate region
climate_ndvi_mean = lapply(list(1,2,3,4,5),function(x) (mean(ras_brick[aridity==x], na.rm=TRUE)))
plot(unlist(climate_ndvi_mean),type = "b")
3.2.3. Crop and mask RasterBrick/RasterStack
We will now crop
and mask
raster brick using shapefile from the disk or from imported shapefile from raster::getData
. crop
function clips the raster to the extent
provided. mask
removes the area outside the provided shapefile. Alternatively, we can import shapefile from disk using rgdal::readOGR
function.
# Import polygons for polygons for USA at level 1 i.e. state from disk
library(raster)
state_shapefile = rgdal::readOGR("./SampleData-master/USA_states/cb_2018_us_state_5m.shp",verbose = FALSE)
# Alternatively, try downloading level 1 (i.e. state level) USA polygons from GADM dataset online
# state_shapefile = raster::getData("GADM",country="USA",level=1) # Depends host server availability
# Print variable names
names(state_shapefile)
## [1] "STATEFP" "STATENS" "AFFGEOID" "GEOID" "STUSPS" "NAME" "LSAD"
## [8] "ALAND" "AWATER"
## [1] "Nebraska"
## [2] "Washington"
## [3] "New Mexico"
## [4] "South Dakota"
## [5] "Texas"
## [6] "California"
## [7] "Kentucky"
## [8] "Ohio"
## [9] "Alabama"
## [10] "Georgia"
## [11] "Wisconsin"
## [12] "Oregon"
## [13] "Pennsylvania"
## [14] "Mississippi"
## [15] "Missouri"
## [16] "North Carolina"
## [17] "Oklahoma"
## [18] "West Virginia"
## [19] "New York"
## [20] "Indiana"
## [21] "Kansas"
## [22] "Idaho"
## [23] "Nevada"
## [24] "Vermont"
## [25] "Montana"
## [26] "Minnesota"
## [27] "North Dakota"
## [28] "Hawaii"
## [29] "Arizona"
## [30] "Delaware"
## [31] "Rhode Island"
## [32] "Colorado"
## [33] "Utah"
## [34] "Virginia"
## [35] "Wyoming"
## [36] "Louisiana"
## [37] "Michigan"
## [38] "Massachusetts"
## [39] "Florida"
## [40] "United States Virgin Islands"
## [41] "Connecticut"
## [42] "New Jersey"
## [43] "Maryland"
## [44] "South Carolina"
## [45] "Maine"
## [46] "New Hampshire"
## [47] "District of Columbia"
## [48] "Guam"
## [49] "Commonwealth of the Northern Mariana Islands"
## [50] "American Samoa"
## [51] "Iowa"
## [52] "Puerto Rico"
## [53] "Arkansas"
## [54] "Tennessee"
## [55] "Illinois"
## [56] "Alaska"
# Exclude states outside of CONUS
conus = state_shapefile[!(state_shapefile$NAME %in% c("Alaska","Hawaii","American Samoa",
"United States Virgin Islands","Guam", "Puerto Rico",
"Commonwealth of the Northern Mariana Islands")),]
plot(conus)
# Crop rasters using USA polygon
usa_crop = crop(ras_brick, extent(conus)) # Crop
plot(usa_crop[[1:4]], # Cropped rasters
col=mypal, # Color pal.
addfun=function(){plot(conus, add=TRUE)} # Add shapefile to the plots
)
# Mask rasters using USA polygon
ndvi_mask_usa = raster::mask(usa_crop,conus) # Mask
plot(ndvi_mask_usa[[1:4]], # Masked rasters
col=mypal, # Color pal.
addfun=function(){plot(conus, add=TRUE)} # Add shapefile to the plots
)
Let’s crop raster for selected states.
# Mask raster for selected states
states = c('Colorado','Texas','New Mexico')
# Subset shapefile for selected attributes
state_plot = state_shapefile[(state_shapefile$NAME %in% states),]
states_trim = crop(ras_brick, extent(state_plot)) # Crop raster
ndvi_mask_states = raster::mask(states_trim,state_plot) # Mask
# Plot raster and shapefile
plot(ndvi_mask_states[[1]],col=mypal);
plot(state_plot, add=TRUE)
3.3. Layer-wise operations on RasterStack/ RasterBrick
We will demonstrate come layer-wise arithmetic operations, summary statistics and data transformation on RasterStacks/RasterBricks.
Arithmetic operations a.k.a Arith-methods
(+, -, *, /, ^, %%, %/%) on RasterStacks/RasterBricks closely resemble simple vector-like operations. More details on Arith-methods
can be found here: https://rdrr.io/cran/raster/man/Arith-methods.html. For layer-wise summary statistics and user-defined operations, we will use cellStats
function.
Let’s try some layer-wise cell-statistics:
###############################################################
#~~~ PART 3.3.1: Layer-wise cell-statistics
# Layer-wise Mean
cellStats(sub_ras_brick, mean) # Mean of each raster layer. Try modal, median, min etc.
## NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05
## 0.2840846 0.2948298 0.3057182
## NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25
## 0.4618550 0.4997354
## 25% 75%
## NDVI_resamp_2016.01.01 0.07200430 0.5136818
## NDVI_resamp_2016.02.02 0.09046099 0.5258916
## NDVI_resamp_2016.03.05 0.10357170 0.5185570
## NDVI_resamp_2016.05.24 0.23308019 0.6722358
## NDVI_resamp_2016.06.25 0.25382688 0.7282458
# User-defined statistics by defining own function
quant_fun = function(x, na.rm=TRUE) # Remember to add "na.rm" option
{quantile(x, probs = c(0.25, 0.75), na.rm=TRUE)}
cellStats(sub_ras_brick, quant_fun) # 25th, and 75th percentile of each layer
## NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05
## 25% 0.07188153 0.0844293 0.09917563
## 75% 0.51358944 0.5177374 0.51130731
## NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25
## 25% 0.2367169 0.2600064
## 75% 0.6695234 0.7303487
# Custom function for mean, variance and skewness
my_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
meanVal=mean(x, na.rm=TRUE) # Mean
varVal=var(x, na.rm=TRUE) # Variance
skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
output=c(meanVal,varVal,skewVal) # Combine all statistics
return(output) # Return output
}
cellStats(sub_ras_brick, my_fun) # Mean, Variance and skewness of each layer
## NDVI_resamp_2016.01.01 NDVI_resamp_2016.02.02 NDVI_resamp_2016.03.05
## [1,] 0.28408458 0.29482983 0.30571816
## [2,] 0.07224034 0.06677124 0.06171894
## [3,] 0.54099728 0.50752435 0.50182848
## NDVI_resamp_2016.05.24 NDVI_resamp_2016.06.25
## [1,] 0.46185503 0.49973536
## [2,] 0.05944947 0.06587261
## [3,] -0.27219413 -0.33922138
Global cell-statistics can be calculated by converting raster stack to a vector and then using regular statistics functions.
###############################################################
#~~~ PART 3.3.2: Global cell-statistics
min_val=min(raster::minValue(sub_ras_brick))
max_val=max(raster::maxValue(sub_ras_brick))
print(c(min_val,max_val))
## [1] -0.20000 0.95065
# Or by vectorizing the RasterBrick and finding statistics
min_val=min(as.vector(sub_ras_brick), na.rm=T)
max_val=max(as.vector(sub_ras_brick), na.rm=T)
print(c(min_val,max_val))
## [1] -0.20000 0.95065
# Another example of statistics of vectorized RasterBrick
quant=quantile(as.vector(sub_ras_brick), c(0.01,0.99), na.rm=TRUE) #1st and 99th percentiles
print(quant)
## 1% 99%
## -0.05663428 0.84776928
Layer-wise arithmetic operations on RasterStack/RasterBrick:
###############################################################
#~~~ PART 3.3.3: Layer-wise operations on RasterStack/RasterBrick
# Arithmetic operations on RasterBrick/ RatserStacks are same as lists
add = sub_ras_brick+10 # Add a number to raster layers
mult = sub_ras_brick*5 # Multiply a number to raster layers
subset_mult = sub_ras_brick[[1:3]]*10 # Multiply a number to a subset of raster layers
# Data filtering based on cell-value
filter_brick = sub_ras_brick # Create a RasterBrick to operate on
filter_brick[filter_brick<0.5] = NA # Assign NA to any value less than 0.5
# Let's plot the filtered rasters
plot(filter_brick,
asp=NA, # Aspect ratio. NA= No white space
colNA="gray95", # Background color
nr=2, # Number of rows to arrange the plots
addfun= addCoastlines) # Add coastlines to each plot
# Layer-wise Log-transformation
log_ras_brick=log(sub_ras_brick)
# Normalize raster layers to [0,1] based on min and max of raster brick/stack
norm_brk=(sub_ras_brick-min_val)/(max_val-min_val) # Notice that the values are between [0,1]
# Plot in Robinson projection
source("./SampleData-master/functions/robin_ras.R") # User defined function
ndvi_robin_plot = robin_ras(norm_brk[[1]], # Select first raster
map_col=rev(mypal), # Colormap
lim=c(0,1), # Set min and max z-limits
brk=c(0, 0.2, 0.4, 0.6, 0.8, 1),# Set breaks in colormap
name="Normalized NDVI") # Name of the plot
ndvi_robin_plot # print the reprojected plot
3.4. Cell-wise operations with stackApply
and calc
We will now carry-out cell-wise operations on raster stack/brick. These functions carry operations on each cell of a RasterStack/RasterBrick for all layers.
While we can use some common summary statistical functions (mean/ median, min, max etc) directly, stackApply
and calc
will be used for more complex operations.
# Mean of all cells across layers
mean_ras = mean(sub_ras_brick, na.rm=TRUE) #Try modal, median, min etc.
# Approximate NA values by interpolating between Layers
approx_brick = approxNA(sub_ras_brick,method="linear") #Try "constant"
3.4.1. Stackapply
What if we have a more complex function to apply to the cells? To apply user defined functions, raster::stackApply
and raster::calc
can come handy.
Expert Note: raster::stackApply
is used when the output of the function is a vector of size 1. The result of using raster::stackApply
on a RasterStack/RasterBrick would be a single raster. We can apply raster::stackApply
function on subsets of original RasterStack/RasterBrick by using the option indices
. The output is saved as a RasterStack, with each index
class generating its own output layer.
raster::calc
is more versatile and can handle functions with output of length>1. raster::calc
can be easily parallelized with the help of clusterR package. Let’s see what this means!!
#~~ A user-defined function for mean, variance and skewness
my_fun = function(x, na.rm=TRUE){ # Remember to add "na.rm" option
meanVal=mean(x, na.rm=TRUE) # Mean
return(meanVal) # Return output
}
#~~ Using "stackApply" to apply "quant_fun"
# "Indices" specifies the group/subset of layers on which to apply the "fun".
# Same index for ALL layers will mean function applies to ALL layers.
stat_ras = raster::stackApply(sub_ras_brick,
fun=my_fun, # User defined function
indices=rep(1, nlayers(sub_ras_brick))) # function applied to All layers
# Lets plot the two output rasters
plot(stat_ras, # Layers to plot
asp=NA, # Aspect ratio. NA= No white space
colNA="gray95", # Background color
nr=2, # Number of rows to arrange the plots
addfun=addCoastlines) # Add coastlines to each plot
# Now, let's have the first two layers as group 1 and the next three as group 2.
# Function will be applied to each group separately and 2 layers of output will be generated.
stat_ras = raster::stackApply(sub_ras_brick,
fun=my_fun,
indices=c(1,1,2,2,2))
names(stat_ras) = c("Mean_of_rasters_1_to_2", "Mean_of_rasters_3_to_5")
# Two layers are formed, one for each group of indices
# Lets plot the two output rasters
plot(stat_ras, # Layers to plot
asp=NA, # Aspect ratio. NA= No white space
colNA="gray95", # Background color
nr=2, # Number of rows to arrange the plots
addfun=addCoastlines) # Add coastlines to each plot
3.4.2. Calc
raster::calc
can handle a modification of the user defined function,with several outputs. Let’s see what it means. We will change the my_fun
slightly to calculate mean, variance and skewness of the data (length of the output =3).
#~~ Custom function for mean, variance and skewness
my_fun = function(x, na.rm=TRUE){ # Notice:"na.rm" not required
meanVal=mean(x, na.rm=TRUE) # Mean
varVal=var(x, na.rm=TRUE) # Variance
skewVal=moments::skewness(x, na.rm=TRUE) # Skewness
output=c(meanVal,varVal,skewVal) # Combine all statistics
return(output) # Return output
}
# Using "calc" function for user defined operations
stat_brk=calc(sub_ras_brick, # RasterStack/ RasterBrick
fun = my_fun, # Function to be used
progress='text') # Progress bar
nlayers(stat_brk) # Notice the number of output layers = 3
## [1] 3
# Name corresponding layers
names(stat_brk)=c("Mean", "Variance", "Skewness")
# Plot output raster
plot(stat_brk, col=mypal)
CHAPTER 4. Parallel computation for geospatial analysis
This would be a good time to introduce ourselves to NetCDF
(Network Common Data Form) data format. Most large-scale climate/ satellite datasets are commonly provided in NetCDF
or HDF
(Hierarchical Data Format), which are machine-independent data formats for creation, access, and sharing of array-oriented scientific data. In our context, NetCDF
or HDF
files contain 3-D gridded geospatial dataset with information on lat-long, extent, resolution and time. These files can be imported in R as RasterStack/Bricks.
4.1. Cellwise implimentation of functions
4.1.1. Apply custom function to pixel time series
Once we have imported the NetCDF
file as raster brick, we wil apply a slightly modified version of previously used function my_fun
for calculating mean, variance and skewness for time series data for each cell in parallel. We will use raster::clusterR
function to apply my_fun
on RasterBrick in parallel.
Expert Note: For seamless implementation of function in parallel mode, care must be taken that all necessary are accessible to ALL cores and error exceptions are handles appropriately. We will modify my_fun
slightly to highlight what it means in practice.
- We will convert input
x
to a numeric array - We will remove
NA
values from dataset before calculation - We will use
minSamp
to fix minimum sample counts for calculation - We will use
tryCatch
to handle error exceptions
#~~ We will make some changes in the custom function for mean, variance and skewness
minSamp=50 # Minimum assured samples for statistics
my_fun = function(x, na.rm=TRUE){
smTS=as.numeric(as.vector(x)) # Convert dataset to numeric array
smTS=as.numeric(na.omit(smTS)) # Omit NA values
# Implement function with trycatch for catching exception
tryCatch(if(length(smTS)>minSamp) { # Apply minimum sample filter
######## OPERATION BEGINS #############
meanVal=mean(smTS, na.rm=TRUE) # Mean
varVal=var(smTS, na.rm=TRUE) # Variance
skewVal=moments::skewness(smTS, na.rm=TRUE) # Skewness
output=c(meanVal,varVal,skewVal) # Combine all statistics
return(output) # Return output
######## OPERATION ENDS #############
} else {
return(rep(NA,3)) # If conditions !=TRUE, return array with NA
},error =function(e){return(rep(NA, 3))}) # If Error== TRUE, return array with NA
}
# Apply function to all grids in parallel
library(tictoc)
library(parallel)
numCores=detectCores()-1; # Always leave one node free for housekeeping tasks
beginCluster(n=numCores) # Begin cluster with n nodes
tic()
StatOut=clusterR(SMAPBrk, # RasterBrick
calc, # Apply *raster::calc* in parallel
args=list(fun=my_fun), # Export function to all nodes
export=c("minSamp"), # Export necessary variables to all cores
progress='text') # Progressbar as text
##
|
| | 0%
|
|===== | 8%
|
|=========== | 15%
|
|================ | 23%
|
|====================== | 31%
|
|=========================== | 38%
|
|================================ | 46%
|
|====================================== | 54%
|
|=========================================== | 62%
|
|================================================ | 69%
|
|====================================================== | 77%
|
|=========================================================== | 85%
|
|================================================================= | 92%
|
|======================================================================| 100%
##
## 3.4 sec elapsed
endCluster() # End open cluster
names(StatOut)=c("Mean", "Variance", "Skewness")
# Apply function using Calc to see the difference in computing time
tic();
StatOutCalc=calc(SMAPBrk,my_fun)
toc()
## 3.86 sec elapsed
4.2.1. Best practices for large-scale operations
- Try parallel operation on a smaller region before submitting large jobs. Pixel-wise implementation of the function can help identify errors in the code. Convert the cropped region into a data frame and apply function to time series of each cell. If your code throws error, troubleshoot carefully for the series which generates the error.
# Crop a small region (2x2 degrees)
e = extent( c(-100, -98, 43, 45)) #Xmin, Xmax, Ymin, Ymax
smapCrp=raster::crop(smapBrk, e) # Try "my_fun" over "smapCrp" with clusterR as shown above
# Pixel-wise test (for trouble shooting error sources)
smDF=raster::as.data.frame(smapCrp)
parList=list()
for (i in 1:nrow(smDF)){
print(i)
parList[i]=my_fun(smDF[i,])
}
- Use
tryCatch
carefully as it may suppress legitimate errors as well, generating spurious results. Test the codes for smaller region withouttryCatch
to test the robustness of your codes. - Use global assignment
<<-
instead of<-
or=
for variables called in a function within a funtion used withclusterR
.
Expert Note: Parallel computing may have some overheads upon creation and closing of clusters. The a significant improvement in computing times using parallel techniques would be visible for large jobs.
4.2. Layerwise implimentation of functions
4.2.1. RasterBrick as a lists
We will convert RasterBrick to a list of rasters and then we will apply my_fun
to each element of the list in parallel using future_lapply
.
## [1] 1188
## [1] 0.144378162 0.006185985 1.656570393
# Apply function in parallel for all layers
library(parallel)
library(future.apply)
plan(multiprocess, workers = detectCores() - 1)
outStat= future_lapply(rasList, my_fun)
# Test output for one layer
outStat[[100]]
## [1] 0.144378162 0.006185985 1.656570393
Expert Note: Special options can be used to remove memory bottleneck when using future_lapply
. ***
# Allocate 2.2 GB memory for computing
# 2200 MB*(1024^2) = 2306867200
options(future.globals.maxSize= 2097152000)
4.2.2. Blockwise summary of feature extracted data
In this section we will use a shapefile to extract cell values from a RasterBrick as a list using exact_extract
. Summary statistics will be calculates in parallel using my_fun
for dataset for each feature.
Expert Note: Function exactextractr::exact_extract
is faster and more suited for large applications compared to raster::extract
. Although both perform similar operation with little changes in output format
#~ Extract feature data as data frame
library(exactextractr)
library(sf)
library(sp)
featureData=exact_extract(SMAPBrk, # Raster brick
st_as_sf(conus), # Convert shapefile to sf (simple feature)
force_df = FALSE, # Output as a data.frame?
include_xy = FALSE, # Incluse cell lat-long in output?
fun = NULL, # Specify any function to apply for each feature extracted data
progress = TRUE) # Progressbar
##
|
| | 0%
|
|= | 2%
|
|=== | 4%
|
|==== | 6%
|
|====== | 8%
|
|======= | 10%
|
|========= | 12%
|
|========== | 14%
|
|=========== | 16%
|
|============= | 18%
|
|============== | 20%
|
|================ | 22%
|
|================= | 24%
|
|=================== | 27%
|
|==================== | 29%
|
|===================== | 31%
|
|======================= | 33%
|
|======================== | 35%
|
|========================== | 37%
|
|=========================== | 39%
|
|============================= | 41%
|
|============================== | 43%
|
|=============================== | 45%
|
|================================= | 47%
|
|================================== | 49%
|
|==================================== | 51%
|
|===================================== | 53%
|
|======================================= | 55%
|
|======================================== | 57%
|
|========================================= | 59%
|
|=========================================== | 61%
|
|============================================ | 63%
|
|============================================== | 65%
|
|=============================================== | 67%
|
|================================================= | 69%
|
|================================================== | 71%
|
|=================================================== | 73%
|
|===================================================== | 76%
|
|====================================================== | 78%
|
|======================================================== | 80%
|
|========================================================= | 82%
|
|=========================================================== | 84%
|
|============================================================ | 86%
|
|============================================================= | 88%
|
|=============================================================== | 90%
|
|================================================================ | 92%
|
|================================================================== | 94%
|
|=================================================================== | 96%
|
|===================================================================== | 98%
|
|======================================================================| 100%
## [1] 49
## [1] 5
# View(featureData[[5]]) # View the extracted data frame
nrow(featureData[[5]]) # No. pixels within selected feature
## [1] 694
Each row in featureData[[5]]
is the time series of cell values which fall within the boundary of feature number 5, i.e. Texas. Since exact_extract
function provides coverage_fraction
for each pixel in the output, we will make some minor change in the my_fun
function to remove this variable before calculating the statistics.
# Plot SM time series for first pixel by removing percentage fraction
cellTS=as.numeric(featureData[[5]][1,1:nlayers(SMAPBrk)])
#
plot(cellTS, type="l", xlab="Time", ylab="Soil moisture")
#~~ We will make another small change in the custom function for mean, variance and skewness
minSamp=50 # Minimum assured samples for statistics
my_fun = function(x, na.rm=TRUE){
xDF=data.frame(x) # Convert list to data frame
xDF=xDF[ , !(names(xDF) %in% 'coverage_fraction')] # Remove coverage_fraction column
xData=as.vector(as.matrix(xDF)) # Convert data.frame to 1-D matrix
smTS=as.numeric(na.omit(xData)) # Omit NA values
# Implement function with trycatch for catching exception
tryCatch(if(length(smTS)>minSamp) { # Apply minimum sample filter
######## OPERATION BEGINS #############
meanVal=mean(smTS, na.rm=TRUE) # Mean
varVal=var(smTS, na.rm=TRUE) # Variance
skewVal=moments::skewness(smTS, na.rm=TRUE) # Skewness
output=c(meanVal,varVal,skewVal) # Combine all statistics
return(output) # Return output
######## OPERATION ENDS #############
} else {
return(rep(NA,3)) # If conditions !=TRUE, return array with NA
},error =function(e){return(rep(NA, 3))}) # If Error== TRUE, return array with NA
}
Let’s apply my_fun
to extracted data for each feature.
## [1] 0.1727930 0.0095579 0.9876028
# Apply function in parallel for all layers
library(parallel)
library(snow)
library(future.apply)
plan(multiprocess, workers = detectCores() - 1)
outStat= future_lapply(featureData, my_fun)
# Test output for one feature
outStat[[5]] # Is this the same as before?
## [1] 0.1727930 0.0095579 0.9876028
# Extract each summary stats for all features from the output list
FeatureMean=sapply(outStat,"[[",1) # Extract mean for all features
FeatureVar=sapply(outStat,"[[",2) # Extract variance for all features
FeatureSkew=sapply(outStat,"[[",3) # Extract skewness for all features
# Let's place mean statistics as an attribute to the shapefile
conus$meanSM=FeatureMean
# Plot mean soil moisture map for CONUS
library(rcartocolor)
library(ggplot2)
library(sf)
library(sp)
mean_map=ggplot() +
geom_sf(data = st_as_sf(conus), # CONUS shp as sf object (simple feature)
aes(fill = meanSM)) + # Plot fill color= mean soil moisture
scale_fill_carto_c(palette = "BluYl", # Using carto color palette
name = "Mean SM", # Legend name
na.value = "#e9e9e9", # Fill values for NA
direction = 1)+ # To invert color, use -1
coord_sf(crs = 2163)+ # Reprojecting polygon 4326 or 3083
theme_void() + # Plot theme. Try: theme_bw
theme(legend.position = c(0.2, 0.1),
legend.direction = "horizontal",
legend.key.width = unit(5, "mm"),
legend.key.height = unit(4, "mm"))
mean_map
DIY: Variable extraction from netCDF
using ncdf4
package.
# Extract Time variable from NetCDF file
library(ncdf4)
nc = nc_open("./SampleData-master/SMAP_L3_USA.nc")
names(nc$var) # Variable names
names(nc$dim) # Dimension attributes
# Extract time variable
retrieval_date=ncvar_get(nc,"Time")
head(retrieval_date)
# Time to datetime format
retrieval_date=as.Date(retrieval_date, origin="1970-01-01")
nc_close(nc) # Close connection to NC file
# Add Retrieval dates as layer names in SMAPBrk
names(SMAPBrk)=retrieval_date
plot(SMAPBrk[[c(1,10,20,27)]])
CHAPTER 5. Supplementary material
5.1. Exporting RasterBrick to NetCDF Data
RasterBricks are significantly faster in computation compared to RasterStacks. Hence, its convenient to store the model outputs as NetCDFs which can be imported as rasterBricks later for further analysis. Here we will look at the method to export RasterBrick to NetCDF. The memory hack provided below is not always mandatory. However, it can be helpful if you are working with heavy datasets.
Expert Note: a) NetCDF write speeds are significantly slower on cloud platforms like GoogleDrive or OneDrive. Hence, its recommended to export NetCDF files to local disk for faster write speeds. b) Before writing NetCDF file, convert your RasterStack (if any) to RasterBrick for faster write speed.
rb=brick("./SampleData-master/SMAP_L3_USA.nc")
r=rb[[1]] # raster taken from a first layer of a stack
xy<-xyFromCell(r,1:length(r)) #matrix of logitudes (x) and latitudes(y)
lon=unique(xy[,1])
lat=unique(xy[,2])
#~ hack raster internal function
cs_orig <- raster:::.chunk
cs_hack <- function(x) getOption("rasterChunkSize")
assignInNamespace(".chunk", cs_hack, ns = "raster")
rasterOptions(memfrac = 0.9,
maxmemory=10e+11,
chunksize =10e+10,
todisk=TRUE)
# first we write the netcdf file to disk
writeRaster(rb,
"C:/Users/vinit/Documents/SMAP_L3_myExport.nc",
overwrite=TRUE,
format="CDF",
varname="soil_moisture",
varunit="[m3/m3]",
longname="SMAPL3-V7 SM, 2-day interpolated, 36KM",
xname="Longitude",
yname="Latitude",
zname='Time',
zunit='Date of retrieval',
progress="text")
#### Adding lat, long and time to netCDF
# note that we use write=TRUE so that we can change it
library(ncdf4)
nc = nc_open('SMAP_L3_myExport.nc', write = TRUE)
export_dates=seq(as.Date("2015-03-31"),as.Date("2021-09-30"),by=2)
# put additional attributes into dimension and data variables
zvals =as.Date(export_dates) # Add zvals as dates
ncdf4::ncvar_put(nc,'Time',zvals)
ncdf4::ncvar_put(nc, "Longitude", lon)
ncdf4::ncvar_put(nc, "Latitude", lat)
nc_close(nc)
#~~ undo the hack
assignInNamespace(".chunk", cs_orig, ns = "raster")
rasterOptions(default = TRUE)
5.2. Raster manipulation using terra
Terra
is a new library to manipulate geographic (spatial) data which improves upon the raster
package in terms of speed. In this workshop, we will focus on raster
package due to the availability of better legacy resources online.
In most cases, the use of functions from terra
library is similar to raster
. To make raster plots using terra
, we use terra::plot
. Raster objects in terra
package are imported using rast
function. Equivalent terra
function names for every raster
function can be found in Section XXIII in the following document:
https://www.rdocumentation.org/packages/terra/versions/1.2-5/topics/terra-package
As with raster
package, there are numerous customizable options for raster plots in terra
. Try type=interval
, or horiz=TRUE
, or change cex
to 1. Use breaks
argument only when type
is interval
or classes
.
# Import raster from the downloaded folder using rast
sm=terra::rast("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data
# Plot raster using plot function in terra package
terra::plot(sm, main="Soil Moisture")
smRast=terra::rast("./SampleData-master/raster_files/SMAP_SM.tif") # SMAP soil moisture data
terra::plot(smRast,
main="Terra: New fancy plot", # Title of the plot
# Color options
col=brewer.pal(10,"Spectral"), # Custom colormap
colNA="lightgray", # Color of cells with NA values
# Axis options
axes=TRUE, # Plot axes: TRUE/ FALSE
range=c(0,1), # Z-axis limit. Raster pkg uses zlim
# Legend options
type="continuous", # continuous, interval, classes
legend='right', # Legend position (topright,bottomleft, right etc)
#breaks=seq(0, 1, by=0.2), # Sequence from 0-1 with 0.2 increment
plg=list(horiz=FALSE, # Plot legend horizontally: TRUE/ FALSE
cex=0.7, # Legend text size
title = 'v/v', # Legend text
ncol=2), # No. of columns of legend labels
# Miscellaneous
asp = NA, # Aspect ratio; asp=NA fills plot to window
smooth=TRUE # Interpolate output plot
)
# To add grid to the plot
# grid(col="darkgray")
5.3. Updating R using RStudio
The operations in this tutorial are based on R version 4.1.1- Kick Things
and version 4.0.3 - Bunny-Wunnies Freak Out
. If necessary, update R from Rstudio using the updateR
function from the installr
package.
install.packages("installr")
library(installr)
updateR(keep_install_file=TRUE)
Follow Part 1: Data Visualization with R for a refresher on ggplot2, wordcloud, OfficeR and text mining.