In this class, we will study the second basic geospatial data type - raster. Below are the packages that need to be installed and loaded.
library(sf)
library(terra)
library(tidyverse)
Textbook link:https://bookdown.org/mcwimberly/gdswr-book/raster-geospatial-data---continuous.html
Raster data type is useful to visualize and study characteristics such as
and other weather or geographic conditions that continuously span the area.
A raster object can be created by calling the rast()
function and specifying an external image file as an argument.
In this example, a dataset of land surface temperature (LST) measured by the MODIS sensor on board the Terra satellite is imported into R as a raster object.
MODIS data, along with many other types of satellite remote sensing products, can be obtained from the United State Geological Survey’s Land Processes Distributed Active Archive Center (https://lpdaac.usgs.gov/).
lst <- rast("~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/MOD11A2_2017-07-12.LST_Day_1km.tif")
SpatRaster
classThe resulting object belongs to the SpatRaster
class.
Invoking the print()
function for a raster object provides
information about the dimensions of the grid, cell size, geographic
location, and other details.
class(lst)
## [1] "SpatRaster"
## attr(,"package")
## [1] "terra"
lst
## class : SpatRaster
## dimensions : 1110, 3902, 1 (nrow, ncol, nlyr)
## resolution : 0.009009009, 0.009009009 (x, y)
## extent : -104.4326, -69.27943, 30, 40 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : MOD11A2_2017-07-12.LST_Day_1km.tif
## name : MOD11A2_2017-07-12.LST_Day_1km
resolution: the size of each pixel in x (latitude) and y (longitude).
extent: the range of this image
coord.ref.: the coordinate reference system, which is used to locate the actual location on earth from the coordinates
In each layer of raster data, there stores a value correspoding to a characteristic of each pixel. In this example, the value is a measured temperature.
The summary()
function provides information about the
statistical distribution of raster values, with the size argument
specifying the number of randomly sampled pixels to include in the
summary.
summary(lst, size = 1e6)
## MOD11A2_2017.07.12.LST_Day_1km
## Min. : 0
## 1st Qu.: 0
## Median :14981
## Mean : 8321
## 3rd Qu.:15148
## Max. :16396
There are a number of helper functions, shown below, that can be used
to extract specific characteristics of a SpatRaster
object.
The ncell()
, nrow()
, ncol()
, and
nlyr()
functions return the numbers of grid cells, grid
rows, grid columns, and raster layers in the dataset. The
res()
function returns the grid cell size.
ncell(lst)
## [1] 4331220
nrow(lst)
## [1] 1110
ncol(lst)
## [1] 3902
nlyr(lst)
## [1] 1
res(lst)
## [1] 0.009009009 0.009009009
ext()
functionThe ext()
function returns a SpatExtent
object that contains the geographic coordinates of the raster extent.
SpatExtent
objects can be used to specify the extent of new
raster objects or to crop rasters to a new extent.
lst_ext <- ext(lst)
class(lst_ext)
## [1] "SpatExtent"
## attr(,"package")
## [1] "terra"
lst_ext[1:4]
## xmin xmax ymin ymax
## -104.43258 -69.27943 30.00000 40.00000
Each layer in a SpatRaster
object has a name. By
default, this name is the same as the data file that was used to create
the raster. Because file names are often long and difficult to
interpret, it is useful to specify more readable layer names. Layer
names can be extracted and assigned using the names()
function.
names(lst)
## [1] "MOD11A2_2017-07-12.LST_Day_1km"
names(lst)<-c("temperature")
names(lst)
## [1] "temperature"
In the LST dataset, missing data are coded as zeroes, and the raw
digital numbers must be modified by a scaling factor of 0.02 to convert
them into degrees Kelvin. The code below uses the ifel()
function, which is analogous to the ifelse()
function in
base R, to replace zero values in the LST dataset with NA
values.
lst <- ifel(lst == 0, NA, lst)
lst_c <- lst * 0.02 - 273.15
summary(lst_c, size = 1e6)
## temperature
## Min. : 9.1
## 1st Qu.:28.0
## Median :29.6
## Mean :30.3
## 3rd Qu.:31.5
## Max. :54.8
## NA's :452091
global()
functionThe global()
function can be used to generate
statistical summaries of the pixels in a raster dataset. As demonstrated
in Chapter 1, the na.rm argument must be TRUE to ignore missing data in
the calculations.
global(lst_c, fun = "mean", na.rm=T)
## mean
## temperature 30.28687
global(lst_c, fun = "min", na.rm=T)
## min
## temperature 1.67
global(lst_c, fun = "max", na.rm=T)
## max
## temperature 54.77
global(lst_c, fun = "sd", na.rm=T)
## sd
## temperature 3.652403
Raster objects can be exported using the writeRaster() function. The format of the exported image is specified using the format argument. Common output formats are GTiff (GeoTiff), ascii (ESRI ASCII text ), CDF, (NetCDF), and HFA (ERDAS Imagine). The overwrite=TRUE argument will replace existing files with the same name.
writeRaster(lst_c,
filename = "MOD11A2_2017-07-12.LST_Day_1km_DegC.tif",
filetype="GTiff", overwrite=TRUE )
The ggplot()
function can be used to make maps with
raster data, as well as vector data. However, ggplot() only works with
inputs in the form of a data frame and therefore cannot
be used to directly map a SpatRaster
object.
In most situations, data frames are very inefficient for storing raster data because
Fortunately, raster objects can be converted to data frames with one row per cell, columns for the x and y coordinates, and columns for one or more attribute that are associated with each cell.
Converting raster data to a data frame is straightforward, but requires several lines of code. To automate the process, a custom function can be used.
rasterdf <- function(x, aggregate = 1) {
resampleFactor <- aggregate
inputRaster <- x
inCols <- ncol(inputRaster)
inRows <- nrow(inputRaster)
# Compute numbers of columns and rows in the resampled raster
resampledRaster <- rast(ncol=(inCols / resampleFactor),
nrow=(inRows / resampleFactor),
crs = crs(inputRaster))
# Match to the extent of the original raster
ext(resampledRaster) <- ext(inputRaster)
# Resample data on the new raster
y <- resample(inputRaster,resampledRaster,method='near')
# Extract cell coordinates into a data frame
coords <- xyFromCell(y, seq_len(ncell(y)))
# Extract layer names
dat <- stack(values(y, dataframe = TRUE))
# Add names - 'value' for data, 'variable' for different
# layer names in a multilayer raster
names(dat) <- c('value', 'variable')
dat <- cbind(coords, dat)
dat
}
lst_df <- rasterdf(lst_c, aggregate = 3)
summary(lst_df)
## x y value variable
## Min. :-104.42 Min. :30.01 Min. :11.23 temperature:481370
## 1st Qu.: -95.64 1st Qu.:32.50 1st Qu.:28.05
## Median : -86.86 Median :35.00 Median :29.55
## Mean : -86.86 Mean :35.00 Mean :30.29
## 3rd Qu.: -78.07 3rd Qu.:37.50 3rd Qu.:31.53
## Max. : -69.29 Max. :39.99 Max. :54.77
## NA's :217276
geom_raster()
The geom_raster()
function is used with the x and y
arguments assigned to the corresponding columns in lst_df
.
The fill
argument is the value
column from
lst_df
, which contains the LST values.
ggplot(data = lst_df) +
geom_raster(aes(x = x,
y = y,
fill = value)) +
scale_fill_gradient(name = "Degrees C",
low = "yellow",
high = "red") +
coord_sf(expand = FALSE) +
labs(title = "LST-Adjusted Temperature (Terra Day)",
x = "longitude",
y = "latitude") +
theme(legend.position = "bottom")
The temperature map in the previous page looks nice but where is the plotting range exactly located? We can plot a US map to see that.
us_map <- st_read("~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/conterminous_us_states.shp", quiet = T)
ggplot() +
geom_raster(data = lst_df, aes(x = x,
y = y,
fill = value), alpha = 0.5) +
geom_sf(data = us_map, color = "grey50", fill = NA) +
scale_fill_gradient(name = "Degrees C",
low = "yellow",
high = "red") +
coord_sf(expand = FALSE) +
labs(title = "LST-Adjusted Temperature (Terra Day)",
x = "longitude",
y = "latitude")
The terra package includes functions for a modifying the geometry of
raster datasets. To clip out a smaller portion of the LST dataset, we
can use the crop()
function.
In this example, a bounding box of geographic coordinates (latitude
and longitude) is specified to create a SpatExtent
object
and crop the raster to the U.S. state of Georgia.
clipext <- ext(-86, -80.5, 30, 35.5)
class(clipext)
## [1] "SpatExtent"
## attr(,"package")
## [1] "terra"
lst_clip <- crop(lst_c, clipext)
lst_clip_df <- rasterdf(lst_clip)
Let’s also read the vector map for Georgia state.
ga_sf <- st_read(dsn = "~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/GA_SHP.shp", quiet = TRUE)
ggplot() +
geom_raster(data = lst_clip_df,
aes(x = x,
y = y,
fill = value)) +
geom_sf(data=ga_sf,
color = "grey50",
fill = NA, size = 0.5) +
scale_fill_gradient(name = "Degrees C",
low = "yellow",
high = "red") +
coord_sf(expand = FALSE) +
labs(title = "LST-Adjusted Temperature (Terra Day)",
x = "longitude",
y = "latitude")
The next examples will use forcings data from the North American Land Data Assimilation System (NLDAS).
These are gridded meterological data that can be obtained through the National Aeronautics and Space Administration’s Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/).
The data have a grid cell size of 0.125 x 0.125 degrees and are
stored as GeoTIFF files and can be imported as
SpatRaster objects
in the same way as the LST dataset.
path = "~/Documents/Fei Tian/Course_DAS522_Exploratory_Data_Analysis_and_Visualization_Spring2024/Datasets/Geospatial/gdswr_data/Chapter6/"
temp2012 <- rast(paste0(path,"NLDAS_FORA0125_M.A201207.002.grb.temp.tif"))
temp2013 <- rast(paste0(path,"NLDAS_FORA0125_M.A201307.002.grb.temp.tif"))
temp2014 <- rast(paste0(path,"NLDAS_FORA0125_M.A201407.002.grb.temp.tif"))
temp2015 <- rast(paste0(path,"NLDAS_FORA0125_M.A201507.002.grb.temp.tif"))
temp2016 <- rast(paste0(path,"NLDAS_FORA0125_M.A201607.002.grb.temp.tif"))
temp2017 <- rast(paste0(path,"NLDAS_FORA0125_M.A201707.002.grb.temp.tif"))
Multiple SpatRaster objects can be combined to produce multi-layer raster datasets. These objects are frequently used for space-time data because they can store rasters from different dates.
tempstack <- c(temp2012, temp2013, temp2014,
temp2015, temp2016, temp2017)
names(tempstack) <- c("July 2012", "July 2013", "July 2014",
"July 2015", "July 2016", "July 2017")
The code above creates two multi-layer SpatRaster
objects, one containing July temperature data from 2012-2017 and one
containing July precipitation data from 2012-2017. The
names()
function is used to specify the layer names for
each object.
When the global()
function is applied to multi-layer
rasters, each layer in the SpatRaster
object is summarized
individually.
global(tempstack, stat = "mean", na.rm=T)
## mean
## July 2012 297.8852
## July 2013 296.1879
## July 2014 295.7905
## July 2015 296.2655
## July 2016 296.5719
## July 2017 296.7618
From these summaries, is apparent that the temperature units are in
Kelvin. As was done with the MODIS LST data, these data can be converted
to Celsius using a simple R expression. The expression is automatically
applied to all the layers in the multi-layer SpatRaster
object.
tempstack <- tempstack - 273.15
global(tempstack, stat = "mean", na.rm=T)
## mean
## July 2012 24.73522
## July 2013 23.03793
## July 2014 22.64051
## July 2015 23.11549
## July 2016 23.42190
## July 2017 23.61176
Multi-layer rasters can be mapped using ggplot()
. First,
the SpatRaster
object is converted to a data frame using
the custom rasterdf()
function, and a dataset of U.S. state
boundaries is imported for use as an overlay.
tempstack_df <- rasterdf(tempstack)
summary(tempstack_df)
## x y value variable
## Min. :-124.94 Min. :25.06 Min. : 3.09 July 2012:103936
## 1st Qu.:-110.47 1st Qu.:32.03 1st Qu.:19.59 July 2013:103936
## Median : -96.00 Median :39.00 Median :23.53 July 2014:103936
## Mean : -96.00 Mean :39.00 Mean :23.43 July 2015:103936
## 3rd Qu.: -81.53 3rd Qu.:45.97 3rd Qu.:27.62 July 2016:103936
## Max. : -67.06 Max. :52.94 Max. :40.18 July 2017:103936
## NA's :140982
Then, ggplot()
is used with facet_wrap()
to
display a composite plot with one year mapped in each facet. The
variable column, which contains the names of the different raster
layers, is specified in the facets argument. The temperature data are
displayed using a yellow to red color ramp with a vector dataset of
state boundaries overlaid to provide a spatial reference (Figure
6.3).
states_sf <- read_sf(paste0(path,"conterminous_us_states.shp"), quiet = TRUE)
ggplot() +
geom_raster(data = tempstack_df,
aes(x = x,
y = y,
fill = value)) +
geom_sf(data = states_sf,
fill = NA,
color = "grey50",
size = 0.25) +
scale_fill_gradient(name = "Degrees C",
low = "yellow",
high = "red") +
coord_sf(expand = FALSE) +
facet_wrap(facets = vars(variable), ncol = 2) +
labs(title = "Mean July Temperature",
x = "longitude",
y = "latitude")
Many statistical summary functions have methods for raster objects. When used with multi-layer rasters, these functions will be applied to each pixel to summarize the data across all layers.
For example, the following code generates a mean temperature for each pixel using the six years of data stored in different layers.
Note the difference compared to the summaries calculated with
global()
, which were summarized across all pixels for each
layer instead of across all layers for each pixel.
meantemp <- mean(tempstack)
meantemp_df <- rasterdf(meantemp)
ggplot() +
geom_raster(data = meantemp_df, aes(x = x,
y = y,
fill = value)) +
geom_sf(data = states_sf,
fill = NA,
color = "grey50",
size = 0.25) +
scale_fill_gradient(name = "Degrees C",
low = "yellow",
high = "red") +
coord_sf(expand = FALSE) +
labs(title = "Mean July Temperature, 2012-2017",
x = "longitude", y = "latitude") +
theme(legend.position = "bottom")
The following code subtracts each of the six annual datasets in the multi-layer raster from the mean temperature layer to generate maps of temperature anomalies.
tempanom <- tempstack - meantemp
names(tempanom) <- names(tempstack)
The anomalies are displayed using a bicolor ramp generated with the
scale_fill_gradients2()
function. This approach is
justified because the anomalies have a breakpoint at zero. Positive
(warm) anomalies are displayed as red, and negative (cold) anomalies are
displayed as blue.
tempanom_df <- rasterdf(tempanom)
ggplot() +
geom_raster(data = tempanom_df, aes(x = x,
y = y,
fill = value)) +
geom_sf(data = states_sf,
fill = NA,
color = "grey50",
size = 0.25) +
scale_fill_gradient2(name = "Degrees C",
low = "blue",
mid = "lightyellow",
high = "red") +
coord_sf(expand = TRUE) +
facet_wrap(facets = vars(variable), ncol = 2) +
theme_void() +
theme(strip.text.x = element_text(size=12, face="bold"))
One may also change the palette with what we learned from last class:
library(RColorBrewer)
ggplot() +
geom_raster(data = tempanom_df, aes(x = x,
y = y,
fill = value)) +
geom_sf(data = states_sf,
fill = NA,
color = "grey50",
size = 0.25) +
scale_fill_distiller(name = "Degree C",
palette = "Spectral") +
coord_sf(expand = TRUE) +
facet_wrap(facets = vars(variable), ncol = 2) +
theme_void() +
theme(strip.text.x = element_text(size=12, face="bold"))
The terra
package has methods for many of the base R
plotting functions. This means that a SpatRaster
object can
be provided as an input to these functions, and R will be able to
generate an appropriate map or graph. Below are a few examples.
plot(tempstack[[1]])
plot(tempstack)
hist(tempstack)
Convert the clipped LST raster from Celsius to Fahrenheit and generate an updated version of the Georgia LST map.
Generate a new raster where each pixel contains the standard deviation of the NLDAS July Temperature values from 2012-2017. Map the result to explore where temperatures were most variable over this period.
To use raster data to visualize terrain, we need to study the following chapter of the textbook:
https://bookdown.org/mcwimberly/gdswr-book/raster-geospatial-data---discrete.html