Loading required packages. spData library provides the
us_states dataset with state-level polygon geometries. The
us_states dataset is transformed to WGS84 (EPSG:4326) to
ensure spatial compatibility with the SPEI raster.
install.packages('sf')
install.packages('spData')
install.packages('dplyr')
install.packages('ggplot2')
install.packages('terra')
rm(list = ls())
library(sf)
library(spData)
library(dplyr)
library(ggplot2)
library(terra)
# Load and reproject US states to WGS84:
data(us_states, package = 'spData')
sf.usstates <- us_states %>%
st_transform(4326)
The SPEI (Standardised Precipitation-Evapotranspiration Index) is a drought indicator where negative values mean drier-than-normal conditions and positive values mean wetter-than-normal. The NetCDF file downloaded online contains a global raster for every month since 1901. Each layer corresponds to one month.
r.spei <- rast('~/Desktop/Geospatial/spei01.nc')
r.spei
## class : SpatRaster
## size : 360, 720, 1488 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : spei01.nc
## varname : spei (Standardized Precipitation-Evapotranspiration Index)
## names : spei_1, spei_2, spei_3, spei_4, spei_5, spei_6, ...
## unit : 1
## time (days) : 1901-01-16 to 2024-12-16 (1488 steps)
t.spei <- time(r.spei)
head(t.spei)
## [1] "1901-01-16" "1901-02-15" "1901-03-16" "1901-04-16" "1901-05-16"
## [6] "1901-06-16"
tail(t.spei)
## [1] "2024-07-16" "2024-08-16" "2024-09-16" "2024-10-16" "2024-11-16"
## [6] "2024-12-16"
The us_states dataset includes a REGION
variable that groups each state into one of four regions. We dissolve
the state-level polygons into four regional geometries. This creates a
single polygon per region, which we will use to extract average SPEI
values.
sf_use_s2(F)
sf.regions <- sf.usstates %>%
group_by(REGION) %>%
summarise(geometry = st_union(geometry), .groups = 'drop') %>%
st_make_valid()
sf.regions
## Simple feature collection with 4 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7042 ymin: 24.55868 xmax: -66.9824 ymax: 49.38436
## Geodetic CRS: WGS 84
## # A tibble: 4 × 2
## REGION geometry
## * <fct> <MULTIPOLYGON [°]>
## 1 Norteast (((-70.11867 41.24235, -70.27553 41.31046, -70.08207 41.29909, -70.1…
## 2 Midwest (((-89.19948 36.71605, -89.15908 36.66635, -89.22732 36.56938, -89.3…
## 3 South (((-81.40189 24.62354, -81.5174 24.62124, -81.68524 24.55868, -81.81…
## 4 West (((-118.6055 33.031, -118.37 32.83927, -118.4963 32.85157, -118.6055…
sf_use_s2(T)
We extract the year from each layer’s date stamp and dynamically select the most recent 50-year period.
spei.dates <- as.Date(t.spei)
spei.years <- as.integer(format(spei.dates, '%Y'))
year.end <- max(spei.years, na.rm = T)
year.start <- year.end - 49
cat('Period:', year.start, '-', year.end, '\n')
## Period: 1975 - 2024
years.window <- year.start:year.end
Instead of extracting values from every monthly layer, we first aggregate the monthly layers into yearly mean rasters. This reduces to 50 rasters. Each yearly raster represents the average drought condition for that year across all 12 months.
list.yearly <- lapply(years.window, function(y) {
idx <- which(spei.years == y)
if (length(idx) == 0) return(NULL)
app(r.spei[[idx]], fun = mean, na.rm = T)
})
# Drop NULLs (if any incomplete years):
valid <- !vapply(list.yearly, is.null, logical(1))
years.window <- years.window[valid]
list.yearly <- list.yearly[valid]
length(list.yearly)
## [1] 50
For each yearly raster, we extract the mean SPEI value within each of the four region polygons. The results are combined into a panel dataset with 200 rows (4 regions × 50 years). Each row contains the year, region name, and the average SPEI for that region-year combination. This panel structure allows us to plot time series for each region and compare trends.
v.regions <- vect(sf.regions)
list.panel <- lapply(seq_along(list.yearly), function(i) {
r.year <- list.yearly[[i]]
r.extract <- terra::extract(r.year, v.regions, fun = mean, na.rm = T)
val.col <- setdiff(names(r.extract), 'ID')[1]
data.frame(
year = years.window[i],
REGION = sf.regions$REGION,
spei = r.extract[[val.col]]
)
})
df.panel <- bind_rows(list.panel)
head(df.panel,10)
## year REGION spei
## 1 1975 Norteast 0.3032744
## 2 1975 Midwest 0.1908994
## 3 1975 South 0.3281016
## 4 1975 West 0.3544637
## 5 1976 Norteast 0.2001266
## 6 1976 Midwest -0.6973260
## 7 1976 South -0.0784055
## 8 1976 West -0.1968005
## 9 1977 Norteast 0.2908968
## 10 1977 Midwest 0.1424372
We verify the data in three ways. First, we check that each region has exactly 50 observations with no missing values, confirming a complete and balanced panel. Second, we cross-check one data point by manually extracting the 12 monthly SPEI values for the South region in 1983, computing their average, and comparing it with the corresponding value in the panel dataset. If they match, it confirms the yearly raster aggregation is working correctly. Third, we compute descriptive statistics — the mean should be close to zero (since SPEI is a standardized index), standard deviations should be moderate, and there should be no extreme or implausible values.
df.panel %>%
group_by(REGION) %>%
summarise(
n = n(),
n_na = sum(is.na(spei))
)
## # A tibble: 4 × 3
## REGION n n_na
## <fct> <int> <int>
## 1 Norteast 50 0
## 2 Midwest 50 0
## 3 South 50 0
## 4 West 50 0
We extract all 12 monthly SPEI layers for 1983, compute the mean for the South region manually, and compare it with the value stored in the panel.
# Extract the 12 monthly layers for 1983:
idx.1983 <- which(spei.years == 1983)
v.south <- vect(sf.regions %>% filter(REGION == 'South'))
df.monthly.1983 <- lapply(idx.1983, function(i) {
r.extract <- terra::extract(r.spei[[i]], v.south, fun = mean, na.rm = T)
val.col <- setdiff(names(r.extract), 'ID')[1]
data.frame(
month = i - idx.1983[1] + 1,
spei = r.extract[[val.col]]
)
})
df.monthly.1983 <- bind_rows(df.monthly.1983)
df.monthly.1983 # all 12 months
## month spei
## 1 1 -0.03026204
## 2 2 0.80249231
## 3 3 0.63373701
## 4 4 0.80860500
## 5 5 0.50706356
## 6 6 0.29532342
## 7 7 -1.02819357
## 8 8 -0.80992244
## 9 9 -0.38399220
## 10 10 0.45356528
## 11 11 0.75064383
## 12 12 1.05884683
# Manual average:
cat('Manual average from 12 months:', mean(df.monthly.1983$spei, na.rm = T), '\n')
## Manual average from 12 months: 0.2548256
# Compare with panel:
panel.value <- df.panel %>%
filter(REGION == 'South', year == 1983)
cat('Panel value:', panel.value$spei, '\n')
## Panel value: 0.2548256
df.desc <- df.panel %>%
group_by(REGION) %>%
summarise(
n_obs = n(),
mean_spei = mean(spei, na.rm = T),
sd_spei = sd(spei, na.rm = T),
min_spei = min(spei, na.rm = T),
p25_spei = quantile(spei, 0.25, na.rm = T),
median_spei = median(spei, na.rm = T),
p75_spei = quantile(spei, 0.75, na.rm = T),
max_spei = max(spei, na.rm = T)
)
# Overall:
df.total <- df.panel %>%
summarise(
REGION = 'Overall',
n_obs = n(),
mean_spei = mean(spei, na.rm = T),
sd_spei = sd(spei, na.rm = T),
min_spei = min(spei, na.rm = T),
p25_spei = quantile(spei, 0.25, na.rm = T),
median_spei = median(spei, na.rm = T),
p75_spei = quantile(spei, 0.75, na.rm = T),
max_spei = max(spei, na.rm = T)
)
knitr::kable(bind_rows(df.desc, df.total), digits = 3)
| REGION | n_obs | mean_spei | sd_spei | min_spei | p25_spei | median_spei | p75_spei | max_spei |
|---|---|---|---|---|---|---|---|---|
| Norteast | 50 | 0.088 | 0.285 | -0.509 | -0.091 | 0.109 | 0.286 | 0.685 |
| Midwest | 50 | 0.106 | 0.287 | -0.697 | -0.013 | 0.137 | 0.274 | 0.833 |
| South | 50 | 0.070 | 0.212 | -0.282 | -0.105 | 0.080 | 0.218 | 0.486 |
| West | 50 | 0.020 | 0.266 | -0.535 | -0.126 | -0.034 | 0.159 | 0.678 |
| Overall | 200 | 0.071 | 0.264 | -0.697 | -0.102 | 0.075 | 0.251 | 0.833 |
Each region is shown as a colored line representing its yearly
average SPEI. A single blue LOESS smooth curve
(aes(group = 1)) is fitted across all regions to capture
the overall US-wide trend. The grey band is the 95% confidence interval
around the smooth. If the blue curve trends downward over time, it
suggests increasing drought severity across the US as a whole.
ggplot(df.panel, aes(x = year, y = spei, color = REGION)) +
geom_line(linewidth = 0.7, alpha = 0.9) +
geom_smooth(
aes(group = 1),
method = 'loess',
se = T,
linewidth = 1.2,
color = 'blue'
) +
labs(
x = 'year',
y = 'spei',
color = 'REGION'
) +
theme_bw()