1. Packages and Data

Loading required packages. spData library provides the us_states dataset with state-level polygon geometries. The us_states dataset is transformed to WGS84 (EPSG:4326) to ensure spatial compatibility with the SPEI raster.

install.packages('sf')
install.packages('spData')
install.packages('dplyr')
install.packages('ggplot2')
install.packages('terra')
rm(list = ls())

library(sf)
library(spData)
library(dplyr)
library(ggplot2)
library(terra)

# Load and reproject US states to WGS84:
data(us_states, package = 'spData')
sf.usstates <- us_states %>% 
  st_transform(4326)

2. Load the SPEI Raster (NetCDF)

The SPEI (Standardised Precipitation-Evapotranspiration Index) is a drought indicator where negative values mean drier-than-normal conditions and positive values mean wetter-than-normal. The NetCDF file downloaded online contains a global raster for every month since 1901. Each layer corresponds to one month.

r.spei <- rast('~/Desktop/Geospatial/spei01.nc')
r.spei
## class       : SpatRaster 
## size        : 360, 720, 1488  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : spei01.nc 
## varname     : spei (Standardized Precipitation-Evapotranspiration Index) 
## names       : spei_1, spei_2, spei_3, spei_4, spei_5, spei_6, ... 
## unit        : 1 
## time (days) : 1901-01-16 to 2024-12-16 (1488 steps)
t.spei <- time(r.spei)
head(t.spei)
## [1] "1901-01-16" "1901-02-15" "1901-03-16" "1901-04-16" "1901-05-16"
## [6] "1901-06-16"
tail(t.spei)
## [1] "2024-07-16" "2024-08-16" "2024-09-16" "2024-10-16" "2024-11-16"
## [6] "2024-12-16"

3. Build Regions (Northeast / Midwest / South / West)

The us_states dataset includes a REGION variable that groups each state into one of four regions. We dissolve the state-level polygons into four regional geometries. This creates a single polygon per region, which we will use to extract average SPEI values.

sf_use_s2(F)

sf.regions <- sf.usstates %>% 
  group_by(REGION) %>% 
  summarise(geometry = st_union(geometry), .groups = 'drop') %>% 
  st_make_valid()

sf.regions
## Simple feature collection with 4 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7042 ymin: 24.55868 xmax: -66.9824 ymax: 49.38436
## Geodetic CRS:  WGS 84
## # A tibble: 4 × 2
##   REGION                                                                geometry
## * <fct>                                                       <MULTIPOLYGON [°]>
## 1 Norteast (((-70.11867 41.24235, -70.27553 41.31046, -70.08207 41.29909, -70.1…
## 2 Midwest  (((-89.19948 36.71605, -89.15908 36.66635, -89.22732 36.56938, -89.3…
## 3 South    (((-81.40189 24.62354, -81.5174 24.62124, -81.68524 24.55868, -81.81…
## 4 West     (((-118.6055 33.031, -118.37 32.83927, -118.4963 32.85157, -118.6055…
sf_use_s2(T)

4. Compute Average SPEI Across Regions for the Past 50 Years

4.1. Choose the Last 50 Years

We extract the year from each layer’s date stamp and dynamically select the most recent 50-year period.

spei.dates <- as.Date(t.spei)
spei.years <- as.integer(format(spei.dates, '%Y'))

year.end <- max(spei.years, na.rm = T)
year.start <- year.end - 49
cat('Period:', year.start, '-', year.end, '\n')
## Period: 1975 - 2024
years.window <- year.start:year.end

4.2. Aggregate Monthly SPEI to Yearly Means

Instead of extracting values from every monthly layer, we first aggregate the monthly layers into yearly mean rasters. This reduces to 50 rasters. Each yearly raster represents the average drought condition for that year across all 12 months.

list.yearly <- lapply(years.window, function(y) {
  idx <- which(spei.years == y)
  if (length(idx) == 0) return(NULL)
  app(r.spei[[idx]], fun = mean, na.rm = T)
})

# Drop NULLs (if any incomplete years):
valid <- !vapply(list.yearly, is.null, logical(1))
years.window <- years.window[valid]
list.yearly <- list.yearly[valid]

length(list.yearly)
## [1] 50

4.3. Extract Regional Means (Panel Dataset)

For each yearly raster, we extract the mean SPEI value within each of the four region polygons. The results are combined into a panel dataset with 200 rows (4 regions × 50 years). Each row contains the year, region name, and the average SPEI for that region-year combination. This panel structure allows us to plot time series for each region and compare trends.

v.regions <- vect(sf.regions)

list.panel <- lapply(seq_along(list.yearly), function(i) {
  r.year <- list.yearly[[i]]
  r.extract <- terra::extract(r.year, v.regions, fun = mean, na.rm = T)
  val.col <- setdiff(names(r.extract), 'ID')[1]
  data.frame(
    year = years.window[i],
    REGION = sf.regions$REGION,
    spei = r.extract[[val.col]]
  )
})

df.panel <- bind_rows(list.panel)
head(df.panel,10)
##    year   REGION       spei
## 1  1975 Norteast  0.3032744
## 2  1975  Midwest  0.1908994
## 3  1975    South  0.3281016
## 4  1975     West  0.3544637
## 5  1976 Norteast  0.2001266
## 6  1976  Midwest -0.6973260
## 7  1976    South -0.0784055
## 8  1976     West -0.1968005
## 9  1977 Norteast  0.2908968
## 10 1977  Midwest  0.1424372

5. Sanity Checks

We verify the data in three ways. First, we check that each region has exactly 50 observations with no missing values, confirming a complete and balanced panel. Second, we cross-check one data point by manually extracting the 12 monthly SPEI values for the South region in 1983, computing their average, and comparing it with the corresponding value in the panel dataset. If they match, it confirms the yearly raster aggregation is working correctly. Third, we compute descriptive statistics — the mean should be close to zero (since SPEI is a standardized index), standard deviations should be moderate, and there should be no extreme or implausible values.

df.panel %>% 
  group_by(REGION) %>% 
  summarise(
    n = n(),
    n_na = sum(is.na(spei))
  )
## # A tibble: 4 × 3
##   REGION       n  n_na
##   <fct>    <int> <int>
## 1 Norteast    50     0
## 2 Midwest     50     0
## 3 South       50     0
## 4 West        50     0

Cross-check: South 1983

We extract all 12 monthly SPEI layers for 1983, compute the mean for the South region manually, and compare it with the value stored in the panel.

# Extract the 12 monthly layers for 1983:
idx.1983 <- which(spei.years == 1983)

v.south <- vect(sf.regions %>% filter(REGION == 'South'))

df.monthly.1983 <- lapply(idx.1983, function(i) {
  r.extract <- terra::extract(r.spei[[i]], v.south, fun = mean, na.rm = T)
  val.col <- setdiff(names(r.extract), 'ID')[1]
  data.frame(
    month = i - idx.1983[1] + 1,
    spei = r.extract[[val.col]]
  )
})
df.monthly.1983 <- bind_rows(df.monthly.1983)
df.monthly.1983 # all 12 months
##    month        spei
## 1      1 -0.03026204
## 2      2  0.80249231
## 3      3  0.63373701
## 4      4  0.80860500
## 5      5  0.50706356
## 6      6  0.29532342
## 7      7 -1.02819357
## 8      8 -0.80992244
## 9      9 -0.38399220
## 10    10  0.45356528
## 11    11  0.75064383
## 12    12  1.05884683
# Manual average:
cat('Manual average from 12 months:', mean(df.monthly.1983$spei, na.rm = T), '\n')
## Manual average from 12 months: 0.2548256
# Compare with panel:
panel.value <- df.panel %>% 
  filter(REGION == 'South', year == 1983)
cat('Panel value:', panel.value$spei, '\n')
## Panel value: 0.2548256
df.desc <- df.panel %>% 
  group_by(REGION) %>% 
  summarise(
    n_obs = n(),
    mean_spei = mean(spei, na.rm = T),
    sd_spei = sd(spei, na.rm = T),
    min_spei = min(spei, na.rm = T),
    p25_spei = quantile(spei, 0.25, na.rm = T),
    median_spei = median(spei, na.rm = T),
    p75_spei = quantile(spei, 0.75, na.rm = T),
    max_spei = max(spei, na.rm = T)
  )

# Overall:
df.total <- df.panel %>% 
  summarise(
    REGION = 'Overall',
    n_obs = n(),
    mean_spei = mean(spei, na.rm = T),
    sd_spei = sd(spei, na.rm = T),
    min_spei = min(spei, na.rm = T),
    p25_spei = quantile(spei, 0.25, na.rm = T),
    median_spei = median(spei, na.rm = T),
    p75_spei = quantile(spei, 0.75, na.rm = T),
    max_spei = max(spei, na.rm = T)
  )

knitr::kable(bind_rows(df.desc, df.total), digits = 3)
REGION n_obs mean_spei sd_spei min_spei p25_spei median_spei p75_spei max_spei
Norteast 50 0.088 0.285 -0.509 -0.091 0.109 0.286 0.685
Midwest 50 0.106 0.287 -0.697 -0.013 0.137 0.274 0.833
South 50 0.070 0.212 -0.282 -0.105 0.080 0.218 0.486
West 50 0.020 0.266 -0.535 -0.126 -0.034 0.159 0.678
Overall 200 0.071 0.264 -0.697 -0.102 0.075 0.251 0.833

6. Plot SPEI Evolution for Each Region

Each region is shown as a colored line representing its yearly average SPEI. A single blue LOESS smooth curve (aes(group = 1)) is fitted across all regions to capture the overall US-wide trend. The grey band is the 95% confidence interval around the smooth. If the blue curve trends downward over time, it suggests increasing drought severity across the US as a whole.

ggplot(df.panel, aes(x = year, y = spei, color = REGION)) +
  geom_line(linewidth = 0.7, alpha = 0.9) +
  geom_smooth(
    aes(group = 1),
    method = 'loess',
    se = T,
    linewidth = 1.2,
    color = 'blue'
  ) +
  labs(
    x = 'year',
    y = 'spei',
    color = 'REGION'
  ) +
  theme_bw()