Generalized Additive Models (GAMs) are flexible tools that have found particular application in analysis of time series. In ecology, a host of recent papers and workshops (i.e. the 2018 Ecological Society of America workshop on GAMs hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross) have drawn attention to the power of GAMs for addressing complex ecological analyses. Given the many ways that GAMs can model temporal data, it is tempting to extrapolate from their smooth functions to produce out of sample forecasts. Here we will inspect the behaviours of smoothing splines when extrapolating outside the training data to examine whether this can be useful in practice.
We will work in the mvgam package, which fits Dynamic
GAMs (DGAMs; Clark & Wells, 2022) using MCMC sampling (note that
either JAGS or Stan is required; installation
links are found here and here). Briefly,
assume \(\tilde{\boldsymbol{y}}_{t}\)
is the conditional expectation of a response variable \(\boldsymbol{y}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y}\) is drawn from an
exponential distribution (such as Poisson or Negative Binomial) with an
invertible link function, the linear predictor for a Dynamic GAM is
written as:
\[log(\tilde{\boldsymbol{y}}_{t})={\boldsymbol{B}}_0+\sum\limits_{i=1}^I\boldsymbol{s}_{i,t}\boldsymbol{x}_{i,t}+\boldsymbol{z}_{t}\,,\] Here \(\boldsymbol{B}_{0}\) is the unknown intercept, the \(\boldsymbol{s}\)’s are unknown smooth functions of covariates (\(\boldsymbol{x}\)’s) and \(\boldsymbol{z}\) is a dynamic latent trend. Each smooth function \(\boldsymbol{s}_{i}\) is composed of basis expansions whose coefficients, which must be estimated, control the functional relationship between \(\boldsymbol{x}_{i}\) and \(log(\tilde{\boldsymbol{y}})\). The size of the basis expansion limits the smooth’s potential complexity. A larger set of basis functions allows greater flexibility. Several advantages of GAMs are that they can model a diversity of response families, including discrete distributions (i.e. Poisson, Negative Binomial, Tweedie-Poisson) that accommodate common ecological features such as zero-inflation or overdispersion, and that they can be formulated to include hierarchical smoothing for multivariate responses. For the dynamic component, in its most basic form we assume a random walk with drift:
\[\boldsymbol{z}_{t}=\phi+\boldsymbol{z}_{t-1}+\boldsymbol{e}_{t}\,,\]
where \(\phi\) is an optional drift
parameter (if the latent trend is assumed to not be stationary) and
\(\boldsymbol{e}\) is drawn from a
zero-centred Gaussian distribution. This model is easily modified to
include autoregressive terms, which mvgam accomodates up to
order = 3.
Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence and we want to avoid extrapolating a smooth (which, as you’ll see below, can sometimes lead to unpredictable and unrealistic behaviours). In addition, smooths can often try to wiggle excessively to capture autocorrelation in a time series, which exacerbates the problem of forecasting ahead. Here we use an exaggerated example to show how a smooth tries to wiggle excessively, leading to overfitting by lessening the smoothing penalty \(\lambda\).
# Fit a model to the mcycle dataset in the MASS package, fixing the smoothing parameter at an abnormally large value
par(mfrow = c(2, 2),
mgp = c(2.5, 1, 0),
mai=c(c(0.7, 0.7, 0.2, 0.2)))
library(mgcv)
## Warning: package 'mgcv' was built under R version 4.2.2
## Loading required package: nlme
## This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
data(mcycle, package = 'MASS')
m1 <- gam(accel ~ s(times, k = 50), data = mcycle, method = 'REML', sp = 5)
plot(m1, scheme = 1, residuals = TRUE, pch= 16,
bty = 'L',
ylab = expression(lambda == 5), xlab = '',
shade.col = scales::alpha("#B97C7C", 0.6))
# Fit models with increasingly relaxed smooths (lambdas approaching zero)
m2 <- gam(accel ~ s(times, k = 50), data = mcycle, method = 'REML', sp = 1)
plot(m2, scheme = 1, residuals = TRUE, pch= 16,
bty = 'L',
ylab = expression(lambda == 1), xlab = '',
shade.col = scales::alpha("#B97C7C", 0.6))
m3 <- gam(accel ~ s(times, k = 50), data = mcycle, method = 'REML', sp = 0.01)
plot(m3, scheme = 1, residuals = TRUE, pch= 16,
bty = 'L',
ylab = expression(lambda == 0.01), xlab = '',
shade.col = scales::alpha("#B97C7C", 0.6))
m4 <- gam(accel ~ s(times, k = 50), data = mcycle, method = 'REML', sp = 0.0000001)
plot(m4, scheme = 1, residuals = TRUE, pch= 16,
bty = 'L',
ylab = expression(lambda == 0.0000001), xlab = '',
shade.col = scales::alpha("#B97C7C", 0.6))
Notice how wiggly the function becomes in the last plot when \(\lambda\) is very small, indicating that
the function is overfitting to the in-sample training data.
Incidentally, this behaviour mirrors what mgcv’s
gam.check function will often tell you to do if you are
trying to model an autocorrelated time series with a smooth function.
This is because the function will need a high degree of flexibility to
model both the true function and the autocorrelation
A simple simulation example is first used to demonstrate a comparison
between a Dynamic GAM with a latent trend process vs a conventional GAM
that includes an autoregressive observation model. First we simulate an
autocorrelated trend and then draw noisy observations of that trend
using a Negative Binomial observation process with overdispersion
parameter = 10
#devtools::install_github("nicholasjclark/mvgam")
library(mvgam)
library(dplyr)
set.seed(1111)
trend <- cumsum(rnorm(53, -0.1, 0.5))
View the trend and the observation series. Here it is clear that there is autocorrelation in the trend and in the observations
layout(matrix(1:2, ncol = 1, nrow = 2))
plot(trend, type = 'l', lwd = 2, col = 'grey', bty = 'none',
xlab = 'Time')
observations <- rnbinom(53, size = 10, mu = 4 + exp(trend))
plot(observations, type = 'l', lwd = 2, bty = 'none',
xlab = 'Time')
Gather the data into a dataframe format and calculate observation lags for lag = 1, 2, and 3
data.frame(y = observations) %>%
dplyr::mutate(lag1 = lag(y),
lag2 = lag(y, 2),
lag3 = lag(y, 3)) %>%
dplyr::slice_tail(n = 50) -> sim_dat
sim_dat$time <- 1:50
Split the data into training and testing portions and examine the objects
data_train <- sim_dat[1:40,]
data_test <- sim_dat[41:50,]
head(data_train)
head(data_test)
As a first pass at modelling this series, we will fit a GAM with parametric effects of lags 1, 2, and 3. Our primary interest is to estimate the AR parameters and the overdispersion parameter of the Negative Binomial observation process
m_mgcv <- gam(y ~ lag1 + lag2 + lag3,
data = data_train,
method = "REML", family = nb())
We can view the summary of the mgcv model to see that
the posterior estimates for the AR terms for lags 1 and 3 are strongly
non-zero
summary(m_mgcv)
##
## Family: Negative Binomial(2.168)
## Link function: log
##
## Formula:
## y ~ lag1 + lag2 + lag3
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.61782 0.17936 9.020 < 2e-16 ***
## lag1 0.04014 0.01060 3.787 0.000153 ***
## lag2 -0.01193 0.01257 -0.949 0.342730
## lag3 0.03083 0.01053 2.928 0.003408 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## R-sq.(adj) = 0.00169 Deviance explained = 50.2%
## -REML = 142.1 Scale est. = 1 n = 40
Forecasting from this model requires a function to recursively update
the design matrix (by filling in the values for the dynamic
lag(y) covariates) at each successive one-step ahead
prediction. Here we define a function to iteratively propagate a
forecast forward according to the AR3 observation model. Note that we
have used an argument to allow the lags to be fed into the model on the
log scale, this will make more sense in a moment as we inspect the
forecasts of the model
recurse_ar3
recurse_ar3 = function(model, coef_sim, lagged_vals, h, log_lags = F){
# Initiate state vector
states <- rep(NA, length = h + 3)
# Last three values of the conditional expectations begin the state vector
if(log_lags){
states[1] <- exp(as.numeric(lagged_vals[3]))
states[2] <- exp(as.numeric(lagged_vals[2]))
states[3] <- exp(as.numeric(lagged_vals[1]))
} else {
states[1] <- as.numeric(lagged_vals[3])
states[2] <- as.numeric(lagged_vals[2])
states[3] <- as.numeric(lagged_vals[1])
}
# Get a random sample of the smooth coefficient uncertainty matrix
# to use for the entire forecast horizon of this particular path
gam_coef_index <- sample(seq(1, NROW(coef_sim)), size = 1)
# For each following timestep, recursively predict based on the
# predictions at each previous lag
for (t in 4:(h + 3)) {
newdata <- data_test[t-3, ]
if(log_lags){
newdata$lag1 <- log(states[t-1] + 0.01)
newdata$lag2 <- log(states[t-2] + 0.01)
newdata$lag3 <- log(states[t-3] + 0.01)
} else {
newdata$lag1 <- states[t-1]
newdata$lag2 <- states[t-2]
newdata$lag3 <- states[t-3]
}
colnames(newdata) <- colnames(data_test)
Xp <- predict(model, newdata = newdata, type = 'lpmatrix')
# Calculate the posterior prediction for this timepoint; use an upper constraint
# to ensure the values don't blow up to infinity
mu <- rnbinom(1, mu = min(100000, exp(Xp %*% coef_sim[gam_coef_index,])),
size = model$family$getTheta(TRUE))
# Fill in the state vector and iterate to the next timepoint
states[t] <- mu
}
# Return the forecast path
states[-c(1:3)]
}
Draw 1000 parameter estimates from the GAM posterior and use these to propagate the forecast for 10 timesteps ahead
coef_sim <- MASS::mvrnorm(1000, mu = m_mgcv$coefficients, Sigma = m_mgcv$Vp)
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
for(i in 1:1000){
gam_sims[i,] <- recurse_ar3(m_mgcv,
coef_sim,
lagged_vals = c(data_test$lag1[1],
data_test$lag2[1],
data_test$lag3[1]),
h = 10)
}
mvgamNow we will fit a competing Dynamic GAM model, which estimates a
latent temporal process rather than using an autoregressive observation
model. Note that to actually condition models with MCMC sampling, either
the JAGS software must be installed (along with the
R packages rjags and runjags) or
the Stan software must be installed (along with the package
rstan and, optionally, the cmdstanr package).
These are not listed as dependencies of mvgam to ensure
that installation is less difficult. If users wish to fit the models
using mvgam, please refer to installation links for
JAGS here or for
Stan and rstan here). This
model’s parameters are estimated in a Bayesian framework using (by
default) the Gibbs sampling algorithms available in the
JAGS software. mvgam takes a fitted
gam model and adapts the model file to fit in
JAGS, with possible extensions to deal with stochastic
trend components and other features.
m_mvgam <- mvgam(formula = y ~ 1,
data = data_train,
newdata = data_test,
family = 'nb',
trend_model = 'AR3',
chains = 4,
burnin = 1000)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 1500 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 1500 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 1500 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 1500 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 1500 [ 6%] (Warmup)
## Chain 1 Iteration: 200 / 1500 [ 13%] (Warmup)
## Chain 2 Iteration: 100 / 1500 [ 6%] (Warmup)
## Chain 2 Iteration: 200 / 1500 [ 13%] (Warmup)
## Chain 3 Iteration: 100 / 1500 [ 6%] (Warmup)
## Chain 3 Iteration: 200 / 1500 [ 13%] (Warmup)
## Chain 4 Iteration: 100 / 1500 [ 6%] (Warmup)
## Chain 4 Iteration: 200 / 1500 [ 13%] (Warmup)
## Chain 1 Iteration: 300 / 1500 [ 20%] (Warmup)
## Chain 1 Iteration: 400 / 1500 [ 26%] (Warmup)
## Chain 2 Iteration: 300 / 1500 [ 20%] (Warmup)
## Chain 4 Iteration: 300 / 1500 [ 20%] (Warmup)
## Chain 2 Iteration: 400 / 1500 [ 26%] (Warmup)
## Chain 3 Iteration: 300 / 1500 [ 20%] (Warmup)
## Chain 4 Iteration: 400 / 1500 [ 26%] (Warmup)
## Chain 1 Iteration: 500 / 1500 [ 33%] (Warmup)
## Chain 2 Iteration: 500 / 1500 [ 33%] (Warmup)
## Chain 3 Iteration: 400 / 1500 [ 26%] (Warmup)
## Chain 4 Iteration: 500 / 1500 [ 33%] (Warmup)
## Chain 1 Iteration: 600 / 1500 [ 40%] (Warmup)
## Chain 2 Iteration: 600 / 1500 [ 40%] (Warmup)
## Chain 3 Iteration: 500 / 1500 [ 33%] (Warmup)
## Chain 4 Iteration: 600 / 1500 [ 40%] (Warmup)
## Chain 1 Iteration: 700 / 1500 [ 46%] (Warmup)
## Chain 2 Iteration: 700 / 1500 [ 46%] (Warmup)
## Chain 3 Iteration: 600 / 1500 [ 40%] (Warmup)
## Chain 4 Iteration: 700 / 1500 [ 46%] (Warmup)
## Chain 1 Iteration: 800 / 1500 [ 53%] (Warmup)
## Chain 2 Iteration: 800 / 1500 [ 53%] (Warmup)
## Chain 3 Iteration: 700 / 1500 [ 46%] (Warmup)
## Chain 4 Iteration: 800 / 1500 [ 53%] (Warmup)
## Chain 1 Iteration: 900 / 1500 [ 60%] (Warmup)
## Chain 2 Iteration: 900 / 1500 [ 60%] (Warmup)
## Chain 3 Iteration: 800 / 1500 [ 53%] (Warmup)
## Chain 4 Iteration: 900 / 1500 [ 60%] (Warmup)
## Chain 1 Iteration: 1000 / 1500 [ 66%] (Warmup)
## Chain 1 Iteration: 1001 / 1500 [ 66%] (Sampling)
## Chain 3 Iteration: 900 / 1500 [ 60%] (Warmup)
## Chain 4 Iteration: 1000 / 1500 [ 66%] (Warmup)
## Chain 4 Iteration: 1001 / 1500 [ 66%] (Sampling)
## Chain 2 Iteration: 1000 / 1500 [ 66%] (Warmup)
## Chain 3 Iteration: 1000 / 1500 [ 66%] (Warmup)
## Chain 3 Iteration: 1001 / 1500 [ 66%] (Sampling)
## Chain 1 Iteration: 1100 / 1500 [ 73%] (Sampling)
## Chain 2 Iteration: 1001 / 1500 [ 66%] (Sampling)
## Chain 2 Iteration: 1100 / 1500 [ 73%] (Sampling)
## Chain 4 Iteration: 1100 / 1500 [ 73%] (Sampling)
## Chain 1 Iteration: 1200 / 1500 [ 80%] (Sampling)
## Chain 2 Iteration: 1200 / 1500 [ 80%] (Sampling)
## Chain 3 Iteration: 1100 / 1500 [ 73%] (Sampling)
## Chain 4 Iteration: 1200 / 1500 [ 80%] (Sampling)
## Chain 1 Iteration: 1300 / 1500 [ 86%] (Sampling)
## Chain 2 Iteration: 1300 / 1500 [ 86%] (Sampling)
## Chain 2 Iteration: 1400 / 1500 [ 93%] (Sampling)
## Chain 3 Iteration: 1200 / 1500 [ 80%] (Sampling)
## Chain 4 Iteration: 1300 / 1500 [ 86%] (Sampling)
## Chain 1 Iteration: 1400 / 1500 [ 93%] (Sampling)
## Chain 2 Iteration: 1500 / 1500 [100%] (Sampling)
## Chain 3 Iteration: 1300 / 1500 [ 86%] (Sampling)
## Chain 4 Iteration: 1400 / 1500 [ 93%] (Sampling)
## Chain 2 finished in 2.0 seconds.
## Chain 1 Iteration: 1500 / 1500 [100%] (Sampling)
## Chain 1 finished in 2.1 seconds.
## Chain 3 Iteration: 1400 / 1500 [ 93%] (Sampling)
## Chain 4 Iteration: 1500 / 1500 [100%] (Sampling)
## Chain 4 finished in 2.1 seconds.
## Chain 3 Iteration: 1500 / 1500 [100%] (Sampling)
## Chain 3 finished in 2.3 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 2.1 seconds.
## Total execution time: 2.5 seconds.
You can view the summary of the resulting model using the
summary.mvgam S3 function, which gives an overview of
convergence and effective sample size diagnostics for the model’s
parameters
summary(m_mvgam)
## GAM formula:
## y ~ 1
##
## Family:
## negative binomial
##
## Link function:
## log
##
## Trend model:
## AR3
##
## N series:
## 1
##
## N timepoints:
## 40
##
## Status:
## Fitted using Stan
##
## Dispersion parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi[1] 3.315389 12.1088 204.9241 1 717
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 1.887496 2.335305 2.846673 1.02 191
##
## Latent trend AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## ar1[1] 0.1778276 0.6933390 1.3049092 1.01 239
## ar2[1] -0.8116426 -0.0771788 0.6368510 1.02 328
## ar3[1] -0.4133920 0.2939680 0.7451256 1.02 220
## sigma[1] 0.2230397 0.5402350 0.8172776 1.04 116
##
## Stan MCMC diagnostics
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 4 of 2000 iterations ended with a divergence (0.2%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
The model file can also be viewed, which can be handy for making modifications to the model so that extra stochastic elements can be added to a user’s bespoke model. Notice how the summary at the top of the model file describes the data elements (and their dimensions) that are needed to condition the model, again the hope is that this simplifies the task of modifying the model for any additional user-specific requirements
m_mvgam$model_file
## [1] ""
## [2] "// Stan model code generated by package mvgam"
## [3] "functions {"
## [4] "vector rep_each(vector x, int K) {"
## [5] "int N = rows(x);"
## [6] "vector[N * K] y;"
## [7] "int pos = 1;"
## [8] "for (n in 1:N) {"
## [9] "for (k in 1:K) {"
## [10] "y[pos] = x[n];"
## [11] "pos += 1;"
## [12] "}"
## [13] "}"
## [14] "return y;"
## [15] "}"
## [16] "}"
## [17] "data {"
## [18] "int<lower=0> total_obs; // total number of observations"
## [19] "int<lower=0> n; // number of timepoints per series"
## [20] "int<lower=0> n_series; // number of series"
## [21] "int<lower=0> num_basis; // total number of basis coefficients"
## [22] "real p_taus[1]; // prior precisions for parametric coefficients"
## [23] "real p_coefs[1]; // prior locations for parametric coefficients"
## [24] "matrix[total_obs, num_basis] X; // mgcv GAM design matrix"
## [25] "int<lower=0> ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)"
## [26] ""
## [27] "int<lower=0> n_nonmissing; // number of nonmissing observations"
## [28] "int<lower=0> flat_ys[n_nonmissing]; // flattened nonmissing observations"
## [29] "matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations"
## [30] "int<lower=0> obs_ind[n_nonmissing]; // indices of nonmissing observations"
## [31] "}"
## [32] "parameters {"
## [33] "// raw basis coefficients"
## [34] "vector[num_basis] b_raw;"
## [35] ""
## [36] "// negative binomial overdispersion"
## [37] "vector<lower=0>[n_series] phi_inv;"
## [38] ""
## [39] "// latent trend AR1 terms"
## [40] "vector<lower=-1.5,upper=1.5>[n_series] ar1;"
## [41] ""
## [42] "// latent trend AR2 terms"
## [43] "vector<lower=-1.5,upper=1.5>[n_series] ar2;"
## [44] ""
## [45] "// latent trend AR3 terms"
## [46] "vector<lower=-1.5,upper=1.5>[n_series] ar3;"
## [47] ""
## [48] "// latent trend variance parameters"
## [49] "vector<lower=0>[n_series] sigma;"
## [50] ""
## [51] "// latent trends"
## [52] "matrix[n, n_series] trend;"
## [53] ""
## [54] "}"
## [55] ""
## [56] "transformed parameters {"
## [57] "// basis coefficients"
## [58] "vector[num_basis] b;"
## [59] ""
## [60] "b[1:num_basis] = b_raw[1:num_basis];"
## [61] ""
## [62] "}"
## [63] ""
## [64] "model {"
## [65] "for (i in 1:1) {"
## [66] "b_raw[i] ~ normal(p_coefs[i], sqrt(1 / p_taus[i]));"
## [67] "}"
## [68] ""
## [69] "// priors for AR parameters"
## [70] "ar1 ~ std_normal();"
## [71] "ar2 ~ std_normal();"
## [72] "ar3 ~ std_normal();"
## [73] ""
## [74] "// priors for overdispersion parameters"
## [75] "phi_inv ~ student_t(3, 0, 0.1);"
## [76] ""
## [77] "// priors for latent trend variance parameters"
## [78] "sigma ~ exponential(2);"
## [79] ""
## [80] "// trend estimates"
## [81] "trend[1, 1:n_series] ~ normal(0, sigma);"
## [82] "trend[2, 1:n_series] ~ normal(trend[1, 1:n_series] * ar1, sigma);"
## [83] "trend[3, 1:n_series] ~ normal(trend[2, 1:n_series] * ar1 + trend[1, 1:n_series] * ar2, sigma);"
## [84] "for(s in 1:n_series){"
## [85] "trend[4:n, s] ~ normal(ar1[s] * trend[3:(n - 1), s] + ar2[s] * trend[2:(n - 2), s] + ar3[s] * trend[1:(n - 3), s], sigma[s]);"
## [86] "}"
## [87] ""
## [88] "{"
## [89] "// likelihood functions"
## [90] "vector[n_nonmissing] flat_trends;"
## [91] "real flat_phis[n_nonmissing];"
## [92] "flat_trends = (to_vector(trend))[obs_ind];"
## [93] "flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);"
## [94] "flat_ys ~ neg_binomial_2("
## [95] "exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)),"
## [96] "inv(flat_phis));"
## [97] "}"
## [98] "}"
## [99] ""
## [100] ""
## [101] "generated quantities {"
## [102] "vector[total_obs] eta;"
## [103] "matrix[n, n_series] mus;"
## [104] "vector[n_series] tau;"
## [105] "array[n, n_series] int ypred;"
## [106] "matrix[n, n_series] phi_vec;"
## [107] "vector[n_series] phi;"
## [108] "phi = inv(phi_inv);"
## [109] "for (s in 1:n_series) {"
## [110] "phi_vec[1:n,s] = rep_vector(phi[s], n);"
## [111] "}"
## [112] ""
## [113] "for (s in 1:n_series) {"
## [114] "tau[s] = pow(sigma[s], -2.0);"
## [115] "}"
## [116] ""
## [117] "// posterior predictions"
## [118] "eta = X * b;"
## [119] "for(s in 1:n_series){ "
## [120] "mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];"
## [121] "ypred[1:n, s] = neg_binomial_2_rng(exp(mus[1:n, s]), phi_vec[1:n, s]);"
## [122] "}"
## [123] "}"
## [124] ""
Users can also view the data and initial values that
mvgam uses to condition the model by either setting the
argument return_model_data = TRUE or by setting
run_model = FALSE. The latter is faster for simply
generating the necessary objects and initial value functions. These
options will also return the pregam structure that is
necessary to convert MCMC samples into an object that can
be readily used by mgcv to generate predictions and
calculate important quantities such as effective degrees of freedom (see
help(jagam) for more details)
mod_skeleton <- mvgam(y ~ 1,
data = data_train,
newdata = data_test,
family = 'nb',
trend_model = 'AR3',
chains = 4,
burnin = 1000,
run_model = FALSE)
str(mod_skeleton$model_data)
## List of 15
## $ y : num [1:50, 1] 6 4 3 9 7 4 12 2 9 4 ...
## $ n : int 50
## $ X : num [1:50, 1] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:50] "X" "X.1" "X.2" "X.3" ...
## .. ..$ : chr "X.Intercept."
## $ p_coefs : num [1(1d)] 2.57
## $ p_taus : num [1(1d)] 10.8
## $ ytimes : int [1:50, 1] 1 2 3 4 5 6 7 8 9 10 ...
## $ n_series : int 1
## $ min_eps : num 2.22e-16
## $ y_observed : num [1:50, 1] 1 1 1 1 1 1 1 1 1 1 ...
## $ total_obs : int 50
## $ num_basis : int 1
## $ n_nonmissing: int 40
## $ obs_ind : int [1:40] 1 2 3 4 5 6 7 8 9 10 ...
## $ flat_ys : num [1:40] 6 4 3 9 7 4 12 2 9 4 ...
## $ flat_xs : num [1:40, 1] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:40] "X" "X.1" "X.2" "X.3" ...
## .. ..$ : NULL
mod_skeleton$inits
## function() {
## list(b_raw = runif(model_data$num_basis, -2, 2))
## }
## <bytecode: 0x000001f51f1bbc58>
## <environment: 0x000001f52e30bab0>
head(mod_skeleton$pregam)
## NULL
The same steps above can be used to generate Stan data
and modelling files:
mod_skeleton <- mvgam(y ~ 1,
data = data_train,
newdata = data_test,
family = 'nb',
trend_model = 'AR3',
chains = 4,
use_stan = TRUE,
run_model = FALSE)
str(mod_skeleton$model_data)
## List of 15
## $ y : num [1:50, 1] 6 4 3 9 7 4 12 2 9 4 ...
## $ n : int 50
## $ X : num [1:50, 1] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:50] "X" "X.1" "X.2" "X.3" ...
## .. ..$ : chr "X.Intercept."
## $ p_coefs : num [1(1d)] 2.57
## $ p_taus : num [1(1d)] 11
## $ ytimes : int [1:50, 1] 1 2 3 4 5 6 7 8 9 10 ...
## $ n_series : int 1
## $ min_eps : num 2.22e-16
## $ y_observed : num [1:50, 1] 1 1 1 1 1 1 1 1 1 1 ...
## $ total_obs : int 50
## $ num_basis : int 1
## $ n_nonmissing: int 40
## $ obs_ind : int [1:40] 1 2 3 4 5 6 7 8 9 10 ...
## $ flat_ys : num [1:40] 6 4 3 9 7 4 12 2 9 4 ...
## $ flat_xs : num [1:40, 1] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:40] "X" "X.1" "X.2" "X.3" ...
## .. ..$ : NULL
mod_skeleton$inits
## function() {
## list(b_raw = runif(model_data$num_basis, -2, 2))
## }
## <bytecode: 0x000001f51f1bbc58>
## <environment: 0x000001f52ef0c660>
head(mod_skeleton$pregam)
## NULL
The Stan model file is also designed to be as
descriptive as possible about the data objects that are required to
condition the model
cat(c("```stan", mod_skeleton$model_file, "```"), sep = "\n")
// Stan model code generated by package mvgam
functions {
vector rep_each(vector x, int K) {
int N = rows(x);
vector[N * K] y;
int pos = 1;
for (n in 1:N) {
for (k in 1:K) {
y[pos] = x[n];
pos += 1;
}
}
return y;
}
}
data {
int<lower=0> total_obs; // total number of observations
int<lower=0> n; // number of timepoints per series
int<lower=0> n_series; // number of series
int<lower=0> num_basis; // total number of basis coefficients
real p_taus[1]; // prior precisions for parametric coefficients
real p_coefs[1]; // prior locations for parametric coefficients
matrix[total_obs, num_basis] X; // mgcv GAM design matrix
int<lower=0> ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
int<lower=0> n_nonmissing; // number of nonmissing observations
int<lower=0> flat_ys[n_nonmissing]; // flattened nonmissing observations
matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
int<lower=0> obs_ind[n_nonmissing]; // indices of nonmissing observations
}
parameters {
// raw basis coefficients
vector[num_basis] b_raw;
// negative binomial overdispersion
vector<lower=0>[n_series] phi_inv;
// latent trend AR1 terms
vector<lower=-1.5,upper=1.5>[n_series] ar1;
// latent trend AR2 terms
vector<lower=-1.5,upper=1.5>[n_series] ar2;
// latent trend AR3 terms
vector<lower=-1.5,upper=1.5>[n_series] ar3;
// latent trend variance parameters
vector<lower=0>[n_series] sigma;
// latent trends
matrix[n, n_series] trend;
}
transformed parameters {
// basis coefficients
vector[num_basis] b;
b[1:num_basis] = b_raw[1:num_basis];
}
model {
for (i in 1:1) {
b_raw[i] ~ normal(p_coefs[i], sqrt(1 / p_taus[i]));
}
// priors for AR parameters
ar1 ~ std_normal();
ar2 ~ std_normal();
ar3 ~ std_normal();
// priors for overdispersion parameters
phi_inv ~ student_t(3, 0, 0.1);
// priors for latent trend variance parameters
sigma ~ exponential(2);
// trend estimates
trend[1, 1:n_series] ~ normal(0, sigma);
trend[2, 1:n_series] ~ normal(trend[1, 1:n_series] * ar1, sigma);
trend[3, 1:n_series] ~ normal(trend[2, 1:n_series] * ar1 + trend[1, 1:n_series] * ar2, sigma);
for(s in 1:n_series){
trend[4:n, s] ~ normal(ar1[s] * trend[3:(n - 1), s] + ar2[s] * trend[2:(n - 2), s] + ar3[s] * trend[1:(n - 3), s], sigma[s]);
}
{
// likelihood functions
vector[n_nonmissing] flat_trends;
real flat_phis[n_nonmissing];
flat_trends = (to_vector(trend))[obs_ind];
flat_phis = to_array_1d(rep_each(phi_inv, n)[obs_ind]);
flat_ys ~ neg_binomial_2(
exp(append_col(flat_xs, flat_trends) * append_row(b, 1.0)),
inv(flat_phis));
}
}
generated quantities {
vector[total_obs] eta;
matrix[n, n_series] mus;
vector[n_series] tau;
array[n, n_series] int ypred;
matrix[n, n_series] phi_vec;
vector[n_series] phi;
phi = inv(phi_inv);
for (s in 1:n_series) {
phi_vec[1:n,s] = rep_vector(phi[s], n);
}
for (s in 1:n_series) {
tau[s] = pow(sigma[s], -2.0);
}
// posterior predictions
eta = X * b;
for(s in 1:n_series){
mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];
ypred[1:n, s] = neg_binomial_2_rng(exp(mus[1:n, s]), phi_vec[1:n, s]);
}
}
plot_fc_horizon
plot_fc_horizon = function(gam_sims, main){
probs = c(0.05, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95)
cred <- sapply(1:NCOL(gam_sims),
function(n) quantile(gam_sims[,n],
probs = probs))
c_light <- c("#DCBCBC")
c_light_highlight <- c("#C79999")
c_mid <- c("#B97C7C")
c_mid_highlight <- c("#A25050")
c_dark <- c("#8F2727")
c_dark_highlight <- c("#7C0000")
ylim <- range(c(0, data_test$y[1:NCOL(cred)] * 1.25,
cred * 1.25))
pred_vals <- 1:NCOL(cred)
plot(1, xlim = c(1, NCOL(cred)),
bty = 'L',
col = rgb(1,0,0, alpha = 0),
ylim = ylim,
ylab = 'Predicted counts',
xlab = 'Forecast horizon',
main = main)
polygon(c(pred_vals, rev(pred_vals)), c(cred[1,], rev(cred[9,])),
col = c_light, border = NA)
polygon(c(pred_vals, rev(pred_vals)), c(cred[2,], rev(cred[8,])),
col = c_light_highlight, border = NA)
polygon(c(pred_vals, rev(pred_vals)), c(cred[3,], rev(cred[7,])),
col = c_mid, border = NA)
polygon(c(pred_vals, rev(pred_vals)), c(cred[4,], rev(cred[6,])),
col = c_mid_highlight, border = NA)
lines(pred_vals, cred[5,], col = c_dark, lwd = 2.5)
points(data_test$y[1:NCOL(cred)], pch = 16, cex = .65, col = 'white')
points(data_test$y[1:NCOL(cred)], pch = 16, cex = .55)
box(bty = 'L', lwd = 2)
# Calculate out of sample DRPS and print the score
drps_score <- function(truth, fc, interval_width = 0.9){
nsum <- 1000
Fy = ecdf(fc)
ysum <- 0:nsum
indicator <- ifelse(ysum - truth >= 0, 1, 0)
score <- sum((indicator - Fy(ysum))^2)
# Is value within empirical interval?
interval <- quantile(fc, probs = c((1-interval_width)/2,
(interval_width + (1-interval_width)/2)))
in_interval <- ifelse(truth <= interval[2] & truth >= interval[1], 1, 0)
return(c(score, in_interval))
}
# Wrapper to operate on all observations in fc_horizon
drps_mcmc_object <- function(truth, fc, interval_width = 0.9){
indices_keep <- which(!is.na(truth))
if(length(indices_keep) == 0){
scores = data.frame('drps' = rep(NA, length(truth)),
'interval' = rep(NA, length(truth)))
} else {
scores <- matrix(NA, nrow = length(truth), ncol = 2)
for(i in indices_keep){
scores[i,] <- drps_score(truth = as.vector(truth)[i],
fc = fc[,i], interval_width)
}
}
scores
}
truth <- data_test$y[1:NCOL(cred)]
fc <- gam_sims
message('Out of sample DRPS:')
print(sum(drps_mcmc_object(as.vector(truth),
fc)[,1]))
message()
}
Evaluate forecasts for the two models; first the mgcv
autoregressive observation model
plot_fc_horizon(gam_sims = gam_sims[,1:10], main = 'mgcv')
## Out of sample DRPS:
## [1] 649.8801
##
And now the mvgam latent trend model
plot_fc_horizon(gam_sims = MCMCvis::MCMCchains(m_mvgam$model_output,
'ypred')[,(NROW(m_mvgam$obs_data)+1):
(NROW(m_mvgam$obs_data)+10)],
main = 'mvgam')
## Out of sample DRPS:
## [1] 73.42574
##
Generally the uncertainy blows out when forecasting further ahead for
the mgcv model (why might this be?); if we only look 2
timesteps ahead, it is not quite as terrible but it becomes more clear
what is happening
plot_fc_horizon(gam_sims = gam_sims[,1:2], main = 'mgcv')
## Out of sample DRPS:
## [1] 18.44607
##
plot_fc_horizon(gam_sims = MCMCvis::MCMCchains(m_mvgam$model_output,
'ypred')[,(NROW(m_mvgam$obs_data)+1):
(NROW(m_mvgam$obs_data)+2)],
main = 'mvgam')
## Out of sample DRPS:
## [1] 14.01938
##
Why does autocorrelated observation model perform so badly? Q-Q plots give some insight, as they clearly indicate the dynamic latent trend model shows a better fit of the observation process to the data compared to the autocorrelated observation model
plot(m_mvgam, 'residuals')
qq.gam(m_mgcv, pch = 16, lwd = 3, cex = 0.8)
In fact the the autocorrelated observation model is overestimating
the amount of overdispersion in the data. This can partially explain the
exponentially growing uncertainties in the mgcv forecast The truth for
the ovderdispersion parameter is 10. This is likely driven
by the autoregressive model’s high sensitivity to the noisy observation
process. Observations in ecology very often display properties of
overdispersion, and this results in observation ‘outliers’ that can have
large leverage on resulting parameter estimates for autoregressive
models. The noisy observation process is not a problem for latent trend
models such as those estimated by mvgam, as the observation
model is estimated separately from the temporal process (as is the norm
for state-space models)
m_mgcv$family$getTheta(TRUE)
## [1] 2.167732
quantile(MCMCvis::MCMCchains(m_mvgam$model_output, 'phi'), c(0.1, 0.5, 0.9))
## 10% 50% 90%
## 4.76787 12.10880 50.44248
Looking at estimates for the full trend and forecast distributions
shows how the mvgam model is clearly doing a decent job of
tracking the latent state up to the end of the training period while
also giving reasonable estimates for the overdispersion in the data
layout(matrix(1:2, ncol = 1, nrow = 2))
plot(trend[4:53], xlim = c(0, 50), bty = 'L', type = 'l', lwd = 2, col = 'grey',
ylab = 'True trend')
box(bty = 'L', lwd = 2)
plot(m_mvgam, 'trend', newdata = data_test)
layout(1)
plot(m_mvgam, 'forecast', newdata = data_test)
## Out of sample DRPS:
## [1] 73.42574
##
We can get a better idea of what is happening for the autoregressive observation model by plotting posterior realisations on the log scale. Here each line is a realisation, and we show these lines as a spaghetti plot to make it easier to visualise the diversity of possible forecast paths that are compatible with our model configuration.
plot(1, type = "n", bty = 'L',
xlab = 'Time',
ylab = 'GAMAR simulations (log scale)',
xlim = c(0, NCOL(gam_sims)),
ylim = range(log(gam_sims + 0.01)))
for(i in 1:30){
lines(x = 1:NCOL(gam_sims),
y = log(gam_sims[i,] + 0.01),
col = 'white',
lwd = 3)
lines(x = 1:NCOL(gam_sims),
y = log(gam_sims[i,] + 0.01),
col = sample(c("#DCBCBC80",
"#C7999980",
"#B97C7C80",
"#A2505080",
"#7C000080"), 1),
lwd = 2.75)
}
box(bty = 'L', lwd = 2)
mvgam offers options to plot realisations for most types
of plots as well. Repeating the trend plot above (which automatically
shows the expected latent trend on the log scale) shows that the dynamic
GAM does not suffer from the same explosion towards infinity that the
autoregressive model suffers from
layout(matrix(1:2, ncol = 1, nrow = 2))
plot(trend[4:53], xlim = c(0, 50), type = 'l', lwd = 2, col = 'grey',
ylab = 'True trend', bty = 'L')
box(bty = 'L', lwd = 2)
plot(m_mvgam, 'trend', newdata = data_test, realisations = TRUE,
n_realisations = 10)
layout(1)
From the plots above it is apparent that although some of the realisations are sensible, as soon as a path climbs upward it tends to skyrocket up towards infinity. It is sensible then to conclude that an autocorrelated observation model will be extremely sensitive to the measurement process (particularly when there is overdispersion, outliers or missing observations), which is not a problem for the latent temporal process model. With this information in mind, could we improve the autocorrelated observation model by using logs of lagged observations, rather than raw lagged observations?
data.frame(y = observations) %>%
dplyr::mutate(lag1 = log(lag(y) + 0.01),
lag2 = log(lag(y, 2) + 0.01),
lag3 = log(lag(y, 3) + 0.01)) %>%
dplyr::mutate(cov1 = rnorm(53)) %>%
dplyr::slice_tail(n = 50) -> sim_dat
sim_dat$time <- 1:50
data_train <- sim_dat[1:40,]
data_test <- sim_dat[41:50,]
m_mgcv <- gam(y ~ lag1 + lag2 + lag3,
data = data_train,
method = "REML", family = nb())
Compute the forecasts but use logged lags instead of raw lags
coef_sim <- MASS::mvrnorm(1000, mu = m_mgcv$coefficients, Sigma = m_mgcv$Vp)
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
for(i in 1:1000){
gam_sims[i,] <- recurse_ar3(m_mgcv,
coef_sim,
lagged_vals = c(data_test$lag1[1],
data_test$lag2[1],
data_test$lag3[1]),
log_lags = TRUE,
h = 10)
}
Evaluate the log-AR model against the dynamic GAM
plot_fc_horizon(gam_sims = gam_sims[,1:10], main = 'mgcv')
## Out of sample DRPS:
## [1] 105.754
##
plot_fc_horizon(gam_sims = MCMCvis::MCMCchains(m_mvgam$model_output,
'ypred')[,(NROW(m_mvgam$obs_data)+1):
(NROW(m_mvgam$obs_data)+10)],
main = 'mvgam')
## Out of sample DRPS:
## [1] 73.42574
##
We should also re-check the autoregressive model’s realisations, which look much better now
plot(1, type = "n", bty = 'L',
xlab = 'Time',
ylab = 'GAMAR simulations (log scale)',
xlim = c(0, NCOL(gam_sims)),
ylim = range(log(gam_sims + 0.01)))
for(i in 1:30){
lines(x = 1:NCOL(gam_sims),
y = log(gam_sims[i,] + 0.01),
col = 'white',
lwd = 3)
lines(x = 1:NCOL(gam_sims),
y = log(gam_sims[i,] + 0.01),
col = sample(c("#DCBCBC80",
"#C7999980",
"#B97C7C80",
"#A2505080",
"#7C000080"), 1),
lwd = 2.75)
}
box(bty = 'L', lwd = 2)
This version (with logged lags) performs much better than the raw lag
version, with sensible forecasts that are not blowing up to infinity. In
fact, this forecast is now more comparable to the dynamic GAM where the
latent temporal process is estimated separately from the observation
process, at least for this particular simulation. There are of course
many other advantages to the state-space representation that is used by
mvgam, including:
It is far easier to handle missing values (which will result in
loss of many observations when fitting mgcv models,
especially at higher order lags, due to missing outcomes AND missing
predictors) and measurement error
Latent trend processes provide recursive expressions for h-step ahead prediction and historical filtering
Irregularly sampled observational time series can be readily modeled as the latent temporal process can simply be ‘forecasted’ over the missing observations
Multiple observation processes can depend on shared latent states
Now for an empirical example of DGAMs, using historical data on US
fisheries landings made available by a recent paper published in
Methods in Ecology and Evolution (Smoothed
dynamic factor analysis for identifying trends in multivariate time
series; Ward et al 2021). The data consists of adjusted landings for
13 species or groups reported annually from multiple fisheries over a
39-year period (1981–2019)
data <- read.csv('https://raw.githubusercontent.com/fate-ewi/gpdfa/main/data/port_landings_table2.csv')
head(data)
First some data wrangling to clean the column names, create logged
lags for the species of interest (the Ling cod) and gather values for an
additional species that will be used as a (potentially nonlinear)
covariate (the whiting). These two species were chosen in an ad-hoc
fashion based on results of the above paper, which found that they
demonstrated somewhat opposing patterns in their historical trends. Note
that this is not meant to be a full in-depth analysis of these data, but
rather a simple example to demonstrate how mvgam can be
used for empirical series.
data %>%
janitor::clean_names() %>%
dplyr::mutate(y = lingcod) %>%
dplyr::mutate(lag1 = log(lag(y) + 0.01),
lag2 = log(lag(y, 2) + 0.01),
lag3 = log(lag(y, 3) + 0.01)) %>%
dplyr::select(y,
lag1,
lag2,
lag3,
p_whiting) %>%
# The nonlinear covariate will be a smooth of standardised whiting landings
dplyr::mutate(p_whiting = as.vector(scale(p_whiting))) %>%
dplyr::slice_tail(n = 36) -> model_dat
model_dat$time <- 1:NROW(model_dat)
View the modelling dataframe and check for any NAs
(which will be omitted by mgcv automatically)
head(model_dat)
anyNA(model_dat)
## [1] FALSE
Plot the time series for the species of interest (the ling cod)
with(model_dat, plot(time, y, 'l', bty = 'L', lwd = 2, ylab = 'Ling cod landings'))
And for the covariate (whiting landings)
with(model_dat, plot(time, p_whiting, 'l', bty = 'L', lwd = 2, ylab = 'Whiting landings'))
Split into 75% train and 25% test periods
data_train <- model_dat[1:27,]
data_test <- model_dat[28:36,]
Fit an mgcv autocorrelated observation model, with an
additional TPRS smooth function of time and a TRPS smooth function of
whiting landings
m_mgcv <- gam(y ~ s(p_whiting, k = 5) +
s(time, k = 12) +
lag1 + lag2 + lag3,
data = data_train,
method = "REML", family = nb())
summary(m_mgcv)
##
## Family: Negative Binomial(8.18)
## Link function: log
##
## Formula:
## y ~ s(p_whiting, k = 5) + s(time, k = 12) + lag1 + lag2 + lag3
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8207 1.0265 1.774 0.0761 .
## lag1 0.7439 0.1870 3.978 6.96e-05 ***
## lag2 0.2532 0.2264 1.118 0.2634
## lag3 -0.2731 0.1912 -1.429 0.1530
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(p_whiting) 1 1 0.046 0.831
## s(time) 1 1 1.842 0.175
##
## R-sq.(adj) = 0.817 Deviance explained = 92.2%
## -REML = 194.17 Scale est. = 1 n = 27
plot(m_mgcv, pages=1, shade = TRUE)
The AR1 term is estimated as important, while the smooth
functions of time and of whiting landings are
regularized to flat lines. But how does the model forecast? Draw
posterior coefficients and generate the iterative mgcv
forecast
coef_sim <- gam.mh(m_mgcv)$bs
# Forecast 9 years ahead
gam_sims <- matrix(NA, nrow = 1000, ncol = 9)
for(i in 1:1000){
gam_sims[i,] <- recurse_ar3(m_mgcv,
coef_sim,
lagged_vals = c(data_test$lag1[1],
data_test$lag2[1],
data_test$lag3[1]),
log_lags = TRUE,
h = 9)
}
Now fit an mvgam model where the latent trend is
modelled as an AR3 process. We will use Stan
for this example, which generally explores the joint posterior much more
efficiently (and more fully) than JAGS
m_mvgam <- mvgam(y ~ s(p_whiting, k = 5),
data = data_train,
newdata = data_test,
family = 'nb',
trend_model = 'AR3',
use_stan = TRUE,
chains = 4,
burnin = 500)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 finished in 3.3 seconds.
## Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 1 finished in 3.6 seconds.
## Chain 3 finished in 3.7 seconds.
## Chain 4 finished in 3.6 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 3.6 seconds.
## Total execution time: 3.7 seconds.
Inference on mvgam smooth functions can be conducted
using posterior predictions. This model clearly finds a negative
relationship between whiting landings and ling cod landings
plot(m_mvgam, 'smooths')
Inspect posterior realisations of the smooth function
plot(m_mvgam, 'smooths', realisations = TRUE)
The plot_mvgam_smooth function allows more flexibility
for plotting smooth functions, including an ability to supply
newdata for plotting posterior marginal simulations.
Overlay the partial Dunn-Smyth residuals on the smooth plot for
p_whiting. These residuals suggest there is still structure
left in the data and that perhaps the smooth for this term is not as
flexible as it could be (i.e. we would be justified to try this model
again by increasing k for the smooth term). But we will
leave it as-is for now
plot_mvgam_smooth(m_mvgam, series = 1, smooth = 'p_whiting', residuals = TRUE)
We can also perform a series of posterior predictive checks (using
the ppc() function) to see if the model is able to simulate
data for the training period that looks realistic and unbiased. First,
examine simulated histograms for posterior predictions
(yhat) and compare to the observations (y)
ppc(m_mvgam, series = 1, type = 'hist', legend_position = 'bottomright')
Now plot the distribution of predicted means compared to the observed mean
ppc(m_mvgam, series = 1, type = 'mean')
Next examine simulated empirical Cumulative Distribution Functions
(CDF) for posterior predictions (yhat) and compare to the
CDF of the observations (y)
ppc(m_mvgam, series = 1, type = 'cdf')
Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform
ppc(m_mvgam, series = 1, type = 'pit')
All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviors exhibited. Now for some investigation of the estimated relationships and forecasts. We can also perform residual diagnostics using randomised quantile (Dunn-Smyth) residuals. These look reasonable overall, though there is some autocorrelation at recent lags left in the residuals for this series
plot(m_mvgam, series = 1, type = 'residuals')
Ok so the model is doing well when fitting against the training data, but how are its forecasts? We can now evaluate the forecasts of the two competing models just as we did above for the simulated data
plot_fc_horizon(gam_sims = gam_sims, main = 'mgcv')
## Out of sample DRPS:
## [1] 3595.449
##
plot_fc_horizon(gam_sims = MCMCvis::MCMCchains(m_mvgam$model_output,
'ypred')[,(NROW(m_mvgam$obs_data)+1):
(NROW(m_mvgam$obs_data)+9)],
main = 'mvgam')
## Out of sample DRPS:
## [1] 2065.389
##
Both models fail to anticipate the upward swing in landings following
the end of the training data, but as with the simulations above, the
autocorrelated observation model fitted by mgcv is
providing inferior forecasts compared to the dynamic trend model
estimated by mvgam. What if we try to improve the smooth
function of time to do improve the forecasts? To try and capture the
long-term trend we can use a B spline with multiple
penalties, following the excellent example by Gavin Simpson about extrapolating
with smooth terms. This is similar to what we might do when trying
to forecast ahead from a more wiggly function, as B splines
have useful properties by allowing the penalty to be extended into the
range of values we wish to predict (in this case, the years in
data_test).
m_mgcv <- gam(y ~ s(p_whiting, k = 5) +
s(time, bs = "bs", m = c(2, 1)) +
lag1 + lag2 + lag3,
knots = list(time = c(min(data_train$time) - 1,
min(data_train$time),
max(data_train$time),
max(data_test$time))),
data = data_train,
method = "REML", family = nb())
Forecast and evaluate against the dynamic GAM
coef_sim <- gam.mh(m_mgcv)$bs
gam_sims <- matrix(NA, nrow = 1000, ncol = 9)
for(i in 1:1000){
gam_sims[i,] <- recurse_ar3(m_mgcv,
coef_sim,
lagged_vals = c(data_test$lag1[1],
data_test$lag2[1],
data_test$lag3[1]),
log_lags = TRUE,
h = 9)
}
plot_fc_horizon(gam_sims = gam_sims, main = 'mgcv')
## Out of sample DRPS:
## [1] 2530.831
##
plot_fc_horizon(gam_sims = MCMCvis::MCMCchains(m_mvgam$model_output,
'ypred')[,(NROW(m_mvgam$obs_data)+1):
(NROW(m_mvgam$obs_data)+9)],
main = 'mvgam')
## Out of sample DRPS:
## [1] 2065.389
##
This model shows an improvement in forecasting ability as the
uncertainty in the temporal smooth function is more realistically
growing into the future. But again, the dynamic GAM is superior in
forecasting ability. Perhaps the AR terms are doing more
harm than good for the model’s forecasts? For a final comparison, drop
the AR functions
m_mgcv <- gam(y ~ s(p_whiting, k = 5) +
s(time, bs = "bs", m = c(2, 1), k = 15),
knots = list(time = c(min(data_train$time) - 1,
min(data_train$time),
max(data_train$time),
max(data_test$time))),
data = data_train,
method = "REML", family = nb())
coef_sim <- gam.mh(m_mgcv)$bs
gam_sims <- matrix(NA, nrow = 1000, ncol = 9)
for(i in 1:1000){
gam_sims[i,] <- recurse_ar3(m_mgcv,
coef_sim,
lagged_vals = c(data_test$lag1[1],
data_test$lag2[1],
data_test$lag3[1]),
log_lags = TRUE,
h = 9)
}
plot_fc_horizon(gam_sims = gam_sims, main = 'mgcv')
## Out of sample DRPS:
## [1] 2807.138
##
plot_fc_horizon(gam_sims = MCMCvis::MCMCchains(m_mvgam$model_output,
'ypred')[,(NROW(m_mvgam$obs_data)+1):
(NROW(m_mvgam$obs_data)+9)],
main = 'mvgam')
## Out of sample DRPS:
## [1] 2065.389
##
The dynamic GAM is difficult to beat for this example. But what other
utilities does the mvgam package make available? We can
also re-do the posterior predictive checks, but this time focusing only
on the out of sample period. This will give us better insight into how
the model is performing and whether it is able to simulate realistic and
unbiased future values
ppc(m_mvgam, series = 1, type = 'mean', newdata = data_test)
ppc(m_mvgam, series = 1, type = 'cdf', newdata = data_test)
ppc(m_mvgam, series = 1, type = 'pit', newdata = data_test)
There are some problems with the way this model is generating future
predictions, so we would need to perform a more rigorous and principled
model development strategy to improve our model’s predictive abilities.
But there are two other utilities available in mvgam that
are worth highlighting. The first is model comparison. Benchmarking
against “null” models is a very important part of evaluating a proposed
forecast model. After all, if our complex dynamic model can’t generate
better predictions then a random walk or mean forecast, is it really
telling us anything new about the data-generating process? Here we
examine the model comparison utilities in mvgam. Here we
illustrate how this can be done in mvgam by fitting a basic
random walk model with a Negative Binomial observation process. No
smooth terms are included here, the observation model simply has an
intercept term.
m_mvgam2 <- mvgam(y ~ 1,
data = data_train,
newdata = data_test,
family = 'nb',
trend_model = 'RW',
use_stan = TRUE,
chains = 4)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 finished in 0.8 seconds.
## Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 4 finished in 0.9 seconds.
## Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 finished in 1.0 seconds.
## Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 finished in 1.5 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 1.0 seconds.
## Total execution time: 1.6 seconds.
Look at this model’s proposed forecast, which incidentally isn’t too
bad compared to the original AR3 model
plot(m_mvgam2, series = 1, type = 'forecast', newdata = data_test)
## Out of sample DRPS:
## [1] 2235.263
##
plot(m_mvgam, series = 1, type = 'forecast', newdata = data_test)
## Out of sample DRPS:
## [1] 2065.389
##
Now we will showcase how different dynamic models can be compared
using rolling probabilistic forecast evaluation, which is especially
useful if we don’t already have out of sample observations for comparing
forecasts. This function sets up a sequence of evaluation timepoints
along a rolling window within the training data to evaluate
‘out-of-sample’ forecasts. The trends are rolled forward a total of
fc_horizon timesteps according to their estimated state
space dynamics to generate an ‘out-of-sample’ forecast that is evaluated
against the true observations in the horizon window. We are
therefore simulating a situation where the model’s parameters had
already been estimated but we have only observed data up to the
evaluation timepoint and would like to generate forecasts that consider
the possible future paths for the latent trends and the true observed
values for any other covariates in the horizon window.
Evaluation involves calculating the Discrete Rank Probability Score and
a binary indicator for whether or not the true value lies within the
forecast’s 90% prediction interval. For this test we compare the two
models on the exact same sequence of 30 evaluation points
using horizon = 6
compare_mvgams(model1 = m_mvgam, model2 = m_mvgam2, fc_horizon = 6,
n_evaluations = 30, n_cores = 3)
## RPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Model 1 8.213173 10.30613 14.66776 14.06146 17.58618 19.4088
## Model 2 8.574067 10.69791 15.15757 14.72479 18.08325 21.1897
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 0.8666667
## Model 2 0.7888889
The series of plots generated by compare_mvgams clearly
show that the first dynamic model generates better predictions. In each
plot, DRPS for the forecast horizon is lower for the first
model than for the second model. This kind of evaluation is often more
appropriate for forecast models than complexity-penalising fit metrics
such as AIC
or BIC. Now we proceed by exploring how forecast
distributions from an mvgam object can be automatically
updated in light of new incoming observations. This works by generating
a set of “particles” that each captures a unique proposal about the
current state of the system (in this case, the current estimate of the
latent trend component). The next observation in data_assim
is assimilated and particles are weighted by how well their proposal
(i.e. their proposed forecast, prior to seeing the new data) matched the
new observations. For univariate models such as the ones we’ve fitted so
far, this weight is represented by the proposal’s Negative Binomial
log-likelihood. For multivariate models, a multivariate composite
likelihood is used for weights. Once weights are calculated, we use
importance sampling to update the model’s forecast distribution for the
remaining forecast horizon. Begin by initiating a set of
20000 particles by assimilating the next observation in
data_test and storing the particles in the default location
(in a directory called particles within the working
directory)
pfilter_mvgam_init(object = m_mvgam, n_particles = 20000,
n_cores = 3, data_assim = model_dat[28,])
## Saving particles to pfilter/particles.rda
## ESS = 1933.914
Now we are ready to run the particle filter. This function will
assimilate the next two out of sample observations in
data_test and update the forecast after each assimilation
step. This works in an iterative fashion by calculating each particle’s
weight, then using a kernel smoothing algorithm to “pull” low weight
particles toward the high-likelihood space before assimilating the next
observation. The strength of the kernel smoother is controlled by
kernel_lambda, which in our experience works well when left
to the default of 1. If the Effective Sample Size of
particles drops too low, suggesting we are putting most of our belief in
a very small set of particles, an automatic resampling step is triggered
to increase particle diversity and reduce the chance that our forecast
intervals become too narrow and incapable of adapting to changing
conditions
pfilter_mvgam_online(data_assim = model_dat[29:30, ], n_cores = 3,
kernel_lambda = 1, use_resampling = TRUE)
## Assimilating the next 2 observations
##
## Effective sample size is 2107.4 ...
##
## Resampling particles ...
## Smoothing particles ...
##
## Effective sample size is 7248.246 ...
##
## Resampling particles ...
## Smoothing particles ...
##
## Last assimilation time was 30
##
## Saving particles to pfilter/particles.rda
## ESS = 20000
Once assimilation is complete, generate the updated forecast from the
particles using the covariate information in remaining
data_test observations. This function is designed to
hopefully make it simpler to assimilate observations, as all that needs
to be provided once the particles are initiated as a dataframe of test
data in exactly the same format as the data that were used to train the
initial model. If no new observations are found (observations are
arranged by time so the consistent indexing of these two
variables is very important!) then the function returns a
NULL and the particles remain where they are in state
space.
pfilter_fc <- pfilter_mvgam_fc(file_path = "pfilter",
n_cores = 3, newdata = model_dat[31:36,],
return_forecasts = TRUE,
ylim = c(0, 5100))
Compare the updated forecast to the original forecast to see how it
has changed in light of the most recent observations. As with the
plot_mvgam_smooth() function, the
plot_mvgam_fc() function has more flexibility for plotting
posterior predictive distributions than the generic
plot.mvgam() S3 function
layout(matrix(1:2,nrow=1,ncol=2))
plot_mvgam_fc(m_mvgam,
ylim = c(0, 5100))
points(model_dat$y, pch = 16, cex = 0.4)
pfilter_fc$fc_plots$series1()
points(model_dat$y, pch = 16, cex = 0.4)
Here it is apparent that the distribution has shifted upward in light
of the 3 observations that have been assimilated. This is
an advantageous way of allowing a model to slowly adapt to new
conditions while breaking free of restrictive assumptions about residual
distributions. See some of the many
particle filtering lectures by Nathaniel Osgood for more details.
Remove the particles from their stored directory when finished
unlink('pfilter', recursive = T)
For our next univariate example, we will again pursue how challenging
it can be to forecast ahead with conventional GAMs and how
mvgam overcomes these challenges. We begin by replicating
the lynx analysis from 2018 Ecological Society
of America workshop on GAMs that was hosted by Eric Pedersen, David
L. Miller, Gavin Simpson, and Noam Ross, with some minor adjustments.
First, load the data and plot the series as well as its estimated
autocorrelation function
data(lynx)
lynx_full = data.frame(year = 1821:1934,
population = as.numeric(lynx),
time = 1:NROW(lynx))
plot(lynx_full$population, type = 'l', ylab = 'Lynx trappings',
xlab = 'Time')
acf(lynx_full$population, main = '')
There is a clear ~19-year cyclic pattern to the data, so I create a
season term that can be used to model this effect and give
a better representation of the data generating process. I also create a
new year term that represents which long-term cycle each
observation is in
plot(stl(ts(lynx_full$population, frequency = 19), s.window = 'periodic'))
lynx_full$season <- (lynx_full$year %%19) + 1
cycle_ends <- c(which(lynx_full$season == 19), NROW(lynx_full))
cycle_starts <- c(1, cycle_ends[1:length(which(lynx_full$season == 19))] + 1)
cycle <- vector(length = NROW(lynx_full))
for(i in 1:length(cycle_starts)){
cycle[cycle_starts[i]:cycle_ends[i]] <- i
}
lynx_full$year <- cycle
Add lag indicators needed to fit the nonlinear lag models that gave
the best one step ahead point forecasts in the ESA workshop example. As
in the example, we specify the default argument in the
lag function as the mean log population.
mean_pop_l = mean(log(lynx_full$population))
lynx_full = dplyr::mutate(lynx_full,
popl = log(population),
lag1 = dplyr::lag(popl,1, default = mean_pop_l),
lag2 = dplyr::lag(popl,2, default = mean_pop_l),
lag3 = dplyr::lag(popl,3, default = mean_pop_l),
lag4 = dplyr::lag(popl,4, default = mean_pop_l),
lag5 = dplyr::lag(popl,5, default = mean_pop_l),
lag6 = dplyr::lag(popl,6, default = mean_pop_l))
For mvgam models, we need an indicator of the series
name as a factor variable
lynx_full$series <- factor('series1')
Split the data into training (first 40 years) and testing (next 10 years of data) to evaluate multi-step ahead forecasts
lynx_train = lynx_full[1:40, ]
lynx_test = lynx_full[41:50, ]
The best-forecasting model in the course was with nonlinear smooths
of lags 1 and 2; we use those here is that we also include a cyclic
smooth for the 19-year cycles as this seems like an important feature,
as well as a yearly smooth for the long-term trend. Following the
information about spline extrapolation above, we again fit a cubic
B spline for the trend with a mix of penalties to try and
reign in wacky extrapolation behaviours, and we extend the penalty to
cover the years that we wish to predict. This will hopefully give us
better uncertainty estimates for the forecast. In this example we assume
the observations are Poisson distributed
lynx_mgcv = gam(population ~
s(season, bs = 'cc', k = 19) +
s(year, bs = 'bs', m = c(3, 2, 1, 0)) +
s(lag1, k = 5) +
s(lag2, k = 5),
knots = list(season = c(0.5, 19.5),
year = c(min(lynx_train$year - 1),
min(lynx_train$year),
max(lynx_test$year),
max(lynx_test$year) + 1)),
data = lynx_train, family = "poisson",
method = "REML")
Inspect the model’s summary and estimated smooth functions for the season, year and lag terms
summary(lynx_mgcv)
##
## Family: poisson
## Link function: log
##
## Formula:
## population ~ s(season, bs = "cc", k = 19) + s(year, bs = "bs",
## m = c(3, 2, 1, 0)) + s(lag1, k = 5) + s(lag2, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.666631 0.007511 887.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(season) 16.229 17.000 6770.2 <2e-16 ***
## s(year) 1.990 2.000 244.2 <2e-16 ***
## s(lag1) 3.984 3.999 712.0 <2e-16 ***
## s(lag2) 3.892 3.993 488.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.967 Deviance explained = 97.7%
## -REML = 880.28 Scale est. = 1 n = 40
plot(lynx_mgcv, select = 1)
plot(lynx_mgcv, select = 2)
plot(lynx_mgcv, select = 3)
plot(lynx_mgcv, select = 4)
This model captures most of the deviance in the series and the functions are all confidently estimated to be non-zero and non-flat. So far, so good. Now for some forecasts for the out of sample period. First we must take posterior draws of smooth beta coefficients to incorporate the uncertainties around smooth functions when simulating forecast paths
coef_sim <- gam.mh(lynx_mgcv)$bs
Now we define a function to perform forecast simulations from the
nonlinear lag model in a recursive fashion. Using starting values for
the last two lags, the function will iteratively project the path ahead
with a random sample from the model’s coefficient posterior
recurse_nonlin
recurse_nonlin = function(model, coef_sim, lagged_vals, h){
# Initiate state vector
states <- rep(NA, length = h + 2)
# Last two values of the conditional expectations begin the state vector
states[1] <- as.numeric(exp(lagged_vals[2]))
states[2] <- as.numeric(exp(lagged_vals[1]))
# Get a random sample of the smooth coefficient uncertainty matrix
# to use for the entire forecast horizon of this particular path
gam_coef_index <- sample(seq(1, NROW(coef_sim)), 1, T)
# For each following timestep, recursively predict based on the
# predictions at each previous lag
for (t in 3:(h + 2)) {
# Build the GAM linear predictor matrix using the two previous lags
# of the (log) density
newdata <- data.frame(lag1 = log(states[t-1] + 0.01),
lag2 = log(states[t-2] + 0.01),
season = lynx_test$season[t-2],
year = lynx_test$year[t-2])
colnames(newdata) <- c('lag1', 'lag2', 'season', 'year')
Xp <- predict(model, newdata = newdata, type = 'lpmatrix')
# Calculate the posterior prediction for this timepoint
mu <- rpois(1, lambda = exp(Xp %*% coef_sim[gam_coef_index,]))
# Fill in the state vector and iterate to the next timepoint
states[t] <- mu
}
# Return the forecast path
states[-c(1:2)]
}
Create the GAM’s forecast distribution by generating
1000 simulated forecast paths. Each path is fed the true
observed values for the last two lags of the first out of sample
timepoint, but they can deviate when simulating ahead depending on their
particular draw of possible coefficients. Note, this is a bit slow and
could easily be parallelised to speed up computations
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
for(i in 1:1000){
gam_sims[i,] <- recurse_nonlin(lynx_mgcv,
coef_sim,
lagged_vals = c(lynx_test$lag1[1],
lynx_test$lag2[1]),
h = 10)
}
Plot the mgcv model’s out of sample forecast for the next 10 years ahead
cred_ints <- apply(gam_sims, 2, function(x) quantile(x, probs = c(0.025, 0.5, 0.975)))
yupper <- max(cred_ints) * 1.25
plot(cred_ints[3,] ~ seq(1:NCOL(cred_ints)), type = 'l',
col = rgb(1,0,0, alpha = 0),
ylim = c(0, yupper),
ylab = 'Predicted lynx trappings',
xlab = 'Forecast horizon',
main = 'mgcv')
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
c(cred_ints[1,],rev(cred_ints[3,])),
col = rgb(150, 0, 0, max = 255, alpha = 100), border = NA)
cred_ints <- apply(gam_sims, 2, function(x) quantile(x, probs = c(0.15, 0.5, 0.85)))
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
c(cred_ints[1,],rev(cred_ints[3,])),
col = rgb(150, 0, 0, max = 255, alpha = 180), border = NA)
lines(cred_ints[2,], col = rgb(150, 0, 0, max = 255), lwd = 2, lty = 'dashed')
points(lynx_test$population[1:10], pch = 16)
lines(lynx_test$population[1:10])
drps_score
# Discrete Rank Probability Score and coverage of 90% interval
drps_score <- function(truth, fc, interval_width = 0.9){
nsum <- 1000
Fy = ecdf(fc)
ysum <- 0:nsum
indicator <- ifelse(ysum - truth >= 0, 1, 0)
score <- sum((indicator - Fy(ysum))^2)
# Is value within 90% HPD?
interval <- quantile(fc, probs = c(((1-0.9)/2), 0.5, 1-((1-interval_width)/2)))
in_interval <- ifelse(truth <= interval[3] & truth >= interval[1], 1, 0)
return(c(score, in_interval))
}
# Wrapper to operate on all observations in fc_horizon
drps_mcmc_object <- function(truth, fc, interval_width = 0.9){
indices_keep <- which(!is.na(truth))
if(length(indices_keep) == 0){
scores = data.frame('drps' = rep(NA, length(truth)),
'interval' = rep(NA, length(truth)))
} else {
scores <- matrix(NA, nrow = length(truth), ncol = 2)
for(i in indices_keep){
scores[i,] <- drps_score(truth = as.vector(truth)[i],
fc = fc[,i], interval_width)
}
}
scores
}
Calculate DRPS over the 10-year horizon for the mgcv model
lynx_mgcv1_drps <- drps_mcmc_object(truth = lynx_test$population[1:10],
fc = gam_sims)
What if we remove the yearly trend and let the lag smooths capture more of the temporal dependencies? Will that improve the forecast distribution? Run a second model and plot the forecast (note that this plot will be on quite a different y-axis scale compared to the first plot above)
lynx_mgcv2 = gam(population ~
s(season, bs = 'cc', k = 19) +
s(lag1, k = 5) +
s(lag2, k = 5),
data = lynx_train, family = "poisson",
method = "REML")
coef_sim <- gam.mh(lynx_mgcv2)$bs
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
for(i in 1:1000){
gam_sims[i,] <- recurse_nonlin(lynx_mgcv2,
coef_sim,
lagged_vals = c(lynx_test$lag1[1],
lynx_test$lag2[1]),
h = 10)
}
cred_ints <- apply(gam_sims, 2, function(x) quantile(x, probs = c(0.025, 0.5, 0.975)))
yupper <- max(cred_ints) * 1.25
plot(cred_ints[3,] ~ seq(1:NCOL(cred_ints)), type = 'l',
col = rgb(1,0,0, alpha = 0),
ylim = c(0, yupper),
ylab = 'Predicted lynx trappings',
xlab = 'Forecast horizon',
main = 'mgcv')
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
c(cred_ints[1,],rev(cred_ints[3,])),
col = rgb(150, 0, 0, max = 255, alpha = 100), border = NA)
cred_ints <- apply(gam_sims, 2, function(x) quantile(x, probs = c(0.15, 0.5, 0.85)))
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
c(cred_ints[1,],rev(cred_ints[3,])),
col = rgb(150, 0, 0, max = 255, alpha = 180), border = NA)
lines(cred_ints[2,], col = rgb(150, 0, 0, max = 255), lwd = 2, lty = 'dashed')
points(lynx_test$population[1:10], pch = 16)
lines(lynx_test$population[1:10])
Calculate DRPS over the 10-year horizon for the second mgcv model
lynx_mgcv2_drps <- drps_mcmc_object(truth = lynx_test$population[1:10],
fc = gam_sims)
This forecast is highly overconfident, with very unrealistic
uncertainty intervals due to the interpolation behaviours of the lag
smooths. You can certainly keep trying different formulations (our
experience is that the B spline variant above produces the best
forecasts from any tested mgcv model, but we did not test
an exhaustive set), but hopefully it is clear that forecasting using
splines is tricky business and it is likely that each time you do it
you’ll end up honing in on different combinations of penalties, knot
selections etc…. Now we will fit an mvgam model for
comparison. This model fits a similar model to the mgcv
model directly above but with a full time series model for the errors
(in this case an AR1 process), rather than smoothing
splines that do not incorporate a concept of the future. We do not use a
year term to reduce any possible extrapolation and because
the latent dynamic component should capture this temporal variation.
lynx_mvgam <- mvgam(data = lynx_train,
newdata = lynx_test,
formula = population ~ s(season, bs = 'cc', k = 15),
knots = list(season = c(0.5, 19.5)),
family = 'poisson',
trend_model = 'AR2',
use_stan = TRUE,
chains = 4)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 1000 [ 0%] (Warmup)
## Chain 4 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 2 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 3 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
## Chain 4 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 3 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 2 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
## Chain 4 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 2 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 3 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
## Chain 4 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 2 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 3 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
## Chain 4 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 3 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 3 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 2 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
## Chain 2 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
## Chain 4 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 3 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 2 Iteration: 600 / 1000 [ 60%] (Sampling)
## Chain 4 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 3 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 4 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 2 Iteration: 700 / 1000 [ 70%] (Sampling)
## Chain 3 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 4 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 3 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 Iteration: 800 / 1000 [ 80%] (Sampling)
## Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 4 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 4 finished in 17.9 seconds.
## Chain 3 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 3 finished in 18.6 seconds.
## Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 1 finished in 19.1 seconds.
## Chain 2 Iteration: 900 / 1000 [ 90%] (Sampling)
## Chain 2 Iteration: 1000 / 1000 [100%] (Sampling)
## Chain 2 finished in 21.7 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 19.3 seconds.
## Total execution time: 21.8 seconds.
Calculate the out of sample forecast from the fitted
mvgam model and plot
fits <- MCMCvis::MCMCchains(lynx_mvgam$model_output, 'ypred')
fits <- fits[,(NROW(lynx_mvgam$obs_data)+1):(NROW(lynx_mvgam$obs_data)+10)]
cred_ints <- apply(fits, 2, function(x) quantile(x, probs = c(0.025, 0.5, 0.975)))
yupper <- max(cred_ints) * 1.25
plot(cred_ints[3,] ~ seq(1:NCOL(cred_ints)), type = 'l',
col = rgb(1,0,0, alpha = 0),
ylim = c(0, yupper),
ylab = '',
xlab = 'Forecast horizon',
main = 'mvgam')
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
c(cred_ints[1,],rev(cred_ints[3,])),
col = rgb(150, 0, 0, max = 255, alpha = 100), border = NA)
cred_ints <- apply(fits, 2, function(x) quantile(x, probs = c(0.15, 0.5, 0.85)))
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
c(cred_ints[1,],rev(cred_ints[3,])),
col = rgb(150, 0, 0, max = 255, alpha = 180), border = NA)
lines(cred_ints[2,], col = rgb(150, 0, 0, max = 255), lwd = 2, lty = 'dashed')
points(lynx_test$population[1:10], pch = 16)
lines(lynx_test$population[1:10])
Calculate DRPS over the 10-year horizon for the mvgam model
lynx_mvgam_drps <- drps_mcmc_object(truth = lynx_test$population[1:10],
fc = fits)
How do the out of sample DRPS scores stack up for these three models?
Remember, our goal is to minimise DRPS while providing 90% intervals
that are near, but not less than, 0.9. The DRPS and 90%
interval coverage for the first mgcv model (with the
B spline year term)
sum(lynx_mgcv1_drps[,1])
## [1] 1582.792
mean(lynx_mgcv1_drps[,2])
## [1] 0.7
For the second mgcv model
sum(lynx_mgcv2_drps[,1])
## [1] 2041.37
mean(lynx_mgcv2_drps[,2])
## [1] 0
And for the mvgam model
sum(lynx_mvgam_drps[,1])
## [1] 1079.727
mean(lynx_mvgam_drps[,2])
## [1] 1
The mvgam has much more realistic uncertainty than the
mgcv versions above. Of course this is just one out of
sample comparison, and to really determine which model is most
appropriate for forecasting we would want to run many of these tests
using a rolling window
approach. Have a look at this model’s summary to see what is being
estimated (note that longer MCMC runs would probably be needed to
increase effective sample sizes)
summary(lynx_mvgam)
## GAM formula:
## population ~ s(season, bs = "cc", k = 15)
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## AR2
##
## N series:
## 1
##
## N timepoints:
## 40
##
## Status:
## Fitted using Stan
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 6.7485765 6.7736000 6.7998830 1 2935
## s(season).1 -1.1497198 -0.3395660 0.5030385 1 788
## s(season).2 -0.2356094 0.7192335 1.5167980 1 993
## s(season).3 0.2079440 1.4030350 2.3590005 1 743
## s(season).4 0.4463240 1.6770950 2.7285402 1 665
## s(season).5 -0.1793549 0.8541495 1.9450290 1 914
## s(season).6 -1.3312665 -0.3484950 0.9341040 1 735
## s(season).7 -1.5843665 -0.5629790 0.8390222 1 508
## s(season).8 -1.1092257 -0.1295785 1.0189660 1 661
## s(season).9 -0.3095619 0.8128725 1.8037217 1 845
## s(season).10 -0.3619385 1.0929200 2.1285362 1 549
## s(season).11 -0.6672120 0.6392295 1.5637725 1 484
## s(season).12 -1.1670315 -0.4387825 0.2376330 1 656
## s(season).13 -1.5258752 -0.9260185 -0.2078251 1 880
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 2.464491 3.482175 4.22814 1 987
##
## Latent trend AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## ar1[1] 0.6060752 1.0638400 1.4583877 1.01 561
## ar2[1] -0.7969748 -0.3303535 0.1133375 1.01 527
## sigma[1] 0.3606698 0.4730545 0.6231203 1.00 861
##
## Stan MCMC diagnostics
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 2000 iterations ended with a divergence (0%)
## 0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
Now inspect each model’s estimated smooth for the 19-year cyclic
pattern. Note that the mvgam smooth plot is on a different
scale compared to the mgcv plot, but interpretation is
similar. The mgcv smooth is much wigglier, likely because
it is compensating for any remaining autocorrelation not captured by the
lag smooths. We could probably remedy this by reducing k in
the seasonal smooth for the mgcv model (in practice this
works well, but leaving k larger for the
mvgam’s seasonal smooth is recommended as our experience is
that this tends to lead to better performance and convergence)
plot(lynx_mgcv, select=1, shade=T)
plot_mvgam_smooth(lynx_mvgam, 1, 'season')
We can also view the mvgam’s posterior predictions for the entire series (testing and training)
plot(lynx_mvgam, type = 'forecast', newdata = lynx_test)
## Out of sample DRPS:
## [1] 1079.727
##
And the estimated trend
plot(lynx_mvgam, type = 'trend', newdata = lynx_test)
A key aspect of ecological forecasting is to understand how
different components of a model contribute to forecast uncertainty.
We can estimate contributions to forecast uncertainty for the GAM smooth
functions and the latent trend using mvgam
plot(lynx_mvgam, type = 'uncertainty', newdata = lynx_test)
Both components contribute to forecast uncertainty, with the trend
component contributing more over time (as it should since this is the
stochastic forecast component). This suggests we would still need some
more work to learn about factors driving the dynamics of the system. But
we will leave the model as-is for this example. Diagnostics of the model
can also be performed using mvgam. Have a look at the
model’s Dunn-Smyth randomised quantile residuals. Again we have more
options for plotting when using the plot_mvgam_resids()
function compared to the S3 generic
plot.mvgam(). Here we use this flexibility to modify the
number of bins for the residuals vs fitted plot (top-left) so that the
patterns are more clear. We are primarily looking for a lack of
autocorrelation, which would suggest our AR1 model is appropriate for
the latent trend
plot_mvgam_resids(lynx_mvgam, n_bins = 25)
Clark, N.J. and Wells, K. (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. DOI: https://doi.org/10.1111/2041-210X.13974