In this example we will examine multivariate forecasting models using
mvgam, which fits dynamic GAMs (DGAMs; Clark & Wells,
2022) using MCMC sampling (note that either JAGS or
Stan is required; see the installation instructions for each). First we
run a simulation experiment to determine whether mvgam's
complexity penalisation works as intended by reducing the number of
un-needed dynamic factors. In any factor model, choosing the appropriate
number of factors M can be difficult. The approach used by
mvgam when sampling with JAGS is to estimate a
penalty for each factor that squeezes the factor's variance toward zero,
effectively forcing the factor to evolve as a flat white noise process.
By allowing each factor's penalty to increase in an exponential
manner across factors (following Welty et al. 2009), we hope to
guard against specifying too large a value of M. Note that when
sampling with Stan, we capitalise on the superior sampling
and exploration of Hamiltonian Monte Carlo to choose the number of
factors. Begin by simulating 6 series that evolve with a
shared seasonal pattern and that depend on 2 latent random
walk factors. Each series is 100 time steps long, with a
seasonal frequency of 12. We give the trend moderate
importance by setting trend_rel = 0.6 and we allow each
series' observation process to be drawn from slightly different Poisson
distributions
set.seed(1111)
library(mvgam)
## Welcome to mvgam. Please cite as: Clark, NJ, and Wells, K. 2022. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 2022, https://doi.org/10.1111/2041-210X.13974
dat <- sim_mvgam(T = 100, n_series = 6, n_lv = 2,
family = 'poisson',
mu = runif(6, -1, 2),
trend_rel = 0.6, train_prop = 0.85)
Have a look at the series
par(mfrow = c(3,2))
for(i in 1:6){
plot(dat$data_train$y[which(as.numeric(dat$data_train$series) == i)], type = 'l',
ylab = paste('Series', i), xlab = 'Time', bty = 'L')
box(bty = 'L', lwd = 2)
}
par(mfrow = c(1,1))
Clearly there are some correlations in the trends for these series.
But how does a dynamic factor process allow us to potentially capture
these dependencies? Essentially, a dynamic factor is an unobserved
(latent) random process that induces correlations between time series
via a set of factor loadings (\(\beta\)) while also providing dimension
reduction. The loadings represent constant associations between the
observed time series and the dynamic factors, but each series can still
deviate from the factors through its error process and through its
associations with other factors (if we estimate >1 latent factor in
our model). The toy sketch below demonstrates how.
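Here is a minimal, self-contained sketch of the idea (toy values, not the simulation above): two latent random walk factors are combined through a constant loading matrix, and the resulting series-level trends end up correlated purely because they share factors.
set.seed(42)
n_time <- 100; n_series <- 6; n_factors <- 2
factors <- apply(matrix(rnorm(n_time * n_factors, sd = 0.3),
                        ncol = n_factors), 2, cumsum)   # latent random walk factors
loadings <- matrix(runif(n_series * n_factors, -1, 1),
                   nrow = n_series)                     # constant factor loadings
trends <- factors %*% t(loadings)   # each column is one series' latent trend
round(cor(trends), 2)               # shared factors induce correlated trends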
A challenge with any factor model is the need to determine the number
of factors M. Setting M too small prevents
temporal dependencies from being adequately modelled, leading to poor
convergence and difficulty estimating smooth parameters. By contrast,
setting M too large leads to unnecessary computation.
mvgam approaches this problem by either formulating a prior
distribution that enforces exponentially increasing penalties on the
factor variances, allowing any un-needed factors to evolve as flat lines
(if using JAGS), or by regularising the factor loadings (if
using Stan). A toy picture of the penalisation idea is sketched below.
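This caricature (not the package's actual JAGS prior) shows why exponentially increasing penalties work: larger penalties imply smaller random walk step sizes, so the later factors are squeezed toward flat lines and contribute little.
set.seed(99)
n_factors <- 4; n_time <- 100
penalties <- 10^(0:(n_factors - 1))   # exponentially increasing penalties
factor_sds <- 1 / sqrt(penalties)     # implied random walk step sizes
factors <- sapply(factor_sds, function(s) cumsum(rnorm(n_time, sd = s)))
matplot(factors, type = 'l', lty = 1, xlab = 'Time', ylab = 'Latent factor')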
Now let's fit a well-specified model for our
simulated series in which we estimate random intercepts, a shared
seasonal cyclic smooth and 2 latent dynamic factors. Note
that when using Stan, which is heavily recommended for
complex models such as this, the dynamic factor variances are fixed to
allow the loading coefficients to control the magnitudes of the induced
trends. This is necessary to ensure identifiability
mod1 <- mvgam(data = dat$data_train,
newdata = dat$data_test,
formula = y ~ s(series, bs = 're') +
s(season, bs = c('cc'), k = 7) - 1,
knots = list(season = c(0.5, 12.5)),
use_lv = TRUE,
n_lv = 2,
family = 'poisson',
trend_model = 'RW',
use_stan = TRUE,
chains = 4,
burnin = 300,
samples = 400)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 2 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 3 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 1 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 4 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 1 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 3 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 2 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 3 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 3 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 1 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 4 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 4 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 1 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 3 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 2 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 3 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 1 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 4 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 1 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 700 / 700 [100%] (Sampling)
## Chain 3 finished in 44.4 seconds.
## Chain 2 Iteration: 700 / 700 [100%] (Sampling)
## Chain 2 finished in 44.8 seconds.
## Chain 1 Iteration: 700 / 700 [100%] (Sampling)
## Chain 1 finished in 47.2 seconds.
## Chain 4 Iteration: 700 / 700 [100%] (Sampling)
## Chain 4 finished in 50.6 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 46.7 seconds.
## Total execution time: 50.8 seconds.
Look at a few plots. The estimated smooth function
plot_mvgam_smooth(object = mod1, series = 1, smooth = 'season')
And the true seasonal function in the simulation
plot(dat$global_seasonality[1:12], type = 'l', bty = 'L', ylab = 'True function', xlab = 'season')
Check whether each factor was retained using the
plot_mvgam_factors function. Here, each factor is tested
against a null hypothesis of white noise by calculating the sum of the
factor’s 1st derivatives. A factor that has a larger contribution to the
series’ latent trends will have a larger sum, both because that factor’s
absolute magnitudes will be larger (due to the weaker penalty on the
factor’s precision) and because the factor will move around more. By
normalising these estimated first derivative sums, it should be apparent
whether some factors have been dropped from the model. Here we see that
each factor is contributing to the series’ latent trends, and the plots
show that neither has been forced to evolve as white noise
plot_mvgam_factors(mod1)
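For intuition, a rough by-hand version of this calculation can be pieced together from the posterior factor draws. This is only a sketch: it assumes the fitted object stores a stanfit in model_output (as used with rstan functions later in this example), that the latent factors are returned as the LV parameters (as in the generated Stan code shown later), and it uses the sum of absolute first derivatives as a simple proxy for each factor's contribution.
lv_draws <- rstan::extract(mod1$model_output, pars = 'LV')$LV   # [draw, time, factor]
deriv_sums <- apply(lv_draws, 3, function(fac) sum(abs(apply(fac, 1, diff))))
round(deriv_sums / sum(deriv_sums), 3)   # normalised contribution per factor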
Now we fit the same model but assume that we know nothing about how
many factors to use, so we specify the maximum allowed (the total number
of series; 6). Note that this model is computationally more
expensive so it will take longer to fit
mod2 <- mvgam(data = dat$data_train,
newdata = dat$data_test,
formula = y ~ s(series, bs = 're') +
s(season, bs = c('cc'), k = 7) - 1,
knots = list(season = c(0.5, 12.5)),
use_lv = TRUE,
n_lv = 6,
family = 'poisson',
use_stan = TRUE,
trend_model = 'RW',
chains = 4,
burnin = 300,
samples = 400)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 3 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 4 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 1 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 3 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 4 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 2 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 3 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 3 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 1 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 3 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 1 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 2 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 4 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 3 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 1 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 2 Iteration: 700 / 700 [100%] (Sampling)
## Chain 2 finished in 159.8 seconds.
## Chain 1 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 700 / 700 [100%] (Sampling)
## Chain 4 finished in 161.7 seconds.
## Chain 3 Iteration: 700 / 700 [100%] (Sampling)
## Chain 3 finished in 173.3 seconds.
## Chain 1 Iteration: 700 / 700 [100%] (Sampling)
## Chain 1 finished in 185.1 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 170.0 seconds.
## Total execution time: 185.2 seconds.
Use the same plots as model 1 to see if this model has also fit the data well
plot_mvgam_smooth(object = mod2, series = 1, smooth = 'season')
plot(dat$global_seasonality[1:12], type = 'l', bty = 'L', ylab = 'True function', xlab = 'season')
Examining the factor contributions gives us some insight into whether
we set n_lv larger than we perhaps needed to (with some of
the factors clearly evolving as unconstrained, zero-centred random
walks). These contributions can be interpreted similarly to ordination
axes when deciding how many latent variables to specify
plot_mvgam_factors(mod2)
The very weak contributions by some of the factors are a result of
the loading regularisation, which becomes more important as the
dimensionality of the data grows. Now onto an empirical example. Here we
will access monthly search volume data from Google Trends,
focusing on the relative importance of search terms related to tick
paralysis in Queensland, Australia
library(tidyr)
if(!require(gtrendsR)){
install.packages('gtrendsR')
}
terms = c("tick bite",
"tick paralysis",
"dog tick",
"paralysis tick dog")
trends <- gtrendsR::gtrends(terms, geo = "AU-QLD",
time = "all", onlyInterest = T)
Google Trends modified their algorithm for extracting
search volume data in 2012, so we filter the series to only include
observations after this point in time
trends$interest_over_time %>%
tidyr::spread(keyword, hits) %>%
dplyr::select(-geo, -time, -gprop, -category) %>%
dplyr::mutate(date = lubridate::ymd(date)) %>%
dplyr::mutate(year = lubridate::year(date)) %>%
dplyr::filter(year > 2012) %>%
dplyr::select(-year) -> gtest
Convert to an xts object and then to the required
mvgam format, holding out the final 10% of observations as
the test data
series <- xts::xts(x = gtest[,-1], order.by = gtest$date)
trends_data <- series_to_mvgam(series, freq = 12, train_prop = 0.9)
Plot the series to see how similar their seasonal shapes are over time
plot(series, legend.loc = 'topleft')
Now we will fit an mvgam model with shared seasonality
and random intercepts per series. Our first attempt will ignore any
temporal component in the residuals so that we can identify which GAM
predictor combination gives us the best fit, prior to investigating how
to deal with any remaining autocorrelation. We assume a Poisson
observation model for the response. Also note that any smooths using the
random effects basis (s(series, bs = "re") below) are
automatically re-parameterised to use the non-centred
parameterisation, which helps avoid common posterior
degeneracies in hierarchical models. This parameterisation tends to
work better for most ecological problems, where the data for each group /
context are not highly informative, but it is still probably worth
investigating whether a centred parameterisation, or a mix of centred and
non-centred, gives better computational performance. We suppress the global
intercept as it is not needed and would lead to identifiability issues
when estimating the series-specific random intercepts
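As a toy illustration of the non-centred idea (not mvgam's internal code): drawing group-level intercepts as mu_mean + mu_sd * z with z ~ Normal(0, 1) gives the same marginal distribution as drawing them directly from Normal(mu_mean, mu_sd), but removes the strong posterior dependence between the group effects and mu_sd that HMC often struggles with.
set.seed(123)
mu_mean <- 0.5; mu_sd <- 0.3
z <- rnorm(10000)                          # standard normal 'raw' effects
non_centred <- mu_mean + mu_sd * z         # non-centred draws
centred <- rnorm(10000, mu_mean, mu_sd)    # centred draws
round(c(mean(non_centred), sd(non_centred), mean(centred), sd(centred)), 2)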
trends_mod1 <- mvgam(data = trends_data$data_train,
newdata = trends_data$data_test,
formula = y ~ s(series, bs = 're') +
s(season, k = 7, m = 2, bs = 'cc') - 1,
knots = list(season = c(0.5, 12.5)),
trend_model = 'None',
family = 'poisson',
use_stan = TRUE,
chains = 4,
burnin = 300,
samples = 400)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 1 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 3 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 3 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 4 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 4 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 2 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 3 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 1 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 3 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 2 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 3 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 1 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 4 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 3 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 1 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 2 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 1 Iteration: 700 / 700 [100%] (Sampling)
## Chain 4 Iteration: 700 / 700 [100%] (Sampling)
## Chain 1 finished in 1.6 seconds.
## Chain 4 finished in 1.5 seconds.
## Chain 2 Iteration: 700 / 700 [100%] (Sampling)
## Chain 3 Iteration: 700 / 700 [100%] (Sampling)
## Chain 2 finished in 1.8 seconds.
## Chain 3 finished in 1.8 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 1.7 seconds.
## Total execution time: 1.9 seconds.
Given that these series could potentially be following a hierarchical seasonality, we will also trial a slightly more complex model with an extra smoothing term per series that allows its seasonal curve to deviate from the global seasonal smooth. Ignore the warning about repeated smooths, as this is not an issue for estimation.
trends_mod2 <- mvgam(data = trends_data$data_train,
newdata = trends_data$data_test,
formula = y ~ s(season, k = 7, m = 2, bs = 'cc') +
s(season, series, k = 4, bs = 'fs', m = 1),
knots = list(season = c(0.5, 12.5)),
trend_model = 'None',
family = 'poisson',
use_stan = TRUE,
chains = 4,
burnin = 300,
samples = 400)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 1 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 3 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 3 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 4 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 2 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 3 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 3 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 4 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 1 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 3 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 4 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 1 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 2 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 3 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 1 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 700 / 700 [100%] (Sampling)
## Chain 2 finished in 1.4 seconds.
## Chain 1 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 700 / 700 [100%] (Sampling)
## Chain 3 finished in 1.4 seconds.
## Chain 4 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 1 Iteration: 700 / 700 [100%] (Sampling)
## Chain 4 Iteration: 700 / 700 [100%] (Sampling)
## Chain 1 finished in 1.8 seconds.
## Chain 4 finished in 1.8 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 1.6 seconds.
## Total execution time: 1.9 seconds.
How can we compare these models to ensure we choose one that performs
well and provides useful inferences? Beyond posterior retrodictive and
predictive checks, we can take advantage of the fact that
mvgam fits an mgcv model to provide all the
necessary penalty matrices, as well as to identify good initial values
for smoothing parameters. Because neither model adds a trend component,
the underlying mgcv models remain directly comparable and we can still
employ the usual mgcv model comparison routines
anova(trends_mod1$mgcv_model,
trends_mod2$mgcv_model, test = 'LRT')
AIC(trends_mod1$mgcv_model,
trends_mod2$mgcv_model)
summary(trends_mod1$mgcv_model)
##
## Family: poisson
## Link function: log
##
## Formula:
## y ~ s(series, bs = "re") + s(season, k = 7, m = 2, bs = "cc") -
## 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(series) 3.992 4 3458 <2e-16 ***
## s(season) 4.392 5 34526 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.704 Deviance explained = 64.2%
## -REML = 863.3 Scale est. = 1 n = 452
summary(trends_mod2$mgcv_model)
##
## Family: poisson
## Link function: log
##
## Formula:
## y ~ s(season, k = 7, m = 2, bs = "cc") + s(season, series, k = 4,
## bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1203 0.4668 2.4 0.0164 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(season) 4.186 5 96.84 <2e-16 ***
## s(season,series) 9.803 16 369.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.744 Deviance explained = 67.6%
## -REML = 850.14 Scale est. = 1 n = 452
Model 2 seems to fit better so far, suggesting that hierarchical
seasonality gives better performance for these series. But a problem
with both of the above models is that their forecast uncertainties will
not increase into the future, which is not how time series forecasts
should behave. Here we fit Model 2 again, but this time specify a time
series model for the latent trends. We assume the dynamic trends can be
represented using latent factors that each follow a squared exponential
Gaussian Process (GP), and we set n_lv = 2. Note that
mvgam uses a Hilbert space approximate
Gaussian Process for computational efficiency, with the number of
basis functions \(m\) set to
min(20, n_timepoints), as can be seen in the generated Stan code below
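To see what this approximation involves, the sketch below simulates one latent GP factor directly from the Hilbert space basis expansion, mirroring the lambda_gp(), drift_SE() and spd_SE() functions in the generated Stan code shown further below (the length scale of 6 is a toy value, not a fitted estimate).
n_time <- 100
times <- seq_len(n_time) - mean(seq_len(n_time))   # centred time covariate
L_bound <- (5 / 4) * (max(times) - min(times))     # boundary factor, as in the Stan code
m_basis <- min(20, n_time)                         # number of basis functions
phi <- sapply(seq_len(m_basis), function(m)
  1 / sqrt(L_bound) * sin(m * pi / (2 * L_bound) * (times + L_bound)))
spd <- function(alpha, rho, w) alpha^2 * sqrt(2 * pi) * rho * exp(-0.5 * rho^2 * w^2)
lambda <- (seq_len(m_basis) * pi / (2 * L_bound))^2
set.seed(321)
weights <- rnorm(m_basis) * sqrt(spd(alpha = 0.25, rho = 6, w = sqrt(lambda)))
plot(phi %*% weights, type = 'l', xlab = 'Time', ylab = 'Simulated GP factor')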
trends_mod3 <- mvgam(data = trends_data$data_train,
newdata = trends_data$data_test,
formula = y ~ s(season, k = 7, m = 2, bs = 'cc') +
s(season, series, k = 4, bs = 'fs', m = 1),
knots = list(season = c(0.5, 12.5)),
trend_model = 'GP',
use_lv = TRUE,
n_lv = 2,
family = 'poisson',
use_stan = TRUE,
chains = 4,
burnin = 300,
samples = 400)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 1 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 4 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 3 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 1 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 1 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 1 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 2 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 3 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 3 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 4 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 4 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 3 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 2 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 1 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 3 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 4 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 1 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 3 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 2 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 1 Iteration: 700 / 700 [100%] (Sampling)
## Chain 1 finished in 18.0 seconds.
## Chain 3 Iteration: 700 / 700 [100%] (Sampling)
## Chain 3 finished in 18.9 seconds.
## Chain 2 Iteration: 700 / 700 [100%] (Sampling)
## Chain 2 finished in 19.9 seconds.
## Chain 4 Iteration: 700 / 700 [100%] (Sampling)
## Chain 4 finished in 20.0 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 19.2 seconds.
## Total execution time: 20.1 seconds.
Have a look at the returned Stan model file to see how
the GP factors are incorporated
cat(c("```stan", trends_mod3$model_file, "```"), sep = "\n")
// Stan model code generated by package mvgam
functions {
real lambda_gp(real L, int m) {
real lam;
lam = ((m*pi())/(2*L))^2;
return lam;
}
vector drift_SE(real L, int m, vector x) {
vector[rows(x)] fi;
fi = 1/sqrt(L) * sin(m*pi()/(2*L) * (x+L));
return fi;
}
real spd_SE(real alpha, real rho, real w) {
real S;
S = (alpha^2) * sqrt(2*pi()) * rho * exp(-0.5*(rho^2)*(w^2));
return S;
}
}
data {
int idx1[12]; // discontiguous index values
int idx2[4]; // discontiguous index values
int<lower=0> total_obs; // total number of observations
int<lower=0> n; // number of timepoints per series
int<lower=0> n_lv; // number of dynamic factors
int<lower=0> n_sp; // number of smoothing parameters
int<lower=0> n_series; // number of series
int<lower=0> num_basis; // total number of basis coefficients
real p_taus[1]; // prior precisions for parametric coefficients
real p_coefs[1]; // prior locations for parametric coefficients
vector[num_basis] zero; // prior locations for basis coefficients
matrix[total_obs, num_basis] X; // mgcv GAM design matrix
int<lower=0> ytimes[n, n_series]; // time-ordered matrix (which col in X belongs to each [time, series] observation?)
matrix[5,5] S1; // mgcv smooth penalty matrix S1
int<lower=0> n_nonmissing; // number of nonmissing observations
int<lower=0> flat_ys[n_nonmissing]; // flattened nonmissing observations
matrix[n_nonmissing, num_basis] flat_xs; // X values for nonmissing observations
int<lower=0> obs_ind[n_nonmissing]; // indices of nonmissing observations
}
transformed data {
vector<lower=1>[n] times;
vector[n] times_cent;
real mean_times;
real<lower=0> boundary;
int<lower=1> num_gp_basis;
num_gp_basis = min(20, n);
matrix[n, num_gp_basis] gp_drift;
for (t in 1:n){
times[t] = t;
}
mean_times = mean(times);
times_cent = times - mean_times;
boundary = (5.0/4) * (max(times_cent) - min(times_cent));
for (m in 1:num_gp_basis){
gp_drift[,m] = drift_SE(boundary, m, times_cent);
}
// Number of non-zero lower triangular factor loadings
// Ensures identifiability of the model - no rotation of factors
int<lower=1> M;
M = n_lv * (n_series - n_lv) + n_lv * (n_lv - 1) / 2 + n_lv;
}
parameters {
// raw basis coefficients
vector[num_basis] b_raw;
// dynamic factor lower triangle loading coefficients
vector[M] L;
// gp parameters
vector<lower=0>[n_lv] rho_gp;
// gp coefficient weights
matrix[num_gp_basis, n_lv] b_gp;
// smoothing parameters
vector<lower=0>[n_sp] lambda;
}
transformed parameters {
// gp spectral densities
matrix[n, n_lv] LV_raw;
matrix[num_gp_basis, n_lv] diag_SPD;
matrix[num_gp_basis, n_lv] SPD_beta;
// trends and dynamic factor loading matrix
matrix[n, n_series] trend;
matrix[n_series, n_lv] lv_coefs_raw;
// basis coefficients
vector[num_basis] b;
b[1:num_basis] = b_raw[1:num_basis];
// constraints allow identifiability of loadings
for (i in 1:(n_lv - 1)) {
for (j in (i + 1):(n_lv)){
lv_coefs_raw[i, j] = 0;
}
}
{
int index;
index = 0;
for (j in 1:n_lv) {
for (i in j:n_series) {
index = index + 1;
lv_coefs_raw[i, j] = L[index];
}
}
}
// gp LV estimates
for (m in 1:num_gp_basis){
for (s in 1:n_lv){
diag_SPD[m, s] = sqrt(spd_SE(0.25, rho_gp[s], sqrt(lambda_gp(boundary, m))));
}
}
SPD_beta = diag_SPD .* b_gp;
LV_raw = gp_drift * SPD_beta;
// derived latent trends
for (i in 1:n){;
for (s in 1:n_series){
trend[i, s] = dot_product(lv_coefs_raw[s,], LV_raw[i,1:n_lv]);
}
}
}
model {
// parametric effect priors (regularised for identifiability)
for (i in 1:1) {
b_raw[i] ~ normal(p_coefs[i], sqrt(1 / p_taus[i]));
}
// prior for s(season)...
b_raw[2:6] ~ multi_normal_prec(zero[2:6],S1[1:5,1:5] * lambda[1]);
// prior for s(season,series)...
for (i in idx1) { b_raw[i] ~ normal(0, lambda[2]); }
for (i in idx2) { b_raw[i] ~ normal(0, lambda[3]); }
// priors for smoothing parameters
lambda ~ normal(10, 25);
// priors for gp parameters
for (s in 1:n_lv){
b_gp[1:num_gp_basis, s] ~ std_normal();
}
rho_gp ~ inv_gamma(1.499007, 5.670433);
// priors for dynamic factor loading coefficients
L ~ student_t(5, 0, 1);
{
// likelihood functions
vector[n_nonmissing] flat_trends;
flat_trends = (to_vector(trend))[obs_ind];
flat_ys ~ poisson_log_glm(append_col(flat_xs, flat_trends),
0.0,append_row(b, 1.0));
}
}
generated quantities {
vector[total_obs] eta;
matrix[n, n_series] mus;
matrix[n, n_lv] LV;
matrix[n_series, n_lv] lv_coefs;
vector[n_sp] rho;
vector[n_lv] alpha_gp;
array[n, n_series] int ypred;
rho = log(lambda);
alpha_gp = rep_vector(0.25, n_lv);
// Sign correct factor loadings and factors
for(j in 1:n_lv){
if(lv_coefs_raw[j, j] < 0){
lv_coefs[,j] = -1 * lv_coefs_raw[,j];
LV[,j] = -1 * LV_raw[,j];
} else {
lv_coefs[,j] = lv_coefs_raw[,j];
LV[,j] = LV_raw[,j];
}
}
// posterior predictions
eta = X * b;
for(s in 1:n_series){
mus[1:n, s] = eta[ytimes[1:n, s]] + trend[1:n, s];
ypred[1:n, s] = poisson_log_rng(mus[1:n, s]);
}
}
There is a clear problem with this model, as Stan's
dynamic HMC algorithm has encountered a large number of divergent
transitions
summary(trends_mod3)
## GAM formula:
## y ~ s(season, k = 7, m = 2, bs = "cc") + s(season, series, k = 4,
## bs = "fs", m = 1)
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## GP
##
## N latent factors:
## 2
##
## N series:
## 4
##
## N timepoints:
## 113
##
## Status:
## Fitted using Stan
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 0.11351923 1.079330000 2.006350750 1.02 264
## s(season).1 -0.60338812 -0.398332500 -0.198766125 1.00 713
## s(season).2 -0.51494392 -0.290707000 -0.052168780 1.00 754
## s(season).3 -0.28992167 -0.087008500 0.119435550 1.00 772
## s(season).4 0.24019952 0.435477000 0.656606775 1.00 670
## s(season).5 0.41393838 0.579559000 0.774547125 1.00 778
## s(season,series).1 -0.21824417 -0.043435550 0.116120375 1.00 1292
## s(season,series).2 -0.05107220 0.091146750 0.252658225 1.00 748
## s(season,series).3 -0.18496455 -0.083387950 -0.003976947 1.00 670
## s(season,series).4 -1.54067825 -0.534984500 0.382572475 1.02 286
## s(season,series).5 -0.18946125 -0.000539307 0.216417175 1.00 1755
## s(season,series).6 -0.12544250 0.039150050 0.248457575 1.00 1053
## s(season,series).7 -0.19429018 -0.055704850 0.062464712 1.00 980
## s(season,series).8 0.02589798 1.031665000 1.983330000 1.01 297
## s(season,series).9 -0.25555823 -0.053402700 0.103467050 1.00 1095
## s(season,series).10 -0.34437400 -0.152399000 -0.004541713 1.00 945
## s(season,series).11 -0.13354487 -0.034091900 0.063691492 1.00 762
## s(season,series).12 -1.08866875 -0.093641850 0.846556075 1.02 290
## s(season,series).13 -0.03499759 0.118613500 0.319880275 1.00 1261
## s(season,series).14 -0.12298042 0.027749650 0.196837525 1.00 677
## s(season,series).15 -0.03115462 0.064571950 0.152147025 1.00 715
## s(season,series).16 -1.48448350 -0.493031000 0.503648625 1.02 242
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 2.2953020 3.42477500 4.132058 1 1247
## s(season,series) -2.9123648 -2.18980000 -1.471212 1 646
## s(season,series)2 -0.8579098 -0.01094055 1.253594 1 822
##
## Latent trend length scale (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## rho_gp[1] 1.072011 6.04902 108.7481 1.02 197
## rho_gp[2] 1.163935 6.20813 111.5357 1.01 322
##
## Stan MCMC diagnostics
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 63 of 1600 iterations ended with a divergence (3.9375%)
## *Try running with larger adapt_delta to remove the divergences
## 0 of 1600 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
Our experience is that there are fundamental degeneracies in latent
dynamic factor models, particularly due to the inherent weak
identifiability of GP length scale parameters rho (which is
magnified when trying to estimate latent GPs in a factor model). A pairs
plot of the loading coefficients against the estimated GP length scale
parameters helps to reveal the problem:
pairs(trends_mod3$model_output, pars = c('lv_coefs','rho_gp'))
There are some nasty funnel shapes in this plot, primarily occurring
when some of the loadings go to 0 and the length scales
approach small values, suggesting that the latent length scales are
effectively unidentifiable in this region of parameter space. The default
prior for length scales in mvgam is an informative inverse Gamma, following
principled
prior recommendations by Michael Betancourt. But defaults should
always be critiqued, and in our experience it is far more efficient to
decide on an appropriate level of smoothness that we wish to induce in
the latent GPs for a given problem and then use a stronger prior. For
these series, we may expect that any changes in online search behaviour
could reflect environmental / biological processes that extend over a
number of months, indicating that length scales of 4-8 months
would be more in line with domain expertise. We can use a suitable prior
for the length scales to help pull values away from very small or very
large length scales that go against our prior expectations
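Before changing anything, it can help to plot the default inverse Gamma prior from the generated Stan code above against the tighter normal(6, 1) prior we are about to use (the inverse Gamma density is written out by hand here so no extra packages are needed).
dinvgamma <- function(x, a, b) b^a / gamma(a) * x^(-a - 1) * exp(-b / x)
curve(dnorm(x, mean = 6, sd = 1), from = 0.1, to = 30,
      xlab = 'rho_gp (length scale)', ylab = 'Prior density')
curve(dinvgamma(x, 1.499007, 5.670433), add = TRUE, lty = 2)
legend('topright', c('normal(6, 1)', 'inv_gamma(1.5, 5.67)'), lty = 1:2, bty = 'n')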
priors <- get_mvgam_priors(data = trends_data$data_train,
formula = y ~ s(season, k = 7, m = 2, bs = 'cc') +
s(season, series, k = 4, bs = 'fs', m = 1),
trend_model = 'GP',
use_lv = TRUE,
n_lv = 2,
family = 'poisson',
use_stan = TRUE)
priors$prior[2] <- 'rho_gp ~ normal(6, 1);'
trends_mod3 <- mvgam(data = trends_data$data_train,
newdata = trends_data$data_test,
formula = y ~ s(season, k = 7, m = 2, bs = 'cc') +
s(season, series, k = 4, bs = 'fs', m = 1),
knots = list(season = c(0.5, 12.5)),
trend_model = 'GP',
use_lv = TRUE,
n_lv = 2,
family = 'poisson',
use_stan = TRUE,
chains = 4,
burnin = 300,
samples = 400,
priors = priors)
## Running MCMC with 4 parallel chains...
##
## Chain 1 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 3 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 4 Iteration: 1 / 700 [ 0%] (Warmup)
## Chain 1 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 4 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 2 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 3 Iteration: 100 / 700 [ 14%] (Warmup)
## Chain 1 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 2 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 4 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 3 Iteration: 200 / 700 [ 28%] (Warmup)
## Chain 1 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 1 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 4 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 2 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 4 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 2 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 3 Iteration: 300 / 700 [ 42%] (Warmup)
## Chain 3 Iteration: 301 / 700 [ 43%] (Sampling)
## Chain 1 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 2 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 3 Iteration: 400 / 700 [ 57%] (Sampling)
## Chain 4 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 1 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 2 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 3 Iteration: 500 / 700 [ 71%] (Sampling)
## Chain 4 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 1 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 2 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 4 Iteration: 700 / 700 [100%] (Sampling)
## Chain 4 finished in 18.1 seconds.
## Chain 3 Iteration: 600 / 700 [ 85%] (Sampling)
## Chain 1 Iteration: 700 / 700 [100%] (Sampling)
## Chain 1 finished in 19.2 seconds.
## Chain 2 Iteration: 700 / 700 [100%] (Sampling)
## Chain 2 finished in 19.8 seconds.
## Chain 3 Iteration: 700 / 700 [100%] (Sampling)
## Chain 3 finished in 21.2 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 19.6 seconds.
## Total execution time: 21.3 seconds.
This model fits efficiently and should provide far better forecasts than relying only on the estimated smooths. Inspect the model summary
summary(trends_mod3)
## GAM formula:
## y ~ s(season, k = 7, m = 2, bs = "cc") + s(season, series, k = 4,
## bs = "fs", m = 1)
##
## Family:
## poisson
##
## Link function:
## log
##
## Trend model:
## GP
##
## N latent factors:
## 2
##
## N series:
## 4
##
## N timepoints:
## 113
##
## Status:
## Fitted using Stan
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 0.19357708 1.094390000 2.025238e+00 1.01 582
## s(season).1 -0.62211515 -0.399532000 -1.881841e-01 1.00 994
## s(season).2 -0.54891363 -0.290626500 -4.166829e-02 1.00 818
## s(season).3 -0.30927960 -0.084793850 1.431061e-01 1.00 774
## s(season).4 0.21873468 0.438584000 6.898838e-01 1.00 877
## s(season).5 0.40634070 0.585135000 7.765456e-01 1.00 1154
## s(season,series).1 -0.22926153 -0.045097400 1.328376e-01 1.00 1200
## s(season,series).2 -0.07153536 0.091679250 2.814197e-01 1.00 882
## s(season,series).3 -0.17879695 -0.083486450 -4.526358e-06 1.00 910
## s(season,series).4 -1.42792075 -0.528309500 3.936460e-01 1.01 586
## s(season,series).5 -0.24041315 0.008016205 2.266785e-01 1.00 1545
## s(season,series).6 -0.13979345 0.043690150 2.674129e-01 1.00 1237
## s(season,series).7 -0.19024437 -0.055225550 6.307380e-02 1.00 1281
## s(season,series).8 0.11326945 1.058000000 2.047936e+00 1.00 582
## s(season,series).9 -0.28049937 -0.061551250 1.101255e-01 1.00 1378
## s(season,series).10 -0.39315173 -0.155901000 1.270916e-02 1.00 884
## s(season,series).11 -0.13110057 -0.033245500 6.517362e-02 1.00 1079
## s(season,series).12 -0.96050215 -0.072709950 8.429874e-01 1.01 595
## s(season,series).13 -0.03608681 0.119451500 3.283346e-01 1.00 1205
## s(season,series).14 -0.12803350 0.028691800 2.180661e-01 1.00 749
## s(season,series).15 -0.02217901 0.064057500 1.528710e-01 1.00 999
## s(season,series).16 -1.41495000 -0.501609000 4.422013e-01 1.00 582
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 2.216557 3.42763500 4.176332 1 1622
## s(season,series) -2.881528 -2.13914500 -1.440796 1 673
## s(season,series)2 -0.797726 -0.02149205 1.227495 1 1264
##
## Latent trend length scale (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## rho_gp[1] 4.152892 5.941095 7.894373 1 2214
## rho_gp[2] 3.973331 5.942575 7.903335 1 1893
##
## Stan MCMC diagnostics
## n_eff / iter looks reasonable for all parameters
## Rhat looks reasonable for all parameters
## 0 of 1600 iterations ended with a divergence (0%)
## 0 of 1600 iterations saturated the maximum tree depth of 12 (0%)
## E-FMI indicated no pathological behavior
##
The pairs plot now looks much better
pairs(trends_mod3$model_output, pars = c('lv_coefs','rho_gp'))
Inspection of the dynamic factors and their relative contributions indicates that both factors are contributing to the series' latent trends
plot_mvgam_factors(trends_mod3)
Look at Dunn-Smyth residuals for some series from this preferred model to ensure that our dynamic factor process has captured most of the temporal dependencies in the observations
plot_mvgam_resids(trends_mod3, series = 1)
plot_mvgam_resids(trends_mod3, series = 2)
plot_mvgam_resids(trends_mod3, series = 3)
plot_mvgam_resids(trends_mod3, series = 4)
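For reference, a single randomised quantile (Dunn-Smyth) residual for a Poisson observation can be computed as follows. This is a minimal sketch of the general recipe, not the package's internal implementation.
dunn_smyth_resid <- function(y, lambda_hat) {
  a <- ppois(y - 1, lambda_hat)      # lower limit of the randomisation interval
  b <- ppois(y, lambda_hat)          # upper limit
  qnorm(runif(1, min = a, max = b))  # map a uniform draw to the standard normal scale
}
dunn_smyth_resid(y = 3, lambda_hat = 2.5)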
Perform posterior predictive checks to see if the model is able to
simulate data that looks realistic and unbiased by examining simulated
kernel densities for posterior predictions (yhat) compared
to the density of the observations (y). This will be
particularly useful for examining whether the Poisson observation model
is able to produce realistic looking simulations for each individual
series.
ppc(trends_mod3, series = 1, type = 'hist')
ppc(trends_mod3, series = 2, type = 'hist')
ppc(trends_mod3, series = 3, type = 'hist')
ppc(trends_mod3, series = 4, type = 'hist')
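A hand-rolled version of the density comparison performed by these checks might look something like the sketch below. It assumes the posterior simulations are returned as the ypred quantities shown in the generated Stan code above, with the training timepoints ordered first.
s <- 1   # series to check
obs <- trends_data$data_train$y[as.numeric(trends_data$data_train$series) == s]
yhat <- rstan::extract(trends_mod3$model_output, pars = 'ypred')$ypred[, seq_along(obs), s]
plot(density(obs, na.rm = TRUE), main = '', xlab = 'Hits')
for (i in sample(nrow(yhat), 30)) lines(density(yhat[i, ]), col = 'grey80')
lines(density(obs, na.rm = TRUE), lwd = 2)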
Look at traceplots for the smoothing parameters
(rho)
rstan::stan_trace(trends_mod3$model_output, pars = 'rho')
Plot posterior predictive distributions for the training and testing periods for each series
plot_mvgam_fc(object = trends_mod3, series = 1, newdata = trends_data$data_test)
## Out of sample DRPS:
## [1] 20.38138
##
plot_mvgam_fc(object = trends_mod3, series = 2, newdata = trends_data$data_test)
## Out of sample DRPS:
## [1] 6.291654
##
plot_mvgam_fc(object = trends_mod3, series = 3, newdata = trends_data$data_test)
## Out of sample DRPS:
## [1] 9.977941
##
plot_mvgam_fc(object = trends_mod3, series = 4, newdata = trends_data$data_test)
## Out of sample DRPS:
## [1] 14.00317
##
Plot posterior distributions for the latent trend estimates, again for the training and testing periods
plot_mvgam_trend(object = trends_mod3, series = 1, newdata = trends_data$data_test)
plot_mvgam_trend(object = trends_mod3, series = 2, newdata = trends_data$data_test)
plot_mvgam_trend(object = trends_mod3, series = 3, newdata = trends_data$data_test)
plot_mvgam_trend(object = trends_mod3, series = 4, newdata = trends_data$data_test)
newdata to inspect smooth functionsGiven that we fit a model with hierarchical seasonality, the seasonal
smooths are able to deviate from one another (though they share the same
wiggliness and all deviate from a common ‘global’ seasonal function).
Here we use the newdata argument to generate predictions
for each of the hierarchical smooth functions (note that the intercept
is still included in these plots so they do not center on zero)
newdat <- data.frame(season = seq(1, 12, length.out = 100),
series = levels(trends_data$data_train$series)[1])
plot_mvgam_smooth(object = trends_mod3, series = 1,
smooth = 'season',
newdata = newdat)
newdat <- data.frame(season = seq(1, 12, length.out = 100),
series = levels(trends_data$data_train$series)[2])
plot_mvgam_smooth(object = trends_mod3, series = 2,
smooth = 'season',
newdata = newdat)
newdat <- data.frame(season = seq(1, 12, length.out = 100),
series = levels(trends_data$data_train$series)[3])
plot_mvgam_smooth(object = trends_mod3, series = 3,
smooth = 'season',
newdata = newdat)
newdat <- data.frame(season = seq(1, 12, length.out = 100),
series = levels(trends_data$data_train$series)[4])
plot_mvgam_smooth(object = trends_mod3, series = 4,
smooth = 'season',
newdata = newdat)
Plot posterior mean estimates of the latent trend correlations. These correlations are often more useful than inspecting the latent factor loadings directly, for example when constructing ordinations, because the ordering of the loadings (although constrained for identifiability purposes) can vary from chain to chain
correlations <- lv_correlations(object = trends_mod3)
library(ggplot2)
mean_correlations <- correlations$mean_correlations
mean_correlations[upper.tri(mean_correlations)] <- NA
mean_correlations <- data.frame(mean_correlations)
ggplot(mean_correlations %>%
tibble::rownames_to_column("series1") %>%
tidyr::pivot_longer(-c(series1), names_to = "series2", values_to = "Correlation"),
aes(x = series1, y = series2)) + geom_tile(aes(fill = Correlation)) +
scale_fill_gradient2(low="darkred", mid="white", high="darkblue",
midpoint = 0,
breaks = seq(-1,1,length.out = 5),
limits = c(-1, 1),
name = 'Trend\ncorrelation') + labs(x = '', y = '') + theme_dark() +
theme(axis.text.x = element_text(angle = 45, hjust=1))
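As a rough cross-check, the same correlations can be approximated by hand from the posterior median trends (a sketch; assumes the latent trends are returned as the trend parameters, as in the generated Stan code shown earlier).
trend_draws <- rstan::extract(trends_mod3$model_output, pars = 'trend')$trend  # [draw, time, series]
med_trends <- apply(trend_draws, c(2, 3), median)
colnames(med_trends) <- levels(trends_data$data_train$series)
round(cor(med_trends), 2)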
There is certainly some evidence of positive trend correlations for a few of these search terms, which is not surprising given how similar some of them are and how closely linked they should be to interest in tick paralysis in Queensland. Plot some STL decompositions of these series to see if these trends are noticeable in the data
plot(stl(ts(as.vector(series$`tick paralysis`), frequency = 12), 'periodic'))
plot(stl(ts(as.vector(series$`paralysis tick dog`), frequency = 12), 'periodic'))
plot(stl(ts(as.vector(series$`dog tick`), frequency = 12), 'periodic'))
plot(stl(ts(as.vector(series$`tick bite`), frequency = 12), 'periodic'))
Forecast period posterior predictive checks suggest that the model still has room for improvement:
ppc(trends_mod3, series = 1, type = 'hist', newdata = trends_data$data_test)
ppc(trends_mod3, series = 1, type = 'mean', newdata = trends_data$data_test)
ppc(trends_mod3, series = 2, type = 'hist', newdata = trends_data$data_test)
ppc(trends_mod3, series = 2, type = 'mean', newdata = trends_data$data_test)
ppc(trends_mod3, series = 3, type = 'hist', newdata = trends_data$data_test)
ppc(trends_mod3, series = 3, type = 'mean', newdata = trends_data$data_test)
ppc(trends_mod3, series = 4, type = 'hist', newdata = trends_data$data_test)
ppc(trends_mod3, series = 4, type = 'mean', newdata = trends_data$data_test)
Other next steps could involve devising a more goal-specific set of posterior predictive checks (see the relevant papers by Gelman et al. and works by Betancourt for examples) and comparing out-of-sample Discrete Rank Probability Scores (DRPS) between this model and versions with alternative observation models (e.g. Negative Binomial) or latent trend models (e.g. AR1, random walk)
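As a starting point for such comparisons, the DRPS for a single count observation can be computed by hand like this (a minimal sketch; the package reports DRPS automatically, as in the forecast plots above).
drps_score <- function(y_obs, fc_draws, upper = 1000) {
  grid <- 0:upper
  fc_cdf <- ecdf(fc_draws)(grid)          # empirical forecast CDF over a grid of counts
  obs_cdf <- as.numeric(grid >= y_obs)    # step-function CDF of the observation
  sum((fc_cdf - obs_cdf)^2)
}
drps_score(y_obs = 10, fc_draws = rpois(1000, lambda = 8))   # toy example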
Clark, N.J. and Wells, K. (2022). Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. DOI: https://doi.org/10.1111/2041-210X.13974
Welty, L.J., et al. (2009). Bayesian distributed lag models: estimating effects of particulate matter air pollution on daily mortality. Biometrics, 65(1), 282-291