\[ y_t = \beta_{1t} + \sum_{k=1}^{p} \beta_{1+k,t} y_{t-k} + \sum_{\ell=1}^{r} \beta_{1+p+\ell,t} x_{\ell t} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2) \]
with:
\[ \beta_{jt} = \beta_{j,t-1} + w_{jt}, \quad w_{jt} \sim \mathcal{N}(0, \theta_j), \quad j = 1, \ldots, 1 + p + r \] where: \(j=1,…,d\) with \(d=1+p+r\) the total number of coefficients (including the intercept).
The coefficients \(\beta_{jt}\) are time-varying and evolve as random walks.
The model includes:
p lags of the dependent variable \(y_t\)
r exogenous variables \(x_{\ell t}\), which could be financial indicators or other macro variables.
The innovation variance of \(\beta_{jt}\) is controlled by \(\theta_j\), and shrinkage priors are applied to these (via the shrinkTVP package).
The observation error variance \(\sigma_t^2\) may also vary over time.
# Load libraries
library(bvarsv) # For data
library(shrinkTVP) # For modeling
# Load the US macroeconomic dataset
data("usmacro.update", package = "bvarsv")
# View column names to decide on y_t and x_t
colnames(usmacro.update)
## [1] "inf" "une" "tbi"
In most macroeconomic forecasting models, it is common to treat inflation (inf) as the dependent variable.
Lags of inf itself (for capturing inflation persistence)
une (unemployment rate)
tbi (interest rate)
Thus:
p = 2: number of lags of inf
r = 2: two exogenous variables → une, tbi
# Plot the time series (to understand trends)
ts.plot(usmacro.update, col = 1:NCOL(usmacro.update), lty = 1, main = "US Macro Time Series")
legend("topright", legend = colnames(usmacro.update), col = 1:NCOL(usmacro.update), lty = 1, cex = 0.8)
Figure: The US macroeconomic time series used in the model. The inflation rate (inf), unemployment rate (une), and treasury bill rate (tbi) all exhibit strong non-stationarity and time variation, justifying the need for a flexible (TVP) modeling framework.
In this analysis, we model the U.S. inflation rate as the dependent variable in a Bayesian Time-Varying Parameter (TVP) regression framework. The model incorporates:
Two lags of the inflation rate (p = 2)
Two exogenous covariates (r = 2):
Lagged unemployment rate
Lagged treasury bill rate
This results in a total of 5 predictors (including the intercept), each of which is allowed to vary over time. The model structure is:
\[ y_t = \beta_{1t} + \beta_{2t} y_{t-1} + \beta_{3t} y_{t-2} + \beta_{4t} x_{1t} + \beta_{5t} x_{2t} + \varepsilon_t,\quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2) \] \[ \beta_{jt} = \beta_{j,t-1} + w_{jt}, \quad w_{jt} \sim \mathcal{N}(0, \theta_j) \]
This model is highly overparameterized: it estimates separate time paths for each coefficient (\(\beta_{jt}\)) and even allows the error variance \(\sigma_t^2\) to vary over time. While this flexibility allows the model to adapt to changing relationships in the data, it also risks overfitting, especially in macroeconomic data where sample sizes are relatively limited.
To address this, we incorporate shrinkage priors using the shrinkTVP package (Knaus et al., 2021). These priors act as regularization mechanisms that pull coefficient paths toward a “null model” unless the data provide strong evidence against it.
The null model in this context is typically:
A homoskedastic linear regression, where:
All coefficients are constant over time
And possibly zero
This design enforces the principle of “letting the data speak, but only if it speaks loudly enough.” In other words, shrinkage priors:
Encourage simpler models where parameters don’t vary
Allow for time variation or nonzero values only when the data justify it
If a coefficient does not significantly change over time, or is not relevant, the posterior estimate will reflect that — it will be shrunk toward zero or toward a constant value. This helps maintain model parsimony and interpretability while still allowing for flexibility where needed.
This balance of flexibility and parsimony is key in modern Bayesian macroeconomic modeling.
We consider a ridge prior for the standard deviation of the process variance:
\[ \sqrt{\theta_j} \sim \mathcal{N}\left(0, \frac{2}{\kappa_B^2} \right) \]
with a very large value for \(\kappa_B^2\). This implies the prior is sharply peaked at zero, favoring very small variances in the coefficient evolution process.
Interpretation:
This prior strongly shrinks the time variation in coefficients toward zero. In other words, it favors the belief that:
“Each coefficient \(\beta_{jt}\) is nearly constant over time.”
Under this prior, since \(\theta_j \approx 0\), we have:
\[ \beta_{jt} = \beta_{j, t-1} + w_{jt}, \quad \text{with } w_{jt} \approx 0 \]
Therefore, all \(\beta_{jt} \approx \beta_j\), meaning the model behaves like a standard linear regression with time-invariant coefficients.
This highlights how prior choice can effectively collapse the TVP model into its simpler, constant-coefficient counterpart — unless the data force the model to deviate from it.
We want to show that when \(\kappa_B^2\) is very
large, the estimated time-varying coefficients \(\beta_{jt}\) become
flat,
i.e., nearly constant across time.
1. Fit a TVP Model With a Strong Ridge Prior
Use a very large value for \(\kappa_B^2\), (e.g., 1e6), to impose strong shrinkage.
# Prepare data
df <- as.data.frame(usmacro.update)
df$inf_lag1 <- c(NA, head(df$inf, -1))
df$inf_lag2 <- c(NA, NA, head(df$inf, -2))
df$une_lag1 <- c(NA, head(df$une, -1))
df$tbi_lag1 <- c(NA, head(df$tbi, -1))
# Drop NA rows
df_model <- na.omit(df)
set.seed(12)
# Fit model with strong ridge prior ($\approx$ constant coefficients)
fit_ridge_strong <- shrinkTVP(
formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
data = df_model,
mod_type = "ridge",
learn_kappa2_B = FALSE,
kappa2_B = 1e10, # Freeze time variation (make coefficients constant)
learn_lambda2_B = FALSE,
niter = 5000,
nburn = 1000,
sv = FALSE
)
summary(fit_ridge_strong, showprior = FALSE)
##
## Summary of 4000 MCMC draws after burn-in of 1000.
##
## Statistics of posterior draws of parameters (thinning = 1):
##
## param mean sd median HPD 2.5% HPD 97.5% ESS
## beta_mean_Intercept 0.22 0.08 0.22 0.064 0.378 4000
## beta_mean_inf_lag1 1.473 0.054 1.473 1.366 1.575 4000
## beta_mean_inf_lag2 -0.497 0.056 -0.498 -0.608 -0.394 4000
## beta_mean_une_lag1 -0.028 0.013 -0.028 -0.054 -0.004 4000
## beta_mean_tbi_lag1 0.005 0.009 0.006 -0.013 0.023 4000
##
## abs(theta_sr_Intercept) 0 0 0 0 0 4000
## abs(theta_sr_inf_lag1) 0 0 0 0 0 4000
## abs(theta_sr_inf_lag2) 0 0 0 0 0 3897
## abs(theta_sr_une_lag1) 0 0 0 0 0 3417
## abs(theta_sr_tbi_lag1) 0 0 0 0 0 4282
##
## sigma2 0.092 0.009 0.092 0.076 0.109 3346
##
## C0 0.53 0.198 0.505 0.193 0.93 2997
##
This model behaves like a constant-coefficient regression as expected. All time variation has been eliminated by the ridge prior, verifying that under strong shrinkage on \(\theta_j\), the TVP model collapses into a standard regression.
Plot the Posterior Paths of Coefficients Over Time
If the TVP collapses to the standard regression model, the posterior paths of \(\beta_{jt}\) will be almost flat lines.
#checking Median like the paper:
library(coda)
par(mfrow = c(3, 2), # Layout
mar = c(4, 4, 2, 1), # Margins: bottom, left, top, right
oma = c(1, 1, 1, 1)) # Outer margins
for (j in 1:length(fit_ridge_strong$beta)) {
beta_draws <- fit_ridge_strong$beta[[j]]
beta_median <- apply(beta_draws, 2, median)
beta_lower <- apply(beta_draws, 2, quantile, probs = 0.025)
beta_upper <- apply(beta_draws, 2, quantile, probs = 0.975)
plot(beta_median, type = "l", main = paste0("Posterior median of beta", j, "(t)"),
ylab = bquote(beta[.(j)](t)), xlab = "Time", ylim = range(beta_lower, beta_upper))
lines(beta_lower, col = "red", lty = 2)
lines(beta_upper, col = "red", lty = 2)
legend("topright", legend = c("Median", "95% CI"), col = c("black", "red"), lty = c(1, 2))
}
These plots confirm that, under the ridge prior with large shrinkage on time variation, all coefficients remain constant over time, effectively collapsing the TVP model into a standard static linear regression.
The paths are flat, with tight credible intervals, providing strong evidence that the data do not demand time-varying behavior in these parameters
In this part, we fit the TVP model using a ridge prior with a very small value of \(\lambda_B^2\), which implies a very large prior variance on the initial level of each coefficient:
\[ \beta_j \sim \mathcal{N}\left(0, \frac{2}{\lambda_B^2}\right) \]
This makes the prior uninformative, meaning the model does not strongly pull coefficients toward zero. It allows the data to “speak freely” about the importance of each predictor.
Unlike part (a-1), where we used a very large \(\kappa_B^2\) to force all coefficients to be constant over time, here we allow time-variation (by learning \(\kappa_B^2\)) and relax shrinkage toward zero (by setting a small \(\lambda_B^2\)).
This confirms that an uninformative prior lets the model differentiate between important and unimportant predictors based on the data.
set.seed(12) # new seed for new prior
# Fit model with weakly informative prior (small lambda^2_B)
fit_ridge_weak <- shrinkTVP(
formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
data = df_model,
mod_type = "ridge",
learn_lambda2_B = FALSE,
lambda2_B = 1e-6, # VERY SMALL → weak shrinkage → uninformative
learn_kappa2_B = TRUE, # Let model decide whether coefficients vary over time
sv = FALSE,
niter = 5000,
nburn = 1000
)
# summary of posterior means
summary(fit_ridge_weak, showprior = FALSE)
##
## Summary of 4000 MCMC draws after burn-in of 1000.
##
## Statistics of posterior draws of parameters (thinning = 1):
##
## param mean sd median HPD 2.5% HPD 97.5% ESS
## beta_mean_Intercept 0.914 0.421 0.903 0.121 1.758 103
## beta_mean_inf_lag1 0.759 0.177 0.762 0.429 1.119 181
## beta_mean_inf_lag2 -0.178 0.151 -0.158 -0.486 0.091 101
## beta_mean_une_lag1 -0.182 0.062 -0.181 -0.302 -0.058 120
## beta_mean_tbi_lag1 0 0.06 0.007 -0.119 0.117 70
##
## abs(theta_sr_Intercept) 0.1 0.044 0.107 0.001 0.162 27
## abs(theta_sr_inf_lag1) 0.037 0.011 0.038 0.016 0.061 47
## abs(theta_sr_inf_lag2) 0.017 0.013 0.014 0 0.043 14
## abs(theta_sr_une_lag1) 0.009 0.005 0.01 0 0.018 42
## abs(theta_sr_tbi_lag1) 0.006 0.004 0.006 0 0.015 51
##
## sigma2 0.022 0.008 0.022 0.007 0.038 97
##
## C0 0.156 0.08 0.142 0.028 0.311 214
##
why this is a very uninformative prior distribution?
A small \(\lambda_B^2\) leads to a large variance Gaussian prior, which spreads the probability mass widely and allows the model to freely estimate coefficients based on the data. It does not bias the model toward zero, making it an uninformative prior that “lets the data speak.”
Which regression coefficients appear to be insignificant?
The coefficient for lagged treasury bill rate \(\beta_5(t)\) appears to be insignificant, as its posterior median remains close to zero and its entire credible interval consistently includes zero. All other predictors show evidence of being significant, at least during part of the time horizon.
Step 2: Visualize the \(\beta_j\) paths
library(coda)
par(mfrow = c(2, 3), mar = c(4.5, 4.5, 2.5, 1), oma = c(1, 1, 1, 1))
for (j in 1:length(fit_ridge_weak$beta)) {
beta_draws <- fit_ridge_weak$beta[[j]]
beta_median <- apply(beta_draws, 2, median)
beta_lower <- apply(beta_draws, 2, quantile, probs = 0.025)
beta_upper <- apply(beta_draws, 2, quantile, probs = 0.975)
plot(beta_median, type = "l", col = "black",
ylim = range(beta_lower, beta_upper),
main = bquote("Posterior median of" ~ beta[.(j)](t)),
xlab = "Time", ylab = bquote(beta[.(j)](t)))
lines(beta_lower, col = "red", lty = 2)
lines(beta_upper, col = "red", lty = 2)
legend("topright", legend = c("Median", "95% CI"),
col = c("black", "red"), lty = c(1, 2), bty = "n")
}
Under the uninformative ridge prior, the model allows the data to
determine both the level and variability of each coefficient.
We observe clear time-variation in the intercept and
modest evolution in the inflation lags, while the interest rate
appears largely irrelevant.
This confirms the benefit of using shrinkage to detect
structural stability or change, rather than assuming constant
coefficients.
In this part, we examine the effect of weakening the shrinkage on time variation in the regression coefficients.
Use a ridge prior on the process standard deviations: \[ \sqrt{\theta_j} \sim \mathcal{N}\left(0, \frac{2}{\kappa_B^2} \right) \]
Set a small value for \(\kappa_B^2\)
→ This increases the prior variance on \(\theta_j\)
→ Encourages the coefficients \(\beta_{jt}\) to vary over time
Keep the same prior as in part (a-2) for \(\beta_j\):
\[
\beta_j \sim \mathcal{N}\left(0, \frac{2}{\lambda_B^2} \right)
\] with a small \(\lambda_B^2\) \(\Rightarrow\) weakly informative prior on
the coefficient levels.
Modeling Intuition:
By reducing \(\kappa_B^2\), we allow
the model to capture more flexible, time-varying
behavior in the coefficients — especially if the data support
it.
This contrasts with earlier settings where shrinkage forced coefficients
to be nearly constant.
set.seed(12)
# Fit model with weak shrinkage on theta_j (small kappa2_B)
fit_b <- shrinkTVP(
formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
data = df_model,
mod_type = "ridge",
learn_kappa2_B = FALSE,
kappa2_B = 1e-6, # Small : weak shrinkage -> allows time-variation
learn_lambda2_B = FALSE,
lambda2_B = 1e-6, # Same as in (a-2) -> weak prior on coefficient levels
sv = FALSE,
niter = 5000,
nburn = 1000
)
summary(fit_b, showprior = FALSE)
##
## Summary of 4000 MCMC draws after burn-in of 1000.
##
## Statistics of posterior draws of parameters (thinning = 1):
##
## param mean sd median HPD 2.5% HPD 97.5% ESS
## beta_mean_Intercept 0.899 0.428 0.888 0.087 1.749 100
## beta_mean_inf_lag1 0.754 0.177 0.753 0.42 1.125 178
## beta_mean_inf_lag2 -0.17 0.154 -0.149 -0.486 0.115 93
## beta_mean_une_lag1 -0.18 0.062 -0.179 -0.306 -0.061 132
## beta_mean_tbi_lag1 0.003 0.06 0.012 -0.126 0.111 70
##
## abs(theta_sr_Intercept) 0.101 0.045 0.109 0.001 0.164 25
## abs(theta_sr_inf_lag1) 0.037 0.011 0.038 0.016 0.06 43
## abs(theta_sr_inf_lag2) 0.017 0.014 0.011 0 0.043 11
## abs(theta_sr_une_lag1) 0.009 0.005 0.009 0 0.018 43
## abs(theta_sr_tbi_lag1) 0.006 0.004 0.006 0 0.014 51
##
## sigma2 0.022 0.008 0.021 0.007 0.038 91
##
## C0 0.155 0.08 0.141 0.031 0.311 211
##
Posterior Summary Interpretation
After setting a small \(\kappa_B^2\) (weak shrinkage on time variation), the model allowed more flexibility in how the regression coefficients evolve over time. Here’s what we observe:
Time-variation Indicators — abs(theta_sr_) These are relatively large compared to others and clearly non-zero, indicating noticeable time variation.
→ Suggest mild time variation — some dynamics may be present but less pronounced.
Significance of Coefficients — beta_mean_…: explain the mean of each Beta and HPD excludes or includes 0
(HPD includes 0) → insignificant
Allowing less shrinkage on \(\theta_j\) (via small \(\kappa_B^2\)) results in more flexible, time-varying coefficients.
The model expresses greater uncertainty (larger standard deviations) and reveals clear time variation especially in the intercept and first lag of inflation.
The insignificance of interest rate (tbi_lag1) remains robust, despite more flexibility.
# Plot beta paths
par(mfrow = c(2, 3), mar = c(4.5, 4.5, 2.5, 1), oma = c(1, 1, 1, 1))
for (j in 1:length(fit_b$beta)) {
beta_draws <- fit_b$beta[[j]]
beta_median <- apply(beta_draws, 2, median)
beta_lower <- apply(beta_draws, 2, quantile, probs = 0.025)
beta_upper <- apply(beta_draws, 2, quantile, probs = 0.975)
plot(beta_median, type = "l", col = "black",
ylim = range(beta_lower, beta_upper),
main = bquote("Posterior median of" ~ beta[.(j)](t)),
xlab = "Time", ylab = bquote(beta[.(j)](t)))
lines(beta_lower, col = "red", lty = 2)
lines(beta_upper, col = "red", lty = 2)
legend("topright", legend = c("Median", "95% CI"),
col = c("black", "red"), lty = c(1, 2), bty = "n")
}
1- discuss the difference to the results obtained in (a)
| Feature | Part (a-1): Constant \(\beta\), Fixed Size | Part (a-2): Weak Prior on \(\beta\) | Part (b): Allow Time-Varying \(\beta\) |
|---|---|---|---|
| Time variation control (\(\kappa_B^2\)) | Fixed large (1e10) → No time variation | Learned by model → Adaptive | Fixed small (1e-6) → Encourages dynamics |
| Shrinkage on \(\beta\) (\(\lambda_B^2\)) | Fixed at default (20) → Moderate shrinkage | Fixed very small (1e-6) → Weak shrinkage | Fixed very small (1e-6) → Weak shrinkage |
| Time-varying behavior | Suppressed (frozen by prior) | Mild, if data supports it | Clearly allowed and model-driven |
| Role of data | Strong within fixed structure | Very strong → data dominates | Very strong, time-dynamic estimation |
| Uncertainty (posterior SD) | Moderate | Slightly higher due to flexibility | Highest → more coefficient freedom |
| Coefficient estimates (\(\beta\)) | Static, time-invariant estimates | Mostly static, some flexibility allowed | Clearly dynamic across time |
2- What is the effect of changing the global shrinkage \(\kappa_B^2\) on uncertainty quantification?
The global shrinkage parameter \(\kappa_B^2\) controls how much variation over time is allowed in the regression coefficients \(\beta_{jt}\):
A large \(\kappa_B^2\) (as in part a-1) leads to very small process variances \(\theta_j\), which forces the coefficients to remain nearly constant over time. As a result, the uncertainty intervals (HPD bands) around coefficients are narrow and relatively stable, since the model assumes no time dynamics.
A small \(\kappa_B^2\) (as in part b) implies large prior variance on time variation. This allows \(\beta_{jt}\) to evolve more freely over time, behaving like a random walk. The increased flexibility leads to wider credible intervals, especially in time periods where data support change or are sparse. Hence, uncertainty quantification is more honest and adaptive, reflecting both time dynamics and data noise.
Decreasing \(\kappa_B^2\) increases the model’s belief in possible time variation, which increases posterior uncertainty and gives a more flexible, dynamic interpretation of the coefficients.
3- Which regression coefficients appear to be time-varying?
Based on the posterior summaries and coefficient path plots in part (b):
Time-varying coefficients:
Intercept \(\beta_1(t)\): shows clear structural changes over time.
Lag 1 of inflation \(\beta_2(t)\): moderate time variation with visible shifts.
Lag 2 of inflation \(\beta_3(t)\): mild time variation.
Not time-varying (or weakly varying):
Unemployment \(\beta_4(t)\): mostly stable, minimal movement.
Interest rate \(\beta_5(t)\): flat and insignificant throughout.
These results indicate that the model captures dynamic behavior in core inflation drivers, while suppressing variation in less informative predictors.
Ridge Prior (Simpler)
The ridge prior imposes fixed, global shrinkage on coefficients and time-variation:
Coefficient prior: \[ \beta_j \sim \mathcal{N}\left(0, \frac{2}{\lambda_B^2}\right) \]
Time-variation prior (process SD): \[ \sqrt{\theta_j} \sim \mathcal{N}\left(0, \frac{2}{\kappa_B^2}\right) \]
This means that all coefficients are shrunk uniformly and the model does not adapt the shrinkage per parameter.
Lasso Prior (More Hierarchical)
The Lasso prior adds flexibility by introducing hierarchical shrinkage:
Coefficient prior: \[ \beta_j \sim \mathcal{N}(0, \tau_j^2), \quad \tau_j^2 \sim \text{Exp}\left(\frac{\lambda_B^2}{2}\right) \]
Time-variation prior: \[ \sqrt{\theta_j} \sim \mathcal{N}(0, \xi_j^2), \quad \xi_j^2 \sim \text{Exp}\left(\frac{\kappa_B^2}{2}\right) \]
This allows each coefficient to adaptively shrink, and potentially sparsify, some coefficients by pushing them close to zero.
\[ \begin{array}{|l|l|l|} \hline \textbf{Feature} & \textbf{Ridge Prior} & \textbf{Lasso Prior (Hierarchical)} \\ \hline \text{Prior for } \beta_j & \mathcal{N}\left(0, \frac{2}{\lambda_B^2}\right) & \mathcal{N}(0, \tau_j^2), \, \tau_j^2 \sim \text{Exp}\left(\frac{\lambda_B^2}{2}\right) \\ \hline \text{Shrinkage structure} & \text{Fixed global shrinkage} & \text{Adaptive local shrinkage} \\ \hline \text{Interpretation} & \text{Shrinks all coefficients equally} & \text{Allows some coefficients to shrink more (sparse)} \\ \hline \text{Time-variation prior} & \mathcal{N}\left(0, \frac{2}{\kappa_B^2}\right) & \mathcal{N}(0, \xi_j^2), \, \xi_j^2 \sim \text{Exp}\left(\frac{\kappa_B^2}{2}\right) \\ \hline \text{Effect on time-variation} & \text{Uniform across all coefficients} & \text{Adaptive shrinkage per coefficient} \\ \hline \text{Number of layers} & \text{1–2 (less hierarchical)} & \text{3+ (more hierarchical)} \\ \hline \end{array} \]
Ridge: Think of ridge as a global shrinkage. It applies a uniform push toward zero for all coefficients. Best used when all variables are expected to contribute, and we want to prevent overfitting.
Lasso: Lasso introduces local flexibility by letting each coefficient choose how much to shrink. This allows the model to sparsify (effectively turning off irrelevant predictors), which improves feature selection and interpretability.
set.seed(12)
# Fit hierarchical Bayesian Lasso with fixed a_xi and a_tau
fit_lasso_like <- shrinkTVP(
formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
data = df_model,
# mod_type = "lasso", # default, so not needed
learn_a_xi = FALSE,
learn_a_tau = FALSE,
a_xi = 1,
a_tau = 1,
niter = 5000,
nburn = 1000,
sv = FALSE
)
summary(fit_lasso_like, showprior = FALSE)
##
## Summary of 4000 MCMC draws after burn-in of 1000.
##
## Statistics of posterior draws of parameters (thinning = 1):
##
## param mean sd median HPD 2.5% HPD 97.5% ESS
## beta_mean_Intercept 0.507 0.36 0.48 -0.104 1.203 111
## beta_mean_inf_lag1 0.68 0.183 0.679 0.34 1.055 99
## beta_mean_inf_lag2 -0.071 0.107 -0.061 -0.302 0.124 181
## beta_mean_une_lag1 -0.135 0.059 -0.137 -0.253 -0.026 77
## beta_mean_tbi_lag1 0.024 0.056 0.027 -0.08 0.14 64
##
## abs(theta_sr_Intercept) 0.123 0.035 0.124 0.05 0.188 46
## abs(theta_sr_inf_lag1) 0.04 0.007 0.04 0.024 0.054 162
## abs(theta_sr_inf_lag2) 0.011 0.008 0.009 0 0.027 32
## abs(theta_sr_une_lag1) 0.007 0.005 0.006 0 0.016 38
## abs(theta_sr_tbi_lag1) 0.005 0.003 0.004 0 0.011 68
##
## tau2_Intercept 0.404 0.691 0.196 0 1.496 403
## tau2_inf_lag1 0.439 0.653 0.242 0.009 1.392 548
## tau2_inf_lag2 0.223 0.697 0.069 0 0.912 1028
## tau2_une_lag1 0.228 0.503 0.077 0 0.904 863
## tau2_tbi_lag1 0.196 0.54 0.049 0 0.777 1292
##
## xi2_Intercept 0.011 0.015 0.007 0 0.035 1115
## xi2_inf_lag1 0.006 0.013 0.003 0 0.021 1639
## xi2_inf_lag2 0.004 0.01 0.002 0 0.017 1740
## xi2_une_lag1 0.005 0.014 0.001 0 0.017 1559
## xi2_tbi_lag1 0.004 0.01 0.001 0 0.018 1353
##
## kappa2_B 548.418 466.515 423.824 17.079 1448.961 636
##
## lambda2_B 18.354 24.885 10.409 0.191 59.336 372
##
## sigma2 0.021 0.008 0.02 0.006 0.036 67
##
## C0 0.143 0.075 0.129 0.026 0.291 104
##
The Lasso prior adaptively shrinks coefficients, leading to sparse estimates.
Some coefficients show evidence of time variation, particularly the intercept and inf_lag1.
Uncertainty is greater compared to Ridge due to the learned shrinkage levels, which are data-driven and hierarchical.
Overall, Lasso offers more flexibility and can suppress irrelevant predictors effectively.
par(mfrow = c(2, 3), mar = c(4.5, 4.5, 2.5, 1), oma = c(1, 1, 1, 1))
for (j in 1:length(fit_lasso_like$beta)) {
beta_draws <- fit_lasso_like$beta[[j]]
beta_median <- apply(beta_draws, 2, median)
beta_lower <- apply(beta_draws, 2, quantile, probs = 0.025)
beta_upper <- apply(beta_draws, 2, quantile, probs = 0.975)
plot(beta_median, type = "l", col = "black",
ylim = range(beta_lower, beta_upper),
main = bquote("Posterior median of" ~ beta[.(j)](t)),
xlab = "Time", ylab = bquote(beta[.(j)](t)))
lines(beta_lower, col = "red", lty = 2)
lines(beta_upper, col = "red", lty = 2)
legend("topright", legend = c("Median", "95% CI"),
col = c("black", "red"), lty = c(1, 2), bty = "n")
}
1- Visualize the evolution of \(\beta_j(t)\) for all regression coefficients
The Lasso prior adds adaptive shrinkage, which sparsifies insignificant coefficients and allows important ones to evolve flexibly over time.
2- comparison a-1 , a-2 , b and c
2 - Comparison of parts a-1, a-2, b, and c
| Part | Coefficient Behavior |
|---|---|
| a-1 (Ridge, fixed large \(\kappa_B^2\)) | - All \(\beta_j(t)\) are constant
over time. - No time variation captured. - Posterior intervals are narrow and stable. |
| a-2 (Ridge, learned \(\kappa_B^2\), weak prior on \(\beta_j\)) | - Time-varying effects allowed where data supports. - Higher uncertainty due to weak shrinkage. - \(\beta_{\text{inf\_lag1}}(t)\) and \(\beta_{\text{inf\_lag2}}(t)\) show moderate variation. |
| b (Ridge, small \(\kappa_B^2\)) | - Time variation clearly allowed. - Weak shrinkage on \(\theta_j\) leads to dynamic \(\beta_j(t)\). - Uncertainty increases with flexibility. |
| c (Lasso, hierarchical shrinkage) | - Adaptive shrinkage promotes sparsity. - \(\beta_{\text{inf\_lag1}}(t)\) and Intercept remain strong and dynamic. - \(\beta_{\text{tbi\_lag1}}(t)\) close to zero (suppressed). - Selective uncertainty — narrow for irrelevant predictors, wider where dynamics are present. |
What is the effect of choosing Lasso instead of ridge on uncertainty quantification?
Choosing Lasso instead of Ridge leads to stronger, adaptive shrinkage—particularly for irrelevant or weak coefficients. This results in tighter uncertainty bands for those coefficients, enhancing sparsity and interpretability. However, for relevant or time-varying coefficients, the posterior uncertainty can remain wide if the data supports it. Overall, Lasso improves variable selection by concentrating uncertainty around zero for unimportant predictors.
Which regression coefficients appear to be time-varying? Which regression coefficients appear to be insignificant?
Time-varying coefficients:
The intercept and inf_lag1 show clear time-varying behavior based on their posterior dynamics.
Insignificant coefficients:
tbi_lag1 and inf_lag2 appear insignificant, as their credible intervals include zero and stay close to zero over time.
Significant but stable:
une_lag1 is significant, though less dynamic. Its posterior excludes zero.
In part (d), we apply a normal-gamma prior to allow for even more flexible shrinkage than the Lasso prior used in part (c).
Prior Structure
We replace the exponential priors on local variances \(\tau_j^2\) and \(\xi_j^2\) with Gamma priors:
\(\tau_j^2 \sim \mathcal{G}(a^\tau, a^\tau \lambda_B^2 / 2)\)
\(\xi_j^2 \sim \mathcal{G}(a^\xi, a^\xi \kappa_B^2 / 2)\)
Unlike fixed or exponential priors, Gamma priors allow for adjustable tails and more adaptive shrinkage behavior.
Key Ideas
The mean of the Gamma distribution is \(1 / \lambda_B^2\) (for \(\tau_j^2\)), and the variance depends inversely on \(a^\tau\), giving flexibility.
We no longer fix the form of the local variances; instead, we model them hierarchically, allowing the data to determine the degree of shrinkage for each coefficient.
Extra Parameters
Compared to the Lasso prior, we now estimate more hyperparameters:
This introduces 2–4 additional parameters, depending on which ones are learned.
| Feature | Lasso Prior | Normal-Gamma Prior |
|---|---|---|
| Local variance prior | \(\text{Exp}(\lambda_B^2 / 2)\) | \(\mathcal{G}(a^\tau, a^\tau \lambda_B^2 / 2)\) |
| Flexibility | Medium | High (heavy-tailed) |
| Adaptivity | Moderate (per-coefficient) | High (learns how much to shrink) |
| Extra parameters learned | 0 | \(a^\tau, \lambda_B^2, a^\xi, \kappa_B^2\) |
The Normal-Gamma prior is especially useful when we want strong shrinkage for small/noisy coefficients, while preserving large/important signals through heavy-tailed behavior.
set.seed(12)
# Fit model using Normal-Gamma (double gamma) prior
fit_ng <- shrinkTVP(
formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
data = df_model,
mod_type = "double", # Normal-Gamma prior
niter = 5000,
nburn = 1000,
sv = FALSE # No stochastic volatility
)
summary(fit_ng, showprior = FALSE)
##
## Summary of 4000 MCMC draws after burn-in of 1000.
##
## Statistics of posterior draws of parameters (thinning = 1):
##
## param mean sd median HPD 2.5% HPD 97.5% ESS
## beta_mean_Intercept 0.349 0.427 0.162 -0.101 1.293 53
## beta_mean_inf_lag1 0.686 0.216 0.705 0.272 1.122 70
## beta_mean_inf_lag2 -0.013 0.044 0 -0.124 0.068 274
## beta_mean_une_lag1 -0.122 0.078 -0.133 -0.243 0.004 22
## beta_mean_tbi_lag1 0.013 0.025 0.001 -0.026 0.07 63
##
## abs(theta_sr_Intercept) 0.143 0.027 0.143 0.089 0.194 94
## abs(theta_sr_inf_lag1) 0.042 0.006 0.042 0.03 0.055 240
## abs(theta_sr_inf_lag2) 0.003 0.005 0 0 0.014 33
## abs(theta_sr_une_lag1) 0.005 0.006 0.001 0 0.016 16
## abs(theta_sr_tbi_lag1) 0.001 0.003 0 0 0.006 30
##
## tau2_Intercept 17.853 686.015 0.073 0 9.382 4000
## tau2_inf_lag1 10.387 143.764 0.554 0 15.298 4000
## tau2_inf_lag2 0.242 5.304 0 0 0.161 3752
## tau2_une_lag1 1.413 20.181 0.03 0 1.814 4000
## tau2_tbi_lag1 0.056 0.536 0 0 0.105 2718
##
## xi2_Intercept 0.294 2.583 0.028 0.001 0.675 4000
## xi2_inf_lag1 285.399 18045.876 0.004 0 0.157 4000
## xi2_inf_lag2 0.018 0.692 0 0 0.002 4000
## xi2_une_lag1 0.01 0.191 0 0 0.007 3558
## xi2_tbi_lag1 0.004 0.13 0 0 0.001 4000
##
## a_xi 0.094 0.036 0.089 0.029 0.161 204
##
## a_tau 0.096 0.039 0.091 0.03 0.17 210
##
## kappa2_B 156.825 297.458 40.987 0 714.758 3119
##
## lambda2_B 13.843 37.122 2.136 0 65.966 1254
##
## sigma2 0.018 0.006 0.018 0.006 0.029 109
##
## C0 0.126 0.063 0.116 0.017 0.244 286
##
What is the effect of choosing this shrinkage prior instead of the Lasso prior on uncertainty quantification?
Choosing the Normal-Gamma (double gamma) prior instead of the Lasso prior results in greater flexibility in shrinkage, which can lead to:
Wider uncertainty bands for relevant coefficients (e.g., inf_lag1), as the NG prior allows for heavy tails and less aggressive shrinkage when the data supports it.
More adaptive shrinkage across coefficients, with some being tightly shrunk (e.g., inf_lag2, tbi_lag1) and others retained with higher variance.
Overall, uncertainty is more selectively expressed — relevant signals retain variance, while irrelevant predictors are more confidently suppressed.
Does this affect your conclusions regarding which regression coefficients appear to be time-varying and which regression coefficients appear to be insignificant?
Time-varying coefficients:
The intercept and inf_lag1 still show clear time variation, confirmed by relatively large abs(theta_sr) values.
Compared to Lasso, these effects appear more confidently time-varying under the NG prior due to heavier tails and less shrinkage.
Insignificant coefficients:
tbi_lag1 and inf_lag2 remain insignificant, as their posterior means are near zero and HPD intervals include zero.
une_lag1 is still significant, as its HPD interval excludes zero — this is consistent across priors.
# Extract posterior draws of a_tau and a_xi
a_tau_draws <- fit_ng$posteriors$a_tau
a_xi_draws <- fit_ng$posteriors$a_xi
# Plotting
par(mfrow = c(2, 2)) # two plots side by side
# Plot a_tau: local shrinkage for coefficients
plot(fit_ng$a_tau, type = "l",
main = expression("(" * a[tau] * ") controls shrinkage on coeff"),
ylab = expression(a[tau]),
xlab = "Coefficient Index / Time")
# Plot a_xi: local shrinkage for log-variances
plot(fit_ng$a_xi, type = "l",
main = expression("Local Shrinkage for Log-Variance (" * a[xi] * ")"),
ylab = expression(a[xi]),
xlab = "Coefficient Index / Time")
Interpretation of Plots
1- Time Series Plots (Left panels):
Both \(a_\tau\) and \(a_\xi\) fluctuate over time but remain generally low.
Occasional spikes indicate temporary flexibility allowed in certain periods.
Shrinkage is largely stable — no wild variations in regularization intensity.
2- Density Plots (Right panels):
Both densities peak around 0.1, consistent with the summary statistics.
Most values are between 0.05–0.15, indicating consistent regularization across time and coefficients.
The model is applying effective, adaptive shrinkage using the Normal-Gamma prior.
Shrinkage parameters are moderate but not overly aggressive, supporting both sparsity and flexibility.
The plots and summaries align well, confirming that
posterior shrinkage behaves as expected under the
mod_type = "double" specification.
Relationship Between the Normal-Gamma Prior and the Lasso Prior (a=1)
The exponential distribution used in the Lasso prior is a special case of the Gamma distribution:
\[ \mathcal{E}\left(\frac{C}{2}\right) = \mathcal{G}\left(a, \frac{aC}{2}\right) \quad \text{when } a = 1 \]
This means the Lasso prior corresponds to a Normal-Gamma prior with fixed shape parameter \(a = 1\), inducing exponential shrinkage on the local variance components.
In contrast, the Normal-Gamma prior used in this
model (mod_type = "double") allows both \(a^\tau\) and \(a^\xi\) to be learned from the
data. This hierarchical setup introduces greater
flexibility, enabling the model to adapt shrinkage
intensity based on posterior evidence.
From the posterior summary:
The values \(a^\tau\) and \(a^\xi\) are well below 1, suggesting the data do not support Lasso-style exponential shrinkage. Instead, the Normal-Gamma prior allows heavier-tailed local variances, which means:
Less aggressive shrinkage on some coefficients
More robustness to outliers or time-varying effects
The posterior density plots of \(a^\tau\) and \(a^\xi\) confirm this behavior: both distributions are skewed left, peaking clearly around 0.1.