Model to Fit: Time-Varying Parameter (TVP) Model

\[ y_t = \beta_{1t} + \sum_{k=1}^{p} \beta_{1+k,t} y_{t-k} + \sum_{\ell=1}^{r} \beta_{1+p+\ell,t} x_{\ell t} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2) \]

with:

\[ \beta_{jt} = \beta_{j,t-1} + w_{jt}, \quad w_{jt} \sim \mathcal{N}(0, \theta_j), \quad j = 1, \ldots, 1 + p + r \] where: \(j=1,…,d\) with \(d=1+p+r\) the total number of coefficients (including the intercept).

The coefficients \(\beta_{jt}\) are time-varying and evolve as random walks.
The model includes:
- p lags of the dependent variable \(y_t\)
- r exogenous variables \(x_{\ell t}\), which could be financial indicators or other macro variables.
The innovation variance of \(\beta_{jt}\) is controlled by \(\theta_j\), and shrinkage priors are applied to these (via the shrinkTVP package).
The observation error variance \(\sigma_t^2\) may also vary over time.

# Load libraries
library(bvarsv)          # For data
library(shrinkTVP)       # For modeling   

# Load the US macroeconomic dataset
data("usmacro.update", package = "bvarsv")


# View column names to decide on y_t and x_t
colnames(usmacro.update)

## [1] "inf" "une" "tbi"

In most macroeconomic forecasting models, it is common to treat inflation (inf) as the dependent variable.

Lags of inf itself (for capturing inflation persistence)
une (unemployment rate)
tbi (interest rate)

Thus:

p = 2: number of lags of inf
r = 2: two exogenous variables → une, tbi

# Plot the time series (to understand trends)
ts.plot(usmacro.update, col = 1:NCOL(usmacro.update), lty = 1, main = "US Macro Time Series")
legend("topright", legend = colnames(usmacro.update), col = 1:NCOL(usmacro.update), lty = 1,  cex = 0.8)

Figure: The US macroeconomic time series used in the model. The inflation rate (inf), unemployment rate (une), and treasury bill rate (tbi) all exhibit strong non-stationarity and time variation, justifying the need for a flexible (TVP) modeling framework.

Model Structure and Justification for Shrinkage

In this analysis, we model the U.S. inflation rate as the dependent variable in a Bayesian Time-Varying Parameter (TVP) regression framework. The model incorporates:

Two lags of the inflation rate (p = 2)
Two exogenous covariates (r = 2):
- Lagged unemployment rate
- Lagged treasury bill rate

This results in a total of 5 predictors (including the intercept), each of which is allowed to vary over time. The model structure is:

\[ y_t = \beta_{1t} + \beta_{2t} y_{t-1} + \beta_{3t} y_{t-2} + \beta_{4t} x_{1t} + \beta_{5t} x_{2t} + \varepsilon_t,\quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2) \] \[ \beta_{jt} = \beta_{j,t-1} + w_{jt}, \quad w_{jt} \sim \mathcal{N}(0, \theta_j) \]

This model is highly overparameterized: it estimates separate time paths for each coefficient (\(\beta_{jt}\)) and even allows the error variance \(\sigma_t^2\) to vary over time. While this flexibility allows the model to adapt to changing relationships in the data, it also risks overfitting, especially in macroeconomic data where sample sizes are relatively limited.

The Role of Shrinkage

To address this, we incorporate shrinkage priors using the shrinkTVP package (Knaus et al., 2021). These priors act as regularization mechanisms that pull coefficient paths toward a “null model” unless the data provide strong evidence against it.

The null model in this context is typically:

A homoskedastic linear regression, where:
- All coefficients are constant over time
- And possibly zero

This design enforces the principle of “letting the data speak, but only if it speaks loudly enough.” In other words, shrinkage priors:

Encourage simpler models where parameters don’t vary
Allow for time variation or nonzero values only when the data justify it

Practical Implication

If a coefficient does not significantly change over time, or is not relevant, the posterior estimate will reflect that — it will be shrunk toward zero or toward a constant value. This helps maintain model parsimony and interpretability while still allowing for flexibility where needed.

This balance of flexibility and parsimony is key in modern Bayesian macroeconomic modeling.

(a)Prior Setup and Interpretation – Ridge Prior

We consider a ridge prior for the standard deviation of the process variance:

\[ \sqrt{\theta_j} \sim \mathcal{N}\left(0, \frac{2}{\kappa_B^2} \right) \]

with a very large value for \(\kappa_B^2\). This implies the prior is sharply peaked at zero, favoring very small variances in the coefficient evolution process.

Interpretation:

This prior strongly shrinks the time variation in coefficients toward zero. In other words, it favors the belief that:

“Each coefficient \(\beta_{jt}\) is nearly constant over time.”

(a-1) Why This Is a Constant Coefficient Model :

Under this prior, since \(\theta_j \approx 0\), we have:

\[ \beta_{jt} = \beta_{j, t-1} + w_{jt}, \quad \text{with } w_{jt} \approx 0 \]

Therefore, all \(\beta_{jt} \approx \beta_j\), meaning the model behaves like a standard linear regression with time-invariant coefficients.

This highlights how prior choice can effectively collapse the TVP model into its simpler, constant-coefficient counterpart — unless the data force the model to deviate from it.

We want to show that when \(\kappa_B^2\) is very large, the estimated time-varying coefficients \(\beta_{jt}\) become flat,
i.e., nearly constant across time.

1. Fit a TVP Model With a Strong Ridge Prior

Use a very large value for \(\kappa_B^2\), (e.g., 1e6), to impose strong shrinkage.

# Prepare data
df <- as.data.frame(usmacro.update)
df$inf_lag1 <- c(NA, head(df$inf, -1))
df$inf_lag2 <- c(NA, NA, head(df$inf, -2))
df$une_lag1 <- c(NA, head(df$une, -1))
df$tbi_lag1 <- c(NA, head(df$tbi, -1))

# Drop NA rows
df_model <- na.omit(df)

set.seed(12)

# Fit model with strong ridge prior ($\approx$ constant coefficients)
fit_ridge_strong <- shrinkTVP(
  formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
  data = df_model,
  mod_type = "ridge",
  learn_kappa2_B = FALSE,
  kappa2_B = 1e10,      # Freeze time variation (make coefficients constant)
  learn_lambda2_B = FALSE,    
  niter = 5000,
  nburn = 1000,
  sv = FALSE
)


summary(fit_ridge_strong, showprior = FALSE)

## 
## Summary of 4000 MCMC draws after burn-in of 1000.
## 
## Statistics of posterior draws of parameters (thinning = 1):
## 
##  param                   mean   sd    median HPD 2.5% HPD 97.5% ESS 
##  beta_mean_Intercept     0.22   0.08  0.22   0.064    0.378     4000
##  beta_mean_inf_lag1      1.473  0.054 1.473  1.366    1.575     4000
##  beta_mean_inf_lag2      -0.497 0.056 -0.498 -0.608   -0.394    4000
##  beta_mean_une_lag1      -0.028 0.013 -0.028 -0.054   -0.004    4000
##  beta_mean_tbi_lag1      0.005  0.009 0.006  -0.013   0.023     4000
##                                                                     
##  abs(theta_sr_Intercept) 0      0     0      0        0         4000
##  abs(theta_sr_inf_lag1)  0      0     0      0        0         4000
##  abs(theta_sr_inf_lag2)  0      0     0      0        0         3897
##  abs(theta_sr_une_lag1)  0      0     0      0        0         3417
##  abs(theta_sr_tbi_lag1)  0      0     0      0        0         4282
##                                                                     
##  sigma2                  0.092  0.009 0.092  0.076    0.109     3346
##                                                                     
##  C0                      0.53   0.198 0.505  0.193    0.93      2997
##

This model behaves like a constant-coefficient regression as expected. All time variation has been eliminated by the ridge prior, verifying that under strong shrinkage on \(\theta_j\), the TVP model collapses into a standard regression.

Plot the Posterior Paths of Coefficients Over Time

If the TVP collapses to the standard regression model, the posterior paths of \(\beta_{jt}\) will be almost flat lines.

#checking Median like the paper:
library(coda)
par(mfrow = c(3, 2),           # Layout
    mar = c(4, 4, 2, 1),       # Margins: bottom, left, top, right
    oma = c(1, 1, 1, 1))       # Outer margins

for (j in 1:length(fit_ridge_strong$beta)) {
  beta_draws <- fit_ridge_strong$beta[[j]]
  
  beta_median <- apply(beta_draws, 2, median)
  beta_lower <- apply(beta_draws, 2, quantile, probs = 0.025)
  beta_upper <- apply(beta_draws, 2, quantile, probs = 0.975)
  
  plot(beta_median, type = "l", main = paste0("Posterior median of beta", j, "(t)"),
       ylab = bquote(beta[.(j)](t)), xlab = "Time", ylim = range(beta_lower, beta_upper))
  lines(beta_lower, col = "red", lty = 2)
  lines(beta_upper, col = "red", lty = 2)
  legend("topright", legend = c("Median", "95% CI"), col = c("black", "red"), lty = c(1, 2))
}

These plots confirm that, under the ridge prior with large shrinkage on time variation, all coefficients remain constant over time, effectively collapsing the TVP model into a standard static linear regression.

The paths are flat, with tight credible intervals, providing strong evidence that the data do not demand time-varying behavior in these parameters

(a-2) Ridge Prior with Weak Shrinkage on Coefficients

In this part, we fit the TVP model using a ridge prior with a very small value of \(\lambda_B^2\), which implies a very large prior variance on the initial level of each coefficient:

\[ \beta_j \sim \mathcal{N}\left(0, \frac{2}{\lambda_B^2}\right) \]

This makes the prior uninformative, meaning the model does not strongly pull coefficients toward zero. It allows the data to “speak freely” about the importance of each predictor.

Unlike part (a-1), where we used a very large \(\kappa_B^2\) to force all coefficients to be constant over time, here we allow time-variation (by learning \(\kappa_B^2\)) and relax shrinkage toward zero (by setting a small \(\lambda_B^2\)).

This confirms that an uninformative prior lets the model differentiate between important and unimportant predictors based on the data.

set.seed(12)  # new seed for new prior

# Fit model with weakly informative prior (small lambda^2_B)
fit_ridge_weak <- shrinkTVP(
  formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
  data = df_model,
  mod_type = "ridge",
  learn_lambda2_B = FALSE,
  lambda2_B = 1e-6,            # VERY SMALL → weak shrinkage → uninformative
  learn_kappa2_B = TRUE,       # Let model decide whether coefficients vary over time
  sv = FALSE,
  niter = 5000,
  nburn = 1000
)

# summary of posterior means
summary(fit_ridge_weak, showprior = FALSE)

## 
## Summary of 4000 MCMC draws after burn-in of 1000.
## 
## Statistics of posterior draws of parameters (thinning = 1):
## 
##  param                   mean   sd    median HPD 2.5% HPD 97.5% ESS
##  beta_mean_Intercept     0.914  0.421 0.903  0.121    1.758     103
##  beta_mean_inf_lag1      0.759  0.177 0.762  0.429    1.119     181
##  beta_mean_inf_lag2      -0.178 0.151 -0.158 -0.486   0.091     101
##  beta_mean_une_lag1      -0.182 0.062 -0.181 -0.302   -0.058    120
##  beta_mean_tbi_lag1      0      0.06  0.007  -0.119   0.117     70 
##                                                                    
##  abs(theta_sr_Intercept) 0.1    0.044 0.107  0.001    0.162     27 
##  abs(theta_sr_inf_lag1)  0.037  0.011 0.038  0.016    0.061     47 
##  abs(theta_sr_inf_lag2)  0.017  0.013 0.014  0        0.043     14 
##  abs(theta_sr_une_lag1)  0.009  0.005 0.01   0        0.018     42 
##  abs(theta_sr_tbi_lag1)  0.006  0.004 0.006  0        0.015     51 
##                                                                    
##  sigma2                  0.022  0.008 0.022  0.007    0.038     97 
##                                                                    
##  C0                      0.156  0.08  0.142  0.028    0.311     214
##

why this is a very uninformative prior distribution?

A small \(\lambda_B^2\) leads to a large variance Gaussian prior, which spreads the probability mass widely and allows the model to freely estimate coefficients based on the data. It does not bias the model toward zero, making it an uninformative prior that “lets the data speak.”

Which regression coefficients appear to be insignificant?

The coefficient for lagged treasury bill rate \(\beta_5(t)\) appears to be insignificant, as its posterior median remains close to zero and its entire credible interval consistently includes zero. All other predictors show evidence of being significant, at least during part of the time horizon.

Step 2: Visualize the \(\beta_j\) paths

library(coda)

par(mfrow = c(2, 3), mar = c(4.5, 4.5, 2.5, 1), oma = c(1, 1, 1, 1))
   
for (j in 1:length(fit_ridge_weak$beta)) {
  beta_draws <- fit_ridge_weak$beta[[j]]
  beta_median <- apply(beta_draws, 2, median)
  beta_lower  <- apply(beta_draws, 2, quantile, probs = 0.025)
  beta_upper  <- apply(beta_draws, 2, quantile, probs = 0.975)
  
  plot(beta_median, type = "l", col = "black",
       ylim = range(beta_lower, beta_upper),
       main = bquote("Posterior median of" ~ beta[.(j)](t)),
       xlab = "Time", ylab = bquote(beta[.(j)](t)))
  lines(beta_lower, col = "red", lty = 2)
  lines(beta_upper, col = "red", lty = 2)
  legend("topright", legend = c("Median", "95% CI"),
         col = c("black", "red"), lty = c(1, 2), bty = "n")
}

Under the uninformative ridge prior, the model allows the data to determine both the level and variability of each coefficient.
We observe clear time-variation in the intercept and modest evolution in the inflation lags, while the interest rate appears largely irrelevant.
This confirms the benefit of using shrinkage to detect structural stability or change, rather than assuming constant coefficients.

(b) Allowing Time Variation via Weak Shrinkage on Process Variance

In this part, we examine the effect of weakening the shrinkage on time variation in the regression coefficients.

Use a ridge prior on the process standard deviations: \[ \sqrt{\theta_j} \sim \mathcal{N}\left(0, \frac{2}{\kappa_B^2} \right) \]
Set a small value for \(\kappa_B^2\)

→ This increases the prior variance on \(\theta_j\)

→ Encourages the coefficients \(\beta_{jt}\) to vary over time
Keep the same prior as in part (a-2) for \(\beta_j\):
\[ \beta_j \sim \mathcal{N}\left(0, \frac{2}{\lambda_B^2} \right) \] with a small \(\lambda_B^2\) \(\Rightarrow\) weakly informative prior on the coefficient levels.

Modeling Intuition:

By reducing \(\kappa_B^2\), we allow the model to capture more flexible, time-varying behavior in the coefficients — especially if the data support it.
This contrasts with earlier settings where shrinkage forced coefficients to be nearly constant.

set.seed(12)
# Fit model with weak shrinkage on theta_j (small kappa2_B)
fit_b <- shrinkTVP(
  formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
  data = df_model,
  mod_type = "ridge",
  learn_kappa2_B = FALSE,
  kappa2_B = 1e-6,              # Small : weak shrinkage -> allows time-variation
  learn_lambda2_B = FALSE,
  lambda2_B = 1e-6,             # Same as in (a-2) -> weak prior on coefficient levels
  sv = FALSE,
  niter = 5000,
  nburn = 1000
)

summary(fit_b, showprior = FALSE)

## 
## Summary of 4000 MCMC draws after burn-in of 1000.
## 
## Statistics of posterior draws of parameters (thinning = 1):
## 
##  param                   mean  sd    median HPD 2.5% HPD 97.5% ESS
##  beta_mean_Intercept     0.899 0.428 0.888  0.087    1.749     100
##  beta_mean_inf_lag1      0.754 0.177 0.753  0.42     1.125     178
##  beta_mean_inf_lag2      -0.17 0.154 -0.149 -0.486   0.115     93 
##  beta_mean_une_lag1      -0.18 0.062 -0.179 -0.306   -0.061    132
##  beta_mean_tbi_lag1      0.003 0.06  0.012  -0.126   0.111     70 
##                                                                   
##  abs(theta_sr_Intercept) 0.101 0.045 0.109  0.001    0.164     25 
##  abs(theta_sr_inf_lag1)  0.037 0.011 0.038  0.016    0.06      43 
##  abs(theta_sr_inf_lag2)  0.017 0.014 0.011  0        0.043     11 
##  abs(theta_sr_une_lag1)  0.009 0.005 0.009  0        0.018     43 
##  abs(theta_sr_tbi_lag1)  0.006 0.004 0.006  0        0.014     51 
##                                                                   
##  sigma2                  0.022 0.008 0.021  0.007    0.038     91 
##                                                                   
##  C0                      0.155 0.08  0.141  0.031    0.311     211
##

Posterior Summary Interpretation

After setting a small \(\kappa_B^2\) (weak shrinkage on time variation), the model allowed more flexibility in how the regression coefficients evolve over time. Here’s what we observe:

Time-variation Indicators — abs(theta_sr_) These are relatively large compared to others and clearly non-zero, indicating noticeable time variation.

theta_sr_inf_lag2, theta_sr_une_lag1, and theta_sr_tbi_lag1 are small but non-zero

→ Suggest mild time variation — some dynamics may be present but less pronounced.

Significance of Coefficients — beta_mean_…: explain the mean of each Beta and HPD excludes or includes 0

(HPD includes 0) → insignificant

Allowing less shrinkage on \(\theta_j\) (via small \(\kappa_B^2\)) results in more flexible, time-varying coefficients.
The model expresses greater uncertainty (larger standard deviations) and reveals clear time variation especially in the intercept and first lag of inflation.
The insignificance of interest rate (tbi_lag1) remains robust, despite more flexibility.

# Plot beta paths

par(mfrow = c(2, 3), mar = c(4.5, 4.5, 2.5, 1), oma = c(1, 1, 1, 1))

for (j in 1:length(fit_b$beta)) {
  beta_draws <- fit_b$beta[[j]]
  beta_median <- apply(beta_draws, 2, median)
  beta_lower  <- apply(beta_draws, 2, quantile, probs = 0.025)
  beta_upper  <- apply(beta_draws, 2, quantile, probs = 0.975)

  plot(beta_median, type = "l", col = "black",
       ylim = range(beta_lower, beta_upper),
       main = bquote("Posterior median of" ~ beta[.(j)](t)),
       xlab = "Time", ylab = bquote(beta[.(j)](t)))
  lines(beta_lower, col = "red", lty = 2)
  lines(beta_upper, col = "red", lty = 2)
  legend("topright", legend = c("Median", "95% CI"),
         col = c("black", "red"), lty = c(1, 2), bty = "n")
}

1- discuss the difference to the results obtained in (a)

Feature	Part (a-1): Constant \(\beta\), Fixed Size	Part (a-2): Weak Prior on \(\beta\)	Part (b): Allow Time-Varying \(\beta\)
Time variation control (\(\kappa_B^2\))	Fixed large (1e10) → No time variation	Learned by model → Adaptive	Fixed small (1e-6) → Encourages dynamics
Shrinkage on \(\beta\) (\(\lambda_B^2\))	Fixed at default (20) → Moderate shrinkage	Fixed very small (1e-6) → Weak shrinkage	Fixed very small (1e-6) → Weak shrinkage
Time-varying behavior	Suppressed (frozen by prior)	Mild, if data supports it	Clearly allowed and model-driven
Role of data	Strong within fixed structure	Very strong → data dominates	Very strong, time-dynamic estimation
Uncertainty (posterior SD)	Moderate	Slightly higher due to flexibility	Highest → more coefficient freedom
Coefficient estimates (\(\beta\))	Static, time-invariant estimates	Mostly static, some flexibility allowed	Clearly dynamic across time

2- What is the effect of changing the global shrinkage \(\kappa_B^2\) on uncertainty quantification?

The global shrinkage parameter \(\kappa_B^2\) controls how much variation over time is allowed in the regression coefficients \(\beta_{jt}\):

A large \(\kappa_B^2\) (as in part a-1) leads to very small process variances \(\theta_j\), which forces the coefficients to remain nearly constant over time. As a result, the uncertainty intervals (HPD bands) around coefficients are narrow and relatively stable, since the model assumes no time dynamics.
A small \(\kappa_B^2\) (as in part b) implies large prior variance on time variation. This allows \(\beta_{jt}\) to evolve more freely over time, behaving like a random walk. The increased flexibility leads to wider credible intervals, especially in time periods where data support change or are sparse. Hence, uncertainty quantification is more honest and adaptive, reflecting both time dynamics and data noise.

Decreasing \(\kappa_B^2\) increases the model’s belief in possible time variation, which increases posterior uncertainty and gives a more flexible, dynamic interpretation of the coefficients.

3- Which regression coefficients appear to be time-varying?

Based on the posterior summaries and coefficient path plots in part (b):

Time-varying coefficients:
- Intercept \(\beta_1(t)\): shows clear structural changes over time.
- Lag 1 of inflation \(\beta_2(t)\): moderate time variation with visible shifts.
- Lag 2 of inflation \(\beta_3(t)\): mild time variation.
Not time-varying (or weakly varying):
- Unemployment \(\beta_4(t)\): mostly stable, minimal movement.
- Interest rate \(\beta_5(t)\): flat and insignificant throughout.

These results indicate that the model captures dynamic behavior in core inflation drivers, while suppressing variation in less informative predictors.

(c)Ridge vs. Lasso Priors in shrinkTVP

Ridge Prior (Simpler)

The ridge prior imposes fixed, global shrinkage on coefficients and time-variation:

Coefficient prior: \[ \beta_j \sim \mathcal{N}\left(0, \frac{2}{\lambda_B^2}\right) \]
Time-variation prior (process SD): \[ \sqrt{\theta_j} \sim \mathcal{N}\left(0, \frac{2}{\kappa_B^2}\right) \]

This means that all coefficients are shrunk uniformly and the model does not adapt the shrinkage per parameter.

Lasso Prior (More Hierarchical)

The Lasso prior adds flexibility by introducing hierarchical shrinkage:

Coefficient prior: \[ \beta_j \sim \mathcal{N}(0, \tau_j^2), \quad \tau_j^2 \sim \text{Exp}\left(\frac{\lambda_B^2}{2}\right) \]
Time-variation prior: \[ \sqrt{\theta_j} \sim \mathcal{N}(0, \xi_j^2), \quad \xi_j^2 \sim \text{Exp}\left(\frac{\kappa_B^2}{2}\right) \]

This allows each coefficient to adaptively shrink, and potentially sparsify, some coefficients by pushing them close to zero.

\[ \begin{array}{|l|l|l|} \hline \textbf{Feature} & \textbf{Ridge Prior} & \textbf{Lasso Prior (Hierarchical)} \\ \hline \text{Prior for } \beta_j & \mathcal{N}\left(0, \frac{2}{\lambda_B^2}\right) & \mathcal{N}(0, \tau_j^2), \, \tau_j^2 \sim \text{Exp}\left(\frac{\lambda_B^2}{2}\right) \\ \hline \text{Shrinkage structure} & \text{Fixed global shrinkage} & \text{Adaptive local shrinkage} \\ \hline \text{Interpretation} & \text{Shrinks all coefficients equally} & \text{Allows some coefficients to shrink more (sparse)} \\ \hline \text{Time-variation prior} & \mathcal{N}\left(0, \frac{2}{\kappa_B^2}\right) & \mathcal{N}(0, \xi_j^2), \, \xi_j^2 \sim \text{Exp}\left(\frac{\kappa_B^2}{2}\right) \\ \hline \text{Effect on time-variation} & \text{Uniform across all coefficients} & \text{Adaptive shrinkage per coefficient} \\ \hline \text{Number of layers} & \text{1–2 (less hierarchical)} & \text{3+ (more hierarchical)} \\ \hline \end{array} \]

Ridge: Think of ridge as a global shrinkage. It applies a uniform push toward zero for all coefficients. Best used when all variables are expected to contribute, and we want to prevent overfitting.
Lasso: Lasso introduces local flexibility by letting each coefficient choose how much to shrink. This allows the model to sparsify (effectively turning off irrelevant predictors), which improves feature selection and interpretability.

set.seed(12)
# Fit hierarchical Bayesian Lasso with fixed a_xi and a_tau
fit_lasso_like <- shrinkTVP(
  formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
  data = df_model,
  # mod_type = "lasso",  # default, so not needed
  learn_a_xi = FALSE,
  learn_a_tau = FALSE,
  a_xi = 1,
  a_tau = 1,
  niter = 5000,
  nburn = 1000,
  sv = FALSE
)

summary(fit_lasso_like, showprior = FALSE)

## 
## Summary of 4000 MCMC draws after burn-in of 1000.
## 
## Statistics of posterior draws of parameters (thinning = 1):
## 
##  param                   mean    sd      median  HPD 2.5% HPD 97.5% ESS 
##  beta_mean_Intercept     0.507   0.36    0.48    -0.104   1.203     111 
##  beta_mean_inf_lag1      0.68    0.183   0.679   0.34     1.055     99  
##  beta_mean_inf_lag2      -0.071  0.107   -0.061  -0.302   0.124     181 
##  beta_mean_une_lag1      -0.135  0.059   -0.137  -0.253   -0.026    77  
##  beta_mean_tbi_lag1      0.024   0.056   0.027   -0.08    0.14      64  
##                                                                         
##  abs(theta_sr_Intercept) 0.123   0.035   0.124   0.05     0.188     46  
##  abs(theta_sr_inf_lag1)  0.04    0.007   0.04    0.024    0.054     162 
##  abs(theta_sr_inf_lag2)  0.011   0.008   0.009   0        0.027     32  
##  abs(theta_sr_une_lag1)  0.007   0.005   0.006   0        0.016     38  
##  abs(theta_sr_tbi_lag1)  0.005   0.003   0.004   0        0.011     68  
##                                                                         
##  tau2_Intercept          0.404   0.691   0.196   0        1.496     403 
##  tau2_inf_lag1           0.439   0.653   0.242   0.009    1.392     548 
##  tau2_inf_lag2           0.223   0.697   0.069   0        0.912     1028
##  tau2_une_lag1           0.228   0.503   0.077   0        0.904     863 
##  tau2_tbi_lag1           0.196   0.54    0.049   0        0.777     1292
##                                                                         
##  xi2_Intercept           0.011   0.015   0.007   0        0.035     1115
##  xi2_inf_lag1            0.006   0.013   0.003   0        0.021     1639
##  xi2_inf_lag2            0.004   0.01    0.002   0        0.017     1740
##  xi2_une_lag1            0.005   0.014   0.001   0        0.017     1559
##  xi2_tbi_lag1            0.004   0.01    0.001   0        0.018     1353
##                                                                         
##  kappa2_B                548.418 466.515 423.824 17.079   1448.961  636 
##                                                                         
##  lambda2_B               18.354  24.885  10.409  0.191    59.336    372 
##                                                                         
##  sigma2                  0.021   0.008   0.02    0.006    0.036     67  
##                                                                         
##  C0                      0.143   0.075   0.129   0.026    0.291     104 
##

The Lasso prior adaptively shrinks coefficients, leading to sparse estimates.
Some coefficients show evidence of time variation, particularly the intercept and inf_lag1.
Uncertainty is greater compared to Ridge due to the learned shrinkage levels, which are data-driven and hierarchical.
Overall, Lasso offers more flexibility and can suppress irrelevant predictors effectively.

par(mfrow = c(2, 3), mar = c(4.5, 4.5, 2.5, 1), oma = c(1, 1, 1, 1))

for (j in 1:length(fit_lasso_like$beta)) {
  beta_draws <- fit_lasso_like$beta[[j]]
  beta_median <- apply(beta_draws, 2, median)
  beta_lower  <- apply(beta_draws, 2, quantile, probs = 0.025)
  beta_upper  <- apply(beta_draws, 2, quantile, probs = 0.975)

  plot(beta_median, type = "l", col = "black",
       ylim = range(beta_lower, beta_upper),
       main = bquote("Posterior median of" ~ beta[.(j)](t)),
       xlab = "Time", ylab = bquote(beta[.(j)](t)))
  lines(beta_lower, col = "red", lty = 2)
  lines(beta_upper, col = "red", lty = 2)
  legend("topright", legend = c("Median", "95% CI"),
         col = c("black", "red"), lty = c(1, 2), bty = "n")
}

1- Visualize the evolution of \(\beta_j(t)\) for all regression coefficients

The Lasso prior adds adaptive shrinkage, which sparsifies insignificant coefficients and allows important ones to evolve flexibly over time.

2- comparison a-1 , a-2 , b and c

2 - Comparison of parts a-1, a-2, b, and c

Part	Coefficient Behavior
a-1 (Ridge, fixed large \(\kappa_B^2\))	- All \(\beta_j(t)\) are constant over time. - No time variation captured. - Posterior intervals are narrow and stable.
a-2 (Ridge, learned \(\kappa_B^2\), weak prior on \(\beta_j\))	- Time-varying effects allowed where data supports. - Higher uncertainty due to weak shrinkage. - \(\beta_{\text{inf\_lag1}}(t)\) and \(\beta_{\text{inf\_lag2}}(t)\) show moderate variation.
b (Ridge, small \(\kappa_B^2\))	- Time variation clearly allowed. - Weak shrinkage on \(\theta_j\) leads to dynamic \(\beta_j(t)\). - Uncertainty increases with flexibility.
c (Lasso, hierarchical shrinkage)	- Adaptive shrinkage promotes sparsity. - \(\beta_{\text{inf\_lag1}}(t)\) and Intercept remain strong and dynamic. - \(\beta_{\text{tbi\_lag1}}(t)\) close to zero (suppressed). - Selective uncertainty — narrow for irrelevant predictors, wider where dynamics are present.

What is the effect of choosing Lasso instead of ridge on uncertainty quantification?

Choosing Lasso instead of Ridge leads to stronger, adaptive shrinkage—particularly for irrelevant or weak coefficients. This results in tighter uncertainty bands for those coefficients, enhancing sparsity and interpretability. However, for relevant or time-varying coefficients, the posterior uncertainty can remain wide if the data supports it. Overall, Lasso improves variable selection by concentrating uncertainty around zero for unimportant predictors.

Which regression coefficients appear to be time-varying? Which regression coefficients appear to be insignificant?

Time-varying coefficients:

The intercept and inf_lag1 show clear time-varying behavior based on their posterior dynamics.
Insignificant coefficients:

tbi_lag1 and inf_lag2 appear insignificant, as their credible intervals include zero and stay close to zero over time.
Significant but stable:

une_lag1 is significant, though less dynamic. Its posterior excludes zero.

(d): Normal-Gamma Prior — Conceptual Overview

In part (d), we apply a normal-gamma prior to allow for even more flexible shrinkage than the Lasso prior used in part (c).

Prior Structure

We replace the exponential priors on local variances \(\tau_j^2\) and \(\xi_j^2\) with Gamma priors:

\(\tau_j^2 \sim \mathcal{G}(a^\tau, a^\tau \lambda_B^2 / 2)\)
\(\xi_j^2 \sim \mathcal{G}(a^\xi, a^\xi \kappa_B^2 / 2)\)

Unlike fixed or exponential priors, Gamma priors allow for adjustable tails and more adaptive shrinkage behavior.

Key Ideas

The mean of the Gamma distribution is \(1 / \lambda_B^2\) (for \(\tau_j^2\)), and the variance depends inversely on \(a^\tau\), giving flexibility.
We no longer fix the form of the local variances; instead, we model them hierarchically, allowing the data to determine the degree of shrinkage for each coefficient.

Extra Parameters

Compared to the Lasso prior, we now estimate more hyperparameters:

\(a^\tau\), \(\lambda_B^2\), \(a^\xi\), \(\kappa_B^2\)

This introduces 2–4 additional parameters, depending on which ones are learned.

Feature	Lasso Prior	Normal-Gamma Prior
Local variance prior	\(\text{Exp}(\lambda_B^2 / 2)\)	\(\mathcal{G}(a^\tau, a^\tau \lambda_B^2 / 2)\)
Flexibility	Medium	High (heavy-tailed)
Adaptivity	Moderate (per-coefficient)	High (learns how much to shrink)
Extra parameters learned	0	\(a^\tau, \lambda_B^2, a^\xi, \kappa_B^2\)

The Normal-Gamma prior is especially useful when we want strong shrinkage for small/noisy coefficients, while preserving large/important signals through heavy-tailed behavior.

set.seed(12)
# Fit model using Normal-Gamma (double gamma) prior
fit_ng <- shrinkTVP(
  formula = inf ~ inf_lag1 + inf_lag2 + une_lag1 + tbi_lag1,
  data = df_model,
  mod_type = "double",        # Normal-Gamma prior
  niter = 5000,
  nburn = 1000,
  sv = FALSE                  # No stochastic volatility
)

summary(fit_ng, showprior = FALSE)

## 
## Summary of 4000 MCMC draws after burn-in of 1000.
## 
## Statistics of posterior draws of parameters (thinning = 1):
## 
##  param                   mean    sd        median HPD 2.5% HPD 97.5% ESS 
##  beta_mean_Intercept     0.349   0.427     0.162  -0.101   1.293     53  
##  beta_mean_inf_lag1      0.686   0.216     0.705  0.272    1.122     70  
##  beta_mean_inf_lag2      -0.013  0.044     0      -0.124   0.068     274 
##  beta_mean_une_lag1      -0.122  0.078     -0.133 -0.243   0.004     22  
##  beta_mean_tbi_lag1      0.013   0.025     0.001  -0.026   0.07      63  
##                                                                          
##  abs(theta_sr_Intercept) 0.143   0.027     0.143  0.089    0.194     94  
##  abs(theta_sr_inf_lag1)  0.042   0.006     0.042  0.03     0.055     240 
##  abs(theta_sr_inf_lag2)  0.003   0.005     0      0        0.014     33  
##  abs(theta_sr_une_lag1)  0.005   0.006     0.001  0        0.016     16  
##  abs(theta_sr_tbi_lag1)  0.001   0.003     0      0        0.006     30  
##                                                                          
##  tau2_Intercept          17.853  686.015   0.073  0        9.382     4000
##  tau2_inf_lag1           10.387  143.764   0.554  0        15.298    4000
##  tau2_inf_lag2           0.242   5.304     0      0        0.161     3752
##  tau2_une_lag1           1.413   20.181    0.03   0        1.814     4000
##  tau2_tbi_lag1           0.056   0.536     0      0        0.105     2718
##                                                                          
##  xi2_Intercept           0.294   2.583     0.028  0.001    0.675     4000
##  xi2_inf_lag1            285.399 18045.876 0.004  0        0.157     4000
##  xi2_inf_lag2            0.018   0.692     0      0        0.002     4000
##  xi2_une_lag1            0.01    0.191     0      0        0.007     3558
##  xi2_tbi_lag1            0.004   0.13      0      0        0.001     4000
##                                                                          
##  a_xi                    0.094   0.036     0.089  0.029    0.161     204 
##                                                                          
##  a_tau                   0.096   0.039     0.091  0.03     0.17      210 
##                                                                          
##  kappa2_B                156.825 297.458   40.987 0        714.758   3119
##                                                                          
##  lambda2_B               13.843  37.122    2.136  0        65.966    1254
##                                                                          
##  sigma2                  0.018   0.006     0.018  0.006    0.029     109 
##                                                                          
##  C0                      0.126   0.063     0.116  0.017    0.244     286 
##

d-1

What is the effect of choosing this shrinkage prior instead of the Lasso prior on uncertainty quantification?

Choosing the Normal-Gamma (double gamma) prior instead of the Lasso prior results in greater flexibility in shrinkage, which can lead to:

Wider uncertainty bands for relevant coefficients (e.g., inf_lag1), as the NG prior allows for heavy tails and less aggressive shrinkage when the data supports it.
More adaptive shrinkage across coefficients, with some being tightly shrunk (e.g., inf_lag2, tbi_lag1) and others retained with higher variance.
Overall, uncertainty is more selectively expressed — relevant signals retain variance, while irrelevant predictors are more confidently suppressed.

Does this affect your conclusions regarding which regression coefficients appear to be time-varying and which regression coefficients appear to be insignificant?

Time-varying coefficients:

The intercept and inf_lag1 still show clear time variation, confirmed by relatively large abs(theta_sr) values.
Compared to Lasso, these effects appear more confidently time-varying under the NG prior due to heavier tails and less shrinkage.

Insignificant coefficients:

tbi_lag1 and inf_lag2 remain insignificant, as their posterior means are near zero and HPD intervals include zero.
une_lag1 is still significant, as its HPD interval excludes zero — this is consistent across priors.

d-2 : Local Shrinkage Priors: Interpretation of \(a_\tau\) and \(a_\xi\)

# Extract posterior draws of a_tau and a_xi
a_tau_draws <- fit_ng$posteriors$a_tau
a_xi_draws  <- fit_ng$posteriors$a_xi

# Plotting
par(mfrow = c(2, 2))  # two plots side by side

# Plot a_tau: local shrinkage for coefficients
plot(fit_ng$a_tau, type = "l",
     main = expression("(" * a[tau] * ") controls shrinkage on coeff"),
     ylab = expression(a[tau]),
     xlab = "Coefficient Index / Time")

# Plot a_xi: local shrinkage for log-variances
plot(fit_ng$a_xi, type = "l",
     main = expression("Local Shrinkage for Log-Variance (" * a[xi] * ")"),
     ylab = expression(a[xi]),
     xlab = "Coefficient Index / Time")

Interpretation of Plots

1- Time Series Plots (Left panels):

Both \(a_\tau\) and \(a_\xi\) fluctuate over time but remain generally low.
Occasional spikes indicate temporary flexibility allowed in certain periods.
Shrinkage is largely stable — no wild variations in regularization intensity.

2- Density Plots (Right panels):

Both densities peak around 0.1, consistent with the summary statistics.
Most values are between 0.05–0.15, indicating consistent regularization across time and coefficients.

The model is applying effective, adaptive shrinkage using the Normal-Gamma prior.
Shrinkage parameters are moderate but not overly aggressive, supporting both sparsity and flexibility.
The plots and summaries align well, confirming that posterior shrinkage behaves as expected under the mod_type = "double" specification.

Relationship Between the Normal-Gamma Prior and the Lasso Prior (a=1)

The exponential distribution used in the Lasso prior is a special case of the Gamma distribution:

\[ \mathcal{E}\left(\frac{C}{2}\right) = \mathcal{G}\left(a, \frac{aC}{2}\right) \quad \text{when } a = 1 \]

This means the Lasso prior corresponds to a Normal-Gamma prior with fixed shape parameter \(a = 1\), inducing exponential shrinkage on the local variance components.

In contrast, the Normal-Gamma prior used in this model (mod_type = "double") allows both \(a^\tau\) and \(a^\xi\) to be learned from the data. This hierarchical setup introduces greater flexibility, enabling the model to adapt shrinkage intensity based on posterior evidence.

From the posterior summary:

The values \(a^\tau\) and \(a^\xi\) are well below 1, suggesting the data do not support Lasso-style exponential shrinkage. Instead, the Normal-Gamma prior allows heavier-tailed local variances, which means:

Less aggressive shrinkage on some coefficients
More robustness to outliers or time-varying effects

The posterior density plots of \(a^\tau\) and \(a^\xi\) confirm this behavior: both distributions are skewed left, peaking clearly around 0.1.

assignment4

Mahnaz Aliabadi

2025-04-05