Packages

library(rjags) #Uses JAGS to fit Bayesian models
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
library(coda)
library(tidyverse) #Utility functions
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.5
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Task 1

  • Evaluate the MCMC chain for convergence. Include relevant diagnostics and plots. Determine and remove burn-in
  • Report parameter summary table and plot marginal distributions
  • Describe and explain the parameter covariances that you observe in the pairs plot and parameter correlation matrix.
  • Compare the summary statistics for the Bayesian regression model to those from the classical regression: summary(lm( y ~ x1 )). This should include a comparison of the means and uncertainties of all 3 model parameters

Loading in Data

load("data/Ex05.Part1.RData")

Modeling in Rjags

Model priors
T.1.model <- "model {
#Model
for (i in 1:length(Y)) {
Y[i] ~ dnorm(mu[i], s^(-2))
mu[i] <- b0 + b1*x1[i]
}

#Prior
b0 ~ dnorm(0, 200^(-2))
b1 ~ dnorm(0, 200^(-2))

s ~ dunif(0,200)

}"
Model Data
T.1.data <- list(
  Y = y,
  x1 = x1
)
Initialization
set.seed(10)
T.1.n.chains <- 3
T.1.inital.coef <- coef(T.1.lm <- lm(y ~ x1))
T.1.inits = list()
for(i in 1:T.1.n.chains){
  T.1.inits[[i]] <- list(b0 = rnorm(1,T.1.inital.coef['(Intercept)'],200),
                     b1 = rnorm(1,T.1.inital.coef["x1"], 200),
                     s = sigma(T.1.lm),
                     .RNG.name = "base::Wichmann-Hill",
                     .RNG.seed = 10)
}
Running Rjags
T.1.n.adapt <- 1000
T.1.jags <- jags.model(
  file = textConnection(T.1.model),
  data = T.1.data,
  inits = T.1.inits,
  n.chains = T.1.n.chains,
  n.adapt = T.1.n.adapt
)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 100
##    Unobserved stochastic nodes: 3
##    Total graph size: 409
## 
## Initializing model
Simulating Posterior
T.1.n.iter <- 10000

T.1.sim <- coda.samples(
  model = T.1.jags,
  variable.names = c("b0", "b1", 's'),
  n.iter = T.1.n.iter
)
Diagnostics

First, we examine the trace plots. We want to see convergence and overlap among the 3 chains.

plot(T.1.sim)

Let’s see what the last 1000 iterations look like.

window(T.1.sim, T.1.n.iter) %>% 
  plot(., density = F)

It looks like the 3 chains are overlapping each other.

Next, let’s examine the Brooks-Gelman-Rubin (BGR) statistic.

gelman.diag(T.1.sim)
## Potential scale reduction factors:
## 
##    Point est. Upper C.I.
## b0          1          1
## b1          1          1
## s           1          1
## 
## Multivariate psrf
## 
## 1

A value of 1 indicates the chains have converged; values above about 1.1 would suggest they have not.

Next, let’s get rid of the samples drawn before convergence. The plots below show the BGR statistic versus iteration; we want to remove the samples drawn while the BGR statistic was still greater than 1.05.

gelman.plot(T.1.sim)

Looking at the plots, a burn-in of 1000 iterations seems reasonable. Keep in mind the plots start at iteration 1000 because of the n.adapt argument to jags.model.
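To double-check the burn-in choice (an illustrative check, not part of the original output), we can recompute the BGR statistic on progressively later windows of the chains; the candidate start iterations below are arbitrary.

#Recompute the BGR statistic on later and later windows of the chains; the
#burn-in is adequate once the upper confidence limits stay near 1
for (start.iter in c(1500, 2000, 3000, 5000)) {
  psrf <- gelman.diag(window(T.1.sim, start = start.iter),
                      autoburnin = FALSE, multivariate = FALSE)
  cat("start =", start.iter, "max upper PSRF =", round(max(psrf$psrf[, 2]), 3), "\n")
}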

Inference

Before doing any inference it is important to remove the burn-in samples, since they were drawn before the chains converged.

T.1.burn <- 1000
T.1.sim.prior <- window(T.1.sim, T.1.n.adapt + T.1.burn) # keep samples after adaptation plus burn-in
acfplot(T.1.sim.prior) # lattice ACF plots for every parameter and chain

The autocorrelation seems to die off after about 10 lags.

acf(T.1.sim.prior[[1]])

acf(T.1.sim.prior[[2]])

acf(T.1.sim.prior[[3]])

In all three chains the ACF dies down after about 10 lags.
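To put numbers on this (an illustrative check), coda's autocorr.diag() reports the chain-averaged autocorrelation at a few chosen lags.

autocorr.diag(T.1.sim.prior, lags = c(1, 5, 10, 20)) # autocorrelation by parameter and lag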

We want the effective sample size to be greater than 5000, and this is with 10^4 iterations per chain. Do we need to change the priors to increase the effective sample size?

effectiveSize(T.1.sim.prior) # effective sample size for each parameter
##        b0        b1         s 
##  3117.550  3226.224 15295.336
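The effective sample sizes for b0 and b1 fall short of 5000. Changing the priors would change the model itself; the more direct way to increase the effective sample size is simply to draw more samples. A minimal sketch of a continuation run, assuming the T.1.jags object is still in the session (the extra iteration count is an arbitrary choice):

#Continue sampling from the chains' current state to raise the effective sample size
T.1.sim.more <- coda.samples(
  model = T.1.jags,
  variable.names = c("b0", "b1", "s"),
  n.iter = 20000  # assumed extra iterations; increase until the target is met
)
effectiveSize(T.1.sim.more)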
cumuplot(T.1.sim.prior,probs=c(0.025,0.25,0.5,0.75,0.975))

Saving the posterior samples as a data frame for further analysis (the object name says "prior", but these are posterior draws).

T.1.prior.df <- data.frame(as.matrix(T.1.sim.prior))
pairs(T.1.prior.df) ## pairs plot to evaluate parameter correlation

cor(T.1.prior.df)    ## correlation matrix among model parameters
##              b0           b1            s
## b0  1.000000000 -0.889614394  0.003497632
## b1 -0.889614394  1.000000000 -0.008760438
## s   0.003497632 -0.008760438  1.000000000

There is a strong negative correlation between b0 and b1 (about -0.89). This is typical in linear regression when the covariate is not centered: to fit the same data, a larger slope must be compensated by a smaller intercept, and vice versa. The residual standard deviation s appears essentially uncorrelated with b0 and b1.
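As an illustrative check (not part of the original assignment), the intercept re-expressed at the mean of x1, b0 + b1*mean(x1), should be nearly uncorrelated with the slope if the b0-b1 correlation is driven by x1 not being centered.

#Correlation between the slope and the intercept evaluated at mean(x1);
#a value near zero confirms the b0-b1 correlation comes from x1's location
cor(T.1.prior.df$b0 + mean(x1) * T.1.prior.df$b1, T.1.prior.df$b1)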

#Density for each parameter
T.1.prior.df %>% 
  gather() %>% 
  ggplot(mapping = aes(value))+
  facet_wrap(~key, scales = "free")+
  geom_density()

summary(T.1.sim.prior)
## 
## Iterations = 2000:11000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 9001 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##     Mean      SD  Naive SE Time-series SE
## b0 9.770 0.90979 0.0055365       0.016307
## b1 2.013 0.07675 0.0004671       0.001357
## s  4.117 0.30171 0.0018361       0.002442
## 
## 2. Quantiles for each variable:
## 
##     2.5%   25%   50%    75%  97.5%
## b0 7.977 9.164 9.774 10.369 11.565
## b1 1.861 1.963 2.013  2.064  2.167
## s  3.586 3.903 4.098  4.308  4.759

Comparing with the linear regression

summary(T.1.lm)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9014 -3.2745  0.2913  2.9131 11.5248 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.76950    0.88709   11.01   <2e-16 ***
## x1           2.01369    0.07513   26.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.062 on 98 degrees of freedom
## Multiple R-squared:   0.88,  Adjusted R-squared:  0.8787 
## F-statistic: 718.4 on 1 and 98 DF,  p-value: < 2.2e-16
sigma(T.1.lm)
## [1] 4.062347
confint(T.1.lm)
##                2.5 %    97.5 %
## (Intercept) 8.009094 11.529905
## x1          1.864606  2.162779

The results from the two approaches are very similar: the estimates of b0 and b1 are almost identical, and the posterior mean of the standard deviation (4.12) is only slightly higher than the residual standard error from the linear regression (4.06). The posterior standard deviations are close to the classical standard errors, and the 95% credible intervals closely match the 95% confidence intervals.
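A compact side-by-side table makes the comparison of all 3 parameters explicit; a sketch using the objects defined above (lm reports no standard error for the residual standard deviation, hence the NA):

#Posterior means and SDs versus lm point estimates and standard errors
T.1.compare <- data.frame(
  bayes.mean = colMeans(T.1.prior.df[, c("b0", "b1", "s")]),
  bayes.sd   = apply(T.1.prior.df[, c("b0", "b1", "s")], 2, sd),
  lm.est     = c(coef(T.1.lm), sigma(T.1.lm)),
  lm.se      = c(summary(T.1.lm)$coefficients[, "Std. Error"], NA)
)
T.1.compare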

Task 2

Using the dataset “data/Ex05.Part2.RData”, extend your univariate regression to a multiple regression model with two covariates (x1, x2) and an interaction term (x1*x2). In doing so, not only do you need to update your process model, but you also need to make sure to update the dimension of your prior and initial conditions on b from 2 to 4.

  • Show the JAGS and R code used.
  • Include relevant convergence diagnostics and plots.
  • Report parameter summary table.
  • Plot marginal parameter distributions and pairwise parameter correlations (stats and scatter plots).

Loading in Data

load("data/Ex05.Part2.RData")

Modeling in Rjags

Model priors
T.2.model <- "model {
#Model
for (i in 1:length(Y)) {
Y[i] ~ dnorm(mu[i], s^(-2))
mu[i] <- b0 + b1*x1[i] + b2*x2[i] + b3*x1.2[i]
}
#x1.2 is the interaction term
#Prior
b0 ~ dnorm(0, 200^(-2))
b1 ~ dnorm(0, 200^(-2))
b2 ~ dnorm(0, 200^(-2))
b3 ~ dnorm(0, 200^(-2))

s ~ dunif(0,200)
}"
Model data
T.2.data <- list(
  Y = y,
  x1 = x1,
  x2 = x2,
  x1.2 = x1*x2
)
Initialization
T.2.n.chains <- 3
T.2.inital.coef <- coef(T.2.lm <- lm(y ~ x1*x2))
T.2.inits = list()
for(i in 1:T.2.n.chains){
  T.2.inits[[i]] <- list(b0 = rnorm(1,T.2.inital.coef['(Intercept)'],200),
                     b1 = rnorm(1,T.2.inital.coef["x1"], 200),
                     b2 = rnorm(1,T.2.inital.coef["x2"], 200),
                     b3 = rnorm(1, T.2.inital.coef["x1:x2"], 200),
                     s = sigma(T.2.lm),
                     .RNG.name = "base::Wichmann-Hill",
                     .RNG.seed = 10)
}
Running Rjags
T.2.n.adapt <- 1000
T.2.jags <- jags.model(
  file = textConnection(T.2.model),
  data = T.2.data,
  inits = T.2.inits,
  n.chains = T.2.n.chains,
  n.adapt = T.2.n.adapt
)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 250
##    Unobserved stochastic nodes: 5
##    Total graph size: 2011
## 
## Initializing model
Simulating Posterior
T.2.n.iter <- 10000

T.2.sim <- coda.samples(
  model = T.2.jags,
  variable.names = c("b0", "b1", "b2", "b3", 's'),
  n.iter = T.2.n.iter
)
Diagnostics

The trace plots look good when zoomed out.

plot(T.2.sim, density = F)

b1 looks a little off when we zoom in on the final iterations below.

plot(window(T.2.sim, T.2.n.iter - 1000), density = F)

The Brooks-Gelman-Rubin (BGR) statistics look good: all values are less than 1.1.

gelman.diag(T.2.sim)
## Potential scale reduction factors:
## 
##    Point est. Upper C.I.
## b0       1.01       1.02
## b1       1.01       1.02
## b2       1.01       1.02
## b3       1.01       1.03
## s        1.00       1.00
## 
## Multivariate psrf
## 
## 1.01
gelman.plot(T.2.sim)

Based on the plots, a burn-in of 2500 iterations seems appropriate.

Inference

Before doing any inference it is important to remove the burn-in samples, since they were drawn before the chains converged.

T.2.burn <- 2500
T.2.sim.prior <- window(T.2.sim, T.2.burn) # keep samples from iteration 2500 onward
acf(T.2.sim.prior[[1]])

acf(T.2.sim.prior[[2]])

acf(T.2.sim.prior[[3]])

Autocorrelation is worse in this model: the ACF does not cut off but decays slowly.

We want the effective sample size to be greater than 5000, and this is with 10^4 iterations per chain. Do we need to change the priors to increase the effective sample size?

effectiveSize(T.2.sim.prior) # effective sample size for each parameter
##         b0         b1         b2         b3          s 
##   487.1630   485.6567   537.0315   493.8652 17512.4019

Indeed, the effective sample sizes for the regression coefficients are very small (around 500).
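As in Task 1, the most direct remedy is to draw more samples; a hypothetical sketch of a longer, thinned continuation run (the iteration count and thinning interval are assumed values):

#Continue sampling with thinning to raise the effective sample sizes for b0-b3
T.2.sim.more <- coda.samples(
  model = T.2.jags,
  variable.names = c("b0", "b1", "b2", "b3", "s"),
  n.iter = 100000,
  thin = 10  # keep every 10th draw to limit memory use
)
effectiveSize(T.2.sim.more)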

cumuplot(T.2.sim.prior,probs=c(0.025,0.25,0.5,0.75,0.975))

Saving the posterior samples as a data frame for further analysis.

T.2.prior.df <- data.frame(as.matrix(T.2.sim.prior))
pairs(T.2.prior.df) ## pairs plot to evaluate parameter correlation

cor(T.2.prior.df)    ## correlation matrix among model parameters
##             b0          b1           b2           b3            s
## b0  1.00000000 -0.85475396 -0.874512027  0.772118808 -0.015222234
## b1 -0.85475396  1.00000000  0.735224566 -0.878143163  0.011654255
## b2 -0.87451203  0.73522457  1.000000000 -0.866434301  0.007769943
## b3  0.77211881 -0.87814316 -0.866434301  1.000000000 -0.004860782
## s  -0.01522223  0.01165425  0.007769943 -0.004860782  1.000000000

There are strong pairwise correlations among all four regression coefficients b0, b1, b2, and b3, while s is again essentially uncorrelated with them. This is expected: the covariates are not centered, so the interaction term x1*x2 is highly correlated with x1 and x2, and the intercept trades off against all of the slopes.
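One standard way to reduce these correlations (shown here as an illustrative sketch, not something done in the original analysis) is to center the covariates before forming the interaction term.

#Center the covariates so the interaction is less correlated with x1 and x2
x1.c <- x1 - mean(x1)
x2.c <- x2 - mean(x2)
T.2.data.centered <- list(
  Y = y,
  x1 = x1.c,
  x2 = x2.c,
  x1.2 = x1.c * x2.c
)
#Refitting the same JAGS model with T.2.data.centered gives an equivalent fit,
#but b0, b1, and b2 are then interpreted at the covariate means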

#Density for each parameter
T.2.prior.df %>% 
  gather() %>% 
  ggplot(mapping = aes(value))+
  facet_wrap(~key, scales = "free")+
  geom_density()

summary(T.2.sim.prior)
## 
## Iterations = 2500:11000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 8501 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##       Mean      SD  Naive SE Time-series SE
## b0 10.8157 0.98042 6.139e-03       0.044705
## b1  1.8914 0.08813 5.518e-04       0.004015
## b2 -3.9879 0.11800 7.389e-04       0.005113
## b3  0.4978 0.01013 6.344e-05       0.000458
## s   3.8019 0.17216 1.078e-03       0.001301
## 
## 2. Quantiles for each variable:
## 
##       2.5%     25%    50%     75%   97.5%
## b0  8.9154 10.1427 10.818 11.4864 12.7243
## b1  1.7207  1.8312  1.891  1.9520  2.0624
## b2 -4.2164 -4.0678 -3.989 -3.9069 -3.7566
## b3  0.4779  0.4909  0.498  0.5049  0.5174
## s   3.4832  3.6821  3.795  3.9153  4.1523

Comparing with the linear regression

summary(T.2.lm)
## 
## Call:
## lm(formula = y ~ x1 * x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.0518  -2.4692   0.0252   2.3968   8.6046 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.89965    1.00248   10.87   <2e-16 ***
## x1           1.88394    0.09076   20.76   <2e-16 ***
## x2          -3.99743    0.11919  -33.54   <2e-16 ***
## x1:x2        0.49868    0.01034   48.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.783 on 246 degrees of freedom
## Multiple R-squared:  0.9888, Adjusted R-squared:  0.9886 
## F-statistic:  7222 on 3 and 246 DF,  p-value: < 2.2e-16
sigma(T.2.lm)
## [1] 3.782779
confint(T.2.lm)
##                  2.5 %     97.5 %
## (Intercept)  8.9251074 12.8741966
## x1           1.7051821  2.0627048
## x2          -4.2321988 -3.7626624
## x1:x2        0.4783194  0.5190415

The results from the two approaches are again very similar: the coefficient estimates are almost identical, and the posterior mean of the standard deviation (3.80) is only slightly higher than the residual standard error from the linear regression (3.78). The 95% credible intervals closely match the classical 95% confidence intervals.