In the following, a Bayesian linear model will be presented using the built-in dataset mtcars. The model will be set up with the rjags package, which provides an R interface to JAGS (Just Another Gibbs Sampler).
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
pairs(mtcars)
Among others, there appears to be a linear relationship between the variables mpg (miles per gallon) and wt (weight). Let’s look at this relationship in more detail. In addition, the distributions of both variables are presented.
plot(mpg ~ wt, mtcars)
hist(mtcars$mpg)
hist(mtcars$wt)
First, a default model is set up using lm(), the standard R function for linear regression models. From a Bayesian perspective, this model can be regarded as having a non-informative prior.
mod_0 <- lm(mpg ~ wt, data = mtcars)
summary(mod_0)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
Next, a Bayesian model will be set up. But first, let’s check whether there are any missing values in the dataset.
any(is.na(mtcars))
## [1] FALSE
There are no missing values in the dataset. The model is specified as follows.
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.2.0
## Loaded modules: basemod,bugs
mod1_string <- " model{
    for(i in 1:n){
        #Likelihood
        y[i] ~ dnorm(mu[i], prec)
        mu[i] = b[1] + b[2]*wt[i]
    }
    for(i in 1:2){
        #Prior for the coefficients
        b[i] ~ dnorm(0.0, 1.0/1.0e6)
    }
    #Prior for the residual (unexplained) variance, expressed as a precision
    prec ~ dgamma(5/2.0, 5*10.0/2.0)
    sig2 = 1.0 / prec
    sig = sqrt(sig2)
}"
Now, the model is set up. A seed is used to make the results reproducible. The parameters to be monitored are the betas and the residual standard deviation sig. The initial values are drawn randomly using the functions rnorm and rgamma, respectively. Finally, three independent Markov chains are set up, each starting from its own set of random initial values.
set.seed(72)
data1_jags = list(y = mtcars$mpg, n = nrow(mtcars),
wt = mtcars$wt)
#parameters to observe
params1 = c("b", "sig")
#initial values
inits1 = function(){
    list("b" = rnorm(2, mean = 0.0, sd = 100.0),
         "prec" = rgamma(1, shape = 1.0, rate = 1.0))
}
#Setting up three separate chains, with different starting values for each chain
mod1 <- jags.model(textConnection(mod1_string), data = data1_jags, inits = inits1, n.chains = 3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 32
## Unobserved stochastic nodes: 3
## Total graph size: 145
##
## Initializing model
Next, a burn-in period of 1000 iterations is specified. That is, the first 1000 iterations are discarded, as the chains usually need some time at the beginning to travel towards the posterior distribution. Each chain is then run for 5000 iterations. Lastly, the three chains are combined into one object.
#burn-in
update(mod1, 1000)
mod1_sim <- coda.samples(model = mod1,
variable.names = params1,
n.iter = 5000)
#Combine multiple chains
mod1_csim = do.call(rbind, mod1_sim)
Before looking at the results, it is important to check whether the chains have converged. There are a couple of diagnostics available for doing so.
#Convergence diagnostics
plot(mod1_sim)
#Gelman diagnostic looks okay
gelman.diag(mod1_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b[1] 1.01 1.02
## b[2] 1.01 1.02
## sig 1.00 1.00
##
## Multivariate psrf
##
## 1
#Check for autocorrelation
autocorr.diag(mod1_sim)
## b[1] b[2] sig
## Lag 0 1.000000000 1.000000000 1.000000000
## Lag 1 0.912075079 0.912348905 0.046469224
## Lag 5 0.633865707 0.632658273 0.004526959
## Lag 10 0.414809314 0.413648958 -0.007852637
## Lag 50 0.008074232 0.009449777 -0.007377316
autocorr.plot(mod1_sim)
effectiveSize(mod1_sim)
## b[1] b[2] sig
## 689.9072 687.6801 12389.1958
The Gelman diagnostic and the plots suggest that the chains have converged. However, there is strong autocorrelation in the lags of the beta values, resulting in a low effective sample size for both parameters. The values of the sigma parameter, on the other hand, are fine. If we wanted to report posterior intervals, this autocorrelation would be an issue to deal with: it would imply running the chains for longer and thinning them afterwards, so as to retain draws that show much less autocorrelation. However, as we will only look at posterior means, we won’t apply any modifications here.
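For illustration, such a longer, thinned run could look like the following sketch. The total of 50000 iterations and the thinning interval of 10 are assumptions chosen for illustration, not tuned values.
#Sketch: draw a longer sample and keep only every 10th iteration
mod1_sim_thin <- coda.samples(model = mod1,
                              variable.names = params1,
                              n.iter = 50000,
                              thin = 10)
#The effective sample size of the betas should increase considerably
effectiveSize(mod1_sim_thin)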
summary(mod1_sim)
##
## Iterations = 1001:6000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## b[1] 37.246 1.8830 0.015375 0.07231
## b[2] -5.333 0.5595 0.004568 0.02153
## sig 3.132 0.3841 0.003136 0.00347
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## b[1] 33.553 35.991 37.251 38.512 40.940
## b[2] -6.422 -5.710 -5.330 -4.956 -4.245
## sig 2.492 2.857 3.097 3.366 3.978
summary(mod_0)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
As we can see, the estimates of the two models are pretty much the same. The model with the non-informative prior yields Intercept = 37.3, wt = -5.3 and a residual standard error of 3.0. The Bayesian results, on the other hand, are posterior means of 37.3 for the intercept, -5.3 for wt and 3.1 for sigma.
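As a quick illustration (this comparison is an added sketch, not original output), the estimates can be lined up side by side; it assumes that the combined samples in mod1_csim keep the column names b[1], b[2] and sig shown below.
#Illustrative side-by-side comparison of the lm fit and the posterior means
comparison <- rbind(lm   = c(coef(mod_0), sigma(mod_0)),
                    jags = colMeans(mod1_csim)[c("b[1]", "b[2]", "sig")])
colnames(comparison) <- c("Intercept", "wt", "sigma")
round(comparison, 2)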
As in ordinary linear regression, the model assumptions should be checked before interpreting the results.
#Design matrix; the column of n 1's gets multiplied by the intercept
X = cbind(rep(1.0, data1_jags$n), data1_jags$wt)
(pm_params1 = colMeans(mod1_csim)) #posterior mean
## b[1] b[2] sig
## 37.245727 -5.332872 3.131824
#vector of predicted values from the model
yhat1 <- drop(X %*% pm_params1[1:2])
#Calculate residuals
resid1 <- data1_jags$y - yhat1
#Check independence
plot(resid1)
#Check linearity
plot(yhat1, resid1)
#Check normality
qqnorm(resid1)
As can be seen, some of the residuals scatter widely around zero. Furthermore, the variance of the residuals seems to change as we move from the left to the right of the plot. The QQ-plot shows that, especially at the upper end, there are outliers that deviate from the line, which violates the assumption of normally distributed residuals. To sum up, while this model might be a good starting point, the residual diagnostics reveal some deficiencies. These could be addressed by adding more covariates to the model or by changing the likelihood from a normal to a t likelihood, whose thicker tails can accommodate outliers.
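A sketch of such a t-likelihood model is given below. The priors on the degrees of freedom and on the inverse scale parameter tau are assumptions chosen for illustration and would need to be adapted to the problem at hand.
mod2_string <- " model{
    for(i in 1:n){
        #t likelihood: heavier tails make the fit more robust to outliers
        y[i] ~ dt(mu[i], tau, df)
        mu[i] = b[1] + b[2]*wt[i]
    }
    for(i in 1:2){
        b[i] ~ dnorm(0.0, 1.0/1.0e6)
    }
    #Degrees of freedom, shifted so that df > 2 and the variance exists
    nu ~ dexp(1.0)
    df = nu + 2.0
    #tau is an inverse squared scale, not exactly the precision
    tau ~ dgamma(5/2.0, 5*10.0/2.0)
    sig = sqrt(1.0/tau * df/(df - 2.0))
}"
This model could then be compiled and sampled with jags.model() and coda.samples() exactly as before.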