A light-weight example of Bayes regression

Introduction

In the following, a bayesian linear model will be presented using the built-in dataset mtcars. The model will be set up with the JAGS package. JAGS is an acronym that stands for Just Another Gibbs Sampler.

data(mtcars)
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

pairs(mtcars)

Exploring the data

Among others, there appears to be a linear relationship between the variables mpg (Miles per Gallon) and wt(weight). Let’s look at this relationship in more detail. In addition, the distributions of both variables are presented.

plot(mpg ~ wt, mtcars)

hist(mtcars$mpg)

hist(mtcars$wt)

Running an ordinary lineary regression

First, a default model is set up using the common function in R for linear regression models. From a bayesian perspective, this model can be regarded as having a non-informative prior.

mod_0 <- lm(mpg ~ wt, data = mtcars)
summary(mod_0)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Specifying the Bayesian model

Next, a bayesian model will be set up. But first, let’s check, if there are any missing values in the dataset.

any(is.na(mtcars))

## [1] FALSE

There are no missing values in the dataset. The model is specified as follows.

library(rjags)

## Loading required package: coda

## Linked to JAGS 4.2.0

## Loaded modules: basemod,bugs

mod1_string <- " model{
    for(i in 1:n){
        #Likelihood
        y[i] ~ dnorm(mu[i], prec)
        mu[i] = b[1] + b[2]*wt[i]
                }            
    for(i in 1:2){
        #Prior for coefficients
        b[i] ~ dnorm(0.0, 1.0/1.0e6)
    }
        #Prior for left, unaccounted variance
        prec ~ dgamma(5/2.0, 5*10.0/2.0)
        sig2 = 1.0 / prec
        sig = sqrt(sig2)
}"

Setting up the model

Now, the model is set up. A seed is used so as to reproduce the results obtained. The parameters to be monitored are the betas and the left variance of the model. The initial values are randomly defined using the function rnorm and rgamma respectively. Finally, three independent Markov Chains are set up each starting off from the random values obtained .

set.seed(72)
data1_jags = list(y = mtcars$mpg, n = nrow(mtcars),
                  wt = mtcars$wt)

#parameters to observe
params1 = c("b", "sig")
#initial values
inits1 = function(){
    inits = list("b" = rnorm(2, mean = 0.0, sd = 100.0), "prec" = rgamma(1, shape = 1.0, rate = 1.0))
}
#Setting up three seperate chains, with different starting values for each chain
mod1 <- jags.model(textConnection(mod1_string), data = data1_jags, inits = inits1, n.chains = 3)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 32
##    Unobserved stochastic nodes: 3
##    Total graph size: 145
## 
## Initializing model

Run the MCMC sampler

Next, a burn-in period of 1000 is specified. That is, the first 1000 iterations are discarded, as the chains usually need some time at the beginning to travel to the posterior distribution. Each chain is run 5000 times. Lastly, the chains are combined in one

#burn-in
update(mod1, 1000)

mod1_sim <- coda.samples(model = mod1, 
                         variable.names = params1,
                         n.iter = 5000)
#Combine multiple chains
mod1_csim = do.call(rbind, mod1_sim)

MCMC Diagnostics

Before looking at the results, it is important to check whether the chains have converged. There are a couple of diagnostics available for doing so.

#Convergence diagnostics
plot(mod1_sim)

#Gelman statistics looks okay
gelman.diag(mod1_sim)

## Potential scale reduction factors:
## 
##      Point est. Upper C.I.
## b[1]       1.01       1.02
## b[2]       1.01       1.02
## sig        1.00       1.00
## 
## Multivariate psrf
## 
## 1

#Check for autocorrelation
autocorr.diag(mod1_sim)

##               b[1]        b[2]          sig
## Lag 0  1.000000000 1.000000000  1.000000000
## Lag 1  0.912075079 0.912348905  0.046469224
## Lag 5  0.633865707 0.632658273  0.004526959
## Lag 10 0.414809314 0.413648958 -0.007852637
## Lag 50 0.008074232 0.009449777 -0.007377316

autocorr.plot(mod1_sim)

effectiveSize(mod1_sim)

##       b[1]       b[2]        sig 
##   689.9072   687.6801 12389.1958

The gelman diagnostis and the plots suggest that the chains have converged. However, there is strong autocorrelation in the lags of the beta values resulting in a low effective size for both parameters. The values of the sigma parameter, on the other hand, are alright. In case of confidence intervals this would be an issue to deal with. This would imply running the chains more often and thinning them afterwards so as to retain values that show much less autocorrelation. However, as we will only look at posterior means we won’t apply any modifications.

summary(mod1_sim)

## 
## Iterations = 1001:6000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##        Mean     SD Naive SE Time-series SE
## b[1] 37.246 1.8830 0.015375        0.07231
## b[2] -5.333 0.5595 0.004568        0.02153
## sig   3.132 0.3841 0.003136        0.00347
## 
## 2. Quantiles for each variable:
## 
##        2.5%    25%    50%    75%  97.5%
## b[1] 33.553 35.991 37.251 38.512 40.940
## b[2] -6.422 -5.710 -5.330 -4.956 -4.245
## sig   2.492  2.857  3.097  3.366  3.978

summary(mod_0)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

As we can see, the values of the two models are pretty much the same. The model with the non-informative prior has the values Intercept = 37.3; Weight = -5.3; SE = 3.0. The Bayes results on the other hand are Intercept = 37.3; Weight = -5.3; SE = 3.1.

Residual diagnostics

As in ordinary linear regression, model premises should be accounted for before interpreting the results.

#Design matrix; the n 1's get multiplied by the intercept afterwards
X = cbind(rep(1.0, data1_jags$n), data1_jags$wt)

(pm_params1 = colMeans(mod1_csim)) #posterior mean

##      b[1]      b[2]       sig 
## 37.245727 -5.332872  3.131824

#vector of predicted values from the model
yhat1 <- drop(X %*% pm_params1[1:2])
#Calculate residuals
resid1 <- data1_jags$y - yhat1

#Check independence
plot(resid1)

#Check linearity
plot(yhat1, resid1)

#Check normality
qqnorm(resid1)

Conclusion

As can be seen, some of the values scatter widely around zero. Furhtermore, the variance of the values seems to change as we move from the left to the right of the plot. The QQ-Plot shows that, especially at the higher end, there are outliers that deviate which violates the assumption of normally distributed residuals. To sum up, while this model might be a good starting point the residual diagnostics reveal some deficiencies. These could be accounted for by adding more covariates to the model or by changing the likelihood from a normal to a t-likelihood with thicker tails to handle outliers.