Bayesian Inference

Bayesian Inference for Normal Linear Models

Prior and Posterior Distributions for Normal Linear Models

Example: Bayesian Linear Model for Mental Impairment

Mental <- read.table("http://stat4ds.rwth-aachen.de/data/Mental.dat", header=TRUE)
summary(lm(impair ~ life + ses, data=Mental))  # least squares fit

Call:
lm(formula = impair ~ life + ses, data = Mental)

Residuals:
   Min     1Q Median     3Q    Max 
-8.678 -2.494 -0.336  2.886 10.891 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 28.22981    2.17422  12.984 2.38e-15 ***
life         0.10326    0.03250   3.177  0.00300 ** 
ses         -0.09748    0.02908  -3.351  0.00186 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.556 on 37 degrees of freedom
Multiple R-squared:  0.3392,    Adjusted R-squared:  0.3034 
F-statistic: 9.495 on 2 and 37 DF,  p-value: 0.0004697
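
As a supplementary aside (not part of the original output), the classical 95% confidence intervals for the regression parameters can be obtained with confint(), for later comparison with the Bayesian posterior quantiles:

confint(lm(impair ~ life + ses, data=Mental), level=0.95)  # classical 95% CIs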
library(MCMCpack)
Loading required package: coda
Loading required package: MASS
##
## Markov Chain Monte Carlo Package (MCMCpack)
## Copyright (C) 2003-2023 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
##
## Support provided by the U.S. National Science Foundation
## (Grants SES-0350646 and SES-0350613)
##
fit.bayes <- MCMCregress(impair ~ life + ses, mcmc=5000000, b0=0, B0=10^(-10),
                         c0=10^(-10), d0=10^(-10), data=Mental)
summary(fit.bayes) # normal priors with means=b0, variances=1/B0

Iterations = 1001:5001000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 5e+06 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                Mean      SD  Naive SE Time-series SE
(Intercept) 28.22800 2.23535 9.997e-04      1.000e-03
life         0.10328 0.03342 1.495e-05      1.495e-05
ses         -0.09745 0.02991 1.338e-05      1.338e-05
sigma2      21.94605 5.40438 2.417e-03      2.613e-03

2. Quantiles for each variable:

                2.5%      25%      50%      75%    97.5%
(Intercept) 23.82520 26.74478 28.22777 29.70950 32.63355
life         0.03742  0.08114  0.10328  0.12541  0.16916
ses         -0.15642 -0.11725 -0.09745 -0.07766 -0.03847
sigma2      13.79814 18.12121 21.13905 24.86586 34.75493
mean(fit.bayes[,2] <= 0); mean(fit.bayes[,3] >= 0)  # posterior P(beta_life <= 0), P(beta_ses >= 0)
[1] 0.0015066
[1] 0.0009274
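
As a supplementary summary, the coda package (loaded with MCMCpack) also gives highest posterior density intervals from the same mcmc object:

HPDinterval(fit.bayes, prob=0.95)  # 95% HPD intervals for the regression parameters and sigma2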

Bayesian Approach to the Normal One-Way Layout
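
No one-way example is shown here; as an illustrative sketch only, with simulated data, MCMCregress fits a normal one-way layout when the sole explanatory variable is a factor, using the same vague priors as above:

set.seed(1)  # simulated data, purely for illustration
dat <- data.frame(group = factor(rep(c("A","B","C"), each=10)),
                  y = rnorm(30, mean=rep(c(20,25,30), each=10), sd=4))
fit.1way <- MCMCregress(y ~ group, mcmc=100000, b0=0, B0=10^(-10),
                        c0=10^(-10), d0=10^(-10), data=dat)
summary(fit.1way)  # posterior for the group effects (relative to group A) and sigma2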

Plots for Regression Bands and Posterior Distributions

income <- read.table("http://stat4ds.rwth-aachen.de/data/Salaries.dat", header=TRUE)

model.all <- lm(salary~years, data=income) # for all cases, ignoring gender
summary(model.all)

Call:
lm(formula = salary ~ years, data = income)

Residuals:
    Min      1Q  Median      3Q     Max 
-168.02  -68.41  -24.68   82.95  176.79 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1323.160     63.962   20.69   <2e-16 ***
years         96.730      2.969   32.58   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 100.3 on 24 degrees of freedom
Multiple R-squared:  0.9779,    Adjusted R-squared:  0.977 
F-statistic:  1062 on 1 and 24 DF,  p-value: < 2.2e-16
new <- data.frame(years = c(12,20,30))
CI_x0 <- cbind(new, predict(model.all, newdata=new, interval="confidence", level = 0.95))
CI_x0
  years      fit      lwr      upr
1    12 2483.914 2417.867 2549.961
2    20 3257.751 3217.018 3298.483
3    30 4225.046 4154.067 4296.025
PI_x0 <- cbind(new, predict(model.all, newdata=new, interval="prediction", level = 0.95))
PI_x0
  years      fit      lwr      upr
1    12 2483.914 2266.532 2701.297
2    20 3257.751 3046.677 3468.824
3    30 4225.046 4006.115 4443.977
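
As a supplementary sketch, these intervals can be reproduced from the usual t-based formulas with df = n - 2 = 24; for example at years = 12:

x <- income$years; n <- nrow(income); s <- summary(model.all)$sigma
x0 <- 12; yhat0 <- sum(coef(model.all)*c(1, x0))
se.fit  <- s*sqrt(1/n + (x0 - mean(x))^2/sum((x - mean(x))^2))
se.pred <- s*sqrt(1 + 1/n + (x0 - mean(x))^2/sum((x - mean(x))^2))
yhat0 + c(-1, 1)*qt(0.975, n-2)*se.fit    # 95% CI for the mean response, as in CI_x0
yhat0 + c(-1, 1)*qt(0.975, n-2)*se.pred   # 95% PI for a new response, as in PI_x0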
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks MASS::select()
alpha <- 0.01
pred.dat <- predict(model.all, interval="prediction", level=1-alpha)
Warning in predict.lm(model.all, interval = "prediction", level = 1 - alpha): predictions on current data refer to _future_ responses
#----------------------------------------------------------------------------
# Working-Hotelling simultaneous confidence band for the entire regression line:
# fitted value +- sqrt(2*qf(1-alpha, 2, n-2)) * s * sqrt(1/n + (x - xbar)^2/Sxx),
# with s the residual standard error and Sxx = sum((x - xbar)^2)
x <- income$years; n <- nrow(income)
s <- sqrt(sum(model.all$residuals^2)/(n-2))
cf <- s*sqrt(2*qf(1-alpha, 2, n-2))*sqrt(1/n + (x - mean(x))^2/sum((x - mean(x))^2))
ci.regr1 <- model.all$fitted.values - cf   # lower limit of the simultaneous band
ci.regr2 <- model.all$fitted.values + cf   # upper limit of the simultaneous band
#----------------------------------------------------------------------------
newdata <- cbind(income, pred.dat, ci.regr1, ci.regr2)
ggplot(newdata, aes(years, salary)) +
  geom_point() +
  geom_smooth(method=lm, se=TRUE, level=1-alpha) +               # pointwise 99% confidence band
  geom_line(aes(y=lwr), color="red", linetype="dashed") +        # 99% prediction limits
  geom_line(aes(y=upr), color="red", linetype="dashed") +
  geom_line(aes(y=ci.regr1), color="red") +                      # simultaneous band for the line
  geom_line(aes(y=ci.regr2), color="red")
`geom_smooth()` using formula = 'y ~ x'
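
A quick supplementary check: at every observed x, the simultaneous band should be at least as wide as the pointwise confidence band drawn by geom_smooth:

ci.point <- predict(model.all, interval="confidence", level=1-alpha)
all(ci.regr1 <= ci.point[,"lwr"] & ci.regr2 >= ci.point[,"upr"])  # TRUE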

# ggplot needs categorical variables used for shape/color to be factors
Gender <- factor(income$gender, labels=c("M","F"))
ggplot(income, aes(x=years, y=salary, color=Gender, shape=Gender)) +
  geom_point() +
  geom_smooth(method=lm, se=TRUE, aes(fill=Gender)) +
  scale_colour_manual(values=c("blue", "red")) +
  scale_fill_manual(values=c("blue", "red"))
`geom_smooth()` using formula = 'y ~ x'
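
The plot fits a separate line for each gender; the corresponding interaction model, allowing different intercepts and slopes by gender, can be fitted directly (a supplementary sketch):

summary(lm(salary ~ years*gender, data=income))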

library(MCMCpack)
fit.bayes <- MCMCregress(impair ~ life + ses, mcmc=5000000, b0=0, B0=10^(-10), c0=10^(-10), d0=10^(-10), data=Mental)
fit.bayes[1:3,] # first three rows of the derived mcmc object
     (Intercept)       life         ses   sigma2
[1,]    26.66740 0.13160292 -0.09251524 23.22411
[2,]    30.17502 0.10860963 -0.12223697 20.57522
[3,]    27.04690 0.06959574 -0.06056731 17.20253
densplot(fit.bayes[,2], xlab="Life events effect", ylab="Posterior pdf") # Figure A6.4

# Plot of the posterior pdf for the variance (not shown):
densplot(fit.bayes[,4], xlab=expression(sigma^2))

library(coda)
U <- nrow(fit.bayes); L <- U - 499          # last 500 MCMC draws
beta1.post <- mcmc(fit.bayes[,2][L:U])
traceplot(beta1.post, main="Trace of life events effect")
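
As a supplementary check of the MCMC output, coda also provides effective sample sizes and standard convergence diagnostics:

effectiveSize(fit.bayes)     # effective number of independent draws per parameter
geweke.diag(fit.bayes[,2])   # Geweke z-score comparing early and late parts of the chain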

Matrix Formulation of Linear Models

Unbiasedness and Variance
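
A minimal sketch using the salary model above: the least squares estimator is betahat = (X'X)^(-1) X'y, which is unbiased with covariance matrix sigma^2 (X'X)^(-1), estimated by substituting s^2 for sigma^2:

X <- model.matrix(model.all); y <- income$salary
betahat <- solve(t(X)%*%X) %*% t(X)%*%y             # equals coef(model.all)
s2 <- sum((y - X%*%betahat)^2)/(nrow(X) - ncol(X))  # estimate of sigma^2
s2*solve(t(X)%*%X)                                  # equals vcov(model.all)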

The Hat Matrix and the Leverage
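
A minimal sketch of the hat matrix H = X(X'X)^(-1)X', whose diagonal elements are the leverages, again for the salary model:

X <- model.matrix(model.all)
H <- X %*% solve(t(X)%*%X) %*% t(X)   # hat matrix: fitted values are H %*% y
lev <- diag(H)                        # leverages; same as hatvalues(model.all)
summary(lev); sum(lev)                # leverages sum to the number of model parameters (2)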

Matrix Formulation of Bayesian Normal Linear Model
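
A minimal sketch of the posterior mean in matrix form: with prior beta ~ N(b0, B0^(-1)) (B0 the prior precision matrix, as in MCMCregress) and sigma^2 held fixed, the posterior mean of beta is (X'X/sigma^2 + B0)^(-1) (X'y/sigma^2 + B0 b0), which tends to the least squares estimate as B0 -> 0. Illustrated with the mental impairment model, plugging in the posterior mean of sigma^2:

X <- model.matrix(impair ~ life + ses, data=Mental); y <- Mental$impair
B0 <- diag(10^(-10), ncol(X)); b0 <- rep(0, ncol(X))   # vague prior, as above
sigma2 <- mean(fit.bayes[,"sigma2"])                   # plug-in value for sigma^2
solve(t(X)%*%X/sigma2 + B0) %*% (t(X)%*%y/sigma2 + B0%*%b0)  # essentially the least squares estimate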

Quick Summarization

Exercises