If we have \(k\) different predictor variables \(x_1,x_2,...,x_k\), what is the primary advantage of fitting a joint linear model (multiple regression \(E(y)=\beta_0+\beta_1 x_1+...+\beta_k x_k\)) over fitting \(k\) simple linear regressions (\(E(y)=\beta_0+\beta_j x_j\)), one for each predictor?
Answer: Each coefficient in the multiple regression model accounts for the presence of the other predictors, whereas the coefficients in the simple linear regressions do not. In particular, \(\beta_1\) in the joint model measures the change in \(E(y)\) per unit change in \(x_1\) while all of the other predictors \(x_j\), \(j\ne1\), are held constant; a simple regression slope cannot make that adjustment.
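As a small illustration of this point (a simulated sketch with hypothetical data, separate from the real example below), note how the simple-regression slope for \(x_1\) absorbs part of the effect of the omitted, correlated predictor \(x_2\), while the joint model recovers both coefficients:
set.seed(1)
n = 200
x1 = rnorm(n)
x2 = 0.8*x1 + rnorm(n, sd=0.6) # x2 is correlated with x1
y = 1.0 + 2.0*x1 - 3.0*x2 + rnorm(n)
coef(lm(y ~ x1)) # slope estimate far from 2 (biased by omitting x2)
coef(lm(y ~ x1 + x2)) # both slopes close to their true values, 2 and -3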
As an example of linear regression, we’ll look at the Leinhardt data
from the car package in R. The Leinhardt data frame has 105
rows and 4 columns. The observations are nations of the world around
1970.
library("car")
## Loading required package: carData
data("Leinhardt")
head(Leinhardt)
## income infant region oil
## Australia 3426 26.7 Asia no
## Austria 3350 23.7 Europe no
## Belgium 3346 17.0 Europe no
## Canada 4751 16.8 Americas no
## Denmark 5029 13.5 Europe no
## Finland 3312 10.1 Europe no
str(Leinhardt)
## 'data.frame': 105 obs. of 4 variables:
## $ income: int 3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
## $ infant: num 26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
## $ region: Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
## $ oil : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
pairs(Leinhardt)
We’ll start with a simple linear regression model that relates infant mortality to per capita income.
plot(infant ~ income, data=Leinhardt)
hist(Leinhardt$infant)
hist(Leinhardt$income)
Leinhardt$loginfant = log(Leinhardt$infant)
Leinhardt$logincome = log(Leinhardt$income)
plot(loginfant ~ logincome, data=Leinhardt)
Since infant mortality and per capita income are positive and right-skewed quantities, we consider modeling them on the logarithmic scale. A linear model appears much more appropriate on this scale.
The reference Bayesian analysis (with a noninformative prior) is
available directly in R.
lmod = lm(loginfant ~ logincome, data=Leinhardt)
summary(lmod)
##
## Call:
## lm(formula = loginfant ~ logincome, data = Leinhardt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.66694 -0.42779 -0.02649 0.30441 3.08415
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.14582 0.31654 22.575 <2e-16 ***
## logincome -0.51179 0.05122 -9.992 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6867 on 99 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.5021, Adjusted R-squared: 0.4971
## F-statistic: 99.84 on 1 and 99 DF, p-value: < 2.2e-16
Now we’ll fit this model in JAGS. A few countries have missing values, and for simplicity, we will omit those.
library("rjags")
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
dat = na.omit(Leinhardt)
mod1_string = " model {
for (i in 1:n) {
y[i] ~ dnorm(mu[i], prec)
mu[i] = b[1] + b[2]*log_income[i]
}
for (i in 1:2) {
b[i] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(5/2.0, 5*10.0/2.0)
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
set.seed(72)
data1_jags = list(y=dat$loginfant, n=nrow(dat),
log_income=dat$logincome)
params1 = c("b", "sig")
inits1 = function() {
inits = list("b"=rnorm(2,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}
mod1 = jags.model(textConnection(mod1_string), data=data1_jags, inits=inits1, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 101
## Unobserved stochastic nodes: 3
## Total graph size: 404
##
## Initializing model
update(mod1, 1000) # burn-in
mod1_sim = coda.samples(model=mod1,
variable.names=params1,
n.iter=5000)
mod1_csim = do.call(rbind, mod1_sim) # combine multiple chains
Before we check the inferences from the model, we should perform convergence diagnostics for our Markov chains.
plot(mod1_sim)
gelman.diag(mod1_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b[1] 1.01 1.01
## b[2] 1.01 1.01
## sig 1.00 1.00
##
## Multivariate psrf
##
## 1
autocorr.diag(mod1_sim)
## b[1] b[2] sig
## Lag 0 1.00000000 1.00000000 1.0000000000
## Lag 1 0.95378238 0.95392639 0.0345558066
## Lag 5 0.78935313 0.78809598 -0.0006694157
## Lag 10 0.62483819 0.62201800 0.0138352803
## Lag 50 0.06282677 0.06550381 0.0061967222
autocorr.plot(mod1_sim)
effectiveSize(mod1_sim)
## b[1] b[2] sig
## 354.8369 353.6982 12945.2556
We can get a posterior summary of the parameters in our model.
summary(mod1_sim)
##
## Iterations = 1001:6000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## b[1] 7.1764 0.45204 0.0036909 0.0242032
## b[2] -0.5167 0.07320 0.0005977 0.0039241
## sig 0.9710 0.06784 0.0005539 0.0005994
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## b[1] 6.2628 6.8801 7.1695 7.4712 8.0734
## b[2] -0.6635 -0.5644 -0.5162 -0.4688 -0.3694
## sig 0.8500 0.9231 0.9670 1.0145 1.1144
Don’t forget that these results are for a regression model relating the logarithm of infant mortality to the logarithm of income.
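Since both variables are on the log scale, the slope has an elasticity interpretation: a 1% increase in income is associated with roughly a 0.5% decrease in infant mortality. To get a prediction back on the original scale, we can exponentiate the fitted value on the log scale (a quick sketch using approximate posterior means from the summary above; the income value of 1000 is an arbitrary illustration, and exponentiating the fitted log value gives a predicted median rather than a mean):
b_post = c(7.18, -0.52) # approximate posterior means of b[1] and b[2], from the summary above
exp(b_post[1] + b_post[2]*log(1000)) # predicted infant mortality at income 1000; roughly 36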
Checking residuals (the differences between the response and the model’s predictions) is important with linear models, since residuals can reveal violations of the assumptions we made when specifying the model. In particular, we are looking for any sign that the relationship is not linear, that the errors are not normally distributed, or that the observations are not independent (conditional on the covariates).
First, let’s look at what would have happened if we fit the reference linear model to the un-transformed variables.
lmod0 = lm(infant ~ income, data=Leinhardt)
plot(resid(lmod0)) # to check independence (looks okay)
plot(predict(lmod0), resid(lmod0)) # to check for linearity, constant variance (looks bad)
qqnorm(resid(lmod0)) # to check Normality assumption (we want this to be a straight line)
Now let’s return to our model fit to the log-transformed variables. In a Bayesian model, we have distributions for residuals, but we’ll simplify and look only at the residuals evaluated at the posterior mean of the parameters.
X = cbind(rep(1.0, data1_jags$n), data1_jags$log_income)
head(X)
## [,1] [,2]
## [1,] 1 8.139149
## [2,] 1 8.116716
## [3,] 1 8.115521
## [4,] 1 8.466110
## [5,] 1 8.522976
## [6,] 1 8.105308
(pm_params1 = colMeans(mod1_csim)) # posterior means
## b[1] b[2] sig
## 7.1763560 -0.5166638 0.9710010
yhat1 = drop(X %*% pm_params1[1:2])
resid1 = data1_jags$y - yhat1
plot(resid1) # residuals against data index
plot(yhat1, resid1) # residuals against predicted values
qqnorm(resid1) # checking normality of residuals
plot(predict(lmod), resid(lmod)) # to compare with reference linear model
rownames(dat)[order(resid1, decreasing=TRUE)[1:5]] # which countries have the largest positive residuals?
## [1] "Saudi.Arabia" "Libya" "Zambia" "Brazil" "Afganistan"
The residuals look pretty good here (no obvious patterns or shapes) except for two strong outliers, Saudi Arabia and Libya. When outliers appear, it is a good idea to double-check that they are not simply errors in data entry. If the values are correct, you may reconsider whether these data points really are representative of the data you are trying to model. If you conclude that they are not (for example, they were recorded in different years), you may be able to justify dropping them from the data set.
If you conclude that the outliers are part of data and should not be removed, we have several modeling options to accommodate them. We will address these in the next segment.
In the previous segment, we saw two outliers in the model relating the logarithm of infant mortality to the logarithm of income. Here we will discuss options for when we conclude that these outliers belong in the data set.
The first approach is to look for additional covariates that may be able to explain the outliers. For example, there could be a number of variables that provide information about infant mortality above and beyond what income provides.
Looking back at our data, there are two variables we haven’t used yet: region and oil. The oil variable indicates oil-exporting countries. Both Saudi Arabia and Libya are oil exporters, so perhaps this explains part of the anomaly.
library("rjags")
mod2_string = " model {
for (i in 1:length(y)) {
y[i] ~ dnorm(mu[i], prec)
mu[i] = b[1] + b[2]*log_income[i] + b[3]*is_oil[i]
}
for (i in 1:3) {
b[i] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(5/2.0, 5*10.0/2.0)
sig = sqrt( 1.0 / prec )
} "
set.seed(73)
data2_jags = list(y=dat$loginfant, log_income=dat$logincome,
is_oil=as.numeric(dat$oil=="yes"))
data2_jags$is_oil
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
params2 = c("b", "sig")
inits2 = function() {
inits = list("b"=rnorm(3,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}
mod2 = jags.model(textConnection(mod2_string), data=data2_jags, inits=inits2, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 101
## Unobserved stochastic nodes: 4
## Total graph size: 507
##
## Initializing model
update(mod2, 1e3) # burn-in
mod2_sim = coda.samples(model=mod2,
variable.names=params2,
n.iter=5e3)
mod2_csim = as.mcmc(do.call(rbind, mod2_sim)) # combine multiple chains
As usual, check the convergence diagnostics.
plot(mod2_sim)
gelman.diag(mod2_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b[1] 1.01 1.03
## b[2] 1.01 1.03
## b[3] 1.00 1.00
## sig 1.00 1.00
##
## Multivariate psrf
##
## 1.01
autocorr.diag(mod2_sim)
## b[1] b[2] b[3] sig
## Lag 0 1.00000000 1.00000000 1.000000000 1.000000000
## Lag 1 0.95239764 0.95328566 0.074087346 0.022168459
## Lag 5 0.78512036 0.78650774 0.014945880 0.021408494
## Lag 10 0.61762067 0.61954906 0.001532449 0.009553919
## Lag 50 0.07855918 0.07817641 -0.004843572 0.007344331
autocorr.plot(mod2_sim)
effectiveSize(mod2_sim)
## b[1] b[2] b[3] sig
## 360.4470 358.6838 12392.7380 13225.3798
We can get a posterior summary of the parameters in our model.
summary(mod2_sim)
##
## Iterations = 1001:6000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 5000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## b[1] 7.1633 0.44332 0.0036197 0.0232489
## b[2] -0.5250 0.07174 0.0005858 0.0037700
## b[3] 0.7939 0.35418 0.0028919 0.0031834
## sig 0.9524 0.06691 0.0005463 0.0005841
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## b[1] 6.30387 6.8563 7.1652 7.4604 8.0323
## b[2] -0.66618 -0.5730 -0.5256 -0.4757 -0.3865
## b[3] 0.09896 0.5563 0.7939 1.0269 1.4943
## sig 0.83228 0.9057 0.9484 0.9958 1.0938
It looks like there is a positive relationship between oil production and infant mortality on the log scale. Because these data are merely observational, we cannot say that oil production causes an increase in infant mortality (indeed, that most certainly isn’t the case), but we can say that the two are positively correlated.
Now let’s check the residuals.
X2 = cbind(rep(1.0, length(data2_jags$y)), data2_jags$log_income, data2_jags$is_oil)
head(X2)
## [,1] [,2] [,3]
## [1,] 1 8.139149 0
## [2,] 1 8.116716 0
## [3,] 1 8.115521 0
## [4,] 1 8.466110 0
## [5,] 1 8.522976 0
## [6,] 1 8.105308 0
(pm_params2 = colMeans(mod2_csim)) # posterior mean
## b[1] b[2] b[3] sig
## 7.1633185 -0.5250452 0.7938880 0.9524315
yhat2 = drop(X2 %*% pm_params2[1:3])
resid2 = data2_jags$y - yhat2
plot(resid2) # residuals against data index
plot(yhat2, resid2) # residuals against predicted values
plot(yhat1, resid1) # revisit residuals from the first model
sd(resid2) # standard deviation of residuals
## [1] 0.6488911
These look much better, although the residuals for Saudi Arabia and Libya are still more than three standard deviations from the mean of the residuals. We might consider adding the remaining covariate, region, but instead let’s look at another option for when we are faced with strong outliers.
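To verify that claim numerically, here is a quick sketch using the objects we just computed:
sort(resid2 / sd(resid2), decreasing=TRUE)[1:2] # the two largest standardized residuals exceed 3
rownames(dat)[order(resid2, decreasing=TRUE)[1:2]] # confirm these are Saudi Arabia and Libya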
Let’s consider changing the likelihood. The normal likelihood has thin tails (almost all of the probability is concentrated within the first few standard deviations of the mean), which does not accommodate outliers well. Consequently, models with a normal likelihood can be overly influenced by outliers. Recall that the \(t\) distribution is similar to the normal distribution but has thicker tails, which can accommodate outliers.
The \(t\) linear model might look something like this. Notice that the \(t\) distribution has three parameters, including a positive “degrees of freedom” parameter. The smaller the degrees of freedom, the heavier the tails of the distribution. We might fix the degrees of freedom to some number, or we can assign it a prior distribution.
mod3_string = " model {
for (i in 1:length(y)) {
y[i] ~ dt( mu[i], tau, df )
mu[i] = b[1] + b[2]*log_income[i] + b[3]*is_oil[i]
}
for (i in 1:3) {
b[i] ~ dnorm(0.0, 1.0/1.0e6)
}
df = nu + 2.0 # we want degrees of freedom > 2 to guarantee existence of mean and variance
nu ~ dexp(1.0)
tau ~ dgamma(5/2.0, 5*10.0/2.0) # tau is close to, but not equal to the precision
sig = sqrt( 1.0 / tau * df / (df - 2.0) ) # standard deviation of errors
} "
We will leave it up to you to fit this model.
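If you want to try it, here is a minimal sketch of how the fit might go, following the same pattern as the previous models (the initial values for nu and tau and the monitored parameters are our choices, not prescribed above):
set.seed(74)
data3_jags = data2_jags # same data as the second model

params3 = c("b", "df", "sig")

inits3 = function() {
    inits = list("b"=rnorm(3,0.0,100.0), "nu"=rexp(1,1.0), "tau"=rgamma(1,1.0,1.0))
}

mod3 = jags.model(textConnection(mod3_string), data=data3_jags, inits=inits3, n.chains=3)
update(mod3, 1e3) # burn-in
mod3_sim = coda.samples(model=mod3, variable.names=params3, n.iter=5e3)

# convergence checks and model comparison, as before
gelman.diag(mod3_sim)
dic.samples(mod3, n.iter=1e3)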
We have now proposed three different models. How do we compare their performance on our data? In the previous course, we discussed estimating parameters using the method of maximum likelihood; we can choose between competing models using a similar, likelihood-based idea.
We will use a quantity known as the deviance information criterion (DIC). It essentially calculates the posterior mean of the deviance (\(-2\) times the log-likelihood) and adds a penalty for model complexity.
Let’s calculate the DIC for our first two models:
dic.samples(mod1, n.iter=1e3)
## Mean deviance: 231.5
## penalty 3.05
## Penalized deviance: 234.5
dic.samples(mod2, n.iter=1e3)
## Mean deviance: 225.1
## penalty 3.957
## Penalized deviance: 229.1
The first number is the Monte Carlo estimated posterior mean deviance, which equals \(-2\) times the log-likelihood (plus a constant that is irrelevant for comparing models). Because of the \(-2\) factor, a smaller deviance means a higher likelihood.
Next, we are given a penalty for the complexity of our model. This penalty is necessary because we could always increase the likelihood by making the model complex enough to fit the data exactly, and such over-fit models generalize poorly. The penalty is roughly equal to the effective number of parameters in the model, which you can see in the output above: the first model has a variance parameter and two coefficients, for a total of three parameters (penalty about 3.05), while the second model adds one more coefficient for the oil effect (penalty about 3.96).
We add these two quantities to get the DIC (the last number). The
better-fitting model has a lower DIC value. In this case, the gains we
receive in deviance by adding the is_oil covariate outweigh
the penalty for adding an extra parameter. The final DIC for the second
model is lower than for the first, so we would prefer using the second
model.
We encourage you to explore different model specifications and compare their fit to the data using DIC. Wikipedia provides a good introduction to DIC, and you can find more details about the JAGS implementation in the rjags package documentation by entering ?dic.samples in the R console.
library("car") # load the 'car' package
data("Anscombe") # load the data set
?Anscombe # read a description of the data
head(Anscombe) # look at the first few lines of the data
## education income young urban
## ME 189 2824 350.7 508
## NH 169 3259 345.9 564
## VT 230 3072 348.5 322
## MA 168 3835 335.3 846
## RI 180 3549 327.1 871
## CT 193 4256 341.0 774
pairs(Anscombe) # scatter plots for each pair of variables
education_model = lm(education ~ income + young + urban, data=Anscombe)
summary(education_model)
##
## Call:
## lm(formula = education ~ income + young + urban, data = Anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.240 -15.738 -1.156 15.883 51.380
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.868e+02 6.492e+01 -4.418 5.82e-05 ***
## income 8.065e-02 9.299e-03 8.674 2.56e-11 ***
## young 8.173e-01 1.598e-01 5.115 5.69e-06 ***
## urban -1.058e-01 3.428e-02 -3.086 0.00339 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.69 on 47 degrees of freedom
## Multiple R-squared: 0.6896, Adjusted R-squared: 0.6698
## F-statistic: 34.81 on 3 and 47 DF, p-value: 5.337e-12
library("rjags")
mod_string = " model {
for (i in 1:length(education)) {
education[i] ~ dnorm(mu[i], prec)
mu[i] = b0 + b[1]*income[i] + b[2]*young[i] + b[3]*urban[i]
}
b0 ~ dnorm(0.0, 1.0/1.0e6)
for (i in 1:3) {
b[i] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(1.0/2.0, 1.0*1500.0/2.0)
## Initial guess of variance based on overall
## variance of education variable. Uses low prior
## effective sample size. Technically, this is not
## a true 'prior', but it is not very informative.
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
data_jags = as.list(Anscombe)
params1 = c("b0","b", "sig")
inits1 = function() {
inits = list("b"=rnorm(3,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}
mod1 = jags.model(textConnection(mod_string), data=data_jags, inits=inits1, n.chains=3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 51
## Unobserved stochastic nodes: 5
## Total graph size: 422
##
## Initializing model
update(mod1, 1000) # burn-in
mod1_sim = coda.samples(model=mod1,
variable.names=params1,
n.iter=5000)
mod1_csim = do.call(rbind, mod1_sim) # combine multiple chains
plot(mod1_sim)
gelman.diag(mod1_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b[1] 1.03 1.12
## b[2] 1.06 1.15
## b[3] 1.05 1.17
## b0 1.05 1.13
## sig 1.00 1.01
##
## Multivariate psrf
##
## 1.07
autocorr.diag(mod1_sim)
## b[1] b[2] b[3] b0 sig
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.00000000
## Lag 1 0.9831520 0.9936467 0.9720950 0.9948872 0.08072946
## Lag 5 0.9206416 0.9699671 0.8742759 0.9749790 0.05177640
## Lag 10 0.8506233 0.9433945 0.7765704 0.9517076 0.02808404
## Lag 50 0.5452206 0.7756046 0.3883893 0.7960094 0.01645752
autocorr.plot(mod1_sim)
effectiveSize(mod1_sim)
## b[1] b[2] b[3] b0 sig
## 118.78532 45.70286 198.69623 38.43828 5231.71333
plot(education_model) # residual diagnostics for the reference lm fit
dic.samples(mod1, n.iter=1e5)
## Mean deviance: 480.9
## penalty 5.226
## Penalized deviance: 486.2
library("rjags")
mod_string_question_4a = " model {
for (i in 1:length(education)) {
education[i] ~ dnorm(mu[i], prec)
mu[i] = b0 + b[1]*income[i] + b[2]*young[i]
}
b0 ~ dnorm(0.0, 1.0/1.0e6)
for (i in 1:2) {
b[i] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(1.0/2.0, 1.0*1500.0/2.0)
## Initial guess of variance based on overall
## variance of education variable. Uses low prior
## effective sample size. Technically, this is not
## a true 'prior', but it is not very informative.
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
data_jags = as.list(Anscombe)
params1 = c("b0","b", "sig")
inits1 = function() {
inits = list("b"=rnorm(2,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}
mod_question_4a = jags.model(textConnection(mod_string_question_4a), data=data_jags, inits=inits1, n.chains=3)
## Warning in jags.model(textConnection(mod_string_question_4a), data = data_jags,
## : Unused variable "urban" in data
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 51
## Unobserved stochastic nodes: 4
## Total graph size: 320
##
## Initializing model
update(mod_question_4a, 1000) # burn-in
mod_question_4a_sim = coda.samples(model=mod_question_4a,
variable.names=params1,
n.iter=5000)
mod_question_4a_csim = do.call(rbind, mod_question_4a_sim) # combine multiple chains
plot(mod_question_4a_sim)
gelman.diag(mod_question_4a_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b[1] 1.01 1.04
## b[2] 1.11 1.32
## b0 1.11 1.33
## sig 1.00 1.01
##
## Multivariate psrf
##
## 1.08
autocorr.diag(mod_question_4a_sim)
## b[1] b[2] b0 sig
## Lag 0 1.0000000 1.0000000 1.0000000 1.00000000
## Lag 1 0.9681978 0.9936505 0.9950037 0.05120861
## Lag 5 0.8491241 0.9691617 0.9747177 0.02312660
## Lag 10 0.7239999 0.9401533 0.9497977 0.02793077
## Lag 50 0.2575059 0.7439478 0.7533292 0.01041833
autocorr.plot(mod_question_4a_sim)
effectiveSize(mod_question_4a_sim)
## b[1] b[2] b0 sig
## 242.88985 47.76755 38.91240 7510.24763
dic.samples(mod_question_4a, n.iter=1e5)
## Mean deviance: 489.2
## penalty 4.159
## Penalized deviance: 493.4
library("rjags")
mod_string_question_4b = " model {
for (i in 1:length(education)) {
education[i] ~ dnorm(mu[i], prec)
mu[i] = b0 + b[1]*income[i] + b[2]*young[i]+ b[3]*young[i]*income[i]
}
b0 ~ dnorm(0.0, 1.0/1.0e6)
for (i in 1:3) {
b[i] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(1.0/2.0, 1.0*1500.0/2.0)
## Initial guess of variance based on overall
## variance of education variable. Uses low prior
## effective sample size. Technically, this is not
## a true 'prior', but it is not very informative.
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
data_jags = as.list(Anscombe)
params1 = c("b0","b", "sig")
inits1 = function() {
inits = list("b"=rnorm(3,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}
mod_question_4b = jags.model(textConnection(mod_string_question_4b), data=data_jags, inits=inits1, n.chains=3)
## Warning in jags.model(textConnection(mod_string_question_4b), data = data_jags,
## : Unused variable "urban" in data
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 51
## Unobserved stochastic nodes: 5
## Total graph size: 372
##
## Initializing model
update(mod_question_4b, 1000) # burn-in
mod_question_4b_sim = coda.samples(model=mod_question_4b,
variable.names=params1,
n.iter=5000)
mod_question_4b_csim = do.call(rbind, mod_question_4b_sim) # combine multiple chains
plot(mod_question_4b_sim)
gelman.diag(mod_question_4b_sim)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## b[1] 15.91 41.9
## b[2] 16.31 44.0
## b[3] 15.49 40.7
## b0 16.63 45.8
## sig 1.65 2.6
##
## Multivariate psrf
##
## 12.7
autocorr.diag(mod_question_4b_sim)
## b[1] b[2] b[3] b0 sig
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 1 0.9953131 0.9965044 0.9956522 0.9962596 0.4630612
## Lag 5 0.9772698 0.9837085 0.9791110 0.9825277 0.4400860
## Lag 10 0.9571882 0.9696384 0.9588185 0.9675369 0.4223150
## Lag 50 0.8206555 0.8864046 0.8130327 0.8717996 0.2965933
autocorr.plot(mod_question_4b_sim)
effectiveSize(mod_question_4b_sim)
## b[1] b[2] b[3] b0 sig
## 35.23262 23.44485 32.34437 25.48249 392.31325
dic.samples(mod_question_4b, n.iter=1e5)
## Mean deviance: 487.1
## penalty 5.186
## Penalized deviance: 492.3
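One caution before trusting the DIC comparison for this last model: the Gelman diagnostics above (potential scale reduction factors around 16) and the very small effective sample sizes indicate the chains for the interaction model have not converged, so its DIC estimate is unreliable. A standard remedy (our suggestion, not part of the original analysis) is to center the predictors before fitting, which reduces the posterior correlation between the intercept and the slopes:
# sketch: center the covariates to improve mixing for the interaction model
data_jags_centered = as.list(Anscombe)
data_jags_centered$income = Anscombe$income - mean(Anscombe$income)
data_jags_centered$young = Anscombe$young - mean(Anscombe$young)

mod_question_4b_c = jags.model(textConnection(mod_string_question_4b),
                               data=data_jags_centered, inits=inits1, n.chains=3)
update(mod_question_4b_c, 1000) # burn-in
# re-run coda.samples, gelman.diag, and dic.samples as before to verify improved convergence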