Simulate linear regression

Andrew Ellis
12/09/2013

Generate predictor variable

Let's create a predictor variable \( x \), taking values from 0 to 5:

x <- seq(from = 0, 
         to = 5, 
         by = 0.2)
N <- length(x)

Add some noise

In our model, we assume that the response variable \( y \) depends linearly on \( x \), but with some added noise. Let's call the noise \( eps \).

\( eps \) will be drawn from a normal distribution with mean 0 and standard deviation 1:

\[ eps \sim N(0, 1) \]

eps <- rnorm(n = N, 
             mean = 0, 
             sd = 1)

Define intercept and slope

Now we need to define an intercept \( b.0 \) and a slope \( b.1 \). These are values that we will want to recover using Bayesian estimation:

b.0 <- 2
b.1 <- 0.75

Create repsonse varaible

Now we can combine all the above to create our response variable \( y \):

y <- b.0 + b.1 * x + eps

Plot y as a function of x

plot(x, y)

plot of chunk unnamed-chunk-5

Standard regression using lm()

First, let's perform a standard linear regression using lm():

fit <- lm(y ~ x)

We can obtain the parameter estimates using the coef() function:

coefs.lm <- coef(fit)
coefs.lm

(Intercept)           x 
     1.6004      0.8659

Standard regression using lm()

We can also obtain the confidence intervals:

confint(fit)

             2.5 % 97.5 %
(Intercept) 1.0393  2.162
x           0.6734  1.058

Estimate parameters using Bayesian estimation:

Now let's try to recover the parameters using Bayesian estimation:

First we need to load the R2jags package:

libs <- c("R2jags", "ggmcmc")
sapply(libs, require, character.only = TRUE)

R2jags ggmcmc 
  TRUE   TRUE

Jags model

Next, we define a model:

model.jags <- "
model {
    for (i in 1:N) {
        # likelihood
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- b.0 + b.1 * x[i]
        }

    # priors
    b.0 ~ dnorm(0.0, 1.0E-6)
    b.1 ~ dnorm(0.0, 1.0E-6)
    # sigma <- 1/sqrt(tau)
    tau ~ dgamma(1.0E-3, 1.0E-3)
}
"

Data

Now we need to pass the data to Jags in a list:

data.jags <- list(x = x, y = y, N = N)

Initial values

inits.jags <- function() {
    list(b.0 = rnorm(n=1, mean=0, sd=1),
         b.1 = rnorm(n=1, mean=0, sd=1))
}

Parameters to be monitored

parameters <- c("b.0", "b.1")

Call Jags

samples <- jags(data = data.jags,
                param = parameters,
                inits = inits.jags,
                model.file = textConnection(model.jags),
                n.chains = 1, 
                n.iter = 1000, 
                n.burnin = 10, 
                n.thin = 1,
                DIC = T)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 111

Initializing model

Diagnostics

samples.mcmc <- as.mcmc(samples)
samples.ggs <- ggs(samples.mcmc)

Retrieve parameters estimates


coefs.jags <- data.frame(b.0 = samples$BUGSoutput$sims.list$b.0, b.1 = samples$BUGSoutput$sims.list$b.1)

sapply(coefs.jags, mean)

  b.0   b.1 
1.607 0.864

Plot histogram

ggs_histogram(samples.ggs, family = "^b")

plot of chunk unnamed-chunk-17

Plot density

ggs_density(samples.ggs, family = "^b")

plot of chunk unnamed-chunk-18