Loading packages

library(mcsm)
library(dplyr)
library(ggplot2)
library(rjags)
library(coda)
library(caret)

Obtaining the Challenger data

The Challenger O-ring data set comes with the mcsm package. We inspect the first few rows.

data("challenger")
head(challenger)
  oring temp
1     1   53
2     1   57
3     1   58
4     1   63
5     0   66
6     0   67

And we make a plot.

ggplot(challenger, aes(x= temp, y= oring)) + geom_point()

Classical Logistic Regression

O-ring failures appear to occur at lower temperatures and not at higher ones, so the data are well suited to logistic regression: we can fit a sigmoid curve relating failure probability to temperature.

Before fitting the model we center and scale the explanatory variable temperature.

dat <- challenger
dat$temp <- scale(dat$temp, center = T, scale = T)
mod <- glm(oring~temp, data= dat, family = binomial)
summary(mod)$coeff
             Estimate Std. Error   z value   Pr(>|z|)
(Intercept) -1.107550  0.5796179 -1.910828 0.05602668
temp        -1.638391  0.7638331 -2.144959 0.03195610

As expected, we obtain a negative estimate for the slope coefficient, and it is significant at the 5% level.
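
To see the fitted sigmoid directly, we can overlay the logistic curve on the scatter plot. This is a quick sketch: geom_smooth() refits the same binomial GLM internally, here on the original temperature scale, so the curve corresponds to the glm() fit above.

ggplot(challenger, aes(x = temp, y = oring)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = binomial),
              se = FALSE)   # fitted logistic (sigmoid) curve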

We can see the performance of this model on the data set by predicting the outcomes for the observed temperatures. The accuracy is about 87 percent.

probs <- predict(mod, dat, type = "response")
pred <- ifelse(probs > 0.5, 1, 0)
(mean(pred == dat$oring) * 100) %>%
  round(1) %>%
  paste("%") %>%
  cat(sep = "\n")
87 %
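
Since caret is already loaded, we can also look at a confusion matrix rather than accuracy alone. A minimal sketch using the pred and dat objects from the chunk above:

confusionMatrix(factor(pred, levels = c(0, 1)),
                factor(dat$oring, levels = c(0, 1)),
                positive = "1")   # sensitivity/specificity for failures (oring = 1)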

Bayesian paradigm

In the Bayesian paradigm we obtain estimates of the intercept and slope of the logistic regression model by sampling from their posterior distribution with MCMC. For this we use JAGS. The model we propose places a Laplace (double exponential) prior on the slope coefficient and a vague prior (a normal distribution with large variance) on the intercept.
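
To make the prior choice concrete, the following sketch plots the two densities: in JAGS, ddexp(0, sqrt(2)) is a Laplace prior with rate sqrt(2) (variance 1), and dnorm(0, 1/25) is parameterised by precision, i.e. a normal with variance 25. The dlaplace() helper is defined here just for this plot.

dlaplace <- function(x, rate) 0.5 * rate * exp(-rate * abs(x))   # Laplace density
curve(dlaplace(x, sqrt(2)), from = -6, to = 6, ylab = "density",
      main = "Priors: Laplace for slope (solid), Normal for intercept (dashed)")
curve(dnorm(x, mean = 0, sd = 5), lty = 2, add = TRUE)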

dat <- challenger
X <- scale(dat[,-1], center=TRUE, scale=TRUE)

mod1_string <- "model {
  for (i in 1:length(y)){
    y[i] ~ dbern(p[i])                  # Bernoulli likelihood for O-ring failure
    logit(p[i]) = b0 + b1[1]*temp[i]    # logistic link
  }
  b0 ~ dnorm(0.0, 1.0/25.0)             # vague normal prior: precision 1/25 (variance 25)
  b1[1] ~ ddexp(0.0, sqrt(2.0))         # Laplace prior with rate sqrt(2) (variance 1)
}"

To simulate from the posterior we run three MCMC chains in JAGS and draw 5,000 samples per chain.

set.seed(123)

data_jags = list(y=dat$oring, temp=X[,1])
params = c("b0", "b1")

mod1 = jags.model(textConnection(mod1_string), data=data_jags, n.chains=3)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 23
   Unobserved stochastic nodes: 2
   Total graph size: 102

Initializing model
update(mod1, 1e3)

mod1_sim <- coda.samples(model=mod1, variable.names=params, n.iter=5e3)

mod1_csim = as.mcmc(do.call(rbind, mod1_sim))
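
Before interpreting the posterior it is worth checking that the three chains have converged. A minimal check with coda:

gelman.diag(mod1_sim)      # potential scale reduction factors should be close to 1
autocorr.diag(mod1_sim)    # autocorrelation within each chain
effectiveSize(mod1_sim)    # effective number of independent samples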

Now we plot the traceplots and the posterior densities of the slope (b1) and the intercept (b0). As in the classical logistic regression, most of the posterior probability mass lies below zero for both parameters.

In total 2,000 iterations were discarded before sampling: 1,000 adaptation iterations from jags.model() (the default n.adapt) and 1,000 burn-in iterations from the update() call.

plot(mod1_sim, ask = TRUE)
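
Numerical summaries complement the plots; for example, posterior means, quantiles and 95% highest posterior density intervals from the chains:

summary(mod1_sim)
HPDinterval(mod1_csim, prob = 0.95)   # 95% HPD intervals for b0 and b1[1]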

Here the slope coefficient has a posterior mean of about -1.24, somewhat less extreme than the -1.64 obtained in the classical logistic regression.

mean(unlist(as.vector(mod1_sim[, 2]))) %>%   # column 2 of the chains is b1[1]
  round(2) %>%
  paste("Posterior mean of b1 =", .) %>%
  cat(sep = "\n")
Posterior mean of b1 = -1.24
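
For a rough comparison with the 87% accuracy of the classical fit, we can plug the posterior means into the inverse logit and classify at 0.5. This is only a sketch of an in-sample check; it assumes JAGS reports the slope under the monitored name b1[1].

pm_coef <- colMeans(mod1_csim)        # posterior means of b0 and b1[1]
phat <- 1 / (1 + exp(-(pm_coef["b0"] + pm_coef["b1[1]"] * X[, 1])))
pred_bayes <- ifelse(phat > 0.5, 1, 0)
mean(pred_bayes == dat$oring)         # in-sample accuracy of the Bayesian fit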

End