Binomial Regression

Binomial Regression Model

For when response variable is binomially distributed (probabilities of k successes on n trials.)

for binomial response we need 2 sets of info about the response

y: successes n: number of trials

OR n-y: number of failures

Example: Challenger Disaster

This is a dataset examining the failure of the O-rings (important component of a space shuttle) under certain conditions

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(faraway)

Linking function in Binomial Regression

The link function is how the predictors are related to the response variable.

In binomial regression, the three common link functions are Logit, Probit and Complimentary Log-Log

They can be used in the GLM model to define how the response fits the predictor.

Logit link model

logitmod<-glm(cbind(damage,6-damage)~temp, family=binomial(), data=orings)
summary(logitmod)

## 
## Call:
## glm(formula = cbind(damage, 6 - damage) ~ temp, family = binomial(), 
##     data = orings)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9529  -0.7345  -0.4393  -0.2079   1.9565  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 11.66299    3.29626   3.538 0.000403 ***
## temp        -0.21623    0.05318  -4.066 4.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 38.898  on 22  degrees of freedom
## Residual deviance: 16.912  on 21  degrees of freedom
## AIC: 33.675
## 
## Number of Fisher Scoring iterations: 6

Probit link model

probitmod<-glm(cbind(damage,6-damage)~temp, family=binomial(link=probit),orings)
summary(probitmod)

## 
## Call:
## glm(formula = cbind(damage, 6 - damage) ~ temp, family = binomial(link = probit), 
##     data = orings)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0134  -0.7761  -0.4467  -0.1581   1.9983  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  5.59145    1.71055   3.269  0.00108 ** 
## temp        -0.10580    0.02656  -3.984 6.79e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 38.898  on 22  degrees of freedom
## Residual deviance: 18.131  on 21  degrees of freedom
## AIC: 34.893
## 
## Number of Fisher Scoring iterations: 6

Note how the difference in the link functions changes the dynamics of the fit to the data

plot (damage/6 ~ temp, orings, xlim=c(25,85),
ylim=c(0,1),
 xlab="Temperature", ylab="Prob of damage")
x <- seq(25,85,1)
lines(x,ilogit(11.6630−0.2162*x))
lines(x, pnorm(5.5915-0.1058*x), lty=2)

Inference

Much like Poisson or Logistic regression. Models are compared to a saturated model (i.e. a perfect fit to the data) as opposed to a null model. This changes the way we interpret our p-values

pchisq(deviance(logitmod), df.residual(logitmod),lower=FALSE)

## [1] 0.7164099

recall:

H_0: our model (less complex)

H_A: saturated model (more complex)

since the P-value is HIGH, we say that it fits the data well

Confidence Intervals

Another method of interpreting our model. If the confidence interval includes zero, then you could reject the model and build something else

library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

confint(logitmod)

## Waiting for profiling to be done...

##                 2.5 %    97.5 %
## (Intercept)  5.575195 18.737598
## temp        -0.332657 -0.120179

Binomial Regression

2022-03-30