For when response variable is binomially distributed (probabilities of k successes on n trials.)
for binomial response we need 2 sets of info about the response
y: successes n: number of trials
OR n-y: number of failures
This is a dataset examining the failure of the O-rings (important component of a space shuttle) under certain conditions
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(faraway)
The link function is how the predictors are related to the response variable.
In binomial regression, the three common link functions are Logit, Probit and Complimentary Log-Log
They can be used in the GLM model to define how the response fits the predictor.
logitmod<-glm(cbind(damage,6-damage)~temp, family=binomial(), data=orings)
summary(logitmod)
##
## Call:
## glm(formula = cbind(damage, 6 - damage) ~ temp, family = binomial(),
## data = orings)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9529 -0.7345 -0.4393 -0.2079 1.9565
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 11.66299 3.29626 3.538 0.000403 ***
## temp -0.21623 0.05318 -4.066 4.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38.898 on 22 degrees of freedom
## Residual deviance: 16.912 on 21 degrees of freedom
## AIC: 33.675
##
## Number of Fisher Scoring iterations: 6
probitmod<-glm(cbind(damage,6-damage)~temp, family=binomial(link=probit),orings)
summary(probitmod)
##
## Call:
## glm(formula = cbind(damage, 6 - damage) ~ temp, family = binomial(link = probit),
## data = orings)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0134 -0.7761 -0.4467 -0.1581 1.9983
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.59145 1.71055 3.269 0.00108 **
## temp -0.10580 0.02656 -3.984 6.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38.898 on 22 degrees of freedom
## Residual deviance: 18.131 on 21 degrees of freedom
## AIC: 34.893
##
## Number of Fisher Scoring iterations: 6
Note how the difference in the link functions changes the dynamics of the fit to the data
plot (damage/6 ~ temp, orings, xlim=c(25,85),
ylim=c(0,1),
xlab="Temperature", ylab="Prob of damage")
x <- seq(25,85,1)
lines(x,ilogit(11.6630−0.2162*x))
lines(x, pnorm(5.5915-0.1058*x), lty=2)
Much like Poisson or Logistic regression. Models are compared to a saturated model (i.e. a perfect fit to the data) as opposed to a null model. This changes the way we interpret our p-values
pchisq(deviance(logitmod), df.residual(logitmod),lower=FALSE)
## [1] 0.7164099
recall:
H_0: our model (less complex)
H_A: saturated model (more complex)
since the P-value is HIGH, we say that it fits the data well
Another method of interpreting our model. If the confidence interval includes zero, then you could reject the model and build something else
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
confint(logitmod)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 5.575195 18.737598
## temp -0.332657 -0.120179