Logistic Regression Analysis
x1 <- c(3.7,3.5,1.25,0.75,.8,.7,.6,1.1,.9,.9,.8,.55,.6,1.4,0.75,2.3, 3.2,0.85,1.7,1.8,0.4,0.95,1.35,1.5,1.6,.6,1.8,0.95,1.9,1.6, 2.7,2.35,1.1,1.1,1.2,.8,.95,.75,1.3)
x2 <- c(.83,1.09,2.5,1.5,3.2,3.5,.75,1.7,.75,.45,.57,2.75,3,2.33,3.75, 1.64,1.6,1.42,1.06,1.8,2,1.36,1.35,1.36,1.78,1.5,1.5, 1.9,.95,.4,.75,.03,1.83,2.2,2,3.33,1.9,1.9,1.63)
y <- c(1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,0,0,0,0,1,0,1,0,1,0,1, 0,0,1,1,1,0,0,1)
(a) Fit a logistic regression model
data <- data.frame(y, x1, x2)
fit <- glm(y~., data = data, family = binomial)
summary(fit)
##
## Call:
## glm(formula = y ~ ., family = binomial, data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.5536 3.2419 -2.947 0.00321 **
## x1 3.8907 1.4319 2.717 0.00658 **
## x2 2.6561 0.9166 2.898 0.00376 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 54.040 on 38 degrees of freedom
## Residual deviance: 29.733 on 36 degrees of freedom
## AIC: 35.733
##
## Number of Fisher Scoring iterations: 6
All three coefficients (the intercept, x1, and x2) are significant at the 0.01 level according to their Wald tests, and the dispersion parameter is fixed at 1, as is standard for the binomial family.
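As a sanity check, each reported z value is simply the estimate divided by its standard error, and the p-value is the corresponding two-sided normal tail probability. A minimal sketch for x1:
coefs <- coef(summary(fit))                      # estimate, std. error, z value, p-value per term
z_x1  <- coefs["x1", "Estimate"] / coefs["x1", "Std. Error"]
2 * pnorm(abs(z_x1), lower.tail = FALSE)         # two-sided p-value, approx. 0.00658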
(b) Perform diagnostics
anova(fit, test = "Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: y
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 38 54.040
## x1 1 7.0505 37 46.989 0.007925 **
## x2 1 17.2560 36 29.733 3.267e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-values correspond to sequential chi-squared tests of nested models: starting from the null model with only the intercept, x1 is added first and then x2, ending with the full model intercept + x1 + x2. Each test compares the drop in residual deviance to its limiting \(\chi^2\) distribution, with degrees of freedom equal to the number of parameters added (here 1 at each step). Both variables are significant according to these tests.
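For instance, the x1 row of the table can be reproduced by hand: refit the intercept-only model and the model with x1, take the drop in residual deviance, and compare it to a \(\chi^2_1\) distribution. A minimal sketch:
fit_null <- glm(y ~ 1, data = data, family = binomial)    # intercept only
fit_x1   <- glm(y ~ x1, data = data, family = binomial)   # intercept + x1
dev_drop <- deviance(fit_null) - deviance(fit_x1)         # 54.040 - 46.989 = 7.05
pchisq(dev_drop, df = 1, lower.tail = FALSE)              # approx. 0.0079, as in the table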
Next, we consider Cook’s distance to assess the effect of deleting each observation from the fitted model. It measures the influence of each observation on the fit; values larger than 1 are usually considered problematic.
plot(cooks.distance(fit))
From the plot of Cook’s distance, all values are well below 1, so there are no influential points.
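As a numerical complement to the plot, we can check the largest Cook’s distance against the cutoff of 1 (a stricter 4/n rule of thumb is also common). A short sketch:
cd <- cooks.distance(fit)
max(cd)                      # largest Cook's distance; well below 1, consistent with the plot
which(cd > 4 / length(y))    # observations flagged by the 4/n rule of thumb, if any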
(c) Interpret results
Holding the other predictor fixed, the odds of vasoconstriction are multiplied by exp(3.89), about 49, per unit increase in the volume of inspired air (x1), and by exp(2.66), about 14, per unit increase in the rate of inspired air (x2).
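These odds ratios, together with Wald confidence intervals on the same scale, can be read off directly from the fitted model. A minimal sketch:
exp(coef(fit))[c("x1", "x2")]                 # odds ratios: approx. 48.9 and 14.2
exp(confint.default(fit)[c("x1", "x2"), ])    # Wald 95% CIs on the odds-ratio scale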
(d) Fit the models with other possible link functions and compare the fitted models
fit2 <- glm(y ~ x1 + x2, data = data, family = binomial(link = "probit"))
fit3 <- glm(y ~ x1 + x2, data = data, family = binomial(link = "cloglog"))
c(AIC(fit), AIC(fit2), AIC(fit3))
## [1] 35.73341 35.97487 33.39087
Comparing AIC values, the cloglog link gives the lowest AIC (33.39) and hence the best fit among the three links considered.
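The same comparison can be written more compactly by looping over the candidate links; a minimal sketch equivalent to the calls above:
links <- c("logit", "probit", "cloglog")
sapply(links, function(l) AIC(glm(y ~ x1 + x2, data = data, family = binomial(link = l))))
##    logit   probit  cloglog
## 35.73341 35.97487 33.39087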