Logistic Regression Analysis

To study the effect of volume (x1) and rate (x2) of air inspired by human subjects on the occurrence or not (Y = 1 or 0) of transient vasoconstriction response in the skin of the fingers, 39 observations on these variables were obtained in a laboratory. The data is given below.

x1 <- c(3.7,3.5,1.25,0.75,.8,.7,.6,1.1,.9,.9,.8,.55,.6,1.4,0.75,2.3, 3.2,0.85,1.7,1.8,0.4,0.95,1.35,1.5,1.6,.6,1.8,0.95,1.9,1.6, 2.7,2.35,1.1,1.1,1.2,.8,.95,.75,1.3)

x2 <- c(.83,1.09,2.5,1.5,3.2,3.5,.75,1.7,.75,.45,.57,2.75,3,2.33,3.75, 1.64,1.6,1.42,1.06,1.8,2,1.36,1.35,1.36,1.78,1.5,1.5, 1.9,.95,.4,.75,.03,1.83,2.2,2,3.33,1.9,1.9,1.63)

y <- c(1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,0,0,0,0,1,0,1,0,1,0,1, 0,0,1,1,1,0,0,1)

(a) Fit a logistic regression model

data <- data.frame(y, x1, x2)
fit <- glm(y~., data = data, family = binomial)
summary(fit)
## 
## Call:
## glm(formula = y ~ ., family = binomial, data = data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -9.5536     3.2419  -2.947  0.00321 **
## x1            3.8907     1.4319   2.717  0.00658 **
## x2            2.6561     0.9166   2.898  0.00376 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 54.040  on 38  degrees of freedom
## Residual deviance: 29.733  on 36  degrees of freedom
## AIC: 35.733
## 
## Number of Fisher Scoring iterations: 6

All three variables are significant and the dispersion parameter is taken to be 1.

(b) Perform diagnostics

anova(fit)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: y
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                    38     54.040              
## x1    1   7.0505        37     46.989  0.007925 ** 
## x2    1  17.2560        36     29.733 3.267e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-values correspond to chi-squared tests of the hierarchical models, starting from a null model with only the intercept, then adding x1 to it, eventually resulting in the model with intercept + x1 + x2. The chi-squared tests are comparing the residual deviance with the corresponding limiting \(\chi^2\) distribution. Both variables are significant as suggested by the chi-squared test.

Next, we consider Cook’s distance to detect the effect of deleting an observation in the fitted model. It gives indication of the influence of each observation to the fit; values larger than 1 are usually problematic.

plot(cooks.distance(fit))

From the plot of Cook’s distance, there are no influential points.

(c) Interpret results

The odds of vasoconstriction increase exp(3.89) times per unit of volume and exp(2.66) times per unit of rate of inspired air.

(d) Fit the models with other possible link functions and compare the fitted models

fit2 <- glm(y~x1+x2, family=binomial(link="probit"))

fit3 <- glm(y~x1+x2, family=binomial(link="cloglog"))

c(AIC(fit), AIC(fit2), AIC(fit3))
## [1] 35.73341 35.97487 33.39087

From AIC values, the cloglog link gives the best performance.