Logistic regression 1: Prepare data/specify model/read results

References

An R Companion to Applied Regression John Fox http://socserv.mcmaster.ca/jfox/Books/Companion/index.html
Logistic Regssion: A Self-Learning Text (3rd). Kleinbaum et al. http://www.sph.emory.edu/~dkleinb/logreg2.htm

Dataset: low birthweight dataset

http://www.umass.edu/statdata/statdata/data/lowbwt.txt

SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition. These data are copyrighted by John Wiley & Sons Inc. and must be acknowledged and used accordingly. Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.

Load dataset

## Fix cases 10 and 39 to make them identical to Dr. Orav's dataset
lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", head = T, skip = 4)
lbw[c(10,39),"BWT"] <- c(2655,3035)

## Change variable names to lower cases
names(lbw) <- tolower(names(lbw))
head(lbw)

  id low age lwt race smoke ptl ht ui ftv  bwt
1 85   0  19 182    2     0   0  0  1   0 2523
2 86   0  33 155    3     0   0  0  0   3 2551
3 87   0  20 105    1     1   0  0  0   1 2557
4 88   0  21 108    1     1   0  0  1   2 2594
5 89   0  18 107    1     1   0  0  1   0 2600
6 91   0  21 124    3     0   0  0  0   0 2622

Recode data

## Recoding
lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat  <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm  <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

    ## Categorize smoke ht ui
    smoke    <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
    ht       <- factor(ht, levels = 0:1, labels = c("No","Yes"))
    ui       <- factor(ui, levels = 0:1, labels = c("No","Yes"))
})

Fit a model

logit.full <- glm(low ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw, family = binomial)
logit.null <- glm(low ~ 1, data = lbw, family = binomial)

See result object

logit.full


Call:  glm(formula = low ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + 
    preterm, family = binomial, data = lbw)

Coefficients:
  (Intercept)            age            lwt       smokeYes          htYes          uiYes    ftv.catNone  
       0.4461        -0.0355        -0.0150         0.7681         1.8067         0.7252         0.2582  
  ftv.catMany  race.catBlack  race.catOther      preterm1+  
       0.8033         1.1906         0.7530         1.2528  

Degrees of Freedom: 188 Total (i.e. Null);  178 Residual
Null Deviance:      235 
Residual Deviance: 196  AIC: 218

See summary

summary(logit.full)


Call:
glm(formula = low ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + 
    preterm, family = binomial, data = lbw)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.684  -0.810  -0.502   0.816   2.187  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)   
(Intercept)    0.44611    1.26160    0.35   0.7236   
age           -0.03551    0.03842   -0.92   0.3553   
lwt           -0.01497    0.00697   -2.15   0.0317 * 
smokeYes       0.76813    0.42380    1.81   0.0699 . 
htYes          1.80673    0.71053    2.54   0.0110 * 
uiYes          0.72520    0.46427    1.56   0.1183   
ftv.catNone    0.25824    0.39945    0.65   0.5180   
ftv.catMany    0.80332    0.71606    1.12   0.2619   
race.catBlack  1.19059    0.53674    2.22   0.0265 * 
race.catOther  0.75304    0.45949    1.64   0.1012   
preterm1+      1.25279    0.46763    2.68   0.0074 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 195.50  on 178  degrees of freedom
AIC: 217.5

Number of Fisher Scoring iterations: 4

Odds ratios

exp(coef(logit.full))

  (Intercept)           age           lwt      smokeYes         htYes         uiYes   ftv.catNone   ftv.catMany 
       1.5622        0.9651        0.9851        2.1557        6.0905        2.0651        1.2946        2.2329 
race.catBlack race.catOther     preterm1+ 
       3.2890        2.1234        3.5001

Confidence intervals

confint(logit.full)

                 2.5 %    97.5 %
(Intercept)   -2.00843  2.966732
age           -0.11268  0.038664
lwt           -0.02948 -0.001962
smokeYes      -0.05784  1.614551
htYes          0.44858  3.286032
uiYes         -0.19642  1.637396
ftv.catNone   -0.52029  1.054060
ftv.catMany   -0.64396  2.209601
race.catBlack  0.14082  2.262345
race.catOther -0.14050  1.672051
preterm1+      0.34686  2.193097

exp(confint(logit.full))

               2.5 % 97.5 %
(Intercept)   0.1342 19.428
age           0.8934  1.039
lwt           0.9710  0.998
smokeYes      0.9438  5.026
htYes         1.5661 26.737
uiYes         0.8217  5.142
ftv.catNone   0.5943  2.869
ftv.catMany   0.5252  9.112
race.catBlack 1.1512  9.606
race.catOther 0.8689  5.323
preterm1+     1.4146  8.963

Odds ratios using epicalc::logistic.display()

library(epicalc)
logistic.display(logit.full)


Logistic regression predicting low 

                     crude OR(95%CI)    adj. OR(95%CI)     P(Wald's test) P(LR-test)
age (cont. var.)     0.95 (0.89,1.01)   0.97 (0.9,1.04)    0.355          0.351     

lwt (cont. var.)     0.99 (0.97,1)      0.99 (0.97,1)      0.032          0.023     

smoke: Yes vs No     2.02 (1.08,3.78)   2.16 (0.94,4.95)   0.07           0.068     

ht: Yes vs No        3.37 (1.02,11.09)  6.09 (1.51,24.52)  0.011          0.009     

ui: Yes vs No        2.58 (1.14,5.83)   2.07 (0.83,5.13)   0.118          0.122     

ftv.cat: ref.=Normal                                                      0.514     
   None              1.84 (0.95,3.59)   1.29 (0.59,2.83)   0.518                    
   Many              2.34 (0.66,8.28)   2.23 (0.55,9.09)   0.262                    

race.cat: ref.=White                                                      0.055     
   Black             2.33 (0.94,5.77)   3.29 (1.15,9.42)   0.027                    
   Other             1.89 (0.96,3.74)   2.12 (0.86,5.23)   0.101                    

preterm: 1+ vs 0     4.32 (1.92,9.73)   3.5 (1.4,8.75)     0.007          0.007     

Log-likelihood = -97.7515
No. of observations = 189
AIC value = 217.5031

Analysis of Deviance Table (type I, Sequential F-test equivalent)

anova(logit.full, test = "LRT")

Analysis of Deviance Table

Model: binomial, link: logit

Response: low

Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
NULL                       188        235            
age       1     2.76       187        232   0.0966 . 
lwt       1     4.79       186        227   0.0286 * 
smoke     1     4.24       185        223   0.0394 * 
ht        1     7.20       184        216   0.0073 **
ui        1     3.91       183        212   0.0481 * 
ftv.cat   2     1.65       181        210   0.4373   
race.cat  2     7.26       179        203   0.0265 * 
preterm   1     7.36       178        196   0.0067 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis of Deviance Table (type III, Marginal F-test equivalent)

library(car)
Anova(logit.full, type = 3, test = "LR")

Analysis of Deviance Table (Type III tests)

Response: low
         LR Chisq Df Pr(>Chisq)   
age          0.87  1     0.3513   
lwt          5.15  1     0.0233 * 
smoke        3.32  1     0.0683 . 
ht           6.80  1     0.0091 **
ui           2.40  1     0.1215   
ftv.cat      1.33  2     0.5141   
race.cat     5.79  2     0.0554 . 
preterm      7.36  1     0.0067 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Effect displays

Effect of a given predictor after other variables are typical values.

library(effects)
plot(allEffects(logit.full), ylim = c(0,1))

plot of chunk unnamed-chunk-12

More complex model for interaction plot

The model itself is not very meaningful. This is for demonstration purpose only.

logit.full.int <- glm(low ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw, family = binomial)
Anova(logit.full.int, type = 3, test = "LR")

Analysis of Deviance Table (Type III tests)

Response: low
                 LR Chisq Df Pr(>Chisq)   
age                  0.41  1     0.5233   
lwt                  0.23  1     0.6325   
smoke                3.37  1     0.0665 . 
ht                   6.23  1     0.0126 * 
ui                   3.30  1     0.0691 . 
ftv.cat             10.55  2     0.0051 **
race.cat             3.45  2     0.1777   
preterm              4.27  1     0.0388 * 
age:lwt              0.00  1     0.9991   
age:ftv.cat         10.62  2     0.0049 **
race.cat:preterm     0.06  2     0.9698   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Continuous x Continuous interaction

plot(effect("age:lwt", logit.full.int))

plot of chunk unnamed-chunk-14

plot(effect("age:lwt", logit.full.int, xlevels = list(lwt = c(100,150,200,250))), multiline = T)

plot of chunk unnamed-chunk-14

Continuous x Categorical interaction

plot(effect("age:ftv.cat", logit.full.int), multiline = TRUE)

plot of chunk unnamed-chunk-15

Categorical x Categorical interaction

Use x.var and z.var to override the default X-axis variable and grouping variable assignment. The default is the predictor with the largest number of levels or values for x.var, which is not useful in categorical x categorical interactions.

plot(effect(c("race.cat*preterm"), logit.full.int), multiline = TRUE)

plot of chunk unnamed-chunk-16

plot(effect(c("race.cat*preterm"), logit.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)

plot of chunk unnamed-chunk-16