Assessment of performance of prediction models with PredictABEL

Introduction

The performance of a prediction model can be assessed with several types of measures.

Overall measures incorporate both calibration and discrimination; for example, \( R^2 \) is one such measure. Discrimination is the model's ability to separate those with and without events; ROC analysis assesses discrimination, and net reclassification improvement and integrated discrimination improvement are other measures of discrimination. Calibration is the agreement between observed outcomes and the model's predictions; a calibration plot and the Hosmer-Lemeshow goodness-of-fit test assess calibration.

Load data

These data come from Hosmer and Lemeshow (2000), Applied Logistic Regression, Second Edition. The data are copyrighted by John Wiley & Sons Inc. and must be acknowledged and used accordingly (http://www.umass.edu/statdata/statdata/data/lowbwt.txt).

## Fix cases 10 and 39 to make them identical to Dr. Orav's dataset
lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)
lbw[c(10,39),"BWT"] <- c(2655,3035)

## Change variable names to lower cases
names(lbw) <- tolower(names(lbw))

## Recoding
lbw <- within(lbw, {

    ## Relabel race
    racecat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftvcat  <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftvcat  <- relevel(ftvcat, ref = "Normal")

    ## Dichotomize ptl
    preterm  <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})

## Categorize smoke ht ui
lbw[,c("smoke","ht","ui")] <-
    lapply(lbw[,c("smoke","ht","ui")],
           function(var) {
               var <- factor(var, levels = 0:1, labels = c("No","Yes"))
           })

summary(lbw)
       id           low             age            lwt           race      smoke          ptl          ht     
 Min.   :  4   Min.   :0.000   Min.   :14.0   Min.   : 80   Min.   :1.00   No :115   Min.   :0.000   No :177  
 1st Qu.: 68   1st Qu.:0.000   1st Qu.:19.0   1st Qu.:110   1st Qu.:1.00   Yes: 74   1st Qu.:0.000   Yes: 12  
 Median :123   Median :0.000   Median :23.0   Median :121   Median :1.00             Median :0.000            
 Mean   :121   Mean   :0.312   Mean   :23.2   Mean   :130   Mean   :1.85             Mean   :0.196            
 3rd Qu.:176   3rd Qu.:1.000   3rd Qu.:26.0   3rd Qu.:140   3rd Qu.:3.00             3rd Qu.:0.000            
 Max.   :226   Max.   :1.000   Max.   :45.0   Max.   :250   Max.   :3.00             Max.   :3.000            
   ui           ftv             bwt       preterm     ftvcat     racecat  
 No :161   Min.   :0.000   Min.   : 709   0 :159   Normal: 77   White:96  
 Yes: 28   1st Qu.:0.000   1st Qu.:2414   1+: 30   None  :100   Black:26  
           Median :0.000   Median :2977            Many  : 12   Other:67  
           Mean   :0.794   Mean   :2945                                   
           3rd Qu.:1.000   3rd Qu.:3475                                   
           Max.   :6.000   Max.   :4990                                   

Fit logistic regression models

The outcome is a dichotomous indicator of low birth weight of a newborn. Three models were created.

logistic.model.list <-
    list(null        = glm(low ~ 1,             data = lbw, family = binomial),
         lwt         = glm(low ~ lwt,           data = lbw, family = binomial),
         lwt.racecat = glm(low ~ lwt + racecat, data = lbw, family = binomial)
         )
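
The introduction mentions \( R^2 \) as an overall performance measure. A minimal sketch of one concrete choice, Nagelkerke's pseudo-\( R^2 \), is given below: the Cox-Snell \( R^2 = 1 - \exp\{\tfrac{2}{n}(\ell_0 - \ell_1)\} \) is rescaled by its maximum possible value \( 1 - \exp\{\tfrac{2}{n}\ell_0\} \), where \( \ell_0 \) and \( \ell_1 \) are the log-likelihoods of the null and fitted models. The nagelkerke() helper is ad hoc for this note, not a PredictABEL or pROC function.

## Nagelkerke pseudo-R^2 (an overall performance measure) from log-likelihoods
## nagelkerke() is an ad hoc helper, not a package function
nagelkerke <- function(model, null.model) {
    n    <- nobs(model)
    ll0  <- as.numeric(logLik(null.model))   # log-likelihood of the null model
    ll1  <- as.numeric(logLik(model))        # log-likelihood of the fitted model
    r2cs <- 1 - exp(2 * (ll0 - ll1) / n)     # Cox-Snell R^2
    r2cs / (1 - exp(2 * ll0 / n))            # rescale so the maximum is 1
}
sapply(logistic.model.list[c("lwt","lwt.racecat")],
       nagelkerke, null.model = logistic.model.list[["null"]])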

Incorporate predicted values into the dataset

lbw <- within(lbw, {
    ## Obtain fitted values from two models
    fitted.lwt             <- fitted(logistic.model.list[["lwt"]])
    fitted.lwt.racecat     <- fitted(logistic.model.list[["lwt.racecat"]])

    ## Changes in predicted probability
    change                 <- factor(sign(fitted.lwt.racecat - fitted.lwt),
                                     levels = c(-1,0,1), labels = c("Down","Unchanged","Up"))

    ## Mark classification
    fitted.lwt.pos         <- as.numeric(fitted.lwt         >= 0.4)
    fitted.lwt.racecat.pos <- as.numeric(fitted.lwt.racecat >= 0.4)

    ## Changes in classification
    reclass               <- factor(sign(fitted.lwt.racecat.pos - fitted.lwt.pos),
                                    levels = c(-1,0,1), labels = c("Down","Unchanged","Up"))
})

ROC analysis using the pROC package

Adding another predictor usually increases the AUC by a small margin.

library(pROC)

## Create two ROC curves
roc.lwt         <- roc(low ~ fitted.lwt, data = lbw)
roc.lwt.racecat <- roc(low ~ fitted.lwt.racecat, data = lbw)

## Plot ROCs
junk <- plot(roc.lwt,         lty = 1, col = 1, add = FALSE, legacy.axes = TRUE)
junk <- plot(roc.lwt.racecat, lty = 2, col = 2, add = TRUE)
legend(0.4, 0.2, lty = 1:2, col = 1:2, legend = c("lwt","lwt.racecat"), bty = "n")

[Figure: ROC curves for the lwt and lwt.racecat models]


## Test the increment
roc.test(roc.lwt, roc.lwt.racecat)

    DeLong's test for two correlated ROC curves

data:  roc.lwt and roc.lwt.racecat
Z = -0.9669, p-value = 0.3336
alternative hypothesis: true difference in AUC is not equal to 0
sample estimates:
AUC of roc1 AUC of roc2 
     0.6131      0.6473 
## Best cutoff by the closest-to-top-left method
coords(roc = roc.lwt, x = "best", input = "threshold", best.method = "closest.topleft")
  threshold specificity sensitivity 
     0.3422      0.6846      0.4915 

Discrimination box plots and slopes using the PredictABEL package

The separation of the two parallel box plots is better in the two-predictor model on the right. The discrimination slope is the slope of the line connecting the mean predicted probabilities in non-cases and cases.

library(PredictABEL)
layout(matrix(1:2, ncol = 2))
slope.lwt <-
    plotDiscriminationBox(data = lbw, cOutcome = 2, plottitle = "lwt",
                          predrisk = fitted(logistic.model.list[["lwt"]]),
                          labels   = c("Without disease", "With disease"))

slope.lwt.racecat <-
    plotDiscriminationBox(data = lbw, cOutcome = 2, plottitle = "lwt.racecat",
                          predrisk = fitted(logistic.model.list[["lwt.racecat"]]),
                          labels   = c("Without disease", "With disease"))

[Figure: discrimination box plots for the lwt and lwt.racecat models]


c(slope.lwt = slope.lwt, slope.lwt.racecat = slope.lwt.racecat)
$slope.lwt.Discrim_Slope
[1] 0.031

$slope.lwt.racecat.Discrim_Slope
[1] 0.058

The discrimination slope is the slope formed between the mean predicted probabilities in those with and without disease. The difference between the slopes from two different models is the IDI.
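
As a cross-check, here is a minimal base R sketch that recomputes the discrimination slopes directly from the fitted values; discrimSlope() is an ad hoc helper, not a PredictABEL function. The difference between the two slopes should agree with the IDI reported by reclassification() in the next section.

## Discrimination slope by hand: mean predicted risk in cases minus non-cases
## discrimSlope() is an ad hoc helper, not a package function
discrimSlope <- function(predrisk, outcome) {
    mean(predrisk[outcome == 1]) - mean(predrisk[outcome == 0])
}
manual.slopes <- with(lbw, c(lwt         = discrimSlope(fitted.lwt,         low),
                             lwt.racecat = discrimSlope(fitted.lwt.racecat, low)))
manual.slopes
## Their difference is the IDI reported by reclassification() below
manual.slopes["lwt.racecat"] - manual.slopes["lwt"]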

NRI and IDI using the PredictABEL package

Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI).

NRI is defined as \( \mathit{NRI} = [P(\mathrm{up} \mid D = 1) - P(\mathrm{down} \mid D = 1)] - [P(\mathrm{up} \mid D = 0) - P(\mathrm{down} \mid D = 0)] \).

Only reclassification across the given cutoff is considered in the categorical NRI, whereas any upward or downward movement in predicted probability is considered in the continuous NRI.

This is effectively the sum of the good reclassification proportions minus the sum of the bad reclassification proportions.

For the categorical NRI, the cutoff was arbitrarily set at 0.4 for demonstration purposes.

reclassification(data = lbw, cOutcome = 2,
                 predrisk1 = fitted(logistic.model.list[["lwt"]]),
                 predrisk2 = fitted(logistic.model.list[["lwt.racecat"]]),
                 cutoff = c(0, 0.4, 1)
                 )
 _________________________________________

     Reclassification table    
 _________________________________________

 Outcome: absent 

             Updated Model
Initial Model [0,0.4) [0.4,1]  % reclassified
      [0,0.4)     104      16              13
      [0.4,1]       5       5              50


 Outcome: present 

             Updated Model
Initial Model [0,0.4) [0.4,1]  % reclassified
      [0,0.4)      34      16              32
      [0.4,1]       3       6              33


 Combined Data 

             Updated Model
Initial Model [0,0.4) [0.4,1]  % reclassified
      [0,0.4)     138      32              19
      [0.4,1]       8      11              42
 _________________________________________

 NRI(Categorical) [95% CI]: 0.1357 [ -0.0138 - 0.2853 ] ; p-value: 0.0753 
 NRI(Continuous) [95% CI]: 0.3588 [ 0.0572 - 0.6604 ] ; p-value: 0.0197 
 IDI [95% CI]: 0.0277 [ 0.0028 - 0.0526 ] ; p-value: 0.02938 

## Categorical NRI manual calculation
(16 - 3) / (34 + 16 + 3 + 6) - (16 - 5) / (104 + 16 + 5 + 5)
[1] 0.1357
## Implementation as a function
catNriElements <- with(lbw, by(fitted.lwt.racecat.pos - fitted.lwt.pos, low,
                                function(x) {
                                    ## Probability of up/down/unchanged
                                    propChanges <- prop.table(table(sign(x)))
                                    ## up - down
                                    propChanges["1"] - propChanges["-1"]
                                }))
catNriElements["1"] - catNriElements["0"]
     1 
0.1357 

## Continuous NRI: [(up - down)/(up + down + unchanged) in low = 1] - [(up - down)/(up + down + unchanged) in low = 0] in terms of predicted probability changes
xtabs(~ change +low, lbw)
           low
change       0  1
  Down      74 23
  Unchanged  0  0
  Up        56 36
(36 - 23) / (23 + 36) - (56 - 74) / (74 + 56)
[1] 0.3588
## Implementation as a function
contNriElements <- with(lbw, by(fitted.lwt.racecat - fitted.lwt, low,
                                function(x) {
                                    ## Probability of up/down/unchanged
                                    propChanges <- prop.table(table(sign(x)))
                                    ## up - down
                                    propChanges["1"] - propChanges["-1"]
                                }))
contNriElements["1"] - contNriElements["0"]
     1 
0.3588 

## IDI: Compare mean predicted probability among low = 1 and low = 0, and subtract
idiElements <- with(lbw, by(fitted.lwt.racecat - fitted.lwt, low, mean))
idiElements["1"] - idiElements["0"]
      1 
0.02769 

Graphical representation of continuous NRI

For non-cases (outcome = 0), decreasing predicted probability is good. For cases (outcome = 1), increasing predicted probability is good.

library(reshape2)
lbw.melt <- melt(lbw[,c("id","low","change","reclass","fitted.lwt","fitted.lwt.racecat")],
                 id.vars = c("id","low","change","reclass"))

library(ggplot2)
ggplot(lbw.melt, aes(x = variable, y = value, group = id, color = change)) +
    geom_point() +
    geom_line() +
    facet_grid(. ~ low) +
    scale_color_manual(name = "Reclassification",
                       values = c("Up"="red", "Down"="blue"),
                       limit = c("Up","Down")) +
    scale_x_discrete(name = "Models") +
    scale_y_continuous(name = "Predicted probability", limit = c(0,1)) +
    labs(title = "Faceted by outcome")

[Figure: changes in predicted probability between models, faceted by outcome (continuous NRI)]

Graphical representation of categorical NRI

For non-cases (outcome = 0), down-reclassification (blue) is good. For cases (outcome = 1), up-reclassification (red) is good. Movement in the opposite direction is incorrect reclassification. Patients shown with gray lines were not reclassified.

ggplot(lbw.melt, aes(x = variable, y = value, group = id, color = reclass)) +
    geom_point() +
    geom_line() +
    facet_grid(. ~ low) +
    scale_color_manual(name = "Reclassification",
                       values = c("Up"="red", "Down"="blue", "Unchanged"="gray"),
                       limit = c("Up","Unchanged","Down")) +
    scale_x_discrete(name = "Models") +
    scale_y_continuous(name = "Predicted probability", breaks = c(0, 0.4, 1), limit = c(0,1)) +
    labs(title = "Faceted by outcome")

[Figure: reclassification across the 0.4 cutoff between models, faceted by outcome (categorical NRI)]

Calibration plot and Hosmer-Lemeshow goodness-of-fit test statistics using the PredictABEL package

plotCalibration(data = lbw, cOutcome = 2, predRisk = fitted(logistic.model.list[[3]]), groups = 10)

[Figure: calibration plot of predicted versus observed risk by decile group]

$Table_HLtest
               total meanpred meanobs predicted observed
[0.0589,0.168)    19    0.125   0.105      2.37        2
[0.1681,0.225)    21    0.202   0.190      4.25        4
[0.2255,0.255)    17    0.237   0.294      4.03        5
[0.2548,0.274)    19    0.264   0.211      5.01        4
[0.2738,0.299)    19    0.285   0.421      5.42        8
[0.2987,0.338)    19    0.321   0.316      6.11        6
[0.3383,0.372)    23    0.358   0.261      8.23        6
[0.3716,0.415)    15    0.390   0.200      5.85        3
[0.4152,0.483)    20    0.443   0.600      8.85       12
[0.4829,0.597]    17    0.523   0.529      8.89        9

$Chi_square
[1] 7.619

$df
[1] 8

$p_value
[1] 0.4715
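
As a sanity check, the Hosmer-Lemeshow chi-square statistic can be recomputed from the decile table printed above. This is a minimal sketch that re-enters the rounded group summaries by hand, so it only approximately reproduces the values returned by plotCalibration.

## Hosmer-Lemeshow statistic from the printed decile table:
## sum over groups of (observed - expected)^2 / (expected * (1 - mean predicted risk))
hl <- data.frame(total    = c(19, 21, 17, 19, 19, 19, 23, 15, 20, 17),
                 meanpred = c(0.125, 0.202, 0.237, 0.264, 0.285,
                              0.321, 0.358, 0.390, 0.443, 0.523),
                 observed = c(2, 4, 5, 4, 8, 6, 6, 3, 12, 9))
hl <- within(hl, expected <- total * meanpred)
chisq <- with(hl, sum((observed - expected)^2 / (expected * (1 - meanpred))))
chisq                                                  # should be close to the Chi_square above
pchisq(chisq, df = nrow(hl) - 2, lower.tail = FALSE)   # should be close to the p_value above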