The performance of a predictin model can be assessed in several measures.
Overall measures incorporate both calibration and discrimination, for example \( R^2 \) is a such a measure. Discrimination is the model’s ability to separate those with and without events. ROC analysis assess discrimination. Net reclassification improvement and integrated discrimination improvement are other measures of discrimination. Calibration is the agreement between observed outcomes and model’s prediction. Calibration plot and Hosmer-Lemeshow goodness-of-fit test assess calibration.
These data come from Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition. These data are copyrighted by John Wiley & Sons Inc. and must be acknowledged and used accordingly. ( http://www.umass.edu/statdata/statdata/data/lowbwt.txt ).
## Fix cases 10 and 39 to make them identical to Dr. Orav's dataset
lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", head = T, skip = 4)
lbw[c(10,39),"BWT"] <- c(2655,3035)
## Change variable names to lower cases
names(lbw) <- tolower(names(lbw))
## Recoding
lbw <- within(lbw, {
## Relabel race
racecat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
## Categorize ftv (frequency of visit)
ftvcat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
ftvcat <- relevel(ftvcat, ref = "Normal")
## Dichotomize ptl
preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
})
## Categorize smoke ht ui
lbw[,c("smoke","ht","ui")] <-
lapply(lbw[,c("smoke","ht","ui")],
function(var) {
var <- factor(var, levels = 0:1, labels = c("No","Yes"))
})
summary(lbw)
id low age lwt race smoke ptl ht
Min. : 4 Min. :0.000 Min. :14.0 Min. : 80 Min. :1.00 No :115 Min. :0.000 No :177
1st Qu.: 68 1st Qu.:0.000 1st Qu.:19.0 1st Qu.:110 1st Qu.:1.00 Yes: 74 1st Qu.:0.000 Yes: 12
Median :123 Median :0.000 Median :23.0 Median :121 Median :1.00 Median :0.000
Mean :121 Mean :0.312 Mean :23.2 Mean :130 Mean :1.85 Mean :0.196
3rd Qu.:176 3rd Qu.:1.000 3rd Qu.:26.0 3rd Qu.:140 3rd Qu.:3.00 3rd Qu.:0.000
Max. :226 Max. :1.000 Max. :45.0 Max. :250 Max. :3.00 Max. :3.000
ui ftv bwt preterm ftvcat racecat
No :161 Min. :0.000 Min. : 709 0 :159 Normal: 77 White:96
Yes: 28 1st Qu.:0.000 1st Qu.:2414 1+: 30 None :100 Black:26
Median :0.000 Median :2977 Many : 12 Other:67
Mean :0.794 Mean :2945
3rd Qu.:1.000 3rd Qu.:3475
Max. :6.000 Max. :4990
Outcome is dichotmous indicator of low birthweight of a newborn. Three models were created.
logistic.model.list <-
list(null = glm(low ~ 1, data = lbw, family = binomial),
lwt = glm(low ~ lwt, data = lbw, family = binomial),
lwt.racecat = glm(low ~ lwt + racecat, data = lbw, family = binomial)
)
lbw <- within(lbw, {
## Obtain fitted values from two models
fitted.lwt <- fitted(logistic.model.list[["lwt"]])
fitted.lwt.racecat <- fitted(logistic.model.list[["lwt.racecat"]])
## Changes in predicted probability
change <- factor(sign(fitted.lwt.racecat - fitted.lwt),
levels = c(-1,0,1), labels = c("Down","Unchanged","Up"))
## Mark classification
fitted.lwt.pos <- as.numeric(fitted.lwt >= 0.4)
fitted.lwt.racecat.pos <- as.numeric(fitted.lwt.racecat >= 0.4)
## Changes in classification
reclass <- factor(sign(fitted.lwt.racecat.pos - fitted.lwt.pos),
levels = c(-1,0,1), labels = c("Down","Unchanged","Up"))
})
Addition of another predictor usually increase AUC by a small margin.
library(pROC)
## Create two ROC curves
roc.lwt <- roc(low ~ fitted.lwt, data = lbw)
roc.lwt.racecat <- roc(low ~ fitted.lwt.racecat, data = lbw)
## Plot ROCs
junk <- plot(roc.lwt, lty = 1, col = 1, add = F, legacy.axes = T)
junk <- plot(roc.lwt.racecat, lty = 2, col = 2, add = T)
legend(0.4, 0.2, lty = 1:2, col = 1:2, legend = c("lwt","lwt.racecat"), bty = "n")
## Test the increment
roc.test(roc.lwt, roc.lwt.racecat)
DeLong's test for two correlated ROC curves
data: roc.lwt and roc.lwt.racecat
Z = -0.9669, p-value = 0.3336
alternative hypothesis: true difference in AUC is not equal to 0
sample estimates:
AUC of roc1 AUC of roc2
0.6131 0.6473
## Best cutoff by closest to topleft method
coords(roc = roc.lwt , x= "best" , input = "threshold", best.method = "closest.topleft")
threshold specificity sensitivity
0.3422 0.6846 0.4915
The separation of two parallel boxplots are better in the two-predictor model on the right. The discrimination slope is the slope of the line connecting the means in the non-case and cases.
library(PredictABEL)
layout(matrix(1:2, ncol = 2))
slope.lwt <-
plotDiscriminationBox(data = lbw, cOutcome = 2, plottitle = "lwt",
predrisk = fitted(logistic.model.list[["lwt"]]),
labels = c("Without disease", "With disease"))
slope.lwt.racecat <-
plotDiscriminationBox(data = lbw, cOutcome = 2, plottitle = "lwt.racecat",
predrisk = fitted(logistic.model.list[["lwt.racecat"]]),
labels = c("Without disease", "With disease"))
c(slope.lwt = slope.lwt, slope.lwt.racecat = slope.lwt.racecat)
$slope.lwt.Discrim_Slope
[1] 0.031
$slope.lwt.racecat.Discrim_Slope
[1] 0.058
The discrimination slope is the slope formed between the mean predicted probabilities in those who with an without disease. The difference in the slopes from two different models is the IDI.
Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI).
NRI is defined as, \( NRI = [P(up|D = 1) − P(down|D = 1)] − [P(up|D = 0) − P(down|D = 0)] \)
Only reclassification across the given cutoff is considered in categorical NRI, any upward or downward movement is considered in continuous NRI.
This is effectively, sum of good reclassification proportion minus sum of bad reclassification.
For the categorical NRI, the cutoff was arbitrary set at 0.4 for demonstration purpose.
reclassification(data = lbw, cOutcome = 2,
predrisk1 = fitted(logistic.model.list[["lwt"]]),
predrisk2 = fitted(logistic.model.list[["lwt.racecat"]]),
cutoff = c(0, 0.4, 1)
)
_________________________________________
Reclassification table
_________________________________________
Outcome: absent
Updated Model
Initial Model [0,0.4) [0.4,1] % reclassified
[0,0.4) 104 16 13
[0.4,1] 5 5 50
Outcome: present
Updated Model
Initial Model [0,0.4) [0.4,1] % reclassified
[0,0.4) 34 16 32
[0.4,1] 3 6 33
Combined Data
Updated Model
Initial Model [0,0.4) [0.4,1] % reclassified
[0,0.4) 138 32 19
[0.4,1] 8 11 42
_________________________________________
NRI(Categorical) [95% CI]: 0.1357 [ -0.0138 - 0.2853 ] ; p-value: 0.0753
NRI(Continuous) [95% CI]: 0.3588 [ 0.0572 - 0.6604 ] ; p-value: 0.0197
IDI [95% CI]: 0.0277 [ 0.0028 - 0.0526 ] ; p-value: 0.02938
## Categorical NRI manual calculation
(16 - 3) / (34 + 16 + 3 + 6) - (16 - 5) / (104 + 16 + 5 + 5)
[1] 0.1357
## Implementation as a function
catNriElements <- with(lbw, by(fitted.lwt.racecat.pos - fitted.lwt.pos, low,
function(x) {
## Probability of up/down/unchanged
propChanges <- prop.table(table(sign(x)))
## up - down
propChanges["1"] - propChanges["-1"]
}))
catNriElements["1"] - catNriElements["0"]
1
0.1357
## Continuous NRI: [(up - down)/(up + down + unchanged) in low = 1] - [(up - down)/(up + down + unchanged) in low = 0] in terms of predicted probability changes
xtabs(~ change +low, lbw)
low
change 0 1
Down 74 23
Unchanged 0 0
Up 56 36
(36 - 23) / (23 + 36) - (56 - 74) / (74 + 56)
[1] 0.3588
## Implementation as a function
contNriElements <- with(lbw, by(fitted.lwt.racecat - fitted.lwt, low,
function(x) {
## Probability of up/down/unchanged
propChanges <- prop.table(table(sign(x)))
## up - down
propChanges["1"] - propChanges["-1"]
}))
contNriElements["1"] - contNriElements["0"]
1
0.3588
## IDI: Compare mean predicted probability among low = 1 and low = 0, and subtract
idiElements <- with(lbw, by(fitted.lwt.racecat - fitted.lwt, low, mean))
idiElements["1"] - idiElements["0"]
1
0.02769
Graphical representation of continuous NRI
For non-cases (outcome = 0), decreasing predicted probability is good. For cases (outcome = 1), increasing predicted probability is good.
library(reshape2)
lbw.melt <- melt(lbw[,c("id","low","change","reclass","fitted.lwt","fitted.lwt.racecat")],
id.vars = c("id","low","change","reclass"))
library(ggplot2)
ggplot(lbw.melt, aes(x = variable, y = value, group = id, color = change)) +
geom_point() +
geom_line() +
facet_grid(. ~ low) +
scale_color_manual(name = "Reclassification",
values = c("Up"="red", "Down"="blue"),
limit = c("Up","Down")) +
scale_x_discrete(name = "Models") +
scale_y_continuous(name = "Predicted probability", limit = c(0,1)) +
labs(title = "Faceted by outcome")
Graphical representation of categorical NRI
For non-cases (outcome = 0), down-reclassification (blue) is good. For cases (outcome = 1), up-reclassification (red) is good. The other way around is incorrect reclassification. Patients indicated with gray lines experienced no reclassification.
ggplot(lbw.melt, aes(x = variable, y = value, group = id, color = reclass)) +
geom_point() +
geom_line() +
facet_grid(. ~ low) +
scale_color_manual(name = "Reclassification",
values = c("Up"="red", "Down"="blue", "Unchanged"="gray"),
limit = c("Up","Unchanged","Down")) +
scale_x_discrete(name = "Models") +
scale_y_continuous(name = "Predicted probability", breaks = c(0, 0.4, 1), limit = c(0,1)) +
labs(title = "Faceted by outcome")
plotCalibration(data = lbw, cOutcome = 2, predRisk = fitted(logistic.model.list[[3]]), groups = 10)
$Table_HLtest
total meanpred meanobs predicted observed
[0.0589,0.168) 19 0.125 0.105 2.37 2
[0.1681,0.225) 21 0.202 0.190 4.25 4
[0.2255,0.255) 17 0.237 0.294 4.03 5
[0.2548,0.274) 19 0.264 0.211 5.01 4
[0.2738,0.299) 19 0.285 0.421 5.42 8
[0.2987,0.338) 19 0.321 0.316 6.11 6
[0.3383,0.372) 23 0.358 0.261 8.23 6
[0.3716,0.415) 15 0.390 0.200 5.85 3
[0.4152,0.483) 20 0.443 0.600 8.85 12
[0.4829,0.597] 17 0.523 0.529 8.89 9
$Chi_square
[1] 7.619
$df
[1] 8
$p_value
[1] 0.4715