Background

A new study came out claiming stents increase mortality and it set indian twitterverse ablaze. You can read the study here.

The authors ought to be congratulated for carrying a study examining indian stent database.considering lack of dedicated registry and appropriate use criteria here, as well as open-sourcing the dataset

While stents definitely have limited benefit in chronic stable angina(mild freedom from angina in stable single vessel disease,no mortality benefit). Stents have definite mortality benefit in unstable patients- Acute coronary syndromes(STEMI and high risk NSTEMI ) and 2 vessel stable Angina.

Given the fact that 50% of cohort was acute coronary syndrome, i was surprised at these findings.

i and many others on twitter were intringued thinking that higher number of stents indeed might be a proxy for severe disease because though the thrombogenic potential of stents is well known , 15 year [long term RCT data of stents in stable angina)[https://www.nejm.org/doi/full/10.1056/nejmoa1505532] has failed to show a higher mortality(has not reduced it as well) with symptomatic benefit.

Also there are clear cut benefits of stents in Acute MI .

I was also surprised that in one of the figures while the total number of stents did not attain statistical significance in multivariate model to account for confounders, it was claimed as a predictor of mortality.

library(knitr)
include_graphics('stenter.png')

I communicated with authors who were generous with comments and explained that the explanatory model was secondary objective of the study(primary being comparison of DES n BMS) and they used a random forest model which does not generated traditional effect sizes and also on univariate analysis stent number was indeed associated with mortality(however it was not on adjusting for confounders)

However i remain in doubt because explanatory figure indicates a tradtional statistical model was indeed used(chi square n p values) which failed to show statistical significance for number of stents in multivariate model on adjusting and hence authors used the black box random forest variable importance plot for their assertion.

The authors said they used random forest model because of -a) non linear effect b)imbalance. Howver in my opinion it is a weak criticism as in medicine we need explanatory models as well and imbalance and non linear issues can be taken care of by adding penalised logistic regression and highe rorder splines for explanatory variables.

While i was a big fan of AI/ML methods, and have used them to develop models for prediction of mortality in hemorrhagic stroke, i have changed my mind on the subject and believe in spirit of parsimony and a simple logistic regression with appropriate modifications can do the job as well with additional advantage of giving effect sizes.

you can read the take down of ML models by Frank Harrell here

Also of note, in MI the databases often dont measure really granular details like vitals well and often have coding errors, so IMO while relying on them for stent utilization might be a good idea, realizing on them to predict outcomes is surely a hit and miss.

Aims

Visualisation of Explanatory variables in patients with PCI
Comparison of ML/AI algorithms with logisitc regression

Visualise

Lets read the data

library(tidyverse)
stents=read_csv('https://doi.org/10.1371/journal.pone.0196830.s006')
stents11= stents %>% mutate(status_death=ifelse(stents$status_death==0,"Alive","Dead")) %>% select(CAD_History,status_death,Age_tmnt,Sex_tmnt,total_length,total_number_of_stents,Site_ID,year_ptca,
         smoke,Diabetes,Hypertension,Employed_status,min_min_width,Education_level) %>% drop_na()

we see mortality by number of stents

knitr::kable(stents11 %>% mutate(status_death=as.factor(status_death),stents=as.factor(total_number_of_stents ))%>% group_by(status_death,stents) %>% summarise(n=n()) %>% 
      ungroup() %>% 
    group_by(stents) %>% 
      mutate(per=paste0(round(n/sum(n)*100, 2), "%")) %>% arrange(stents))

status_death	stents	n	per
Alive	1	2324	96.23%
Dead	1	91	3.77%
Alive	2	1439	94.73%
Dead	2	80	5.27%
Alive	3	298	93.42%
Dead	3	21	6.58%
Alive	4	45	91.84%
Dead	4	4	8.16%
Alive	5	5	83.33%
Dead	5	1	16.67%

So indeed mortality rises with number of stents placed,but as discussed before it can be reflective of sicker patients as well, and its true value can be discovered only after adjusting for confounders.

Let us analyse by site

stents11 %>% mutate(status_death=as.factor(status_death),site=as.factor(Site_ID) )%>% group_by(status_death,site) %>% count() %>% ungroup %>% 
      # rename(status_death=as.factor(status_death),year=as.factor(year_ptca)) %>% 
      ggplot(aes(x = status_death, y = n, fill = site)) +
      geom_bar(position = "fill" ,stat="identity") +coord_flip()

we see there is no difference in percent of alive and dead dpending upon site

Let us analyse by CAD history-

knitr::kable(stents11 %>% mutate(status_death=as.factor(status_death),history=as.factor(CAD_History) )%>% group_by(status_death,history) %>% summarise(n=n())%>% 
      mutate(freq = n / sum(n))
    )

status_death	history	n	freq
Alive	No	3517	0.8555096
Alive	Yes	594	0.1444904
Dead	No	182	0.9238579
Dead	Yes	15	0.0761421

we see a very weird finding that those with history of CAD are less likely to die in the database

Lets plot it

stents11 %>% mutate(status_death=as.factor(status_death),history=as.factor(CAD_History) )%>% group_by(status_death,history) %>% count() %>% ungroup %>% 
      # rename(status_death=as.factor(status_death),year=as.factor(year_ptca)) %>% 
      ggplot(aes(x = status_death, y = n, fill = history)) +
      geom_bar(position = "fill" ,stat="identity") +coord_flip()

Lets visualise mortality by year

  stents11 %>% mutate(status_death=as.factor(status_death),year=as.factor(year_ptca) )%>% group_by(status_death,year_ptca) %>% count() %>% group_by(year_ptca) %>% 
    mutate(freq = n / sum(n)) %>% arrange(year_ptca)

## # A tibble: 10 x 4
## # Groups:   year_ptca [5]
##    status_death year_ptca     n   freq
##    <fct>            <int> <int>  <dbl>
##  1 Alive             2012   645 0.949 
##  2 Dead              2012    35 0.0515
##  3 Alive             2013  1117 0.956 
##  4 Dead              2013    52 0.0445
##  5 Alive             2014   607 0.951 
##  6 Dead              2014    31 0.0486
##  7 Alive             2015  1009 0.957 
##  8 Dead              2015    45 0.0427
##  9 Alive             2016   733 0.956 
## 10 Dead              2016    34 0.0443

we see there is average 1 year mortality of 4.4-5% in PCI while it is around 1.7-2% in USA. But it depends on a lot of factor and in my opinion cant be attributed to PCI alone.

stents11 %>% mutate(status_death=as.factor(status_death),year=as.factor(year_ptca) )%>% group_by(status_death,year_ptca) %>% count() %>% ungroup %>% #%>% 
   # rename(status_death=as.factor(status_death),year=as.factor(year_ptca)) %>% 
  ggplot(aes(x = year_ptca, y = n, fill = status_death)) +
  geom_bar(position = "fill" ,stat="identity") +coord_flip()

Lets plot utilisation

knitr::include_graphics('stentra.png')

we see while totalstent utilisation has remained constant , DES have overtaken BMS due to fall in cost.

Ok now lets visualise models

Models

Lets assess performance of traditional logistic regression model with explanatory variables used by the paper. Age,Sex,total_length,total_number_of_stents,Site_ID,year_ptca, Diabetes,Hypertension,Employed_status,min_min_width,Education_level

library(randomForest)
library(caret)

  fitControl <- trainControl(method = "repeatedcv",number = 5,repeats = 5,
                             ## Estimate class probabilities 
                             classProbs = TRUE,
                             ## Evaluate performance using 
                             ## the following function
                             summaryFunction = twoClassSummary)
  
  modelLR <- train(status_death ~., # Survived is a function of the variables we decided to include
                    data = stents11, # Use the trainSet dataframe as the training data
                    method = "glm",# Use the "logistic regression" algorithm
                    family = "binomial",
                    metric = "ROC",
                    trControl = fitControl
  )
  
  modelLR

## Generalized Linear Model 
## 
## 4308 samples
##   13 predictors
##    2 classes: 'Alive', 'Dead' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 3447, 3446, 3447, 3446, 3446, 3446, ... 
## Resampling results:
## 
##   ROC        Sens  Spec
##   0.6175299  1     0

we see traditional logistic regression model has an AUROC of 0.61 (0.5 is as good as tossing a coin),So it is not a good performance as we would expect a good model to have at least 0.8 or above.

lets see summary of model

summary(modelLR)

## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6473  -0.3432  -0.2782  -0.2223   2.8348  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                111.129747 119.377654   0.931    0.352    
## CAD_HistoryYes              -0.571956   0.275627  -2.075    0.038 *  
## Age_tmnt                     0.040256   0.008036   5.010 5.45e-07 ***
## Sex_tmntMale                 0.074827   0.185619   0.403    0.687    
## total_length                 0.003753   0.005720   0.656    0.512    
## total_number_of_stents       0.216882   0.160762   1.349    0.177    
## `Site_IDOut of Mumbai`       0.003958   0.170019   0.023    0.981    
## year_ptca                   -0.058032   0.059308  -0.978    0.328    
## `smokeNon Smoker`           -0.127865   0.215976  -0.592    0.554    
## `smokePast Smoker`           0.032775   0.272804   0.120    0.904    
## DiabetesYes                  0.144985   0.157838   0.919    0.358    
## HypertensionYes             -0.318571   0.157391  -2.024    0.043 *  
## Employed_statusYes          -0.181647   0.179349  -1.013    0.311    
## min_min_width                0.031934   0.184056   0.173    0.862    
## Education_levelSome_Higher  -0.048303   0.156358  -0.309    0.757    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1600.4  on 4307  degrees of freedom
## Residual deviance: 1539.2  on 4293  degrees of freedom
## AIC: 1569.2
## 
## Number of Fisher Scoring iterations: 6

we see while Age,CAD History and Hypertension ar related with mortality,total number of stents is not.

Lets see if we can get better model effect by using non linear effect on age which is an important variable

 library(rms)
stents12=stents11 %>% mutate(status_death=ifelse(status_death=="Dead",1,0))
  
ddd <- datadist(stents12)
  options( datadist = "ddd" )

  f4=lrm(status_death~rcs(Age_tmnt,4)+rcs(total_length,3)+total_number_of_stents+CAD_History+smoke+
           Diabetes+Hypertension+Employed_status+min_min_width+Education_level+Site_ID,data=stents11)
  print(anova(f4))

##                 Wald Statistics          Response: status_death 
## 
##  Factor                 Chi-Square d.f. P     
##  Age_tmnt               27.50       3   <.0001
##   Nonlinear              1.36       2   0.5062
##  total_length            2.09       2   0.3511
##   Nonlinear              1.68       1   0.1944
##  total_number_of_stents  1.66       1   0.1977
##  CAD_History             4.58       1   0.0324
##  smoke                   0.77       2   0.6819
##  Diabetes                0.80       1   0.3705
##  Hypertension            4.28       1   0.0385
##  Employed_status         0.96       1   0.3262
##  min_min_width           0.05       1   0.8166
##  Education_level         0.04       1   0.8366
##  Site_ID                 0.12       1   0.7271
##  TOTAL NONLINEAR         2.99       3   0.3937
##  TOTAL                  59.78      15   <.0001

plot(anova(f4))

we see Age ,CAD Hisry, Hypertension are important on splines.

Lets visualise performance

f5=update(f4, x=TRUE , y=TRUE)
  ##bootstrap-validate
  kk_rcs=validate(f5, B=200, digits=2)
   
  CalculateAucFromDxy <- function(validate) {
    ## Test if the object is correct
    stopifnot(class(validate) == "validate")
    
    ## Calculate AUCs from Dxy's
    aucs <- (validate["Dxy", c("index.orig","training","test","optimism","index.corrected")])/2 + 0.5
    
    ## Get n
    n <- validate["Dxy", c("n")]
    
    ## Combine as result
    res <- rbind(validate, AUC = c(aucs, n))
    
    ## Fix optimism
    res["AUC","optimism"] <- res["AUC","optimism"] - 0.5
    
    ## Return results
    res
  }
   knitr::kable(CalculateAucFromDxy(kk_rcs))

	index.orig	training	test	optimism	index.corrected	n
Dxy	0.3121648	0.3451806	0.2828722	0.0623084	0.2498565	200
R2	0.0468070	0.0578431	0.0380714	0.0197716	0.0270354	200
Intercept	0.0000000	0.0000000	-0.5451435	0.5451435	-0.5451435	200
Slope	1.0000000	1.0000000	0.8111273	0.1888727	0.8111273	200
Emax	0.0000000	0.0000000	0.1650013	0.1650013	0.1650013	200
D	0.0143981	0.0179074	0.0116519	0.0062555	0.0081425	200
U	-0.0004643	-0.0004643	0.0007468	-0.0012110	0.0007468	200
Q	0.0148624	0.0183717	0.0109051	0.0074666	0.0073958	200
B	0.0429553	0.0428236	0.0431470	-0.0003234	0.0432788	200
g	0.6796081	0.7708733	0.6169520	0.1539213	0.5256868	200
gp	0.0278185	0.0305516	0.0250552	0.0054964	0.0223221	200
AUC	0.6560824	0.6725903	0.6414361	0.0311542	0.6249282	200

we see AUC rise slightly to 0.622 after corrected.

Lets add penalization to account for imbalance

options(scipen=999)
penalty <- pentrace(f5, penalty = 0:30)
  mod1_pen <- update(f5, penalty=penalty$penalty)

  
  kk1=validate(mod1_pen, B=200, digits=2)
  knitr::kable(CalculateAucFromDxy(kk1))

	index.orig	training	test	optimism	index.corrected	n
Dxy	0.3110610	0.3415548	0.2930288	0.0485260	0.2625349	200
R2	0.0394375	0.0465180	0.0391848	0.0073332	0.0321043	200
Intercept	0.0000000	0.0000000	0.0262773	-0.0262773	0.0262773	200
Slope	1.0000000	1.0000000	1.0050002	-0.0050002	1.0050002	200
Emax	0.0000000	0.0000000	0.0067797	0.0067797	0.0067797	200
D	0.0133709	0.0142400	0.0120014	0.0022386	0.0111322	200
U	-0.0004643	-0.0004643	-0.0000470	-0.0004173	-0.0000470	200
Q	0.0138351	0.0147043	0.0120483	0.0026559	0.0111792	200
B	0.0429987	0.0425399	0.0430957	-0.0005559	0.0435546	200
g	0.5426337	0.5815942	0.5774444	0.0041498	0.5384839	200
gp	0.0240655	0.0253774	0.0253127	0.0000647	0.0240007	200
AUC	0.6555305	0.6707774	0.6465144	0.0242630	0.6312675	200
we see maxim	um AUROC achi	eved is 0.63

Lets now see performance of superstar AI algorithms

Random forest first

modelRF <- train(status_death ~., # Survived is a function of the variables we decided to include
                    data = stents11, # Use the trainSet dataframe as the training data
                    method = "rf",# Use the "random forest" algorithm
                    metric = "ROC",
                    trControl = fitControl)

library(knitr)
include_graphics('rf.png')

we see AUROC is 0.60. , below logistic regression.

Lets see explanatory variables-

varImpPlot(modelRF$finalModel)

library(knitr)
include_graphics('rf1.png')

while we get variable importance, we dont get effect sizes or statistical significance.

Lets see naive bayes now

modelNB <- train(status_death ~., # Survived is a function of the variables we decided to include
                    data = stentsml, # Use the trainSet dataframe as the training data
                    method = "nb",# Use the "naive bayes" algorithm
                    metric = "ROC",
                    trControl = fitControl
  )

library(knitr)
include_graphics('nb.png')

Here also AUROC is same with similar predictor variables

Thus logistic regression outperforms Random forest and naive bayes if performed with suitable optimisation accounting for non-linear veffects and penalization and also gives us effect sizes.

Key takeaway

1.Model of reality(cognitive) model must take priority over method and machine algorithms

2.Logistic regression is fast and can perform as better as ML algorithms or even better in small data set upto 5-10000 sample sizes(till there are many complex interactions)

3.Logistic regression is an explanatory model and hence can help us see effect sizes and avoid erroneous mistakes like stent number is associated with mortality just based on variable importance plot

4.EHR Databases while,great for indicating utilisation n audits (the purposes they were made for) are poor for outcome predction as they dont reflect Key clinical variables like(BP,killip class,pulse) eg and hence should be taken with a pinch of salt

5.While there have been genuine advances in Deep learning models, prediction algorithms are overhyped and good model n good data trumps everything

6.We should welcome the call for auditing and stenting registries in india and use of appropriate utisation criteria to benchmark the hospitals

Comparison of logistic regression and ML/AI models for mortality prediction on indian stent utilisation database

Anupam kumar Singh, MD

13 June 2018

Background

Aims

Visualise

Models

Key takeaway