Background

A new study came out claiming stents increase mortality and it set indian twitterverse ablaze. You can read the study here.

The authors ought to be congratulated for carrying a study examining indian stent database.considering lack of dedicated registry and appropriate use criteria here, as well as open-sourcing the dataset

While stents definitely have limited benefit in chronic stable angina(mild freedom from angina in stable single vessel disease,no mortality benefit). Stents have definite mortality benefit in unstable patients- Acute coronary syndromes(STEMI and high risk NSTEMI ) and 2 vessel stable Angina.

Given the fact that 50% of cohort was acute coronary syndrome, i was surprised at these findings.

i and many others on twitter were intringued thinking that higher number of stents indeed might be a proxy for severe disease because though the thrombogenic potential of stents is well known , 15 year [long term RCT data of stents in stable angina)[https://www.nejm.org/doi/full/10.1056/nejmoa1505532] has failed to show a higher mortality(has not reduced it as well) with symptomatic benefit.

Also there are clear cut benefits of stents in Acute MI .

I was also surprised that in one of the figures while the total number of stents did not attain statistical significance in multivariate model to account for confounders, it was claimed as a predictor of mortality.

library(knitr)
include_graphics('stenter.png')

I communicated with authors who were generous with comments and explained that the explanatory model was secondary objective of the study(primary being comparison of DES n BMS) and they used a random forest model which does not generated traditional effect sizes and also on univariate analysis stent number was indeed associated with mortality(however it was not on adjusting for confounders)

However i remain in doubt because explanatory figure indicates a tradtional statistical model was indeed used(chi square n p values) which failed to show statistical significance for number of stents in multivariate model on adjusting and hence authors used the black box random forest variable importance plot for their assertion.

The authors said they used random forest model because of -a) non linear effect b)imbalance. Howver in my opinion it is a weak criticism as in medicine we need explanatory models as well and imbalance and non linear issues can be taken care of by adding penalised logistic regression and highe rorder splines for explanatory variables.

While i was a big fan of AI/ML methods, and have used them to develop models for prediction of mortality in hemorrhagic stroke, i have changed my mind on the subject and believe in spirit of parsimony and a simple logistic regression with appropriate modifications can do the job as well with additional advantage of giving effect sizes.

you can read the take down of ML models by Frank Harrell here

Also of note, in MI the databases often dont measure really granular details like vitals well and often have coding errors, so IMO while relying on them for stent utilization might be a good idea, realizing on them to predict outcomes is surely a hit and miss.

Aims

  1. Visualisation of Explanatory variables in patients with PCI
  2. Comparison of ML/AI algorithms with logisitc regression

Visualise

Lets read the data

library(tidyverse)
stents=read_csv('https://doi.org/10.1371/journal.pone.0196830.s006')
stents11= stents %>% mutate(status_death=ifelse(stents$status_death==0,"Alive","Dead")) %>% select(CAD_History,status_death,Age_tmnt,Sex_tmnt,total_length,total_number_of_stents,Site_ID,year_ptca,
         smoke,Diabetes,Hypertension,Employed_status,min_min_width,Education_level) %>% drop_na()

we see mortality by number of stents

knitr::kable(stents11 %>% mutate(status_death=as.factor(status_death),stents=as.factor(total_number_of_stents ))%>% group_by(status_death,stents) %>% summarise(n=n()) %>% 
      ungroup() %>% 
    group_by(stents) %>% 
      mutate(per=paste0(round(n/sum(n)*100, 2), "%")) %>% arrange(stents))
status_death stents n per
Alive 1 2324 96.23%
Dead 1 91 3.77%
Alive 2 1439 94.73%
Dead 2 80 5.27%
Alive 3 298 93.42%
Dead 3 21 6.58%
Alive 4 45 91.84%
Dead 4 4 8.16%
Alive 5 5 83.33%
Dead 5 1 16.67%

So indeed mortality rises with number of stents placed,but as discussed before it can be reflective of sicker patients as well, and its true value can be discovered only after adjusting for confounders.

Let us analyse by site

stents11 %>% mutate(status_death=as.factor(status_death),site=as.factor(Site_ID) )%>% group_by(status_death,site) %>% count() %>% ungroup %>% 
      # rename(status_death=as.factor(status_death),year=as.factor(year_ptca)) %>% 
      ggplot(aes(x = status_death, y = n, fill = site)) +
      geom_bar(position = "fill" ,stat="identity") +coord_flip()

we see there is no difference in percent of alive and dead dpending upon site

Let us analyse by CAD history-

knitr::kable(stents11 %>% mutate(status_death=as.factor(status_death),history=as.factor(CAD_History) )%>% group_by(status_death,history) %>% summarise(n=n())%>% 
      mutate(freq = n / sum(n))
    )
status_death history n freq
Alive No 3517 0.8555096
Alive Yes 594 0.1444904
Dead No 182 0.9238579
Dead Yes 15 0.0761421

we see a very weird finding that those with history of CAD are less likely to die in the database

Lets plot it

stents11 %>% mutate(status_death=as.factor(status_death),history=as.factor(CAD_History) )%>% group_by(status_death,history) %>% count() %>% ungroup %>% 
      # rename(status_death=as.factor(status_death),year=as.factor(year_ptca)) %>% 
      ggplot(aes(x = status_death, y = n, fill = history)) +
      geom_bar(position = "fill" ,stat="identity") +coord_flip()

Lets visualise mortality by year

  stents11 %>% mutate(status_death=as.factor(status_death),year=as.factor(year_ptca) )%>% group_by(status_death,year_ptca) %>% count() %>% group_by(year_ptca) %>% 
    mutate(freq = n / sum(n)) %>% arrange(year_ptca)
## # A tibble: 10 x 4
## # Groups:   year_ptca [5]
##    status_death year_ptca     n   freq
##    <fct>            <int> <int>  <dbl>
##  1 Alive             2012   645 0.949 
##  2 Dead              2012    35 0.0515
##  3 Alive             2013  1117 0.956 
##  4 Dead              2013    52 0.0445
##  5 Alive             2014   607 0.951 
##  6 Dead              2014    31 0.0486
##  7 Alive             2015  1009 0.957 
##  8 Dead              2015    45 0.0427
##  9 Alive             2016   733 0.956 
## 10 Dead              2016    34 0.0443

we see there is average 1 year mortality of 4.4-5% in PCI while it is around 1.7-2% in USA. But it depends on a lot of factor and in my opinion cant be attributed to PCI alone.

stents11 %>% mutate(status_death=as.factor(status_death),year=as.factor(year_ptca) )%>% group_by(status_death,year_ptca) %>% count() %>% ungroup %>% #%>% 
   # rename(status_death=as.factor(status_death),year=as.factor(year_ptca)) %>% 
  ggplot(aes(x = year_ptca, y = n, fill = status_death)) +
  geom_bar(position = "fill" ,stat="identity") +coord_flip()

Lets plot utilisation

knitr::include_graphics('stentra.png')

we see while totalstent utilisation has remained constant , DES have overtaken BMS due to fall in cost.

Ok now lets visualise models

Models

Lets assess performance of traditional logistic regression model with explanatory variables used by the paper. Age,Sex,total_length,total_number_of_stents,Site_ID,year_ptca, Diabetes,Hypertension,Employed_status,min_min_width,Education_level

library(randomForest)
library(caret)

  fitControl <- trainControl(method = "repeatedcv",number = 5,repeats = 5,
                             ## Estimate class probabilities 
                             classProbs = TRUE,
                             ## Evaluate performance using 
                             ## the following function
                             summaryFunction = twoClassSummary)
  
  modelLR <- train(status_death ~., # Survived is a function of the variables we decided to include
                    data = stents11, # Use the trainSet dataframe as the training data
                    method = "glm",# Use the "logistic regression" algorithm
                    family = "binomial",
                    metric = "ROC",
                    trControl = fitControl
  )
  
  modelLR
## Generalized Linear Model 
## 
## 4308 samples
##   13 predictors
##    2 classes: 'Alive', 'Dead' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 3447, 3446, 3447, 3446, 3446, 3446, ... 
## Resampling results:
## 
##   ROC        Sens  Spec
##   0.6175299  1     0

we see traditional logistic regression model has an AUROC of 0.61 (0.5 is as good as tossing a coin),So it is not a good performance as we would expect a good model to have at least 0.8 or above.

lets see summary of model

summary(modelLR)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6473  -0.3432  -0.2782  -0.2223   2.8348  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                111.129747 119.377654   0.931    0.352    
## CAD_HistoryYes              -0.571956   0.275627  -2.075    0.038 *  
## Age_tmnt                     0.040256   0.008036   5.010 5.45e-07 ***
## Sex_tmntMale                 0.074827   0.185619   0.403    0.687    
## total_length                 0.003753   0.005720   0.656    0.512    
## total_number_of_stents       0.216882   0.160762   1.349    0.177    
## `Site_IDOut of Mumbai`       0.003958   0.170019   0.023    0.981    
## year_ptca                   -0.058032   0.059308  -0.978    0.328    
## `smokeNon Smoker`           -0.127865   0.215976  -0.592    0.554    
## `smokePast Smoker`           0.032775   0.272804   0.120    0.904    
## DiabetesYes                  0.144985   0.157838   0.919    0.358    
## HypertensionYes             -0.318571   0.157391  -2.024    0.043 *  
## Employed_statusYes          -0.181647   0.179349  -1.013    0.311    
## min_min_width                0.031934   0.184056   0.173    0.862    
## Education_levelSome_Higher  -0.048303   0.156358  -0.309    0.757    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1600.4  on 4307  degrees of freedom
## Residual deviance: 1539.2  on 4293  degrees of freedom
## AIC: 1569.2
## 
## Number of Fisher Scoring iterations: 6

we see while Age,CAD History and Hypertension ar related with mortality,total number of stents is not.

Lets see if we can get better model effect by using non linear effect on age which is an important variable

 library(rms)
stents12=stents11 %>% mutate(status_death=ifelse(status_death=="Dead",1,0))
  
ddd <- datadist(stents12)
  options( datadist = "ddd" )
  f4=lrm(status_death~rcs(Age_tmnt,4)+rcs(total_length,3)+total_number_of_stents+CAD_History+smoke+
           Diabetes+Hypertension+Employed_status+min_min_width+Education_level+Site_ID,data=stents11)
  print(anova(f4))
##                 Wald Statistics          Response: status_death 
## 
##  Factor                 Chi-Square d.f. P     
##  Age_tmnt               27.50       3   <.0001
##   Nonlinear              1.36       2   0.5062
##  total_length            2.09       2   0.3511
##   Nonlinear              1.68       1   0.1944
##  total_number_of_stents  1.66       1   0.1977
##  CAD_History             4.58       1   0.0324
##  smoke                   0.77       2   0.6819
##  Diabetes                0.80       1   0.3705
##  Hypertension            4.28       1   0.0385
##  Employed_status         0.96       1   0.3262
##  min_min_width           0.05       1   0.8166
##  Education_level         0.04       1   0.8366
##  Site_ID                 0.12       1   0.7271
##  TOTAL NONLINEAR         2.99       3   0.3937
##  TOTAL                  59.78      15   <.0001
plot(anova(f4))

we see Age ,CAD Hisry, Hypertension are important on splines.

Lets visualise performance

f5=update(f4, x=TRUE , y=TRUE)
  ##bootstrap-validate
  kk_rcs=validate(f5, B=200, digits=2)
   
  CalculateAucFromDxy <- function(validate) {
    ## Test if the object is correct
    stopifnot(class(validate) == "validate")
    
    ## Calculate AUCs from Dxy's
    aucs <- (validate["Dxy", c("index.orig","training","test","optimism","index.corrected")])/2 + 0.5
    
    ## Get n
    n <- validate["Dxy", c("n")]
    
    ## Combine as result
    res <- rbind(validate, AUC = c(aucs, n))
    
    ## Fix optimism
    res["AUC","optimism"] <- res["AUC","optimism"] - 0.5
    
    ## Return results
    res
  }
   knitr::kable(CalculateAucFromDxy(kk_rcs))
index.orig training test optimism index.corrected n
Dxy 0.3121648 0.3451806 0.2828722 0.0623084 0.2498565 200
R2 0.0468070 0.0578431 0.0380714 0.0197716 0.0270354 200
Intercept 0.0000000 0.0000000 -0.5451435 0.5451435 -0.5451435 200
Slope 1.0000000 1.0000000 0.8111273 0.1888727 0.8111273 200
Emax 0.0000000 0.0000000 0.1650013 0.1650013 0.1650013 200
D 0.0143981 0.0179074 0.0116519 0.0062555 0.0081425 200
U -0.0004643 -0.0004643 0.0007468 -0.0012110 0.0007468 200
Q 0.0148624 0.0183717 0.0109051 0.0074666 0.0073958 200
B 0.0429553 0.0428236 0.0431470 -0.0003234 0.0432788 200
g 0.6796081 0.7708733 0.6169520 0.1539213 0.5256868 200
gp 0.0278185 0.0305516 0.0250552 0.0054964 0.0223221 200
AUC 0.6560824 0.6725903 0.6414361 0.0311542 0.6249282 200

we see AUC rise slightly to 0.622 after corrected.

Lets add penalization to account for imbalance

options(scipen=999)
penalty <- pentrace(f5, penalty = 0:30)
  mod1_pen <- update(f5, penalty=penalty$penalty)

  
  kk1=validate(mod1_pen, B=200, digits=2)
  knitr::kable(CalculateAucFromDxy(kk1))
index.orig training test optimism index.corrected n
Dxy 0.3110610 0.3415548 0.2930288 0.0485260 0.2625349 200
R2 0.0394375 0.0465180 0.0391848 0.0073332 0.0321043 200
Intercept 0.0000000 0.0000000 0.0262773 -0.0262773 0.0262773 200
Slope 1.0000000 1.0000000 1.0050002 -0.0050002 1.0050002 200
Emax 0.0000000 0.0000000 0.0067797 0.0067797 0.0067797 200
D 0.0133709 0.0142400 0.0120014 0.0022386 0.0111322 200
U -0.0004643 -0.0004643 -0.0000470 -0.0004173 -0.0000470 200
Q 0.0138351 0.0147043 0.0120483 0.0026559 0.0111792 200
B 0.0429987 0.0425399 0.0430957 -0.0005559 0.0435546 200
g 0.5426337 0.5815942 0.5774444 0.0041498 0.5384839 200
gp 0.0240655 0.0253774 0.0253127 0.0000647 0.0240007 200
AUC 0.6555305 0.6707774 0.6465144 0.0242630 0.6312675 200
we see maxim um AUROC achi eved is 0.63

Lets now see performance of superstar AI algorithms

Random forest first

modelRF <- train(status_death ~., # Survived is a function of the variables we decided to include
                    data = stents11, # Use the trainSet dataframe as the training data
                    method = "rf",# Use the "random forest" algorithm
                    metric = "ROC",
                    trControl = fitControl)
library(knitr)
include_graphics('rf.png')

we see AUROC is 0.60. , below logistic regression.

Lets see explanatory variables-

varImpPlot(modelRF$finalModel)
library(knitr)
include_graphics('rf1.png')

while we get variable importance, we dont get effect sizes or statistical significance.

Lets see naive bayes now

modelNB <- train(status_death ~., # Survived is a function of the variables we decided to include
                    data = stentsml, # Use the trainSet dataframe as the training data
                    method = "nb",# Use the "naive bayes" algorithm
                    metric = "ROC",
                    trControl = fitControl
  )
library(knitr)
include_graphics('nb.png')

Here also AUROC is same with similar predictor variables

Thus logistic regression outperforms Random forest and naive bayes if performed with suitable optimisation accounting for non-linear veffects and penalization and also gives us effect sizes.

Key takeaway

1.Model of reality(cognitive) model must take priority over method and machine algorithms

2.Logistic regression is fast and can perform as better as ML algorithms or even better in small data set upto 5-10000 sample sizes(till there are many complex interactions)

3.Logistic regression is an explanatory model and hence can help us see effect sizes and avoid erroneous mistakes like stent number is associated with mortality just based on variable importance plot

4.EHR Databases while,great for indicating utilisation n audits (the purposes they were made for) are poor for outcome predction as they dont reflect Key clinical variables like(BP,killip class,pulse) eg and hence should be taken with a pinch of salt

5.While there have been genuine advances in Deep learning models, prediction algorithms are overhyped and good model n good data trumps everything

6.We should welcome the call for auditing and stenting registries in india and use of appropriate utisation criteria to benchmark the hospitals