Exploratory Data Analysis

library(nsqipColonICG)
library(ggplot2)
library(magrittr)
library(dplyr)
library(ggridges)
library(viridis)
library(hrbrthemes)
library(vcd)
library(tidyr)

Single variable plots

Index plots

These plots demonstrate no weird discrepancies in the outcome variables. By tabling them:

To get a clearer idea of how to the composite outcome breaks down:

It seems like the vast majorty of adverse outcomes are due to organ-space SSI. Most of those patients with an organ-space SSI seem to have an associated leak. Also interesting that of people with a leak, most get an organ-space SSI. We can looks at this more closely:

It seems approximately 74% of patients that have a leak will develop an organ-space SSI. This is good background for our hypothesis. If we can reduce rates of leak with ICG, we should theoretically reduce rates of organ-space SSI. We will see if this pans out in the data.

Given this is a contigency table, we can run a Chi-square test for independence:

This significant p-value suggests we can reject the \(H_0\) hypothesis that anastomotic leaks are independent of organ-space SSIs. This makes clinical sense.

Explanatory variables

This section is to get an overview of the potential explanatory variables in our data and look for any discrepancies or outliers in our data. It should also generate additional questions.

Age

Let’s see if age is related to rates of anastomotic leak and organ-space SSI.

The distributions do not seem markedly different. Perhaps boxplots will convice you:

Is age related to the choice to use ICG?

It would appear not.

Nonetheless, a simple unadjusted regression of age on the outcomes suggest increasing age has a very small but significant protective effect.

Pre-operative albumin

Pre-operative albumin seems likely to have an effect on anastomotic leaks.

First, we can look at the distrubtion of pre-operative albumin:

This seems fairly negatively skewed. We can check skewness:

Indeed it is. A Q-Q plot demonstrates this graphically:

We may be able to reduce the skew by transforming the variable.

A square transformation seems to reduce the skewness quite markedly, with the exception of the extreme values:

We can regress the albumin squared in an unadjusted model.

x <- glm(col_anastomotic ~ pralbum^2, family = binomial(link = "logit"), data = df)
summary(x)
#> 
#> Call:
#> glm(formula = col_anastomotic ~ pralbum^2, family = binomial(link = "logit"), 
#>     data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.4212  -0.2750  -0.2566  -0.2395   2.9171  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -2.02613    0.08566  -23.65   <2e-16 ***
#> pralbum     -0.35146    0.02307  -15.24   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 35248  on 117658  degrees of freedom
#> Residual deviance: 35028  on 117657  degrees of freedom
#>   (49801 observations deleted due to missingness)
#> AIC: 35032
#> 
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept)     pralbum 
#>   0.1318453   0.7036626

x <- glm(orgspcssi ~ pralbum^2, family = binomial(link = "logit"), data = df)
summary(x)
#> 
#> Call:
#> glm(formula = orgspcssi ~ pralbum^2, family = binomial(link = "logit"), 
#>     data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.6485  -0.3396  -0.3051  -0.2740   3.2386  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -0.90366    0.06753  -13.38   <2e-16 ***
#> pralbum     -0.54878    0.01854  -29.60   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 47343  on 117658  degrees of freedom
#> Residual deviance: 46524  on 117657  degrees of freedom
#>   (49801 observations deleted due to missingness)
#> AIC: 46528
#> 
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept)     pralbum 
#>   0.4050849   0.5776557

This makes sense. A 1 unit increase in albumin squared results in 30% decrease in the odds of having an anastomotic leak and nearly 50% decrease in the odds of an organ-space surgical site infection. This will definitely be in our match and model.

Diabetes

Diabetes seems like another likely risk factor for leak and infection.

Despite these rates seeming close, there is enough of a sample size to result in a significant Chi-square test. Diabetes will definitely go in our match and our model.

The same is true as diabetes relates to organ-space SSI.

Insulin

Insulin is obviously highly correlated to the diagnosis of diabetes and it is likely there is an interaction between diabetes and insulin. We should check for an interaction effect:

The rows refer to diabetes and the columns to insulin. The values are the rates of anastomotic leak/organ-space SSI. This makes it clear that insulin contributes predominantly to the rise in organ-space SSI seen in patients with diabetes. The rate of organ-space SSI in patients with diabetes but not taking insulin is similar to the rate of organ-space SSI in patients without diabetes.

We will need to either just use insulin or include an interaction term in our model.

Height/Weight

It is probably easiest to deal with height and weight as BMI. First we can check the distribution:

Seems to be right skewed:

We can correct positive skewness with a log transformation or square root transformation:

We can regress the log-transformed BMI in an unadjusted model.

x <- glm(col_anastomotic ~ log(nsqipBileSpill::bmi(height, weight)), family = binomial(link = "logit"), data = df)
summary(x)
#> 
#> Call:
#> glm(formula = col_anastomotic ~ log(nsqipBileSpill::bmi(height, 
#>     weight)), family = binomial(link = "logit"), data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.2691  -0.2606  -0.2596  -0.2586   2.6275  
#> 
#> Coefficients:
#>                                          Estimate Std. Error z value Pr(>|z|)
#> (Intercept)                              -3.55000    0.20508 -17.311   <2e-16
#> log(nsqipBileSpill::bmi(height, weight))  0.05332    0.06145   0.868    0.386
#>                                             
#> (Intercept)                              ***
#> log(nsqipBileSpill::bmi(height, weight))    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 47977  on 164780  degrees of freedom
#> Residual deviance: 47976  on 164779  degrees of freedom
#>   (2679 observations deleted due to missingness)
#> AIC: 47980
#> 
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#>                              (Intercept) 
#>                               0.02872473 
#> log(nsqipBileSpill::bmi(height, weight)) 
#>                               1.05476223

x <- glm(orgspcssi ~ log(nsqipBileSpill::bmi(height, weight)), family = binomial(link = "logit"), data = df)
summary(x)
#> 
#> Call:
#> glm(formula = orgspcssi ~ log(nsqipBileSpill::bmi(height, weight)), 
#>     family = binomial(link = "logit"), data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.3394  -0.3116  -0.3077  -0.3034   2.5585  
#> 
#> Coefficients:
#>                                          Estimate Std. Error z value Pr(>|z|)
#> (Intercept)                               -2.4307     0.1753 -13.863  < 2e-16
#> log(nsqipBileSpill::bmi(height, weight))  -0.1789     0.0527  -3.394  0.00069
#>                                             
#> (Intercept)                              ***
#> log(nsqipBileSpill::bmi(height, weight)) ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 61788  on 164780  degrees of freedom
#> Residual deviance: 61776  on 164779  degrees of freedom
#>   (2679 observations deleted due to missingness)
#> AIC: 61780
#> 
#> Number of Fisher Scoring iterations: 5
exp(coef(x))
#>                              (Intercept) 
#>                               0.08797511 
#> log(nsqipBileSpill::bmi(height, weight)) 
#>                               0.83622534

Smoking

Smoking seems a likely candidate as a good explanatory variable.

I’d like to control for packs, but the data is missing for most of our records:

Alcohol

Alcohol would be a good predictor I think, but unfortunately the data is missing for most of our records:

Pre-operative functional status

fnstatus2 is almost always related to poor outcomes.

Given the relatively low numbers of totally dependent (which will get even smaller after inclusion/exclusion criteria and the match), I almost always have to reduce this variable to a boolean (Independent vs. not independent):

We will include dependent in our model. Alternatively, these patients could excluded but we should be able to match and then adjust for the effect.

Pre-operative ventilator use

There are not many patients on ventilators. But I would guess its a significant risk factor.

Turns out this seems to be the biggest risk factor we’ve explored so far. Let’s explore any relationship between ICG and pre-operative ventilator use. Are surgeons using ICG more in patients on a ventilator, maybe because they have a high a priori suspicision?

Actually, ICG was only used on one ventilator-dependent patient. Unfortunately we won’t be able to make any inferences about the utility of ICG in this population.

ventilat will definitely go in our model and be matched for (unless you exclude these patients).

Other variables

To speed up the rest, I’ll make a function and test all exposures simultaneously. This is only for categorical exposures because I am running Chi-square tests (so not age, height, weight, etc. We can test those separately):

exposure n nExposureFALSE outcomeExposureFALSE nExposureTRUE outcomeExposureTRUE p sig
icg 167460 160781 0.0335415 1099 0.0272975 0.2880172
insulin 167460 153832 0.0332814 8332 0.0376860 0.0318039 *
smoke 167460 135637 0.0308874 27500 0.0468000 0.0000000 ***
dependent 166778 157470 0.0331137 3915 0.0492976 0.0000000 ***
ventilat 167460 160995 0.0332895 921 0.0716612 0.0000000 ***
hxcopd 167460 153845 0.0329688 8370 0.0436081 0.0000002 ***
hxchf 167460 160182 0.0333420 1753 0.0484883 0.0005837 ***
hxmi 294 139 0.0544218 147 0.0544218 1.0000000
hxangina 294 139 0.0544218 147 0.0544218 1.0000000
hypermed 167460 84206 0.0327153 80406 0.0343507 0.0651112
renafail 167460 161232 0.0333929 658 0.0607903 0.0001510 ***
dialysis 167460 160438 0.0332614 1502 0.0599201 0.0000000 ***
discancr 167460 152543 0.0325050 9792 0.0495302 0.0000000 ***
wndinf 167460 159536 0.0330331 2474 0.0646726 0.0000000 ***
steroid 167460 149773 0.0324679 12661 0.0461259 0.0000000 ***
wtloss 167460 155186 0.0326148 7042 0.0536779 0.0000000 ***
bleeddis 167460 155976 0.0327190 6208 0.0538015 0.0000000 ***
transfus 167460 158064 0.0331117 3983 0.0494602 0.0000000 ***
chemo 294 139 0.0544218 147 0.0544218 1.0000000
radio 232 112 0.0344828 116 0.0344828 1.0000000
prsepis 167460 151319 0.0316018 11203 0.0599839 0.0000000 ***
emergncy 167460 147145 0.0315713 15518 0.0523908 0.0000000 ***
lap 167434 46539 0.0478538 118556 0.0275819 0.0000000 ***
col_steroid 165708 151466 0.0327039 9121 0.0462669 0.0000000 ***
col_mech_bowel_prep 144430 49963 0.0452322 92100 0.0265689 0.0000000 ***
col_oral_antibiotic 146697 77603 0.0417608 65712 0.0234813 0.0000000 ***
col_chemo 165764 152056 0.0324026 8616 0.0522284 0.0000000 ***
col_emergent 6608 897 0.0567823 5657 0.0549761 0.8816393
cancer 167460 91816 0.0345321 72360 0.0321448 0.0074578 **
exposure n nExposureFALSE outcomeExposureFALSE nExposureTRUE outcomeExposureTRUE p sig
icg 167460 158564 0.0468680 1099 0.0436761 0.6690174
insulin 167460 151749 0.0463715 8332 0.0559289 0.0000639 ***
smoke 167460 133859 0.0435910 27500 0.0634182 0.0000000 ***
dependent 166778 155348 0.0461431 3915 0.0720307 0.0000000 ***
ventilat 167460 158810 0.0464095 921 0.1259501 0.0000000 ***
hxcopd 167460 151749 0.0461437 8370 0.0602151 0.0000000 ***
hxchf 167460 157972 0.0466788 1753 0.0627496 0.0018663 **
hxmi 294 140 0.0476190 147 0.0476190 1.0000000
hxangina 294 140 0.0476190 147 0.0476190 1.0000000
hypermed 167460 82852 0.0482689 80406 0.0453076 0.0043230 **
renafail 167460 159031 0.0465882 658 0.1124620 0.0000000 ***
dialysis 167460 158252 0.0464334 1502 0.0925433 0.0000000 ***
discancr 167460 150600 0.0448284 9792 0.0793505 0.0000000 ***
wndinf 167460 157425 0.0458281 2474 0.1147939 0.0000000 ***
steroid 167460 147940 0.0443091 12661 0.0778769 0.0000000 ***
wtloss 167460 153121 0.0454874 7042 0.0778188 0.0000000 ***
bleeddis 167460 153895 0.0456242 6208 0.0786082 0.0000000 ***
transfus 167460 155918 0.0462389 3983 0.0718052 0.0000000 ***
chemo 294 140 0.0476190 147 0.0476190 1.0000000
radio 232 113 0.0258621 116 0.0258621 1.0000000
prsepis 167460 149815 0.0412270 11203 0.1252343 0.0000000 ***
emergncy 167460 145561 0.0419963 15518 0.0943421 0.0000000 ***
lap 167434 45233 0.0745734 118556 0.0354179 0.0000000 ***
col_steroid 165708 149524 0.0451059 9121 0.0792676 0.0000000 ***
col_mech_bowel_prep 144430 48689 0.0695777 92100 0.0353529 0.0000000 ***
col_oral_antibiotic 146697 76117 0.0601099 65712 0.0326729 0.0000000 ***
col_chemo 165764 150037 0.0452503 8616 0.0763695 0.0000000 ***
cancer 167460 90103 0.0525447 72360 0.0393588 0.0000000 ***

It is clear that many of these exposures are related to both of these outcomes. Given that organ-space SSI and anastomotic leak are inextricably linked (leak is almost surely a significant risk factor for organ-space SSI), it doesn’t really make sense to look at the impact of ICG on both outcomes. This doesn’t make sense clinically. No one would use ICG to reduce rates of organ-space SSI, they would use it to reduce rates of anastomotic leak. To avoid this weird interaction between your two outcomes, I would narrow the question to a single outcome - does ICG reduce rates of anastomotic leak? I think this is clearer study and will be much simpler for readers to understand. It also aligns with the ICG use case.

From here on out, I will be considering anastomotic leak as the only outcome.

There are a lot of variables in the above analysis that are significant but there are probably some interactions, so we should dig into it. This is definitely non-exhaustive and the decision to investigate various variables or interaction effects requires some clinical knowledge, so if you think I have missed something, please let me know.

Colon cancer vs. oral antibiotics/mechanical preparation

I figure that patients with cancer are getting elective surgery, so they have time for oral antibiotics and mechanical preparation. This should explain why cancer seems to be associated with lower rates of anastomotic leak.

Interestingly, it seems like cancer has a benefit in the absence of oral antibiotic prep and that the oral antibiotic preparation acutally mitigates the negative effect of not having cancer.

The same seems to be true for mechanical bowel preparation.

Cancer vs. emergent surgery

Maybe the “benefit” of cancer is present because actually most patients with an indication of cancer are having surgery electively rather than emergently.

Unfortunately, emergency vs. elective surgery does not seem to explain the interaction.

Cancer vs. steroids

Maybe steroid use is higher in non-cancer cases (such as for IBD) and relatively low in cancer cases. This might lead to higher rates of breakdown in the non-cancer cases and a relative protective effect of cancer.

Still no good explanation.

Cancer vs. chemotherapy

Cancer vs. disseminated cancer

Well I’m not sure why cancer seems to have a protective effect. Maybe you have other ideas. I think it might be simply due to what the other indications are:

The other conditions are pretty nasty conditions that probably lead to inflammation or devascularation at the anastomotic site. Cancer is a relatively well localized lesion with other healthy tissue at the anastomosis. Maybe this explains why cancer has a relative protective effect.

Mechanical bowel preparation and oral antibiotic

There is certainly an interaction to account for here. Patients that have both antibiotic and mechanical prepration had the lowest rates of anastomotic leak and patients that had neither had the highest. This interaction will definitely need to be controlled for.

Renal failure and dialysis

There is an interaction to account for here. Patients that have both acute renal failure and dialysis have the highest rates. Patients with neither have the lowest. Patients having one or the other are in-between at relatively equal rates. This makes clinical sense, which is reassuring. This interaction will definitely need to be controlled for.