library(nsqipColonICG)
library(ggplot2)
library(magrittr)
library(dplyr)
library(ggridges)
library(viridis)
library(hrbrthemes)
library(vcd)
library(tidyr)
df %>% mutate(comp = df$orgspcssi | df$wndinfd | df$col_anastomotic) %>%
ggplot() +
geom_bar(aes(x = comp))These plots demonstrate no weird discrepancies in the outcome variables. By tabling them:
table(df$col_anastomotic)
#>
#> FALSE TRUE
#> 161850 5610
table(df$orgspcssi)
#>
#> FALSE TRUE
#> 159615 7845
table(df$wndinfd)
#>
#> FALSE TRUE
#> 166195 1265
table(df$orgspcssi | df$wndinfd | df$col_anastomotic)
#>
#> FALSE TRUE
#> 157191 10269To get a clearer idea of how to the composite outcome breaks down:
df %>%
select(orgspcssi, wndinfd, col_anastomotic) %>%
group_by(orgspcssi, wndinfd, col_anastomotic) %>%
summarise(count = n())
#> `summarise()` regrouping output by 'orgspcssi', 'wndinfd' (override with `.groups` argument)
#> # A tibble: 8 x 4
#> # Groups: orgspcssi, wndinfd [4]
#> orgspcssi wndinfd col_anastomotic count
#> <lgl> <lgl> <lgl> <int>
#> 1 FALSE FALSE FALSE 157191
#> 2 FALSE FALSE TRUE 1268
#> 3 FALSE TRUE FALSE 955
#> 4 FALSE TRUE TRUE 201
#> 5 TRUE FALSE FALSE 3662
#> 6 TRUE FALSE TRUE 4074
#> 7 TRUE TRUE FALSE 42
#> 8 TRUE TRUE TRUE 67It seems like the vast majorty of adverse outcomes are due to organ-space SSI. Most of those patients with an organ-space SSI seem to have an associated leak. Also interesting that of people with a leak, most get an organ-space SSI. We can looks at this more closely:
df %>%
select(orgspcssi, col_anastomotic) %>%
group_by(col_anastomotic, orgspcssi) %>%
summarise(count = n()) %>%
mutate(rate = count/sum(count))
#> `summarise()` regrouping output by 'col_anastomotic' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: col_anastomotic [2]
#> col_anastomotic orgspcssi count rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 158146 0.977
#> 2 FALSE TRUE 3704 0.0229
#> 3 TRUE FALSE 1469 0.262
#> 4 TRUE TRUE 4141 0.738It seems approximately 74% of patients that have a leak will develop an organ-space SSI. This is good background for our hypothesis. If we can reduce rates of leak with ICG, we should theoretically reduce rates of organ-space SSI. We will see if this pans out in the data.
Given this is a contigency table, we can run a Chi-square test for independence:
chisq.test(table(df$col_anastomotic, df$orgspcssi))
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: table(df$col_anastomotic, df$orgspcssi)
#> X-squared = 62106, df = 1, p-value < 2.2e-16This significant p-value suggests we can reject the \(H_0\) hypothesis that anastomotic leaks are independent of organ-space SSIs. This makes clinical sense.
This section is to get an overview of the potential explanatory variables in our data and look for any discrepancies or outliers in our data. It should also generate additional questions.
df %>%
group_by(icg, col_anastomotic) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n))
#> `summarise()` regrouping output by 'icg' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: icg [2]
#> icg col_anastomotic n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 160781 0.966
#> 2 FALSE TRUE 5580 0.0335
#> 3 TRUE FALSE 1069 0.973
#> 4 TRUE TRUE 30 0.0273This table suggests a lower rate of patients have an anastomotic leak when using ICG.
A quick Chi-square test of independence:
chisq.test(table(df$icg, df$col_anastomotic))
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: table(df$icg, df$col_anastomotic)
#> X-squared = 1.1289, df = 1, p-value = 0.288There is not yet enough to reject the \(H_0\). We can check for organ-space surgical site infection too.
df %>% group_by(icg, orgspcssi) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n))
#> `summarise()` regrouping output by 'icg' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: icg [2]
#> icg orgspcssi n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 158564 0.953
#> 2 FALSE TRUE 7797 0.0469
#> 3 TRUE FALSE 1051 0.956
#> 4 TRUE TRUE 48 0.0437These rates look pretty close too. We can check a quick Chi-square:
chisq.test(table(df$icg, df$orgspcssi))
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: table(df$icg, df$orgspcssi)
#> X-squared = 0.18275, df = 1, p-value = 0.669There is not yet enough to reject the \(H_0\).
Let’s see if age is related to rates of anastomotic leak and organ-space SSI.
df %>%
ggplot(aes(x = age, y = col_anastomotic, fill = ..x..)) +
geom_density_ridges_gradient() +
scale_fill_viridis(option = "B") +
theme(legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8))
#> Picking joint bandwidth of 1.82
df %>%
ggplot(aes(x = age, y = orgspcssi, fill = ..x..)) +
geom_density_ridges_gradient() +
scale_fill_viridis(option = "B") +
theme(legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8))
#> Picking joint bandwidth of 1.78The distributions do not seem markedly different. Perhaps boxplots will convice you:
df %>%
ggplot(aes(x = age, y = col_anastomotic, group = col_anastomotic)) +
geom_boxplot()
#> Warning: Removed 1 rows containing non-finite values (stat_boxplot).
df %>%
ggplot(aes(x = age, y = orgspcssi, group = orgspcssi)) +
geom_boxplot()
#> Warning: Removed 1 rows containing non-finite values (stat_boxplot).Is age related to the choice to use ICG?
df %>%
ggplot(aes(x = age, y = icg, group = icg)) +
geom_boxplot()
#> Warning: Removed 1 rows containing non-finite values (stat_boxplot).It would appear not.
x <- glm(col_anastomotic ~ age, data = df, family = binomial(link = "logit"))
summary(x)
#>
#> Call:
#> glm(formula = col_anastomotic ~ age, family = binomial(link = "logit"),
#> data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.2828 -0.2651 -0.2598 -0.2550 2.6459
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.1311534 0.0565792 -55.341 < 2e-16 ***
#> age -0.0037610 0.0009002 -4.178 2.94e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 49135 on 167458 degrees of freedom
#> Residual deviance: 49118 on 167457 degrees of freedom
#> (1 observation deleted due to missingness)
#> AIC: 49122
#>
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept) age
#> 0.0436674 0.9962460x <- glm(orgspcssi ~ age, data = df, family = binomial(link = "logit"))
summary(x)
#>
#> Call:
#> glm(formula = orgspcssi ~ age, family = binomial(link = "logit"),
#> data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.3795 -0.3215 -0.3051 -0.2908 2.5824
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.4187676 0.0466476 -51.85 <2e-16 ***
#> age -0.0097715 0.0007559 -12.93 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 63342 on 167458 degrees of freedom
#> Residual deviance: 63177 on 167457 degrees of freedom
#> (1 observation deleted due to missingness)
#> AIC: 63181
#>
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept) age
#> 0.08903127 0.99027604Nonetheless, a simple unadjusted regression of age on the outcomes suggest increasing age has a very small but significant protective effect.
Pre-operative albumin seems likely to have an effect on anastomotic leaks.
First, we can look at the distrubtion of pre-operative albumin:
This seems fairly negatively skewed. We can check skewness:
Indeed it is. A Q-Q plot demonstrates this graphically:
We may be able to reduce the skew by transforming the variable.
moments::skewness(df$pralbum^(1/3), na.rm = TRUE)
#> [1] -1.305736
moments::skewness(df$pralbum^2, na.rm = TRUE)
#> [1] 0.200398A square transformation seems to reduce the skewness quite markedly, with the exception of the extreme values:
df %>%
ggplot(aes(x = pralbum^2, y = col_anastomotic, group = col_anastomotic)) +
geom_boxplot()
#> Warning: Removed 49801 rows containing non-finite values (stat_boxplot).
df %>%
ggplot(aes(x = pralbum^2, y = orgspcssi, group = orgspcssi)) +
geom_boxplot()
#> Warning: Removed 49801 rows containing non-finite values (stat_boxplot).We can regress the albumin squared in an unadjusted model.
x <- glm(col_anastomotic ~ pralbum^2, family = binomial(link = "logit"), data = df)
summary(x)
#>
#> Call:
#> glm(formula = col_anastomotic ~ pralbum^2, family = binomial(link = "logit"),
#> data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.4212 -0.2750 -0.2566 -0.2395 2.9171
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.02613 0.08566 -23.65 <2e-16 ***
#> pralbum -0.35146 0.02307 -15.24 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 35248 on 117658 degrees of freedom
#> Residual deviance: 35028 on 117657 degrees of freedom
#> (49801 observations deleted due to missingness)
#> AIC: 35032
#>
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept) pralbum
#> 0.1318453 0.7036626
x <- glm(orgspcssi ~ pralbum^2, family = binomial(link = "logit"), data = df)
summary(x)
#>
#> Call:
#> glm(formula = orgspcssi ~ pralbum^2, family = binomial(link = "logit"),
#> data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.6485 -0.3396 -0.3051 -0.2740 3.2386
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.90366 0.06753 -13.38 <2e-16 ***
#> pralbum -0.54878 0.01854 -29.60 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 47343 on 117658 degrees of freedom
#> Residual deviance: 46524 on 117657 degrees of freedom
#> (49801 observations deleted due to missingness)
#> AIC: 46528
#>
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept) pralbum
#> 0.4050849 0.5776557This makes sense. A 1 unit increase in albumin squared results in 30% decrease in the odds of having an anastomotic leak and nearly 50% decrease in the odds of an organ-space surgical site infection. This will definitely be in our match and model.
Diabetes seems like another likely risk factor for leak and infection.
df %>%
group_by(diabetes, col_anastomotic) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'diabetes' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: diabetes [2]
#> diabetes col_anastomotic n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 137318 0.967
#> 2 FALSE TRUE 4675 0.0329
#> 3 TRUE FALSE 24532 0.963
#> 4 TRUE TRUE 935 0.0367
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 9.4633, df = 1, p-value = 0.002096Despite these rates seeming close, there is enough of a sample size to result in a significant Chi-square test. Diabetes will definitely go in our match and our model.
df %>%
group_by(diabetes, orgspcssi) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'diabetes' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: diabetes [2]
#> diabetes orgspcssi n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 135404 0.954
#> 2 FALSE TRUE 6589 0.0464
#> 3 TRUE FALSE 24211 0.951
#> 4 TRUE TRUE 1256 0.0493
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 4.0443, df = 1, p-value = 0.04432The same is true as diabetes relates to organ-space SSI.
df %>%
group_by(insulin, col_anastomotic) %>%
mutate(insulin = tidyr::replace_na(insulin, FALSE)) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'insulin' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: insulin [2]
#> insulin col_anastomotic n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 153832 0.967
#> 2 FALSE TRUE 5296 0.0333
#> 3 TRUE FALSE 8018 0.962
#> 4 TRUE TRUE 314 0.0377
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 4.609, df = 1, p-value = 0.0318
df %>%
group_by(insulin, orgspcssi) %>%
mutate(insulin = tidyr::replace_na(insulin, FALSE)) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'insulin' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: insulin [2]
#> insulin orgspcssi n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 151749 0.954
#> 2 FALSE TRUE 7379 0.0464
#> 3 TRUE FALSE 7866 0.944
#> 4 TRUE TRUE 466 0.0559
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 15.983, df = 1, p-value = 6.39e-05Insulin is obviously highly correlated to the diagnosis of diabetes and it is likely there is an interaction between diabetes and insulin. We should check for an interaction effect:
tapply(df$col_anastomotic, list(df$diabetes, tidyr::replace_na(df$insulin, FALSE)), mean)
#> FALSE TRUE
#> FALSE 0.03292416 NA
#> TRUE 0.03624161 0.03768603
tapply(df$orgspcssi, list(df$diabetes, tidyr::replace_na(df$insulin, FALSE)), mean)
#> FALSE TRUE
#> FALSE 0.04640370 NA
#> TRUE 0.04610446 0.05592895The rows refer to diabetes and the columns to insulin. The values are the rates of anastomotic leak/organ-space SSI. This makes it clear that insulin contributes predominantly to the rise in organ-space SSI seen in patients with diabetes. The rate of organ-space SSI in patients with diabetes but not taking insulin is similar to the rate of organ-space SSI in patients without diabetes.
We will need to either just use insulin or include an interaction term in our model.
It is probably easiest to deal with height and weight as BMI. First we can check the distribution:
df %>%
mutate(bmi = nsqipBileSpill::bmi(height, weight)) %>%
ggplot(aes(x = bmi)) +
geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 2679 rows containing non-finite values (stat_bin).Seems to be right skewed:
We can correct positive skewness with a log transformation or square root transformation:
moments::skewness(log(nsqipBileSpill::bmi(df$height, df$weight)), na.rm = TRUE)
#> [1] 0.2802827
moments::skewness(sqrt(nsqipBileSpill::bmi(df$height, df$weight)), na.rm = TRUE)
#> [1] 0.6944194qqnorm(log(nsqipBileSpill::bmi(df$height, df$weight)))
qqline(log(nsqipBileSpill::bmi(df$height, df$weight)))df %>%
mutate(bmi = log(nsqipBileSpill::bmi(height, weight))) %>%
ggplot(aes(x = bmi, y = col_anastomotic, group = col_anastomotic)) +
geom_boxplot()
#> Warning: Removed 2679 rows containing non-finite values (stat_boxplot).
df %>%
mutate(bmi = log(nsqipBileSpill::bmi(height, weight))) %>%
ggplot(aes(x = bmi, y = orgspcssi, group = orgspcssi)) +
geom_boxplot()
#> Warning: Removed 2679 rows containing non-finite values (stat_boxplot).We can regress the log-transformed BMI in an unadjusted model.
x <- glm(col_anastomotic ~ log(nsqipBileSpill::bmi(height, weight)), family = binomial(link = "logit"), data = df)
summary(x)
#>
#> Call:
#> glm(formula = col_anastomotic ~ log(nsqipBileSpill::bmi(height,
#> weight)), family = binomial(link = "logit"), data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.2691 -0.2606 -0.2596 -0.2586 2.6275
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -3.55000 0.20508 -17.311 <2e-16
#> log(nsqipBileSpill::bmi(height, weight)) 0.05332 0.06145 0.868 0.386
#>
#> (Intercept) ***
#> log(nsqipBileSpill::bmi(height, weight))
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 47977 on 164780 degrees of freedom
#> Residual deviance: 47976 on 164779 degrees of freedom
#> (2679 observations deleted due to missingness)
#> AIC: 47980
#>
#> Number of Fisher Scoring iterations: 6
exp(coef(x))
#> (Intercept)
#> 0.02872473
#> log(nsqipBileSpill::bmi(height, weight))
#> 1.05476223
x <- glm(orgspcssi ~ log(nsqipBileSpill::bmi(height, weight)), family = binomial(link = "logit"), data = df)
summary(x)
#>
#> Call:
#> glm(formula = orgspcssi ~ log(nsqipBileSpill::bmi(height, weight)),
#> family = binomial(link = "logit"), data = df)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.3394 -0.3116 -0.3077 -0.3034 2.5585
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.4307 0.1753 -13.863 < 2e-16
#> log(nsqipBileSpill::bmi(height, weight)) -0.1789 0.0527 -3.394 0.00069
#>
#> (Intercept) ***
#> log(nsqipBileSpill::bmi(height, weight)) ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 61788 on 164780 degrees of freedom
#> Residual deviance: 61776 on 164779 degrees of freedom
#> (2679 observations deleted due to missingness)
#> AIC: 61780
#>
#> Number of Fisher Scoring iterations: 5
exp(coef(x))
#> (Intercept)
#> 0.08797511
#> log(nsqipBileSpill::bmi(height, weight))
#> 0.83622534Smoking seems a likely candidate as a good explanatory variable.
df %>%
group_by(smoke, col_anastomotic) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'smoke' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: smoke [2]
#> smoke col_anastomotic n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 135637 0.969
#> 2 FALSE TRUE 4323 0.0309
#> 3 TRUE FALSE 26213 0.953
#> 4 TRUE TRUE 1287 0.0468
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 179.25, df = 1, p-value < 2.2e-16
df %>%
group_by(smoke, orgspcssi) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'smoke' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: smoke [2]
#> smoke orgspcssi n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 133859 0.956
#> 2 FALSE TRUE 6101 0.0436
#> 3 TRUE FALSE 25756 0.937
#> 4 TRUE TRUE 1744 0.0634
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 201.91, df = 1, p-value < 2.2e-16I’d like to control for packs, but the data is missing for most of our records:
Alcohol would be a good predictor I think, but unfortunately the data is missing for most of our records:
fnstatus2 is almost always related to poor outcomes.
Given the relatively low numbers of totally dependent (which will get even smaller after inclusion/exclusion criteria and the match), I almost always have to reduce this variable to a boolean (Independent vs. not independent):
df %>%
mutate(dependent = fnstatus2 != "Independent") %>%
drop_na(dependent) %>%
group_by(dependent, col_anastomotic) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'dependent' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: dependent [2]
#> dependent col_anastomotic n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 157470 0.967
#> 2 FALSE TRUE 5393 0.0331
#> 3 TRUE FALSE 3722 0.951
#> 4 TRUE TRUE 193 0.0493
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 30.434, df = 1, p-value = 3.454e-08
df %>%
mutate(dependent = fnstatus2 != "Independent") %>%
drop_na(dependent) %>%
group_by(dependent, orgspcssi) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'dependent' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: dependent [2]
#> dependent orgspcssi n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 155348 0.954
#> 2 FALSE TRUE 7515 0.0461
#> 3 TRUE FALSE 3633 0.928
#> 4 TRUE TRUE 282 0.0720
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 56.912, df = 1, p-value = 4.558e-14We will include dependent in our model. Alternatively, these patients could excluded but we should be able to match and then adjust for the effect.
There are not many patients on ventilators. But I would guess its a significant risk factor.
df %>%
group_by(ventilat, col_anastomotic) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'ventilat' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: ventilat [2]
#> ventilat col_anastomotic n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 160995 0.967
#> 2 FALSE TRUE 5544 0.0333
#> 3 TRUE FALSE 855 0.928
#> 4 TRUE TRUE 66 0.0717
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 40.475, df = 1, p-value = 1.991e-10
df %>%
group_by(ventilat, orgspcssi) %>%
summarise(n = n()) %>%
mutate(rate = n/sum(n)) %>%
chi()
#> `summarise()` regrouping output by 'ventilat' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups: ventilat [2]
#> ventilat orgspcssi n rate
#> <lgl> <lgl> <int> <dbl>
#> 1 FALSE FALSE 158810 0.954
#> 2 FALSE TRUE 7729 0.0464
#> 3 TRUE FALSE 805 0.874
#> 4 TRUE TRUE 116 0.126
#>
#> Pearson's Chi-squared test with Yates' continuity correction
#>
#> data: .
#> X-squared = 128, df = 1, p-value < 2.2e-16Turns out this seems to be the biggest risk factor we’ve explored so far. Let’s explore any relationship between ICG and pre-operative ventilator use. Are surgeons using ICG more in patients on a ventilator, maybe because they have a high a priori suspicision?
Actually, ICG was only used on one ventilator-dependent patient. Unfortunately we won’t be able to make any inferences about the utility of ICG in this population.
ventilat will definitely go in our model and be matched for (unless you exclude these patients).
To speed up the rest, I’ll make a function and test all exposures simultaneously. This is only for categorical exposures because I am running Chi-square tests (so not age, height, weight, etc. We can test those separately):
testexposure <- function(df, exposure, outcome) {
exposure <- rlang::ensym(exposure)
outcome <- rlang::ensym(outcome)
x <- df %>%
drop_na(!!exposure) %>%
group_by(!!exposure, !!outcome) %>%
summarise(n = n()) %>%
pull(n) %>%
matrix(2, 2) %>%
chisq.test()
sig <- if(x$p.value < 0.001) {"***"} else if(x$p.value < 0.01) {"**"} else if(x$p.value < 0.05) {"*"} else {""}
tibble(exposure = deparse(substitute(!!exposure)),
n = sum(x$observed),
nExposureFALSE = sum(x$observed[1]),
outcomeExposureFALSE = x$observed[2,1]/sum(x$observed[,1]),
nExposureTRUE = sum(x$observed[,2]),
outcomeExposureTRUE = x$observed[2,2]/sum(x$observed[,2]),
p = x$p.value,
sig = sig)
}
testexposures <- function(df, ..., outcome) {
exposures <- rlang::ensyms(...)
outcome <- rlang::ensym(outcome)
purrr::map_dfr(exposures, function(x) {
df %>%
testexposure(!!x, !!outcome)
})
}
df %>%
mutate(dependent = fnstatus2 != "Independent",
insulin = tidyr::replace_na(insulin, FALSE),
lap = col_approach != "Open",
cancer = grepl("Colon cancer", col_indication)) %>%
testexposures(icg, insulin, smoke, dependent, ventilat, hxcopd, hxchf, hxmi, hxangina, hypermed, renafail, dialysis, discancr, wndinf, steroid, wtloss, bleeddis, transfus, chemo, radio, prsepis, emergncy, lap, col_steroid, col_mech_bowel_prep, col_oral_antibiotic, col_chemo, col_emergent, cancer, outcome = col_anastomotic) %>%
knitr::kable()| exposure | n | nExposureFALSE | outcomeExposureFALSE | nExposureTRUE | outcomeExposureTRUE | p | sig |
|---|---|---|---|---|---|---|---|
| icg | 167460 | 160781 | 0.0335415 | 1099 | 0.0272975 | 0.2880172 | |
| insulin | 167460 | 153832 | 0.0332814 | 8332 | 0.0376860 | 0.0318039 | * |
| smoke | 167460 | 135637 | 0.0308874 | 27500 | 0.0468000 | 0.0000000 | *** |
| dependent | 166778 | 157470 | 0.0331137 | 3915 | 0.0492976 | 0.0000000 | *** |
| ventilat | 167460 | 160995 | 0.0332895 | 921 | 0.0716612 | 0.0000000 | *** |
| hxcopd | 167460 | 153845 | 0.0329688 | 8370 | 0.0436081 | 0.0000002 | *** |
| hxchf | 167460 | 160182 | 0.0333420 | 1753 | 0.0484883 | 0.0005837 | *** |
| hxmi | 294 | 139 | 0.0544218 | 147 | 0.0544218 | 1.0000000 | |
| hxangina | 294 | 139 | 0.0544218 | 147 | 0.0544218 | 1.0000000 | |
| hypermed | 167460 | 84206 | 0.0327153 | 80406 | 0.0343507 | 0.0651112 | |
| renafail | 167460 | 161232 | 0.0333929 | 658 | 0.0607903 | 0.0001510 | *** |
| dialysis | 167460 | 160438 | 0.0332614 | 1502 | 0.0599201 | 0.0000000 | *** |
| discancr | 167460 | 152543 | 0.0325050 | 9792 | 0.0495302 | 0.0000000 | *** |
| wndinf | 167460 | 159536 | 0.0330331 | 2474 | 0.0646726 | 0.0000000 | *** |
| steroid | 167460 | 149773 | 0.0324679 | 12661 | 0.0461259 | 0.0000000 | *** |
| wtloss | 167460 | 155186 | 0.0326148 | 7042 | 0.0536779 | 0.0000000 | *** |
| bleeddis | 167460 | 155976 | 0.0327190 | 6208 | 0.0538015 | 0.0000000 | *** |
| transfus | 167460 | 158064 | 0.0331117 | 3983 | 0.0494602 | 0.0000000 | *** |
| chemo | 294 | 139 | 0.0544218 | 147 | 0.0544218 | 1.0000000 | |
| radio | 232 | 112 | 0.0344828 | 116 | 0.0344828 | 1.0000000 | |
| prsepis | 167460 | 151319 | 0.0316018 | 11203 | 0.0599839 | 0.0000000 | *** |
| emergncy | 167460 | 147145 | 0.0315713 | 15518 | 0.0523908 | 0.0000000 | *** |
| lap | 167434 | 46539 | 0.0478538 | 118556 | 0.0275819 | 0.0000000 | *** |
| col_steroid | 165708 | 151466 | 0.0327039 | 9121 | 0.0462669 | 0.0000000 | *** |
| col_mech_bowel_prep | 144430 | 49963 | 0.0452322 | 92100 | 0.0265689 | 0.0000000 | *** |
| col_oral_antibiotic | 146697 | 77603 | 0.0417608 | 65712 | 0.0234813 | 0.0000000 | *** |
| col_chemo | 165764 | 152056 | 0.0324026 | 8616 | 0.0522284 | 0.0000000 | *** |
| col_emergent | 6608 | 897 | 0.0567823 | 5657 | 0.0549761 | 0.8816393 | |
| cancer | 167460 | 91816 | 0.0345321 | 72360 | 0.0321448 | 0.0074578 | ** |
df %>%
mutate(dependent = fnstatus2 != "Independent",
insulin = tidyr::replace_na(insulin, FALSE),
lap = col_approach != "Open",
cancer = grepl("Colon cancer", col_indication)) %>%
testexposures(icg, insulin, smoke, dependent, ventilat, hxcopd, hxchf, hxmi, hxangina, hypermed, renafail, dialysis, discancr, wndinf, steroid, wtloss, bleeddis, transfus, chemo, radio, prsepis, emergncy, lap, col_steroid, col_mech_bowel_prep, col_oral_antibiotic, col_chemo, cancer, outcome = orgspcssi) %>%
knitr::kable()| exposure | n | nExposureFALSE | outcomeExposureFALSE | nExposureTRUE | outcomeExposureTRUE | p | sig |
|---|---|---|---|---|---|---|---|
| icg | 167460 | 158564 | 0.0468680 | 1099 | 0.0436761 | 0.6690174 | |
| insulin | 167460 | 151749 | 0.0463715 | 8332 | 0.0559289 | 0.0000639 | *** |
| smoke | 167460 | 133859 | 0.0435910 | 27500 | 0.0634182 | 0.0000000 | *** |
| dependent | 166778 | 155348 | 0.0461431 | 3915 | 0.0720307 | 0.0000000 | *** |
| ventilat | 167460 | 158810 | 0.0464095 | 921 | 0.1259501 | 0.0000000 | *** |
| hxcopd | 167460 | 151749 | 0.0461437 | 8370 | 0.0602151 | 0.0000000 | *** |
| hxchf | 167460 | 157972 | 0.0466788 | 1753 | 0.0627496 | 0.0018663 | ** |
| hxmi | 294 | 140 | 0.0476190 | 147 | 0.0476190 | 1.0000000 | |
| hxangina | 294 | 140 | 0.0476190 | 147 | 0.0476190 | 1.0000000 | |
| hypermed | 167460 | 82852 | 0.0482689 | 80406 | 0.0453076 | 0.0043230 | ** |
| renafail | 167460 | 159031 | 0.0465882 | 658 | 0.1124620 | 0.0000000 | *** |
| dialysis | 167460 | 158252 | 0.0464334 | 1502 | 0.0925433 | 0.0000000 | *** |
| discancr | 167460 | 150600 | 0.0448284 | 9792 | 0.0793505 | 0.0000000 | *** |
| wndinf | 167460 | 157425 | 0.0458281 | 2474 | 0.1147939 | 0.0000000 | *** |
| steroid | 167460 | 147940 | 0.0443091 | 12661 | 0.0778769 | 0.0000000 | *** |
| wtloss | 167460 | 153121 | 0.0454874 | 7042 | 0.0778188 | 0.0000000 | *** |
| bleeddis | 167460 | 153895 | 0.0456242 | 6208 | 0.0786082 | 0.0000000 | *** |
| transfus | 167460 | 155918 | 0.0462389 | 3983 | 0.0718052 | 0.0000000 | *** |
| chemo | 294 | 140 | 0.0476190 | 147 | 0.0476190 | 1.0000000 | |
| radio | 232 | 113 | 0.0258621 | 116 | 0.0258621 | 1.0000000 | |
| prsepis | 167460 | 149815 | 0.0412270 | 11203 | 0.1252343 | 0.0000000 | *** |
| emergncy | 167460 | 145561 | 0.0419963 | 15518 | 0.0943421 | 0.0000000 | *** |
| lap | 167434 | 45233 | 0.0745734 | 118556 | 0.0354179 | 0.0000000 | *** |
| col_steroid | 165708 | 149524 | 0.0451059 | 9121 | 0.0792676 | 0.0000000 | *** |
| col_mech_bowel_prep | 144430 | 48689 | 0.0695777 | 92100 | 0.0353529 | 0.0000000 | *** |
| col_oral_antibiotic | 146697 | 76117 | 0.0601099 | 65712 | 0.0326729 | 0.0000000 | *** |
| col_chemo | 165764 | 150037 | 0.0452503 | 8616 | 0.0763695 | 0.0000000 | *** |
| cancer | 167460 | 90103 | 0.0525447 | 72360 | 0.0393588 | 0.0000000 | *** |
It is clear that many of these exposures are related to both of these outcomes. Given that organ-space SSI and anastomotic leak are inextricably linked (leak is almost surely a significant risk factor for organ-space SSI), it doesn’t really make sense to look at the impact of ICG on both outcomes. This doesn’t make sense clinically. No one would use ICG to reduce rates of organ-space SSI, they would use it to reduce rates of anastomotic leak. To avoid this weird interaction between your two outcomes, I would narrow the question to a single outcome - does ICG reduce rates of anastomotic leak? I think this is clearer study and will be much simpler for readers to understand. It also aligns with the ICG use case.
From here on out, I will be considering anastomotic leak as the only outcome.
There are a lot of variables in the above analysis that are significant but there are probably some interactions, so we should dig into it. This is definitely non-exhaustive and the decision to investigate various variables or interaction effects requires some clinical knowledge, so if you think I have missed something, please let me know.
I figure that patients with cancer are getting elective surgery, so they have time for oral antibiotics and mechanical preparation. This should explain why cancer seems to be associated with lower rates of anastomotic leak.
interactions <- function(outcome, exposure1, exposure2) {
x <- tapply(outcome, list(exposure1, exposure2), mean)
names(dimnames(x)) <- c(deparse(substitute(exposure1)), deparse(substitute(exposure2)))
x
}
interactions(df$col_anastomotic, grepl("Colon cancer", df$col_indication), df$col_oral_antibiotic)
#> df$col_oral_antibiotic
#> grepl("Colon cancer", df$col_indication) FALSE TRUE
#> FALSE 0.04305001 0.02352713
#> TRUE 0.03996812 0.02342507Interestingly, it seems like cancer has a benefit in the absence of oral antibiotic prep and that the oral antibiotic preparation acutally mitigates the negative effect of not having cancer.
interactions(df$col_anastomotic, grepl("Colon cancer", df$col_indication), df$col_mech_bowel_prep)
#> df$col_mech_bowel_prep
#> grepl("Colon cancer", df$col_indication) FALSE TRUE
#> FALSE 0.04670531 0.02605949
#> TRUE 0.04276215 0.02715016The same seems to be true for mechanical bowel preparation.
Maybe the “benefit” of cancer is present because actually most patients with an indication of cancer are having surgery electively rather than emergently.
interactions(df$col_anastomotic, grepl("Colon cancer", df$col_indication), df$emergncy)
#> df$emergncy
#> grepl("Colon cancer", df$col_indication) FALSE TRUE
#> FALSE 0.03182400 0.05286812
#> TRUE 0.03126809 0.05060976
interactions(df$col_anastomotic, grepl("Colon cancer", df$col_indication), df$electsurg)
#> df$electsurg
#> grepl("Colon cancer", df$col_indication) FALSE TRUE
#> FALSE 0.05246723 0.02838033
#> TRUE 0.04220976 0.02991308Unfortunately, emergency vs. elective surgery does not seem to explain the interaction.
Maybe steroid use is higher in non-cancer cases (such as for IBD) and relatively low in cancer cases. This might lead to higher rates of breakdown in the non-cancer cases and a relative protective effect of cancer.
interactions(df$col_anastomotic, grepl("Colon cancer", df$col_indication), df$col_steroid)
#> df$col_steroid
#> grepl("Colon cancer", df$col_indication) FALSE TRUE
#> FALSE 0.03333917 0.04684535
#> TRUE 0.03193649 0.03918723Still no good explanation.
interactions(df$col_anastomotic, grepl("Colon cancer", df$col_indication), df$discancr)
#> df$discancr
#> grepl("Colon cancer", df$col_indication) FALSE TRUE
#> FALSE 0.03403294 0.05570776
#> TRUE 0.03031286 0.04775059Well I’m not sure why cancer seems to have a protective effect. Maybe you have other ideas. I think it might be simply due to what the other indications are:
unique(df$col_indication)
#> [1] "Colon cancer" "Chrohn's disease"
#> [3] "Other" "Acute diverticulitis"
#> [5] "Chronic diverticular disease" "Ulcerative colitis"
#> [7] "Non-malignant polyp" "Colon cancer w/ obstruction"
#> [9] "Volvulus" "Bleeding"
#> [11] NA "Enterocolitis"The other conditions are pretty nasty conditions that probably lead to inflammation or devascularation at the anastomotic site. Cancer is a relatively well localized lesion with other healthy tissue at the anastomosis. Maybe this explains why cancer has a relative protective effect.
interactions(df$col_anastomotic, df$col_oral_antibiotic, df$col_mech_bowel_prep)
#> df$col_mech_bowel_prep
#> df$col_oral_antibiotic FALSE TRUE
#> FALSE 0.04695034 0.03399189
#> TRUE 0.03489771 0.02215336There is certainly an interaction to account for here. Patients that have both antibiotic and mechanical prepration had the lowest rates of anastomotic leak and patients that had neither had the highest. This interaction will definitely need to be controlled for.
interactions(df$col_anastomotic, df$renafail, df$dialysis)
#> df$dialysis
#> df$renafail FALSE TRUE
#> FALSE 0.03320101 0.05791506
#> TRUE 0.05543237 0.07246377There is an interaction to account for here. Patients that have both acute renal failure and dialysis have the highest rates. Patients with neither have the lowest. Patients having one or the other are in-between at relatively equal rates. This makes clinical sense, which is reassuring. This interaction will definitely need to be controlled for.