Setup
Load Packages
We start by loading the necessary packages for our analysis.
Data Import and Cleaning
We import the RHC dataset and clean it by renaming columns and transforming variables.
# Read the CSV file, clean column names, and transform variables
rhc <-
read_csv(here("data/rhc.csv")) %>% # Read the data
clean_names() %>% # Clean column names
transmute(
age, # Keep age as is
sex = as.factor(sex), # Convert sex to factor
cat1 = as.factor(cat1), # Convert cat1 to factor
meanbp1, # Keep meanbp1 as is
death = dth30, # Rename dth30 to death
swang1 = as.factor(swang1) # Convert swang1 to factor
) %>%
mutate(death = if_else(death == "Yes", "Died", "Did not die")) # Recode death variableSummary Statistics
Let’s take a look at the summary statistics of the cleaned data.
| Characteristic | N = 5,7351 |
|---|---|
| age | 64 (50, 74) |
| sex | |
| Female | 2,543 (44%) |
| Male | 3,192 (56%) |
| cat1 | |
| ARF | 2,490 (43%) |
| CHF | 456 (8.0%) |
| Cirrhosis | 224 (3.9%) |
| Colon Cancer | 7 (0.1%) |
| Coma | 436 (7.6%) |
| COPD | 457 (8.0%) |
| Lung Cancer | 39 (0.7%) |
| MOSF w/Malignancy | 399 (7.0%) |
| MOSF w/Sepsis | 1,227 (21%) |
| meanbp1 | 63 (50, 115) |
| death | |
| Did not die | 3,817 (67%) |
| Died | 1,918 (33%) |
| swang1 | |
| No RHC | 3,551 (62%) |
| RHC | 2,184 (38%) |
| 1 Median (IQR); n (%) | |
Data Preparation for Regression
We prepare the data for logistic regression by selecting relevant variables and recoding the ‘death’ variable.
Logistic Regression Model
We fit a logistic regression model to predict the likelihood of death based on several predictors.
Regression Table
A decent regression table for interpretation.
| Characteristic | OR1 | 95% CI1 | p-value |
|---|---|---|---|
| swang1 | |||
| No RHC | — | — | |
| RHC | 1.44 | 1.27, 1.63 | <0.001 |
| age | 1.02 | 1.01, 1.02 | <0.001 |
| sex | |||
| Female | — | — | |
| Male | 1.11 | 0.99, 1.25 | 0.070 |
| meanbp1 | 1.0 | 0.99, 1.00 | <0.001 |
| cat1 | |||
| ARF | — | — | |
| CHF | 0.31 | 0.23, 0.41 | <0.001 |
| Cirrhosis | 2.05 | 1.54, 2.73 | <0.001 |
| Colon Cancer | 0.40 | 0.02, 2.37 | 0.4 |
| Coma | 5.26 | 4.21, 6.60 | <0.001 |
| COPD | 0.50 | 0.39, 0.65 | <0.001 |
| Lung Cancer | 1.60 | 0.81, 3.06 | 0.2 |
| MOSF w/Malignancy | 2.83 | 2.27, 3.53 | <0.001 |
| MOSF w/Sepsis | 1.10 | 0.95, 1.28 | 0.2 |
| 1 OR = Odds Ratio, CI = Confidence Interval | |||
Model Report
A nice package for gettin a quick text summary of the model.
## We fitted a logistic model (estimated using ML) to predict death with swang1,
## age, sex, meanbp1 and cat1 (formula: death ~ swang1 + age + sex + meanbp1 +
## cat1). The model's explanatory power is weak (Tjur's R2 = 0.11). The model's
## intercept, corresponding to swang1 = No RHC, age = 0, sex = Female, meanbp1 = 0
## and cat1 = ARF, is at -1.66 (95% CI [-1.96, -1.36], p < .001). Within this
## model:
##
## - The effect of swang1 [RHC] is statistically significant and positive (beta =
## 0.36, 95% CI [0.24, 0.49], p < .001; Std. beta = 0.36, 95% CI [0.24, 0.49])
## - The effect of age is statistically significant and positive (beta = 0.02, 95%
## CI [0.01, 0.02], p < .001; Std. beta = 0.27, 95% CI [0.21, 0.33])
## - The effect of sex [Male] is statistically non-significant and positive (beta
## = 0.11, 95% CI [-8.74e-03, 0.23], p = 0.070; Std. beta = 0.11, 95% CI
## [-8.74e-03, 0.23])
## - The effect of meanbp1 is statistically significant and negative (beta =
## -5.09e-03, 95% CI [-6.73e-03, -3.47e-03], p < .001; Std. beta = -0.19, 95% CI
## [-0.26, -0.13])
## - The effect of cat1 [CHF] is statistically significant and negative (beta =
## -1.17, 95% CI [-1.47, -0.88], p < .001; Std. beta = -1.17, 95% CI [-1.47,
## -0.88])
## - The effect of cat1 [Cirrhosis] is statistically significant and positive
## (beta = 0.72, 95% CI [0.43, 1.00], p < .001; Std. beta = 0.72, 95% CI [0.43,
## 1.00])
## - The effect of cat1 [Colon Cancer] is statistically non-significant and
## negative (beta = -0.92, 95% CI [-3.87, 0.86], p = 0.396; Std. beta = -0.92, 95%
## CI [-3.87, 0.86])
## - The effect of cat1 [Coma] is statistically significant and positive (beta =
## 1.66, 95% CI [1.44, 1.89], p < .001; Std. beta = 1.66, 95% CI [1.44, 1.89])
## - The effect of cat1 [COPD] is statistically significant and negative (beta =
## -0.69, 95% CI [-0.95, -0.43], p < .001; Std. beta = -0.69, 95% CI [-0.95,
## -0.43])
## - The effect of cat1 [Lung Cancer] is statistically non-significant and
## positive (beta = 0.47, 95% CI [-0.21, 1.12], p = 0.159; Std. beta = 0.47, 95%
## CI [-0.21, 1.12])
## - The effect of cat1 [MOSF w/Malignancy] is statistically significant and
## positive (beta = 1.04, 95% CI [0.82, 1.26], p < .001; Std. beta = 1.04, 95% CI
## [0.82, 1.26])
## - The effect of cat1 [MOSF w/Sepsis] is statistically non-significant and
## positive (beta = 0.10, 95% CI [-0.05, 0.25], p = 0.209; Std. beta = 0.10, 95%
## CI [-0.05, 0.25])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald z-distribution approximation.
R Summary
R’s built-in summary function. Not very nice output though.
##
## Call:
## glm(formula = death ~ swang1 + age + sex + meanbp1 + cat1, family = binomial,
## data = rhc_for_reg)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8844 -0.8972 -0.6556 1.1881 2.3543
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.6609844 0.1536829 -10.808 < 2e-16 ***
## swang1RHC 0.3617283 0.0636420 5.684 1.32e-08 ***
## age 0.0162078 0.0018522 8.751 < 2e-16 ***
## sexMale 0.1088107 0.0600403 1.812 0.0699 .
## meanbp1 -0.0050916 0.0008314 -6.124 9.10e-10 ***
## cat1CHF -1.1666531 0.1487717 -7.842 4.44e-15 ***
## cat1Cirrhosis 0.7192002 0.1457946 4.933 8.10e-07 ***
## cat1Colon Cancer -0.9214230 1.0862500 -0.848 0.3963
## cat1Coma 1.6603245 0.1143395 14.521 < 2e-16 ***
## cat1COPD -0.6872323 0.1332778 -5.156 2.52e-07 ***
## cat1Lung Cancer 0.4727755 0.3353957 1.410 0.1587
## cat1MOSF w/Malignancy 1.0395008 0.1122223 9.263 < 2e-16 ***
## cat1MOSF w/Sepsis 0.0967485 0.0769735 1.257 0.2088
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7309.6 on 5734 degrees of freedom
## Residual deviance: 6689.3 on 5722 degrees of freedom
## AIC: 6715.3
##
## Number of Fisher Scoring iterations: 4