Setup

Load Packages

We start by loading the necessary packages for our analysis.

# Check if 'pacman' is installed; if not, install it
if(!require(pacman)) install.packages("pacman")

# Load required packages using 'pacman'
pacman::p_load(tidyverse, here, janitor, gtsummary, report)

Data Import and Cleaning

We import the RHC dataset and clean it by renaming columns and transforming variables.

# Read the CSV file, clean column names, and transform variables
rhc <- 
  read_csv(here("data/rhc.csv")) %>%  # Read the data
  clean_names() %>%  # Clean column names
  transmute(
    age,  # Keep age as is
    sex = as.factor(sex),  # Convert sex to factor
    cat1 = as.factor(cat1),  # Convert cat1 to factor
    meanbp1,  # Keep meanbp1 as is
    death = dth30,  # Rename dth30 to death
    swang1 = as.factor(swang1)  # Convert swang1 to factor
  ) %>% 
  mutate(death = if_else(death == "Yes", "Died", "Did not die"))  # Recode death variable

Summary Statistics

Let’s take a look at the summary statistics of the cleaned data.

# Generate a summary table using 'gtsummary'
rhc  %>% 
  gtsummary::tbl_summary()
Characteristic N = 5,7351
age 64 (50, 74)
sex
    Female 2,543 (44%)
    Male 3,192 (56%)
cat1
    ARF 2,490 (43%)
    CHF 456 (8.0%)
    Cirrhosis 224 (3.9%)
    Colon Cancer 7 (0.1%)
    Coma 436 (7.6%)
    COPD 457 (8.0%)
    Lung Cancer 39 (0.7%)
    MOSF w/Malignancy 399 (7.0%)
    MOSF w/Sepsis 1,227 (21%)
meanbp1 63 (50, 115)
death
    Did not die 3,817 (67%)
    Died 1,918 (33%)
swang1
    No RHC 3,551 (62%)
    RHC 2,184 (38%)
1 Median (IQR); n (%)

Data Preparation for Regression

We prepare the data for logistic regression by selecting relevant variables and recoding the ‘death’ variable.

# Prepare data for logistic regression
rhc_for_reg <- 
  rhc %>% 
  select(age, sex, cat1, meanbp1, death, swang1) %>%  # Select relevant columns
  mutate(death = if_else(death == "Died", 1, 0))  # Recode death as binary

Logistic Regression Model

We fit a logistic regression model to predict the likelihood of death based on several predictors.

# Fit a logistic regression model
model <- glm(
  death ~ swang1 + age + sex + meanbp1 + cat1,  # Specify formula
  data = rhc_for_reg,  # Data source
  family = binomial  # Logistic regression
)

Regression Table

A decent regression table for interpretation.

# Generate a regression table using 'tbl_regression'
tbl_regression(model, exponentiate = TRUE)
Characteristic OR1 95% CI1 p-value
swang1
    No RHC
    RHC 1.44 1.27, 1.63 <0.001
age 1.02 1.01, 1.02 <0.001
sex
    Female
    Male 1.11 0.99, 1.25 0.070
meanbp1 1.0 0.99, 1.00 <0.001
cat1
    ARF
    CHF 0.31 0.23, 0.41 <0.001
    Cirrhosis 2.05 1.54, 2.73 <0.001
    Colon Cancer 0.40 0.02, 2.37 0.4
    Coma 5.26 4.21, 6.60 <0.001
    COPD 0.50 0.39, 0.65 <0.001
    Lung Cancer 1.60 0.81, 3.06 0.2
    MOSF w/Malignancy 2.83 2.27, 3.53 <0.001
    MOSF w/Sepsis 1.10 0.95, 1.28 0.2
1 OR = Odds Ratio, CI = Confidence Interval

Model Report

A nice package for gettin a quick text summary of the model.

# Generate a text report of the model using 'report'
report(model)
## We fitted a logistic model (estimated using ML) to predict death with swang1,
## age, sex, meanbp1 and cat1 (formula: death ~ swang1 + age + sex + meanbp1 +
## cat1). The model's explanatory power is weak (Tjur's R2 = 0.11). The model's
## intercept, corresponding to swang1 = No RHC, age = 0, sex = Female, meanbp1 = 0
## and cat1 = ARF, is at -1.66 (95% CI [-1.96, -1.36], p < .001). Within this
## model:
## 
##   - The effect of swang1 [RHC] is statistically significant and positive (beta =
## 0.36, 95% CI [0.24, 0.49], p < .001; Std. beta = 0.36, 95% CI [0.24, 0.49])
##   - The effect of age is statistically significant and positive (beta = 0.02, 95%
## CI [0.01, 0.02], p < .001; Std. beta = 0.27, 95% CI [0.21, 0.33])
##   - The effect of sex [Male] is statistically non-significant and positive (beta
## = 0.11, 95% CI [-8.74e-03, 0.23], p = 0.070; Std. beta = 0.11, 95% CI
## [-8.74e-03, 0.23])
##   - The effect of meanbp1 is statistically significant and negative (beta =
## -5.09e-03, 95% CI [-6.73e-03, -3.47e-03], p < .001; Std. beta = -0.19, 95% CI
## [-0.26, -0.13])
##   - The effect of cat1 [CHF] is statistically significant and negative (beta =
## -1.17, 95% CI [-1.47, -0.88], p < .001; Std. beta = -1.17, 95% CI [-1.47,
## -0.88])
##   - The effect of cat1 [Cirrhosis] is statistically significant and positive
## (beta = 0.72, 95% CI [0.43, 1.00], p < .001; Std. beta = 0.72, 95% CI [0.43,
## 1.00])
##   - The effect of cat1 [Colon Cancer] is statistically non-significant and
## negative (beta = -0.92, 95% CI [-3.87, 0.86], p = 0.396; Std. beta = -0.92, 95%
## CI [-3.87, 0.86])
##   - The effect of cat1 [Coma] is statistically significant and positive (beta =
## 1.66, 95% CI [1.44, 1.89], p < .001; Std. beta = 1.66, 95% CI [1.44, 1.89])
##   - The effect of cat1 [COPD] is statistically significant and negative (beta =
## -0.69, 95% CI [-0.95, -0.43], p < .001; Std. beta = -0.69, 95% CI [-0.95,
## -0.43])
##   - The effect of cat1 [Lung Cancer] is statistically non-significant and
## positive (beta = 0.47, 95% CI [-0.21, 1.12], p = 0.159; Std. beta = 0.47, 95%
## CI [-0.21, 1.12])
##   - The effect of cat1 [MOSF w/Malignancy] is statistically significant and
## positive (beta = 1.04, 95% CI [0.82, 1.26], p < .001; Std. beta = 1.04, 95% CI
## [0.82, 1.26])
##   - The effect of cat1 [MOSF w/Sepsis] is statistically non-significant and
## positive (beta = 0.10, 95% CI [-0.05, 0.25], p = 0.209; Std. beta = 0.10, 95%
## CI [-0.05, 0.25])
## 
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald z-distribution approximation.

R Summary

R’s built-in summary function. Not very nice output though.

summary(model)
## 
## Call:
## glm(formula = death ~ swang1 + age + sex + meanbp1 + cat1, family = binomial, 
##     data = rhc_for_reg)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8844  -0.8972  -0.6556   1.1881   2.3543  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.6609844  0.1536829 -10.808  < 2e-16 ***
## swang1RHC              0.3617283  0.0636420   5.684 1.32e-08 ***
## age                    0.0162078  0.0018522   8.751  < 2e-16 ***
## sexMale                0.1088107  0.0600403   1.812   0.0699 .  
## meanbp1               -0.0050916  0.0008314  -6.124 9.10e-10 ***
## cat1CHF               -1.1666531  0.1487717  -7.842 4.44e-15 ***
## cat1Cirrhosis          0.7192002  0.1457946   4.933 8.10e-07 ***
## cat1Colon Cancer      -0.9214230  1.0862500  -0.848   0.3963    
## cat1Coma               1.6603245  0.1143395  14.521  < 2e-16 ***
## cat1COPD              -0.6872323  0.1332778  -5.156 2.52e-07 ***
## cat1Lung Cancer        0.4727755  0.3353957   1.410   0.1587    
## cat1MOSF w/Malignancy  1.0395008  0.1122223   9.263  < 2e-16 ***
## cat1MOSF w/Sepsis      0.0967485  0.0769735   1.257   0.2088    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7309.6  on 5734  degrees of freedom
## Residual deviance: 6689.3  on 5722  degrees of freedom
## AIC: 6715.3
## 
## Number of Fisher Scoring iterations: 4