This report is run with the following parameters

Show code
str(params, give.attr = F)
List of 4
 $ era_start    : chr "2023-10-01"
 $ rollout_start: chr "2023-08-01"
 $ rollout_end  : chr "2023-10-01"
 $ ignore_before: chr "2022-09-01"

Importing data

NHSN - CLABSI

The workflow for downloading the NHSN data from Tableau is described in the other script. Once I learn how to use NHSN I will update this section

Show code
NHSN_junk <- "TableauData/2024-10-09 CLASBI SIR" %>%
  
  # Use `fs` package to get the names of all of the CSVs
  fs::dir_ls() %>%
  
  # Read all of the CSVs using map
  purrr::map(readr::read_csv, 
             col_types = "ccccccccccccccccccccccccccccccccccccccc")  %>%
  
  # Bind into a single table
  list_rbind()
Show code
NHSN_clean <- NHSN_junk %>%
  mutate(Month    = `Month of SUMMARYYM`,
         Date     = dmy(str_glue("15-{Month}")),
         Hosp     = as_factor(ORGID),
         Inf_pred = as.numeric(NUMPRED),
         Inf_obs  = as.numeric(`Infections Observed`),
         CL_days  = as.numeric(NUMCLDAYS),
         # .keep="none" is the same as transmute, keeps only new columns
         .keep="none") %>%
  select(-Month)

rm(NHSN_junk)
NHSN_clean %>% rmarkdown::paged_table()
Date formats

Although the data is reported monthly, it’s much easier to work with if it’s in date format. Therefore, by convention I set the date to be the 15th day of each month (e.g. data for September 2023 will appear as “Sept 15th 2023”). This shouldn’t really matter to anyone but me, but if you’re looking at the raw data that’s why!

CHG compliance

CHG data was obtained from the CHG compliance dashboard. I selected device type = central lines and exported the heatmap to CSV

To do

I can only download the past 12 months of data from the dashboard. Will need to get access to more of the data to increase the power of the pre-post analysis

Show code
CHG_raw <- "TableauData/2024-10-18 CHG CL only/All Overall Rate by Hospital and Department_data.csv" %>% 
  read_csv()

CHG_clean <- CHG_raw %>%
  select(Hospital, Date=`Week of Census Time`,
         Denominator=`All Lines Denominator`,
         Numerator=`All Lines Numerator`) %>%
  
  mutate(Date = mdy(Date)) %>%
  
  group_by(Hospital, Date) %>%
  summarise(Denominator = sum(Denominator, na.rm = T),
            Numerator   = sum(Numerator, na.rm = T)) %>%
  ungroup() %>%
  
  # Remove most recent week 
  filter(Date != ymd("2024-10-13"))
   

rm(CHG_raw) 

Because it is reported weekly (and everything else is reported monthly), we need to convert the weekly data into monthly data. Some weeks span the course of multiple months1. In these cases, we will group them with whichever month that the majority of their days fell under

How are the weeks calculated?

When you extract the data from Tableau, the dates it lists are all Sundays. It doesn’t look like it’s explicitly documented anywhere, but I assume that this means the week starts on Sunday, meaning the date provided is day 1 of 7 for that week

Show code
week_to_month <- function(date, beginning=T) {
  require(lubridate)
  
  # Make sure you are given only one date, not a vector
  stopifnot(length(date)==1)
  
  # Calculate the midpoint of the week, which is 3 days away from the start 
  # (or end) date
  if(beginning) {
    mid_date <- date + days(3)
  } else({
    # Otherwise, treat the given day as the end of the week
    mid_date <- date - days(3)
  })
  
  # Extract the month and the year from this date
  new_month <- month(mid_date)
  new_year  <- year(mid_date)
  
  # Create a new date and return
  new_date <- ymd(paste0(new_year, "-", new_month, "-15"))
  return(new_date)
  
}

CHG_monthly <- CHG_clean %>% 
  #  Needed to only process one date at a time
  rowwise() %>%
  mutate(Month = week_to_month(Date)) %>%
  ungroup() %>%
  
  group_by(Hospital, Month) %>%
  summarise(Denominator =sum(Denominator , na.rm = T),
            Numerator   =sum(Numerator   , na.rm = T))

rm(CHG_clean)
CHG_monthly %>% rmarkdown::paged_table()
Show code
#   mutate(Hospital = as.factor(Hospital)) %>% 
#   DT::datatable(filter="top", rownames = F)

Hospital info

This comes from a Google Sheet where I store info for each hospital. There is important data on the sheet that governs the steps below

Show code
library(googlesheets4)
options(gargle_oauth_email = TRUE)
gs4_auth(cache = TRUE)
if(!exists("hospitals")){
  hospitals_raw <- paste0("https://docs.google.com/",
                          "spreadsheets/d/",
                          "1zZ1o1HhzOPrbj2KBdGsaO4rJvLDPMWqN0ogjMCNHfGw/") %>%
    read_sheet(sheet = "CHG Hospitals")
  
  hospitals <- hospitals_raw
  rm(hospitals_raw)
}

Pre-processing

Merging datasets

The CHG data and NHSN data don’t match up directly in a one to one manner, as the CHG reporting splits what NHSN considers to be one “hospital” into multiple hospitals2. Using the logic from the Google Sheet, we will aggregate CHG data from:

  • WVU/UHA, Childrens, and Fairmont into WVUH

  • STF and TMH into TMH (although for some reason the TMH data isn’t reported on NHSN dashboard)

This means that WVUH’s CHG compliance will be based off of the pooled compliance data from the hospitals listed above

To do

I need to see if I can get STF and TMH from NHSN

Show code
CHG_matched1 <- CHG_monthly %>%
  # Join to the GS data to match keys
  left_join(select(hospitals, NHSN_abbv, CHG_abbv), 
            by=c("Hospital"="CHG_abbv")) %>%
  
  # Drop data if the `NHSN_abbv` is NA
  filter(!is.na(NHSN_abbv)) %>%

  # Aggregate data on NHSN abbreviation
  group_by(NHSN_abbv, Month) %>%
  summarise(Denominator =sum(Denominator , na.rm = T),
            Numerator   =sum(Numerator   , na.rm = T),
            .groups = "drop") %>%
  
  rename(Hosp=NHSN_abbv, Date=Month)

Now we have our data in a format where we can directly compare the CHG compliance with the number of CLABSIs observed

Show code
matched1 <- crossing(Hosp = CHG_matched1$Hosp, 
                     Date = c(CHG_matched1$Date, NHSN_clean$Date)) %>%
  left_join(NHSN_clean, by = join_by(Hosp, Date)) %>%
  left_join(CHG_matched1, by = join_by(Hosp, Date)) 

matched1 %>% 
  arrange(desc(Hosp), desc(Date)) %>%
  rmarkdown::paged_table()

Missing data

If you were to click through the 1330 rows in the table above, you’d find lots of missing data, particularly for the CHG compliance.

The figure below illustrates that patterns of missing data with the dark grey showing months where there was no missing data. The blue months are months where there is CLABSI data, but the CHG compliance is missing; conversely the purple months are when the CLABSI data is missing, but we do have CHG data. Finally, the pink (or light purple) months are months where we are missing both CLABSI and CHG data.

Show code
if (!interactive()) {
crossing(Hosp = CHG_matched1$Hosp, 
         Date = c(CHG_matched1$Date, NHSN_clean$Date)) %>%
  left_join(NHSN_clean) %>%
  left_join(CHG_matched1) %>%
  
  mutate(missing_CHG = is.na(Numerator),
         missing_NHSN = is.na(CL_days),
         missing = case_when( missing_CHG &  missing_NHSN ~ "Both",
                              missing_CHG & !missing_NHSN ~ "CHG",
                             !missing_CHG &  missing_NHSN ~ "NHSN",
                             !missing_CHG & !missing_NHSN ~ "No missing data")) %>%
  
  ggplot(aes(x=Hosp, y=Date, fill=missing)) + 
  geom_tile(color="white") +
  coord_flip() +
  theme_bw() +
  scale_fill_manual(values = c("Both missing" = "#c68fea", "CHG missing" = "#2e7fd0", 
                               "NHSN missing" = "#b92092", "No missing data" = "grey30",
                               "Both"         = "#c68fea", "CHG"         = "#2e7fd0", 
                               "NHSN"         = "#b92092" ),
                    name="Missing") +
  labs(x="", y="", title = "Missing data") +
  theme(legend.position = "bottom")
} 

Show code
  # inner_join(NHSN_clean, temp_CHG)

Aggregating smaller hospitals

The Critical Access Hospitals (CAHs) represent an important segment here, as they serve a distinct patient populations compared to larger hopsitals. From a modeling standpoint, they are also unique in that they have much lower rates of infections, as shown in the table below.

Show code
if (!interactive()) {
NHSN_clean %>%
  mutate(Year = year(Date)) %>%
  group_by(Hosp, Year) %>%
  summarise(Infections = sum(Inf_pred, na.rm = T), .groups = "drop") %>%
  pivot_wider(id_cols = Hosp, names_from = Year, values_from = Infections) %>%
  
  left_join(select(hospitals, NHSN_abbv, hosp_type), by=c("Hosp"="NHSN_abbv")) %>%
  mutate(hosp_type = recode(hosp_type, "???"="Small")) %>%
  select(Hosp, hosp_type, starts_with("20")) %>%
  unique() %>%
  arrange(hosp_type) %>%
  
  gt(rowname_col = "Hosp") %>%
  tab_stub_indent(everything(), 5) %>%
  fmt_missing() %>%
  fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
  
  cols_label(hosp_type = md("Hospital Type")) %>%

  gtExtras::gt_color_rows(
    columns = starts_with("20"),
    # domain = c(0,50)
    # palette = c("purple", "lightgrey", "green")
  ) %>%

  tab_style_body(
    columns = hosp_type,
    values = "Critical access",
    style = list(
      cell_fill(color = '#9f0101'),
      cell_text(color="white")
    )
  ) %>%
  tab_style_body(
    columns = hosp_type,
    values = "Community",
    style = list(
      cell_fill(color = '#5a3286'),
      cell_text(color="white")
    )
  ) %>%
  tab_style_body(
    columns = hosp_type,
    values = "Small",
    style = cell_fill(color = '#e5ce90')
  ) %>%

  tab_header(
    title = md('Total number of CLABSIs, by hospital')
  ) %>%
  opt_stylize(6, color = "gray") 
}
Total number of CLABSIs, by hospital
Hospital Type 2019 2020 2021 2022 2023 2024
WVUH Academic 34 34 33 32 31 22
BMC Community 3.9 3.8 4.4 4.1 3.7 2.0
CCMC Community 8.4 6.1 6.4 5.4 5.7 4.3
PRN Community 2.5 2.8 4.3 4.6 4.5 2.8
RMH Community 0.60 0.73 0.98 1.0 0.89 0.50
UHC Community 4.8 6.1 9.1 8.9 8.8 5.7
UTN Community 2.1 2.2 2.7 2.9 2.3 1.5
WHL Community 8.0 9.2 9.4 6.3 5.1 3.7
BRN Critical access 0 0.012 0.012 0.0040
BRX Critical access 0.0030 0.020 0.028 0.016
GMH Critical access 0.092 0.10 0.20 0.17 0.14 0.030
GRMC Critical access 0.34 0.36 0.76 0.61 0.58 0.34
HRS Critical access 0.0020 0.0050 0 0 0.0080 0.0080
JAX Critical access 0.048 0.038 0.051 0.045 0.030 0.030
JMC Critical access 0.095 0.15 0.20 0.17 0.21 0.070
PVH Critical access 0.20 0.13 0.075 0.091 0.11 0.063
SMR Critical access 0.13 0.19 0.16 0.19 0.22 0.088
STJ Critical access 0.056 0.066 0.093 0.071 0.070 0.078
WTZ Small 0.0060 0.0020 0.0080 0.062 0.065 0.12
Show code
small_hospitals <- hospitals %>% 
  filter(hosp_type=="Critical access") %>%
  pull(NHSN_abbv)

small_hospitals <- c(small_hospitals, "WTZ")

To accommodate this unique set of hospitals in the analysis, we will aggregate their monthly numbers together and treat them as one “hospital”. Although WTZ isn’t classified as a critical access hospital, it has a smaller capacity (bed size < 50) so we will include it with the CAHs. Specifically, we will treat BRN, BRX, GRMC, GMH, HRS, JAX, JMC, PVH, STJ, SMR, and WTZ as small hospitals.

Show code
# Temporary df to join hosp info w/ matched df
df_temp <- hospitals %>%
  select(NHSN_abbv, name_short, hosp_type) %>%
  
  # Recode the ones that are under WVUH
  mutate(name_short = if_else(NHSN_abbv == "WVUH",
                              "Ruby & Fairmont",name_short)) %>%
  
  # Ignore small hopspitals or those not in NHSN
  filter(!NHSN_abbv %in% small_hospitals,
         !is.na(NHSN_abbv)) %>%
  unique() %>%
  
  add_row(NHSN_abbv  = "Small",
          name_short = "Small hospitals",
          hosp_type  = "CAH / small")



matched_df <- matched1 %>%
  mutate(Hosp = if_else(Hosp %in% small_hospitals, "Small", Hosp)) %>%
  group_by(Hosp, Date) %>%
  summarise(Inf_pred = sum(Inf_pred, na.rm = T),
            Inf_obs = sum(Inf_obs, na.rm = T),
            CL_days = sum(CL_days, na.rm = T),
            Denominator = sum(Denominator, na.rm = T),
            Numerator = sum(Numerator, na.rm = T),
            .groups = "drop") %>%
  
  left_join(df_temp, by=c("Hosp"="NHSN_abbv")) %>%
  
  # Handle implicit zeros introduced my the `summarise`
  mutate(CL_days  = if_else(Inf_pred==0, NA, CL_days),
         Inf_obs  = if_else(Inf_pred==0, NA, Inf_obs),
         Inf_pred = if_else(Inf_pred==0, NA, Inf_pred)) %>%
  
  mutate(missing_CHG = Denominator==0 & Numerator == 0,
         Denominator = if_else(missing_CHG, NA, Denominator),
         Numerator   = if_else(missing_CHG, NA, Numerator)) %>%
  
  # Define the era
  mutate(Era = case_when(Date < ymd(params$rollout_start) ~ "Before",
                         Date > ymd(params$rollout_end)   ~ "After",
                         TRUE                             ~ "During"),
         Era = as_factor(Era))
  

# crossing(Hosp = df_temp$NHSN_abbv, 
#          Date = c(CHG_matched1$Date, NHSN_clean$Date)) %>%
#   left_join(matched_df) %>%
#   mutate(missing_CHG = is.na(Numerator),
#          missing_NHSN = is.na(CL_days),
#          missing = case_when( missing_CHG &  missing_NHSN ~ "Both",
#                               missing_CHG & !missing_NHSN ~ "CHG",
#                              !missing_CHG &  missing_NHSN ~ "NHSN",
#                              !missing_CHG & !missing_NHSN ~ "No missing data")) %>%
#   
#   ggplot(aes(x=Hosp, y=Date, fill=missing)) + 
#   geom_tile(color="white") +
#   coord_flip() +
#   theme_bw() +
#   scale_fill_manual(values = c("Both missing" = "#c68fea", "CHG missing" = "#2e7fd0", 
#                                "NHSN missing" = "#b92092", "No missing data" = "grey30",
#                                "Both"         = "#c68fea", "CHG"         = "#2e7fd0", 
#                                "NHSN"         = "#b92092" ),
#                     name="Missing") +
#   labs(x="", y="", title = "Missing data") +
#   theme(legend.position = "bottom")

rm(matched1, CHG_matched1, df_temp)

This transforms the previous table to look like this:

Show code
if (!interactive()) {
matched_df %>%
  mutate(Year = year(Date)) %>%
  group_by(Hosp, Year, hosp_type) %>%
  summarise(Infections = sum(Inf_pred, na.rm = T), .groups = "drop") %>%
  pivot_wider(id_cols = c(Hosp,hosp_type), names_from = Year, values_from = Infections) %>%
  
  select(Hosp, hosp_type, starts_with("20")) %>%
  unique() %>%
  arrange(hosp_type) %>%
  
  gt(rowname_col = "Hosp") %>%
  tab_stub_indent(everything(), 5) %>%
  fmt_missing() %>%
  fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
  
  cols_label(hosp_type = md("Hospital Type")) %>%

  gtExtras::gt_color_rows(
    columns = starts_with("20"),
    # domain = c(0,50)
    # palette = c("purple", "lightgrey", "green")
  ) %>%

  tab_style_body(
    columns = hosp_type,
    values = "Critical access",
    style = list(
      cell_fill(color = '#9f0101'),
      cell_text(color="white")
    )
  ) %>%
  tab_style_body(
    columns = hosp_type,
    values = "Community",
    style = list(
      cell_fill(color = '#5a3286'),
      cell_text(color="white")
    )
  ) %>%
  tab_style_body(
    columns = hosp_type,
    values = "Small",
    style = cell_fill(color = '#e5ce90')
  ) %>%

  tab_header(
    title = md('Total number of CLABSIs, by hospital')
  ) %>%
  opt_stylize(6, color = "gray")
}
Total number of CLABSIs, by hospital
Hospital Type 2019 2020 2021 2022 2023 2024
WVUH Academic 34 34 33 32 31 22
Small CAH / small 0.97 1.0 1.5 1.4 1.5 0.85
BMC Community 3.9 3.8 4.4 4.1 3.7 2.0
CCMC Community 8.4 6.1 6.4 5.4 5.7 4.3
PRN Community 2.5 2.8 4.3 4.6 4.5 2.8
RMH Community 0.60 0.73 0.98 1.0 0.89 0.50
UHC Community 4.8 6.1 9.1 8.9 8.8 5.7
UTN Community 2.1 2.2 2.7 2.9 2.3 1.5
WHL Community 8.0 9.2 9.4 6.3 5.1 3.7

Descriptive statistics

Show code
time_limited_full <- matched_df %>%
  filter(Date > ymd(params$ignore_before)) %>%
  filter(!is.na(Inf_pred))

time_limited <- time_limited_full %>%
  filter(Era != "During")

key_dates <- list(
  before_start = min(pull(filter(time_limited, Era=="Before"), "Date")),
  before_end   = max(pull(filter(time_limited, Era=="Before"), "Date")),
  before_len   = length(unique(pull(filter(time_limited, Era=="Before"), "Date"))),
  after_start  = min(pull(filter(time_limited, Era=="After"), "Date")),
  after_end    = max(pull(filter(time_limited, Era=="After"), "Date")),
  after_len    = length(unique(pull(filter(time_limited, Era=="After"), "Date")))
)

The before era spans from Sep 2022 to Jul 2023 (11 months), while the after era spans from Oct 2023 to Aug 2024 (11 months).

Before & After

Show code
df_1 <- time_limited %>%
  # group_by(Era) %>%
  group_by(Era, hosp_type) %>%
  summarise(Inf_pred = sum(Inf_pred, na.rm = T),
            Inf_obs  = sum(Inf_obs, na.rm = T),
            CL_days  = sum(CL_days, na.rm = T), 
            .groups = "drop") %>%
  pivot_longer(cols = -c(Era, hosp_type)) %>%
  pivot_wider(names_from = Era, values_from = value) 
  
df_2 <- df_1 %>%
  group_by(name) %>%
  summarise(Before = sum(Before),
            After  = sum(After)) %>%
  mutate(hosp_type = "Overall")
  
# bind_rows(df_2, df_1) %>%
#   relocate(hosp_type) %>%
#   group_by(name) %>%
#   gt()
#   
# bind_rows(df_2, df_1) %>%
#   relocate(hosp_type) %>%
#   pivot_wider(names_from = name, values_from = c(Before, After), names_glue = "{name}.{.value}", names_vary = "slowest") %>%
#   
#   relocate(CL_days.Before, CL_days.After, .after = Inf_pred.After) %>%
#   
#   
#   gt(rowname_col = "hosp_type") %>%
#   # fmt_number(columns = starts_with("Inf_pred"), n_sigfig = 3) %>%
#   tab_header(
#     title = md('Pre-Post')
#   ) %>%
#   opt_stylize(6, color = "gray")
rm(df_1, df_2)

time_limited %>%
  filter(!is.na(Inf_obs)) %>%
  # group_by(Era) %>%
  group_by(Era, hosp_type) %>%
  summarise(
    # Inf_pred = qwraps2::median_iqr(Inf_pred, na_rm  = T, show_n="never"),
    # Inf_obs  = qwraps2::median_iqr(Inf_obs, na_rm  = T, show_n="never"),
    # CL_days  = qwraps2::median_iqr(CL_days, na_rm  = T, show_n="never", digits=0), 
    Inf_pred = qwraps2::frmtci(qwraps2::mean_ci(Inf_pred, na_rm  = T), format = "est (lcl - ucl)"),
    Inf_obs  = qwraps2::frmtci(qwraps2::mean_ci(Inf_obs, na_rm  = T), format = "est (lcl - ucl)"),
    CL_days  = qwraps2::frmtci(qwraps2::mean_ci(CL_days, na_rm  = T), format = "est (lcl - ucl)", digits = 1), 
    # n = as.character(n()),
    .groups = "drop") %>%
  pivot_longer(cols = -c(Era, hosp_type)) %>%
  pivot_wider(names_from = Era, values_from = value) %>%
  
  relocate(hosp_type) %>%
  pivot_wider(names_from = name, values_from = c(Before, After), names_glue = "{name}.{.value}", names_vary = "slowest") %>%
  
  relocate(Inf_pred.Before, Inf_pred.After, .after = Inf_obs.After) %>%
  relocate(CL_days.Before, CL_days.After, .after = Inf_pred.After) %>%
    
    
  
  gt(rowname_col = "hosp_type") %>%
  # fmt_number(columns = starts_with("Inf_pred"), n_sigfig = 3) %>%
  gt::tab_spanner_delim(delim = ".") %>%
  tab_header(
    title = md('Pre-Post')
  ) %>%
  opt_stylize(6, color = "gray")
Pre-Post
Inf_obs Inf_pred CL_days
Before After Before After Before After
Academic 3.00 (2.01 - 3.99) 1.82 (1.03 - 2.60) 2.59 (2.47 - 2.71) 2.72 (2.57 - 2.87) 2,263.9 (2,169.9 - 2,358.0) 2,352.4 (2,238.2 - 2,466.5)
CAH / small 0.18 (-0.06 - 0.42) 0.18 (-0.06 - 0.42) 0.13 (0.11 - 0.14) 0.11 (0.09 - 0.13) 351.5 (310.7 - 392.4) 291.2 (243.8 - 338.6)
Community 0.42 (0.25 - 0.58) 0.30 (0.16 - 0.43) 0.37 (0.33 - 0.42) 0.39 (0.34 - 0.44) 434.9 (386.6 - 483.2) 454.7 (401.1 - 508.4)

Infection rate over time

To start, we will look at the raw (absolute) number of CLABSIs that occurred monthly by each hospital. The colored bars represent the actual number of infections that have occurred (the very small ones at the bottom of the Y-axis are zero infections), and the horizontal black solid line are the predicted number of infections as calculated by the NHSN. There is also a grey dashed vertical line that marks the implementation phase of the system wide CHG roll out.

Show code
time_limited_full %>%
  ggplot(aes(x=Date, y=Inf_obs, fill=hosp_type)) +
  geom_vline(xintercept = ymd(params$rollout_start), 
             alpha=0.5, linetype = "dashed",
             color="grey30") +
  geom_vline(xintercept = ymd(params$rollout_end), 
             alpha=0.5, linetype = "dashed",
             color="grey30") +
  geom_line(aes(y=Inf_pred)) +
  geom_col(aes(color=hosp_type),alpha=0.7) +
  facet_wrap("Hosp") +
  guides(color="none") +
  scale_x_date(date_labels = "%b\n%Y") +
  labs(title="CLABSI: Observed vs Predicted",
       subtitle="Black line solid is predicted number from NHSN",
       y="Number of CLABSIs",
       x="Month",
       fill="Hospital\nType",
       caption = "CAH = Critical Access Hospital") +
  theme_bw()

In a similar vein, the graph below shows the rates of CLABSIs per central line day.

Show code
time_limited_full %>%
  mutate(Inf_rate = 10000 * Inf_obs / CL_days,
         Pred_rate = 10000 * Inf_pred / CL_days) %>%
  
  ggplot(aes(x=Date, y=Inf_rate, fill=hosp_type)) +
  geom_vline(xintercept = ymd(params$rollout_start), 
             alpha=0.5, linetype = "dashed",
             color="grey30") +
  geom_vline(xintercept = ymd(params$rollout_end), 
             alpha=0.5, linetype = "dashed",
             color="grey30") +
  geom_line(aes(y=Pred_rate)) +
  geom_col(aes(color=hosp_type),alpha=0.7) +
  facet_wrap("Hosp") +
  guides(color="none") +
  scale_x_date(date_labels = "%b\n%Y") +
  labs(title="CLABSI Rate Over Time",
       subtitle="Black line solid is predicted rate from NHSN",
       y="CLABSIs (per 10,000 CL days)",
       x="Month",
       fill="Hospital\nType",
       caption = "CL = central line\nCAH = Critical Access Hospital") +
  theme_bw()

Show code
time_limited %>%
  mutate(Q = str_glue("{year(Date)} Q{quarter(Date)}")) %>%
  group_by(Hosp, hosp_type, Q) %>%
  summarise(Inf_rate = 10000 * sum(Inf_obs, na.rm = T) / sum(CL_days, na.rm = T),
         .groups="drop") %>%
  # pivot_wider(names_from = c(hosp_type, Hosp), 
  #             values_from = c(Inf_rate),
  #             names_glue = "{hosp_type}.{Hosp}") %>%
  select(-hosp_type) %>%
  pivot_wider(names_from = Hosp, 
              values_from = c(Inf_rate)) %>%
  filter(Q != "2024 Q4") %>%
  gt(rowname_col = "Q") %>%
  fmt_number(columns = -Q) %>%
  # tab_spanner_delim(delim = ".") %>%

  data_color(
    columns = -Q,
    # domain = c(0,1000),
    palette = "BuPu",
    # pal_type = "continuous"
  ) %>%
  tab_header(
    title = md('CLABSI Rate (per 10,000 CL days) by Quarter'),
    subtitle = " " ) %>%
  opt_stylize(6, color="gray") 
CLABSI Rate (per 10,000 CL days) by Quarter
BMC CCMC PRN RMH Small UHC UTN WHL WVUH
2022 Q3 0.00 0.00 0.00 0.00 0.00 0.00 31.85 0.00 24.28
2022 Q4 0.00 0.00 17.65 0.00 0.00 4.34 36.73 0.00 5.51
2023 Q1 8.99 0.00 31.89 0.00 8.40 15.53 35.46 5.77 13.59
2023 Q2 0.00 0.00 11.50 0.00 0.00 4.30 15.50 12.58 13.40
2023 Q3 0.00 0.00 16.37 0.00 35.59 13.46 0.00 19.19 26.74
2023 Q4 0.00 0.00 24.83 0.00 9.64 4.08 0.00 6.56 8.60
2024 Q1 0.00 12.49 12.00 0.00 9.80 7.72 0.00 0.00 9.92
2024 Q2 0.00 0.00 21.26 0.00 0.00 0.00 10.58 11.52 6.94
2024 Q3 0.00 10.42 10.88 0.00 0.00 0.00 72.99 0.00 4.31

Infection rates vs era

Next we can look at the rates of CLABSIs during the before and after CHG bathing was implemented. The table on the left looks at the standardized infection ratio (SIR) and the table on the right shows the rates of CLABSIs based on the number of central line days.

The standardized infection ratio (SIR) is a summary measure used to track HAIs at a national, state, or local level over time. The SIR adjusts for various facility and/or patient-level factors that contribute to HAI risk within each facility. The method of calculating an SIR is similar to the method used to calculate the Standardized Mortality Ratio (SMR), a summary statistic widely used in public health to analyze mortality data. In HAI data analysis, the SIR compares the actual number of HAIs reported to the number that would be predicted, given the standard population (i.e., NHSN baseline), adjusting for several risk factors that have been found to be significantly associated with differences in infection incidence.

NHSN documentation, page 4

The formula for SIR is:

\[ SIR = \frac{Observed\ HAIs}{Predicted\ HAIs} \]

Note

The NHSN prefers to use SIRs (instead of rates) because the SIR adjusted for differences in risk among populations (using the predicted number of infections), as explained on page 4 of their documentation. We will capitalize on the NHSN’s calculation of predicted infections later in our analysis, as their calculation of predicted infections accounts for numerous confounding variables

Show code
time_limited %>%
  group_by(Era, Hosp) %>%
  summarise(Inf_pred = sum(Inf_pred, na.rm = T),
            Inf_obs  = sum(Inf_obs, na.rm = T),
            CL_days  = sum(CL_days, na.rm = T), 
            .groups = "drop") %>%
  mutate(SIR = Inf_obs/Inf_pred) %>%
  select(Hosp, Era, SIR) %>%
  pivot_wider(names_from = Era, 
              values_from = SIR) %>%
  gt(rowname_col = "Hosp") %>%
  # tab_stub_indent(everything(), 5) %>%
  fmt_missing() %>%
  # fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
  fmt_number() %>%
  
  gtExtras::gt_hulk_col_numeric(
    reverse = T
    # columns= Before
  ) %>%
  tab_header(
    title = md('CLABSI SIR in the Pre & Post CHG Era'),
    subtitle = "SIR = standardized infection ratio" )
CLABSI SIR in the Pre & Post CHG Era
SIR = standardized infection ratio
Before After
BMC 0.30 0.00
CCMC 0.00 0.52
PRN 2.53 2.57
RMH 0.00 0.00
Small 1.43 1.65
UHC 0.88 0.38
UTN 4.12 1.38
WHL 0.80 0.61
WVUH 1.16 0.67
Show code
time_limited %>%
  group_by(Era, Hosp) %>%
  summarise(Inf_pred = sum(Inf_pred, na.rm = T),
            Inf_obs  = sum(Inf_obs, na.rm = T),
            CL_days  = sum(CL_days, na.rm = T), 
            .groups = "drop") %>%
  mutate(x = 10000 * Inf_obs/CL_days) %>%
  select(Hosp, Era, x) %>%
  pivot_wider(names_from = Era, 
              values_from = x) %>%
  gt(rowname_col = "Hosp") %>%
  # tab_stub_indent(everything(), 5) %>%
  fmt_missing() %>%
  # fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
  fmt_number(decimals = 1, ) %>%

  tab_header(
    title = md('CLABSI rate in the Pre & Post CHG Era'),
    subtitle = "Per 10,000 central line days" ) %>%
  opt_stylize(6, color="gray") 
CLABSI rate in the Pre & Post CHG Era
Per 10,000 central line days
Before After
BMC 2.7 0.0
CCMC 0.0 5.4
PRN 17.7 17.8
RMH 0.0 0.0
Small 5.2 6.2
UHC 8.0 3.4
UTN 28.6 9.4
WHL 6.9 4.9
WVUH 13.3 7.7

Figure 1: CLABSI occurrences in the pre & post CHG era

CHG vs time

This shows the CHG compliance for each hospital (for the period that I have the data available). Each bar is colored by the “denominator”, meaning the number of patients that are eligible for CHG bathing

Show code
time_limited %>%
  filter(!missing_CHG) %>%
  mutate(CHG = Numerator / Denominator) %>%
  ggplot(aes(x=Date, y=CHG, fill=Denominator)) +
  geom_col(color='black') +
  facet_wrap("Hosp") +
  # scale_fill_continuous(trans = "log10") +
  # scale_fill_binned(trans = "log10") +
  scale_fill_fermenter(palette="GnBu", trans = "log10") +
  scale_y_continuous(labels = scales::percent, limits = c(0,1)) +
  scale_x_date(date_labels = "%b\n%Y") +
  guides(color="none") +
  labs(title="CHG compliance",
       y="CHG compliance (%)",
       x="Month",
       fill="CHG\nDenominator",
       caption = "CAH = Critical Access Hospital") +
  theme_bw()

Inf rate vs CHG

In progress

TODO

I’m struggling with how to analyze this best, so I might revise it later…

Show code
time_limited %>%
  filter(!missing_CHG) %>%
  mutate(CHG = Numerator / Denominator) %>%
  ggplot(aes(x=Inf_obs, y=CHG)) +
  geom_boxplot(aes(group=Inf_obs)) +
  geom_jitter(alpha=.25)

Regression

In this section, we will use a negative binomial regression to examine any influence that CHG bathing has on CLABSIs. This is the same type of model that the NHSN uses to predict the number of expected infections.

Unless otherwise specified, our outcome of interest will be the monthly number of CLABSIs observed for each hospital. Notably, this is the absolute number, not a rate. To account for confounding variables we will include the number of predicted infections (from the NHSN data).

Model 1

Show code
m1 <- MASS::glm.nb(Inf_obs ~ Inf_pred + Era, data = time_limited) 
summary(m1)
# DHARMa::testDispersion(m1)
# glance(m1)
# unique(select(time_limited, Date, Era)) %>% left_join(NHSN_clean)

The first model is a simple negative binomial model, using the formula below

Inf_obs ~ Inf_pred + Era

The results of this model is shown in the table below:

Show code
broom::tidy(m1, conf.int = T, exponentiate=T) %>%
  relocate(conf.low, conf.high, .before = p.value) %>%
  select(-c(std.error, statistic)) %>%
  
  gt() %>%
  cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
  # tab_stub_indent(everything()) %>%
  cols_label(estimate = md("Incident Rate Ratio (IRR)"),
             term     = "Term",
             conf.low = "95% CI",
             p.value  = "P value") %>%
  fmt_number(columns = c(estimate, conf.low, conf.high, p.value), n_sigfig = 3) %>%
  fmt_scientific(columns = p.value, 
             rows = p.value < 0.001) %>%
  fmt_scientific(columns = conf.low, 
             rows = conf.low < 0.001 | conf.low > 1000) %>%
  fmt_scientific(columns = conf.high, 
             rows = conf.high < 0.001 | conf.high > 1000) %>%
  tab_header(
    title = 'Negative binomial model'
  ) %>%
  opt_stylize(5)
Negative binomial model
Term Incident Rate Ratio (IRR) 95% CI P value
(Intercept) 0.301 0.209 - 0.423 1.87 × 10−11
Inf_pred 2.35 1.95 - 2.84 1.60 × 10−19
EraAfter 0.645 0.418 - 0.985 0.0441

This shows that compared to the before era, the “after” era (post CHG implementation) had a 35% reduction in CLABSIs (1 - 0.645) with a P-value of 0.044. It also shows that the predicted infections are significantly associated with increased observed infections (which is to be expected).

The same data is shown graphically below:

Show code
sjPlot::plot_model(m1)

Model 2 - random effects

Show code
library(glmmTMB)
m2 <- glmmTMB(Inf_obs ~ Inf_pred + Era + (1|Hosp) ,
        data = time_limited, 
        family = "nbinom1") 
summary(m2)
# broom.mixed::glance(m2)
# DHARMa::testDispersion(m2)

In Model 1, we didn’t do anything to account for the various hospitals. There are arguments for and against adjusting for hospital-specific effects, outlined below

Reason for adjusting for each hospital

The biggest reason for doing this would be because we have repeated measures, meaning that we have within-group correlation (e.g. the likelihood of CLABSIs at Ruby in July are not independent of the likelihood at Ruby in November). I also feel like doing a random effects model would be helpful for getting a true population level estimate, but I don’t have a clear reason for this belief

Reasons not to adjust for each hospital

The beauty (and downfall) of using NHSN’s predicted number of infections is that it already accounts for many variables that we would want to include. Among these, NHSN already accounts for hospital size & teaching hospital, so introducing “additional adjustment” for hospital size is redundant. This might cause issues with multicollinearity as well as difficulty in interpruting the results. Additionally, the only factors NHSN uses for calculating the predicted CLABSIs at CAH is the number of central line days (see Table 3 and page 42 of the NHSN documents), so I’m really not sure about this

I think it’s worthwhile to at least include this model (even if it’s a sensitivity analysis), which is the same as Model 1 but now with random effects for each hospital. The model formula I use is below:

Inf_obs ~ Inf_pred + Era + (1 | Hosp)

The (1|Hosp) term means each hospital has a random intercept, capturing the variation in infection rates across hospitals. This helps to account for unobserved hospital-specific factors that may influence the CLABSI rates, but still assumes that the effect of CHG implementation is similar across hospitals

Show code
broom.mixed::tidy(m2, conf.int = T, exponentiate=T) %>%
  filter(effect=="fixed") %>%
  select(term, estimate, conf.low, conf.high, p.value) %>%
  
  gt() %>%
  cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
  # tab_stub_indent(everything()) %>%
  cols_label(estimate = md("Incident Rate Ratio (IRR)"),
             term     = "Term",
             conf.low = "95% CI",
             p.value  = "P value") %>%
  fmt_number(columns = c(estimate, conf.low, conf.high, p.value), n_sigfig = 3) %>%
  fmt_scientific(columns = p.value, 
             rows = p.value < 0.001) %>%
  fmt_scientific(columns = conf.low, 
             rows = conf.low < 0.001 | conf.low > 1000) %>%
  fmt_scientific(columns = conf.high, 
             rows = conf.high < 0.001 | conf.high > 1000) %>%
  tab_header(
    title = md('**Random effects** negative binomial model'),
    subtitle = "Random effects for each hospital"
  ) %>%
  opt_stylize(5)
Random effects negative binomial model
Random effects for each hospital
Term Incident Rate Ratio (IRR) 95% CI P value
(Intercept) 0.273 0.115 - 0.646 0.00316
Inf_pred 1.57 0.615 - 4.02 0.345
EraAfter 0.650 0.435 - 0.969 0.0347

What is interesting here is that the p-value for Inf_pred is now 0.345

Show code
sjPlot::plot_model(m2)

Model 3 - time series

Show code
# KEY
#   P: Time since intervention
#   Time: Time

tempdf <- time_limited %>%
  filter(Era == "After") %>%
  select(Date) %>%
  unique() %>%
  arrange(Date) %>%
  mutate(P = row_number())

tempdf2 <- time_limited %>%
  # filter(Era == "After") %>%
  select(Date) %>%
  unique() %>%
  mutate(Time = row_number()) %>%
  left_join(tempdf, by="Date") %>%
  mutate(P = if_else(is.na(P), 0, P))

df_TS <- time_limited %>%
  left_join(tempdf2, by="Date")
rm(tempdf, tempdf2)

m3 <- MASS::glm.nb(Inf_obs ~ Inf_pred + Era + P + Time, data = df_TS) 
summary(m3)
glance(m3)
# DHARMa::testDispersion(m3)

Another option is to do an interrupted time series. I’m not as familiar with these, but I do like the output graphs that can create since you can show the counter factual. The formula for this model is:

Inf_obs ~ Inf_pred + Era + P + Time

In this formula

  • Time = The number of months that have passed since the start of the oberseravtion period

  • Era = Indicates before / after system wide CHG was implemented

  • P = The number of months that have passed since CHG implementation occurred. This helps to show if the internvention has a sustained effect

Show code
broom::tidy(m3, conf.int = T, exponentiate=T) %>%
  relocate(conf.low, conf.high, .before = p.value) %>%
  select(-c(std.error, statistic)) %>%
  
  gt() %>%
  cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
  # tab_stub_indent(everything()) %>%
  cols_label(estimate = md("Incident Rate Ratio (IRR)"),
             term     = "Term",
             conf.low = "95% CI",
             p.value  = "P value") %>%
  fmt_number(columns = c(estimate, conf.low, conf.high, p.value), n_sigfig = 3) %>%
  fmt_scientific(columns = p.value, 
             rows = p.value < 0.001) %>%
  fmt_scientific(columns = conf.low, 
             rows = conf.low < 0.001 | conf.low > 1000) %>%
  fmt_scientific(columns = conf.high, 
             rows = conf.high < 0.001 | conf.high > 1000) %>%
  tab_header(
    title = md('Negative binomial: **Interrupted Time Series**'),
    subtitle = md("Time = Month; P = Month since implmentation")
  ) %>%
  opt_stylize(5)
Negative binomial: Interrupted Time Series
Time = Month; P = Month since implmentation
Term Incident Rate Ratio (IRR) 95% CI P value
(Intercept) 0.234 0.120 - 0.434 1.17 × 10−5
Inf_pred 2.34 1.95 - 2.82 3.46 × 10−20
EraAfter 0.617 0.265 - 1.40 0.253
P 0.935 0.816 - 1.07 0.324
Time 1.04 0.957 - 1.14 0.352

This has a similar magnitude for Era as the other models, but now Era is no longer signifigant.

In progress

Looking at hospital types

Here we need to use CL_days as the predictor (not Inf_pred) because the predicted infections from NHSN already incorporate the hospital type in their formula for predicting infections

Show code
m4 <- MASS::glm.nb(Inf_obs ~ CL_days + Era + hosp_type, data = time_limited)
summary(m4)

Call:
MASS::glm.nb(formula = Inf_obs ~ CL_days + Era + hosp_type, data = time_limited, 
    init.theta = 5.42583656, link = log)

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)  
(Intercept)          -1.4327773  1.2219447  -1.173   0.2410  
CL_days               0.0010918  0.0005249   2.080   0.0375 *
EraAfter             -0.4409802  0.2153672  -2.048   0.0406 *
hosp_typeCAH / small -0.4348348  1.1719261  -0.371   0.7106  
hosp_typeCommunity    0.0805961  0.9834201   0.082   0.9347  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(5.4258) family taken to be 1)

    Null deviance: 258.07  on 194  degrees of freedom
Residual deviance: 169.93  on 190  degrees of freedom
AIC: 347.6

Number of Fisher Scoring iterations: 1

              Theta:  5.43 
          Std. Err.:  5.62 

 2 x log-likelihood:  -335.605 
Show code
# DHARMa::testDispersion(m1)
# sjPlot::plot_model(m4)
broom::tidy(m4, conf.int = T, exponentiate=T) %>% 
  relocate(conf.low, conf.high, .before = p.value) %>%
  select(-c(std.error, statistic)) %>%
  knitr::kable()
term estimate conf.low conf.high p.value
(Intercept) 0.2386452 0.0213414 2.6280704 0.2409809
CL_days 1.0010924 1.0000499 1.0021393 0.0375328
EraAfter 0.6434054 0.4189051 0.9786849 0.0406018
hosp_typeCAH / small 0.6473716 0.0626463 6.3539743 0.7106056
hosp_typeCommunity 1.0839330 0.1552118 7.3574545 0.9346826
Show code
m5 <- MASS::glm.nb(Inf_obs ~ CL_days + Era * hosp_type, data = time_limited)
summary(m5)

Call:
MASS::glm.nb(formula = Inf_obs ~ CL_days + Era * hosp_type, data = time_limited, 
    init.theta = 5.46940686, link = log)

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)  
(Intercept)                   -1.4607682  1.2221900  -1.195   0.2320  
CL_days                        0.0011359  0.0005275   2.153   0.0313 *
EraAfter                      -0.6156412  0.3397506  -1.812   0.0700 .
hosp_typeCAH / small          -0.6456433  1.2622453  -0.512   0.6090  
hosp_typeCommunity             0.0532167  0.9891992   0.054   0.9571  
EraAfter:hosp_typeCAH / small  0.6826420  1.0737819   0.636   0.5249  
EraAfter:hosp_typeCommunity    0.2574708  0.4427173   0.582   0.5609  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(5.4694) family taken to be 1)

    Null deviance: 258.28  on 194  degrees of freedom
Residual deviance: 169.46  on 188  degrees of freedom
AIC: 351.01

Number of Fisher Scoring iterations: 1

              Theta:  5.47 
          Std. Err.:  5.65 

 2 x log-likelihood:  -335.013 
Show code
broom::tidy(m5, conf.int = T, exponentiate=T) %>% 
  relocate(conf.low, conf.high, .before = p.value) %>%
  select(-c(std.error, statistic)) %>%
  knitr::kable()
term estimate conf.low conf.high p.value
(Intercept) 0.2320579 0.0207388 2.556486 0.2320067
CL_days 1.0011365 1.0000878 1.002190 0.0312971
EraAfter 0.5402944 0.2726206 1.050491 0.0699802
hosp_typeCAH / small 0.5243251 0.0381462 5.816207 0.6089983
hosp_typeCommunity 1.0546582 0.1497819 7.227973 0.9570963
EraAfter:hosp_typeCAH / small 1.9790997 0.2107256 18.614941 0.5249484
EraAfter:hosp_typeCommunity 1.2936540 0.5407193 3.107744 0.5608568
Show code
library(interactions)

cat_plot(m5, pred = hosp_type, modx = Era, interval = TRUE, data=time_limited)

Show code
interact_plot(m5, pred = CL_days, modx = Era, plot.points = TRUE, interval = TRUE)

Show code
# car::vif(m4)
Show code
m_stratified <- time_limited %>% 
  split(time_limited$hosp_type) %>%
  # group_by(hosp_type) %>%
  # nest() %>%
  map(~MASS::glm.nb(Inf_obs ~ Inf_pred + Era, data = .x) ) 

bind_rows(
  mutate(broom::tidy(m_stratified$Academic, conf.int = T, exponentiate=T), strata = "Academic"),
  mutate(broom::tidy(m_stratified$Community, conf.int = T, exponentiate=T), strata = "Community"),
  mutate(broom::tidy(m_stratified$`CAH / small`, conf.int = T, exponentiate=T), strata = "CAH / small")
) %>%
  relocate(strata) %>%
  relocate(conf.low, conf.high, .before = p.value) %>%
  group_by(strata) %>%
  select(-c(std.error, statistic)) %>%
  
  gt() %>%
  cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
  # tab_stub_indent(everything()) %>%
  cols_label(estimate = md("Incident Rate Ratio (IRR)"),
             term     = "Term",
             conf.low = "95% CI",
             p.value  = "P value") %>%
  fmt_number(columns = c(estimate, conf.low, conf.high), n_sigfig = 3) %>%
  fmt_scientific(columns = conf.low, 
             rows = conf.low < 0.001 | conf.low > 1000) %>%
  fmt_scientific(columns = conf.high, 
             rows = conf.high < 0.001 | conf.high > 1000) %>%
  tab_header(
    title = 'Regression, stratified by hopsital type'
  ) %>%
  opt_stylize(5) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightyellow")  # Set the background color
    ),
    locations = cells_row_groups()  # Target the groupname_col
  )
Regression, stratified by hopsital type
Term Incident Rate Ratio (IRR) 95% CI P value
Academic
(Intercept) 45.8 1.81 - 1.17 × 103 0.0201923743
Inf_pred 0.346 0.0964 - 1.21 0.0986246727
EraAfter 0.688 0.382 - 1.21 0.2004854147
Community
(Intercept) 0.253 0.120 - 0.511 0.0001454131
Inf_pred 3.44 0.807 - 14.8 0.0873523253
EraAfter 0.707 0.376 - 1.31 0.2722252261
CAH / small
(Intercept) 0.0799 1.81 × 10−4 - 10.7 0.3616403333
Inf_pred 601 7.23 × 10−15 - 6.00 × 1020 0.7555758257
EraAfter 1.11 0.122 - 10.1 0.9205651567
Show code
glmmTMB(Inf_obs ~ Inf_pred + Era + (1|Hosp) ,
        data = time_limited, 
        family = "poisson") %>%
  summary()


calc_dispersion <- function(mod, verbose=F){
  # Identify the model family
  fam_raw <- mod$family$family
  
  if(str_detect(fam_raw, "poisson")) family <- "poisson"
  if(str_detect(fam_raw, "Negative Binomial")) family <- "NB"
  
  E2 <- resid(mod, type = "pearson")
  N <- nrow(mod$model)
  p <- length(coef(mod)) 
  
  if(family=="NB") p <- p + 1 # '+1' is for variance parameter in NB
  
  result <- sum(E2^2) / (N - p)
  
  if(verbose) {
    if(result>1) message("Overdisperion")
    if(result<1) message("Underdisperion")
  }
  
  return(result)
}
calc_dispersion(m1)
DHARMa::testDispersion(m2)

Footnotes

  1. For example, March 31st (2024) was on a Sunday, so the week starting on 3/31/24 mostly reflects April 2024↩︎

  2. For example, the CHG dashboard reports compliance at (1) WVU Medicine Children’s Hospital and (2) JW Ruby Memorial Hospital, although CMS considers these to be the same facility (they share the same CMS ID, 510001). To complicate things further, it appears that for NHSN purposes Fairmont is also considered to fall under Ruby↩︎