This report is run with the following parameters:

str(params, give.attr = F)
List of 4
 $ era_start    : chr "2023-10-01"
 $ rollout_start: chr "2023-08-01"
 $ rollout_end  : chr "2023-10-01"
 $ ignore_before: chr "2022-09-01"
The workflow for downloading the NHSN data from Tableau is described in the other script. Once I learn how to use NHSN, I will update this section.
NHSN_junk <- "TableauData/2024-10-09 CLASBI SIR" %>%
# Use `fs` package to get the names of all of the CSVs
fs::dir_ls() %>%
# Read all of the CSVs using map
purrr::map(readr::read_csv,
col_types = "ccccccccccccccccccccccccccccccccccccccc") %>%
# Bind into a single table
list_rbind()

NHSN_clean <- NHSN_junk %>%
mutate(Month = `Month of SUMMARYYM`,
Date = dmy(str_glue("15-{Month}")),
Hosp = as_factor(ORGID),
Inf_pred = as.numeric(NUMPRED),
Inf_obs = as.numeric(`Infections Observed`),
CL_days = as.numeric(NUMCLDAYS),
# .keep="none" is the same as transmute, keeps only new columns
.keep="none") %>%
select(-Month)
rm(NHSN_junk)
NHSN_clean %>% rmarkdown::paged_table()

CHG data was obtained from the CHG compliance dashboard. I selected device type = central lines and exported the heatmap to CSV.
I can only download the past 12 months of data from the dashboard. I will need access to more of the data to increase the power of the pre-post analysis.
CHG_raw <- "TableauData/2024-10-18 CHG CL only/All Overall Rate by Hospital and Department_data.csv" %>%
read_csv()
CHG_clean <- CHG_raw %>%
select(Hospital, Date=`Week of Census Time`,
Denominator=`All Lines Denominator`,
Numerator=`All Lines Numerator`) %>%
mutate(Date = mdy(Date)) %>%
group_by(Hospital, Date) %>%
summarise(Denominator = sum(Denominator, na.rm = T),
Numerator = sum(Numerator, na.rm = T)) %>%
ungroup() %>%
# Remove most recent week
filter(Date != ymd("2024-10-13"))
rm(CHG_raw)

Because the CHG data is reported weekly (and everything else is reported monthly), we need to convert the weekly data into monthly data. Some weeks span multiple months¹. In these cases, we group the week with whichever month the majority of its days fall in.
When you extract the data from Tableau, the dates it lists are all Sundays. This doesn't appear to be explicitly documented anywhere, but I assume it means the week starts on Sunday, i.e., the date provided is day 1 of 7 for that week.
week_to_month <- function(date, beginning=T) {
require(lubridate)
# Make sure you are given only one date, not a vector
stopifnot(length(date)==1)
# Calculate the midpoint of the week, which is 3 days away from the start
# (or end) date
  if(beginning) {
    mid_date <- date + days(3)
  } else {
    # Otherwise, treat the given day as the end of the week
    mid_date <- date - days(3)
  }
# Extract the month and the year from this date
new_month <- month(mid_date)
new_year <- year(mid_date)
# Create a new date and return
new_date <- ymd(paste0(new_year, "-", new_month, "-15"))
return(new_date)
}
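As a quick check of the majority-of-days rule (an illustration only, not part of the pipeline): the week starting Sunday 2024-03-31 has four of its seven days in April, so it should be assigned to April 2024.

```r
# Illustration: a week that straddles a month boundary gets assigned to the
# month that contains most of its days (here, April 2024).
library(lubridate)
week_to_month(ymd("2024-03-31"))
#> [1] "2024-04-15"
```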
CHG_monthly <- CHG_clean %>%
# Needed to only process one date at a time
rowwise() %>%
mutate(Month = week_to_month(Date)) %>%
ungroup() %>%
group_by(Hospital, Month) %>%
summarise(Denominator =sum(Denominator , na.rm = T),
Numerator =sum(Numerator , na.rm = T))
rm(CHG_clean)
CHG_monthly %>% rmarkdown::paged_table()
# mutate(Hospital = as.factor(Hospital)) %>%
# DT::datatable(filter="top", rownames = F)

This comes from a Google Sheet where I store info for each hospital. There is important data on the sheet that governs the steps below.
library(googlesheets4)
options(gargle_oauth_email = TRUE)
gs4_auth(cache = TRUE)
if(!exists("hospitals")){
hospitals_raw <- paste0("https://docs.google.com/",
"spreadsheets/d/",
"1zZ1o1HhzOPrbj2KBdGsaO4rJvLDPMWqN0ogjMCNHfGw/") %>%
read_sheet(sheet = "CHG Hospitals")
hospitals <- hospitals_raw
rm(hospitals_raw)
}

The CHG data and NHSN data don't match up one to one, as the CHG reporting splits what NHSN considers to be one "hospital" into multiple hospitals². Using the logic from the Google Sheet, we will aggregate CHG data from:

- WVU/UHA, Children's, and Fairmont into WVUH
- STF and TMH into TMH (although for some reason the TMH data isn't reported on the NHSN dashboard)

This means that WVUH's CHG compliance will be based on the pooled compliance data from the hospitals listed above.
I need to see if I can get STF and TMH from NHSN.
CHG_matched1 <- CHG_monthly %>%
# Join to the GS data to match keys
left_join(select(hospitals, NHSN_abbv, CHG_abbv),
by=c("Hospital"="CHG_abbv")) %>%
# Drop data if the `NHSN_abbv` is NA
filter(!is.na(NHSN_abbv)) %>%
# Aggregate data on NHSN abbreviation
group_by(NHSN_abbv, Month) %>%
summarise(Denominator =sum(Denominator , na.rm = T),
Numerator =sum(Numerator , na.rm = T),
.groups = "drop") %>%
rename(Hosp=NHSN_abbv, Date=Month)

Now we have our data in a format where we can directly compare the CHG compliance with the number of CLABSIs observed.
matched1 <- crossing(Hosp = CHG_matched1$Hosp,
Date = c(CHG_matched1$Date, NHSN_clean$Date)) %>%
left_join(NHSN_clean, by = join_by(Hosp, Date)) %>%
left_join(CHG_matched1, by = join_by(Hosp, Date))
matched1 %>%
arrange(desc(Hosp), desc(Date)) %>%
rmarkdown::paged_table()

If you were to click through the 1330 rows in the table above, you'd find lots of missing data, particularly for the CHG compliance.
The figure below illustrates the patterns of missing data, with dark grey showing months where no data are missing. The blue months have CLABSI data but are missing CHG compliance; conversely, the purple months are missing CLABSI data but do have CHG data. Finally, the pink (or light purple) months are missing both CLABSI and CHG data.
if (!interactive()) {
crossing(Hosp = CHG_matched1$Hosp,
Date = c(CHG_matched1$Date, NHSN_clean$Date)) %>%
left_join(NHSN_clean) %>%
left_join(CHG_matched1) %>%
mutate(missing_CHG = is.na(Numerator),
missing_NHSN = is.na(CL_days),
missing = case_when( missing_CHG & missing_NHSN ~ "Both",
missing_CHG & !missing_NHSN ~ "CHG",
!missing_CHG & missing_NHSN ~ "NHSN",
!missing_CHG & !missing_NHSN ~ "No missing data")) %>%
ggplot(aes(x=Hosp, y=Date, fill=missing)) +
geom_tile(color="white") +
coord_flip() +
theme_bw() +
scale_fill_manual(values = c("Both missing" = "#c68fea", "CHG missing" = "#2e7fd0",
"NHSN missing" = "#b92092", "No missing data" = "grey30",
"Both" = "#c68fea", "CHG" = "#2e7fd0",
"NHSN" = "#b92092" ),
name="Missing") +
labs(x="", y="", title = "Missing data") +
theme(legend.position = "bottom")
} # inner_join(NHSN_clean, temp_CHG)

The Critical Access Hospitals (CAHs) represent an important segment here, as they serve a distinct patient population compared to larger hospitals. From a modeling standpoint, they are also unique in that they have much lower rates of infection, as shown in the table below.
if (!interactive()) {
NHSN_clean %>%
mutate(Year = year(Date)) %>%
group_by(Hosp, Year) %>%
summarise(Infections = sum(Inf_pred, na.rm = T), .groups = "drop") %>%
pivot_wider(id_cols = Hosp, names_from = Year, values_from = Infections) %>%
left_join(select(hospitals, NHSN_abbv, hosp_type), by=c("Hosp"="NHSN_abbv")) %>%
mutate(hosp_type = recode(hosp_type, "???"="Small")) %>%
select(Hosp, hosp_type, starts_with("20")) %>%
unique() %>%
arrange(hosp_type) %>%
gt(rowname_col = "Hosp") %>%
tab_stub_indent(everything(), 5) %>%
fmt_missing() %>%
fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
cols_label(hosp_type = md("Hospital Type")) %>%
gtExtras::gt_color_rows(
columns = starts_with("20"),
# domain = c(0,50)
# palette = c("purple", "lightgrey", "green")
) %>%
tab_style_body(
columns = hosp_type,
values = "Critical access",
style = list(
cell_fill(color = '#9f0101'),
cell_text(color="white")
)
) %>%
tab_style_body(
columns = hosp_type,
values = "Community",
style = list(
cell_fill(color = '#5a3286'),
cell_text(color="white")
)
) %>%
tab_style_body(
columns = hosp_type,
values = "Small",
style = cell_fill(color = '#e5ce90')
) %>%
tab_header(
title = md('Total number of CLABSIs, by hospital')
) %>%
opt_stylize(6, color = "gray")
}

**Total number of CLABSIs, by hospital**

| Hospital | Hospital Type | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|
| WVUH | Academic | 34 | 34 | 33 | 32 | 31 | 22 |
| BMC | Community | 3.9 | 3.8 | 4.4 | 4.1 | 3.7 | 2.0 |
| CCMC | Community | 8.4 | 6.1 | 6.4 | 5.4 | 5.7 | 4.3 |
| PRN | Community | 2.5 | 2.8 | 4.3 | 4.6 | 4.5 | 2.8 |
| RMH | Community | 0.60 | 0.73 | 0.98 | 1.0 | 0.89 | 0.50 |
| UHC | Community | 4.8 | 6.1 | 9.1 | 8.9 | 8.8 | 5.7 |
| UTN | Community | 2.1 | 2.2 | 2.7 | 2.9 | 2.3 | 1.5 |
| WHL | Community | 8.0 | 9.2 | 9.4 | 6.3 | 5.1 | 3.7 |
| BRN | Critical access | — | — | 0 | 0.012 | 0.012 | 0.0040 |
| BRX | Critical access | — | — | 0.0030 | 0.020 | 0.028 | 0.016 |
| GMH | Critical access | 0.092 | 0.10 | 0.20 | 0.17 | 0.14 | 0.030 |
| GRMC | Critical access | 0.34 | 0.36 | 0.76 | 0.61 | 0.58 | 0.34 |
| HRS | Critical access | 0.0020 | 0.0050 | 0 | 0 | 0.0080 | 0.0080 |
| JAX | Critical access | 0.048 | 0.038 | 0.051 | 0.045 | 0.030 | 0.030 |
| JMC | Critical access | 0.095 | 0.15 | 0.20 | 0.17 | 0.21 | 0.070 |
| PVH | Critical access | 0.20 | 0.13 | 0.075 | 0.091 | 0.11 | 0.063 |
| SMR | Critical access | 0.13 | 0.19 | 0.16 | 0.19 | 0.22 | 0.088 |
| STJ | Critical access | 0.056 | 0.066 | 0.093 | 0.071 | 0.070 | 0.078 |
| WTZ | Small | 0.0060 | 0.0020 | 0.0080 | 0.062 | 0.065 | 0.12 |
small_hospitals <- hospitals %>%
filter(hosp_type=="Critical access") %>%
pull(NHSN_abbv)
small_hospitals <- c(small_hospitals, "WTZ")

To accommodate this unique set of hospitals in the analysis, we will aggregate their monthly numbers together and treat them as one "hospital". Although WTZ isn't classified as a critical access hospital, it has a smaller capacity (bed size < 50), so we will include it with the CAHs. Specifically, we will treat BRN, BRX, GRMC, GMH, HRS, JAX, JMC, PVH, STJ, SMR, and WTZ as small hospitals.
# Temporary df to join hosp info w/ matched df
df_temp <- hospitals %>%
select(NHSN_abbv, name_short, hosp_type) %>%
# Recode the ones that are under WVUH
mutate(name_short = if_else(NHSN_abbv == "WVUH",
"Ruby & Fairmont",name_short)) %>%
# Ignore small hospitals or those not in NHSN
filter(!NHSN_abbv %in% small_hospitals,
!is.na(NHSN_abbv)) %>%
unique() %>%
add_row(NHSN_abbv = "Small",
name_short = "Small hospitals",
hosp_type = "CAH / small")
matched_df <- matched1 %>%
mutate(Hosp = if_else(Hosp %in% small_hospitals, "Small", Hosp)) %>%
group_by(Hosp, Date) %>%
summarise(Inf_pred = sum(Inf_pred, na.rm = T),
Inf_obs = sum(Inf_obs, na.rm = T),
CL_days = sum(CL_days, na.rm = T),
Denominator = sum(Denominator, na.rm = T),
Numerator = sum(Numerator, na.rm = T),
.groups = "drop") %>%
left_join(df_temp, by=c("Hosp"="NHSN_abbv")) %>%
# Handle implicit zeros introduced by the `summarise`
mutate(CL_days = if_else(Inf_pred==0, NA, CL_days),
Inf_obs = if_else(Inf_pred==0, NA, Inf_obs),
Inf_pred = if_else(Inf_pred==0, NA, Inf_pred)) %>%
mutate(missing_CHG = Denominator==0 & Numerator == 0,
Denominator = if_else(missing_CHG, NA, Denominator),
Numerator = if_else(missing_CHG, NA, Numerator)) %>%
# Define the era
mutate(Era = case_when(Date < ymd(params$rollout_start) ~ "Before",
Date > ymd(params$rollout_end) ~ "After",
TRUE ~ "During"),
Era = as_factor(Era))
# crossing(Hosp = df_temp$NHSN_abbv,
# Date = c(CHG_matched1$Date, NHSN_clean$Date)) %>%
# left_join(matched_df) %>%
# mutate(missing_CHG = is.na(Numerator),
# missing_NHSN = is.na(CL_days),
# missing = case_when( missing_CHG & missing_NHSN ~ "Both",
# missing_CHG & !missing_NHSN ~ "CHG",
# !missing_CHG & missing_NHSN ~ "NHSN",
# !missing_CHG & !missing_NHSN ~ "No missing data")) %>%
#
# ggplot(aes(x=Hosp, y=Date, fill=missing)) +
# geom_tile(color="white") +
# coord_flip() +
# theme_bw() +
# scale_fill_manual(values = c("Both missing" = "#c68fea", "CHG missing" = "#2e7fd0",
# "NHSN missing" = "#b92092", "No missing data" = "grey30",
# "Both" = "#c68fea", "CHG" = "#2e7fd0",
# "NHSN" = "#b92092" ),
# name="Missing") +
# labs(x="", y="", title = "Missing data") +
# theme(legend.position = "bottom")
rm(matched1, CHG_matched1, df_temp)

This transforms the previous table to look like this:
if (!interactive()) {
matched_df %>%
mutate(Year = year(Date)) %>%
group_by(Hosp, Year, hosp_type) %>%
summarise(Infections = sum(Inf_pred, na.rm = T), .groups = "drop") %>%
pivot_wider(id_cols = c(Hosp,hosp_type), names_from = Year, values_from = Infections) %>%
select(Hosp, hosp_type, starts_with("20")) %>%
unique() %>%
arrange(hosp_type) %>%
gt(rowname_col = "Hosp") %>%
tab_stub_indent(everything(), 5) %>%
fmt_missing() %>%
fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
cols_label(hosp_type = md("Hospital Type")) %>%
gtExtras::gt_color_rows(
columns = starts_with("20"),
# domain = c(0,50)
# palette = c("purple", "lightgrey", "green")
) %>%
tab_style_body(
columns = hosp_type,
values = "Critical access",
style = list(
cell_fill(color = '#9f0101'),
cell_text(color="white")
)
) %>%
tab_style_body(
columns = hosp_type,
values = "Community",
style = list(
cell_fill(color = '#5a3286'),
cell_text(color="white")
)
) %>%
tab_style_body(
columns = hosp_type,
values = "Small",
style = cell_fill(color = '#e5ce90')
) %>%
tab_header(
title = md('Total number of CLABSIs, by hospital')
) %>%
opt_stylize(6, color = "gray")
}

**Total number of CLABSIs, by hospital**

| Hospital | Hospital Type | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|
| WVUH | Academic | 34 | 34 | 33 | 32 | 31 | 22 |
| Small | CAH / small | 0.97 | 1.0 | 1.5 | 1.4 | 1.5 | 0.85 |
| BMC | Community | 3.9 | 3.8 | 4.4 | 4.1 | 3.7 | 2.0 |
| CCMC | Community | 8.4 | 6.1 | 6.4 | 5.4 | 5.7 | 4.3 |
| PRN | Community | 2.5 | 2.8 | 4.3 | 4.6 | 4.5 | 2.8 |
| RMH | Community | 0.60 | 0.73 | 0.98 | 1.0 | 0.89 | 0.50 |
| UHC | Community | 4.8 | 6.1 | 9.1 | 8.9 | 8.8 | 5.7 |
| UTN | Community | 2.1 | 2.2 | 2.7 | 2.9 | 2.3 | 1.5 |
| WHL | Community | 8.0 | 9.2 | 9.4 | 6.3 | 5.1 | 3.7 |
time_limited_full <- matched_df %>%
filter(Date > ymd(params$ignore_before)) %>%
filter(!is.na(Inf_pred))
time_limited <- time_limited_full %>%
filter(Era != "During")
key_dates <- list(
before_start = min(pull(filter(time_limited, Era=="Before"), "Date")),
before_end = max(pull(filter(time_limited, Era=="Before"), "Date")),
before_len = length(unique(pull(filter(time_limited, Era=="Before"), "Date"))),
after_start = min(pull(filter(time_limited, Era=="After"), "Date")),
after_end = max(pull(filter(time_limited, Era=="After"), "Date")),
after_len = length(unique(pull(filter(time_limited, Era=="After"), "Date")))
)

The before era spans from Sep 2022 to Jul 2023 (11 months), while the after era spans from Oct 2023 to Aug 2024 (11 months).
df_1 <- time_limited %>%
# group_by(Era) %>%
group_by(Era, hosp_type) %>%
summarise(Inf_pred = sum(Inf_pred, na.rm = T),
Inf_obs = sum(Inf_obs, na.rm = T),
CL_days = sum(CL_days, na.rm = T),
.groups = "drop") %>%
pivot_longer(cols = -c(Era, hosp_type)) %>%
pivot_wider(names_from = Era, values_from = value)
df_2 <- df_1 %>%
group_by(name) %>%
summarise(Before = sum(Before),
After = sum(After)) %>%
mutate(hosp_type = "Overall")
# bind_rows(df_2, df_1) %>%
# relocate(hosp_type) %>%
# group_by(name) %>%
# gt()
#
# bind_rows(df_2, df_1) %>%
# relocate(hosp_type) %>%
# pivot_wider(names_from = name, values_from = c(Before, After), names_glue = "{name}.{.value}", names_vary = "slowest") %>%
#
# relocate(CL_days.Before, CL_days.After, .after = Inf_pred.After) %>%
#
#
# gt(rowname_col = "hosp_type") %>%
# # fmt_number(columns = starts_with("Inf_pred"), n_sigfig = 3) %>%
# tab_header(
# title = md('Pre-Post')
# ) %>%
# opt_stylize(6, color = "gray")
rm(df_1, df_2)
time_limited %>%
filter(!is.na(Inf_obs)) %>%
# group_by(Era) %>%
group_by(Era, hosp_type) %>%
summarise(
# Inf_pred = qwraps2::median_iqr(Inf_pred, na_rm = T, show_n="never"),
# Inf_obs = qwraps2::median_iqr(Inf_obs, na_rm = T, show_n="never"),
# CL_days = qwraps2::median_iqr(CL_days, na_rm = T, show_n="never", digits=0),
Inf_pred = qwraps2::frmtci(qwraps2::mean_ci(Inf_pred, na_rm = T), format = "est (lcl - ucl)"),
Inf_obs = qwraps2::frmtci(qwraps2::mean_ci(Inf_obs, na_rm = T), format = "est (lcl - ucl)"),
CL_days = qwraps2::frmtci(qwraps2::mean_ci(CL_days, na_rm = T), format = "est (lcl - ucl)", digits = 1),
# n = as.character(n()),
.groups = "drop") %>%
pivot_longer(cols = -c(Era, hosp_type)) %>%
pivot_wider(names_from = Era, values_from = value) %>%
relocate(hosp_type) %>%
pivot_wider(names_from = name, values_from = c(Before, After), names_glue = "{name}.{.value}", names_vary = "slowest") %>%
relocate(Inf_pred.Before, Inf_pred.After, .after = Inf_obs.After) %>%
relocate(CL_days.Before, CL_days.After, .after = Inf_pred.After) %>%
gt(rowname_col = "hosp_type") %>%
# fmt_number(columns = starts_with("Inf_pred"), n_sigfig = 3) %>%
gt::tab_spanner_delim(delim = ".") %>%
tab_header(
title = md('Pre-Post')
) %>%
opt_stylize(6, color = "gray")

**Pre-Post**

| Hospital type | Inf_obs (Before) | Inf_obs (After) | Inf_pred (Before) | Inf_pred (After) | CL_days (Before) | CL_days (After) |
|---|---|---|---|---|---|---|
| Academic | 3.00 (2.01 - 3.99) | 1.82 (1.03 - 2.60) | 2.59 (2.47 - 2.71) | 2.72 (2.57 - 2.87) | 2,263.9 (2,169.9 - 2,358.0) | 2,352.4 (2,238.2 - 2,466.5) |
| CAH / small | 0.18 (-0.06 - 0.42) | 0.18 (-0.06 - 0.42) | 0.13 (0.11 - 0.14) | 0.11 (0.09 - 0.13) | 351.5 (310.7 - 392.4) | 291.2 (243.8 - 338.6) |
| Community | 0.42 (0.25 - 0.58) | 0.30 (0.16 - 0.43) | 0.37 (0.33 - 0.42) | 0.39 (0.34 - 0.44) | 434.9 (386.6 - 483.2) | 454.7 (401.1 - 508.4) |
To start, we will look at the raw (absolute) number of CLABSIs that occurred monthly at each hospital. The colored bars represent the actual number of infections that occurred (the very small ones at the bottom of the y-axis are zero infections), and the solid black line is the predicted number of infections as calculated by the NHSN. The grey dashed vertical lines mark the implementation phase of the system-wide CHG rollout.
time_limited_full %>%
ggplot(aes(x=Date, y=Inf_obs, fill=hosp_type)) +
geom_vline(xintercept = ymd(params$rollout_start),
alpha=0.5, linetype = "dashed",
color="grey30") +
geom_vline(xintercept = ymd(params$rollout_end),
alpha=0.5, linetype = "dashed",
color="grey30") +
geom_line(aes(y=Inf_pred)) +
geom_col(aes(color=hosp_type),alpha=0.7) +
facet_wrap("Hosp") +
guides(color="none") +
scale_x_date(date_labels = "%b\n%Y") +
labs(title="CLABSI: Observed vs Predicted",
subtitle="Black solid line is predicted number from NHSN",
y="Number of CLABSIs",
x="Month",
fill="Hospital\nType",
caption = "CAH = Critical Access Hospital") +
theme_bw()

In a similar vein, the graph below shows the rates of CLABSIs per central line day.
time_limited_full %>%
mutate(Inf_rate = 10000 * Inf_obs / CL_days,
Pred_rate = 10000 * Inf_pred / CL_days) %>%
ggplot(aes(x=Date, y=Inf_rate, fill=hosp_type)) +
geom_vline(xintercept = ymd(params$rollout_start),
alpha=0.5, linetype = "dashed",
color="grey30") +
geom_vline(xintercept = ymd(params$rollout_end),
alpha=0.5, linetype = "dashed",
color="grey30") +
geom_line(aes(y=Pred_rate)) +
geom_col(aes(color=hosp_type),alpha=0.7) +
facet_wrap("Hosp") +
guides(color="none") +
scale_x_date(date_labels = "%b\n%Y") +
labs(title="CLABSI Rate Over Time",
subtitle="Black solid line is predicted rate from NHSN",
y="CLABSIs (per 10,000 CL days)",
x="Month",
fill="Hospital\nType",
caption = "CL = central line\nCAH = Critical Access Hospital") +
theme_bw()

time_limited %>%
mutate(Q = str_glue("{year(Date)} Q{quarter(Date)}")) %>%
group_by(Hosp, hosp_type, Q) %>%
summarise(Inf_rate = 10000 * sum(Inf_obs, na.rm = T) / sum(CL_days, na.rm = T),
.groups="drop") %>%
# pivot_wider(names_from = c(hosp_type, Hosp),
# values_from = c(Inf_rate),
# names_glue = "{hosp_type}.{Hosp}") %>%
select(-hosp_type) %>%
pivot_wider(names_from = Hosp,
values_from = c(Inf_rate)) %>%
filter(Q != "2024 Q4") %>%
gt(rowname_col = "Q") %>%
fmt_number(columns = -Q) %>%
# tab_spanner_delim(delim = ".") %>%
data_color(
columns = -Q,
# domain = c(0,1000),
palette = "BuPu",
# pal_type = "continuous"
) %>%
tab_header(
title = md('CLABSI Rate (per 10,000 CL days) by Quarter'),
subtitle = " " ) %>%
opt_stylize(6, color="gray")

**CLABSI Rate (per 10,000 CL days) by Quarter**

| Quarter | BMC | CCMC | PRN | RMH | Small | UHC | UTN | WHL | WVUH |
|---|---|---|---|---|---|---|---|---|---|
| 2022 Q3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 31.85 | 0.00 | 24.28 |
| 2022 Q4 | 0.00 | 0.00 | 17.65 | 0.00 | 0.00 | 4.34 | 36.73 | 0.00 | 5.51 |
| 2023 Q1 | 8.99 | 0.00 | 31.89 | 0.00 | 8.40 | 15.53 | 35.46 | 5.77 | 13.59 |
| 2023 Q2 | 0.00 | 0.00 | 11.50 | 0.00 | 0.00 | 4.30 | 15.50 | 12.58 | 13.40 |
| 2023 Q3 | 0.00 | 0.00 | 16.37 | 0.00 | 35.59 | 13.46 | 0.00 | 19.19 | 26.74 |
| 2023 Q4 | 0.00 | 0.00 | 24.83 | 0.00 | 9.64 | 4.08 | 0.00 | 6.56 | 8.60 |
| 2024 Q1 | 0.00 | 12.49 | 12.00 | 0.00 | 9.80 | 7.72 | 0.00 | 0.00 | 9.92 |
| 2024 Q2 | 0.00 | 0.00 | 21.26 | 0.00 | 0.00 | 0.00 | 10.58 | 11.52 | 6.94 |
| 2024 Q3 | 0.00 | 10.42 | 10.88 | 0.00 | 0.00 | 0.00 | 72.99 | 0.00 | 4.31 |
Next, we can look at the rates of CLABSIs before and after CHG bathing was implemented. The table on the left looks at the standardized infection ratio (SIR) and the table on the right shows the rates of CLABSIs based on the number of central line days.
The standardized infection ratio (SIR) is a summary measure used to track HAIs at a national, state, or local level over time. The SIR adjusts for various facility and/or patient-level factors that contribute to HAI risk within each facility. The method of calculating an SIR is similar to the method used to calculate the Standardized Mortality Ratio (SMR), a summary statistic widely used in public health to analyze mortality data. In HAI data analysis, the SIR compares the actual number of HAIs reported to the number that would be predicted, given the standard population (i.e., NHSN baseline), adjusting for several risk factors that have been found to be significantly associated with differences in infection incidence.
– NHSN documentation, page 4
The formula for SIR is:
\[ SIR = \frac{Observed\ HAIs}{Predicted\ HAIs} \]
The NHSN prefers to use SIRs (instead of rates) because the SIR adjusts for differences in risk among populations (using the predicted number of infections), as explained on page 4 of their documentation. We will capitalize on the NHSN's calculation of predicted infections later in our analysis, as their calculation accounts for numerous confounding variables.
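As a quick worked example (toy numbers, not from our data): a hospital that observed 12 CLABSIs when the NHSN predicted 10 would have an SIR of 12/10 = 1.2, i.e., 20% more infections than predicted, while an SIR below 1 means fewer infections were observed than predicted.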
time_limited %>%
group_by(Era, Hosp) %>%
summarise(Inf_pred = sum(Inf_pred, na.rm = T),
Inf_obs = sum(Inf_obs, na.rm = T),
CL_days = sum(CL_days, na.rm = T),
.groups = "drop") %>%
mutate(SIR = Inf_obs/Inf_pred) %>%
select(Hosp, Era, SIR) %>%
pivot_wider(names_from = Era,
values_from = SIR) %>%
gt(rowname_col = "Hosp") %>%
# tab_stub_indent(everything(), 5) %>%
fmt_missing() %>%
# fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
fmt_number() %>%
gtExtras::gt_hulk_col_numeric(
reverse = T
# columns= Before
) %>%
tab_header(
title = md('CLABSI SIR in the Pre & Post CHG Era'),
subtitle = "SIR = standardized infection ratio" )

**CLABSI SIR in the Pre & Post CHG Era** (SIR = standardized infection ratio)

| Hospital | Before | After |
|---|---|---|
| BMC | 0.30 | 0.00 |
| CCMC | 0.00 | 0.52 |
| PRN | 2.53 | 2.57 |
| RMH | 0.00 | 0.00 |
| Small | 1.43 | 1.65 |
| UHC | 0.88 | 0.38 |
| UTN | 4.12 | 1.38 |
| WHL | 0.80 | 0.61 |
| WVUH | 1.16 | 0.67 |
time_limited %>%
group_by(Era, Hosp) %>%
summarise(Inf_pred = sum(Inf_pred, na.rm = T),
Inf_obs = sum(Inf_obs, na.rm = T),
CL_days = sum(CL_days, na.rm = T),
.groups = "drop") %>%
mutate(x = 10000 * Inf_obs/CL_days) %>%
select(Hosp, Era, x) %>%
pivot_wider(names_from = Era,
values_from = x) %>%
gt(rowname_col = "Hosp") %>%
# tab_stub_indent(everything(), 5) %>%
fmt_missing() %>%
# fmt_number(columns = starts_with("20"), n_sigfig = 2) %>%
fmt_number(decimals = 1, ) %>%
tab_header(
title = md('CLABSI rate in the Pre & Post CHG Era'),
subtitle = "Per 10,000 central line days" ) %>%
opt_stylize(6, color="gray")

**CLABSI rate in the Pre & Post CHG Era** (per 10,000 central line days)

| Hospital | Before | After |
|---|---|---|
| BMC | 2.7 | 0.0 |
| CCMC | 0.0 | 5.4 |
| PRN | 17.7 | 17.8 |
| RMH | 0.0 | 0.0 |
| Small | 5.2 | 6.2 |
| UHC | 8.0 | 3.4 |
| UTN | 28.6 | 9.4 |
| WHL | 6.9 | 4.9 |
| WVUH | 13.3 | 7.7 |
This shows the CHG compliance for each hospital (for the period for which I have data available). Each bar is colored by the "denominator", meaning the number of patients who are eligible for CHG bathing.
time_limited %>%
filter(!missing_CHG) %>%
mutate(CHG = Numerator / Denominator) %>%
ggplot(aes(x=Date, y=CHG, fill=Denominator)) +
geom_col(color='black') +
facet_wrap("Hosp") +
# scale_fill_continuous(trans = "log10") +
# scale_fill_binned(trans = "log10") +
scale_fill_fermenter(palette="GnBu", trans = "log10") +
scale_y_continuous(labels = scales::percent, limits = c(0,1)) +
scale_x_date(date_labels = "%b\n%Y") +
guides(color="none") +
labs(title="CHG compliance",
y="CHG compliance (%)",
x="Month",
fill="CHG\nDenominator",
caption = "CAH = Critical Access Hospital") +
theme_bw()

In progress
I’m struggling with how to analyze this best, so I might revise it later…
time_limited %>%
filter(!missing_CHG) %>%
mutate(CHG = Numerator / Denominator) %>%
ggplot(aes(x=Inf_obs, y=CHG)) +
geom_boxplot(aes(group=Inf_obs)) +
geom_jitter(alpha=.25)

In this section, we will use a negative binomial regression to examine any influence that CHG bathing has on CLABSIs. This is the same type of model that the NHSN uses to predict the number of expected infections.
Unless otherwise specified, our outcome of interest will be the monthly number of CLABSIs observed for each hospital. Notably, this is the absolute number, not a rate. To account for confounding variables we will include the number of predicted infections (from the NHSN data).
m1 <- MASS::glm.nb(Inf_obs ~ Inf_pred + Era, data = time_limited)
summary(m1)
# DHARMa::testDispersion(m1)
# glance(m1)
# unique(select(time_limited, Date, Era)) %>% left_join(NHSN_clean)

The first model is a simple negative binomial model, using the formula below:
Inf_obs ~ Inf_pred + Era
The results of this model are shown in the table below:
broom::tidy(m1, conf.int = T, exponentiate=T) %>%
relocate(conf.low, conf.high, .before = p.value) %>%
select(-c(std.error, statistic)) %>%
gt() %>%
cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
# tab_stub_indent(everything()) %>%
cols_label(estimate = md("Incident Rate Ratio (IRR)"),
term = "Term",
conf.low = "95% CI",
p.value = "P value") %>%
fmt_number(columns = c(estimate, conf.low, conf.high, p.value), n_sigfig = 3) %>%
fmt_scientific(columns = p.value,
rows = p.value < 0.001) %>%
fmt_scientific(columns = conf.low,
rows = conf.low < 0.001 | conf.low > 1000) %>%
fmt_scientific(columns = conf.high,
rows = conf.high < 0.001 | conf.high > 1000) %>%
tab_header(
title = 'Negative binomial model'
) %>%
opt_stylize(5)

**Negative binomial model**
| Term | Incident Rate Ratio (IRR) | 95% CI | P value |
|---|---|---|---|
| (Intercept) | 0.301 | 0.209 - 0.423 | 1.87 × 10^−11 |
| Inf_pred | 2.35 | 1.95 - 2.84 | 1.60 × 10^−19 |
| EraAfter | 0.645 | 0.418 - 0.985 | 0.0441 |
This shows that compared to the before era, the “after” era (post CHG implementation) had a 35% reduction in CLABSIs (1 - 0.645) with a P-value of 0.044. It also shows that the predicted infections are significantly associated with increased observed infections (which is to be expected).
The same data is shown graphically below:
sjPlot::plot_model(m1)

library(glmmTMB)
m2 <- glmmTMB(Inf_obs ~ Inf_pred + Era + (1|Hosp) ,
data = time_limited,
family = "nbinom1")
summary(m2)
# broom.mixed::glance(m2)
# DHARMa::testDispersion(m2)

In Model 1, we didn't do anything to account for the various hospitals. There are arguments for and against adjusting for hospital-specific effects, outlined below.
I think it’s worthwhile to at least include this model (even if it’s a sensitivity analysis), which is the same as Model 1 but now with random effects for each hospital. The model formula I use is below:
Inf_obs ~ Inf_pred + Era + (1 | Hosp)
The (1|Hosp) term means each hospital has a random intercept, capturing the variation in infection rates across hospitals. This helps to account for unobserved hospital-specific factors that may influence the CLABSI rates, but it still assumes that the effect of CHG implementation is similar across hospitals.
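If it's useful to see how much the baseline differs by site, the hospital-level intercepts can be pulled out of the fitted model. A minimal sketch, assuming the glmmTMB fit m2 from above:

```r
# Sketch: inspect the estimated random intercepts for each hospital from `m2`.
# Exponentiating puts them on the rate-ratio scale relative to the overall intercept.
library(glmmTMB)
re_hosp <- ranef(m2)$cond$Hosp      # one row per hospital, column "(Intercept)"
exp(re_hosp[["(Intercept)"]])       # multiplicative deviation for each hospital
```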
broom.mixed::tidy(m2, conf.int = T, exponentiate=T) %>%
filter(effect=="fixed") %>%
select(term, estimate, conf.low, conf.high, p.value) %>%
gt() %>%
cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
# tab_stub_indent(everything()) %>%
cols_label(estimate = md("Incident Rate Ratio (IRR)"),
term = "Term",
conf.low = "95% CI",
p.value = "P value") %>%
fmt_number(columns = c(estimate, conf.low, conf.high, p.value), n_sigfig = 3) %>%
fmt_scientific(columns = p.value,
rows = p.value < 0.001) %>%
fmt_scientific(columns = conf.low,
rows = conf.low < 0.001 | conf.low > 1000) %>%
fmt_scientific(columns = conf.high,
rows = conf.high < 0.001 | conf.high > 1000) %>%
tab_header(
title = md('**Random effects** negative binomial model'),
subtitle = "Random effects for each hospital"
) %>%
opt_stylize(5)

**Random effects negative binomial model** (random effects for each hospital)
| Term | Incident Rate Ratio (IRR) | 95% CI | P value |
|---|---|---|---|
| (Intercept) | 0.273 | 0.115 - 0.646 | 0.00316 |
| Inf_pred | 1.57 | 0.615 - 4.02 | 0.345 |
| EraAfter | 0.650 | 0.435 - 0.969 | 0.0347 |
What is interesting here is that the p-value for Inf_pred is now 0.345.
sjPlot::plot_model(m2)

# KEY
# P: Time since intervention
# Time: Time
tempdf <- time_limited %>%
filter(Era == "After") %>%
select(Date) %>%
unique() %>%
arrange(Date) %>%
mutate(P = row_number())
tempdf2 <- time_limited %>%
# filter(Era == "After") %>%
select(Date) %>%
unique() %>%
mutate(Time = row_number()) %>%
left_join(tempdf, by="Date") %>%
mutate(P = if_else(is.na(P), 0, P))
df_TS <- time_limited %>%
left_join(tempdf2, by="Date")
rm(tempdf, tempdf2)
m3 <- MASS::glm.nb(Inf_obs ~ Inf_pred + Era + P + Time, data = df_TS)
summary(m3)
glance(m3)
# DHARMa::testDispersion(m3)

Another option is to do an interrupted time series. I'm not as familiar with these, but I do like the output graphs they can create, since you can show the counterfactual (a rough sketch follows the variable list below). The formula for this model is:
Inf_obs ~ Inf_pred + Era + P + Time
In this formula:

- Time = the number of months that have passed since the start of the observation period
- Era = indicates before / after system-wide CHG was implemented
- P = the number of months that have passed since CHG implementation occurred. This helps to show whether the intervention has a sustained effect
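Below is a rough sketch (not part of the report's pipeline) of how the counterfactual could be drawn from m3: predict with the intervention terms switched off (Era held at "Before", P = 0) and compare to the fitted values. The object names (df_TS, m3) are the ones defined above; aggregating to system-wide monthly totals is an assumption made just for display.

```r
# Sketch: fitted vs counterfactual monthly CLABSI counts from the ITS model `m3`.
library(dplyr)
library(ggplot2)

cf_data <- df_TS %>%
  mutate(Era = factor("Before", levels = levels(Era)),  # pretend CHG never rolled out
         P = 0)                                         # no time elapsed since intervention

df_TS %>%
  mutate(Fitted = predict(m3, newdata = df_TS, type = "response"),
         Counterfactual = predict(m3, newdata = cf_data, type = "response")) %>%
  group_by(Date) %>%
  summarise(across(c(Inf_obs, Fitted, Counterfactual), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  ggplot(aes(x = Date)) +
  geom_col(aes(y = Inf_obs), alpha = 0.4) +
  geom_line(aes(y = Fitted)) +
  geom_line(aes(y = Counterfactual), linetype = "dashed") +
  labs(y = "CLABSIs per month", x = "Month",
       title = "Interrupted time series: fitted vs counterfactual (sketch)") +
  theme_bw()
```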
broom::tidy(m3, conf.int = T, exponentiate=T) %>%
relocate(conf.low, conf.high, .before = p.value) %>%
select(-c(std.error, statistic)) %>%
gt() %>%
cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
# tab_stub_indent(everything()) %>%
cols_label(estimate = md("Incident Rate Ratio (IRR)"),
term = "Term",
conf.low = "95% CI",
p.value = "P value") %>%
fmt_number(columns = c(estimate, conf.low, conf.high, p.value), n_sigfig = 3) %>%
fmt_scientific(columns = p.value,
rows = p.value < 0.001) %>%
fmt_scientific(columns = conf.low,
rows = conf.low < 0.001 | conf.low > 1000) %>%
fmt_scientific(columns = conf.high,
rows = conf.high < 0.001 | conf.high > 1000) %>%
tab_header(
title = md('Negative binomial: **Interrupted Time Series**'),
subtitle = md("Time = Month; P = Month since implementation")
) %>%
opt_stylize(5)

**Negative binomial: Interrupted Time Series** (Time = Month; P = Month since implementation)
| Term | Incident Rate Ratio (IRR) | 95% CI | P value |
|---|---|---|---|
| (Intercept) | 0.234 | 0.120 - 0.434 | 1.17 × 10^−5 |
| Inf_pred | 2.34 | 1.95 - 2.82 | 3.46 × 10^−20 |
| EraAfter | 0.617 | 0.265 - 1.40 | 0.253 |
| P | 0.935 | 0.816 - 1.07 | 0.324 |
| Time | 1.04 | 0.957 - 1.14 | 0.352 |
This has a similar magnitude for Era as the other models, but now Era is no longer significant.
Here we need to use CL_days as the predictor (not Inf_pred), because the NHSN's predicted infections already incorporate hospital type in their prediction formula.
m4 <- MASS::glm.nb(Inf_obs ~ CL_days + Era + hosp_type, data = time_limited)
summary(m4)
Call:
MASS::glm.nb(formula = Inf_obs ~ CL_days + Era + hosp_type, data = time_limited,
init.theta = 5.42583656, link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.4327773 1.2219447 -1.173 0.2410
CL_days 0.0010918 0.0005249 2.080 0.0375 *
EraAfter -0.4409802 0.2153672 -2.048 0.0406 *
hosp_typeCAH / small -0.4348348 1.1719261 -0.371 0.7106
hosp_typeCommunity 0.0805961 0.9834201 0.082 0.9347
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(5.4258) family taken to be 1)
Null deviance: 258.07 on 194 degrees of freedom
Residual deviance: 169.93 on 190 degrees of freedom
AIC: 347.6
Number of Fisher Scoring iterations: 1
Theta: 5.43
Std. Err.: 5.62
2 x log-likelihood: -335.605
# DHARMa::testDispersion(m1)
# sjPlot::plot_model(m4)
broom::tidy(m4, conf.int = T, exponentiate=T) %>%
relocate(conf.low, conf.high, .before = p.value) %>%
select(-c(std.error, statistic)) %>%
knitr::kable()

| term | estimate | conf.low | conf.high | p.value |
|---|---|---|---|---|
| (Intercept) | 0.2386452 | 0.0213414 | 2.6280704 | 0.2409809 |
| CL_days | 1.0010924 | 1.0000499 | 1.0021393 | 0.0375328 |
| EraAfter | 0.6434054 | 0.4189051 | 0.9786849 | 0.0406018 |
| hosp_typeCAH / small | 0.6473716 | 0.0626463 | 6.3539743 | 0.7106056 |
| hosp_typeCommunity | 1.0839330 | 0.1552118 | 7.3574545 | 0.9346826 |
m5 <- MASS::glm.nb(Inf_obs ~ CL_days + Era * hosp_type, data = time_limited)
summary(m5)
Call:
MASS::glm.nb(formula = Inf_obs ~ CL_days + Era * hosp_type, data = time_limited,
init.theta = 5.46940686, link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.4607682 1.2221900 -1.195 0.2320
CL_days 0.0011359 0.0005275 2.153 0.0313 *
EraAfter -0.6156412 0.3397506 -1.812 0.0700 .
hosp_typeCAH / small -0.6456433 1.2622453 -0.512 0.6090
hosp_typeCommunity 0.0532167 0.9891992 0.054 0.9571
EraAfter:hosp_typeCAH / small 0.6826420 1.0737819 0.636 0.5249
EraAfter:hosp_typeCommunity 0.2574708 0.4427173 0.582 0.5609
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(5.4694) family taken to be 1)
Null deviance: 258.28 on 194 degrees of freedom
Residual deviance: 169.46 on 188 degrees of freedom
AIC: 351.01
Number of Fisher Scoring iterations: 1
Theta: 5.47
Std. Err.: 5.65
2 x log-likelihood: -335.013
broom::tidy(m5, conf.int = T, exponentiate=T) %>%
relocate(conf.low, conf.high, .before = p.value) %>%
select(-c(std.error, statistic)) %>%
knitr::kable()

| term | estimate | conf.low | conf.high | p.value |
|---|---|---|---|---|
| (Intercept) | 0.2320579 | 0.0207388 | 2.556486 | 0.2320067 |
| CL_days | 1.0011365 | 1.0000878 | 1.002190 | 0.0312971 |
| EraAfter | 0.5402944 | 0.2726206 | 1.050491 | 0.0699802 |
| hosp_typeCAH / small | 0.5243251 | 0.0381462 | 5.816207 | 0.6089983 |
| hosp_typeCommunity | 1.0546582 | 0.1497819 | 7.227973 | 0.9570963 |
| EraAfter:hosp_typeCAH / small | 1.9790997 | 0.2107256 | 18.614941 | 0.5249484 |
| EraAfter:hosp_typeCommunity | 1.2936540 | 0.5407193 | 3.107744 | 0.5608568 |
library(interactions)
cat_plot(m5, pred = hosp_type, modx = Era, interval = TRUE, data=time_limited)
interact_plot(m5, pred = CL_days, modx = Era, plot.points = TRUE, interval = TRUE)
# car::vif(m4)

m_stratified <- time_limited %>%
split(time_limited$hosp_type) %>%
# group_by(hosp_type) %>%
# nest() %>%
map(~MASS::glm.nb(Inf_obs ~ Inf_pred + Era, data = .x) )
bind_rows(
mutate(broom::tidy(m_stratified$Academic, conf.int = T, exponentiate=T), strata = "Academic"),
mutate(broom::tidy(m_stratified$Community, conf.int = T, exponentiate=T), strata = "Community"),
mutate(broom::tidy(m_stratified$`CAH / small`, conf.int = T, exponentiate=T), strata = "CAH / small")
) %>%
relocate(strata) %>%
relocate(conf.low, conf.high, .before = p.value) %>%
group_by(strata) %>%
select(-c(std.error, statistic)) %>%
gt() %>%
cols_merge_range(col_begin = conf.low, col_end = conf.high, sep = " - ") %>%
# tab_stub_indent(everything()) %>%
cols_label(estimate = md("Incident Rate Ratio (IRR)"),
term = "Term",
conf.low = "95% CI",
p.value = "P value") %>%
fmt_number(columns = c(estimate, conf.low, conf.high), n_sigfig = 3) %>%
fmt_scientific(columns = conf.low,
rows = conf.low < 0.001 | conf.low > 1000) %>%
fmt_scientific(columns = conf.high,
rows = conf.high < 0.001 | conf.high > 1000) %>%
tab_header(
title = 'Regression, stratified by hospital type'
) %>%
opt_stylize(5) %>%
tab_style(
style = list(
cell_fill(color = "lightyellow") # Set the background color
),
locations = cells_row_groups() # Target the groupname_col
)

**Regression, stratified by hospital type**
| Term | Incident Rate Ratio (IRR) | 95% CI | P value |
|---|---|---|---|
| Academic | |||
| (Intercept) | 45.8 | 1.81 - 1.17 × 10^3 | 0.0201923743 |
| Inf_pred | 0.346 | 0.0964 - 1.21 | 0.0986246727 |
| EraAfter | 0.688 | 0.382 - 1.21 | 0.2004854147 |
| Community | |||
| (Intercept) | 0.253 | 0.120 - 0.511 | 0.0001454131 |
| Inf_pred | 3.44 | 0.807 - 14.8 | 0.0873523253 |
| EraAfter | 0.707 | 0.376 - 1.31 | 0.2722252261 |
| CAH / small | |||
| (Intercept) | 0.0799 | 1.81 × 10^−4 - 10.7 | 0.3616403333 |
| Inf_pred | 601 | 7.23 × 10^−15 - 6.00 × 10^20 | 0.7555758257 |
| EraAfter | 1.11 | 0.122 - 10.1 | 0.9205651567 |
glmmTMB(Inf_obs ~ Inf_pred + Era + (1|Hosp) ,
data = time_limited,
family = "poisson") %>%
summary()
calc_dispersion <- function(mod, verbose=F){
# Identify the model family
fam_raw <- mod$family$family
if(str_detect(fam_raw, "poisson")) family <- "poisson"
if(str_detect(fam_raw, "Negative Binomial")) family <- "NB"
E2 <- resid(mod, type = "pearson")
N <- nrow(mod$model)
p <- length(coef(mod))
if(family=="NB") p <- p + 1 # '+1' is for variance parameter in NB
result <- sum(E2^2) / (N - p)
if(verbose) {
if(result>1) message("Overdispersion")
if(result<1) message("Underdispersion")
}
return(result)
}
calc_dispersion(m1)
DHARMa::testDispersion(m2)

¹ For example, March 31st (2024) was on a Sunday, so the week starting on 3/31/24 mostly reflects April 2024.↩︎
² For example, the CHG dashboard reports compliance at (1) WVU Medicine Children's Hospital and (2) JW Ruby Memorial Hospital, although CMS considers these to be the same facility (they share the same CMS ID, 510001). To complicate things further, it appears that for NHSN purposes Fairmont is also considered to fall under Ruby.↩︎