knitr::opts_chunk$set(warning=FALSE)

Load Data

I have prepared two data sets from American Hospital Annual Survey, years 2017-2019:

One includes 2019 data only.
- In this set, values of different subtypes of ICU beds and ICU physicians are summed up within each hospital. This creates zero values for sums that have hospitals with subtype data missing. I then reset summed values to missing if the hospital is indicated as not responding to the survey (by the RESP variable) in order to approximate true zeroes and true missing information. Then ICU beds and MDs are summed up for each county. The percent of hospital’s with non-response to survey for each year is indicated.
Other includes 2017-2019 summarized data.
- Here, for each hospital for each year I sum up values of different subtypes of ICU beds and ICU physicians are summed up within each hospital just as above, setting values to missing by the hospital’s survey response indicator variable. Then for each hospital I average summed values over the 3 years (alternative are to take the min or max of the set). I also create a variable for the number of times a hospital responded to the survey over the 3 year span. I then sum up the averaged hospital values by county and indicate the percent of hospitals with no survey responses over the 3 years. Hospitals that closed or merged with others over the timespan are removed from this set. This allows us to know that the hospitals included in averaging out values over the yers are actually still open and may just not have responded to a survey in a given year.

################################################################################
#---------------------------Prepare Data----------------------------------------
################################################################################
aha2017_2019 <- readRDS(file = "C:/Users/jkempke/OneDrive - Emory University/AHA Critical Care Areas/Data/prepared_data/aha2017_2019.county.Rds" )
aha2019 <- readRDS(file = "C:/Users/jkempke/OneDrive - Emory University/AHA Critical Care Areas/Data/prepared_data/aha2019.county.Rds" )

aha2017_2019 <- aha2017_2019%>%
  rename(
    intensivists.3year = mean.adult.fte,
    icu.beds.3year = mean.adult.icu.beds,
    missing.3year = percent.missing)

aha2019 <- aha2019 %>%
  rename(missing.2019 = percent.missing,
         intensivists.2019 = adult.fte,
         icu.beds.2019 = adult.icu.beds)

aha.comparison <- left_join(aha2017_2019, aha2019, by = 'GEOID')

Sample Size Comparison:
- The 2019 data includes 2475 observations.
- The 2017-2019 summarized data includes 2476

Examine those with Missing Beds in 2019

Here I examine the data subset of counties in 2019 icu.beds data missing to examine what is potentially gained by using the 3-year aggregated dataset.

Below is the number of observations followed by the selected printout

missing.beds <- aha.comparison%>%
  filter(is.na(icu.beds.2019)==T)%>%
  select(GEOID, intensivists.3year, icu.beds.3year, intensivists.2019)

nrow(missing.beds)

## [1] 1

print(missing.beds)

## # A tibble: 1 x 4
##   GEOID intensivists.3year icu.beds.3year intensivists.2019
##   <chr>              <dbl>          <dbl>             <int>
## 1 72097                  0              0                NA

Compare on Percent Missingness by County

Here I examine differences in the percent of hospitals per county with missing data, categorized by non-response to survey in two representations of the counties - 2019 only vs. aggregate of 2017-2019. The latter allows us to account for hospitals that may have not responded one of the years but are still operating.

Distribution of Percent Missingness per county 2019

summary(aha.comparison$missing.2019)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    0.00    0.00   26.31   50.00  100.00       1

ggplot(data=aha.comparison, aes(x=missing.2019))+
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Distribution of Percent Missingness per county 2017-2019

summary(aha.comparison$missing.3year)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   15.43    0.00  100.00

ggplot(data=aha.comparison, aes(x=missing.3year))+
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Below is the number of observations between the two datasets where percetn missingness per county is different, following a summary of the difference between the 1- and 3-year percent missing values, then a visual representation. Here, a negative number means there is a lower percentage of hospitals with non-response in a county in the 3-year aggregated representation of that county compared with the 2019 version.

missing <- aha.comparison%>%
  filter(missing.3year!=missing.2019)%>%
  mutate(intensivist.difference = intensivists.3year-intensivists.2019,
         icu.bed.difference = icu.beds.3year - icu.beds.2019, 
         missing.difference = missing.3year - missing.2019)

nrow(missing)

## [1] 401

summary(missing$missing.difference)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -100.00 -100.00  -67.00  -67.35  -34.00   50.00

ggplot(data=missing, aes(
  x=missing.2019,
  y = missing.3year))+
  geom_point()+
  geom_abline(intercept = 0, slope = 1)

Compare on Counties where sums of instensivists differ

Below is the number of observations, following a summary of the difference between the 1- and 3-year percent missing values, then a visual representation. Again, a county with a negative number indicates that there are fewer numbers of intensivists in the county in the 3-year aggregated representation of the county compared to the 2019 value.

intensivist.comparison <- aha.comparison%>%
  filter(intensivists.3year!=intensivists.2019)%>%
  mutate(intensivist.difference = intensivists.3year-intensivists.2019,
         icu.bed.difference = icu.beds.3year - icu.beds.2019, 
         missing.difference = missing.3year - missing.2019)

nrow(intensivist.comparison)

## [1] 546

summary(intensivist.comparison$intensivist.difference)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -69.000  -3.000  -1.000  -0.152   3.000  74.000

ggplot(data=intensivist.comparison, aes(
  x=intensivists.2019,
  y = intensivists.3year))+
  geom_point()+
  geom_abline(intercept = 0, slope = 1)

Compare on counties where sums of ICU beds differ

Below is the number of observatsions, following a summary of the difference between the 1- and 3-year percent missing values, then a visual representation. Again, a county with a negative number indicates that there are fewer numbers of icu beds in the county in the 3-year aggregated representation of the county compared to the 2019 value.

icu.bed.comparison <- aha.comparison%>%
  filter(icu.beds.3year!=icu.beds.2019)%>%
  mutate(intensivist.difference = intensivists.3year-intensivists.2019,
         icu.bed.difference = icu.beds.3year - icu.beds.2019, 
         missing.difference = missing.3year - missing.2019)

nrow(icu.bed.comparison)

## [1] 590

summary(icu.bed.comparison$icu.bed.difference)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -89.000  -1.000   3.000   8.253   9.000 214.000

ggplot(data=icu.bed.comparison, aes(
  x=icu.beds.2019,
  y = icu.beds.3year))+
  geom_point()+
  geom_abline(intercept = 0, slope = 1)

AHA Critical Care Data Validation: Comparison of 1 vs 3-year

Jordan Kempker

2021-06-09