knitr::opts_chunk$set(warning=FALSE)
I have prepared two data sets from American Hospital Annual Survey, years 2017-2019:
################################################################################
#---------------------------Prepare Data----------------------------------------
################################################################################
aha2017_2019 <- readRDS(file = "C:/Users/jkempke/OneDrive - Emory University/AHA Critical Care Areas/Data/prepared_data/aha2017_2019.county.Rds" )
aha2019 <- readRDS(file = "C:/Users/jkempke/OneDrive - Emory University/AHA Critical Care Areas/Data/prepared_data/aha2019.county.Rds" )
aha2017_2019 <- aha2017_2019%>%
rename(
intensivists.3year = mean.adult.fte,
icu.beds.3year = mean.adult.icu.beds,
missing.3year = percent.missing)
aha2019 <- aha2019 %>%
rename(missing.2019 = percent.missing,
intensivists.2019 = adult.fte,
icu.beds.2019 = adult.icu.beds)
aha.comparison <- left_join(aha2017_2019, aha2019, by = 'GEOID')
Here I examine the data subset of counties in 2019 icu.beds data missing to examine what is potentially gained by using the 3-year aggregated dataset.
Below is the number of observations followed by the selected printout
missing.beds <- aha.comparison%>%
filter(is.na(icu.beds.2019)==T)%>%
select(GEOID, intensivists.3year, icu.beds.3year, intensivists.2019)
nrow(missing.beds)
## [1] 1
print(missing.beds)
## # A tibble: 1 x 4
## GEOID intensivists.3year icu.beds.3year intensivists.2019
## <chr> <dbl> <dbl> <int>
## 1 72097 0 0 NA
Here I examine differences in the percent of hospitals per county with missing data, categorized by non-response to survey in two representations of the counties - 2019 only vs. aggregate of 2017-2019. The latter allows us to account for hospitals that may have not responded one of the years but are still operating.
summary(aha.comparison$missing.2019)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 0.00 0.00 26.31 50.00 100.00 1
ggplot(data=aha.comparison, aes(x=missing.2019))+
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
summary(aha.comparison$missing.3year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 15.43 0.00 100.00
ggplot(data=aha.comparison, aes(x=missing.3year))+
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Below is the number of observations between the two datasets where percetn missingness per county is different, following a summary of the difference between the 1- and 3-year percent missing values, then a visual representation. Here, a negative number means there is a lower percentage of hospitals with non-response in a county in the 3-year aggregated representation of that county compared with the 2019 version.
missing <- aha.comparison%>%
filter(missing.3year!=missing.2019)%>%
mutate(intensivist.difference = intensivists.3year-intensivists.2019,
icu.bed.difference = icu.beds.3year - icu.beds.2019,
missing.difference = missing.3year - missing.2019)
nrow(missing)
## [1] 401
summary(missing$missing.difference)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -100.00 -100.00 -67.00 -67.35 -34.00 50.00
ggplot(data=missing, aes(
x=missing.2019,
y = missing.3year))+
geom_point()+
geom_abline(intercept = 0, slope = 1)
Below is the number of observations, following a summary of the difference between the 1- and 3-year percent missing values, then a visual representation. Again, a county with a negative number indicates that there are fewer numbers of intensivists in the county in the 3-year aggregated representation of the county compared to the 2019 value.
intensivist.comparison <- aha.comparison%>%
filter(intensivists.3year!=intensivists.2019)%>%
mutate(intensivist.difference = intensivists.3year-intensivists.2019,
icu.bed.difference = icu.beds.3year - icu.beds.2019,
missing.difference = missing.3year - missing.2019)
nrow(intensivist.comparison)
## [1] 546
summary(intensivist.comparison$intensivist.difference)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -69.000 -3.000 -1.000 -0.152 3.000 74.000
ggplot(data=intensivist.comparison, aes(
x=intensivists.2019,
y = intensivists.3year))+
geom_point()+
geom_abline(intercept = 0, slope = 1)
Below is the number of observatsions, following a summary of the difference between the 1- and 3-year percent missing values, then a visual representation. Again, a county with a negative number indicates that there are fewer numbers of icu beds in the county in the 3-year aggregated representation of the county compared to the 2019 value.
icu.bed.comparison <- aha.comparison%>%
filter(icu.beds.3year!=icu.beds.2019)%>%
mutate(intensivist.difference = intensivists.3year-intensivists.2019,
icu.bed.difference = icu.beds.3year - icu.beds.2019,
missing.difference = missing.3year - missing.2019)
nrow(icu.bed.comparison)
## [1] 590
summary(icu.bed.comparison$icu.bed.difference)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -89.000 -1.000 3.000 8.253 9.000 214.000
ggplot(data=icu.bed.comparison, aes(
x=icu.beds.2019,
y = icu.beds.3year))+
geom_point()+
geom_abline(intercept = 0, slope = 1)