The article from April discusses the rise of Covid-19 and high risk counties and available ICU beds to understand the potential threat. It highlights a county called Hilton Head, South Carolina that has 3900 high risk individuals with 3 hospitals and 28 ICU beds.
The file, "mmsa-icu-beds.csv combines data from the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS), a collection of health-related surveys conducted each year of more than 400,000 Americans, and the Kaiser Family Foundation to show the number of people who are at high risk of becoming seriously ill from COVID-19 per ICU bed in each metropolitan area, micropolitan area or metropolitan division for which we have data.
Being high risk is defined by a number of health conditions and behaviors. Based on the CDC’s list of the relevant underlying conditions that put people at higher risk of serious illness from COVID-19, plus the advice of experts from the Cleveland Clinic, the American Lung Association and the American Heart Association, we counted people as at risk if they’re 65 or older; if they have ever been told they have hypertension, coronary heart disease, a myocardial infarction, angina, a stroke, chronic kidney disease, chronic obstructive pulmonary disease, emphysema, chronic bronchitis or diabetes; if they currently have asthma or a BMI over 40; if they smoke cigarettes every day or some days or use e-cigarettes or vaping products every day or some days; or if they’re currently pregnant. We included every individual who meets at least one of these conditions but counted them only once each, so anyone with multiple conditions doesn’t get counted multiple times. We were not able to include a number of conditions for which we did not have location-based data from the BRFSS, such as liver disease, having smoked, vaped or dabbed marijuana in the last 30 days, and getting cancer treatment or being on immunosuppression medications.-“fivethirtyeight”(2020, April 22)covid-geography https://github.com/fivethirtyeight/data/tree/master/covid-geography
##Loading Libraries
First, I downloaded libraries that will help me load, manipulate, and visualize the data file located in GitHub.
library(readr)
library(stringr)
library(ggplot2)
theUrl <- 'https://raw.githubusercontent.com/fivethirtyeight/data/4a989f20bc0a44ed990cb9ca0be20d06b095145c/covid-geography/mmsa-icu-beds.csv'
dfCovidOriginal <- read_delim(file=theUrl, delim=',')
## Parsed with column specification:
## cols(
## MMSA = col_character(),
## total_percent_at_risk = col_character(),
## high_risk_per_ICU_bed = col_double(),
## high_risk_per_hospital = col_double(),
## icu_beds = col_double(),
## hospitals = col_double(),
## total_at_risk = col_double()
## )
Once the data is properly loaded, I familiarized myself with the data set. Then, I plotted two significant variables from the data set to see a relationship, which were icu_beds and total_at_risk.
head(dfCovidOriginal)
## # A tibble: 6 x 7
## MMSA total_percent_a~ high_risk_per_I~ high_risk_per_h~ icu_beds hospitals
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 San ~ 52.88% NA NA NA NA
## 2 Manh~ 47.29% 4490. 8980. 8 4
## 3 Hilt~ 62.72% 3904. 36439. 28 3
## 4 Kahu~ 59.13% 3861. 19303. 20 4
## 5 Spar~ 66.12% 3786. 85188. 45 2
## 6 Bato~ 66.60% 3460. 39001. 124 11
## # ... with 1 more variable: total_at_risk <dbl>
summary(dfCovidOriginal)
## MMSA total_percent_at_risk high_risk_per_ICU_bed
## Length:136 Length:136 Min. : 413.7
## Class :character Class :character 1st Qu.:1487.8
## Mode :character Mode :character Median :1923.7
## Mean :1947.4
## 3rd Qu.:2321.1
## Max. :4489.8
## NA's :1
## high_risk_per_hospital icu_beds hospitals total_at_risk
## Min. : 6770 Min. : 8.0 Min. : 1.00 Min. : 17942
## 1st Qu.:31005 1st Qu.: 89.5 1st Qu.: 5.00 1st Qu.: 158748
## Median :42460 Median : 221.0 Median : 9.00 Median : 396082
## Mean :43787 Mean : 360.2 Mean : 13.76 Mean : 667189
## 3rd Qu.:56583 3rd Qu.: 489.5 3rd Qu.: 18.00 3rd Qu.: 932793
## Max. :91771 Max. :2777.0 Max. :100.00 Max. :6165102
## NA's :1 NA's :1 NA's :1
ggplot(data=dfCovidOriginal) +
geom_point(mapping=aes(x=icu_beds,y=total_at_risk)) +
geom_smooth(mapping=aes(x=icu_beds,y=total_at_risk))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
The article made a great point at looking into counties that were at high risk but I wanted to take a more generalized approach and look at the state’s aggregate total at risk and highest risk per ICU bed. In order to take that approach, I need to split the column where the county and state were stored together.
location <- str_split(string=dfCovidOriginal$MMSA, pattern=', ')
mLocation <- data.frame(Reduce(rbind, location))
head(mLocation)
## X1 X2
## init San Juan-Carolina-Caguas PR
## X Manhattan KS
## X.1 Hilton Head Island-Bluffton-Beaufort SC
## X.2 Kahului-Wailuku-Lahaina HI
## X.3 Spartanburg SC
## X.4 Baton Rouge LA
names(mLocation) <- c('County','State')
row.names(mLocation) <- NULL
dfCovid <- cbind(mLocation,dfCovidOriginal)
dfCovid$total_percent_at_risk <- parse_number(dfCovid$total_percent_at_risk)
dfCovid$total_percent_at_risk
## [1] 52.88 47.29 62.72 59.13 66.12 66.60 57.72 68.32 67.11 55.96 52.17 60.33
## [13] 60.60 48.90 60.63 53.80 57.99 52.69 55.37 58.12 56.62 57.99 56.85 50.99
## [25] 58.26 61.40 55.82 61.67 60.48 48.01 73.79 50.03 51.35 59.69 68.10 60.83
## [37] 59.45 71.11 54.38 59.63 55.88 51.71 60.85 58.67 62.84 67.32 58.76 61.20
## [49] 57.26 60.98 63.56 54.84 58.02 55.19 61.57 58.80 57.75 61.13 52.49 66.46
## [61] 54.67 63.34 55.40 65.12 60.09 61.17 51.27 51.13 54.51 62.33 62.75 61.89
## [73] 61.84 58.80 65.61 62.98 53.58 55.77 62.63 59.99 63.54 61.97 63.70 55.92
## [85] 60.18 73.48 54.88 62.22 65.45 65.74 80.73 61.32 62.98 55.19 52.32 65.10
## [97] 60.29 56.62 51.14 56.02 61.91 67.61 72.66 63.04 62.39 62.66 54.83 58.69
## [109] 66.84 62.40 66.27 66.87 38.92 59.01 53.81 65.09 65.98 62.52 64.20 66.48
## [121] 59.79 61.69 61.09 72.31 65.67 64.19 63.39 68.20 69.07 51.26 60.22 75.97
## [133] 60.34 57.86 56.38 54.72
head(dfCovid)
## County State
## 1 San Juan-Carolina-Caguas PR
## 2 Manhattan KS
## 3 Hilton Head Island-Bluffton-Beaufort SC
## 4 Kahului-Wailuku-Lahaina HI
## 5 Spartanburg SC
## 6 Baton Rouge LA
## MMSA total_percent_at_risk
## 1 San Juan-Carolina-Caguas, PR 52.88
## 2 Manhattan, KS 47.29
## 3 Hilton Head Island-Bluffton-Beaufort, SC 62.72
## 4 Kahului-Wailuku-Lahaina, HI 59.13
## 5 Spartanburg, SC 66.12
## 6 Baton Rouge, LA 66.60
## high_risk_per_ICU_bed high_risk_per_hospital icu_beds hospitals total_at_risk
## 1 NA NA NA NA 923725.20
## 2 4489.849 8979.698 8 4 35918.79
## 3 3904.164 36438.860 28 3 109316.58
## 4 3860.557 19302.785 20 4 77211.14
## 5 3786.116 85187.600 45 2 170375.20
## 6 3459.733 39000.621 124 11 429006.83
Now the data can be used to make aggregations, with the help from R libraries. I produced two data frames that illustrate the states with the highest total at risk and total high risk per ICU bed.
library(magrittr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dfICU <- dfCovid %>%
group_by(State) %>%
summarize(totalHighRiskPerICUBed=sum(high_risk_per_ICU_bed)) %>%
arrange(desc(totalHighRiskPerICUBed))
## `summarise()` ungrouping output (override with `.groups` argument)
dfAtRisk <- dfCovid %>%
group_by(State) %>%
summarize(totalAtRisk=sum(total_at_risk)) %>%
arrange(desc(totalAtRisk))
## `summarise()` ungrouping output (override with `.groups` argument)
head(dfICU)
## # A tibble: 6 x 2
## State totalHighRiskPerICUBed
## <chr> <dbl>
## 1 FL 21129.
## 2 TX 19089.
## 3 SC 14110.
## 4 KS 9596.
## 5 CA 8765.
## 6 NY 8145.
head(dfAtRisk)
## # A tibble: 6 x 2
## State totalAtRisk
## <chr> <dbl>
## 1 CA 9310216.
## 2 TX 8475415.
## 3 FL 8069962.
## 4 NY-NJ 6165102.
## 5 IL-IN-WI 3831636.
## 6 PA 3189176.
I merged the data frames by State, ordered, and created a subset of the State’s with the highest numbers.
dfCovid2 <- merge(dfICU,dfAtRisk, by.x='State', by.y='State')
dfCovid2 <- dfCovid2[order(-dfCovid2$totalHighRiskPerICUBed),]
dfTopCovidStates <- dfCovid2[1:10,]
row.names(dfTopCovidStates) <- NULL
Now that the data set is created and can be truly analyzed to the relationship between total at risk and total high risk per ICU bed.
dfTopCovidStates
## State totalHighRiskPerICUBed totalAtRisk
## 1 FL 21128.549 8069962.4
## 2 TX 19089.416 8475415.1
## 3 SC 14110.259 1617604.9
## 4 KS 9596.329 486499.0
## 5 CA 8764.865 9310216.2
## 6 NY 8144.906 2748891.3
## 7 MA 7083.407 2185511.9
## 8 SD 5933.772 207426.8
## 9 OH 5933.591 2724163.7
## 10 UT 5781.716 856044.3
ggplot(data=dfTopCovidStates) +
geom_bar(mapping=aes(x=State, y=totalAtRisk), stat='identity',fill='#f68060') +
coord_flip()
ggplot(data=dfTopCovidStates) +
geom_bar(mapping=aes(x=State, y=totalHighRiskPerICUBed), stat='identity',fill='#f68060') +
coord_flip()
Following my analysis at a state level of Covid-19’s potential threat, the data illustrates that states do not have porportional health care facilities for their high risk population. Here, we can see the potential threat of Covid-19 can be devastating to states. They will not be able to accomondate given the total number of high risk per ICU bed.