How One High-Risk Community In Rural South Carolina Is Bracing For COVID-19

Introduction

The article from April discusses the rise of Covid-19 and high risk counties and available ICU beds to understand the potential threat. It highlights a county called Hilton Head, South Carolina that has 3900 high risk individuals with 3 hospitals and 28 ICU beds.

The file, "mmsa-icu-beds.csv combines data from the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS), a collection of health-related surveys conducted each year of more than 400,000 Americans, and the Kaiser Family Foundation to show the number of people who are at high risk of becoming seriously ill from COVID-19 per ICU bed in each metropolitan area, micropolitan area or metropolitan division for which we have data.

Being high risk is defined by a number of health conditions and behaviors. Based on the CDC’s list of the relevant underlying conditions that put people at higher risk of serious illness from COVID-19, plus the advice of experts from the Cleveland Clinic, the American Lung Association and the American Heart Association, we counted people as at risk if they’re 65 or older; if they have ever been told they have hypertension, coronary heart disease, a myocardial infarction, angina, a stroke, chronic kidney disease, chronic obstructive pulmonary disease, emphysema, chronic bronchitis or diabetes; if they currently have asthma or a BMI over 40; if they smoke cigarettes every day or some days or use e-cigarettes or vaping products every day or some days; or if they’re currently pregnant. We included every individual who meets at least one of these conditions but counted them only once each, so anyone with multiple conditions doesn’t get counted multiple times. We were not able to include a number of conditions for which we did not have location-based data from the BRFSS, such as liver disease, having smoked, vaped or dabbed marijuana in the last 30 days, and getting cancer treatment or being on immunosuppression medications.-“fivethirtyeight”(2020, April 22)covid-geography https://github.com/fivethirtyeight/data/tree/master/covid-geography

##Loading Libraries

First, I downloaded libraries that will help me load, manipulate, and visualize the data file located in GitHub.

library(readr)
library(stringr)
library(ggplot2)

theUrl <- 'https://raw.githubusercontent.com/fivethirtyeight/data/4a989f20bc0a44ed990cb9ca0be20d06b095145c/covid-geography/mmsa-icu-beds.csv'
dfCovidOriginal <- read_delim(file=theUrl, delim=',')

## Parsed with column specification:
## cols(
##   MMSA = col_character(),
##   total_percent_at_risk = col_character(),
##   high_risk_per_ICU_bed = col_double(),
##   high_risk_per_hospital = col_double(),
##   icu_beds = col_double(),
##   hospitals = col_double(),
##   total_at_risk = col_double()
## )

Looking Under the Hood of the Data

Once the data is properly loaded, I familiarized myself with the data set. Then, I plotted two significant variables from the data set to see a relationship, which were icu_beds and total_at_risk.

head(dfCovidOriginal)

## # A tibble: 6 x 7
##   MMSA  total_percent_a~ high_risk_per_I~ high_risk_per_h~ icu_beds hospitals
##   <chr> <chr>                       <dbl>            <dbl>    <dbl>     <dbl>
## 1 San ~ 52.88%                        NA               NA        NA        NA
## 2 Manh~ 47.29%                      4490.            8980.        8         4
## 3 Hilt~ 62.72%                      3904.           36439.       28         3
## 4 Kahu~ 59.13%                      3861.           19303.       20         4
## 5 Spar~ 66.12%                      3786.           85188.       45         2
## 6 Bato~ 66.60%                      3460.           39001.      124        11
## # ... with 1 more variable: total_at_risk <dbl>

summary(dfCovidOriginal)

##      MMSA           total_percent_at_risk high_risk_per_ICU_bed
##  Length:136         Length:136            Min.   : 413.7       
##  Class :character   Class :character      1st Qu.:1487.8       
##  Mode  :character   Mode  :character      Median :1923.7       
##                                           Mean   :1947.4       
##                                           3rd Qu.:2321.1       
##                                           Max.   :4489.8       
##                                           NA's   :1            
##  high_risk_per_hospital    icu_beds        hospitals      total_at_risk    
##  Min.   : 6770          Min.   :   8.0   Min.   :  1.00   Min.   :  17942  
##  1st Qu.:31005          1st Qu.:  89.5   1st Qu.:  5.00   1st Qu.: 158748  
##  Median :42460          Median : 221.0   Median :  9.00   Median : 396082  
##  Mean   :43787          Mean   : 360.2   Mean   : 13.76   Mean   : 667189  
##  3rd Qu.:56583          3rd Qu.: 489.5   3rd Qu.: 18.00   3rd Qu.: 932793  
##  Max.   :91771          Max.   :2777.0   Max.   :100.00   Max.   :6165102  
##  NA's   :1              NA's   :1        NA's   :1

ggplot(data=dfCovidOriginal) +
  geom_point(mapping=aes(x=icu_beds,y=total_at_risk)) +
  geom_smooth(mapping=aes(x=icu_beds,y=total_at_risk))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

Extract State and County

The article made a great point at looking into counties that were at high risk but I wanted to take a more generalized approach and look at the state’s aggregate total at risk and highest risk per ICU bed. In order to take that approach, I need to split the column where the county and state were stored together.

location <- str_split(string=dfCovidOriginal$MMSA, pattern=', ')
mLocation <- data.frame(Reduce(rbind, location))
head(mLocation)

##                                        X1 X2
## init             San Juan-Carolina-Caguas PR
## X                               Manhattan KS
## X.1  Hilton Head Island-Bluffton-Beaufort SC
## X.2               Kahului-Wailuku-Lahaina HI
## X.3                           Spartanburg SC
## X.4                           Baton Rouge LA

names(mLocation) <- c('County','State')
row.names(mLocation) <- NULL

dfCovid <- cbind(mLocation,dfCovidOriginal)
dfCovid$total_percent_at_risk <- parse_number(dfCovid$total_percent_at_risk)
dfCovid$total_percent_at_risk

##   [1] 52.88 47.29 62.72 59.13 66.12 66.60 57.72 68.32 67.11 55.96 52.17 60.33
##  [13] 60.60 48.90 60.63 53.80 57.99 52.69 55.37 58.12 56.62 57.99 56.85 50.99
##  [25] 58.26 61.40 55.82 61.67 60.48 48.01 73.79 50.03 51.35 59.69 68.10 60.83
##  [37] 59.45 71.11 54.38 59.63 55.88 51.71 60.85 58.67 62.84 67.32 58.76 61.20
##  [49] 57.26 60.98 63.56 54.84 58.02 55.19 61.57 58.80 57.75 61.13 52.49 66.46
##  [61] 54.67 63.34 55.40 65.12 60.09 61.17 51.27 51.13 54.51 62.33 62.75 61.89
##  [73] 61.84 58.80 65.61 62.98 53.58 55.77 62.63 59.99 63.54 61.97 63.70 55.92
##  [85] 60.18 73.48 54.88 62.22 65.45 65.74 80.73 61.32 62.98 55.19 52.32 65.10
##  [97] 60.29 56.62 51.14 56.02 61.91 67.61 72.66 63.04 62.39 62.66 54.83 58.69
## [109] 66.84 62.40 66.27 66.87 38.92 59.01 53.81 65.09 65.98 62.52 64.20 66.48
## [121] 59.79 61.69 61.09 72.31 65.67 64.19 63.39 68.20 69.07 51.26 60.22 75.97
## [133] 60.34 57.86 56.38 54.72

head(dfCovid)

##                                 County State
## 1             San Juan-Carolina-Caguas    PR
## 2                            Manhattan    KS
## 3 Hilton Head Island-Bluffton-Beaufort    SC
## 4              Kahului-Wailuku-Lahaina    HI
## 5                          Spartanburg    SC
## 6                          Baton Rouge    LA
##                                       MMSA total_percent_at_risk
## 1             San Juan-Carolina-Caguas, PR                 52.88
## 2                            Manhattan, KS                 47.29
## 3 Hilton Head Island-Bluffton-Beaufort, SC                 62.72
## 4              Kahului-Wailuku-Lahaina, HI                 59.13
## 5                          Spartanburg, SC                 66.12
## 6                          Baton Rouge, LA                 66.60
##   high_risk_per_ICU_bed high_risk_per_hospital icu_beds hospitals total_at_risk
## 1                    NA                     NA       NA        NA     923725.20
## 2              4489.849               8979.698        8         4      35918.79
## 3              3904.164              36438.860       28         3     109316.58
## 4              3860.557              19302.785       20         4      77211.14
## 5              3786.116              85187.600       45         2     170375.20
## 6              3459.733              39000.621      124        11     429006.83

Aggregate

Now the data can be used to make aggregations, with the help from R libraries. I produced two data frames that illustrate the states with the highest total at risk and total high risk per ICU bed.

library(magrittr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

dfICU <- dfCovid %>% 
  group_by(State) %>% 
  summarize(totalHighRiskPerICUBed=sum(high_risk_per_ICU_bed)) %>% 
  arrange(desc(totalHighRiskPerICUBed))

## `summarise()` ungrouping output (override with `.groups` argument)

dfAtRisk <- dfCovid %>% 
  group_by(State) %>% 
  summarize(totalAtRisk=sum(total_at_risk)) %>% 
  arrange(desc(totalAtRisk))

## `summarise()` ungrouping output (override with `.groups` argument)

head(dfICU)

## # A tibble: 6 x 2
##   State totalHighRiskPerICUBed
##   <chr>                  <dbl>
## 1 FL                    21129.
## 2 TX                    19089.
## 3 SC                    14110.
## 4 KS                     9596.
## 5 CA                     8765.
## 6 NY                     8145.

head(dfAtRisk)

## # A tibble: 6 x 2
##   State    totalAtRisk
##   <chr>          <dbl>
## 1 CA          9310216.
## 2 TX          8475415.
## 3 FL          8069962.
## 4 NY-NJ       6165102.
## 5 IL-IN-WI    3831636.
## 6 PA          3189176.

Join and Subset

I merged the data frames by State, ordered, and created a subset of the State’s with the highest numbers.

dfCovid2 <- merge(dfICU,dfAtRisk, by.x='State', by.y='State')
dfCovid2 <- dfCovid2[order(-dfCovid2$totalHighRiskPerICUBed),]

dfTopCovidStates <- dfCovid2[1:10,]
row.names(dfTopCovidStates) <- NULL

Analyze

Now that the data set is created and can be truly analyzed to the relationship between total at risk and total high risk per ICU bed.

dfTopCovidStates

##    State totalHighRiskPerICUBed totalAtRisk
## 1     FL              21128.549   8069962.4
## 2     TX              19089.416   8475415.1
## 3     SC              14110.259   1617604.9
## 4     KS               9596.329    486499.0
## 5     CA               8764.865   9310216.2
## 6     NY               8144.906   2748891.3
## 7     MA               7083.407   2185511.9
## 8     SD               5933.772    207426.8
## 9     OH               5933.591   2724163.7
## 10    UT               5781.716    856044.3

ggplot(data=dfTopCovidStates) + 
  geom_bar(mapping=aes(x=State, y=totalAtRisk), stat='identity',fill='#f68060') +
  coord_flip()

ggplot(data=dfTopCovidStates) + 
  geom_bar(mapping=aes(x=State, y=totalHighRiskPerICUBed), stat='identity',fill='#f68060') +
  coord_flip()

Conclusion

Following my analysis at a state level of Covid-19’s potential threat, the data illustrates that states do not have porportional health care facilities for their high risk population. Here, we can see the potential threat of Covid-19 can be devastating to states. They will not be able to accomondate given the total number of high risk per ICU bed.

DATA607 - Assignment 1

Diego Correa

8/29/2020