Coronaviruses are a group of viruses that affect respiratory and intestinal health in humans and animals. Although COVID-19 may have less noticeable effects on our day-to-day lives as it moves towards endemic status, other coronaviruses may emerge over the coming years.
Given the potential for other epidemics, examining COVID-19 data may help policy makers identify factors that exacerbate the effects of novel coronaviruses. Two factors relevant to residents of the United States are poverty and obesity, both of which are increasing in the United States.
The purpose of this analysis was to determine if poverty and obesity are associated with increased contagion and death caused by COVID-19.
Data were retrieved from
After the data were imported, base R and the tidyverse and lubridate packages were used to review and clean them.
# Reshaping case counts to long format (one row per location and date)
us_cases <- us_cases %>%
  pivot_longer(cols = -(UID:Combined_Key),
               names_to = "date",
               values_to = "cases") %>%
  select(Admin2:cases) %>%
  mutate(date = mdy(date)) %>%
  select(-c(Lat, Long_))
# Reshaping death counts to long format (one row per location and date)
us_deaths <- us_deaths %>%
  pivot_longer(cols = -(UID:Population),
               names_to = "date",
               values_to = "deaths") %>%
  select(Admin2:deaths) %>%
  mutate(date = mdy(date)) %>%
  select(-c(Lat, Long_))
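To illustrate what the two reshaping steps above do, here is a minimal sketch on a toy table. The tibble `wide` and its values are hypothetical stand-ins, not the real data; only the `pivot_longer()` and `mdy()` calls mirror the code above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical wide table: one column per date, as in the raw files
wide <- tibble(
  UID = c(1, 2),
  Combined_Key = c("County A, State X, US", "County B, State X, US"),
  `1/22/20` = c(0, 1),
  `1/23/20` = c(2, 3)
)

# Every date column becomes a row; the column name becomes the `date` value
long <- wide %>%
  pivot_longer(cols = -(UID:Combined_Key),
               names_to = "date",
               values_to = "cases") %>%
  mutate(date = mdy(date))  # "1/22/20" is parsed as the Date 2020-01-22

print(long)
```

The two locations and two date columns yield four rows, each holding one location, one date, and one count.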
# Combining data frames
us <- us_cases %>%
full_join(us_deaths)
# Importing population data
uid <- read_csv(uid_lookup_url) %>%
select(-c(Lat, Long_, Combined_Key, code3, iso2, iso3, Admin2))
# Aggregating to the state level and calculating deaths per million
us_by_state <- us %>%
  group_by(Province_State, Country_Region, date) %>%
  summarize(cases = sum(cases), deaths = sum(deaths),
            Population = sum(Population)) %>%
  mutate(deaths_per_mill = deaths * 1000000 / Population) %>%
  select(Province_State, Country_Region, date, cases, deaths, deaths_per_mill,
         Population) %>%
  ungroup()
# Summing state-level cases and deaths into national totals, deaths per million
us_totals <- us_by_state %>%
  group_by(Country_Region, date) %>%
  summarize(cases = sum(cases), deaths = sum(deaths),
            Population = sum(Population)) %>%
  mutate(deaths_per_mill = deaths * 1000000 / Population) %>%
  select(Country_Region, date, cases, deaths, deaths_per_mill, Population) %>%
  ungroup()
# State totals with cases and deaths per thousand;
# rows with no cases or no population are dropped
us_state_totals <- us_by_state %>%
  group_by(Province_State) %>%
  summarize(deaths = max(deaths), cases = max(cases),
            population = max(Population),
            cases_per_thou = 1000 * cases / population,
            deaths_per_thou = 1000 * deaths / population) %>%
  filter(cases > 0, population > 0)
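As a quick check on the per-capita arithmetic used above, the following sketch works through the rates for a single made-up state. The three input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical counts for one state (illustrative values only)
population <- 2000000
cases      <- 500000
deaths     <- 5000

cases_per_thou  <- 1000 * cases / population      # cases per 1,000 residents
deaths_per_thou <- 1000 * deaths / population     # deaths per 1,000 residents
deaths_per_mill <- 1000000 * deaths / population  # deaths per 1,000,000 residents

print(c(cases_per_thou, deaths_per_thou, deaths_per_mill))
```

The per-million rate is always 1,000 times the per-thousand rate, so the two scales carry the same information.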
# Removing territories from data set
us_state_only <- us_state_totals[-c(3, 13, 38, 43, 51),]
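Removing rows by position works only while the row order stays the same. One possible alternative, sketched here on a toy table, is to keep rows by name. The data frame `toy_totals` and its values are hypothetical; `state.name` is the vector of 50 state names that ships with base R, and adding the District of Columbia by hand is an assumption based on the 51 values entered for poverty and obesity.

```r
# Toy stand-in for us_state_totals (hypothetical rows, not real data)
toy_totals <- data.frame(
  Province_State = c("Alabama", "Alaska", "American Samoa", "Guam", "Wyoming"),
  cases = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Base R's built-in state.name holds the 50 state names; DC is added by hand
keep <- c(state.name, "District of Columbia")

# Keep only rows whose name is in the list; the original row order is preserved
toy_state_only <- toy_totals[toy_totals$Province_State %in% keep, ]

print(toy_state_only)
```

Because the filter preserves row order, values entered by position afterwards would still line up with an alphabetically sorted table.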
# Adding poverty rate data
us_state_only$pov_rate <- c(15.98, 10.34, 14.12, 16.08, 12.58, 9.78, 9.78,
11.44, 15.45, 13.34, 14.28, 9.26, 11.94, 11.99,
12.91, 11.11, 11.44, 16.61, 18.65, 11.07, 9.02,
9.85, 13.71, 9.33, 19.58, 13.01, 12.78, 10.37,
12.78, 7.42, 9.67, 18.55, 13.58, 13.98, 10.53,
13.62, 15.27, 12.36, 11.95, 11.58, 14.68,
12.81, 14.62, 14.22, 9.13, 10.78, 10.01, 10.19,
17.10, 10.97, 10.76)
# Adding obesity rate data
us_state_only$obesity <- c(39, 31.9, 30.9, 36.4, 30.3, 24.2, 29.2, 36.5, 24.3,
28.4, 34.3, 24.5, 31.1, 32.4, 36.8, 36.5, 35.3, 36.6,
38.1, 31, 31, 24.4, 35.2, 30.7, 39.7, 34, 28.5, 34,
28.7, 29.9, 27.7, 30.9, 26.3, 33.6, 33.1, 35.5, 36.4,
28.1, 31.5, 30.1, 36.2, 33.2, 35.6, 35.8, 28.6, 26.3,
32.2, 28, 39.1, 32.3, 30.7)
# Rounding data to two decimal places
us_state_only$cases_per_thou <- round(us_state_only$cases_per_thou, 2)
us_state_only$deaths_per_thou <- round(us_state_only$deaths_per_thou, 2)
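The poverty and obesity values above are attached by position, so they depend on the rows being in the expected order. A keyed join is one way to make that dependency explicit. The sketch below uses base R's `merge()` on toy tables; the state names and all rates are placeholders, not the values used in the analysis.

```r
# Toy stand-in for us_state_only (hypothetical values)
toy_states <- data.frame(
  Province_State = c("State A", "State B", "State C"),
  deaths_per_thou = c(3.1, 2.4, 4.0),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table, deliberately in a different row order
toy_rates <- data.frame(
  Province_State = c("State C", "State A", "State B"),
  pov_rate = c(18.0, 12.0, 10.0),
  obesity = c(38.0, 30.0, 27.0),
  stringsAsFactors = FALSE
)

# merge() matches rows on the shared key column, so row order does not matter
toy_joined <- merge(toy_states, toy_rates, by = "Province_State", all.x = TRUE)

# Every state should have found a match in the lookup table
stopifnot(!anyNA(toy_joined$pov_rate), !anyNA(toy_joined$obesity))

print(toy_joined)
```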
The scatter plots indicate the relationships are non-linear; therefore, Spearman rank correlation coefficients (SRCC) were used to examine the relationships between the variables.
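The reason a rank-based coefficient suits non-linear data can be seen in a small sketch. The vectors `x` and `y` below are made up: `y` rises with `x` along a curve, not a straight line. Spearman's rho is the Pearson correlation computed on ranks, so it captures the monotonic trend fully, whereas Pearson's r on the raw values does not.

```r
# Hypothetical monotonic but strongly non-linear relationship
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman's rho is the Pearson correlation of the ranks
spearman_by_hand <- cor(rank(x), rank(y))

print(c(pearson = pearson, spearman = spearman, by_hand = spearman_by_hand))
```

Here Spearman's rho equals 1 because the ordering of `y` matches the ordering of `x` exactly, while Pearson's r falls below 1 because the points do not lie on a line.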
The values from the SRCC as of September 13th, 2022, can be found in the table below.
Note: The rho and p values may have changed since this report was written because new data have been added.
Guidelines from the medical literature were used to interpret the correlation coefficients. The coefficients were categorized as (a) None, (b) Poor, (c) Fair, (d) Moderate, (e) Very Strong, or (f) Perfect.
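The report does not state the cutoffs behind these categories. As a rough sketch only, the hypothetical helper `interpret_rho()` below assumes cutoffs of the kind often quoted for medical research, applied after rounding to one decimal place: 0 is None, 0.1 to 0.2 Poor, 0.3 to 0.5 Fair, 0.6 to 0.7 Moderate, 0.8 to 0.9 Very Strong, and 1 Perfect. These cutoffs are an assumption and may differ from the guidelines the analysis actually followed.

```r
# Hypothetical helper; the cutoffs are assumed, not taken from the report
interpret_rho <- function(rho) {
  r <- round(abs(rho), 1)  # the sign is ignored, only the strength is labelled
  labels <- c("None", "Poor", "Fair", "Moderate", "Very Strong", "Perfect")
  # Breaks sit halfway between the one-decimal values to avoid equality checks
  breaks <- c(-Inf, 0.05, 0.25, 0.55, 0.75, 0.95, Inf)
  as.character(cut(r, breaks = breaks, labels = labels))
}

print(interpret_rho(c(0.310, 0.711, 0.332, 0.512)))
```

Under these assumed cutoffs, the four coefficients reported in this analysis (.310, .711, .332, and .512) map to Fair, Moderate, Fair, and Fair.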
# Spearman correlation: poverty rate and cases per thousand
cor.test(us_state_only$pov_rate, us_state_only$cases_per_thou,
         method = 'spearman', exact = FALSE, alternative = 'greater')
# Spearman correlation: poverty rate and deaths per thousand
cor.test(us_state_only$pov_rate, us_state_only$deaths_per_thou,
         method = 'spearman', exact = FALSE, alternative = 'greater')
# Spearman correlation: obesity rate and cases per thousand
cor.test(us_state_only$obesity, us_state_only$cases_per_thou,
         method = 'spearman', exact = FALSE, alternative = 'greater')
# Spearman correlation: obesity rate and deaths per thousand
cor.test(us_state_only$obesity, us_state_only$deaths_per_thou,
         method = 'spearman', exact = FALSE, alternative = 'greater')
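The `cor.test()` calls above print their results; the coefficient and p value can also be pulled out of the returned object for tabulation. The sketch below shows one way to do that on made-up vectors. The values of `x` and `y` are hypothetical; `estimate` and `p.value` are standard components of the object `cor.test()` returns. Because `alternative = "greater"` is used, the p value is one-sided and tests only for a positive association.

```r
# Hypothetical values standing in for two columns of us_state_only
x <- c(9.5, 11.2, 12.8, 14.1, 16.3, 18.7, 10.4, 13.3)
y <- c(2.1, 2.9, 2.6, 3.4, 3.3, 4.2, 2.0, 3.0)

res <- cor.test(x, y, method = "spearman", exact = FALSE,
                alternative = "greater")

# Collect the pieces needed for a results table
summary_row <- data.frame(
  rho = round(unname(res$estimate), 3),
  p_value = signif(res$p.value, 4)
)

print(summary_row)
```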
| Variables | Rho | Interpretation | p-value |
|---|---|---|---|
| Poverty x Cases per Thousand | .310 | Fair | .01342 |
| Poverty x Deaths per Thousand | .711 | Moderate | 2.54e-09 |
| Obesity x Cases per Thousand | .332 | Fair | .008622 |
| Obesity x Deaths per Thousand | .512 | Fair | 6.095e-05 |
The results indicate that three of the four variable pairings (e.g., obesity and cases per thousand) had fair relationships, whereas poverty and deaths per thousand had a moderate relationship.
The results of the analysis indicate poverty might be a useful predictor of death from COVID-19. Data such as these may help policy makers allocate resources more equitably during subsequent epidemics.
To maximize the utility of an analysis, its limitations must be highlighted. At least two limitations should be considered. The first is the absence of data on other medical conditions (e.g., diabetes) that may have affected a person’s response to the virus. Second, the data may have been biased by the data collection procedures. Specifically, it is unclear how deaths of people with multiple medical conditions were categorized. For instance, if a person with late-stage cancer and COVID-19 died, was the death automatically attributed to COVID-19? If so, total deaths from COVID-19 may have been overcounted, which would affect the results of the analysis.
In addition to adding new variables of interest, such as other medical conditions, researchers should continue to examine the relationship between coronaviruses and poverty to determine whether poverty can be used to predict responses to future epidemics.
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gridExtra_2.3 DT_0.22 lubridate_1.8.0 forcats_0.5.1
## [5] stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4 readr_2.1.2
## [9] tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] lattice_0.20-45 assertthat_0.2.1 digest_0.6.29 utf8_1.2.2
## [5] R6_2.5.1 cellranger_1.1.0 backports_1.4.1 reprex_2.0.1
## [9] evaluate_0.15 highr_0.9 httr_1.4.2 pillar_1.7.0
## [13] rlang_1.0.2 curl_4.3.2 readxl_1.4.0 rstudioapi_0.13
## [17] jquerylib_0.1.4 Matrix_1.4-1 rmarkdown_2.13 labeling_0.4.2
## [21] splines_4.0.4 htmlwidgets_1.5.4 bit_4.0.4 munsell_0.5.0
## [25] broom_0.8.0 compiler_4.0.4 modelr_0.1.8 xfun_0.30
## [29] pkgconfig_2.0.3 mgcv_1.8-40 htmltools_0.5.2 tidyselect_1.1.2
## [33] fansi_1.0.3 crayon_1.5.1 tzdb_0.3.0 dbplyr_2.1.1
## [37] withr_2.5.0 grid_4.0.4 nlme_3.1-157 jsonlite_1.8.0
## [41] gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.3
## [45] scales_1.2.0 cli_3.2.0 stringi_1.7.6 vroom_1.5.7
## [49] farver_2.1.0 fs_1.5.2 xml2_1.3.3 bslib_0.3.1
## [53] ellipsis_0.3.2 generics_0.1.2 vctrs_0.3.8 tools_4.0.4
## [57] bit64_4.0.5 glue_1.6.2 hms_1.1.1 parallel_4.0.4
## [61] fastmap_1.1.0 yaml_2.3.5 colorspace_2.0-3 rvest_1.0.2
## [65] knitr_1.38 haven_2.4.3 sass_0.4.1