Outline

Introduction
Project Background and Goals
Methodology
Results
Conclusions and Limitations
References

Introduction

Project Background and Goals

Research on the Covid-19 pandemic has illuminated health disparities both nationally and globally. We now know that minority groups and people with low incomes are contracting the virus at disproportionate rates, and that the health care system needs to address this disparity. Access to nutritious food depends in part on income, and nutrition may in turn influence the effects of Covid-19. These insights are useful; however, we do not yet know the long-term effects of nutritional disparities on Covid-19 outcomes, which makes studying nutritional markers increasingly important. In addition, obesity has recently been identified as a preexisting condition associated with worse Covid-19 outcomes. In this project, I want to investigate the relationship between nutrition and Covid-19 survival. Specifically, I want to see whether survival rates differ with nutrition among countries with similar GDP per capita in 2018. This could offer insight into behaviors that may contribute to surviving Covid-19.

Hypothesis:

Based on existing research on obesity and its implications for Covid-19 mortality (Dietz & Santos-Burgoa, 2020), I expect that countries with higher rates of obesity will have higher Covid-19 death rates and lower survival rates.

Methodology

Step 1: Load the necessary packages.

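library(tidyverse)     # loads ggplot2 and dplyr, among others
library(RColorBrewer)  # ColorBrewer palettes used by scale_fill_distiller()
library(spData)        # provides the `world` country polygons
library(sf)            # simple-features tools for spatial data
library(ggpubr)        # ggscatter() for annotated scatter plots
knitr::opts_chunk$set(cache=TRUE)  # cache the results for quick compiling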

Step 2: Load the Data

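Where does this data come from? Both datasets were downloaded from Kaggle (https://www.kaggle.com/); the specific dataset pages are listed under References.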

Gross Domestic Product Data Description from Kaggle

"Gross Domestic Product (GDP) is the monetary value of all finished goods and services made within a country during a specific period. GDP provides an economic snapshot of a country, used to estimate the size of an economy and growth rate. This dataset contains the GDP based on Purchasing Power Parity (PPP).

GDP comparisons using PPP are arguably more useful than those using nominal GDP when assessing a nation’s domestic market because PPP takes into account the relative cost of local goods, services and inflation rates of the country, rather than using international market exchange rates which may distort the real differences in per capita income."

Covid-19 Data Description from Kaggle

"In this dataset, I have combined data of different types of food, world population obesity and undernourished rate, and global COVID-19 cases count from around the world in order to learn more about how a healthy eating style could help combat the Corona Virus. And from the dataset, we can gather information regarding diet patterns from countries with lower COVID infection rate, and adjust our own diet accordingly.

In each of the 4 datasets below, I have calculated fat quantity, energy intake (kcal), food supply quantity (kg), and protein for different categories of food (all calculated as percentage of total intake amount). I’ve also added on the obesity and undernourished rate (also in percentage) for comparison. The end of the datasets also included the most up to date confirmed/deaths/recovered/active cases (also in percentage of current population for each country)."

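data(world)                # country polygons from spData
world1 <- st_as_sf(world)  # ensure `world` is an sf object

# Point this at the folder where the Kaggle files were saved.
setwd("C:/Users/Gabby Thom/Downloads/R Data Science/FINAL Project (35% of Grade)")
covid_fat <- read.csv("COVID-19 Healthy Diet Dataset/Fat_Supply_Quantity_Data.csv")
gdp <- read.csv("GDP.csv")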
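The GDP column is referred to as X2018 throughout. Assuming the header row of GDP.csv lists bare years such as 2018, this is because read.csv() passes column names through make.names() by default (check.names = TRUE), which prefixes names that start with a digit with an X. A minimal base-R sketch using a tiny inline CSV (toy values, not the real file):

```r
# read.csv() sanitizes header names with make.names() when check.names = TRUE (the default)
toy_csv <- "Country,2018\nExampleland,12345.6\n"  # hypothetical toy data, not from GDP.csv

gdp_default <- read.csv(text = toy_csv)
names(gdp_default)  # "Country" "X2018"

# With check.names = FALSE the year survives, but it must then be quoted with backticks
gdp_raw <- read.csv(text = toy_csv, check.names = FALSE)
names(gdp_raw)      # "Country" "2018"
gdp_raw$`2018`
```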


Step 3: Remove missing data

# A) Remove any countries from the GDP dataset that have no data for 2018.
gdp <- filter(gdp, !is.na(X2018))

# B) Remove any countries missing Obesity (our nutrition marker) or the
#    Covid-19 death and recovery rates.
covid_fat <- filter(covid_fat, !is.na(Obesity), !is.na(Deaths), !is.na(Recovered))
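Comparing against the string "NA" only appears to work: for a missing value, x != "NA" evaluates to NA rather than FALSE, and filter() happens to drop rows where the condition is NA. is.na() tests for missingness directly. The base-R sketch below also reports how many rows were removed; drop_incomplete() is a hypothetical helper and the values are toy data.

```r
# How the comparison against "NA" behaves: missing values give NA, not FALSE
c(1.5, NA) != "NA"   # TRUE NA
is.na(c(1.5, NA))    # FALSE TRUE

# Hypothetical helper: keep rows with no missing values in `cols`, reporting how many were dropped
drop_incomplete <- function(df, cols) {
  keep <- stats::complete.cases(df[cols])
  message(sum(!keep), " row(s) dropped for missing ", paste(cols, collapse = ", "))
  df[keep, , drop = FALSE]
}

# Toy data, not the Kaggle values
toy <- data.frame(
  Country   = c("A", "B", "C"),
  Obesity   = c(20.1, NA, 25.3),
  Deaths    = c(0.01, 0.02, NA),
  Recovered = c(0.5, 0.6, 0.7)
)
drop_incomplete(toy, c("Obesity", "Deaths", "Recovered"))  # keeps only country A
```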

# C) Make sure that the two datasets contain the same countries.
#    inner_join() keeps only countries whose names appear in both datasets,
#    giving one table to compute correlations from.
covid_gdp <- inner_join(gdp, covid_fat, by = "Country")
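Because inner_join() silently drops any country whose name is spelled differently in the two files, it can be worth listing the names that failed to match before trusting the joined data. A base-R sketch with a hypothetical helper, unmatched_keys(), and toy country names; on the real data, dplyr::anti_join(gdp, covid_fat, by = "Country") would show one side of the same comparison.

```r
# Hypothetical helper: which key values appear in only one of two data frames?
unmatched_keys <- function(x, y, key = "Country") {
  list(
    only_in_x = setdiff(x[[key]], y[[key]]),
    only_in_y = setdiff(y[[key]], x[[key]])
  )
}

# Toy example: the same country spelled two ways (illustrative only)
gdp_toy   <- data.frame(Country = c("France", "United States"), X2018 = c(1, 2))
covid_toy <- data.frame(Country = c("France", "United States of America"), Deaths = c(0.1, 0.2))

unmatched_keys(gdp_toy, covid_toy)
# only_in_x: "United States"; only_in_y: "United States of America"
```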

Step 4: Connect the Covid data to the world country polygons.

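covid_gdp1 <- covid_gdp %>%
  dplyr::select(Deaths, Obesity, X2018, Country, Recovered) %>%
  left_join(world1, by = c("Country" = "name_long")) %>%  # match Kaggle names to spData's name_long
  st_as_sf()  # restore the sf class so geom_sf() can draw the polygons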
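The spatial join has the same weakness as the inner join: it matches Kaggle's Country values against spData's name_long, and a country without an exact match gets no polygon, so it will not appear on the maps. One option is to recode known mismatches before joining. The sketch below uses a hypothetical helper, harmonize_names(), and a hypothetical lookup entry; the real mismatches would need to be found first, for example with setdiff(covid_gdp$Country, world1$name_long).

```r
# Hypothetical helper: replace names found in `lookup` (names = old spelling, values = new)
harmonize_names <- function(x, lookup) {
  idx <- match(x, names(lookup))
  hit <- !is.na(idx)
  x[hit] <- unname(lookup[idx[hit]])
  x
}

# Hypothetical lookup entry; check the real data before relying on it
name_fix <- c("United States of America" = "United States")

harmonize_names(c("United States of America", "France"), name_fix)
# "United States" "France"
```

With the real data, this would run as covid_gdp$Country <- harmonize_names(covid_gdp$Country, name_fix) before Step 4.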

Step 5: Prepare the country polygon data.

# world1 is the base-map layer for the maps in Step 7, which show obesity and
# Covid-19 rates by country. (It was already created in Step 2; re-running
# st_as_sf() here is harmless.)

world1 <- st_as_sf(world)

Step 6: Compute Correlations

How does the obesity prevalence in a country relate to Covid mortality?

cor.test(covid_gdp$Obesity, covid_gdp$Deaths, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  covid_gdp$Obesity and covid_gdp$Deaths
## t = 6.7365, df = 142, p-value = 3.741e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3573083 0.6068465
## sample estimates:
##       cor 
## 0.4921212
# Scatter plot of obesity (x) against Covid mortality (y), with a fitted
# regression line, its shaded confidence band, and the Pearson coefficient.
ggscatter(covid_gdp1, x = "Obesity", y = "Deaths", color = "red",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Obesity rate (%)",
          ylab = "Covid-19 deaths (% of population)") +
  ggtitle("Obesity versus Covid Mortality")


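A correlation of r = 0.49 (95% CI 0.36 to 0.61, p < 0.001, across 144 countries) indicates a moderate positive relationship between obesity and Covid mortality: countries with higher obesity rates tended to have higher Covid death rates. Because these are country-level data, the correlation describes countries rather than individual patients and does not by itself show that obesity causes higher mortality.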
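The numbers reported by cor.test() follow directly from r and the number of countries n (here n = df + 2 = 144): the t statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), and the confidence interval comes from Fisher's z-transformation, tanh(atanh(r) +/- qnorm(0.975) / sqrt(n - 3)). A short base-R sketch (the helper name r_summary() is hypothetical) that reproduces the obesity versus deaths output above from r and n alone:

```r
# Hypothetical helper: rebuild cor.test()'s Pearson summary from r and n
r_summary <- function(r, n, conf.level = 0.95) {
  df <- n - 2
  t  <- sqrt(df) * r / sqrt(1 - r^2)   # t statistic with n - 2 degrees of freedom
  p  <- 2 * pt(-abs(t), df)            # two-sided p-value
  z  <- atanh(r)                       # Fisher z-transformation
  half_width <- qnorm((1 + conf.level) / 2) / sqrt(n - 3)
  list(t = t, df = df, p.value = p,
       conf.int = tanh(z + c(-1, 1) * half_width))  # back-transform to the r scale
}

# r and n taken from the obesity vs. deaths output above
r_summary(r = 0.4921212, n = 144)
```

The result should agree with the cor.test() output (t close to 6.74, interval close to 0.357 to 0.607). It also shows why a moderate r can have a tiny p-value: with 144 countries, the t statistic grows with sqrt(n - 2), so statistical significance says little about strength.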

How does the obesity prevalence in a country relate to Covid recovery?

cor.test(covid_gdp$Obesity, covid_gdp$Recovered, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  covid_gdp$Obesity and covid_gdp$Recovered
## t = 5.5409, df = 142, p-value = 1.418e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2771684 0.5474479
## sample estimates:
##       cor 
## 0.4216285
# Scatter plot of obesity (x) against Covid recovery (y), with a fitted
# regression line, its shaded confidence band, and the Pearson coefficient.
ggscatter(covid_gdp1, x = "Obesity", y = "Recovered", color = "purple",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Obesity rate (%)",
          ylab = "Covid-19 recoveries (% of population)") +
  ggtitle("Obesity versus Covid Recovery")


A correlation of r = 0.42 (95% CI 0.28 to 0.55, p < 0.001) suggests a moderate positive relationship between obesity rates and Covid recovery: countries with higher obesity rates tended to have higher Covid recovery rates.
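One way to probe why obesity correlates positively with both deaths and recoveries: according to the dataset description, Deaths and Recovered are percentages of each country's population, so both rise with the number of infections. Dividing by confirmed cases gives per-case rates instead. The base-R sketch below assumes the Kaggle file also has a Confirmed column in the same units (an assumption based on the dataset description); the helper name and the two toy countries are hypothetical.

```r
# Hypothetical helper: convert per-population percentages into per-case percentages
per_case_rates <- function(df) {
  ok <- !is.na(df$Confirmed) & df$Confirmed > 0   # avoid dividing by zero or NA
  data.frame(
    Country            = df$Country,
    fatality_per_case  = ifelse(ok, 100 * df$Deaths / df$Confirmed, NA),
    recovered_per_case = ifelse(ok, 100 * df$Recovered / df$Confirmed, NA)
  )
}

# Toy values: country A has 4x the infections of B, hence 4x the deaths and
# recoveries per capita, yet the same outcome per confirmed case
toy <- data.frame(Country   = c("A", "B"),
                  Confirmed = c(2.0, 0.5),
                  Deaths    = c(0.04, 0.01),
                  Recovered = c(1.6, 0.4))
per_case_rates(toy)   # both countries: 2% fatality per case, 80% recovered per case
```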

How does a country’s GDP per capita relate to Covid mortality?

cor.test(covid_gdp$X2018, covid_gdp$Deaths, method ="pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  covid_gdp$X2018 and covid_gdp$Deaths
## t = 5.535, df = 142, p-value = 1.458e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2767542 0.5471337
## sample estimates:
##       cor 
## 0.4212596
# Scatter plot of 2018 GDP per capita (x) against Covid mortality (y), with a
# fitted regression line, its shaded confidence band, and the Pearson coefficient.
ggscatter(covid_gdp1, x = "X2018", y = "Deaths", color = "lightgreen",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "GDP per capita, 2018 (PPP)",
          ylab = "Covid-19 deaths (% of population)") +
  ggtitle("Gross Domestic Product per Capita versus Covid Mortality")


A correlation of r = 0.42 (95% CI 0.28 to 0.55, p < 0.001) suggests a moderate positive relationship between GDP per capita and Covid mortality: countries with higher GDP per capita tended to have higher Covid death rates.
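GDP per capita is typically right-skewed across countries (the GDP map in Step 7 uses a log scale for this reason), and Pearson's r can be sensitive to a few very rich countries. A rank-based Spearman correlation, available through cor() or cor.test(..., method = "spearman"), is one robustness check; because it uses only ranks, a log transform of GDP leaves it unchanged. A base-R sketch with a hypothetical helper and toy numbers:

```r
# Hypothetical helper: compare Pearson (raw and log x) with Spearman rank correlation
compare_correlations <- function(x, y) {
  c(pearson       = cor(x, y, method = "pearson"),
    pearson_log_x = cor(log(x), y, method = "pearson"),
    spearman      = cor(x, y, method = "spearman"))
}

# Toy values, not the Kaggle data
gdp_toy    <- c(1000, 2500, 6000, 15000, 90000)
deaths_toy <- c(0.001, 0.004, 0.003, 0.020, 0.015)
compare_correlations(gdp_toy, deaths_toy)

# Spearman depends only on ranks, so GDP and log(GDP) give the same value
all.equal(cor(gdp_toy, deaths_toy, method = "spearman"),
          cor(log(gdp_toy), deaths_toy, method = "spearman"))
```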

Results

Step 7: Create choropleth maps

ggplot() +
  geom_sf(data = world1) +  # base map of all countries
  geom_sf(data = covid_gdp1, aes(fill = Obesity)) +  # fill each country with its obesity rate
  scale_fill_distiller(palette = "YlOrBr", trans = "log", direction = 1, breaks = c(40, 30, 20, 10)) +
  xlab(label = "longitude") +
  ylab(label = "latitude") +
  ggtitle("Obesity Rates by Country", subtitle = "as of 2020")

ggplot() +
  geom_sf(data = world1) +  # base map of all countries
  geom_sf(data = covid_gdp1, aes(fill = X2018)) +  # fill each country with its GDP per capita
  scale_fill_distiller(palette = "Greens", trans = "log", direction = 1) +
  xlab(label = "longitude") +
  ylab(label = "latitude") +
  ggtitle("GDP per Capita by Country", subtitle = "as of 2018")

ggplot() +
  geom_sf(data = world1) +  # base map of all countries
  geom_sf(data = covid_gdp1, aes(fill = Recovered)) +  # fill each country with its Covid recovery rate
  scale_fill_distiller(palette = "PuRd", direction = 1) +
  xlab(label = "longitude") +
  ylab(label = "latitude") +
  ggtitle("Covid Recovery Rates by Country", subtitle = "as of 2020, % of population")

ggplot() +
  geom_sf(data = world1) +  # base map of all countries
  geom_sf(data = covid_gdp1, aes(fill = Deaths)) +  # fill each country with its Covid mortality rate
  scale_fill_distiller(palette = "BuPu", direction = 1) +
  xlab(label = "longitude") +
  ylab(label = "latitude") +
  ggtitle("Covid Mortality Rates by Country", subtitle = "as of 2020, % of population")


Conclusions and Limitations

I found a positive association between obesity and Covid mortality, between GDP per capita and Covid mortality, and between obesity and Covid recovery rates. Although these relationships were only moderate, the fact that all three were positive challenged some of my assumptions. I expected that higher rates of preexisting conditions such as obesity would be associated with higher mortality, and the data support this. However, my expectations about the relationship between GDP per capita and Covid mortality were not supported, and neither was my hypothesis about the relationship between obesity rates and Covid recovery rates. One possible reason is that the Deaths and Recovered variables are both expressed as a percentage of each country's population rather than of its confirmed cases, so a country with more infections will tend to show more deaths and more recoveries at the same time. This suggests that controlling for more variables, such as the number of confirmed cases, may be important for understanding these relationships.

These relationships may present an opportunity to investigate nutritional markers as they relate to Covid-19. It may be especially helpful to create a composite variable that summarizes the nutritional value of the food consumed in a country. Using this composite variable, one could then regress Covid mortality on nutrition while controlling for GDP per capita. Controlling for income matters here because access to nutritious food is often related to income level.
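The proposed follow-up analysis could be sketched as follows, under stated assumptions: the food-supply columns passed in are a hypothetical choice made by the analyst, the composite is simply the average of their z-scores (one of many ways to build such an index), and the model is an ordinary least-squares regression fitted with lm(). Deaths and X2018 match the joined data above; Nutrition is a hypothetical new column.

```r
# Hypothetical composite: average of z-scored columns (each rescaled to mean 0, sd 1)
make_composite <- function(df, cols) {
  unname(rowMeans(scale(df[cols])))
}

# Regress Covid mortality on nutrition while controlling for 2018 GDP per capita
fit_adjusted <- function(df) {
  lm(Deaths ~ Nutrition + X2018, data = df)
}

# Toy data only, constructed so the answer is known in advance
toy <- data.frame(
  food_a = c(1, 2, 3, 4, 5),          # hypothetical food-supply columns
  food_b = c(10, 30, 20, 50, 40),
  X2018  = c(5000, 12000, 8000, 30000, 20000)
)
toy$Nutrition <- make_composite(toy, c("food_a", "food_b"))
toy$Deaths    <- 0.01 + 0.02 * toy$Nutrition + 1e-6 * toy$X2018  # exact linear relationship

coef(fit_adjusted(toy))  # recovers 0.01, 0.02 and 1e-6
```

With the real data, once a Nutrition column has been added to covid_gdp, summary(fit_adjusted(covid_gdp)) would report the nutrition coefficient adjusted for GDP per capita.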

One issue I encountered with these datasets is that several countries had missing data, leaving fewer countries (144 after filtering and joining) to contribute to the correlations. An additional problem was the lack of GDP per capita data for 2019 and 2020. Since many countries have gone through economic crises since the onset of the pandemic, looking at GDP per capita throughout the pandemic may have been more insightful.

References

https://www.kaggle.com/mariaren/covid19-healthy-diet-dataset

https://www.kaggle.com/nitishabharathi/gdp-per-capita-all-countries

Dietz, W., & Santos-Burgoa, C. (2020). Obesity and its Implications for COVID-19 Mortality. Obesity, 28(6), 1005. https://doi.org/10.1002/oby.22818