I want to perform a study on health trends. I have not previously had the opportunity to work with health-related data, and I want to take advantage of the many data sources now available. With so much data, it makes sense to store it in a relational database such as MySQL. Data can also be collected through the HealthData.gov API.
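As a rough illustration of the storage step, the sketch below loads indicator rows into a relational table and averages one indicator per region. It is written in Python and uses the standard-library `sqlite3` module as a stand-in for MySQL. The table layout, the function names, and the rows are hypothetical placeholders, not real health data and not the project's actual schema.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory database and load (region, year, name, value) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical long-format layout: one row per region, year, and indicator.
    conn.execute(
        """
        CREATE TABLE indicator (
            region TEXT NOT NULL,
            year   INTEGER NOT NULL,
            name   TEXT NOT NULL,
            value  REAL,
            PRIMARY KEY (region, year, name)
        )
        """
    )
    conn.executemany("INSERT INTO indicator VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def mean_by_region(conn, name):
    """Return {region: mean value} for one indicator name."""
    cur = conn.execute(
        "SELECT region, AVG(value) FROM indicator "
        "WHERE name = ? GROUP BY region ORDER BY region",
        (name,),
    )
    return dict(cur.fetchall())


# Placeholder rows for illustration only; these are not real measurements.
placeholder_rows = [
    ("Region A", 2015, "example_indicator", 1.0),
    ("Region A", 2016, "example_indicator", 3.0),
    ("Region B", 2015, "example_indicator", 5.0),
]

conn = build_store(placeholder_rows)
result = mean_by_region(conn, "example_indicator")
print(result)
```

A long-format table like this keeps new indicators from requiring schema changes, which may help when combining several sources. The same queries should carry over to MySQL with a MySQL client library, or to R through the DBI package.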

The data will most likely come from the following sources:
- https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics
- https://www.data.gov/health/
- https://www.ehdp.com/vitalnet/datasets.htm

API access through the rHealthDataGov R package: https://cran.r-project.org/web/packages/rHealthDataGov/rHealthDataGov.pdf

I want to answer questions such as, but not limited to:
- Is region a major factor in overall health?
- How does overall nutrition vary by region?
- Is there a relationship between well-being and population density?
- Does commuting affect someone's well-being?

I believe these are important research questions for understanding the current state of health care and wellness. The project will follow the typical data science workflow presented in DATA 607. It will include exploratory data analysis (EDA) in addition to modeling; the modeling approach will be determined after the EDA.