What’s the Health Lifestyle of the Residents of New York State?

Introduction: Collection and download of Health Data 2016

The data set was produced from the 2016 Health Data survey and downloaded through Social Explorer. New York State has a population of approximately 19.9 million people, which ranks it among the most populous states in the United States. Using the data I have collected and reproduced here, we can evaluate healthy and unhealthy living within New York State.

Load Packages & Data

library(readr)
HealthdataNY <- read_csv("C:\\Users\\Cespi\\Documents\\712\\HealthSurvey NY.csv")
head(HealthdataNY)

Renaming, Dropping, and Summary of Variables:

The health data collected includes several variables associated with the health and lifestyles of people in New York State. I have renamed the variables I will be analyzing so that each one is easy for any reader to interpret.

The variables chosen are as follows:

  • County- which is formatted under the same name. This variable differentiates each respondent by where they live within New York State. For my research, I hope to see whether geographic location has an influence on a person’s life.
  • Adult Diabetics- This variable was renamed as Ad_Diabet. It represents the percent of adults with diabetes. As further research is done, I hope to find a correlation between diabetes and the food environment index variable.
  • Adult limited access to food- This variable was renamed as Ad_limitFds. It represents the Percent of Persons with Limited Access to Healthy Foods.
  • Adult Access to Exercise Opportunities- This variable was renamed as Access_Exer_Opport and represents the Percent of Persons with Access to Exercise Opportunities. I hypothesize that this variable will have a positive effect on individuals, and that we will see a higher rate of persons exercising in the larger, more populated counties in New York.
  • Obese Adults- renamed as Obese_Adults, represents the Percent Obese Persons 20 Years and Over. Treating it as an independent variable, I hope to see how it affects diabetes and the Food Environment Index.
  • Physically Inactive- renamed as PhysInactive, represents the Percent Physically Inactive Persons 20 Years and Over.
  • Food Environment Index- renamed as FEI, represents the food environment index. This variable combines two measures: limited access to healthy foods, which estimates the percentage of the population who are low income and do not live close to a grocery store, and food insecurity, which approximates the percent of persons who had limited access to food in the past year. I hypothesize that this variable will have a strong effect on the percent of persons with diabetes.

Sorting The Data:

After evaluating the variables and renaming them to be more readable to people who aren’t familiar with the survey, I sorted the data down to the variables listed above. This produces a data set that captures only the variables needed. The code below demonstrates the sorting of my data and the renaming of my variables.

library(dplyr)
Healthy_Living <- rename (HealthdataNY,
          "County" = Geo_NAME,
          "Ad_Diabet" = SE_T009_001,
          "Ad_limitFds" = SE_T012_001,
          "Access_Exer_Opport" = SE_T012_002,
          "Obese_Adults" = SE_T012_003,
          "PhysInactive" = SE_T012_004,
          "FEI" = SE_T013_001)
Healthy_Living<-select(Healthy_Living, County, Ad_Diabet, 
        Ad_limitFds, Access_Exer_Opport, 
        Obese_Adults, PhysInactive, FEI)
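
The section heading mentions a summary of the variables, so a quick check of ranges and missing values may be useful before any analysis. The sketch below is only illustrative: it uses a small made-up data frame (toy) with two of the renamed columns, and the values are invented, not taken from the survey. The same calls could be applied to Healthy_Living.

```r
# Toy stand-in for Healthy_Living: values are invented for illustration,
# not taken from the 2016 survey.
toy <- data.frame(
  County = c("County A", "County B", "County C", "County D"),
  Ad_Diabet = c(8.5, 10.2, 9.1, NA),
  FEI = c(8.1, 7.4, 8.8, 7.9)
)

# Min, quartiles, median, mean, and NA count for every numeric column
summary(toy[sapply(toy, is.numeric)])

# Count missing values per column before doing any analysis
missing_per_column <- colSums(is.na(toy))
missing_per_column
```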

Experimenting with different methods to sort and rename data

The following data set was generated in a single step, chaining rename and select with the pipe operator so that the data is sorted and renamed in fewer steps. One of the exciting parts of R is the number of ways to code the same task.

Healthy_Living<- HealthdataNY%>%
  rename ("County" = Geo_NAME,
          "Ad_Diabet" = SE_T009_001,
          "Ad_limitFds" = SE_T012_001,
          "Access_Exer_Opport" = SE_T012_002,
          "Obese_Adults" = SE_T012_003,
          "PhysInactive" = SE_T012_004,
          "FEI" = SE_T013_001)%>%
  select( County, Ad_Diabet, 
        Ad_limitFds, Access_Exer_Opport, 
        Obese_Adults, PhysInactive, FEI)
head(Healthy_Living)
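
As one more variation, dplyr's select() accepts new_name = old_name pairs, so the renaming and dropping can happen in a single call. The sketch below assumes a made-up data frame (raw_toy) that mimics a few of the raw column names; Extra_Col is a hypothetical column standing in for everything that gets dropped, and the values are invented.

```r
library(dplyr)

# Toy stand-in for HealthdataNY using some of the raw column names.
# Values are invented; Extra_Col is a hypothetical column to be dropped.
raw_toy <- data.frame(
  Geo_NAME = c("County A", "County B"),
  Extra_Col = c(1, 2),
  SE_T009_001 = c(8.5, 10.2),
  SE_T013_001 = c(8.1, 7.4)
)

# select() renames with new_name = old_name and keeps only the listed columns
slim_toy <- raw_toy %>%
  select(County = Geo_NAME,
         Ad_Diabet = SE_T009_001,
         FEI = SE_T013_001)

names(slim_toy)
```

Whether this reads better than rename followed by select is a matter of taste; keeping the two steps separate makes the renaming easier to audit.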

Scatter Plot

After sorting through my data, the next step was to evaluate the correlation between Adult Diabetics and the Food Environment Index. I hypothesized that the scatter plot would show a negative relationship: if the food environment index is higher, the percentage of persons with diabetes should be lower. Although I hoped for a negative relationship, the plot appears to show no linear relationship between the two variables. Instead, even where food is available to people in their counties, the rates of diabetes are approximately the same.

head(Healthy_Living)
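# FEI goes on the x-axis and Ad_Diabet on the y-axis so the data match the axis labels and limits
plot(Healthy_Living$FEI, Healthy_Living$Ad_Diabet,
     xlab = "FEI", ylab = "Ad_Diabet",
     main = "The correlation between Adult Diabetes & Food Environment Index",
     xlim = c(5, 10.5), ylim = c(6, 12.5), col = "blue")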
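
Reading a relationship off a scatter plot is subjective, so it may help to put a number on it. The sketch below is a minimal example of computing a Pearson correlation with base R's cor() and cor.test(). It runs on a made-up data frame (toy) with invented values, so its output says nothing about the survey; swapping in Healthy_Living$FEI and Healthy_Living$Ad_Diabet would give the real coefficient.

```r
# Toy stand-in for Healthy_Living: invented values, not survey results.
toy <- data.frame(
  FEI = c(7.2, 7.8, 8.1, 8.4, 8.9, 9.3),
  Ad_Diabet = c(10.1, 9.4, 9.9, 8.8, 9.6, 9.0)
)

# Pearson correlation; use = "complete.obs" drops rows with missing values
r <- cor(toy$FEI, toy$Ad_Diabet, use = "complete.obs")

# cor.test() reports the same coefficient with a confidence interval and p-value
ct <- cor.test(toy$FEI, toy$Ad_Diabet)

r
ct$p.value
```

A coefficient near zero would support the reading of no linear relationship, while a clearly negative one would support the original hypothesis.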