The data set was produced from the 2016 Health Data survey and exported through Social Explorer. New York State has a population of approximately 19.9 million people, which ranks it among the most populous states in the United States. Using the collected data, which I have reproduced here, we can evaluate healthy and unhealthy living within New York State.
Load Packages & Data
library(readr)
HealthdataNY <- read_csv("C:\\Users\\Cespi\\Documents\\712\\HealthSurvey NY.csv")
head(HealthdataNY)
The health data includes a number of variables associated with the health and lifestyles of people in New York State. I have renamed the variables that I will be analyzing so that each one is easy for any reader to interpret.
The variables chosen are as follows:
After evaluating the variables and renaming them to be more readable for people who aren't familiar with the survey, I kept only the variables above and dropped the rest. This produces a data set that captures only the variables needed. The code below shows the renaming of my variables and the selection of those columns.
library(dplyr)
# Give the Social Explorer column codes readable names
Healthy_Living <- rename(HealthdataNY,
  "County" = Geo_NAME,
  "Ad_Diabet" = SE_T009_001,
  "Ad_limitFds" = SE_T012_001,
  "Access_Exer_Opport" = SE_T012_002,
  "Obese_Adults" = SE_T012_003,
  "PhysInactive" = SE_T012_004,
  "FEI" = SE_T013_001)
# Keep only the renamed columns needed for the analysis
Healthy_Living <- select(Healthy_Living, County, Ad_Diabet,
  Ad_limitFds, Access_Exer_Opport,
  Obese_Adults, PhysInactive, FEI)
The same data set can be generated in a single step by chaining rename() and select() with the pipe operator (%>%), which lets me rename and subset my data with less code. The exciting part of R is that there are numerous ways to write the same thing.
# Rename and select in one pipeline; the result matches the two-step version
Healthy_Living <- HealthdataNY %>%
  rename("County" = Geo_NAME,
    "Ad_Diabet" = SE_T009_001,
    "Ad_limitFds" = SE_T012_001,
    "Access_Exer_Opport" = SE_T012_002,
    "Obese_Adults" = SE_T012_003,
    "PhysInactive" = SE_T012_004,
    "FEI" = SE_T013_001) %>%
  select(County, Ad_Diabet,
    Ad_limitFds, Access_Exer_Opport,
    Obese_Adults, PhysInactive, FEI)
head(Healthy_Living)
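As a quick sanity check on the claim that both approaches give the same data set, the two can be compared directly. The sketch below is only an illustration: the data frame `toy`, its values, and the `unused` column are hypothetical stand-ins, and only three of the survey columns are reproduced.

```r
library(dplyr)

# Hypothetical stand-in for HealthdataNY; the values are made up for illustration
toy <- data.frame(
  Geo_NAME    = c("County A", "County B", "County C"),
  SE_T009_001 = c(8.1, 9.4, 10.2),
  SE_T013_001 = c(8.5, 7.9, 7.2),
  unused      = c(1, 2, 3)
)

# Two-step version: rename first, then select
two_step <- rename(toy, County = Geo_NAME, Ad_Diabet = SE_T009_001, FEI = SE_T013_001)
two_step <- select(two_step, County, Ad_Diabet, FEI)

# One-step version: the same calls chained with the pipe
one_step <- toy %>%
  rename(County = Geo_NAME, Ad_Diabet = SE_T009_001, FEI = SE_T013_001) %>%
  select(County, Ad_Diabet, FEI)

# Compare the two results column by column and value by value
print(identical(two_step, one_step))
```

select() can also rename while it selects, for example `select(toy, County = Geo_NAME)`, which is a third way to reach the same result.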
After preparing my data, the next step was to evaluate the correlation between adult diabetes and the Food Environment Index (FEI). I hypothesized that the scatter plot would show a negative relationship: if the food environment index is higher, the percentage of people with diabetes should be lower. Although I expected a negative relationship, the plot shows no clear linear relationship between the two variables. Instead, I see that even when food is more readily available to people in their counties, the rates of diabetes remain approximately the same.
head(Healthy_Living)
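# FEI goes on the x-axis and adult diabetes on the y-axis, matching the axis labels and limits
plot(Healthy_Living$FEI, Healthy_Living$Ad_Diabet, xlab = "FEI", ylab = "Ad_Diabet", main = "The correlation between Adult Diabetes & Food Environment Index", xlim = c(5, 10.5), ylim = c(6, 12.5), col = "blue")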
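A scatter plot only gives a visual impression, so one way to back up the reading of "no linear relationship" is to compute the correlation coefficient itself. The sketch below uses base R only. The data frame `toy` and its values are hypothetical and are not taken from the survey; with the real data the same calls would be applied to Healthy_Living$FEI and Healthy_Living$Ad_Diabet.

```r
# Hypothetical values for illustration only; not taken from the survey data
toy <- data.frame(
  FEI       = c(7.2, 7.9, 8.1, 8.5, 8.8, 9.0),
  Ad_Diabet = c(10.1, 9.0, 9.6, 8.9, 9.8, 9.2)
)

# Pearson correlation coefficient; values near 0 suggest little linear relationship
# use = "complete.obs" drops rows where either value is missing
r <- cor(toy$FEI, toy$Ad_Diabet, use = "complete.obs")

# cor.test() adds a p-value and a confidence interval for the coefficient
result <- cor.test(toy$FEI, toy$Ad_Diabet)

# A simple linear fit gives the slope of diabetes rate on FEI
fit <- lm(Ad_Diabet ~ FEI, data = toy)

print(r)
print(result$p.value)
print(coef(fit))
```

After the plot has been drawn, `abline(fit)` would overlay the fitted line on it, which makes a flat or weak trend easier to see.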