Introduction:

The County Health Rankings & Roadmaps (CHRR) program is a collaboration between the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute. Its goals are to: 1-Build awareness of the multiple factors that influence health; 2-Provide a reliable, sustainable source of local data to communities to help them identify opportunities to improve their health; 3-Engage and activate local leaders from many sectors in creating sustainable community change; 4-Connect and empower community leaders working to improve health.

My Research Questions:

From this data I have selected eleven NY counties, along with the variables I need to see how various factors contribute to health and overall lifestyle. My main research questions are: “How is lifestyle affecting NY residents’ health conditions?”, “Is there any correlation between the selected factors and health?”, and “Which county has the most unhealthy conditions overall?” I study these by analyzing variables on food environment, exercise habits, diet, smoking, and alcohol, and asking how they relate to NY residents’ health, i.e. mental and physical health, diabetes, obesity, etc.

Importing Data as CSV:

library(readr)
NYCountyHealth<-read_csv("/Users/kanwallatif/Documents/NYCountyData1.csv", col_names = TRUE)
head(NYCountyHealth)
## # A tibble: 6 x 31
##   Geo_NAME SE_T001_001 SE_T001_002 SE_T002_001 SE_T003_001 SE_T004_001
##   <chr>          <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
## 1 Albany …         3.3         3.4        12.2        8.57         300
## 2 Bronx C…         5           4.3        28.7        9.69         737
## 3 Columbi…         3.2         3.5        11.6        7.50          30
## 4 Nassau …         3           2.9        12.1        8.02        1982
## 5 New Yor…         3.5         3.5        14.1        8.62        2239
## 6 Niagara…         3.4         3.5        12.9        8.26          93
## # … with 25 more variables: SE_T004_002 <dbl>, SE_T004_003 <dbl>,
## #   SE_T005_001 <dbl>, SE_T006_001 <dbl>, SE_T006_002 <dbl>,
## #   SE_T006_003 <dbl>, SE_T007_001 <dbl>, SE_T007_002 <dbl>,
## #   SE_T008_001 <dbl>, SE_T008_002 <dbl>, SE_T008_003 <dbl>,
## #   SE_T008_004 <dbl>, SE_T009_001 <dbl>, SE_T009_002 <dbl>,
## #   SE_T010_001 <dbl>, SE_T010_002 <dbl>, SE_T010_003 <dbl>,
## #   SE_T011_001 <dbl>, SE_T011_002 <dbl>, SE_T012_001 <dbl>,
## #   SE_T012_002 <dbl>, SE_T012_003 <dbl>, SE_T012_004 <dbl>,
## #   SE_T012_005 <dbl>, SE_T013_001 <dbl>

Renaming variables: I have renamed my variables so that they are understandable. The code and results follow.

library(dplyr)
NYCountyHealthData <-rename(NYCountyHealth, "NYCounty"=Geo_NAME, "Physically_Unhealthy"=SE_T001_001, "Mentally_Unhealthy"=SE_T001_002, "Fair_Poor_Health"=SE_T002_001, "Low_Birthweight"=SE_T003_001, "Primary_Care_Physicians"=SE_T004_001, "Mental_Health_Providers"=SE_T004_002, "Dentist"=SE_T004_003, "Limited_Access_Doc"=SE_T005_001, "Persons_Without_Insurance_Under_19_Years"=SE_T006_001, "Persons_Without_Insurance_Between_18_64_Years"=SE_T006_002, "Persons_Without_Insurance_Under_65_Years"=SE_T006_003, "Premature_Deaths_under_75_Years"=SE_T007_001, "Potential_Life_Lost_under_75"=SE_T007_002, "Infant_Mortality"=SE_T008_001, "Child_Mortality"=SE_T008_002, "Premature_Age_adjusted_Mortality"=SE_T008_003, "Drug_Poisoning_Mortality"=SE_T008_004, "Diabetics"=SE_T009_001, "Diabetic_Test"=SE_T009_002, "Sexual_Activity_Teen Births"=SE_T010_001, "Chlamydia_Cases"=SE_T010_002, "HIV_Prevalence"=SE_T010_003, "Current_Smokers"=SE_T011_001, "Drinking_Adults"=SE_T011_002, "Limited_Access_HealthyFoods"=SE_T012_001, "Access_Exercise_Opportunities"=SE_T012_002, "Obese_Persons"=SE_T012_003, "Physically_InactivePersons"=SE_T012_004, "Children_Eligible_for_Free_Lunch_under_18"=SE_T012_005, "Food_Environment_Index"=SE_T013_001)
head(NYCountyHealthData)
## # A tibble: 6 x 31
##   NYCounty Physically_Unhe… Mentally_Unheal… Fair_Poor_Health
##   <chr>               <dbl>            <dbl>            <dbl>
## 1 Albany …              3.3              3.4             12.2
## 2 Bronx C…              5                4.3             28.7
## 3 Columbi…              3.2              3.5             11.6
## 4 Nassau …              3                2.9             12.1
## 5 New Yor…              3.5              3.5             14.1
## 6 Niagara…              3.4              3.5             12.9
## # … with 27 more variables: Low_Birthweight <dbl>,
## #   Primary_Care_Physicians <dbl>, Mental_Health_Providers <dbl>,
## #   Dentist <dbl>, Limited_Access_Doc <dbl>,
## #   Persons_Without_Insurance_Under_19_Years <dbl>,
## #   Persons_Without_Insurance_Between_18_64_Years <dbl>,
## #   Persons_Without_Insurance_Under_65_Years <dbl>,
## #   Premature_Deaths_under_75_Years <dbl>,
## #   Potential_Life_Lost_under_75 <dbl>, Infant_Mortality <dbl>,
## #   Child_Mortality <dbl>, Premature_Age_adjusted_Mortality <dbl>,
## #   Drug_Poisoning_Mortality <dbl>, Diabetics <dbl>, Diabetic_Test <dbl>,
## #   `Sexual_Activity_Teen Births` <dbl>, Chlamydia_Cases <dbl>,
## #   HIV_Prevalence <dbl>, Current_Smokers <dbl>, Drinking_Adults <dbl>,
## #   Limited_Access_HealthyFoods <dbl>,
## #   Access_Exercise_Opportunities <dbl>, Obese_Persons <dbl>,
## #   Physically_InactivePersons <dbl>,
## #   Children_Eligible_for_Free_Lunch_under_18 <dbl>,
## #   Food_Environment_Index <dbl>

Selecting Variables: Next, I have selected the following variables, which I am interested in studying and analyzing further.

NYCountyHealthData<-select(NYCountyHealthData, NYCounty, Physically_Unhealthy, Mentally_Unhealthy, Fair_Poor_Health, Limited_Access_Doc,Diabetics, Diabetic_Test, Current_Smokers, Drinking_Adults, Limited_Access_HealthyFoods, Access_Exercise_Opportunities, Obese_Persons, Physically_InactivePersons, Food_Environment_Index)
head(NYCountyHealthData)
## # A tibble: 6 x 14
##   NYCounty Physically_Unhe… Mentally_Unheal… Fair_Poor_Health
##   <chr>               <dbl>            <dbl>            <dbl>
## 1 Albany …              3.3              3.4             12.2
## 2 Bronx C…              5                4.3             28.7
## 3 Columbi…              3.2              3.5             11.6
## 4 Nassau …              3                2.9             12.1
## 5 New Yor…              3.5              3.5             14.1
## 6 Niagara…              3.4              3.5             12.9
## # … with 10 more variables: Limited_Access_Doc <dbl>, Diabetics <dbl>,
## #   Diabetic_Test <dbl>, Current_Smokers <dbl>, Drinking_Adults <dbl>,
## #   Limited_Access_HealthyFoods <dbl>,
## #   Access_Exercise_Opportunities <dbl>, Obese_Persons <dbl>,
## #   Physically_InactivePersons <dbl>, Food_Environment_Index <dbl>

Best way to Rename & Select Variables: A smarter way is to use the pipe (%>%) to combine renaming the variables and selecting the required ones in a single step.

NYCountyHealthData<- NYCountyHealth%>% rename("NYCounty"=Geo_NAME, "Physically_Unhealthy"=SE_T001_001, "Mentally_Unhealthy"=SE_T001_002, "Fair_Poor_Health"=SE_T002_001, "Low_Birthweight"=SE_T003_001, "Primary_Care_Physicians"=SE_T004_001, "Mental_Health_Providers"=SE_T004_002, "Dentist"=SE_T004_003, "Limited_Access_Doc"=SE_T005_001, "Persons_Without_Insurance_Under_19_Years"=SE_T006_001, "Persons_Without_Insurance_Between_18_64_Years"=SE_T006_002, "Persons_Without_Insurance_Under_65_Years"=SE_T006_003, "Premature_Deaths_under_75_Years"=SE_T007_001, "Potential_Life_Lost_under_75"=SE_T007_002, "Infant_Mortality"=SE_T008_001, "Child_Mortality"=SE_T008_002, "Premature_Age_adjusted_Mortality"=SE_T008_003, "Drug_Poisoning_Mortality"=SE_T008_004, "Diabetics"=SE_T009_001, "Diabetic_Test"=SE_T009_002, "Sexual_Activity_Teen Births"=SE_T010_001, "Chlamydia_Cases"=SE_T010_002, "HIV_Prevalence"=SE_T010_003, "Current_Smokers"=SE_T011_001, "Drinking_Adults"=SE_T011_002, "Limited_Access_HealthyFoods"=SE_T012_001, "Access_Exercise_Opportunities"=SE_T012_002, "Obese_Persons"=SE_T012_003, "Physically_InactivePersons"=SE_T012_004, "Children_Eligible_for_Free_Lunch_under_18"=SE_T012_005, "Food_Environment_Index"=SE_T013_001)%>% select (NYCounty, Physically_Unhealthy, Mentally_Unhealthy, Fair_Poor_Health, Limited_Access_Doc,Diabetics, Diabetic_Test, Current_Smokers, Drinking_Adults, Limited_Access_HealthyFoods, Access_Exercise_Opportunities, Obese_Persons, Physically_InactivePersons, Food_Environment_Index)
head(NYCountyHealthData)
## # A tibble: 6 x 14
##   NYCounty Physically_Unhe… Mentally_Unheal… Fair_Poor_Health
##   <chr>               <dbl>            <dbl>            <dbl>
## 1 Albany …              3.3              3.4             12.2
## 2 Bronx C…              5                4.3             28.7
## 3 Columbi…              3.2              3.5             11.6
## 4 Nassau …              3                2.9             12.1
## 5 New Yor…              3.5              3.5             14.1
## 6 Niagara…              3.4              3.5             12.9
## # … with 10 more variables: Limited_Access_Doc <dbl>, Diabetics <dbl>,
## #   Diabetic_Test <dbl>, Current_Smokers <dbl>, Drinking_Adults <dbl>,
## #   Limited_Access_HealthyFoods <dbl>,
## #   Access_Exercise_Opportunities <dbl>, Obese_Persons <dbl>,
## #   Physically_InactivePersons <dbl>, Food_Environment_Index <dbl>
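As a possible shortcut, dplyr's select() also accepts new_name = old_name pairs, so the columns that are kept can be renamed in the same call and the columns that are dropped never need a new name. The sketch below is illustrative only: it rebuilds a small data frame from the first three columns of the six rows printed by head() above (county names are kept truncated exactly as printed), and the object names sample_rows and renamed are hypothetical.

```r
library(dplyr)

# Six rows as printed by head(NYCountyHealth); county names are truncated in the printout.
sample_rows <- data.frame(
  Geo_NAME    = c("Albany …", "Bronx C…", "Columbi…", "Nassau …", "New Yor…", "Niagara…"),
  SE_T001_001 = c(3.3, 5, 3.2, 3, 3.5, 3.4),
  SE_T001_002 = c(3.4, 4.3, 3.5, 2.9, 3.5, 3.5),
  SE_T002_001 = c(12.2, 28.7, 11.6, 12.1, 14.1, 12.9),
  stringsAsFactors = FALSE
)

# select() keeps only the listed columns and renames them (new_name = old_name).
renamed <- sample_rows %>%
  select(
    NYCounty             = Geo_NAME,
    Physically_Unhealthy = SE_T001_001,
    Mentally_Unhealthy   = SE_T001_002
  )

print(names(renamed))
```

Applied to the full data, this would shorten the rename() call to only the fourteen columns that are actually kept.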

Generating new Variables: To simplify further analysis, I have combined the physically and mentally unhealthy conditions by taking their mean, and named the result Overall_Unhealthy_Condition.

NYCountyHealthData1=NYCountyHealthData%>%
  mutate(Overall_Unhealthy_Condition=(Physically_Unhealthy+Mentally_Unhealthy)/2)
head(NYCountyHealthData1)
## # A tibble: 6 x 15
##   NYCounty Physically_Unhe… Mentally_Unheal… Fair_Poor_Health
##   <chr>               <dbl>            <dbl>            <dbl>
## 1 Albany …              3.3              3.4             12.2
## 2 Bronx C…              5                4.3             28.7
## 3 Columbi…              3.2              3.5             11.6
## 4 Nassau …              3                2.9             12.1
## 5 New Yor…              3.5              3.5             14.1
## 6 Niagara…              3.4              3.5             12.9
## # … with 11 more variables: Limited_Access_Doc <dbl>, Diabetics <dbl>,
## #   Diabetic_Test <dbl>, Current_Smokers <dbl>, Drinking_Adults <dbl>,
## #   Limited_Access_HealthyFoods <dbl>,
## #   Access_Exercise_Opportunities <dbl>, Obese_Persons <dbl>,
## #   Physically_InactivePersons <dbl>, Food_Environment_Index <dbl>,
## #   Overall_Unhealthy_Condition <dbl>

Generating Summary Variables: I have also summarized “Overall_Unhealthy_Condition” to see the average number of days per month on which residents of the selected NY counties feel unhealthy overall, mentally and physically. The mean calculated below is 3.51 days per month.

summarise(NYCountyHealthData1,MeanUnhealthyCondition=mean(Overall_Unhealthy_Condition))
## # A tibble: 1 x 1
##   MeanUnhealthyCondition
##                    <dbl>
## 1                   3.51
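To make the two steps above concrete, here is a small worked example in base R. It uses only the six rows that head() prints (the remaining five counties are not shown in this report), so its mean is not expected to match the 3.51 computed from all eleven counties. The vector names are hypothetical.

```r
# Values copied from the six rows printed by head(); the other five counties are not shown.
physically_unhealthy <- c(3.3, 5, 3.2, 3, 3.5, 3.4)
mentally_unhealthy   <- c(3.4, 4.3, 3.5, 2.9, 3.5, 3.5)

# Row-wise mean of the two measures, equivalent to (a + b) / 2 inside mutate().
overall <- rowMeans(cbind(physically_unhealthy, mentally_unhealthy))
print(overall)

# Mean over these six counties only.
print(mean(overall))
```

For these six counties the mean is about 3.54 days per month, close to the full-sample figure.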

Now I want to see which factors are related, positively or negatively, to the health of NY county residents. I can analyze this with correlation graphs. My hypotheses are: smoking has a negative impact on health; obese persons are more likely to have diabetes; and physically inactive persons have poorer health, both mentally and physically.

Looking for relationships Among various Variables: First, I would like to see the relationship between smoking and the health conditions of NY residents who are over 18 years of age.

library(ggplot2)
ggplot(data =NYCountyHealthData1, aes(Current_Smokers,Overall_Unhealthy_Condition))+geom_point()+labs(title = "Smoking & NY County Resident's health")

cor(NYCountyHealthData1$Current_Smokers,NYCountyHealthData1$Overall_Unhealthy_Condition)
## [1] 0.8006203

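The graph above shows a positive trend, and the correlation is also positive (about 0.80). Counties with a higher share of current smokers therefore tend to report more unhealthy days. Because this is a county-level correlation, it shows an association and does not by itself establish that smoking is the cause.

Now I would like to see the relationship between obesity and diabetes.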

library(ggplot2)
ggplot(data =NYCountyHealthData1, aes(Diabetics,Obese_Persons))+geom_point()+labs(title = "Relationship between Diabetes and Obesity")

cor(NYCountyHealthData1$Diabetics,NYCountyHealthData1$Obese_Persons)
## [1] 0.3865245

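The value and trend are positive, but the correlation is fairly weak (about 0.39), so counties with more obese persons tend only modestly to have more diabetics. This is consistent with my hypothesis, but it does not show that obesity causes diabetes.

Now I would like to see the relationship between diabetes and physical inactivity.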

library(ggplot2)
ggplot(data =NYCountyHealthData1, aes(Diabetics,Physically_InactivePersons))+geom_point()+labs(title = "Relationship between Diabetes and Those who are Physically Inactive")

cor(NYCountyHealthData1$Diabetics,NYCountyHealthData1$Physically_InactivePersons)
## [1] 0.6986125

Since the value and trend are positive (about 0.70), counties with more physically inactive persons tend to have more diabetics.

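Finally, I would like to see the relationship between overall unhealthy condition and alcohol consumption.

library(ggplot2)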
ggplot(data =NYCountyHealthData1, aes(Overall_Unhealthy_Condition,Drinking_Adults))+geom_point()+labs(title = "Relationship between Overall Unhealthy Condition and Drinking Alcohol")

cor(NYCountyHealthData1$Overall_Unhealthy_Condition,NYCountyHealthData1$Drinking_Adults)
## [1] -0.6543259

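The value is negative (about -0.65) and the graph shows a downward trend, so counties with a higher share of drinking adults tend to report fewer unhealthy days. This does not mean that alcohol improves health; a county-level correlation cannot separate the effect of drinking from other differences between counties.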
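Because the four correlation coefficients above come from only eleven counties, it may help to see how much uncertainty surrounds them. The sketch below computes an approximate 95% confidence interval for each coefficient using Fisher's z-transform. It is an illustration under stated assumptions, not part of the original analysis: it assumes roughly bivariate-normal data, takes n = 11 from the number of selected counties, and the function name fisher_ci is hypothetical.

```r
# Approximate confidence interval for a Pearson correlation via Fisher's z-transform.
# Assumption: roughly bivariate-normal data; n = 11 is the number of counties in this report.
fisher_ci <- function(r, n, level = 0.95) {
  z      <- atanh(r)                      # Fisher z-transform of r
  se     <- 1 / sqrt(n - 3)               # approximate standard error of z
  z_crit <- qnorm(1 - (1 - level) / 2)    # two-sided normal critical value
  tanh(c(lower = z - z_crit * se, upper = z + z_crit * se))
}

# Correlation coefficients reported by cor() above.
r_values <- c(
  smoking_vs_unhealthy   = 0.8006203,
  diabetes_vs_obesity    = 0.3865245,
  diabetes_vs_inactivity = 0.6986125,
  unhealthy_vs_drinking  = -0.6543259
)

# One row per correlation, with columns "lower" and "upper".
ci_table <- t(sapply(r_values, fisher_ci, n = 11))
print(round(ci_table, 2))
```

With so few counties the intervals are wide, and the interval for the diabetes and obesity correlation includes zero. Running cor.test() on the actual columns would report the same kind of interval along with a p-value.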

Graphical View of NY Counties: I am also interested in seeing the overall unhealthy condition for each county. In the graph below, Bronx County has the highest level of unhealthy conditions, followed by Queens County, and so on. Nassau County has the lowest level among the selected counties.

NYCountyHealthData1$NYCounty <-factor(NYCountyHealthData1$NYCounty, levels = NYCountyHealthData1$NYCounty[order(-NYCountyHealthData1$Overall_Unhealthy_Condition)])
 ggplot(data = NYCountyHealthData1, aes(x=NYCounty, y=Overall_Unhealthy_Condition, fill=NYCounty))+
   geom_bar(stat = "identity")+
   xlab("NYCounty")+
   ylab("Unhealthy Condition")+
   theme(axis.text.x = element_blank())+
   theme(legend.position = "right")
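An alternative way to order the bars, sketched below, is to call reorder() inside aes() so that the original NYCounty column is left unchanged. This sketch also rotates the axis labels instead of hiding them, which is a design choice and not the method used above. It is illustrative only: sample_rows is a hypothetical object holding the six rows printed by head(), with county names truncated as printed and Overall_Unhealthy_Condition derived from the two printed columns.

```r
# Six rows as printed by head(); county names are truncated in the printout.
# Overall_Unhealthy_Condition is (Physically_Unhealthy + Mentally_Unhealthy) / 2 for those rows.
sample_rows <- data.frame(
  NYCounty = c("Albany …", "Bronx C…", "Columbi…", "Nassau …", "New Yor…", "Niagara…"),
  Overall_Unhealthy_Condition = c(3.35, 4.65, 3.35, 2.95, 3.5, 3.45),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating it puts the county with the largest value first.
ordered_county <- reorder(sample_rows$NYCounty, -sample_rows$Overall_Unhealthy_Condition)
print(levels(ordered_county))

# Build the chart only if ggplot2 is installed; call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sample_rows,
              aes(x = reorder(NYCounty, -Overall_Unhealthy_Condition),
                  y = Overall_Unhealthy_Condition)) +
    geom_col() +
    labs(x = "NYCounty", y = "Unhealthy Condition") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
```

Here geom_col() is shorthand for geom_bar(stat = "identity"), and labelling the bars directly removes the need to match colours against a legend.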

Conclusion:

The analysis above shows that different lifestyle factors are associated with NY residents’ health conditions. Trends were seen among smoking, physical and mental health, diabetes, alcohol, obesity, and physical inactivity, although with only eleven counties these are associations and not proof of cause. Moreover, residents of the selected counties feel unhealthy overall on an average of 3.51 days per month. These results suggest that our lifestyles affect our health. The results also showed that the highest level of unhealthy conditions is found in Bronx County and the lowest in Nassau County.