---
title:  "blog post 4"
author: "Mahmuda Sultana"
date:   "5/4/2022"
output: 
   html_document:
     df_print: paged
     fig_height: 7
     fig_width: 7
     toc: yes
     toc_float: yes
     code_download: true
---


**I decided to work with two CDC releases of Local Data for Better Health: the 500 Cities 2019 release and the PLACES county-level 2021 release. The 500 Cities dataset contains model-based small area estimates for 27 measures of chronic disease: unhealthy behaviors (5), health outcomes (13), and use of preventive services (9). Data were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. The 2021 release covers 2018 and 2019, and the 2019 release covers 2016 and 2017.**


### Load Library

```{r}
library(readr)
library(tidyr) #To clean data
library(readxl) #To import data
library(tidyverse) #To manipulate data
library(maps) #To get the U.S. map
library(mapdata) #To be able to map latitude and longitude on the map graph
library(ggmap) #Another package to create the map
library(dplyr) #To manipulate data
library(ggplot2) #To create graphs
library(DT) #To create tables
library(ggcorrplot) #To create the correlation plot
library(outliers) #To find outliers in the data
```


### Data Cleaning

**First, I imported the dataset. Then I kept the city-level information and excluded census tracts to reduce the size of the dataset. I focused on age-adjusted prevalence, so I filtered on `DataValueTypeID` and then kept the mental health measure for adults whose mental health was not good for 14 or more days, that is, Measure = "Mental health not good for >=14 days among adults aged >=18 Years" (the 2021 file writes "years" in lowercase).**

```{r}
Data_2021 <- read.csv("C:/Users/shahi/Desktop/blogpost/PLACES__Local_Data_for_Better_Health__County_Data_2021_release.csv")
```


```{r}
Filter_Data_2021 <- Data_2021
Filter_Data_2021 <- Filter_Data_2021 %>% filter(DataValueTypeID=="AgeAdjPrv")
Data_Mental_Health_2021 <- Filter_Data_2021 %>% filter(Measure=="Mental health not good for >=14 days among adults aged >=18 years")

head(Data_Mental_Health_2021)
```
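
The exact-match filter above is case sensitive, and the two releases capitalize the measure name differently ("years" in the 2021 file, "Years" in the 2019 file). A minimal sketch of a case-insensitive filter follows. The toy data frame `toy` and the helper `filter_measure` are hypothetical and only illustrate the idea.

```r
# Toy stand-in for the CDC file; all values are made up for illustration
toy <- data.frame(
  StateAbbr       = c("AA", "AA", "BB", "BB"),
  DataValueTypeID = c("AgeAdjPrv", "CrdPrv", "AgeAdjPrv", "AgeAdjPrv"),
  Measure         = c("Mental health not good for >=14 days among adults aged >=18 Years",
                      "Mental health not good for >=14 days among adults aged >=18 Years",
                      "Mental health not good for >=14 days among adults aged >=18 years",
                      "Another measure"),
  Data_Value      = c(12.5, 13.1, 14.2, 9.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep age-adjusted rows for one measure, ignoring case
filter_measure <- function(df, measure) {
  keep <- df$DataValueTypeID == "AgeAdjPrv" &
    tolower(df$Measure) == tolower(measure)
  df[keep, , drop = FALSE]
}

target <- "Mental health not good for >=14 days among adults aged >=18 years"
mental <- filter_measure(toy, target)
print(mental)
```

With a helper like this, the same call could be reused for both releases without editing the measure string.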


**Here I select the columns I need and aggregate them by state: I average `Data_Value` (the age-adjusted prevalence) and sum `TotalPopulation` over all locations in each state. After aggregating and rounding, the final dataset has one row per state with the state name, data source, category, measure, total population, mean prevalence, and geolocation.**


```{r}

# Keep only the columns needed for the state-level summary
Filter_Data_2021_2 <- Data_Mental_Health_2021 %>% select(StateAbbr,StateDesc,DataSource,Category,
Measure,Data_Value_Type,Data_Value,TotalPopulation,Geolocation,CategoryID) 
Filter_Data_2021_2 <- na.omit(Filter_Data_2021_2)

# Total population per state
sv_21 <- aggregate(TotalPopulation~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Filter_Data_2021_2, FUN=sum)

# Mean age-adjusted prevalence per state
sv2_21 <- aggregate(Data_Value~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Filter_Data_2021_2, FUN=mean)

# One geolocation per state (the first row found for each state), ordered by state name
sv5_21 <- Filter_Data_2021_2 %>% distinct(StateDesc, .keep_all = TRUE)
sv5_21 <- sv5_21 %>% select(StateAbbr,StateDesc,DataSource,Category,Measure,Data_Value_Type,Geolocation) 
sv5_21 <- na.omit(sv5_21)
sv5_21 <- sv5_21[order(sv5_21$StateDesc),]

# Bind the three tables side by side, then drop the repeated key columns
combined_data_21 <- cbind(sv_21,sv2_21,sv5_21)
combined_data2_21 <- combined_data_21[ -c(8:13,15:20,22:27,29:34) ]
# Round numeric columns to one decimal place
combined_data2_21 <- combined_data2_21 %>% mutate_if(is.numeric, ~round(., 1))
head(combined_data2_21)

```
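
Because `cbind()` pairs rows purely by position, the three tables must list the states in the same order for the result to be correct. A more defensive sketch joins the aggregates on the state key with `merge()`. The toy data and the population-weighted mean column (`Weighted_Value`) are assumptions added for illustration and are not part of the analysis above.

```r
# Toy location-level data; all numbers are made up
toy <- data.frame(
  StateAbbr       = c("AA", "AA", "BB", "BB", "BB"),
  Data_Value      = c(10, 20, 12, 14, 16),
  TotalPopulation = c(100, 300, 50, 50, 100)
)

# Sum of population and unweighted mean prevalence per state
pop_sum  <- aggregate(TotalPopulation ~ StateAbbr, data = toy, FUN = sum)
val_mean <- aggregate(Data_Value ~ StateAbbr, data = toy, FUN = mean)

# Population-weighted mean per state (assumption: weights = TotalPopulation)
w_mean <- do.call(rbind, lapply(split(toy, toy$StateAbbr), function(d) {
  data.frame(StateAbbr = d$StateAbbr[1],
             Weighted_Value = weighted.mean(d$Data_Value, d$TotalPopulation))
}))

# Join on the key instead of relying on row order
by_state <- Reduce(function(a, b) merge(a, b, by = "StateAbbr"),
                   list(pop_sum, val_mean, w_mean))
print(by_state)
```

The weighted column shows how much a state average can move when large and small locations are not treated equally.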



**I have already dropped the unwanted variables from the dataset and kept only the variables I need for the analysis.**


**I downloaded the US state boundaries shapefile from the United States Census Bureau to plot the choropleth.**


### Drawing Plot

```{r}
library(rgdal)
setwd("C:/Users/shahi/Desktop/blogpost")
State_boundaries_21 <- readOGR('C:/Users/shahi/Desktop/blogpost/cb_2017_us_state_500k.shp')

```
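
The `rgdal` package has since been retired from CRAN, so `readOGR()` may not be installable on a fresh setup. A hedged sketch of reading a shapefile with `sf::st_read()` instead is shown below. It uses the North Carolina demo shapefile that ships with `sf` rather than the Census file, and the helper name `read_boundaries` is hypothetical. An `sf` object is a data frame with a geometry column, so code that relies on `@data` would need adapting.

```r
# Hypothetical helper: read polygon boundaries with sf instead of rgdal
read_boundaries <- function(path) {
  sf::st_read(path, quiet = TRUE)
}

# Demo on the shapefile bundled with sf (only runs if sf is installed)
if (requireNamespace("sf", quietly = TRUE)) {
  demo_path <- system.file("shape/nc.shp", package = "sf")
  nc <- read_boundaries(demo_path)
  print(nrow(nc))         # number of polygons read
  print(head(names(nc)))  # attribute columns sit directly on the object
}
```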




**Now I organize the final dataset for the choropleth. I join the aggregated state data to the attribute table (`@data`) of the `SpatialPolygonsDataFrame` read from the shapefile, so that each state has its values and its name available for the map.**


```{r}

Joint_Data_21 <- full_join(State_boundaries_21@data, combined_data2_21,  by = c("STUSPS" = "StateAbbr"))
tmp.l2 <- na.omit(Joint_Data_21)
test_21 <- tmp.l2 %>% distinct(STATENS, .keep_all = TRUE)
head(test_21)

```
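
Note that `na.omit()` can leave the joined table with fewer rows than there are polygons, so its columns no longer line up with the shapes one to one. A minimal sketch of realigning the values with `match()` follows. The toy tables `shapes` and `values` and the codes in them are hypothetical.

```r
# Toy attribute table of the polygons, in drawing order (codes are made up)
shapes <- data.frame(STUSPS = c("BB", "ZZ", "AA", "CC"),
                     stringsAsFactors = FALSE)

# Toy state-level results; "ZZ" has no data, and the order differs
values <- data.frame(StateAbbr  = c("AA", "BB", "CC"),
                     Data_Value = c(11.1, 22.2, 33.3),
                     stringsAsFactors = FALSE)

# Row of `values` that belongs to each polygon (NA when there is none)
idx <- match(shapes$STUSPS, values$StateAbbr)

# One value per polygon, in polygon order
aligned <- values$Data_Value[idx]
print(data.frame(shapes, Data_Value = aligned))
```

Passing a vector aligned this way to the map keeps each color and popup attached to the right polygon.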


**Here I draw the leaflet map. Each state is labeled with its name, and a popup shows the measure, data source, category, population count, data value type, and age-adjusted prevalence. Clicking a state displays this information.**


```{r}

library(leaflet)
library(RColorBrewer)
library(classInt)

# Reorder the state rows to follow the polygon order; polygons with no
# matching state row get NA and are drawn in the palette's NA color
map_21 <- test_21[match(State_boundaries_21$STUSPS, test_21$STUSPS), ]

# Bin edges for the population legend, listed in increasing order
bins <- c(300000, 500000, 900000, 1000000, 2000000, 3000000, 4000000, 50000000, 60000000, 70000000, Inf)
#breaks_qt <- classIntervals(test_21$TotalPopulation, n = 5, style = "quantile")
pal <- colorBin("YlOrRd", domain = map_21$TotalPopulation, bins = bins)

labels <- sprintf("<strong>%s</strong>",State_boundaries_21$NAME) %>% lapply(htmltools::HTML)

leaflet(State_boundaries_21) %>% 
  setView(-96, 37.8, 4) %>% 
  addProviderTiles("OpenStreetMap") %>%
  addPolygons(data = State_boundaries_21,
    fillColor = ~pal(map_21$TotalPopulation),
    weight = 2,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 5,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto"),
    popup = ~paste("<strong>State: </strong>", NAME,
                   "<br><strong>Measure: </strong>", map_21$Measure,
                   "<br><strong>Data Source: </strong>", map_21$DataSource,
                   "<br><strong>Category: </strong>", map_21$Category,
                   "<br><strong>Population (Measure): </strong>", map_21$TotalPopulation,
                   "<br><strong>Data Value Type : </strong>", map_21$Data_Value_Type,
                   "<br><strong>Age-Adjusted Prevalence % : </strong>", map_21$Data_Value)) %>%
  addLegend(pal = pal, values = ~map_21$TotalPopulation, opacity = 0.7, title = "Population Count",
    position = "bottomright")

```
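
The hand-picked bin edges could also be derived from the data, as the commented-out `classIntervals()` line suggests. The sketch below uses base R's `quantile()` and `cut()` on made-up population values to show the idea. It is an illustration only, not the binning used for the map above.

```r
# Made-up population totals for ten hypothetical states
pop <- c(600000, 900000, 1500000, 3000000, 4500000,
         6000000, 8000000, 12000000, 20000000, 39000000)

# Five quantile classes: six increasing break points from min to max
breaks <- quantile(pop, probs = seq(0, 1, by = 0.2), names = FALSE)

# Assign each state to a class; include.lowest keeps the minimum in class 1
classes <- cut(pop, breaks = breaks, labels = FALSE, include.lowest = TRUE)

print(breaks)
print(table(classes))
```

Quantile breaks put roughly the same number of states in each color class, which tends to use the palette more evenly than fixed edges.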


**Among the states, Florida and West Virginia have the highest prevalence (28.8%) in 2018-2019, followed by Ohio (27.8%), North Dakota (27%), Nevada (25.4%), Wyoming (25.3%), California (25.1%), Wisconsin (24.8%), Mississippi (24.7%), and Oregon (24.6%). Compared with 2016-2017, North Carolina and Utah (16.8%), which then had the highest prevalence in the USA, are replaced at the top by Florida and West Virginia (28.8%). The overall age-adjusted prevalence rose sharply between the two releases, and the ranking of the states changed as well. North Dakota, Wyoming, California, Nevada, Wisconsin, Ohio, and Tennessee newly appear among the states with a high prevalence of poor mental health. The reasons behind this increase deserve further investigation.**
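
A ranking like the one above can be read off the aggregated table by sorting on the prevalence column. The following is a minimal sketch, assuming a data frame shaped like `combined_data2_21` with `StateDesc` and `Data_Value` columns. The rows here are made up.

```r
# Made-up state-level table standing in for the aggregated data
toy <- data.frame(
  StateDesc  = c("State A", "State B", "State C", "State D"),
  Data_Value = c(14.2, 16.8, 15.0, 16.8),
  stringsAsFactors = FALSE
)

# Sort by prevalence, highest first; ties keep their original order
ranked <- toy[order(-toy$Data_Value), ]

# Top three states by age-adjusted prevalence
top3 <- head(ranked, 3)
print(top3)
```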






### Analysis of 2019 Data


```{r}
Data <- read.csv("C:/Users/shahi/Desktop/blogpost/500_Cities__Local_Data_for_Better_Health__2019_release.csv")
```


```{r}
Filter_Data <- Data
Filter_Data <- Filter_Data %>% filter(DataValueTypeID=="AgeAdjPrv")
Data_Mental_Health <- Filter_Data %>% filter(Measure=="Mental health not good for >=14 days among adults aged >=18 Years")

head(Data_Mental_Health)

```




```{r}

Filter_Data_2 <- Data_Mental_Health %>% select(StateAbbr,StateDesc,DataSource,Category,
Measure,Data_Value_Type,Data_Value,PopulationCount,GeoLocation,CategoryID) 
Filter_Data_2 <- na.omit(Filter_Data_2)

sv <- aggregate(PopulationCount~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Filter_Data_2, FUN=sum)

sv2 <- aggregate(Data_Value~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Filter_Data_2, FUN=mean)


sv5 <- Filter_Data_2 %>% distinct(StateDesc, .keep_all = TRUE)

sv5 <- sv5 %>% select(StateAbbr,StateDesc,DataSource,Category,Measure,Data_Value_Type,GeoLocation) 

sv5 <- na.omit(sv5)

sv5 <- sv5[order(sv5$StateDesc),]

combined_data <- cbind(sv,sv2,sv5)
combined_data2 <- combined_data[ -c(8:13,15:20,22:27,29:34) ]
combined_data2 <- combined_data2 %>% mutate_if(is.numeric, ~round(., 1))
head(combined_data2)

```




```{r}
library(rgdal)
setwd("C:/Users/shahi/Desktop/blogpost")
State_boundaries <- readOGR('C:/Users/shahi/Desktop/blogpost/cb_2017_us_state_500k.shp')

```





```{r}

Joint_Data <- full_join(State_boundaries@data, combined_data2,  by = c("STUSPS" = "StateAbbr"))
tmp.l2 <- na.omit(Joint_Data)
test <- tmp.l2 %>% distinct(STATENS, .keep_all = TRUE)
head(test)

```




```{r}

library(leaflet)
library(RColorBrewer)
library(classInt)

# Reorder the state rows to follow the polygon order; polygons with no
# matching state row get NA and are drawn in the palette's NA color
map_19 <- test[match(State_boundaries$STUSPS, test$STUSPS), ]

# Bin edges for the population legend
bin19 <- c(10000, 100000, 200000, 400000, 900000, 1000000, 3000000, 4000000, 50000000, 60000000, Inf)
#breaks_qt <- classIntervals(test$PopulationCount, n = 5, style = "quantile")
pal <- colorBin("YlOrRd", domain = map_19$PopulationCount, bins = bin19)

labels <- sprintf("<strong>%s</strong>",State_boundaries$NAME) %>% lapply(htmltools::HTML)

leaflet(State_boundaries) %>% 
  setView(-96, 37.8, 4) %>% 
  addProviderTiles("OpenStreetMap") %>%
  addPolygons(data = State_boundaries,
    fillColor = ~pal(map_19$PopulationCount),
    weight = 2,
    opacity = 1,
    color = "white",
    dashArray = "3",
    fillOpacity = 0.7,
    highlightOptions = highlightOptions(
      weight = 5,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.7,
      bringToFront = TRUE),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      textsize = "15px",
      direction = "auto"),
    popup = ~paste("<strong>State: </strong>", NAME,
                   "<br><strong>Measure: </strong>", map_19$Measure,
                   "<br><strong>Data Source: </strong>", map_19$DataSource,
                   "<br><strong>Category: </strong>", map_19$Category,
                   "<br><strong>Population (Measure): </strong>", map_19$PopulationCount,
                   "<br><strong>Data Value Type : </strong>", map_19$Data_Value_Type,
                   "<br><strong>Age-Adjusted Prevalence % : </strong>", map_19$Data_Value)) %>%
  addLegend(pal = pal, values = ~map_19$PopulationCount, opacity = 0.7, title = "Population Count",
    position = "bottomright")

```



**I found that North Carolina and Utah (16.8%), Pennsylvania and West Virginia (16.5%), Washington (16.2%), New Jersey (15.9%), Alabama and Maryland (15.7%), Arizona (15.5%), Mississippi (15.3%), and Texas (15%) had a higher age-adjusted prevalence of poor mental health than the other US states in the 2016-2017 period.**










### Report


**In this project, I wanted to know how the prevalence of poor mental health varies across the states of the USA. I obtained results for all the states and used the 2021 and 2019 releases to compare them over time. The prevalence changes over time and differs between states. For example, for Texas the 2019 release gives a population count of 1836343, of whom 15% reported poor mental health, while the 2021 release gives a total population count of 19453561, of whom 14.7% did. So, although the rate rose in many states, the change varies from state to state, and in Texas it fell slightly.**
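
The state-by-state comparison described above can be sketched by merging the two state-level tables on `StateAbbr` and taking the difference in prevalence. The toy tables and the column names `Value_2019`, `Value_2021`, `Change`, and `Direction` are hypothetical.

```r
# Made-up state-level prevalence (%) for two releases
rel_2019 <- data.frame(StateAbbr  = c("AA", "BB", "CC"),
                       Value_2019 = c(15.0, 12.0, 16.5))
rel_2021 <- data.frame(StateAbbr  = c("BB", "AA", "CC"),
                       Value_2021 = c(13.5, 14.7, 16.5))

# Join on the state code so row order does not matter
both <- merge(rel_2019, rel_2021, by = "StateAbbr")

# Percentage-point change between the releases
both$Change <- round(both$Value_2021 - both$Value_2019, 1)
both$Direction <- ifelse(both$Change > 0, "up",
                         ifelse(both$Change < 0, "down", "flat"))
print(both)
```

A table like this makes it easy to see which states moved up, which moved down, and by how many percentage points.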


**Limitation: I wanted to compare geographic hotspots of poor mental health before and during the pandemic, but recent data were not yet available, so I could not do that analysis. I also could not control for socioeconomic status (SES) variables.**


**Future Direction: Once data for 2020-2022 are published, I will compare mental health in the pre-pandemic and pandemic periods across states and, if possible, cities. I also want to know how the pandemic's effect varied geographically across the United States.**






