1 Introduction

For this assignment, I started first by looking a dataset to use and I decided to go with the better Health Dataset in the healthdata.gov website. The purpose of the 500 Cities Project is to provide city- and census tract-level small area estimates for chronic disease risk factors, health outcomes, and clinical preventive service use for the largest 500 cities in the United States. As I will not need the all dataset, I will do some manipulation to it and only include some variable I will need for this assignment.

Data_Health_US <- read.csv("//Users/mikelapika/Desktop/Better_Health__Dataset.csv", header=TRUE)
head(Data_Health_US)

2 Data Manipulation

After I import the dataset and filter it to only include the information for city on the geographical level and I will exclude all census track to lower the size of the datset. I will only focus on Age_Ajusted Prevalence, so I will filter the data by DataValueTypeID Filter the dataset to keep only data related to current lack of health insurance among adult, I will filter it by Short_Question_Text=“Health Insurance”.

library(tidyr)
library(tidyverse)
Data_Health_US <- Data_Health_US %>% filter(DataValueTypeID=="AgeAdjPrv")
Data_Health_Insurance2 <- Data_Health_US %>% filter(Short_Question_Text=="Health Insurance")
head(Data_Health_Insurance2)

Here we will filter the data to create new column for different Data_ValueTypeID_Value so we can calculate the average valuetypid and also for the total of the population whcih will sum the number of population for all cities. I will also do some data aggregation and mutate so I can have a final dataset with only state with average low and high confidence limit, populationcount, state name, datasource, category, and geolocation.

library(dplyr)
library(data.table)
library(dplyr)

Data_test <- Data_Health_Insurance2 %>% select(StateAbbr,StateDesc,DataSource,Category,
Measure,Data_Value_Type,Data_Value,Low_Confidence_Limit,High_Confidence_Limit,PopulationCount,GeoLocation,CategoryID) 
Data_test <- na.omit(Data_test)
sv <- aggregate(PopulationCount~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Data_test, FUN=sum) 
sv2 <- aggregate(Data_Value~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Data_test, FUN=mean) 
sv3 <- aggregate(Low_Confidence_Limit~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Data_test, FUN=mean) 
sv4 <- aggregate(High_Confidence_Limit~StateAbbr+StateDesc+DataSource+Category+Measure+Data_Value_Type, data=Data_test, FUN=mean) 
sv5 <- Data_test %>% distinct(StateDesc, .keep_all = TRUE)
sv5 <- sv5 %>% select(StateAbbr,StateDesc,DataSource,Category,Measure,Data_Value_Type,GeoLocation) 
sv5 <- na.omit(sv5)
sv5 <- sv5[order(sv5$StateDesc),]
mydata <- cbind(sv,sv2,sv3,sv4,sv5)
mydata2 <- mydata[ -c(8:13,15:20,22:27,29:34) ]
mydata2 <- mydata2 %>% mutate_if(is.numeric, ~round(., 1))
head(mydata2)

As I am moving forward, you can see that I drop a lot of variable in this dataset as I wanted to reduce the size of the dataset and select only variable needed for this analysis and I will drop all unwanted column.

3 Choropleth

I download and installed the shapefile from the United Census Bureau so I can be able to read the US state boundaries shapefile I will use to plot the choropleth. I started first by setting the work directory to a specific place to help me read exactly the file I wanted and avoid reading another shapefiles as I played around with multiple others.

library(rgdal)
setwd("/Users/mikelapika/Desktop/")
State_boundaries <- readOGR('/Users/mikelapika/Desktop/CB_2017_us_state_500k/cb_2017_us_state_500k.shp')
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/mikelapika/Desktop/CB_2017_us_state_500k/cb_2017_us_state_500k.shp", layer: "cb_2017_us_state_500k"
## with 56 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER

3.1 Merging Shapefile and Dataset

Here, I will get the final dataset I will use in my future choropleth. After I did all the data manipulation, I will merge the dataset with polygons data to have a polygons where all values are included and also the name of state. I will have here a LargeSpatialPolygon, where I will merge the dataset here and I will use it to map.

Joint_Data <- full_join(State_boundaries@data, mydata2,  by = c("STUSPS" = "StateAbbr"))
tmp.l2 <- na.omit(Joint_Data)
test <- tmp.l2 %>% distinct(STATENS, .keep_all = TRUE)
head(test)

3.2 Map created with Merge Dataset

I will display differents useful plots with the merged dataset and I will add a popup to display information needed respectively for a specific leaflet such as name of the state, population, source of data, measure uses for this dataset, the category of this measure, low and high confidence limit and age-adjusted prevalence.

3.2.1 Leaflet by Population

Here I will draw leaflet with polygons and I will add a popup which will display some usefull informations about the dataset. When you click on a specific state, it shows the population count used for this measure, the measure uses for this data, it also tell you about the source of the data, the category of this measure, the data value type with a specific value related to it, and it will also give you importatant information about Age Adjusted prevalence.

library(leaflet)
library(RColorBrewer)
library(classInt)
breaks_qt <- classIntervals(test$PopulationCount, n = 5, style = "quantile")
pal <- colorQuantile("PiYG", test$PopulationCount, n = 5)
labels <- sprintf("<strong>%s</strong>",State_boundaries$NAME) %>% lapply(htmltools::HTML)
 leaflet() %>% 
  setView(-96, 37.8, 4) %>% addProviderTiles("OpenStreetMap") %>% 
  addPolygons(data = State_boundaries,fillColor = ~pal(test$PopulationCount),weight = 2,opacity = 1,color = "white",dashArray = "3",fillOpacity = 0.7,
  highlight = highlightOptions( weight = 5,color = "#666",dashArray = "",fillOpacity = 0.8,bringToFront = TRUE),
  label = labels,labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"),textsize = "15px",direction = "auto"),  popup = ~paste("<strong>State: </strong>",NAME, paste("<br><strong>Measure: </strong>", test$Measure), paste("<br><strong>Data Source: </strong>", test$DataSource),paste("<br><strong>Category: </strong>", test$Category),paste("<br><strong>Population (Measure): </strong>", test$PopulationCount), paste("<br><strong>Data Value Type : </strong>", test$Data_Value_Type),paste("<br><strong>Age-Adjusted Prevalence % : </strong>", test$Data_Value) ))%>%
  addLegend(colors = brewer.pal(5,"PiYG"), labels = paste0("Up to ", as.character(round(breaks_qt$brks[-1]))), title = "Population Count",position = "bottomright") 

3.3 Leaflet by Different Other Attributes

On this plots, I pretty much did the same things, the only difference here is just the result in the popup and the legend, I made 3 different plots and they are respectively for high confidence limit, low confidence limit and Age-Adjusted Prevalence. The popup gives you information regarding the attributes and the legend matches what I would like to display in each plot.

3.3.1 High Confidence Limit

library(RColorBrewer)
library(classInt)
mybins <- c(0, 10, 20, 50, 100, 200, 500, Inf)
breaks_qt2 <- classIntervals(test$High_Confidence_Limit, n = 4, style = "quantile")
pal2 <- colorQuantile("Reds", test$High_Confidence_Limit, n = 4)
labels <- sprintf("<strong>%s</strong>",State_boundaries$NAME) %>% lapply(htmltools::HTML)
 leaflet() %>% 
  setView(-96, 37.8, 4) %>% addProviderTiles("OpenStreetMap") %>% 
  addPolygons(data = State_boundaries,fillColor = ~pal2(test$High_Confidence_Limit),weight = 2,opacity = 1,color = "white",dashArray = "3",fillOpacity = 0.7,
  highlight = highlightOptions( weight = 5,color = "#666",dashArray = "",fillOpacity = 0.7,bringToFront = TRUE),
  label = labels,labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"),textsize = "15px",direction = "auto"),  popup = ~paste("<strong>State: </strong>",NAME,paste("<br><strong>Measure: </strong>", test$Measure), paste("<br><strong>Data Source: </strong>", test$DataSource),paste("<br><strong>Category: </strong>", test$Category),paste("<br><strong>High Confidence Limit % : </strong>", test$High_Confidence_Limit)))%>% 
   addLegend(colors = brewer.pal(4,"Reds"), labels = paste0("Up to ", as.character(round(breaks_qt2$brks[-1]))), title = "High Confidence Limit %",position = "bottomright") 

3.3.2 Low Confidence Limit

library(RColorBrewer)
library(classInt)
mybins <- c(0, 10, 20, 50, 100, 200, 500, Inf)
breaks_qt3 <- classIntervals(test$Low_Confidence_Limit, n = 4, style = "quantile")
pal3 <- colorQuantile("Dark2", test$Low_Confidence_Limit, n = 4)
labels <- sprintf( "<strong>%s</strong>",State_boundaries$NAME) %>% lapply(htmltools::HTML)
leaflet() %>% 
  setView(-96, 37.8, 4) %>% addProviderTiles("OpenStreetMap") %>% 
  addPolygons(data = State_boundaries,fillColor = ~pal3(test$Low_Confidence_Limit),weight = 2,opacity = 1,color = "white",dashArray = "3",fillOpacity = 0.7,highlight = highlightOptions( weight = 5,color = "#666",dashArray = "",fillOpacity = 0.7,bringToFront = TRUE),
  label = labels,labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"),textsize = "15px",direction = "auto"),  popup = ~paste("<strong>State: </strong>",NAME,paste("<br><strong>Measure: </strong>", test$Measure),paste("<br><strong>Data Source: </strong>", test$DataSource),paste("<br><strong>Category: </strong>", test$Category),paste("<br><strong>Low Confidence Limit % : </strong>", test$Low_Confidence_Limit)))%>%
addLegend(colors = brewer.pal(4,"Dark2"), labels = paste0("Up to ", as.character(round(breaks_qt3$brks[-1]))),title = "Low Confidence Limit %",position = "bottomright") 

3.3.3 Age-Adjusted Prevalence

library(RColorBrewer)
library(classInt)
mybins <- c(0, 10, 20, 50, 100, 200, 500, Inf)
breaks_qt4 <- classIntervals(test$Data_Value, n = 4, style = "quantile")
pal4 <- colorQuantile("Set1", test$Data_Value, n = 4)
labels <- sprintf("<strong>%s</strong>",State_boundaries$NAME) %>% lapply(htmltools::HTML)
leaflet() %>% 
 setView(-96, 37.8, 4) %>% addProviderTiles("OpenStreetMap") %>% addPolygons(data = State_boundaries,fillColor = ~pal4(test$Data_Value),weight = 2,opacity = 1,color = "white",dashArray = "3",fillOpacity = 0.7,highlight = highlightOptions( weight = 5,color = "#666",dashArray = "",fillOpacity = 0.7,bringToFront = TRUE),
 label = labels,labelOptions = labelOptions( style = list("font-weight" = "normal", padding = "3px 8px"),textsize = "15px",direction = "auto"),  popup = ~paste("<strong>State: </strong>",NAME,paste("<br><strong>Measure: </strong>", test$Measure),paste("<br><strong>Data Source: </strong>", test$DataSource),paste("<br><strong>Category: </strong>", test$Category),paste("<br><strong>Data Value Type : </strong>", test$Data_Value_Type),paste("<br><strong>Age-Adjusted Prevalence % : </strong>", test$Data_Value) ))%>%
addLegend(colors = brewer.pal(4,"Set1"), labels = paste0("Up to ", as.character(round(breaks_qt4$brks[-1]))),title = "Age-Adjusted Prevalence %",position = "bottomright")