library(leaflet)
library(sf)
library(tidyverse)
library(tidyr)
setwd("C:\\Users\\Shea\\Documents\\data110\\csvs")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500|>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
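To see what these two steps do, here is a minimal sketch on a made-up two-row tibble. The coordinates are illustrative and not taken from the CDC file, and `toy` and `toy_split` are hypothetical names. As a small variation, the sketch uses `sep = ",\\s*"` so the space after the comma is consumed along with the comma itself.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the GeoLocation column; values are made up for illustration
toy <- tibble(GeoLocation = c("(39.29, -76.61)", "(40.71, -74.01)"))

toy_split <- toy |>
  # Drop the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma (and any spaces after it); convert = TRUE guesses the types
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

# Inspect the resulting column names and types
str(toy_split)
```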
Remove the rows where StateDesc is the United States, keep only the Prevention category (the category of interest), keep only crude prevalence measurements, and keep only the year 2017.
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Category == "Prevention") |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017)
#names(latlong_clean)
prevention <- latlong_clean |>
select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
The new dataset, “prevention”, is now a manageable size.
Find Groupings
#unique(prevention$Measure)
Filter chunk here
insu <- filter(prevention, MeasureId=="ACCESS2" &
GeographicLevel=="Census Tract")
Add Location Classification
NE=c("PA","NJ","NY","RI","CT","MA","VT","NH","ME")
MW=c("ND","SD","NE","KS","MN","IA","MO","WI","IL","MI","IN","OH")
S=c("TX","OK","AR","LA","MS","AL","TN","KY","WV","DC","MD","DE","VA","NC","SC","GA","FL")
W=c("WA","MT","OR","ID","WY","CA","NV","UT","CO","AZ","NM","AK", "HI")
ins <- mutate(insu, region = case_when(StateAbbr %in% NE ~ "North East", StateAbbr %in% MW ~ "Mid West", StateAbbr %in% S ~ "South", StateAbbr %in% W ~ "West"))
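One thing worth checking with `case_when()` is that any state abbreviation missing from all of the vectors ends up with `NA` as its region. The sketch below shows this on toy data, assuming shortened region vectors and a hypothetical abbreviation ("PR") that appears in none of them. The names `toy` and `toy_regions` are illustrative.

```r
library(dplyr)

# Shortened versions of the region vectors, just for this sketch
NE <- c("NY", "NJ")
S  <- c("TX", "FL")

# Toy data; "PR" is a hypothetical abbreviation that is in neither vector
toy <- tibble(StateAbbr = c("NY", "TX", "PR", "FL"))

toy_regions <- toy |>
  mutate(region = case_when(
    StateAbbr %in% NE ~ "North East",
    StateAbbr %in% S  ~ "South"
  ))

# Rows that match no condition get NA, so counting by region exposes any gaps
count(toy_regions, region)
```

Running the same `count()` on the real data would be a quick way to confirm that every state was assigned a region.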
First plot chunk here
ggplot(ins, aes(x=PopulationCount, y=Data_Value, color = region)) +
geom_point(size = 2, alpha = 0.15) +
scale_color_viridis_d()+
facet_wrap(~region) +
labs(x = "Census Tract Population", y = "Lacking Health Insurance (%)", title = "People Lacking Insurance Percentage by \nPopulation of Their Area and Region of the USA") +
guides(color = guide_legend(title="Region",override.aes = list(size = 2,alpha = 1))) +
theme_bw()
## Warning: Removed 762 rows containing missing values (`geom_point()`).
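The warning means that ggplot2 dropped rows where one of the plotted columns was missing. A minimal sketch of how one might count those rows, and optionally drop them before plotting, is below. It uses a toy tibble with made-up values, and `toy` and `toy_complete` are hypothetical names.

```r
library(dplyr)

# Toy data with made-up values; NA marks a missing measurement
toy <- tibble(
  PopulationCount = c(1200, 3400, NA, 2100),
  Data_Value      = c(12.5, NA, 8.1, 20.3)
)

# Count the missing values in each plotted column
toy |>
  summarise(
    missing_population = sum(is.na(PopulationCount)),
    missing_value      = sum(is.na(Data_Value))
  )

# Optionally keep only complete rows so the plot no longer warns
toy_complete <- filter(toy, !is.na(PopulationCount), !is.na(Data_Value))
nrow(toy_complete)
```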
First map chunk here
# Rename by column name rather than by position, so the code still works if the column order changes
ins <- rename(ins, latitude = lat, longitude = long)
leaflet() |>
setView(lng = -98.5, lat = 40, zoom = 3) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = ins,
lng = as.numeric(ins$longitude),
lat = as.numeric(ins$latitude),
radius = ins$Data_Value*20
)
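In leaflet, `addCircles()` takes its radius in meters, so circles of a few hundred meters are hard to see at a country-wide zoom. One possible alternative, sketched below on toy points with made-up coordinates and percentages, is `addCircleMarkers()`, which takes its radius in pixels. The square-root scaling and the names `toy` and `m` are assumptions for illustration, not part of the original analysis.

```r
library(leaflet)

# Toy points with made-up coordinates and percentages, for illustration only
toy <- data.frame(
  latitude   = c(39.29, 29.76, 34.05),
  longitude  = c(-76.61, -95.37, -118.24),
  Data_Value = c(12.5, 41.0, 23.8)
)

m <- leaflet(toy) |>
  setView(lng = -98.5, lat = 40, zoom = 3) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  # addCircleMarkers() sizes circles in pixels, so they stay visible at any zoom
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = ~sqrt(Data_Value),  # square root keeps large values from dominating
    stroke = FALSE,
    fillOpacity = 0.5
  )

# Print m in an interactive session to view the map
```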
Refined map chunk here
popupins <- paste0(
"<b>Region: </b>", ins$region, "<br>",
"<b>State: </b>", ins$StateDesc, "<br>",
"<b>City: </b>", ins$CityName, "<br>",
"<b>Population of Census Tract: </b>", ins$PopulationCount, "<br>",
"<b>Percentage of Lack of Health Insurance: </b>", ins$Data_Value, "<br>"
)
leaflet() |>
setView(lng = -98.5, lat = 40, zoom = 3) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = ins,
lng = as.numeric(ins$longitude),
lat = as.numeric(ins$latitude),
radius = ins$Data_Value*20,
popup = popupins
)
In a paragraph, describe the plots you created and what they show.
The first plot is a faceted scatterplot showing the percentage of people without health insurance in each census tract against the tract's population, with one panel per region of the US. I chose to look at census tracts instead of cities because I felt they gave a more complete picture on the maps. The second plot is a map with each data point overlaid as a circle, where the circle size shows the percentage of people without health insurance. In the refined version of the map, I added a popup that shows the region, state, city, population, and percentage of people who lack health insurance in that census tract. What I noticed from this data is that the South is the only region with any census tracts above 50% without insurance, and it has a lot of them. Also, the census tracts in the Midwest are a lot less populated. Unsurprisingly, the Northeast, which includes New York, and the West, which includes California, have the census tracts with the highest populations.
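The regional observations above could be checked numerically with a grouped summary. The sketch below assumes a toy tibble whose values are made up purely to show the shape of the calculation. They are not results from the CDC data, and `toy` and `region_summary` are hypothetical names.

```r
library(dplyr)

# Toy data with made-up values, only to illustrate the calculation
toy <- tibble(
  region          = c("South", "South", "West", "Mid West", "North East"),
  PopulationCount = c(4200, 3100, 9800, 1500, 11200),
  Data_Value      = c(55.2, 30.1, 22.4, 18.0, 15.3)
)

region_summary <- toy |>
  group_by(region) |>
  summarise(
    tracts_above_50 = sum(Data_Value > 50, na.rm = TRUE),      # tracts over 50% uninsured
    max_population  = max(PopulationCount, na.rm = TRUE),      # largest tract in the region
    .groups = "drop"
  )

region_summary
```

Applying the same summary to `ins` would show how many tracts in each region exceed 50% and which regions contain the most populous tracts.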