Load the libraries and set the working directory

library(leaflet)
library(sf)
library(tidyverse)
library(tidyr)
setwd("C:\\Users\\Shea\\Documents\\data110\\csvs")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
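
To see what the split does, here is a minimal sketch that runs the same two steps on a tiny made-up table. The `toy` data frame and its coordinate values are hypothetical and are not taken from the CDC file.

```r
library(tidyverse)

# Hypothetical two-row sample in the same "(lat, long)" text format
toy <- tibble(GeoLocation = c("(33.5275, -86.7989)", "(40.7128, -74.0060)"))

toy_split <- toy |>
  # Drop the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE asks separate() to convert the column types
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

# Inspect the resulting column names and types
str(toy_split)
```

Checking the column types with `str()` after the split is a quick way to confirm that the coordinates can be passed to a map later on.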

Filter the dataset

Remove the rows where StateDesc is "United States", select Prevention as the category of interest, keep only crude prevalence measurements, and keep only 2017.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Prevention") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)

What variables are included? (can any of them be removed?)

#names(latlong_clean)

Remove the variables that will not be used in the assignment

prevention <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)

The new dataset “Prevention” is a manageable dataset now.

For your assignment, work with the cleaned “Prevention” dataset

1. Once you run the above code, filter this dataset one more time for any particular subset.

Find Groupings

#unique(prevention$Measure)

Filter chunk here

insu <- filter(prevention, MeasureId=="ACCESS2" &
                 GeographicLevel=="Census Tract")

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

Add Location Classification

NE=c("PA","NJ","NY","RI","CT","MA","VT","NH","ME")
MW=c("ND","SD","NE","KS","MN","IA","MO","WI","IL","MI","IN","OH")
S=c("TX","OK","AR","LA","MS","AL","TN","KY","WV","DC","MD","DE","VA","NC","SC","GA","FL")
W=c("WA","MT","OR","ID","WY","CA","NV","UT","CO","AZ","NM","AK", "HI")
ins <- mutate(insu, region = case_when(StateAbbr %in% NE ~ "North East", StateAbbr %in% MW ~ "Mid West", StateAbbr %in% S ~ "South", StateAbbr %in% W ~ "West"))
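
Because the `case_when()` call has no fallback condition, any state abbreviation missing from all four vectors gets a region of `NA`. The sketch below shows one way to check for that. It uses a hypothetical `toy` table and deliberately shortened region vectors, so it illustrates the check only and says nothing about the real data.

```r
library(tidyverse)

# Hypothetical sample; "PR" is deliberately absent from every region vector
toy <- tibble(StateAbbr = c("NY", "TX", "CA", "OH", "PR"))

# Shortened region vectors, for illustration only
NE <- c("NY", "PA")
MW <- c("OH", "IL")
S  <- c("TX", "FL")
W  <- c("CA", "WA")

toy_regions <- toy |>
  mutate(region = case_when(
    StateAbbr %in% NE ~ "North East",
    StateAbbr %in% MW ~ "Mid West",
    StateAbbr %in% S  ~ "South",
    StateAbbr %in% W  ~ "West"
  ))

# Rows that matched no condition have region == NA; list which states they are
toy_regions |>
  filter(is.na(region)) |>
  count(StateAbbr)
```

Running the same `filter(is.na(region))` check on `ins` would show whether any rows were left unclassified.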

First plot chunk here

ggplot(ins, aes(x=PopulationCount, y=Data_Value, color = region)) +
 geom_point(size = 2, alpha = 0.15) +
 scale_color_viridis_d()+
 facet_wrap(~region) +
 labs(x = "Population of Census Tract", y = "Percent Lacking Health Insurance", title = "People Lacking Insurance Percentage by \nPopulation of Their Area and Region of the USA") +
 guides(color = guide_legend(title="Region",override.aes = list(size = 2,alpha = 1))) +
 theme_bw()
## Warning: Removed 762 rows containing missing values (`geom_point()`).
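
The warning means that some rows have a missing value in one of the plotted columns. A minimal sketch of how those rows could be counted, or dropped before plotting, is shown below. The `toy` table and its numbers are hypothetical.

```r
library(tidyverse)

# Hypothetical sample with one missing value in each plotted column
toy <- tibble(
  PopulationCount = c(1200, 3400, NA, 560),
  Data_Value      = c(12.5, NA, 8.1, 20.3)
)

# Count the rows that geom_point() would drop
n_missing <- toy |>
  filter(is.na(PopulationCount) | is.na(Data_Value)) |>
  nrow()
n_missing

# Keep only the rows where both plotted columns are present
toy_complete <- toy |>
  drop_na(PopulationCount, Data_Value)
```

Applying the same filter to `ins` would show which of the two columns accounts for the removed rows.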

3. Now create a map of your subsetted dataset.

First map chunk here

# Rename the coordinate columns by name rather than by position
ins <- ins |>
  rename(latitude = lat, longitude = long)

leaflet() |>
 setView(lng = -98.5, lat = 40, zoom = 3) |>
 addProviderTiles("Esri.WorldStreetMap") |>
 addCircles(
 data = ins,
 lng = as.numeric(ins$longitude),
 lat = as.numeric(ins$latitude),
 radius = ins$Data_Value*20
)

4. Refine your map to include a mouseover tooltip

Refined map chunk here

popupins <- paste0(
  "<b>Region: </b>", ins$region, "<br>",
  "<b>State: </b>", ins$StateDesc, "<br>",
  "<b>City: </b>", ins$CityName, "<br>",
  "<b>Population of Census Tract: </b>", ins$PopulationCount, "<br>",
  "<b>Percentage of Lack of Health Insurance: </b>", ins$Data_Value, "<br>"
)

leaflet() |>
 setView(lng = -98.5, lat = 40, zoom = 3) |>
 addProviderTiles("Esri.WorldStreetMap") |>
 addCircles(
 data = ins,
 lng = as.numeric(ins$longitude),
 lat = as.numeric(ins$latitude),
 radius = ins$Data_Value*20,
 popup = popupins
)
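
The `popup` argument opens its text when a circle is clicked. For text that appears on hover, `addCircles()` also accepts a `label` argument. The sketch below shows this on a hypothetical two-row `toy` data frame with made-up coordinates and percentages. It is one possible approach, not the method used above.

```r
library(leaflet)

# Hypothetical sample with the same column names used for the map above
toy <- data.frame(
  CityName   = c("Birmingham", "New York"),
  latitude   = c(33.52, 40.71),
  longitude  = c(-86.80, -74.01),
  Data_Value = c(18.2, 9.4)
)

# Plain-text label shown when the mouse hovers over a circle
hover_text <- paste0(toy$CityName, ": ", toy$Data_Value, "% uninsured")

hover_map <- leaflet(toy) |>
  setView(lng = -98.5, lat = 40, zoom = 3) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~longitude,
    lat = ~latitude,
    radius = ~Data_Value * 20,
    label = hover_text
  )

hover_map
```

A map can pass both `popup` and `label`, so the short hover text and the longer click-through details can be kept side by side.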

5. Write a paragraph

In a paragraph, describe the plots you created and what they show.

The first plot is a faceted scatter plot of the percentage of people without health insurance in each census tract against the tract's population, with one panel per region of the US. I chose to look at census tracts instead of cities because I felt they gave a more complete picture on the maps. The second plot is a map with one circle per data point, where the circle size shows the percentage of people without health insurance. In the second map, I added a popup that shows the region, state, city, population, and percentage of people who lack health insurance in that census tract. What I noticed from this data is that the South is the only region with any census tracts above 50% without insurance, and it has a lot of them. Also, the census tracts in the Midwest are a lot less populated. Unsurprisingly, the Northeast, which includes New York, and the West, which includes California, have the census tracts with the highest populations.