GIS Assignment
Load libraries and dataset
library(tidyverse)   # also attaches readr, dplyr, tidyr, stringr and ggplot2
library(leaflet)
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
Split GeoLocation into lat and long
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
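tidyr 1.3.0 superseded `separate()` with `separate_wider_delim()`, and the parsing step above can also be written with it. The sketch below uses two made-up sample rows. It assumes GeoLocation strings have the "(lat, long)" form implied by the chunk above. Unlike `separate(convert = TRUE)`, the new function always returns character columns, so the conversion to numbers is written out explicitly.

```r
library(dplyr)
library(stringr)
library(tidyr)  # separate_wider_delim() requires tidyr >= 1.3.0

# Made-up sample rows in the "(lat, long)" format assumed by the chunk above
geo_sample <- tibble(GeoLocation = c("(39.2904, -76.6122)", "(38.9072, -77.0369)"))

geo_parsed <- geo_sample |>
  # strip the parentheses, as in the chunk above
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  # split at the comma into two character columns
  separate_wider_delim(GeoLocation, delim = ",", names = c("lat", "long")) |>
  # trim the space left after the comma, then convert to numbers
  mutate(across(c(lat, long), \(x) as.numeric(str_trim(x))))

geo_parsed
```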
#head(latlong)
Questions
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
latlong_filtered <- latlong |>
  filter(Year == 2017) |>
  filter(MeasureId == "CSMOKING") |>
  filter(StateAbbr %in% c("MD", "VA", "DC"))
head(latlong_filtered)
# A tibble: 6 × 25
   Year StateAbbr StateDesc     CityName   GeographicLevel DataSource Category
  <dbl> <chr>     <chr>         <chr>      <chr>           <chr>      <chr>
1  2017 VA        Virginia      Hampton    City            BRFSS      Unhealthy…
2  2017 VA        Virginia      Roanoke    Census Tract    BRFSS      Unhealthy…
3  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
4  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
5  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
6  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
dim(latlong_filtered)
[1] 906  25
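The printed dimensions show 906 rows, which is six more than the 900-observation limit in Question 1. The first row of the head() output is a city-level summary (GeographicLevel "City") rather than a census tract. One possible adjustment is therefore to keep tract-level rows only. The output here does not show how many city-level rows the subset contains, so it is not certain that this alone brings the count to 900 or fewer. The sketch below checks the cap explicitly. `keep_tracts_under_cap()` is a hypothetical helper, and the toy rows are invented for illustration.

```r
library(dplyr)

# Hypothetical helper: keep census-tract rows only, then stop with an error
# if the result still exceeds the row cap from Question 1
keep_tracts_under_cap <- function(df, cap = 900) {
  out <- df |> filter(GeographicLevel == "Census Tract")
  if (nrow(out) > cap) {
    stop(sprintf("subset has %d rows; the cap is %d", nrow(out), cap))
  }
  out
}

# Invented toy rows standing in for latlong_filtered
toy_subset <- tibble(
  GeographicLevel = c("City", "Census Tract", "Census Tract"),
  Data_Value      = c(18.5, 21.0, 35.2)
)

keep_tracts_under_cap(toy_subset)   # 2 rows; the "City" row is dropped
# keep_tracts_under_cap(latlong_filtered)
```

Dropping the city-level rows would also remove those points from the later plot and maps.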
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
ggplot(latlong_filtered, aes(x = PopulationCount, y = Data_Value, color = StateAbbr)) +
  geom_point(alpha = 0.5) +
  labs(title = "DMV Smoking Prevalence by Small Area Population (2017)",
       x = "Population",
       y = "% Current Smokers",
       caption = "Source: CDC") +
  facet_wrap(~StateAbbr) +
  # xlim() drops every point outside 0-10,000 (for example the city-level rows,
  # whose populations are far larger), and ggplot2 warns about the removed rows
  xlim(0, 10000) +
  theme_bw() +
  scale_color_manual(values = c("VA" = "darkred",
                                "MD" = "darkcyan",
                                "DC" = "darkviolet")) +
  # the facet labels already name each state, so the color legend is removed
  # https://www.geeksforgeeks.org/r-language/remove-legend-in-ggplot2-in-r/
  theme(legend.position = "none")
3. Now create a map of your subsetted dataset.
First map chunk here
#create palette (levels fixes the order, so DC, MD and VA get the three colors in turn)
pal <- colorFactor(palette = c("darkviolet", "darkcyan", "darkred"),
                   domain = latlong_filtered$StateAbbr,
                   levels = c("DC", "MD", "VA"))
#create map
leaflet() |>
  setView(lng = -77.5, lat = 38, zoom = 6.5) |>
  addProviderTiles("Stadia.AlidadeSmoothDark") |>
  addCircles(
    data = latlong_filtered,
    lng = ~long, lat = ~lat,                # name the coordinate columns explicitly
    radius = ~sqrt(1.333^Data_Value) * 2,   # radius in meters, grows with % smokers
    color = ~pal(StateAbbr)
  )
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
#create tooltip
tooltip <- paste0(
  "<b>State: </b>", latlong_filtered$StateAbbr, "<br>",
  "<b>Current Smokers: </b>", latlong_filtered$Data_Value, "%<br>",
  "<b>Area Population: </b>", latlong_filtered$PopulationCount, "<br>"
)
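The tooltip above pastes raw values into the HTML. As a result, populations print without thousands separators and percentages print with however many decimals the data carry. The sketch below shows a formatted variant built on base R's `sprintf()` and `formatC()`. `make_popup()` is a hypothetical helper. Like `tooltip`, it returns one HTML string per row, so its result could be passed to `popup =` in the same way.

```r
# Hypothetical helper: one formatted HTML popup string per row
make_popup <- function(state, pct, pop) {
  sprintf(
    "<b>State: </b>%s<br><b>Current Smokers: </b>%.1f%%<br><b>Area Population: </b>%s",
    state,
    pct,                                        # one decimal place, then a literal %
    formatC(pop, format = "d", big.mark = ",")  # 12345 becomes "12,345"
  )
}

make_popup("DC", 20.44, 12345)
# tooltip <- make_popup(latlong_filtered$StateAbbr,
#                       latlong_filtered$Data_Value,
#                       latlong_filtered$PopulationCount)
```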
#apply to map
leaflet() |>
  setView(lng = -77.5, lat = 38, zoom = 6.5) |>
  addProviderTiles("Stadia.AlidadeSmoothDark") |>
  addCircles(
    data = latlong_filtered,
    lng = ~long, lat = ~lat,
    radius = ~sqrt(1.333^Data_Value) * 2,
    color = ~pal(StateAbbr),
    popup = tooltip   # one HTML string per row, shown on click
  )
5. Write a paragraph
In a paragraph, describe the plots you created and the insights they show.
I filtered the cities500 dataset to include only current-smoking estimates (MeasureId CSMOKING) for the DMV region (DC, Maryland, and Virginia) in 2017. With this subset, I created a scatterplot of smoking prevalence (%) against the population of each measured area, faceted by state. I made this plot to see whether the percentage of smokers is related to area population. The correlation is not very strong, but in DC and Virginia smoking prevalence tends to be lower in areas with larger populations. This pattern is less apparent in Maryland, although a weakness of this dataset is that Maryland's data cover only the city of Baltimore.
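A per-state correlation coefficient would put a number on the weak relationship described above. The sketch below computes Pearson's r between PopulationCount and Data_Value for each state, skipping rows where either value is missing. `state_correlations()` is a hypothetical helper. The toy rows are invented only to show the shape of the result and are not values from the dataset.

```r
library(dplyr)

# Hypothetical helper: Pearson correlation of area population vs. % smokers, per state
state_correlations <- function(df) {
  df |>
    group_by(StateAbbr) |>
    summarise(
      n = sum(!is.na(PopulationCount) & !is.na(Data_Value)),  # complete pairs used
      r = cor(PopulationCount, Data_Value, use = "complete.obs"),
      .groups = "drop"
    )
}

# Invented toy rows, not real 500 Cities values
toy_tracts <- tibble(
  StateAbbr       = rep(c("DC", "VA"), each = 4),
  PopulationCount = c(1000, 2000, 3000, 4000, 1500, 2500, 3500, 4500),
  Data_Value      = c(30, 25, 20, 15, 28, 30, 22, 20)
)

state_correlations(toy_tracts)
# state_correlations(latlong_filtered)
```

The plot hides rows above 10,000 people, but this helper does not. Filtering to census tracts first would make the coefficients match what the plot shows more closely.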
I also created a map of the measured areas, with circle color representing the state and circle size increasing with the percentage of smokers. The map shows that the highest percentages of smokers tend to be near the centers of major cities, including Baltimore, Richmond, and Norfolk. I expected DC to follow the same pattern; however, the highest percentages of smokers in DC are in its southeastern neighborhoods. Rates near the center of the city are relatively low, around 20%, compared with roughly 40% in the southeast.
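The map chunks set each circle's radius to sqrt(1.333^Data_Value) * 2, and leaflet's `addCircles()` interprets `radius` in meters. Plugging in the two rates mentioned above (about 20% and 40%) shows how strongly this exponential scaling separates them. `radius_m()` simply restates that formula. `radius_area_m()` and its scale factor `k` are a hypothetical, area-proportional alternative; they are not used in the maps above.

```r
# The radius formula used in the map chunks, in meters
radius_m <- function(pct) sqrt(1.333^pct) * 2

radius_m(c(20, 40))            # about 35 m and about 628 m
radius_m(40) / radius_m(20)    # about 17.7, even though 40 / 20 = 2

# Hypothetical alternative: circle area proportional to prevalence, so twice
# the rate gives twice the area (a radius ratio of sqrt(2)).
# k = 50 is an arbitrary scale factor chosen for illustration.
radius_area_m <- function(pct, k = 50) k * sqrt(pct)

radius_area_m(40) / radius_area_m(20)   # sqrt(2), about 1.41
```

With the exponential formula, a 40% circle covers roughly 300 times the area of a 20% circle. That makes high-prevalence tracts stand out, but it visually overstates a twofold difference.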