Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)  # loads readr, dplyr, tidyr, stringr and ggplot2
library(leaflet)
setwd("~/Desktop/Desktop - Jackie’s MacBook Pro/DATA 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
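Before you run the split on the full data, you can try it on two made-up strings in the same "(lat, long)" format. This sketch assumes the dplyr, stringr and tidyr packages, which tidyverse loads. The coordinates are invented for illustration. It also uses the separator `",\\s*"` (a comma followed by any spaces), a slightly stricter variant of the plain `","`. In tidyr 1.3 and later, `separate()` is superseded by `separate_wider_delim()`, but it still works.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Two invented GeoLocation strings in the "(lat, long)" format (not real CDC rows)
toy <- tibble(GeoLocation = c("(42.89, -78.86)", "(43.16, -77.61)"))

toy_split <- toy |>
  # Remove both parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # sep is a regular expression; ",\\s*" also absorbs any space after the comma.
  # convert = TRUE turns the two text pieces into numbers.
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```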
latlong <- cities500 |>
  # Strip the parentheses from the "(lat, long)" strings
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split at the comma and convert both parts to numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
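You can select several cities or diseases either with `==` tests joined by `|` or with a single `%in%` test. The toy check below uses made-up rows, not the CDC file. It suggests that both forms keep the same rows inside `filter()`: for a missing city, `==` returns `NA` and `%in%` returns `FALSE`, and `filter()` drops both. It also shows one way to enforce the 900-observation limit with `stopifnot()`. It assumes only the dplyr package.

```r
library(dplyr)

# Invented rows standing in for the CDC data (values are made up)
toy <- tibble(
  CityName   = c("Buffalo", "Albany", "Yonkers", NA, "Rochester"),
  Data_Value = c(6.2, 5.1, 4.8, 7.0, 5.9)
)

keep <- c("Buffalo", "Yonkers", "Rochester")

# Chained equality tests joined with |
chained <- toy |>
  filter(CityName == "Buffalo" | CityName == "Yonkers" | CityName == "Rochester")

# One membership test
membership <- toy |>
  filter(CityName %in% keep)

# Both keep the same three rows; the NA city is dropped either way
stopifnot(identical(chained, membership))

# Guard for the assignment's limit of at most 900 observations
stopifnot(nrow(membership) <= 900)
```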
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Year == 2017) |>   # 2017: the other years had little to no data
  filter(StateAbbr == "NY") |>
  filter(Category == "Health Outcomes") |>
  filter(GeographicLevel == "Census Tract") |>
  # The three most populous NY cities in the data, excluding NYC
  filter(CityName %in% c("Buffalo", "Yonkers", "Rochester")) |>
  # Four serious chronic diseases
  filter(Short_Question_Text %in% c("Coronary Heart Disease",
                                    "Cancer (except skin)",
                                    "Chronic Kidney Disease",
                                    "Stroke"))

# Drop columns that are not needed for the analysis
new_york <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
# Non-map plot
ggplot(new_york, aes(x = CityName, y = Data_Value, fill = Short_Question_Text)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Short_Question_Text) +
  labs(title = "Chronic Disease Prevalence in Three New York Cities (2017)",
       caption = "Source: 500 Cities Project by CDC",
       x = "City",
       y = "Prevalence Value",
       fill = "Disease") +
  scale_fill_manual(values = c("#fb5607", "#ff006e", "#8338ec", "#3a86ff")) +
  theme_sub_axis_bottom()
Warning: Removed 16 rows containing missing values or values outside the scale range
(`geom_bar()`).
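Each city has many census-tract rows per disease, so in the plot above several bars share the same position and overlap. The 16 removed rows are most likely tracts with a missing `Data_Value`. One alternative is to summarise to one row per city and disease first, then draw one bar each with `geom_col()`. The minimal sketch below does this with dplyr and ggplot2 on invented numbers. This is an unweighted mean of tract values, not an official city-level estimate. The 500 Cities data also has rows with `GeographicLevel == "City"` that could be used instead.

```r
library(dplyr)
library(ggplot2)

# Invented tract-level rows shaped like new_york (not real CDC values)
tracts <- tibble(
  CityName = c("Buffalo", "Buffalo", "Yonkers", "Yonkers", "Yonkers"),
  Short_Question_Text = "Stroke",
  Data_Value = c(4.0, 6.0, 3.0, 5.0, NA)
)

# One row per city and disease: mean tract prevalence and tracts used
city_means <- tracts |>
  filter(!is.na(Data_Value)) |>   # drop missing values up front
  group_by(CityName, Short_Question_Text) |>
  summarise(mean_prev = mean(Data_Value), n_tracts = n(), .groups = "drop")

city_means

# With one row per bar, geom_col() draws a single, non-overlapping bar per city
p <- ggplot(city_means, aes(x = CityName, y = mean_prev, fill = Short_Question_Text)) +
  geom_col() +
  facet_wrap(~ Short_Question_Text) +
  labs(x = "City", y = "Mean tract prevalence", fill = "Disease")
p
```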
3. Now create a map of your subsetted dataset.
First map chunk here
# One color per disease, in a fixed level order
pal <- colorFactor(palette = c("#3a86ff", "#8338ec", "#ff006e", "#fb5607"),
                   levels = c("Stroke", "Coronary Heart Disease",
                              "Chronic Kidney Disease", "Cancer (except skin)"),
                   domain = new_york$Short_Question_Text)

leaflet() |>
  setView(lng = -75.91, lat = 42.09, zoom = 7) |>   # centered near Binghamton
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = new_york,
    lng = ~long, lat = ~lat,                     # name the coordinate columns explicitly
    radius = sqrt(5^new_york$Data_Value) * 2,    # radius in meters
    color = "#03045e",
    stroke = TRUE,
    fillColor = ~pal(Short_Question_Text),
    fillOpacity = .6
  )
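In `addCircles()` the radius is given in meters. Because `sqrt(5^x) * 2` equals `2 * 5^(x / 2)`, the radius grows exponentially with prevalence. The short calculation below compares that rule with an area-proportional rule on hypothetical prevalence values. The area rule uses `sqrt(x)` times a constant; 2000 here is an arbitrary choice. Under it, circle area grows in proportion to the value. The exponential rule makes high values stand out strongly, while the area rule keeps visual size comparable to the data. The refined map's `sqrt(6^x) * 2` behaves the same way with base 6.

```r
# Hypothetical prevalence values
prev <- c(2, 6, 10)

# Exponential rule from the first map: sqrt(5^x) * 2 == 2 * 5^(x / 2) meters
exp_radius <- sqrt(5^prev) * 2
exp_radius                            # 10 250 6250

# Area-proportional rule: area grows linearly with prevalence.
# The constant 2000 is an arbitrary scale chosen for illustration.
area_radius <- sqrt(prev) * 2000
area_radius

# Ratio of circle areas between the largest and smallest value
(exp_radius[3] / exp_radius[1])^2     # 390625 under the exponential rule
(area_radius[3] / area_radius[1])^2   # 5 (up to rounding), the same as 10 / 2
```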
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
tooltip_ny <- paste0(
  "<b>2017</b>", "<br>",
  "<br>",
  "<b>City: </b>", new_york$CityName, "<br>",
  "<b>Tract population: </b>", new_york$PopulationCount, "<br>",
  "<b>Disease: </b>", new_york$Short_Question_Text, "<br>",
  "<b>Prevalence: </b>", new_york$Data_Value, "<br>",
  "<b>Measure: </b>", new_york$Measure, "<br>"
)

leaflet() |>
  setView(lng = -75.91, lat = 42.09, zoom = 6.5) |>
  addProviderTiles("CyclOSM") |>
  addCircles(
    data = new_york,
    lng = ~long, lat = ~lat,
    radius = sqrt(6^new_york$Data_Value) * 2,
    color = "#03045e",
    stroke = TRUE,
    fillColor = ~pal(Short_Question_Text),
    fillOpacity = .5,
    popup = tooltip_ny   # one popup string per row of new_york, shown on click
  )
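The popup above is a character vector built with `new_york$...`. It lines up with the circles only because both come from the same rows in the same order. An alternative is to pass formulas (`~`) so that leaflet evaluates them against the layer's own data. The sketch below does this with the leaflet package and two invented points. For a tooltip that appears on hover instead of on click, `addCircles()` also accepts a `label` argument.

```r
library(leaflet)

# Two invented points (not real CDC rows); column names mirror new_york
pts <- data.frame(
  CityName = c("Buffalo", "Yonkers"),
  Short_Question_Text = c("Stroke", "Cancer (except skin)"),
  Data_Value = c(4.1, 6.3),
  lat = c(42.89, 40.93),
  long = c(-78.86, -73.90)
)

# The ~ formulas are evaluated against `pts`, so each popup is built
# from the same row as the circle it belongs to
m <- leaflet(pts) |>
  addTiles() |>
  addCircles(
    lng = ~long, lat = ~lat,
    radius = 2000,   # meters
    popup = ~paste0(
      "<b>City: </b>", CityName, "<br>",
      "<b>Disease: </b>", Short_Question_Text, "<br>",
      "<b>Prevalence: </b>", Data_Value
    )
  )

# Display with `m` in an interactive session or an R Markdown chunk
```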
5. Write a paragraph
In a paragraph, describe the plots you created and the insights they show.
I first cleaned my data to keep the three most populous cities in the data other than NYC (Buffalo, Rochester, and Yonkers). I then filtered for four of the most life-threatening chronic diseases: cancer (except skin), chronic kidney disease, stroke, and coronary heart disease. I used only data from 2017, because the other years had little to no data available. I chose these filters because I am going to NYC this weekend and was interested in New York's data.
For my first chart, I used a bar graph with facet_wrap() to show the prevalence values for each disease in each city. Every city contributes many census-tract rows, and a city's bars are drawn on top of each other, so each bar effectively shows the highest tract value in that city rather than a city average. Yonkers tended to have the lowest values of the three cities. This may be due to population size, missing data, or simply better disease prevention. Cancer and coronary heart disease had almost double the prevalence values of the other two diseases. Prevalence measures how common a condition is, not how severe it is. This gap therefore more likely reflects how widespread these diseases are than how life-threatening they are.
For my map, I centered the view on the coordinates of Binghamton because it lies roughly in the middle of the three selected cities. I color-coded the circles by disease type so they are easier to tell apart. I then added a click popup that shows key information for each point: city name, population, disease, prevalence value, and the measure the data represent. I used `stroke = TRUE` to keep an outline on each circle. I set the radius with `radius = sqrt(6^Data_Value) * 2`. Because `sqrt(6^x)` equals `6^(x/2)`, the radius grows exponentially with prevalence, which makes the points with the highest values stand out.
When rendering my map, I again found that cancer and coronary heart disease had the largest circles, with prevalence values of 10 and above. These large points appeared in Buffalo and Rochester: Buffalo had the most large circles, while Rochester had the fewest.