library(tidyverse)
library(tidyr)
setwd('/Users/darrenabou/Desktop/Spring 26/Data110/DATASETS')
cities500 <- read_csv('/Users/darrenabou/Desktop/Spring 26/Data110/DATASETS/500CitiesLocalHealthIndicators.cdc copy.csv')
data(cities500)Healthy Cities GIS Assignment
Load the libraries and set the working directory
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500|>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
Filter the dataset
Remove the StateDesc that includes the United Sates, select Prevention as the category (of interest), filter for only measuring crude prevalence and select only 2017.
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017) |>
filter(StateAbbr == "MA") |>
filter(Category == "Unhealthy Behaviors")
head(latlong_clean)# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MA Massachusetts Newton City BRFSS Unhealth…
2 2017 MA Massachusetts Worcester City BRFSS Unhealth…
3 2017 MA Massachusetts Springfield Census Tract BRFSS Unhealth…
4 2017 MA Massachusetts New Bedford Census Tract BRFSS Unhealth…
5 2017 MA Massachusetts Boston Census Tract BRFSS Unhealth…
6 2017 MA Massachusetts Brockton Census Tract BRFSS Unhealth…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
What variables are included? (can any of them be removed?)
names(latlong_clean) [1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MA Massachuse… Newton City Unhealt… 2545560 Obesit…
2 2017 MA Massachuse… Worcest… City Unhealt… 2582000 No lei…
3 2017 MA Massachuse… Springf… Census Tract Unhealt… 2567000… No lei…
4 2017 MA Massachuse… New Bed… Census Tract Unhealt… 2545000… No lei…
5 2017 MA Massachuse… Boston Census Tract Unhealt… 2507000… Binge …
6 2017 MA Massachuse… Brockton Census Tract Unhealt… 2509000… No lei…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset “latlong_clean2” is a manageable dataset now.
For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
my_subset <- latlong_clean2 |>
filter(GeographicLevel == "Census Tract") |>
filter(CityName %in% c("Boston", "Worcester")) |>
drop_na(Data_Value, lat, long)2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here The plot below shows the distribution of crude prevalence (%) for each of the four unhealthy behavior measures across census tracts in Boston and Worcester.
# non map plot
ggplot(my_subset, aes(x = CityName, y = Data_Value, fill = CityName)) +
geom_boxplot(alpha = 0.7) +
facet_wrap(~ Short_Question_Text, scales = "free_y") +
scale_fill_viridis_d(option = "D") +
labs(
title = "Unhealthy Behavior Prevalence: Boston vs. Worcester",
subtitle = "Census Tract-Level Crude Prevalence · 2017",
x = "City",
y = "Crude Prevalence (%)",
caption = "Source: CDC 500 Cities Project"
) +
theme_bw() +
theme(
strip.text = element_text(face = "bold"),
axis.text.x = element_text(size = 10)
)3. Now create a map of your subsetted dataset.
First map chunk here
The map plots each census tract as a circle whose radius scales with the obesity prevalence rate. Larger circles indicate tracts where a greater share of adults are obese
# leaflet()
library(leaflet)
obesity_tracts <- my_subset |>
filter(Short_Question_Text == "Obesity")
ma_lat <- 42.34
ma_lon <- -71.25
# leaflet()
leaflet() |>
setView(lng = ma_lon, lat = ma_lat, zoom = 10) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = obesity_tracts,
lat = ~lat,
lng = ~long,
radius = ~Data_Value * 30,
color = "#14010d",
fillColor = "#f2079c",
fillOpacity = 0.35
)4. Refine your map to include a mouse-click tooltip
Refined map chunk here
The refined map shows all four measures at once, color-coded by behavior type using a four-color factor palette (mirroring the earthquake magnitude-type palette from the tutorial).
pal <- colorFactor(
palette = c("#7538a1", "#3880a1", "#29b0d9", "#2f9c7b"),
levels = c("Obesity", "Current Smoking", "Binge Drinking", "Physical Inactivity"),
domain = my_subset$Short_Question_Text
)
infectedhealth <- paste0(
"<b>City: </b>", my_subset$CityName, "<br>",
"<b>Measure: </b>", my_subset$Short_Question_Text, "<br>",
"<b>Crude Prevalence: </b>", my_subset$Data_Value, "%<br>",
"<b>Level: </b>", my_subset$GeographicLevel, "<br>"
)leaflet(my_subset) |>
setView(lng = ma_lon, lat = ma_lat, zoom = 10) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
lat = ~lat,
lng = ~long,
stroke = FALSE,
fillColor = ~pal(Short_Question_Text),
radius = ~Data_Value * 30,
fillOpacity = 0.75,
popup = infectedhealth
) |>
addLegend(
position = "bottomright",
pal = pal,
values = ~Short_Question_Text,
title = "Unhealthy Behavior",
opacity = 0.1
)5. Write a paragraph
In a paragraph, describe the plots you created and the insights they show.
The boxplot reveals a clear pattern of health disparity between Boston and Worcester across all four unhealthy behavior measures.Worcester has notably higher median prevalence for obesity, physical inactivity , and current smoking.Boston, however, shows slightly higher median binge drinking.The leaflet map brings this neighborhood-level story into geographic focus: the largest circles — representing the highest obesity and physical inactivity rates — cluster in Worcester’s urban core and in Boston. Clicking any point on the map reveals the exact crude prevalence for that specific tract and measure, making the interactive map a practical tool for identifying the neighborhoods where public health interventions