Healthy Cities GIS Assignment

Author

Gamaliel Ngouafon

Load the libraries and set the working directory

library(tidyverse)
library(tidyr)
setwd('/Users/darrenabou/Desktop/Spring 26/Data110/DATASETS')
cities500 <- read_csv('/Users/darrenabou/Desktop/Spring 26/Data110/DATASETS/500CitiesLocalHealthIndicators.cdc copy.csv')
data(cities500)

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

Filter the dataset

Remove the StateDesc that includes the United Sates, select Prevention as the category (of interest), filter for only measuring crude prevalence and select only 2017.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "MA") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc     CityName    GeographicLevel DataSource Category 
  <dbl> <chr>     <chr>         <chr>       <chr>           <chr>      <chr>    
1  2017 MA        Massachusetts Newton      City            BRFSS      Unhealth…
2  2017 MA        Massachusetts Worcester   City            BRFSS      Unhealth…
3  2017 MA        Massachusetts Springfield Census Tract    BRFSS      Unhealth…
4  2017 MA        Massachusetts New Bedford Census Tract    BRFSS      Unhealth…
5  2017 MA        Massachusetts Boston      Census Tract    BRFSS      Unhealth…
6  2017 MA        Massachusetts Brockton    Census Tract    BRFSS      Unhealth…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
   Year StateAbbr StateDesc   CityName GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>       <chr>    <chr>           <chr>    <chr>    <chr>  
1  2017 MA        Massachuse… Newton   City            Unhealt… 2545560  Obesit…
2  2017 MA        Massachuse… Worcest… City            Unhealt… 2582000  No lei…
3  2017 MA        Massachuse… Springf… Census Tract    Unhealt… 2567000… No lei…
4  2017 MA        Massachuse… New Bed… Census Tract    Unhealt… 2545000… No lei…
5  2017 MA        Massachuse… Boston   Census Tract    Unhealt… 2507000… Binge …
6  2017 MA        Massachuse… Brockton Census Tract    Unhealt… 2509000… No lei…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset “latlong_clean2” is a manageable dataset now.

For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

my_subset <- latlong_clean2 |>
  filter(GeographicLevel == "Census Tract") |>
  filter(CityName %in% c("Boston", "Worcester")) |>
  drop_na(Data_Value, lat, long)

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here The plot below shows the distribution of crude prevalence (%) for each of the four unhealthy behavior measures across census tracts in Boston and Worcester.

# non map plot
ggplot(my_subset, aes(x = CityName, y = Data_Value, fill = CityName)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ Short_Question_Text, scales = "free_y") +
  scale_fill_viridis_d(option = "D") +
  labs(
    title    = "Unhealthy Behavior Prevalence: Boston vs. Worcester",
    subtitle = "Census Tract-Level Crude Prevalence · 2017",
    x        = "City",
    y        = "Crude Prevalence (%)",
    caption  = "Source: CDC 500 Cities Project"
  ) +
  theme_bw() +
  theme(
    strip.text  = element_text(face = "bold"),
    axis.text.x = element_text(size = 10)
  )

3. Now create a map of your subsetted dataset.

First map chunk here

The map plots each census tract as a circle whose radius scales with the obesity prevalence rate. Larger circles indicate tracts where a greater share of adults are obese

# leaflet()
library(leaflet)
obesity_tracts <- my_subset |>
  filter(Short_Question_Text == "Obesity")

ma_lat <- 42.34
ma_lon <- -71.25

# leaflet()
leaflet() |>
  setView(lng = ma_lon, lat = ma_lat, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data        = obesity_tracts,
    lat         = ~lat,
    lng         = ~long,
    radius      = ~Data_Value * 30,
    color       = "#14010d",
    fillColor   = "#f2079c",
    fillOpacity = 0.35
  )

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

The refined map shows all four measures at once, color-coded by behavior type using a four-color factor palette (mirroring the earthquake magnitude-type palette from the tutorial).

pal <- colorFactor(
  palette = c("#7538a1", "#3880a1", "#29b0d9", "#2f9c7b"),
  levels  = c("Obesity", "Current Smoking", "Binge Drinking", "Physical Inactivity"),
  domain  = my_subset$Short_Question_Text
)

infectedhealth <- paste0(
  "<b>City: </b>",             my_subset$CityName,            "<br>",
  "<b>Measure: </b>",          my_subset$Short_Question_Text, "<br>",
  "<b>Crude Prevalence: </b>", my_subset$Data_Value,          "%<br>",
  "<b>Level: </b>",            my_subset$GeographicLevel,     "<br>"
)
leaflet(my_subset) |>
  setView(lng = ma_lon, lat = ma_lat, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lat         = ~lat,
    lng         = ~long,
    stroke      = FALSE,
    fillColor   = ~pal(Short_Question_Text),
    radius      = ~Data_Value * 30,
    fillOpacity = 0.75,
    popup       = infectedhealth
  ) |>
  addLegend(
    position = "bottomright",
    pal      = pal,
    values   = ~Short_Question_Text,
    title    = "Unhealthy Behavior",
    opacity  = 0.1
  )

5. Write a paragraph

In a paragraph, describe the plots you created and the insights they show.

The boxplot reveals a clear pattern of health disparity between Boston and Worcester across all four unhealthy behavior measures.Worcester has notably higher median prevalence for obesity, physical inactivity , and current smoking.Boston, however, shows slightly higher median binge drinking.The leaflet map brings this neighborhood-level story into geographic focus: the largest circles — representing the highest obesity and physical inactivity rates — cluster in Worcester’s urban core and in Boston. Clicking any point on the map reveals the exact crude prevalence for that specific tract and measure, making the interactive map a practical tool for identifying the neighborhoods where public health interventions