Healthy Cities GIS Assignment

Author

Oluwatosin Akinmoladun

Load the libraries and set the working directory

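library(tidyverse)  # attaches readr, dplyr, tidyr, stringr, and ggplot2
setwd("C:/Users/tosin/Downloads")
# read_csv() already returns the data as a tibble, so no data() call is needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")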

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
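The split above can be checked on a small example. The sketch below uses a hypothetical two-row tibble (`toy`, with illustrative coordinates that are not taken from the CDC file) in the same "(lat, long)" format. It assumes a separator pattern of `",\\s*"`, which also consumes the space after the comma; that pattern is an illustrative variation, not what the chunk above uses.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical rows in the same "(lat, long)" format; coordinates are illustrative
toy <- tibble(
  CityName    = c("Hawthorne", "Hayward"),
  GeoLocation = c("(33.914, -118.348)", "(37.633, -122.077)")
)

toy_split <- toy |>
  # Drop the surrounding parentheses
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  # Split on the comma plus any following whitespace; convert = TRUE yields numeric columns
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```

Both new columns should come back as numeric, which is what `leaflet()` needs for `lng` and `lat`.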

Filter chunk here (you may need multiple chunks)

latlong_ca_aap <- latlong |>
  filter(
    StateAbbr == "CA",
    Data_Value_Type == "Age-adjusted prevalence",
    Year == 2017,
    Measure == "Coronary heart disease among adults aged >=18 Years"
  ) |>
  filter(!is.na(lat), !is.na(long), !is.na(Data_Value))

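# Names of the five cities with the most rows after filtering
# (kept for reference; the plots below do not use it)
top_cities <- latlong_ca_aap |>
  count(CityName, sort = TRUE) |>
  slice_head(n = 5) |>
  pull(CityName)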

Create a histogram plot to explore patterns in your filtered dataset.

ggplot(latlong_ca_aap, aes(x = Data_Value)) +
  geom_histogram(binwidth = 0.5, fill = "royalblue", color = "white", alpha = 0.7) +
  labs(
    title = "Distribution of Coronary Heart Disease Prevalence\nCalifornia Cities (2017)",
    x = "Prevalence (%)",
    y = "Number of Observations"
  ) +
  theme_classic()

Create a map to visualize your filtered dataset.

library(leaflet)

leaflet(latlong_ca_aap) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = 5,
    color = "orchid",
    fillColor = "orchid",
    fillOpacity = 0.7,
    stroke = TRUE
  ) |>
  setView(
    lng = mean(latlong_ca_aap$long, na.rm = TRUE),
    lat = mean(latlong_ca_aap$lat, na.rm = TRUE),
    zoom = 6.5
  )

Refined Map with mouseclick

library(leaflet)

leaflet(latlong_ca_aap) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = 5,
    color = "orchid",
    fillColor = "orchid",
    fillOpacity = 0.7,
    stroke = TRUE,
    popup = ~paste0(
      "<strong>City: </strong>", CityName, "<br>",
      "<strong>CHD Prevalence: </strong>", round(Data_Value, 2), "%"
    )
  ) |>
  setView(
    lng = mean(latlong_ca_aap$long, na.rm = TRUE),
    lat = mean(latlong_ca_aap$lat, na.rm = TRUE),
    zoom = 6.5
  )

Write a paragraph

The histogram shows how age-adjusted coronary heart disease prevalence varies among California cities in 2017, revealing whether most cities have similar rates or whether values span a wide range. This lets the audience quickly spot patterns, such as clusters of cities with higher or lower prevalence, and identify unusual cases that stand out from the rest. The interactive Leaflet map builds on this analysis by adding a geographical component: each city is represented by an orchid marker placed at its actual location. When a user clicks a marker, a popup shows the city’s name and its prevalence value. This interactivity makes it easy to explore the data spatially, helping users identify regional patterns, clusters, or outliers that are not visible in the histogram alone.
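Because every marker on the maps above is drawn in the same orchid color, prevalence is only visible one popup at a time. One possible refinement, sketched here and not part of the original assignment, is to color each marker by its prevalence using leaflet's `colorNumeric()` and add a legend with `addLegend()`. The `demo` data frame, its city names, and its values are hypothetical stand-ins for `latlong_ca_aap`, and the "YlOrRd" palette is an arbitrary choice.

```r
library(leaflet)

# Hypothetical stand-in for latlong_ca_aap; names and values are illustrative only
demo <- data.frame(
  CityName   = c("City A", "City B", "City C"),
  lat        = c(34.0, 36.7, 38.6),
  long       = c(-118.2, -119.8, -121.5),
  Data_Value = c(4.8, 5.6, 6.3)
)

# Map prevalence values onto a continuous color scale
pal <- colorNumeric(palette = "YlOrRd", domain = demo$Data_Value)

chd_map <- leaflet(demo) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = 5,
    color = ~pal(Data_Value),      # marker color now encodes prevalence
    fillOpacity = 0.7,
    popup = ~paste0(
      "<strong>City: </strong>", CityName, "<br>",
      "<strong>CHD Prevalence: </strong>", round(Data_Value, 2), "%"
    )
  ) |>
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~Data_Value,
    title = "CHD prevalence (%)"
  )

chd_map
```

Swapping `demo` for `latlong_ca_aap` should work unchanged, since the column names match those used in the chunks above.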