Healthy Cities GIS Assignment

Author

Iris Wu

Load the libraries and dataset

library(tidyverse)
library(tidyr)
library(leaflet)
library(knitr)
setwd("C:/Users/rsaidi/Dropbox/Rachel/MontColl/Datasets/Datasets")
#setwd("C:/Users/iwu80/OneDrive/Documents/Files/School/DATA 110 R Assignments")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

Separate the latitudes and longitudes

cities2 <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(cities2)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

1. Filter the dataset

#create a dataset for dental visits 
cities_dental <- cities2 |>
  #remove the United States
  filter(StateDesc != "United States") |>
  #filter for West Coast States 
  filter(StateAbbr == "CA"| StateAbbr == "OR"| StateAbbr == "WA") |>
  #filter for dental visits
  filter(Short_Question_Text == "Dental Visit" ) |>
  #geographic level should be city 
  filter(GeographicLevel == "City") |>
  #show only crude prevalence 
  filter(DataValueTypeID == "CrdPrv")

#create another dataset for teeth loss 
cities_teeth <- cities2 |>
  filter(StateDesc != "United States") |>
  filter(StateAbbr == "CA" | StateAbbr == "OR"| StateAbbr == "WA") |>
  #filter for teeth loss 
  filter(Short_Question_Text == "Teeth Loss") |>
  filter(GeographicLevel == "City") |>
  filter(DataValueTypeID == "CrdPrv")

#join the datasets 
cities_join <- left_join(cities_dental, cities_teeth, by = c("CityName"))

#remove unnecessary columns from the dataset 
cities_final <- cities_join |>
  select(StateAbbr.x, Year.x, CityName, Category.x, Measure.x, DataValueTypeID.x, Data_Value.x, Measure.y, Data_Value.y, PopulationCount.x, lat.x, long.x) |>
#rename some columns 
rename("State" = "StateAbbr.x") |>
rename("Population" = "PopulationCount.x") |>
  #I kept getting in an error in Chunk 9, so I asked Google Gemini, and it suggested that I rename the latitude and longitude columns as well (see end of document for more details)
  rename("lat" = "lat.x") |>
  rename("long" = "long.x")

2. Plot the data

ggplot(cities_final, 
       aes(x = Data_Value.x, 
           y = Data_Value.y, color = State, 
           size = Population)) +
  geom_point(alpha = .7) +
  geom_smooth(method = lm, se = FALSE, lty = 2) +
  scale_color_manual(values = c("deepskyblue", "forestgreen", "purple4" )) +
  labs(title = "Dental Health in West Coast Cities, 2016", x = "Prevalence of Dental Visits Among Adults Age 18+ (%)", y = "Prevalence of Teeth Loss \n Among Adults Age 65+ (%)", caption = "Source: Centers for Disease Control and Prevention") +
  theme_minimal(base_family = "serif") +
  theme(plot.title = element_text(face = "bold", hjust = .5)) +
  theme(plot.caption = element_text(face = "italic", hjust = .5)) +
  guides(size = "none")

3. Create a map of the dataset

#set the latitude and longitude values for the West Coast (I googled the lat/long for the northernmost and southernmost points of the contiguous West Coast and played around with the numbers until the map centered on the correct area.)
west_lat <- 34.0222263
west_long <- -118.4051088

#set a palette for the legend (code adapted from Lucy Murray's project on asbestos: https://rpubs.com/rsaidi/617763)
colors <- colorBin(palette = "Dark2", domain = cities_final$Data_Value.x)
#map the data
leaflet() |>
  setView(lng = west_long, lat = west_lat, zoom = 5.7) |>
  addProviderTiles("OpenStreetMap") |>
  addCircles(data = cities_final, radius = 5000, color = ~colors(cities_final$Data_Value.x), fillOpacity = 1, stroke = TRUE) |>
  addLegend(pal = colors, values = cities_final$Data_Value.x, position = "bottomleft", title = "Prevalence of Dental Visits among Adults Age 18+ (%)")

Assuming "long" and "lat" are longitude and latitude, respectively

4. Include a tooltip

#create a popup 

popupdental <- paste0("<b>Year: </b>", cities_final$Year.x, "<br>", "<b>City: </b>", cities_final$CityName, "<br>", "<b>Population: </b>", cities_final$Population, "<br>", "<b> Prevalence of Dental Visits Among Adults Age 18+ (%): </b>", cities_final$Data_Value.x, "<br>", "<b> Prevlance of Teeth Loss Among Adults Age 65+ (%): </b>", cities_final$Data_Value.y, "<br>")

#add the popup to the map
leaflet() |>
  setView(lng = west_long, lat = west_lat, zoom = 5.7) |>
  addProviderTiles("OpenStreetMap") |>
  addCircles(data = cities_final, 
             radius = 5000,
             color = ~colors(cities_final$Data_Value.x), stroke = TRUE,
             fillOpacity = 1,
             popup = popupdental) |>
  addLegend(pal = colors, values = cities_final$Data_Value.x, position = "bottomleft", title = "Prevalence of Dental Visits among Adults Age 18+ (%)")

Assuming "long" and "lat" are longitude and latitude, respectively

5. Paragraph

The first plot illustrates the negative correlation between prevalence of dental visits and prevalence of teeth loss in cities on the West Coast. As the prevalence of dental visits among the population increases, the prevalence of teeth loss decreases, which is not surprising. This inverse relationship is true in all three West Coast states. The map plot shows the geographic distribution of dental visit prevalence. Cities with the highest prevalence rates (70% or above) are clustered in the Bay Area. High income levels, among other factors, likely contribute to this phenomenon. Unsurprisingly, prevalence rates are also high in the other major cities, like Seattle and Portland. Outside these cities, however, the prevalence rates are more mixed. Overall, these plots suggest that going to the dentist is important (although it’s typically unpleasant). There are many possibilities for further exploration of the data. The relationship between dental visit prevalence or teeth loss prevalence and income level, dental insurance rates, and number of dental clinics can provide more insights into the state of dental health in the US.

Use of AI Disclaimer

When creating the map plot (Chunk 9), I kept getting the error “Error in guessLatLongCols(names(obj)) : Couldn’t infer longitude/latitude columns” and I could not figure out why, so I copied the error message to Google Gemini (2.5 Flash, 11/8/2025 version).

Prompt used : “What does this error mean? Error in guessLatLongCols(names(obj)) :Couldn’t infer longitude/latitude columns”

Gemini answer: “This error, Error in guessLatLongCols(names(obj)) : Couldn’t infer longitude/latitude columns, typically occurs in R when a function designed to work with geographical data (like plotting or mapping functions) cannot automatically identify the columns in your data frame that contain the longitude and latitude values. To resolve this, you need to either rename your columns or explicitly tell the function which columns to use.”

I used this answer to rename the “lat.x” and “long.x” columns in my joined dataset, cities_final, to “lat” and “long”.