Healthy Cities GIS Assignment

Author

Catherine Z. Matenje

Healthcare-seeking behaviors in Mississippi

Load the libraries and set the working directory

library(tidyverse)
library(tidyr)

library(leaflet)

setwd("C:/Users/cathe/OneDrive/Desktop/Montgomery College Transition/2025-2026 MONTGOMERY COLLEGE TRANSITION/MC COURSES 25-26/Spring 2026/DATA 110/01. Assignments")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

#checking the data 
 
head(cities500)
# A tibble: 6 × 24
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 17 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, GeoLocation <chr>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
latlong <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

Sign convention: a positive lat is north of the equator and a negative lat is south of it; a positive long is east of the prime meridian and a negative long is west of it (North = + lat, South = - lat, East = + long, West = - long).
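As a small illustration of the split and the sign convention, here is a minimal sketch on made-up coordinate strings. The `toy` tibble, its place names, and its values are hypothetical and are not rows from the CDC file; the separator `",\\s*"` is my own choice so that any space after the comma is dropped along with it.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical coordinate strings in the same "(lat, long)" layout
toy <- tibble(
  place = c("A", "B"),
  GeoLocation = c("(32.30, -90.18)", "(30.37, -89.09)")
)

# Strip the parentheses, then split on the comma (and any space after it)
toy_split <- toy |>
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

# Points in the contiguous US are north of the equator and west of the
# prime meridian, so lat should be positive and long negative
toy_split |>
  mutate(sign_ok = lat > 0 & long < 0)
```

A check like `sign_ok` is a quick way to catch swapped lat/long columns before mapping.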

Filter the dataset

Remove the rows where StateDesc is "United States", keep only crude prevalence, and keep only 2017. The example chunk then narrows the data to Connecticut (CT) and the "Unhealthy Behaviors" category.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |> #removed US
  filter(Data_Value_Type == "Crude prevalence") |> #filtered for crude prevalence
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |> #filtered for CT
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc   CityName   GeographicLevel DataSource Category    
  <dbl> <chr>     <chr>       <chr>      <chr>           <chr>      <chr>       
1  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
2  2017 CT        Connecticut Danbury    City            BRFSS      Unhealthy B…
3  2017 CT        Connecticut Norwalk    Census Tract    BRFSS      Unhealthy B…
4  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
5  2017 CT        Connecticut Hartford   Census Tract    BRFSS      Unhealthy B…
6  2017 CT        Connecticut Waterbury  Census Tract    BRFSS      Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
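The chained `filter()` calls above can also be written as one call with comma-separated conditions, since `filter()` combines its conditions with AND. A minimal sketch on a hypothetical `toy` tibble (the rows are invented for illustration):

```r
library(dplyr)

# Hypothetical rows, not taken from the CDC file
toy <- tibble(
  StateAbbr = c("CT", "CT", "MS", "CT"),
  Year = c(2017, 2016, 2017, 2017),
  Category = c("Unhealthy Behaviors", "Unhealthy Behaviors",
               "Prevention", "Prevention")
)

# One filter() per condition, as in the chunk above
chained <- toy |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")

# The same conditions in a single filter() call
combined <- toy |>
  filter(Year == 2017, StateAbbr == "CT", Category == "Unhealthy Behaviors")

# Compare the two results
all.equal(chained, combined)
```

Either style works; the chained form leaves room for a comment per step, while the single call is more compact.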

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
   Year StateAbbr StateDesc   CityName GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>       <chr>    <chr>           <chr>    <chr>    <chr>  
1  2017 CT        Connecticut Bridgep… Census Tract    Unhealt… 0908000… Obesit…
2  2017 CT        Connecticut Danbury  City            Unhealt… 918430   Obesit…
3  2017 CT        Connecticut Norwalk  Census Tract    Unhealt… 0955990… Obesit…
4  2017 CT        Connecticut Bridgep… Census Tract    Unhealt… 0908000… Curren…
5  2017 CT        Connecticut Hartford Census Tract    Unhealt… 0937000… Obesit…
6  2017 CT        Connecticut Waterbu… Census Tract    Unhealt… 0980000… Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new data set “latlong_clean2” is now a manageable size.
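When the list of columns to drop gets long, it may be easier to keep the names in a character vector and pass it to `select()` with `all_of()`. A minimal sketch, assuming a hypothetical `toy` tibble and a hypothetical `drop_cols` vector:

```r
library(dplyr)

# Hypothetical one-row tibble using a few of the column names above
toy <- tibble(
  CityName = "Jackson",
  Data_Value = 75.2,
  DataSource = "BRFSS",
  Data_Value_Unit = "%"
)

# Columns that will not be used (illustrative subset)
drop_cols <- c("DataSource", "Data_Value_Unit")

# all_of() raises an error if a listed column is missing,
# which helps catch typos in the names
toy_small <- toy |>
  select(-all_of(drop_cols))

names(toy_small)
```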

For your assignment, work with a cleaned data set where you perform your own cleaning and filtering.

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

First round of filtering the dataset

# cleaning latlong data set to fit my preference first
latlong_3 <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)

head(latlong_3)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
4  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
5  2017 CA        California Inglewood Census Tract    BRFSS      Health Outcom…
6  2017 CA        California Lakewood  City            BRFSS      Unhealthy Beh…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

Exploring the data set to identify variables of interest

#Exploring variable names, measures and categories
names(latlong_3)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       
unique(latlong_3$Measure)
 [1] "Arthritis among adults aged >=18 Years"                                                               
 [2] "Current smoking among adults aged >=18 Years"                                                         
 [3] "Obesity among adults aged >=18 Years"                                                                 
 [4] "Diagnosed diabetes among adults aged >=18 Years"                                                      
 [5] "Current asthma among adults aged >=18 Years"                                                          
 [6] "Chronic kidney disease among adults aged >=18 Years"                                                  
 [7] "Coronary heart disease among adults aged >=18 Years"                                                  
 [8] "Stroke among adults aged >=18 Years"                                                                  
 [9] "Cholesterol screening among adults aged >=18 Years"                                                   
[10] "No leisure-time physical activity among adults aged >=18 Years"                                       
[11] "High blood pressure among adults aged >=18 Years"                                                     
[12] "Binge drinking among adults aged >=18 Years"                                                          
[13] "Cancer (excluding skin cancer) among adults aged >=18 Years"                                          
[14] "Visits to doctor for routine checkup within the past Year among adults aged >=18 Years"               
[15] "Physical health not good for >=14 days among adults aged >=18 Years"                                  
[16] "Chronic obstructive pulmonary disease among adults aged >=18 Years"                                   
[17] "Mental health not good for >=14 days among adults aged >=18 Years"                                    
[18] "Current lack of health insurance among adults aged 18\x9664 Years"                                    
[19] "High cholesterol among adults aged >=18 Years who have been screened in the past 5 Years"             
[20] "Taking medicine for high blood pressure control among adults aged >=18 Years with high blood pressure"
unique(latlong_3$Category)
[1] "Health Outcomes"     "Unhealthy Behaviors" "Prevention"         
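One label in the Measure list prints as `18\x9664`, which looks like an en dash that was read with the wrong encoding. A minimal sketch of one possible repair, assuming the file was saved as Windows-1252 (CP1252), where byte 0x96 is an en dash; this is an assumption about the file, not something the output confirms:

```r
# The label as printed above; "\x96" is a single raw byte
raw_label <- "Current lack of health insurance among adults aged 18\x9664 Years"

# Assumption: the source encoding is Windows-1252 (CP1252)
fixed_label <- iconv(raw_label, from = "CP1252", to = "UTF-8")

fixed_label
```

If the assumption holds, another option would be to declare the encoding at import time with readr's `locale(encoding = ...)` argument to `read_csv()`, so that every column is converted at once.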

I explored the data set to understand which variables and measures were available. This helped me select meaningful health indicators for my analysis.

Creating filtered and subsetted data for my plots

# non map plot

# Create my own filtered dataset focusing on Mississippi and prevention behaviors
my_subset <- latlong_3 |>
  filter(StateAbbr == "MS") |>   # focus on Mississippi
  filter(Category == "Prevention") |> # focus on health-seeking behaviors
  filter(Measure %in% c(
    "Visits to doctor for routine checkup within the past Year among adults aged >=18 Years",
    "Cholesterol screening among adults aged >=18 Years",
    "Taking medicine for high blood pressure control among adults aged >=18 Years with high blood pressure"
  )) |>
  drop_na(lat, long, Data_Value)   # remove missing values

# Check number of observations (must be under 900)
nrow(my_subset)
[1] 231
head(my_subset)
# A tibble: 6 × 25
   Year StateAbbr StateDesc   CityName GeographicLevel DataSource Category  
  <dbl> <chr>     <chr>       <chr>    <chr>           <chr>      <chr>     
1  2017 MS        Mississippi Gulfport Census Tract    BRFSS      Prevention
2  2017 MS        Mississippi Jackson  City            BRFSS      Prevention
3  2017 MS        Mississippi Jackson  Census Tract    BRFSS      Prevention
4  2017 MS        Mississippi Gulfport Census Tract    BRFSS      Prevention
5  2017 MS        Mississippi Jackson  Census Tract    BRFSS      Prevention
6  2017 MS        Mississippi Jackson  Census Tract    BRFSS      Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
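The first rows above include both "City" and "Census Tract" records, so it can help to tabulate how many rows of each level remain after dropping missing values. A minimal sketch on a hypothetical `toy` tibble (values invented for illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical rows mixing city-level and tract-level records
toy <- tibble(
  CityName = c("Jackson", "Jackson", "Jackson", "Gulfport", "Gulfport"),
  GeographicLevel = c("City", "Census Tract", "Census Tract",
                      "City", "Census Tract"),
  Data_Value = c(75, 72, NA, 78, 80)
)

# Drop rows with a missing prevalence value
toy_complete <- toy |>
  drop_na(Data_Value)

# Stop early if the subset exceeds the 900-row limit
stopifnot(nrow(toy_complete) <= 900)

# Rows per city and geographic level
level_counts <- toy_complete |>
  count(CityName, GeographicLevel)

level_counts
```

Knowing the split matters for interpretation, because a city-level row summarizes a whole city while a tract-level row covers a much smaller area.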

I selected these measures because they reflect health-seeking behaviors and engagement with the healthcare system, which, from my knowledge of public health in the US, tends to be lower in southern states. I chose Mississippi because I have a friend who grew up there and works in healthcare, and who has shared their observations about the patient populations they have encountered while working in ICUs there.

Adjusting labels for my variables/measures of interest as they are too long and will be cut off in the plots

my_subset2 <- my_subset |>
  mutate(Measure = case_when(
    Measure == "Cholesterol screening among adults aged >=18 Years" ~ "Cholesterol Screening",
    Measure == "Taking medicine for high blood pressure control among adults aged >=18 Years with high blood pressure" ~ "BP Medication Use",
    Measure == "Visits to doctor for routine checkup within the past Year among adults aged >=18 Years" ~ "Routine Checkup"
  ))
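One thing to keep in mind with `case_when()` is that any value matching none of the conditions becomes `NA`. That is harmless here because the subset only contains the three listed measures, but a fallback keeps other labels intact. A minimal sketch, where the second label is hypothetical:

```r
library(dplyr)

toy <- tibble(Measure = c(
  "Cholesterol screening among adults aged >=18 Years",
  "Some other measure"   # hypothetical label with no matching rule
))

relabelled <- toy |>
  mutate(
    # Without a fallback, the unmatched label turns into NA
    short_no_fallback = case_when(
      Measure == "Cholesterol screening among adults aged >=18 Years" ~
        "Cholesterol Screening"
    ),
    # With TRUE ~ Measure, the unmatched label keeps its original text
    short_with_fallback = case_when(
      Measure == "Cholesterol screening among adults aged >=18 Years" ~
        "Cholesterol Screening",
      TRUE ~ Measure
    )
  )

relabelled
```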

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

### Density plot looking at distribution of 3 health-seeking behavior measures

ggplot(my_subset2, aes(x = Data_Value, fill = Measure )) +
  geom_density(alpha = 0.5) +
  scale_fill_viridis_d(option = "plasma") +
  labs(
    title = "Distribution of Health-Seeking Behaviors in Mississippi (Adults >= 18 years)",
    x = "Prevalence (%)",
    y = "Density"
  ) +
  theme_minimal()

The density plot shows that all three health-seeking behaviors are relatively high across Mississippi cities, with most values ranging between about 70% and 85%. Cholesterol screening and doctor visits tend to have slightly higher prevalence, while taking blood pressure medication shows more variation and includes some lower values. Overall, the distributions suggest that while healthcare engagement is generally high, there is still variability across cities.

Boxplot

I also created a boxplot to compare the three distributions more directly.

ggplot(my_subset2, aes(x = Measure, y = Data_Value, fill = Measure)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Comparison of Health-Seeking Behaviors in Mississippi (Adults >= 18 years)",
    x = "",
    y = "Prevalence (%)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The boxplot shows that cholesterol screening and BP medication use have higher median prevalence than routine check-ups. Most values cluster within a narrow range; however, a few lower outliers suggest that some areas may have reduced access to, or lower participation in, certain health-seeking behaviors.
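A numeric summary per measure can back up what the plots show visually. A minimal sketch with `group_by()` and `summarise()`, using a hypothetical `toy` tibble whose values are invented and do not reflect the real subset:

```r
library(dplyr)

# Hypothetical prevalence values for two of the short labels
toy <- tibble(
  Measure = rep(c("Routine Checkup", "BP Medication Use"), each = 4),
  Data_Value = c(70, 72, 74, 76, 60, 78, 80, 82)
)

# Median, spread, and range for each measure
toy_summary <- toy |>
  group_by(Measure) |>
  summarise(
    n = n(),
    median = median(Data_Value),
    iqr = IQR(Data_Value),
    min = min(Data_Value),
    max = max(Data_Value)
  )

toy_summary
```

The same pattern applied to `my_subset2` would give the exact medians and ranges behind the boxplot.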

3. Now create a map of your subsetted dataset.

First map chunk here

leaflet(data = my_subset2) |>
  addTiles() |>
  addCircleMarkers(
    ~long, ~lat,
    radius = 5,
    color = ~colorNumeric("plasma", Data_Value)(Data_Value),
    stroke = FALSE,
    fillOpacity = 0.4,
    popup = ~paste0(
      "<b>City:</b> ", CityName, "<br>",
      "<b>Measure:</b> ", Measure, "<br>",
      "<b>Prevalence:</b> ", round(Data_Value, 1), "%"
    )
  )

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

pal <- colorNumeric("plasma", domain = my_subset2$Data_Value)

leaflet(data = my_subset2) |>
  addTiles() |>
  addCircleMarkers(
    ~long, ~lat,
    radius = ~sqrt(PopulationCount) / 20, 
    color = ~pal(Data_Value),
    stroke = FALSE,
    fillOpacity = 0.4,                    
    popup = ~paste0(
      "<b>City:</b> ", CityName, "<br>",
      "<b>Measure:</b> ", Measure, "<br>",
      "<b>Prevalence:</b> ", round(Data_Value, 1), "%", "<br>",
      "<b>Population:</b> ", PopulationCount
    )
  ) |>
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~Data_Value,
    title = "Prevalence (%)"
  )

I decided to use PopulationCount to scale the size of the points on the map because it provides context about how many people are represented in each location. My map now shows both prevalence (the percentage of people engaging in a health-seeking behavior) and population size (the number of individuals in that location). This makes the map more informative by showing both the intensity of the behavior (color) and the potential public health impact (size).
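The square root in `radius = ~sqrt(PopulationCount) / 20` means the area of each circle, rather than its radius, grows in proportion to population. A minimal sketch of the arithmetic, using hypothetical population values:

```r
# Hypothetical populations, each four times the previous one
population <- c(400, 1600, 6400)

# Same formula as in the map above
radius <- sqrt(population) / 20

# Circle area implied by each radius
area <- pi * radius^2

# Ratios relative to the smallest location
radius / radius[1]
area / area[1]
population / population[1]
```

Because area tracks population, a location with four times as many people is drawn with four times the area but only twice the radius, which keeps large locations from visually overwhelming the map.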

5. Write a paragraph

In a paragraph, describe the plots you created and the insights they show.

For this assignment, I created two plots and a geographic map to explore health-seeking behaviors in Mississippi. The density plot suggests that cholesterol screening and routine check-ups have higher prevalence rates. However, the boxplot shows that BP medication use and cholesterol screening have higher prevalence, between 75% and 85%, than routine check-ups. Overall, while preventive care behaviors are common, there are still differences and variability that may reflect variations by location, income, and other factors. Lower rates of routine check-ups could relate to difficulty accessing healthcare facilities, perhaps for rural populations. However, further analysis would be needed to better understand these specific circumstances.

The map provides additional insights by showing where these behaviors occur geographically. Areas with higher prevalence are shown with warmer colors, and I scaled the size of each point using PopulationCount to reflect how many people are represented in each location. This allows the map to highlight not only where prevalence is high, but also where the population is larger. Most locations show relatively high prevalence (around 70–85%), which is consistent with my earlier plots. However, some locations display lower values, which suggests that other factors are at play. Notably, larger circles are concentrated around urban areas and larger cities such as Jackson and Gulfport, while smaller, more dispersed points perhaps represent rural areas. Overall, the results suggest that although health-seeking behaviors are high across Mississippi, geographic differences may indicate disparities in access to healthcare services and resources across urban and rural communities.