Healthy Cities GIS Assignment

Author

Cody Paulay Simmons

Load the libraries and read in the data

library(tidyverse)
library(tidyr)
library(leaflet)
cities500 <- read_csv("/Users/cody/Downloads/500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
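
To see what the two steps do in isolation, here is a minimal sketch on a made-up two-row tibble. The `toy` data and its coordinates are hypothetical stand-ins for the CDC file; only the `(lat, long)` text format is taken from the dataset.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical rows that mimic the "(lat, long)" text format of GeoLocation
toy <- tibble(
  CityName = c("Honolulu", "Hartford"),
  GeoLocation = c("(21.3069, -157.8583)", "(41.7658, -72.6734)")
)

toy_split <- toy |>
  # Drop the opening and closing parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Without `convert = TRUE` the new columns would stay as character, and the map functions later on need numeric coordinates.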

Filter the dataset

Remove the rows where StateDesc is "United States", keep only crude prevalence values from 2017, restrict to Connecticut (StateAbbr == "CT"), and select Unhealthy Behaviors as the category of interest.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc   CityName   GeographicLevel DataSource Category    
  <dbl> <chr>     <chr>       <chr>      <chr>           <chr>      <chr>       
1  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
2  2017 CT        Connecticut Danbury    City            BRFSS      Unhealthy B…
3  2017 CT        Connecticut Norwalk    Census Tract    BRFSS      Unhealthy B…
4  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
5  2017 CT        Connecticut Hartford   Census Tract    BRFSS      Unhealthy B…
6  2017 CT        Connecticut Waterbury  Census Tract    BRFSS      Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
   Year StateAbbr StateDesc   CityName GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>       <chr>    <chr>           <chr>    <chr>    <chr>  
1  2017 CT        Connecticut Bridgep… Census Tract    Unhealt… 0908000… Obesit…
2  2017 CT        Connecticut Danbury  City            Unhealt… 918430   Obesit…
3  2017 CT        Connecticut Norwalk  Census Tract    Unhealt… 0955990… Obesit…
4  2017 CT        Connecticut Bridgep… Census Tract    Unhealt… 0908000… Curren…
5  2017 CT        Connecticut Hartford Census Tract    Unhealt… 0937000… Obesit…
6  2017 CT        Connecticut Waterbu… Census Tract    Unhealt… 0980000… Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset, latlong_clean2, is now a manageable size.

For your assignment, work with a cleaned dataset.

1. Once you run the above code and learn how to filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations.

Filter for Hawaii

hi <- cities500 |>
  filter(StateAbbr == "HI")

down to 6,721 observations

Filter for unhealthy behaviors

hi_unhbeh <- hi |>
  filter(CategoryID == "UNHBEH")

down to 1,205 observations

Filter for physical inactivity

hi_lpa <- hi_unhbeh |>
  filter(MeasureId == "LPA")

down to 241 observations for physical inactivity in Hawaii. I am actually curious about Maryland, too.

md_lpa <- cities500 |>
  filter(StateAbbr == "MD",
         CategoryID == "UNHBEH",
         MeasureId == "LPA")

I am surprised to see Maryland having fewer observations than Hawaii.
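
One way to compare the two states side by side is to count rows per state in a single call instead of reading the row count off each filtered object. The sketch below uses a small made-up tibble in place of cities500, so the counts it prints are illustrative only and say nothing about the real data.

```r
library(dplyr)

# Hypothetical stand-in for cities500 with only the columns the filter uses
toy <- tibble(
  StateAbbr  = c("HI", "HI", "HI", "MD", "MD", "CT"),
  CategoryID = c("UNHBEH", "UNHBEH", "PREVENT", "UNHBEH", "HLTHOUT", "UNHBEH"),
  MeasureId  = c("LPA", "LPA", "BPMED", "LPA", "ARTHRITIS", "LPA")
)

lpa_counts <- toy |>
  filter(StateAbbr %in% c("HI", "MD"),
         CategoryID == "UNHBEH",
         MeasureId == "LPA") |>
  # One row per state with the number of matching observations
  count(StateAbbr, name = "n_obs")

lpa_counts
```

Swapping `toy` for cities500 should give the real Hawaii and Maryland counts in one table.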

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

Lat and Long values for Honolulu, Hawaii

hi_lon <- -157.8583
hi_lat <- 21.3069

I miss Hawaii

Bar plot of population by unhealthy behaviors in Hawaii

ggplot(hi_unhbeh) +
  geom_bar(aes(x = MeasureId, 
               y = PopulationCount),
           stat = "identity") + #ChatGPT assistance
  theme_minimal() +
  labs(title = "Population Count by Unhealthy Behaviors in Hawaii",
       x = "Measure ID",
       y = "Population Count")

3. Now create a map of your subsetted dataset.

First map chunk here

I don't like this plot at all, but we are getting there.

I have to do something about the population numbers on the y-axis so they display as regular, readable numbers.

Summarize the total population count per MeasureId

hi_summary <- hi_unhbeh |>
  group_by(MeasureId) |>
  summarize(total_population = sum(PopulationCount,
                                   na.rm = TRUE))

Renovated Plot

ggplot(data = hi_summary) +
  geom_bar(aes(x = MeasureId,
               y = total_population),
           stat = "identity") +
  scale_y_continuous(labels = scales::comma) +  # ChatGPT assistance: Makes y-axis readable
  labs(title = "Population Count by Unhealthy Behaviors in Hawaii",
       x = "Measure ID",
       y = "Total Population Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

Something is wrong: all the bars are the same height. Each row in the summary has a different MeasureId, but they all have the same total population count: 2,859,621. That means the same population counts are repeated for every measure, so summing them says nothing about the behaviors themselves. I have to summarize Data_Value instead of PopulationCount.
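
A small made-up example may help show why the bars come out equal. In the hypothetical `toy` tibble below, every location reports the same PopulationCount once per measure, so the per-measure sums are identical, while the per-measure mean of Data_Value still differs. The numbers are invented for illustration.

```r
library(dplyr)

# Hypothetical data: two locations, each reporting the same two measures
toy <- tibble(
  MeasureId       = c("LPA", "OBESITY", "LPA", "OBESITY"),
  PopulationCount = c(1000, 1000, 2500, 2500),
  Data_Value      = c(20.1, 24.3, 18.7, 26.9)
)

toy_summary <- toy |>
  group_by(MeasureId) |>
  summarize(total_population = sum(PopulationCount, na.rm = TRUE),
            avg_value = mean(Data_Value, na.rm = TRUE))

# total_population is the same in every row; avg_value is not
toy_summary
```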

Updated Dataset

This one uses Data_Value, not PopulationCount, and takes the mean of Data_Value for each measure.

hi_summary2 <- hi_unhbeh |>
  group_by(MeasureId) |>
  summarize(avg_value = mean(Data_Value, 
                             na.rm = TRUE))

Rerenovated Plot

ggplot(data = hi_summary2) +
  geom_bar(aes(x = MeasureId,
               y = avg_value),
           stat = "identity") +
  labs(title = "Average Percentage of Unhealthy Behaviors in Hawaii",
       x = "Behaviors",
       y = "Average Percentage %") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) #ChatGPT assistance: hjust = 1 right-aligns the rotated x-axis labels so each one ends at its tick mark

Leaflet Map Plot

New dataset for Leaflet: start from latlong, where GeoLocation is already split into the lat and long columns, and filter for Hawaii and unhealthy behaviors.

hi_latlong <- latlong |>
  filter(StateAbbr == "HI",
         CategoryID == "UNHBEH")
leaflet() |>
  setView(lng = hi_lon,
          lat = hi_lat,
          zoom = 9) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(data = hi_latlong,
             radius = hi_latlong$Data_Value *10,
             color = "skyblue",
             fillColor = "grey",
             fillOpacity = 0.6)
Assuming "long" and "lat" are longitude and latitude, respectively

4. Refine your map to include a mouse-click tooltip

Create a tooltip using paste0

Put the behavior, population, and percentage into the popup box

tooltip_hi <- paste0("<b>Behavior: </b>",
                     hi_latlong$Short_Question_Text,
                     "<br>",
                     "<b>Population: </b>", hi_latlong$PopulationCount,
                     "<br>",
                     "<b>Percentage: </b>",
                     hi_latlong$Data_Value,
                     "<br>")
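
Because paste0 is vectorized, it builds one HTML string per row, which is what lets each circle get its own popup. Here is a minimal sketch with two hypothetical values standing in for the columns of hi_latlong:

```r
# Hypothetical values standing in for columns of hi_latlong
behavior   <- c("Obesity", "Current Smoking")
population <- c(1200, 3400)

# paste0 pairs up the elements position by position
toy_tooltip <- paste0("<b>Behavior: </b>", behavior,
                      "<br>",
                      "<b>Population: </b>", population)

toy_tooltip          # one string per element
length(toy_tooltip)  # same length as the input vectors
```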

Create the leaflet map plot

leaflet() |>
  setView(lng = hi_lon,
          lat = hi_lat,
          zoom = 9) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(data = hi_latlong,
             radius = hi_latlong$Data_Value *10,
             color = "skyblue",
             fillColor = "grey",
             fillOpacity = 0.6,
             popup = tooltip_hi)
Assuming "long" and "lat" are longitude and latitude, respectively
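
The message above means leaflet guessed the coordinate columns from their names. A possible alternative, sketched here on a hypothetical `toy` data frame rather than hi_latlong, is to name the columns explicitly with the formula interface. That avoids relying on the guess, and it lets radius and popup refer to columns without the `$` prefix.

```r
library(leaflet)

# Hypothetical points near Honolulu; not taken from the dataset
toy <- data.frame(
  lat = c(21.31, 21.35),
  long = c(-157.86, -157.90),
  Data_Value = c(20, 35),
  label = c("<b>Behavior: </b>Obesity", "<b>Behavior: </b>Current Smoking")
)

toy_map <- leaflet(data = toy) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(lng = ~long, lat = ~lat,    # name the coordinate columns explicitly
             radius = ~Data_Value * 10,   # radius is in meters
             popup = ~label)

toy_map
```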

5. Write a paragraph

I created a Leaflet map focusing on unhealthy behaviors in Hawaii, filtered from the 500 Cities dataset. After cleaning the GeoLocation variable and separating it into latitude and longitude, I filtered the data to include only entries from Hawaii with CategoryID “UNHBEH” (Unhealthy Behaviors). The map draws a circle at each Hawaii location where data is available, with circle sizes scaled by the data value for each behavior. The tooltips show the behavior type, population count, and prevalence percentage. This interactive map helps visualize unhealthy behaviors in Hawaii, making it easier to identify patterns and compare areas of higher or lower concern.