Healthy Cities GIS Assignment

Author

Your Name

Load the libraries and set the working directory

library(tidyverse)  # attaches tidyr, readr, dplyr, stringr, and ggplot2
library(webshot2)
library(leaflet)
setwd("C:/Users/MCuser/Downloads")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
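
As a minimal sketch of what the split does, the same two steps can be run on a tiny hand-made table. The object names `toy` and `toy_split` and the coordinate values are illustrative assumptions, not rows from the CDC file.

```r
library(tidyverse)

# Hypothetical toy rows that mimic the "(lat, long)" text format
toy <- tibble(
  CityName = c("Hartford", "Stamford"),
  GeoLocation = c("(41.77, -72.68)", "(41.05, -73.54)")
)

toy_split <- toy |>
  # Step 1: strip the opening and closing parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Step 2: split on the comma; convert = TRUE turns the text into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

With `convert = TRUE`, `lat` and `long` come back as numeric columns, which is what leaflet needs later on.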

Filter the dataset

Remove the rows where StateDesc is "United States", keep only crude prevalence values, keep only 2017, restrict the data to Connecticut, and select Unhealthy Behaviors as the category of interest.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc   CityName   GeographicLevel DataSource Category    
  <dbl> <chr>     <chr>       <chr>      <chr>           <chr>      <chr>       
1  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
2  2017 CT        Connecticut Danbury    City            BRFSS      Unhealthy B…
3  2017 CT        Connecticut Norwalk    Census Tract    BRFSS      Unhealthy B…
4  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
5  2017 CT        Connecticut Hartford   Census Tract    BRFSS      Unhealthy B…
6  2017 CT        Connecticut Waterbury  Census Tract    BRFSS      Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
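
The chained `filter()` calls above could also be written as one call, since conditions separated by commas are combined with AND. The sketch below shows this on a hypothetical five-row table; `toy` and `toy_clean` are made-up names and the rows are invented for illustration.

```r
library(tidyverse)

# Hypothetical rows covering each condition that the filter removes
toy <- tribble(
  ~StateDesc,      ~StateAbbr, ~Year, ~Data_Value_Type,          ~Category,
  "United States", "US",       2017,  "Crude prevalence",        "Unhealthy Behaviors",
  "Connecticut",   "CT",       2017,  "Crude prevalence",        "Unhealthy Behaviors",
  "Connecticut",   "CT",       2017,  "Age-adjusted prevalence", "Unhealthy Behaviors",
  "Connecticut",   "CT",       2016,  "Crude prevalence",        "Prevention",
  "California",    "CA",       2017,  "Crude prevalence",        "Unhealthy Behaviors"
)

# One filter() call; a row must satisfy every condition to be kept
toy_clean <- toy |>
  filter(
    StateDesc != "United States",
    Data_Value_Type == "Crude prevalence",
    Year == 2017,
    StateAbbr == "CT",
    Category == "Unhealthy Behaviors"
  )

toy_clean
```

Either style gives the same rows; separate calls can be easier to comment out one at a time while exploring.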

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       
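
One possible way to decide which variables can be removed is to look for columns that hold a single value in every row, since they add nothing once the data are filtered. This is only a sketch on a hypothetical table (`toy` and `constant_cols` are made-up names), not the method used in the assignment.

```r
library(tidyverse)

# Hypothetical table: Year and DataSource never change, the rest do
toy <- tibble(
  Year = c(2017, 2017, 2017),
  DataSource = c("BRFSS", "BRFSS", "BRFSS"),
  CityName = c("Hartford", "Danbury", "Norwalk"),
  Data_Value = c(20.1, 15.3, 13.8)
)

constant_cols <- toy |>
  # Count the distinct values in every column
  summarise(across(everything(), n_distinct)) |>
  # Reshape to one row per column name
  pivot_longer(everything(), names_to = "column", values_to = "n_unique") |>
  # Keep the columns that never vary
  filter(n_unique == 1)

constant_cols
```

A constant column is a candidate for removal, not an automatic one: `Year`, for example, may still be worth keeping for plot titles.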

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
   Year StateAbbr StateDesc   CityName GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>       <chr>    <chr>           <chr>    <chr>    <chr>  
1  2017 CT        Connecticut Bridgep… Census Tract    Unhealt… 0908000… Obesit…
2  2017 CT        Connecticut Danbury  City            Unhealt… 918430   Obesit…
3  2017 CT        Connecticut Norwalk  Census Tract    Unhealt… 0955990… Obesit…
4  2017 CT        Connecticut Bridgep… Census Tract    Unhealt… 0908000… Curren…
5  2017 CT        Connecticut Hartford Census Tract    Unhealt… 0937000… Obesit…
6  2017 CT        Connecticut Waterbu… Census Tract    Unhealt… 0980000… Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset, latlong_clean2, is now a manageable size.

For your assignment, work with a cleaned dataset.

1. Once you run the above code and learn how to filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations.

Filter chunk here (you may need multiple chunks)

# Keep only the 'Current Smoking' measure for Connecticut
subset_data <- latlong_clean2 %>%
  filter(StateAbbr == "CT" & Short_Question_Text == "Current Smoking")

subset_data
# A tibble: 228 × 18
    Year StateAbbr StateDesc  CityName GeographicLevel Category UniqueID Measure
   <dbl> <chr>     <chr>      <chr>    <chr>           <chr>    <chr>    <chr>  
 1  2017 CT        Connectic… Bridgep… Census Tract    Unhealt… 0908000… Curren…
 2  2017 CT        Connectic… Waterbu… Census Tract    Unhealt… 0980000… Curren…
 3  2017 CT        Connectic… New Bri… Census Tract    Unhealt… 0950370… Curren…
 4  2017 CT        Connectic… Bridgep… Census Tract    Unhealt… 0908000… Curren…
 5  2017 CT        Connectic… Hartford Census Tract    Unhealt… 0937000… Curren…
 6  2017 CT        Connectic… Waterbu… Census Tract    Unhealt… 0980000… Curren…
 7  2017 CT        Connectic… Waterbu… Census Tract    Unhealt… 0980000… Curren…
 8  2017 CT        Connectic… Waterbu… Census Tract    Unhealt… 0980000… Curren…
 9  2017 CT        Connectic… Stamford Census Tract    Unhealt… 0973000… Curren…
10  2017 CT        Connectic… Bridgep… Census Tract    Unhealt… 0908000… Curren…
# ℹ 218 more rows
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

# Census tracts only
tract_only <- subset_data %>%
  filter(GeographicLevel == "Census Tract")

# Cities only
city_only <- subset_data %>%
  filter(GeographicLevel == "City")
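
A quick check that a subset meets the 900-observation limit, and how its rows divide between the two geographic levels, might look like the sketch below. The table `toy_subset` and its values are hypothetical stand-ins for `subset_data`.

```r
library(tidyverse)

# Hypothetical stand-in for subset_data
toy_subset <- tibble(
  GeographicLevel = c("Census Tract", "Census Tract", "City"),
  Data_Value = c(21.4, 18.9, 17.2)
)

# Stops with an error if the subset is larger than the assignment allows
stopifnot(nrow(toy_subset) <= 900)

# Number of rows at each geographic level
level_counts <- count(toy_subset, GeographicLevel)
level_counts
```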

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

# Create a pastel-colored dot plot for smoking prevalence in CT cities
smoking_plot <- ggplot(city_only, aes(x = CityName, y = Data_Value)) +
  geom_point(shape = 17, size = 3, color = "#A3E4D7") +  # pastel teal
  labs(
    x = "City Name",
    y = "Smoking Rate (%)",
    title = "Adult Smoking Prevalence by City in Connecticut (2017)",
    subtitle = "Each point represents the crude prevalence of smoking (≥18 years) by city."
  ) +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "#f7f7f7"),
    axis.title = element_text(face = 2),
    legend.position = "none",  # legend not needed with single color
    panel.grid = element_line(color = "lightgrey"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

smoking_plot
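
By default the cities appear on the x-axis in alphabetical order. A possible refinement, sketched here on hypothetical values, is to sort them by smoking rate with `fct_reorder()` from forcats (attached by tidyverse) before plotting. The names `toy_cities` and `toy_sorted` and the rates are assumptions for illustration.

```r
library(tidyverse)

# Hypothetical city-level rates, not values from the dataset
toy_cities <- tibble(
  CityName = c("Stamford", "Hartford", "Danbury"),
  Data_Value = c(12, 22, 15)
)

# Turn CityName into a factor whose levels are ordered by Data_Value
toy_sorted <- toy_cities |>
  mutate(CityName = fct_reorder(CityName, Data_Value))

# Same dot plot idea as above, now with cities sorted from low to high
ggplot(toy_sorted, aes(x = CityName, y = Data_Value)) +
  geom_point(shape = 17, size = 3) +
  labs(x = "City Name", y = "Smoking Rate (%)")
```

Sorting makes the highest and lowest cities easier to spot than an alphabetical axis does.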


### 3. Now create a map of your subsetted dataset.

First map chunk here

::: {.cell}

```{.r .cell-code}
leaflet() |>
  setView(lng = -72.7, lat = 41.6, zoom = 7) |>  # Centered on Connecticut
  addProviderTiles("OpenStreetMap") |>           # Or use "CartoDB.Positron" for a light theme
  addCircles(
    data = tract_only,
    lng = ~long,
    lat = ~lat,
    radius = ~sqrt(10^(Data_Value / 22)) * 5,    # addCircles() radius is in meters
    color = "blue"
  )
```

:::


### 4. Refine your map to include a mouse-click tooltip

Refined map chunk here

::: {.cell}

```{.r .cell-code}
# Create popup content
popup_content <- paste(
  "City: ", tract_only$CityName, "<br>",
  "Smoking Rate: ", round(tract_only$Data_Value, 1), "%"
)

# Create the map with tooltips
leaflet() |>
  setView(lng = -72.7, lat = 41.6, zoom = 7) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = tract_only,
    lng = ~long,
    lat = ~lat,
    radius = tract_only$Data_Value * 100,
    popup = popup_content,
    color = "#A3E4D7",
    fillColor = "#A3E4D7",
    fillOpacity = 0.6
  )
```

:::

5. Write a paragraph

The plots I created display current smoking prevalence among adults aged 18 and older across cities and census tracts in Connecticut for 2017. The first visualization is a dot plot of the smoking rate by city: each point represents one city, plotted against the percentage of adults who currently smoke. A single pastel color keeps the chart simple and makes it easy to compare smoking rates across cities. The plot highlights variation among cities and suggests that some, like Bridgeport and Hartford, may have higher smoking rates than others, such as Stamford or Norwalk.

The second visualization is an interactive map created with the leaflet package. It gives a geographic view of smoking prevalence at the census tract level. Each circle is centered on a tract’s coordinates, and its radius is scaled according to the smoking rate in that tract. When a user clicks a circle, a popup displays the city name and smoking percentage, adding context and interactivity. The map lets viewers explore spatial disparities in health behavior. Together, these plots illustrate how smoking prevalence varies not only by city but also within smaller geographic areas, which supports more targeted, location-specific public health interventions.
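
To see how the two radius formulas used in the maps scale with the smoking rate, they can be compared on a few example rates. This is a sketch in base R: the rates in `rates` are hypothetical, and it relies on `addCircles()` taking its radius in meters.

```r
# Hypothetical smoking rates in percent
rates <- c(10, 20, 30)

# Formula from the first map (adapted from the earthquake tutorial)
radius_first_map <- sqrt(10^(rates / 22)) * 5

# Formula from the refined map
radius_refined_map <- rates * 100

# Side-by-side comparison, in meters
radius_table <- data.frame(
  rate_percent = rates,
  first_map_m = round(radius_first_map, 1),
  refined_map_m = radius_refined_map
)
radius_table
```

For these rates the first formula gives radii of a few tens of meters at most, which are likely hard to see at zoom level 7. The refined formula gives radii of 1 to 3 km and grows linearly with the rate.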