Healthy Cities GIS Assignment

Author

Paul Daniel-Orie

Load the libraries and set the working directory

library(leaflet)
library(tidyverse)
library(tidyr)
setwd("C:/Users/Owner/OneDrive/Desktop/Data110")
# read_csv() already creates the cities500 object, so no data() call is needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
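To see what the split does on its own, here is a minimal base-R sketch of the same idea. It is not the tidyr pipeline used above, and the coordinate strings are made up for illustration.

```r
# Toy input mimicking the "(lat, long)" text format; the values are made up.
geo <- c("(33.91, -118.35)", "(37.63, -122.10)")

# Strip the parentheses, then split each string on the comma.
stripped <- gsub("[()]", "", geo)
parts <- strsplit(stripped, ",")

# as.numeric() tolerates the leading space left in front of the longitude.
lat  <- as.numeric(sapply(parts, `[`, 1))
long <- as.numeric(sapply(parts, `[`, 2))

coords <- data.frame(lat = lat, long = long)
print(coords)
```

Both columns come out numeric, which is what convert = TRUE does in the separate() call.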

Filter the dataset

Remove the rows where StateDesc is "United States", drop the census-tract-level rows, keep only the crude prevalence measures, and keep only 2017.

latlong_clean <- latlong |> 
  filter(StateDesc != "United States") |> 
  filter(GeographicLevel != "Census Tract") |> 
  filter(Data_Value_Type == "Crude prevalence") |> 
  filter(Year == 2017)

# Convert variable names to lowercase
names(latlong_clean) <- tolower(names(latlong_clean))

# Display the first few rows
head(latlong_clean)

# A tibble: 6 × 25
   year stateabbr statedesc  cityname   geographiclevel datasource category     
  <dbl> <chr>     <chr>      <chr>      <chr>           <chr>      <chr>        
1  2017 CA        California Hawthorne  City            BRFSS      Unhealthy Be…
2  2017 CA        California Hayward    City            BRFSS      Unhealthy Be…
3  2017 CA        California Lakewood   City            BRFSS      Unhealthy Be…
4  2017 CA        California Livermore  City            BRFSS      Health Outco…
5  2017 AL        Alabama    Hoover     City            BRFSS      Health Outco…
6  2017 AL        Alabama    Huntsville City            BRFSS      Health Outco…
# ℹ 18 more variables: uniqueid <chr>, measure <chr>, data_value_unit <chr>,
#   datavaluetypeid <chr>, data_value_type <chr>, data_value <dbl>,
#   low_confidence_limit <dbl>, high_confidence_limit <dbl>,
#   data_value_footnote_symbol <chr>, data_value_footnote <chr>,
#   populationcount <dbl>, lat <dbl>, long <dbl>, categoryid <chr>,
#   measureid <chr>, cityfips <dbl>, tractfips <dbl>, short_question_text <chr>

What variables are included? (can any of them be removed?)

names(latlong_clean)

 [1] "year"                       "stateabbr"                 
 [3] "statedesc"                  "cityname"                  
 [5] "geographiclevel"            "datasource"                
 [7] "category"                   "uniqueid"                  
 [9] "measure"                    "data_value_unit"           
[11] "datavaluetypeid"            "data_value_type"           
[13] "data_value"                 "low_confidence_limit"      
[15] "high_confidence_limit"      "data_value_footnote_symbol"
[17] "data_value_footnote"        "populationcount"           
[19] "lat"                        "long"                      
[21] "categoryid"                 "measureid"                 
[23] "cityfips"                   "tractfips"                 
[25] "short_question_text"

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(
    -datasource, -data_value_unit, -datavaluetypeid,
    -low_confidence_limit, -high_confidence_limit,
    -data_value_footnote_symbol, -data_value_footnote
  )
head(latlong_clean2)

# A tibble: 6 × 18
   year stateabbr statedesc  cityname  geographiclevel category uniqueid measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 CA        California Hawthorne City            Unhealt… 632548   Curren…
2  2017 CA        California Hayward   City            Unhealt… 633000   Obesit…
3  2017 CA        California Lakewood  City            Unhealt… 639892   Obesit…
4  2017 CA        California Livermore City            Health … 641992   Curren…
5  2017 AL        Alabama    Hoover    City            Health … 135896   Chroni…
6  2017 AL        Alabama    Huntsvil… City            Health … 137000   Corona…
# ℹ 10 more variables: data_value_type <chr>, data_value <dbl>,
#   populationcount <dbl>, lat <dbl>, long <dbl>, categoryid <chr>,
#   measureid <chr>, cityfips <dbl>, tractfips <dbl>, short_question_text <chr>

The new dataset, latlong_clean2, is now a manageable size.

For your assignment, work with a cleaned dataset.

1. Once you run the above code and learn how to filter in this format, filter this dataset however you choose so that you have a subset with no more than 900 observations.

Filter chunk here

latlong_clean3 <- latlong_clean2 |>
  filter(category == "Prevention")
# Display the first few rows
head(latlong_clean3)

# A tibble: 6 × 18
   year stateabbr statedesc  cityname  geographiclevel category uniqueid measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 AL        Alabama    Montgome… City            Prevent… 151000   "Chole…
2  2017 CA        California Concord   City            Prevent… 616000   "Visit…
3  2017 CA        California Concord   City            Prevent… 616000   "Chole…
4  2017 CA        California Fontana   City            Prevent… 624680   "Visit…
5  2017 FL        Florida    Palm Coa… City            Prevent… 1254200  "Curre…
6  2017 FL        Florida    Tampa     City            Prevent… 1271000  "Chole…
# ℹ 10 more variables: data_value_type <chr>, data_value <dbl>,
#   populationcount <dbl>, lat <dbl>, long <dbl>, categoryid <chr>,
#   measureid <chr>, cityfips <dbl>, tractfips <dbl>, short_question_text <chr>

Keep only the rows for the state of California.

latlong_clean3_CA <- latlong_clean3|>
  filter(stateabbr== "CA")
head(latlong_clean3_CA)

# A tibble: 6 × 18
   year stateabbr statedesc  cityname  geographiclevel category uniqueid measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 CA        California Concord   City            Prevent… 616000   "Visit…
2  2017 CA        California Concord   City            Prevent… 616000   "Chole…
3  2017 CA        California Fontana   City            Prevent… 624680   "Visit…
4  2017 CA        California Stockton  City            Prevent… 675000   "Visit…
5  2017 CA        California Vacaville City            Prevent… 681554   "Curre…
6  2017 CA        California Alhambra  City            Prevent… 600884   "Curre…
# ℹ 10 more variables: data_value_type <chr>, data_value <dbl>,
#   populationcount <dbl>, lat <dbl>, long <dbl>, categoryid <chr>,
#   measureid <chr>, cityfips <dbl>, tractfips <dbl>, short_question_text <chr>
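The assignment caps the subset at 900 observations, so it may be worth checking the row count explicitly. The sketch below uses a small made-up data frame, subset_df, as a stand-in for latlong_clean3_CA.

```r
# Hypothetical stand-in for latlong_clean3_CA; the real object comes from the filters above.
subset_df <- data.frame(
  stateabbr  = c("CA", "CA", "CA"),
  category   = rep("Prevention", 3),
  data_value = c(70.1, 81.3, 15.2)
)

max_rows <- 900  # limit stated in the assignment
n <- nrow(subset_df)

# stopifnot() halts with an error if the subset is too large.
stopifnot(n <= max_rows)
message("Subset has ", n, " rows (limit: ", max_rows, ")")
```

Running the same check with latlong_clean3_CA in place of subset_df would confirm that the real subset meets the requirement.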

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

ggplot(latlong_clean3_CA, aes(x = data_value, fill = short_question_text)) +
  geom_density(alpha = 0.5) +
  labs(
    title = "Density Distribution of \nCrude Prevalence by Prevention",
    x = "Crude Prevalence",
    fill = "Prevention measure",  # the fill maps to short_question_text, not category
    caption = "Source: cdc.gov/places"
  ) +
  theme_minimal()

Set the lat and long values for California

lat is + : north of the equator
lat is - : south of the equator
long is + : east of the prime meridian
long is - : west of the prime meridian

First map chunk here

california_lon <- -119.417931
california_lat <- 36.778259
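The sign conventions can be checked against these coordinates with a small helper. The function name hemisphere is hypothetical and not part of any package.

```r
# Hypothetical helper: describe which hemispheres a coordinate falls in.
hemisphere <- function(lat, long) {
  ns <- ifelse(lat >= 0, "north", "south")
  ew <- ifelse(long >= 0, "east", "west")
  paste0(ns, " of the equator, ", ew, " of the prime meridian")
}

# California's center: positive latitude, negative longitude.
print(hemisphere(36.778259, -119.417931))
```

A positive latitude with a negative longitude places California north of the equator and west of the prime meridian.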

Create a popup using paste0

Create a line break using <br>
Surrounding text with <b> and </b> makes it bold

popupplot <- paste0(
  "<b>cityname: </b>", latlong_clean3_CA$cityname, "<br>",
  "<b>data_value: </b>", latlong_clean3_CA$data_value, "<br>",
  "<b>measureid: </b>", latlong_clean3_CA$measureid, "<br>",
  # the label needs its closing </b> tag, like the lines above
  "<b>short_question_text: </b>", latlong_clean3_CA$short_question_text, "<br>"
)
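Because paste0() is vectorized, it builds one popup string per row of the data. A minimal sketch with two made-up rows (the toy data frame and its values are illustrative only):

```r
# Toy rows standing in for latlong_clean3_CA; the values are made up.
toy <- data.frame(
  cityname   = c("Concord", "Fontana"),
  data_value = c(70.5, 66.2)
)

# paste0() recycles the fixed HTML pieces across the column values,
# so the result has one element per row.
popup_demo <- paste0(
  "<b>City: </b>", toy$cityname, "<br>",
  "<b>Value: </b>", toy$data_value
)
print(popup_demo)
```

This is why the popup vector can be passed directly to a leaflet layer: element i is shown for the marker built from row i.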

3. Now create a map of your subsetted dataset.

leaflet() |>
  # center the view on the California coordinates defined earlier
  setView(lng = california_lon, lat = california_lat, zoom = 6) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = latlong_clean3_CA,
    lng = ~long,
    lat = ~lat,
    radius = 500, # Fixed radius in meters
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.3,
    popup = popupplot
  )

5. Write a paragraph

In a paragraph, describe the plots you created and what they show.

Density Distribution of Crude Prevalence by Prevention – California Focus

I created a density plot to illustrate the distribution of crude prevalence across the preventive health measures in California. To ensure I captured fewer than 900 observations, I filtered out census tract data, kept only the Prevention category, and focused specifically on data for California.

Crude prevalence refers to the overall proportion of a population that has a specific condition or characteristic at a given point in time, without adjusting for age or other population characteristics.
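As a worked illustration of the definition, crude prevalence is simply the share of people with the characteristic, expressed as a percentage. The counts below are hypothetical, and this only illustrates the definition, not how the values in the dataset were produced.

```r
# Hypothetical counts, for illustration only.
adults_total      <- 2000
adults_with_trait <- 1400

# Crude prevalence as a percentage of the whole population, with no adjustment.
crude_prevalence <- 100 * adults_with_trait / adults_total
print(crude_prevalence)
```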

This density plot visualizes the distribution for four preventive health measures:

Annual Checkup

Cholesterol Screening

Health Insurance

Taking Blood Pressure (BP) Medication

Each density curve shows how the prevalence values are distributed, indicating both the concentration of observations and the variability within each measure.
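The concentration and variability that the curves show can also be put into numbers. A rough base-R sketch, using made-up values for two of the measures, summarizes each with a median (roughly where a curve is centered) and an interquartile range (roughly how wide it is):

```r
# Made-up values for two measures; the real data would be latlong_clean3_CA.
toy <- data.frame(
  short_question_text = rep(c("Annual Checkup", "Health Insurance"), each = 4),
  data_value          = c(64, 66, 67, 69, 10, 14, 18, 22)
)

# One summary value per measure: tapply() splits data_value by group.
centers <- tapply(toy$data_value, toy$short_question_text, median)
spreads <- tapply(toy$data_value, toy$short_question_text, IQR)

print(data.frame(median = centers, iqr = spreads))
```

A narrow peak in the density plot corresponds to a small interquartile range, and a broad curve to a large one.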

Insights from the Plot:

Cholesterol Screening (Green) has the highest crude prevalence, centered around 80%, and displays a relatively broad distribution, suggesting it’s a common and widely accessed preventive service.

Annual Checkup (Pink) and Taking BP Medication (Purple) both center around 65–70%, with Annual Checkup showing a narrower peak, implying more consistent participation across the population.

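Health Insurance (Blue) displays a much lower and broader distribution, peaking between 10–20%. This is probably expected rather than anomalous: in the 500 Cities data this measure records the current lack of health insurance among adults aged 18–64, so lower values indicate better coverage, and the broad spread may reflect regional disparities.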

Overall, this plot provides valuable insights into how preventive health behaviors are adopted across California, highlighting potential areas for public health intervention or resource allocation.