Healthy Cities GIS Assignment

Author

Duchelle K

Load the libraries and set the working directory

library(tidyverse)
library(tidyr)
setwd("C:/Users/User/Downloads/Data 110 Projects and Assignments")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
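
As a small illustration of the split, the same two steps can be run on a toy tibble. The coordinates and the names `toy` and `toy_split` are made up for this sketch and are not taken from the CDC file.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical two-row example in the same "(lat, long)" text format
toy <- tibble(GeoLocation = c("(33.91, -118.35)", "(37.63, -122.10)"))

toy_split <- toy |>
  # strip the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Without convert = TRUE, lat and long would stay as character columns and could not be used directly as map coordinates.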

Filter the dataset

Exclude the rows where StateDesc is "United States", keep only the Prevention category (the category of interest), keep only crude prevalence values, and keep only the year 2017.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Prevention") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName   GeographicLevel DataSource Category  
  <dbl> <chr>     <chr>      <chr>      <chr>           <chr>      <chr>     
1  2017 AL        Alabama    Montgomery City            BRFSS      Prevention
2  2017 CA        California Concord    City            BRFSS      Prevention
3  2017 CA        California Concord    City            BRFSS      Prevention
4  2017 CA        California Fontana    City            BRFSS      Prevention
5  2017 CA        California Richmond   Census Tract    BRFSS      Prevention
6  2017 FL        Florida    Davie      Census Tract    BRFSS      Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

prevention <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
   Year StateAbbr StateDesc  CityName  GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 AL        Alabama    Montgome… City            Prevent… 151000   Choles…
2  2017 CA        California Concord   City            Prevent… 616000   Visits…
3  2017 CA        California Concord   City            Prevent… 616000   Choles…
4  2017 CA        California Fontana   City            Prevent… 624680   Visits…
5  2017 CA        California Richmond  Census Tract    Prevent… 0660620… Choles…
6  2017 FL        Florida    Davie     Census Tract    Prevent… 1216475… Choles…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
md <- prevention |>
  filter(StateAbbr=="MD")
head(md)
# A tibble: 6 × 18
   Year StateAbbr StateDesc CityName  GeographicLevel Category  UniqueID Measure
  <dbl> <chr>     <chr>     <chr>     <chr>           <chr>     <chr>    <chr>  
1  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Chole…
2  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Visit…
3  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Visit…
4  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Curre…
5  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Curre…
6  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Visit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new “prevention” dataset is now a manageable size.

For your assignment, work with the cleaned “Prevention” dataset

1. Once you run the above code, filter this dataset one more time for any particular subset.

Filter chunk here

unique(latlong_clean$StateAbbr)
 [1] "AL" "CA" "FL" "CT" "IL" "MN" "NY" "PA" "NC" "OH" "OK" "OR" "TX" "RI" "SC"
[16] "SD" "TN" "UT" "VA" "WA" "AK" "WI" "AZ" "AR" "CO" "DE" "NV" "DC" "GA" "ID"
[31] "HI" "MA" "MI" "IN" "KS" "KY" "IA" "LA" "MD" "ME" "NH" "NJ" "NM" "MO" "MS"
[46] "NE" "MT" "ND" "WV" "VT" "WY"

Filter the lack-of-insurance measure (MeasureId "ACCESS2") for the five most populous cities in California.

The five most populous cities in California are: Los Angeles, San Diego, San Jose, San Francisco, and Fresno.

CA_no_insurance <- prevention |>
  filter(StateAbbr == "CA") |>
  # %in% tests each row against the full set of city names
  filter(CityName %in% c("Los Angeles", "San Diego", "San Jose", "San Francisco", "Fresno")) |>
  filter(MeasureId == "ACCESS2")
head(CA_no_insurance)
# A tibble: 6 × 18
   Year StateAbbr StateDesc  CityName GeographicLevel Category  UniqueID Measure
  <dbl> <chr>     <chr>      <chr>    <chr>           <chr>     <chr>    <chr>  
1  2017 CA        California Fresno   Census Tract    Preventi… 0627000… "Curre…
2  2017 CA        California Fresno   Census Tract    Preventi… 0627000… "Curre…
3  2017 CA        California Fresno   Census Tract    Preventi… 0627000… "Curre…
4  2017 CA        California Fresno   Census Tract    Preventi… 0627000… "Curre…
5  2017 CA        California Fresno   Census Tract    Preventi… 0627000… "Curre…
6  2017 CA        California Fresno   Census Tract    Preventi… 0627000… "Curre…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
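
When a column is matched against several city names, `==` and `%in%` behave differently. The sketch below uses a made-up city vector (`cities` is hypothetical, not the CDC data) to show that `==` recycles the shorter vector and compares position by position, while `%in%` checks each row against the whole set.

```r
# Hypothetical city column with six rows
cities <- c("Fresno", "Los Angeles", "Fresno", "San Diego", "Fresno", "Oakland")
top5 <- c("Los Angeles", "San Diego", "San Jose", "San Francisco", "Fresno")

# `==` recycles top5 and compares element by element, so a row matches
# only when its city happens to line up with the recycled vector
matched_eq <- suppressWarnings(cities == top5)

# `%in%` tests each row against the full set of names
matched_in <- cities %in% top5

sum(matched_eq)  # 1
sum(matched_in)  # 5
```

With `==`, four of the five rows that belong to a listed city are dropped silently, which is why `%in%` is the safer choice inside `filter()`.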

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

plot1 <- ggplot(CA_no_insurance, aes(x = PopulationCount, y = Data_Value, color = CityName)) +
  # one jittered layer only; geom_point() plus geom_jitter() would draw every point twice
  geom_jitter(alpha = 0.5) +
  scale_color_viridis_d() +
  facet_wrap(~CityName) +
  labs(title = "Proportion of Population Without Health Insurance",
       subtitle = "California top 5 cities",
       x = "Population Count",
       y = "Percentage without Insurance",
       color = "City Name",
       caption = "Source: Centers for Disease Control and Prevention") +
  theme_bw()

plot1
Warning: Removed 9 rows containing missing values or values outside the scale range
(`geom_point()`).

3. Now create a map of your subsetted dataset.

First map chunk here

# Load the libraries
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.1
library(sf)
Warning: package 'sf' was built under R version 4.4.1
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(knitr)

# Create the map
leaflet() |>
  setView(lng = -118.2437, lat = 34.0522, zoom = 12) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(data = CA_no_insurance,
             radius = (CA_no_insurance$PopulationCount)/10,
             color = "green",
             fillColor = "grey",
             fillOpacity = 0.25)
Assuming "long" and "lat" are longitude and latitude, respectively

4. Refine your map to include a mouseover tooltip

Refined map chunk here

# Create a popup string for an interactive map
mappopup <- paste0("<b>Year:</b> ", CA_no_insurance$Year, "<br>",
                   "<b>Geographic Level:</b> ", CA_no_insurance$GeographicLevel, "<br>",
                   "<b>City Name:</b> ", CA_no_insurance$CityName, "<br>",
                   "<b>Population Count:</b> ", CA_no_insurance$PopulationCount, "<br>",
                   "<b>Lack of Insurance Rate:</b> ", CA_no_insurance$Data_Value, "<br>")
# Include the mouseover tooltip on the map
leaflet() |>
  setView(lng = -118.2437, lat = 34.0522, zoom = 12) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(data = CA_no_insurance,
             radius = (CA_no_insurance$PopulationCount)/10,
             color = "green",
             fillColor = "grey",
             fillOpacity = 0.25,
             popup = mappopup)
Assuming "long" and "lat" are longitude and latitude, respectively
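
In leaflet, a `popup` opens when a circle is clicked, while a `label` appears when the mouse hovers over it. The sketch below shows both on a toy data frame; `toy_pts`, its coordinates, and its values are illustrative assumptions, not rows from the CDC data.

```r
library(leaflet)

# Hypothetical points; coordinates and rates are illustrative only
toy_pts <- data.frame(
  CityName = c("Los Angeles", "Fresno"),
  lat = c(34.05, 36.74),
  long = c(-118.24, -119.79),
  Data_Value = c(20.1, 18.4)
)

hover_map <- leaflet(toy_pts) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(lng = ~long, lat = ~lat, radius = 5000,
             # label: shown on hover
             label = ~paste0(CityName, ": ", Data_Value, "% uninsured"),
             # popup: shown on click
             popup = ~CityName)

hover_map
```

Labels are displayed as plain text by default, so a short one-line summary tends to work better there than the HTML string used for the popup.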

5. Write a paragraph

The first visualization is a scatterplot of the percentage of the population without health insurance in the five most populous cities in California: Fresno, Los Angeles, San Diego, San Francisco, and San Jose. The plot is divided into five panels, one per city, with the population count on the x-axis and the percentage of the population without health insurance on the y-axis. Color also distinguishes the cities. The plot is designed to show how the uninsured rate varies with population count. A larger population can mean a higher absolute number of uninsured individuals, but it does not necessarily mean a higher percentage, because the percentage is measured relative to each population count.

There is noticeable variability in the percentage without health insurance within each city. In Fresno, the points are spread across a wide range of population counts, with uninsured percentages from about 5% to 35%. In Los Angeles, most population counts cluster around 2,500 to 5,000, with uninsured percentages mainly between 5% and 35%. In San Diego, population counts range from about 2,500 to 10,000, and the uninsured percentage is similarly spread out, from about 5% to 35%. In San Francisco, by contrast, population counts range from about 2,500 to 7,500 and the uninsured percentage is generally between 5% and 20%. In San Jose, population counts cover a wide range overall but mostly cluster around 2,500 to 5,000, and the uninsured percentage mainly ranges from 5% to 25%. Los Angeles and San Jose show tighter clusters of population counts, which indicates more uniform population sizes among their data points.
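
The ranges described above are read off the plot by eye. A grouped summary could confirm them numerically. The sketch below runs on a toy tibble (`toy_rates` and its values are hypothetical, not CDC data); the same pipeline could be applied to CA_no_insurance.

```r
library(dplyr)

# Hypothetical rows; the values are illustrative, not CDC data
toy_rates <- tibble(
  CityName = c("Fresno", "Fresno", "San Francisco", "San Francisco"),
  PopulationCount = c(3200, 6100, 2800, 7400),
  Data_Value = c(8.5, 31.2, 5.9, 18.3)
)

city_ranges <- toy_rates |>
  group_by(CityName) |>
  summarise(
    min_rate = min(Data_Value, na.rm = TRUE),          # lowest uninsured percentage
    max_rate = max(Data_Value, na.rm = TRUE),          # highest uninsured percentage
    median_pop = median(PopulationCount, na.rm = TRUE) # typical population count
  )

city_ranges
```

The na.rm = TRUE arguments matter on the real data, since the plot warning shows that some rows have missing values.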

The map gives a geographical view of the population and health insurance data. It is initially centered on Los Angeles and uses Esri WorldStreetMap as the base layer. Each circle marks a location, and its radius is proportional to the population count, which makes it easy to compare population sizes across areas and to see which areas are more populous. Each circle has a green border and a semi-transparent grey fill, which keeps overlapping circles readable. When clicked, a circle displays the year, geographic level, city name, population count, and lack-of-insurance rate. The map therefore shows both population distribution and health insurance coverage, making it easy to identify populous areas and areas with higher or lower rates of uninsured individuals. Because the data are filtered to 2017, the year in the popup serves as context rather than showing a trend over time, while the geographic level indicates whether a record describes a city or a census tract. Together, these details make the map a useful tool for public health analysis and policy-making.