Healthy Cities GIS Assignment

Author

Nadia O.

Load the libraries and set the working directory

library(tidyverse)
library(tidyr)
library(leaflet)
setwd("C:/Users/Administrator/OneDrive - montgomerycollege.edu/DATA 110")
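# Read the CDC 500 Cities CSV from the working directory.
# read_csv() already creates the cities500 object, so no data() call is needed
# (data() only loads datasets that ship with a package).
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")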

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

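latlong <- cities500 |>
  # Removing the parentheses is important; the [] character class matches both "(" and ")"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split at the comma; convert = TRUE turns the resulting text into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)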
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
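
To see the two steps in isolation, here is a minimal sketch on a made-up two-row tibble. The `toy` object, its city names, and its coordinates are illustrative assumptions, not values from the CDC file. The separator `",\\s*"` is a small variation that also swallows the space after the comma.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data in the same "(lat, long)" text format
toy <- tibble(
  CityName    = c("A", "B"),
  GeoLocation = c("(37.87, -122.27)", "(33.92, -118.35)")
)

toy_split <- toy |>
  # "[()]" is a character class that matches either "(" or ")"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split at the comma plus any following spaces; convert = TRUE makes the columns numeric
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```

Checking `is.numeric(toy_split$lat)` is a quick way to confirm that the coordinates are numbers rather than text before mapping them.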

Viewing the Dataset

unique(latlong$Year)
[1] 2017 2016

Filter the dataset

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2016) |>
  filter(StateAbbr == "CA") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName GeographicLevel DataSource Category       
  <dbl> <chr>     <chr>      <chr>    <chr>           <chr>      <chr>          
1  2016 CA        California Antioch  Census Tract    BRFSS      Unhealthy Beha…
2  2016 CA        California Alhambra Census Tract    BRFSS      Unhealthy Beha…
3  2016 CA        California Anaheim  Census Tract    BRFSS      Unhealthy Beha…
4  2016 CA        California Antioch  Census Tract    BRFSS      Unhealthy Beha…
5  2016 CA        California Anaheim  Census Tract    BRFSS      Unhealthy Beha…
6  2016 CA        California Anaheim  Census Tract    BRFSS      Unhealthy Beha…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
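
The chained `filter()` calls above keep only rows that pass every condition. As a sketch of an equivalent spelling, the same conditions can be given to a single `filter()` call separated by commas. The `toy` tibble below is invented for illustration and only mimics three of the real columns.

```r
library(dplyr)

# Hypothetical toy data with a few of the same column names
toy <- tibble(
  Year      = c(2016, 2016, 2017, 2016),
  StateAbbr = c("CA", "CA", "CA", "MD"),
  Category  = c("Unhealthy Behaviors", "Prevention",
                "Unhealthy Behaviors", "Unhealthy Behaviors")
)

# One condition per filter() call, as in the chunk above
chained <- toy |>
  filter(Year == 2016) |>
  filter(StateAbbr == "CA") |>
  filter(Category == "Unhealthy Behaviors")

# All conditions in one call; commas act as a logical AND
combined <- toy |>
  filter(Year == 2016, StateAbbr == "CA", Category == "Unhealthy Behaviors")

all.equal(chained, combined)
nrow(combined)
```

Either form is fine; the single call is shorter, while one condition per line can be easier to comment out while exploring.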

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote, -UniqueID)
head(latlong_clean2)
# A tibble: 6 × 17
   Year StateAbbr StateDesc  CityName GeographicLevel Category           Measure
  <dbl> <chr>     <chr>      <chr>    <chr>           <chr>              <chr>  
1  2016 CA        California Antioch  Census Tract    Unhealthy Behavio… Sleepi…
2  2016 CA        California Alhambra Census Tract    Unhealthy Behavio… Sleepi…
3  2016 CA        California Anaheim  Census Tract    Unhealthy Behavio… Sleepi…
4  2016 CA        California Antioch  Census Tract    Unhealthy Behavio… Sleepi…
5  2016 CA        California Anaheim  Census Tract    Unhealthy Behavio… Sleepi…
6  2016 CA        California Anaheim  Census Tract    Unhealthy Behavio… Sleepi…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset “latlong_clean2” is a manageable dataset now.
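
As a possible alternative to listing each dropped column with its own minus sign, the names can be kept in a character vector and removed with `any_of()`. This is only a sketch on invented data: `toy` and `drop_cols` are hypothetical names, and `any_of()` is used so that a listed column that does not exist is silently ignored.

```r
library(dplyr)

# Hypothetical one-row tibble with a few of the real column names
toy <- tibble(
  CityName   = "A",
  DataSource = "BRFSS",
  Data_Value = 30.1,
  UniqueID   = "x"
)

# Columns to drop; the last one is deliberately absent from toy
drop_cols <- c("DataSource", "UniqueID", "Data_Value_Footnote")

toy_small <- toy |>
  select(-any_of(drop_cols))

names(toy_small)
```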

For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

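# Note: this starts again from latlong_clean, so the columns dropped earlier
# are present again (the output below has 25 columns)
latlong_clean2 <- latlong_clean |>
  filter(CityName == "Berkeley")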
head(latlong_clean2)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName GeographicLevel DataSource Category       
  <dbl> <chr>     <chr>      <chr>    <chr>           <chr>      <chr>          
1  2016 CA        California Berkeley Census Tract    BRFSS      Unhealthy Beha…
2  2016 CA        California Berkeley Census Tract    BRFSS      Unhealthy Beha…
3  2016 CA        California Berkeley Census Tract    BRFSS      Unhealthy Beha…
4  2016 CA        California Berkeley Census Tract    BRFSS      Unhealthy Beha…
5  2016 CA        California Berkeley Census Tract    BRFSS      Unhealthy Beha…
6  2016 CA        California Berkeley Census Tract    BRFSS      Unhealthy Beha…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
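
The assignment asks for a subset of no more than 900 observations, and `head()` does not show the row count. A minimal sketch of how that could be checked is below; the `toy` tibble and its values are made up, and the same two calls would be applied to the real subset.

```r
library(dplyr)

# Hypothetical toy data standing in for the filtered dataset
toy <- tibble(
  CityName   = c("Berkeley", "Berkeley", "Anaheim"),
  Measure    = c("Sleep", "Sleep", "Sleep"),
  Data_Value = c(30, 28, 35)
)

subset_toy <- toy |>
  filter(CityName == "Berkeley")

# Number of observations in the subset, and whether it meets the limit
n_obs <- nrow(subset_toy)
n_obs <= 900

# How many rows each measure contributes
subset_toy |>
  count(Measure)
```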

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

ggplot(latlong_clean2, aes(x = "", y = Data_Value)) +
  labs(x = "",
       y = "Crude Prevalence for Ages 18+ (in %)",
       title = "Percent of Adults in Berkeley, CA Who Sleep < 7 Hours Daily") +
  theme_light() +
  geom_boxplot(fill = "orange")
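
A boxplot draws the median as the middle line and the first and third quartiles as the edges of the box. As a rough sketch, those numbers can be computed directly; the `values` vector is invented for illustration and is not taken from the Berkeley data.

```r
# Hypothetical prevalence values (in %)
values <- c(25, 28, 30, 33, 40)

# First quartile, median, and third quartile (the box and its middle line)
quartiles <- quantile(values, probs = c(0.25, 0.5, 0.75))
quartiles

# Interquartile range: the height of the box
box_height <- IQR(values)
box_height
```

Running the same two calls on `Data_Value` would give the numbers behind the plot above.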

3. Now create a map of your subsetted dataset.

First map chunk here

Centering My Map by Finding Mean of Lat and Long

mean(latlong_clean2$long)
[1] -122.2716
mean(latlong_clean2$lat)
[1] 37.87054
leaflet(latlong_clean2) |>
  setView(lng = -122.2716, lat = 37.87054, zoom = 12) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(data = latlong_clean2,
    radius = (latlong_clean2$Data_Value*10),
    fillOpacity = 0.3,
    fillColor = "orange",
    color= "darkorange")
Assuming "long" and "lat" are longitude and latitude, respectively
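
The map centre above was typed in by hand from the printed means. One way to avoid retyping, sketched here under the assumption that some rows might have missing coordinates, is to store the means in variables. The `toy` data frame and the names `centre_lat` and `centre_lng` are hypothetical.

```r
# Hypothetical coordinates, including one row with missing values
toy <- data.frame(
  lat  = c(37.86, 37.88, NA),
  long = c(-122.26, -122.28, NA)
)

# na.rm = TRUE skips missing values instead of returning NA
centre_lat <- mean(toy$lat, na.rm = TRUE)
centre_lng <- mean(toy$long, na.rm = TRUE)

centre_lat
centre_lng

# These could then be passed on, for example:
# setView(lng = centre_lng, lat = centre_lat, zoom = 12)
```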

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

popupsleep <- paste0(
  "<b>Sleep < 7 Hours: </b>", "<br>",
  "<b>Year: </b>", latlong_clean2$Year, "<br>",
  "<b>State: </b>", latlong_clean2$StateAbbr, "<br>",
  "<b>City: </b>", latlong_clean2$CityName, "<br>",
  "<b>Population: </b>", latlong_clean2$PopulationCount, "<br>",
  "<b>Crude Prevalence in %: </b>", latlong_clean2$Data_Value, "<br>",
  "<b>Lat: </b>", latlong_clean2$lat, "<br>",
  "<b>Long: </b>", latlong_clean2$long, "<br>"
)
leaflet(latlong_clean2) |>
  setView(lng = -122.2716, lat = 37.87054, zoom = 12.4999) |>
  addProviderTiles("CyclOSM") |>
  addCircles(data = latlong_clean2,
    radius = (latlong_clean2$Data_Value*10),
    fillOpacity = 0.3,
    fillColor = "orange",
    color= "darkorange",
    popup = popupsleep) |>
  addCircleMarkers(lng = -122.2595, lat = 37.8719, radius = 8, color = "red", fillColor = "red", fillOpacity = 0.8, popup = "UC Berkeley") |>
  addCircleMarkers(lng = -122.261856, lat = 37.875594, radius = 8, color = "red", fillColor = "red", fillOpacity = 0.8, popup = "Graduate Theological Union") |>
  addCircleMarkers(lng = -122.2699455, lat = 37.8697807, radius = 8, color = "red", fillColor = "red", fillOpacity = 0.8, popup = "Berkeley City College")
Assuming "long" and "lat" are longitude and latitude, respectively

# Source for the circle-marker code: https://stackoverflow.com/questions/31930616/multiple-addcirclemarkers-layers-using-leaflet-in-r
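
The popup text works because `paste0()` is vectorized: given columns, it returns one string per row, so each circle gets its own popup. A minimal sketch on made-up data (the `toy` data frame and `popup_toy` are hypothetical names):

```r
# Hypothetical two-row data frame
toy <- data.frame(
  CityName   = c("Berkeley", "Berkeley"),
  Data_Value = c(30.1, 27.4)
)

# paste0() recycles the fixed HTML labels across every row
popup_toy <- paste0(
  "<b>City: </b>", toy$CityName, "<br>",
  "<b>Crude Prevalence in %: </b>", toy$Data_Value
)

length(popup_toy)
popup_toy[1]
```

The length of the result matches the number of rows, which is what lets the popup vector line up with the circles on the map.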

5. Write a paragraph

During the filtering process, I chose to focus on the state of California and then narrowed the data down to the city of Berkeley. After filtering for the Unhealthy Behaviors category, the only measure left in my subset was a lack of sleep (adults sleeping less than 7 hours). This interested me because I know there are prestigious universities in the area that may have some influence. I decided to change the background map to show the city's traffic and to highlight three of the major universities in Berkeley, to see whether a pattern of insufficient sleep appears around them.