Healthy Cities GIS Assignment

Author

NCowan

Load the libraries and set the working directory

library(leaflet)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
setwd("~/Desktop/DATA 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl  (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num  (1): PopulationCount

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read_csv() has already loaded cities500 into the session, so no data() call is needed

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
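
As a small illustration of the split step above, the sketch below applies the same two calls to a toy tibble. The city labels and coordinates are made up for illustration and are not rows from the CDC file; only the "(lat, long)" text format is taken from the dataset.

```r
library(tidyverse)

# Toy data in the same "(lat, long)" text format as GeoLocation.
# These values are illustrative assumptions, not real CDC rows.
toy <- tibble(
  CityName    = c("A", "B"),
  GeoLocation = c("(33.9, -118.3)", "(37.6, -122.1)")
)

toy_split <- toy |>
  # Strip the opening and closing parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the text pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

# Both new columns should be numeric rather than character
str(toy_split)
```

Checking the column types with str() is a quick way to confirm that convert = TRUE produced numbers, which leaflet needs for plotting.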

Filter the dataset

Remove the rows where StateDesc is "United States", keep only the crude prevalence measure, keep only 2017, and select Unhealthy Behaviors as the category of interest.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName   GeographicLevel DataSource Category     
  <dbl> <chr>     <chr>      <chr>      <chr>           <chr>      <chr>        
1  2017 CA        California Hawthorne  City            BRFSS      Unhealthy Be…
2  2017 CA        California Hayward    City            BRFSS      Unhealthy Be…
3  2017 CA        California Lakewood   City            BRFSS      Unhealthy Be…
4  2017 AL        Alabama    Huntsville Census Tract    BRFSS      Unhealthy Be…
5  2017 AZ        Arizona    Avondale   Census Tract    BRFSS      Unhealthy Be…
6  2017 AZ        Arizona    Chandler   City            BRFSS      Unhealthy Be…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
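
The four filters above can also be written as one filter() call, since comma-separated conditions are combined with AND. The sketch below shows this on a toy table; the rows are invented for illustration and only the column names come from the dataset.

```r
library(tidyverse)

# Toy rows (illustrative assumptions) covering each exclusion rule
toy <- tribble(
  ~StateDesc,      ~Year, ~Data_Value_Type,          ~Category,
  "United States", 2017,  "Crude prevalence",        "Unhealthy Behaviors",
  "California",    2017,  "Crude prevalence",        "Unhealthy Behaviors",
  "California",    2017,  "Age-adjusted prevalence", "Unhealthy Behaviors",
  "California",    2016,  "Crude prevalence",        "Prevention",
  "Alabama",       2017,  "Crude prevalence",        "Prevention"
)

toy_clean <- toy |>
  filter(
    StateDesc != "United States",          # drop the national summary rows
    Data_Value_Type == "Crude prevalence", # keep one measure type
    Year == 2017,                          # keep one year
    Category == "Unhealthy Behaviors"      # keep one category
  )

toy_clean
```

Only the second toy row passes every condition, so the result has a single row.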

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

latlong_clean2 <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
   Year StateAbbr StateDesc  CityName  GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 CA        California Hawthorne City            Unhealt… 632548   Curren…
2  2017 CA        California Hayward   City            Unhealt… 633000   Obesit…
3  2017 CA        California Lakewood  City            Unhealt… 639892   Obesit…
4  2017 AL        Alabama    Huntsvil… Census Tract    Unhealt… 0137000… Obesit…
5  2017 AZ        Arizona    Avondale  Census Tract    Unhealt… 0404720… Obesit…
6  2017 AZ        Arizona    Chandler  City            Unhealt… 412000   No lei…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset “latlong_clean2” is a manageable dataset now.

For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

# I researched the most and least walkable cities in the USA in 2017. From multiple articles, Fayetteville seemed to be the clear choice for least walkable.
# For the most walkable, it was a tie between New York and San Francisco, but New York had too much data, so I went with San Francisco to keep my observations under 900.
# I wanted to see the differences in obesity rates between walkable and unwalkable cities in the US, so I filtered for obesity in the measure ID.
# However, partway through the project, the population difference was making the comparison awkward, so I changed my unwalkable city to Charlotte, NC, since it is closer in population to San Francisco but is still not considered very walkable.
myfilter_data <- latlong_clean2 |>
  filter(CityName %in% c("San Francisco", "Charlotte")) |>
  filter(StateAbbr %in% c("CA", "NC")) |>
  filter(MeasureId == "OBESITY")
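
One caveat with filtering CityName and StateAbbr separately is that any mix of the listed cities and states passes, for example a city named Charlotte in CA if one existed in the data. The sketch below is one possible stricter variant that matches city and state as pairs and then checks the 900-observation limit. The toy rows, including the "Charlotte, CA" row, are hypothetical and only there to show the difference.

```r
library(tidyverse)

# Toy rows (illustrative assumptions, not CDC data)
toy <- tribble(
  ~CityName,       ~StateAbbr, ~MeasureId,
  "San Francisco", "CA",       "OBESITY",
  "San Francisco", "CA",       "CSMOKING",
  "Charlotte",     "NC",       "OBESITY",
  "Charlotte",     "CA",       "OBESITY",  # hypothetical mismatched pair
  "Raleigh",       "NC",       "OBESITY"
)

toy_subset <- toy |>
  filter(
    # Match each city together with its own state
    (CityName == "San Francisco" & StateAbbr == "CA") |
      (CityName == "Charlotte" & StateAbbr == "NC"),
    MeasureId == "OBESITY"
  )

# Confirm the subset respects the assignment limit
nrow(toy_subset) <= 900

# See how many observations each city contributes
count(toy_subset, CityName)
```

Running the same nrow() and count() checks on myfilter_data would confirm that the real subset stays within the limit.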

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

# non map plot
ggplot(myfilter_data, aes(x = CityName, y = Data_Value)) +
  geom_point() +
  labs(title = "Obesity in Selected Cities",
       x = "City Names",
       y = "Percent of Adults with Obesity",
       caption = "Source: CDC 500 Healthy Cities") +
  theme_bw()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
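
The warning above indicates that two rows have no Data_Value. A minimal sketch of one way to count the missing values and summarise each city without them is shown below. The numbers are invented for illustration, and the summary columns n_obs and median_obesity are names chosen here, not part of the dataset.

```r
library(tidyverse)

# Toy values (illustrative assumptions), with one missing Data_Value
toy <- tibble(
  CityName   = c("San Francisco", "San Francisco", "Charlotte", "Charlotte"),
  Data_Value = c(15, NA, 30, 28)
)

# How many rows would be dropped from a plot?
n_missing <- sum(is.na(toy$Data_Value))

toy_summary <- toy |>
  filter(!is.na(Data_Value)) |>   # drop rows with no reported value
  group_by(CityName) |>
  summarise(
    n_obs          = n(),
    median_obesity = median(Data_Value)
  )

toy_summary
```

Dropping the missing rows explicitly before plotting makes the exclusion visible in the code instead of leaving it to a warning.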

3. Now create a map of your subsetted dataset.

First map chunk here

leaflet() |>
  setView(lng = -122.4194, lat = 37.7749, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = myfilter_data |> filter(CityName == "San Francisco"),
    lat = ~lat,
    lng = ~long,
    radius = ~Data_Value * 10 
  )
leaflet() |>
  setView(lng = -80.8431, lat = 35.2271, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = myfilter_data |> filter(CityName == "Charlotte"),
    lat = ~lat,
    lng = ~long,
    radius = ~Data_Value * 10 
  )

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

sf_data <- myfilter_data |> filter(CityName == "San Francisco")
clt_data <- myfilter_data |> filter(CityName == "Charlotte")
leaflet() |>
  setView(lng = -122.4194, lat = 37.7749, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = sf_data,
    lat = ~lat,
    lng = ~long,
    radius = ~Data_Value * 10,
    color = "#140100",
    fillColor = "#f2079c",
    fillOpacity = 0.35,
    popup = paste(
      "<b>City: </b>", sf_data$CityName, "<br>",
      "<b>Obesity Rate: </b>", sf_data$Data_Value, "%<br>",
      "<b>State: </b>", sf_data$StateAbbr
    )
  )
leaflet() |>
  setView(lng = -80.8431, lat = 35.2271, zoom = 10) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = clt_data,
    lat = ~lat,
    lng = ~long,
    radius = ~Data_Value * 10,
    color = "#140100",
    fillColor = "#f2079c",
    fillOpacity = 0.35,
    popup = paste(
      "<b>City: </b>", clt_data$CityName, "<br>",
      "<b>Obesity Rate: </b>", clt_data$Data_Value, "%<br>",
      "<b>State: </b>", clt_data$StateAbbr
    )
  )
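
Because the two refined maps differ only in their data and view center, the shared settings could be wrapped in a small function. The sketch below is one way to do that; make_obesity_map is a hypothetical helper name, and the toy data frame uses made-up values. It also passes the data to leaflet() and builds the popup with a formula, so the column names do not need the sf_data$ or clt_data$ prefix.

```r
library(leaflet)

# Hypothetical helper: one map per city with the same styling and zoom
make_obesity_map <- function(data, center_lng, center_lat, zoom = 10) {
  leaflet(data) |>
    setView(lng = center_lng, lat = center_lat, zoom = zoom) |>
    addProviderTiles("Esri.WorldStreetMap") |>
    addCircles(
      lat = ~lat,
      lng = ~long,
      radius = ~Data_Value * 10,
      color = "#140100",
      fillColor = "#f2079c",
      fillOpacity = 0.35,
      # Formula popup: columns are looked up in the data passed to leaflet()
      popup = ~paste0(
        "<b>City: </b>", CityName, "<br>",
        "<b>Obesity Rate: </b>", Data_Value, "%<br>",
        "<b>State: </b>", StateAbbr
      )
    )
}

# Toy rows (illustrative assumptions, not CDC data)
toy <- data.frame(
  CityName   = "San Francisco",
  StateAbbr  = "CA",
  Data_Value = c(15.2, 18.9),
  lat        = c(37.77, 37.75),
  long       = c(-122.42, -122.44)
)

sf_map <- make_obesity_map(toy, center_lng = -122.4194, center_lat = 37.7749)
```

With the real data, the same function could be called once with sf_data and once with clt_data, changing only the center coordinates.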

5. Write a paragraph

I wanted to compare obesity rates in walkable and unwalkable cities. I originally planned to compare several of the most and least walkable cities, but found that I could only use one city of each kind because there were so many observations. At first I picked Fayetteville, North Carolina as my least walkable city, since multiple articles from 2017 had named it the least walkable. The most walkable was a tie between San Francisco and New York, and I went with San Francisco because it had fewer observations, which made the data easier to manage. However, the further I went in the project with these two cities, the more I found that the population difference between them was so great that the maps looked awkward. So I changed my unwalkable city to Charlotte, NC, which a few articles still described as unwalkable and which has a population similar to San Francisco's. Finally, as I was mapping the data, I found that because the two cities are on opposite sides of the US, it was easier to make two maps with the same zoom level so that the obesity rates in the two cities can be compared side by side.