Healthy Cities HW

Author

E Lott

For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

First I load everything into R

library(tidyverse)  # also loads ggplot2, tidyr, dplyr, readr, and stringr
library(leaflet)
library(RColorBrewer)
# Note: this path is specific to my computer
setwd("C:/Users/Erika/OneDrive/Desktop/DATA 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

I separate the latitude and longitude

latlong <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
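To see what the clean-and-split step does on its own, here is a minimal sketch on a made-up tibble. The `toy` data and its coordinates are hypothetical, not taken from the CDC file; the sketch only assumes that GeoLocation is stored as text in the form "(lat, long)".

```r
library(tibble)
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical two-row example; the coordinates are invented for illustration
toy <- tibble(
  CityName = c("A", "B"),
  GeoLocation = c("(39.29, -76.61)", "(33.91, -118.35)")
)

toy_split <- toy |>
  # Drop the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Without `convert = TRUE` the two new columns would stay as character, and the map functions later on would not be able to use them as coordinates.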

Next I filter using my inclusion/exclusion criteria. I wanted just one area in Maryland, and 2017 was the only available year.

latlong_filter <- latlong |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateDesc == "Maryland") |>
  filter(Category == "Unhealthy Behaviors") |>
  filter(GeographicLevel == "Census Tract")
head(latlong_filter)

# A tibble: 6 × 25
   Year StateAbbr StateDesc CityName  GeographicLevel DataSource Category       
  <dbl> <chr>     <chr>     <chr>     <chr>           <chr>      <chr>          
1  2017 MD        Maryland  Baltimore Census Tract    BRFSS      Unhealthy Beha…
2  2017 MD        Maryland  Baltimore Census Tract    BRFSS      Unhealthy Beha…
3  2017 MD        Maryland  Baltimore Census Tract    BRFSS      Unhealthy Beha…
4  2017 MD        Maryland  Baltimore Census Tract    BRFSS      Unhealthy Beha…
5  2017 MD        Maryland  Baltimore Census Tract    BRFSS      Unhealthy Beha…
6  2017 MD        Maryland  Baltimore Census Tract    BRFSS      Unhealthy Beha…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
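Since the assignment asks for no more than 900 observations, it may help to check the row count after filtering. This is a minimal sketch on a hypothetical stand-in table (`toy_subset` and its values are invented); on the real data the same check would be run on `latlong_filter`.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the filtered data; values are invented
toy_subset <- tibble(
  StateDesc = c("Maryland", "Maryland", "California"),
  GeographicLevel = c("Census Tract", "City", "Census Tract")
)

md_tracts <- toy_subset |>
  filter(StateDesc == "Maryland", GeographicLevel == "Census Tract")

# Stop with an error if the subset is larger than the assignment allows
n_obs <- nrow(md_tracts)
stopifnot(n_obs <= 900)
n_obs
```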

Next I got rid of variables I won’t use to simplify the data

MD_Filter <- latlong_filter |>
  select(
    -DataSource, -Data_Value_Unit, -DataValueTypeID,
    -Low_Confidence_Limit, -High_Confidence_Limit,
    -Data_Value_Footnote_Symbol, -Data_Value_Footnote,
    -TractFIPS, -UniqueID, -StateAbbr
  )
head(MD_Filter)

# A tibble: 6 × 15
   Year StateDesc CityName  GeographicLevel Category     Measure Data_Value_Type
  <dbl> <chr>     <chr>     <chr>           <chr>        <chr>   <chr>          
1  2017 Maryland  Baltimore Census Tract    Unhealthy B… Curren… Crude prevalen…
2  2017 Maryland  Baltimore Census Tract    Unhealthy B… No lei… Crude prevalen…
3  2017 Maryland  Baltimore Census Tract    Unhealthy B… Obesit… Crude prevalen…
4  2017 Maryland  Baltimore Census Tract    Unhealthy B… No lei… Crude prevalen…
5  2017 Maryland  Baltimore Census Tract    Unhealthy B… Binge … Crude prevalen…
6  2017 Maryland  Baltimore Census Tract    Unhealthy B… Curren… Crude prevalen…
# ℹ 8 more variables: Data_Value <dbl>, PopulationCount <dbl>, lat <dbl>,
#   long <dbl>, CategoryID <chr>, MeasureId <chr>, CityFIPS <dbl>,
#   Short_Question_Text <chr>

After that, I set Baltimore’s latitude and longitude

B_lat <- 39.2905
B_long <- -76.6104

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

plot1 <- MD_Filter |>
  ggplot() +
  geom_bar(aes(x= Short_Question_Text, y=Data_Value, fill = Measure),
      position = "dodge", stat = "identity") +
  labs(fill = "Measure Description",
       y = "Data Value",
       x = "Unhealthy Habit",
       title = "Unhealthy Habits of People in Baltimore (2017)",
       caption = "CDC - 500 Cities Project: 2016 to 2019") +
  # I used https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2
  # to help me angle the x values so they don't overlap
  scale_x_discrete(guide = guide_axis(angle = 45)) +
  theme_minimal() +
  scale_fill_brewer(palette = "Accent")
plot1

Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_bar()`).
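The warning above means that some rows have no Data_Value. One way to see how many rows are affected, and to drop them on purpose before plotting, is sketched below. The `toy_values` table is hypothetical and its numbers are invented for illustration.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical example with one missing value
toy_values <- tibble(
  Short_Question_Text = c("Obesity", "Binge Drinking", "Obesity"),
  Data_Value = c(35.2, NA, 31.0)
)

# Count the rows with no Data_Value; these are the rows ggplot2 would drop
n_missing <- sum(is.na(toy_values$Data_Value))

# Remove them explicitly so the plot only receives complete rows
toy_complete <- toy_values |>
  drop_na(Data_Value)

n_missing
toy_complete
```

Dropping the rows myself would not change the plot, but it makes the missing data a visible choice instead of a warning.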

3. Now create a map of your subsetted dataset.

First map chunk here

leaflet() |>
  setView(lng = -76.6, lat = 39.3, zoom = 11) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = MD_Filter,
    radius = MD_Filter$Data_Value,
    color = "#AB3A95")

Assuming "long" and "lat" are longitude and latitude, respectively

Here is the tooltip

popupcity <- paste0(
      "<b>Population: </b>", MD_Filter$PopulationCount, "<br>",
      "<b>Unhealthy Behavior: </b>", MD_Filter$Short_Question_Text, "<br>",
      "<b>Data Value: </b>", MD_Filter$Data_Value, "<br>",
      "<b>Measure Desc: </b>", MD_Filter$Measure, "<br>"
    )

leaflet() |>
  setView(lng = B_long, lat = B_lat, zoom = 11) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = MD_Filter,
    radius = MD_Filter$Data_Value,
    color = "#C73C86",
    fillColor = "#4EE6B9",
    fillOpacity = 1,  # opacity ranges from 0 to 1
    popup = popupcity)

Assuming "long" and "lat" are longitude and latitude, respectively
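The message above appears because leaflet is guessing which columns hold the coordinates. A minimal sketch of naming them explicitly with formulas is below. The `toy_points` data is hypothetical, and multiplying the radius by 10 is only an illustrative choice, since `addCircles()` measures radius in meters.

```r
library(leaflet)
library(tibble)

# Hypothetical points near Baltimore; values are invented for illustration
toy_points <- tibble(
  lat = c(39.29, 39.31),
  long = c(-76.61, -76.59),
  Data_Value = c(20, 35)
)

toy_map <- leaflet(toy_points) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long,                # name the longitude column explicitly
    lat = ~lat,                 # name the latitude column explicitly
    radius = ~Data_Value * 10,  # radius is in meters, so scale it up
    popup = ~paste0("<b>Data Value: </b>", Data_Value)
  )

toy_map
```

Passing the data to `leaflet()` once also means the formulas (`~long`, `~lat`) can refer to columns by name instead of repeating the table name with `$`.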

5. Write a paragraph

In a paragraph, describe the plots you created and the insights they show.

In my first plot, I made a bar graph to show which of the four unhealthy habits (obesity, binge drinking, smoking, and physical inactivity) was the most prevalent in Baltimore. Obesity appears to be the most common, with physical inactivity close behind, while binge drinking and smoking are noticeably lower. However, I wonder what kind of smoking the survey refers to, because I think that would change the data. For the maps, I plotted each census tract from the data. The points are spread fairly evenly with few clusters, although they thin out toward the edges of the city. After clicking through the points, I also saw that the unhealthy habits are not exclusive to one area; most appear all over the city. Overall, I think the bar graph provides good information, but the maps are interesting for seeing the specific locations!
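One way to back up the comparison in the paragraph above with numbers would be to average Data_Value across tracts for each habit. This is only a sketch on a hypothetical table (`toy_habits`, with invented values); on the real data the same steps would be applied to `MD_Filter`.

```r
library(tibble)
library(dplyr)

# Hypothetical tract-level values; numbers are invented for illustration
toy_habits <- tibble(
  Short_Question_Text = c("Obesity", "Obesity", "Binge Drinking", "Binge Drinking"),
  Data_Value = c(30, 40, 10, NA)
)

habit_means <- toy_habits |>
  group_by(Short_Question_Text) |>
  summarise(
    mean_value = mean(Data_Value, na.rm = TRUE),  # ignore missing tracts
    n_tracts = n()
  ) |>
  arrange(desc(mean_value))  # most prevalent habit first

habit_means
```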