library(tidyverse)
library(tidyr)
library(leaflet)
library(ggplot2)
library(RColorBrewer)
setwd("C:/Users/Erika/OneDrive/Desktop/DATA 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
data(cities500)
Healthy Cities HW
For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
First, I load everything into R.
Then I separate the latitude and longitude.
latlong <- cities500|>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
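The separation step above can be tried on a small example first. This is only a sketch: the `toy` tibble, its city names, and its coordinates are invented for illustration and are not rows from the CDC file. It assumes `GeoLocation` is stored as text in the form `(lat, long)`.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical rows; the names and coordinates are made up for illustration.
toy <- tibble(
  CityName = c("City A", "City B"),
  GeoLocation = c("(39.29, -76.61)", "(33.92, -118.35)")
)

toy_latlong <- toy |>
  # Drop the surrounding parentheses so only "lat, long" remains.
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the two pieces into numbers.
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_latlong
```

Because `convert = TRUE` is set, `lat` and `long` come back as numeric columns, which is what leaflet needs to place points.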
Next, I filter on my inclusion/exclusion criteria. I wanted a single area in Maryland, and 2017 was the only year available.
latlong_filter <- latlong |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017) |> # Year is stored as a number, so compare it to a number
filter(StateDesc == "Maryland") |>
filter(Category == "Unhealthy Behaviors") |>
filter(GeographicLevel == "Census Tract")
head(latlong_filter)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract BRFSS Unhealthy Beha…
2 2017 MD Maryland Baltimore Census Tract BRFSS Unhealthy Beha…
3 2017 MD Maryland Baltimore Census Tract BRFSS Unhealthy Beha…
4 2017 MD Maryland Baltimore Census Tract BRFSS Unhealthy Beha…
5 2017 MD Maryland Baltimore Census Tract BRFSS Unhealthy Beha…
6 2017 MD Maryland Baltimore Census Tract BRFSS Unhealthy Beha…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
Next I got rid of variables I won’t use to simplify the data
MD_Filter <- latlong_filter |>
select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote, -TractFIPS, -UniqueID, -StateAbbr)
head(MD_Filter)
# A tibble: 6 × 15
Year StateDesc CityName GeographicLevel Category Measure Data_Value_Type
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 Maryland Baltimore Census Tract Unhealthy B… Curren… Crude prevalen…
2 2017 Maryland Baltimore Census Tract Unhealthy B… No lei… Crude prevalen…
3 2017 Maryland Baltimore Census Tract Unhealthy B… Obesit… Crude prevalen…
4 2017 Maryland Baltimore Census Tract Unhealthy B… No lei… Crude prevalen…
5 2017 Maryland Baltimore Census Tract Unhealthy B… Binge … Crude prevalen…
6 2017 Maryland Baltimore Census Tract Unhealthy B… Curren… Crude prevalen…
# ℹ 8 more variables: Data_Value <dbl>, PopulationCount <dbl>, lat <dbl>,
# long <dbl>, CategoryID <chr>, MeasureId <chr>, CityFIPS <dbl>,
# Short_Question_Text <chr>
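Since the assignment caps the subset at 900 observations, it may help to check the row count after filtering. The sketch below uses an invented `toy` tibble (not the CDC data) and also shows that several conditions can go in a single `filter()` call, which keeps the same rows as chaining separate calls.

```r
library(dplyr)

# Hypothetical stand-in for the full dataset; all values are invented.
toy <- tibble(
  Year = c(2017, 2017, 2016, 2017),
  StateDesc = c("Maryland", "Maryland", "Maryland", "California"),
  GeographicLevel = c("Census Tract", "City", "Census Tract", "Census Tract")
)

# A row is kept only when every condition is TRUE,
# just as with a chain of separate filter() calls.
toy_subset <- toy |>
  filter(
    Year == 2017,
    StateDesc == "Maryland",
    GeographicLevel == "Census Tract"
  )

# Count the rows and stop with an error if the limit is exceeded.
nrow(toy_subset)
stopifnot(nrow(toy_subset) <= 900)
```

The same two closing lines could be run on `MD_Filter` to confirm the real subset fits the limit.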
After that, I set Baltimore’s latitude and longitude
B_lat = 39.2905
B_long = -76.6104
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
plot1 <- MD_Filter |>
ggplot() +
geom_bar(aes(x= Short_Question_Text, y=Data_Value, fill = Measure),
position = "dodge", stat = "identity") +
labs(fill = "Measure Description",
y = "Data Value",
x = "Unhealthy Habit",
title = "Unhealthy Habits of People in Baltimore (2017)",
caption = "CDC - 500 Cities Project: 2016 to 2019") +
scale_x_discrete(guide = guide_axis(angle = 45)) + # I used "https://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2" to help me angle the x values so they don't overlap
theme_minimal() +
scale_fill_brewer(palette = "Accent")
plot1
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_bar()`).
3. Now create a map of your subsetted dataset.
First map chunk here
leaflet() |>
setView(lng = -76.6, lat = 39.3, zoom = 11) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = MD_Filter,
radius = MD_Filter$Data_Value,
color = "#AB3A95")
Assuming "long" and "lat" are longitude and latitude, respectively
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
Here is the tooltip
popupcity <- paste0(
"<b>Population: </b>", MD_Filter$PopulationCount, "<br>",
"<b>Unhealthy Behavior: </b>", MD_Filter$Short_Question_Text, "<br>",
"<b>Data Value: </b>", MD_Filter$Data_Value, "<br>",
"<b>Measure Desc: </b>", MD_Filter$Measure, "<br>"
)
leaflet() |>
setView(lng = B_long, lat = B_lat, zoom = 11) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = MD_Filter,
radius = MD_Filter$Data_Value,
color = "#C73C86",
fillColor = "#4EE6B9",
fillOpacity = 1, # opacity ranges from 0 to 1
popup = popupcity)
Assuming "long" and "lat" are longitude and latitude, respectively
5. Write a paragraph
In a paragraph, describe the plots you created and the insights they show.
In my first plot, I made a bar graph to show which of the four unhealthy habits (obesity, binge drinking, smoking, and physical inactivity) was the most prevalent in Baltimore. Obesity seems to be the most common, though physical inactivity is close behind. Binge drinking and smoking are both fairly low. However, I wonder what kind of smoking the measure refers to, because I think that would change the data. For the maps, I plotted the data points directly onto the map. Most of the points are spread evenly with few clusters, although they thin out around the edges of the city. After looking through the points, I also found that the unhealthy habits are not exclusive to any one area; most appear all over the city. Overall, I think the bar graph provided good information, and the maps are interesting for seeing the specific locations.