Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse) # tidyverse already attaches tidyr, so it is not loaded separately
library(leaflet)
setwd("C:/Users/Administrator/OneDrive - montgomerycollege.edu/DATA 110")
# read_csv() loads the data directly, so a separate data() call is not needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
  # Removing the parentheses is important; the character class [] matches both "(" and ")"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
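The split above can be tried on a small made-up table first. This is only a sketch: the tibble `toy` and its two coordinate strings are invented for illustration and are not rows from the CDC file.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical two-row stand-in for the GeoLocation column
toy <- tibble(GeoLocation = c("(37.87, -122.27)", "(33.92, -118.35)"))

toy_split <- toy |>
  # Strip "(" and ")" so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split at the comma; convert = TRUE turns the pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

If the parentheses were left in, the pieces would stay as text such as "(37.87", so the map functions could not use them as coordinates.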
Viewing the Dataset
unique(latlong$Year)
[1] 2017 2016
Filter the dataset
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2016) |>
  filter(StateAbbr == "CA") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2016 CA California Antioch Census Tract BRFSS Unhealthy Beha…
2 2016 CA California Alhambra Census Tract BRFSS Unhealthy Beha…
3 2016 CA California Anaheim Census Tract BRFSS Unhealthy Beha…
4 2016 CA California Antioch Census Tract BRFSS Unhealthy Beha…
5 2016 CA California Anaheim Census Tract BRFSS Unhealthy Beha…
6 2016 CA California Anaheim Census Tract BRFSS Unhealthy Beha…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
What variables are included? Can any of them be removed?
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote, -UniqueID)
head(latlong_clean2)
# A tibble: 6 × 17
Year StateAbbr StateDesc CityName GeographicLevel Category Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2016 CA California Antioch Census Tract Unhealthy Behavio… Sleepi…
2 2016 CA California Alhambra Census Tract Unhealthy Behavio… Sleepi…
3 2016 CA California Anaheim Census Tract Unhealthy Behavio… Sleepi…
4 2016 CA California Antioch Census Tract Unhealthy Behavio… Sleepi…
5 2016 CA California Anaheim Census Tract Unhealthy Behavio… Sleepi…
6 2016 CA California Anaheim Census Tract Unhealthy Behavio… Sleepi…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset “latlong_clean2” is now a manageable size.
For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
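# This starts again from latlong_clean, so the columns dropped earlier are still present (the output below has 25 columns)
latlong_clean2 <- latlong_clean |>
  filter(CityName == "Berkeley")
head(latlong_clean2)
# A tibble: 6 × 25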
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2016 CA California Berkeley Census Tract BRFSS Unhealthy Beha…
2 2016 CA California Berkeley Census Tract BRFSS Unhealthy Beha…
3 2016 CA California Berkeley Census Tract BRFSS Unhealthy Beha…
4 2016 CA California Berkeley Census Tract BRFSS Unhealthy Beha…
5 2016 CA California Berkeley Census Tract BRFSS Unhealthy Beha…
6 2016 CA California Berkeley Census Tract BRFSS Unhealthy Beha…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
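The assignment asks for no more than 900 observations, and head() only shows the first six rows. A quick check like the one below could confirm the size of the subset and which measures it contains. This is a sketch on made-up data: the tibble `toy` and its values are hypothetical stand-ins for the filtered CDC rows.

```r
library(dplyr)

# Hypothetical stand-in for latlong_clean; values are invented for illustration
toy <- tibble(
  CityName   = c("Berkeley", "Berkeley", "Anaheim"),
  Measure    = c("Sleep < 7 hours", "Sleep < 7 hours", "Sleep < 7 hours"),
  Data_Value = c(30.1, 28.4, 35.0)
)

subset_rows <- toy |>
  filter(CityName == "Berkeley")

# Number of observations left after filtering
n_obs <- nrow(subset_rows)
n_obs

# Stop with an error if the subset is empty or larger than the 900-row limit
stopifnot(n_obs > 0, n_obs <= 900)

# How many rows each measure contributes to the subset
measure_counts <- count(subset_rows, Measure)
measure_counts
```

Running the same nrow() and count() calls on the real subset would show directly whether it meets the limit and whether more than one measure is present.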
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
ggplot(latlong_clean2, aes(x = "", y = Data_Value)) +
  labs(x = "",
       y = "Crude Prevalence for Ages 18+ (in %)",
       title = "Percent of Adults in Berkeley, CA Who Sleep < 7 Hours Daily") +
  theme_light() +
  geom_boxplot(fill = "orange")
3. Now create a map of your subsetted dataset.
First map chunk here
Centering My Map by Finding Mean of Lat and Long
mean(latlong_clean2$long)
[1] -122.2716
mean(latlong_clean2$lat)
[1] 37.87054
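The two means are typed into setView() by hand in the map code. One alternative, sketched here on made-up coordinates, is to store the center in an object so the numbers do not have to be copied. The tibble `toy` and the name `center` are hypothetical.

```r
library(dplyr)

# Hypothetical coordinates standing in for the lat and long columns
toy <- tibble(
  lat  = c(37.86, 37.88),
  long = c(-122.28, -122.26)
)

# Mean latitude and longitude; na.rm = TRUE skips any missing coordinates
center <- toy |>
  summarise(lat = mean(lat, na.rm = TRUE),
            long = mean(long, na.rm = TRUE))

center

# The stored values could then be used as
# setView(lng = center$long, lat = center$lat, zoom = 12)
```

This keeps the map centered correctly even if the filter is changed to a different city later.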
leaflet(latlong_clean2) |>
  setView(lng = -122.2716, lat = 37.87054, zoom = 12) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(data = latlong_clean2,
             radius = (latlong_clean2$Data_Value * 10),
             fillOpacity = 0.3,
             fillColor = "orange",
             color = "darkorange")
Assuming "long" and "lat" are longitude and latitude, respectively
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
popupsleep <- paste0(
  "<b>Sleep < 7 Hours: </b>", "<br>",
  "<b>Year: </b>", latlong_clean2$Year, "<br>",
  "<b>State: </b>", latlong_clean2$StateAbbr, "<br>",
  "<b>City: </b>", latlong_clean2$CityName, "<br>",
  "<b>Population: </b>", latlong_clean2$PopulationCount, "<br>",
  "<b>Crude Prevalence in %: </b>", latlong_clean2$Data_Value, "<br>",
  "<b>Lat: </b>", latlong_clean2$lat, "<br>",
  "<b>Long: </b>", latlong_clean2$long, "<br>"
)
leaflet(latlong_clean2) |>
  setView(lng = -122.2716, lat = 37.87054, zoom = 12.4999) |>
  addProviderTiles("CyclOSM") |>
  addCircles(data = latlong_clean2,
             radius = (latlong_clean2$Data_Value * 10),
             fillOpacity = 0.3,
             fillColor = "orange",
             color = "darkorange",
             popup = popupsleep) |>
  addCircleMarkers(lng = -122.2595, lat = 37.8719, radius = 8, color = "red", fillColor = "red", fillOpacity = 0.8, popup = "UC Berkeley") |>
  addCircleMarkers(lng = -122.261856, lat = 37.875594, radius = 8, color = "red", fillColor = "red", fillOpacity = 0.8, popup = "Graduate Theological Union") |>
  addCircleMarkers(lng = -122.2699455, lat = 37.8697807, radius = 8, color = "red", fillColor = "red", fillOpacity = 0.8, popup = "Berkeley City College")
Assuming "long" and "lat" are longitude and latitude, respectively
Where I found the circle markers code: https://stackoverflow.com/questions/31930616/multiple-addcirclemarkers-layers-using-leaflet-in-r
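The three addCircleMarkers() calls differ only in their coordinates and popup text. As a possible alternative, the markers could be kept in a small data frame and added in one call using leaflet's formula (~) interface. This is a sketch, not the code used for the map above: the data frame `campuses` and the object `campus_map` are hypothetical names, the coordinates are copied from the three calls above, and the default addTiles() basemap is used here instead of CyclOSM.

```r
library(leaflet)

# Hypothetical helper table; names and coordinates copied from the marker calls above
campuses <- data.frame(
  name = c("UC Berkeley", "Graduate Theological Union", "Berkeley City College"),
  lng  = c(-122.2595, -122.261856, -122.2699455),
  lat  = c(37.8719, 37.875594, 37.8697807)
)

campus_map <- leaflet() |>
  setView(lng = -122.2716, lat = 37.87054, zoom = 12) |>
  addTiles() |>
  # One call draws all three markers; ~name gives each marker its own popup text
  addCircleMarkers(data = campuses, lng = ~lng, lat = ~lat,
                   radius = 8, color = "red", fillColor = "red",
                   fillOpacity = 0.8, popup = ~name)

# Typing campus_map in the console or in a chunk displays the map
```

With this layout, adding another school only takes one more row in the data frame.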
5. Write a paragraph
During the filtering process, I chose to focus on the state of California and narrowed the data down to the city of Berkeley. Filtering for the Unhealthy Behaviors category showed that the only behavior in this subset was a lack of sleep (adults sleeping less than 7 hours). This interested me because I know there are prestigious universities in the area that may have some influence. I decided to change the map background to show the city’s traffic and to highlight three of the major universities in Berkeley, to see whether a pattern of lack of sleep appears around them.