library(tidyverse)
library(tidyr)
library(leaflet)
cities500 <- read_csv("/Users/cody/Downloads/500CitiesLocalHealthIndicators.cdc.csv")
Healthy Cities GIS Assignment
Load the libraries and set the working directory
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
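To see the split in isolation, here is a minimal sketch on a hypothetical two-row table. The object names `toy` and `toy_split` and the city coordinates are illustrative stand-ins, not rows from the CDC file. It also splits on a comma followed by optional whitespace, a small variation on the code above, so the longitude text has no leading space before conversion.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for the GeoLocation column ("(lat, long)" strings)
toy <- tibble(
  CityName    = c("Honolulu", "Hartford"),
  GeoLocation = c("(21.3069, -157.8583)", "(41.7658, -72.6734)")
)

toy_split <- toy |>
  # Drop the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  # Split on the comma (plus any spaces); convert = TRUE turns the text into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```

With `convert = TRUE`, `lat` and `long` come back as numeric columns, which is what Leaflet expects for coordinates.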
Filter the dataset
Remove the rows where StateDesc is "United States", keep only crude prevalence values from 2017, and then narrow the data to Connecticut (StateAbbr "CT") and the Unhealthy Behaviors category.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CT Connecticut Bridgeport Census Tract BRFSS Unhealthy B…
2 2017 CT Connecticut Danbury City BRFSS Unhealthy B…
3 2017 CT Connecticut Norwalk Census Tract BRFSS Unhealthy B…
4 2017 CT Connecticut Bridgeport Census Tract BRFSS Unhealthy B…
5 2017 CT Connecticut Hartford Census Tract BRFSS Unhealthy B…
6 2017 CT Connecticut Waterbury Census Tract BRFSS Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
What variables are included? (can any of them be removed?)
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CT Connecticut Bridgep… Census Tract Unhealt… 0908000… Obesit…
2 2017 CT Connecticut Danbury City Unhealt… 918430 Obesit…
3 2017 CT Connecticut Norwalk Census Tract Unhealt… 0955990… Obesit…
4 2017 CT Connecticut Bridgep… Census Tract Unhealt… 0908000… Curren…
5 2017 CT Connecticut Hartford Census Tract Unhealt… 0937000… Obesit…
6 2017 CT Connecticut Waterbu… Census Tract Unhealt… 0980000… Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset, latlong_clean2, is now a manageable size.
For your assignment, work with a cleaned dataset.
1. Once you run the above code and learn how to filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations.
Filter for Hawaii
hi <- cities500 |>
  filter(StateAbbr == "HI")
Down to 6,721 observations.
Filter for unhealthy behaviors
hi_unhbeh <- hi |>
  filter(CategoryID == "UNHBEH")
Down to 1,205 observations.
Filter for physical inactivity
hi_lpa <- hi_unhbeh |>
  filter(MeasureId == "LPA")
Down to 241 observations of physical inactivity in Hawaii. I am actually curious about Maryland, too.
md_lpa <- cities500 |>
  filter(StateAbbr == "MD",
         CategoryID == "UNHBEH",
         MeasureId == "LPA")
I am surprised to see that Maryland has fewer observations than Hawaii.
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
Lat and Long values for Honolulu, Hawaii
hi_lon <- -157.8583
hi_lat <- 21.3069
I miss Hawaii
Bar plot of population by unhealthy behaviors in Hawaii
ggplot(hi_unhbeh) +
  geom_bar(aes(x = MeasureId,
               y = PopulationCount),
           stat = "identity") + # ChatGPT assistance
  theme_minimal() +
  labs(title = "Population Count by Unhealthy Behaviors in Hawaii",
       x = "Measure ID",
       y = "Population Count")
3. Now create a map of your subsetted dataset.
First map chunk here
I don't like this plot at all, but we are getting there.
I have to do something about the population numbers so the y-axis shows them as readable numbers.
Summarize the total population count per MeasureId
hi_summary <- hi_unhbeh |>
  group_by(MeasureId) |>
  summarize(total_population = sum(PopulationCount,
                                   na.rm = TRUE))
Renovated Plot
ggplot(data = hi_summary) +
  geom_bar(aes(x = MeasureId,
               y = total_population),
           stat = "identity") +
  scale_y_continuous(labels = scales::comma) + # ChatGPT assistance: makes the y-axis readable
  labs(title = "Population Count by Unhealthy Behaviors in Hawaii",
       x = "Measure ID",
       y = "Total Population Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))
Something is wrong: all the bars are the same height. Every MeasureId has the same total population count, 2,859,621. PopulationCount describes the location rather than the measure, so the same populations are repeated once for every measure and the sums come out identical. The summary should use Data_Value instead of PopulationCount.
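The duplication is easier to see on a small example. The sketch below uses a hypothetical toy table (two made-up locations, three measures, invented values), not the CDC data. Each location's population is repeated on every measure row, so the per-measure sums are identical, while the per-measure mean of Data_Value differs.

```r
library(dplyr)

# Hypothetical toy data: two locations, each reported once per measure
toy <- tibble(
  location        = rep(c("A", "B"), each = 3),
  MeasureId       = rep(c("BINGE", "CSMOKING", "LPA"), times = 2),
  PopulationCount = rep(c(1000, 2500), each = 3),  # same population on every row of a location
  Data_Value      = c(18.2, 14.1, 20.5, 21.0, 12.3, 24.8)
)

toy_summary <- toy |>
  group_by(MeasureId) |>
  summarize(
    total_population = sum(PopulationCount, na.rm = TRUE),  # identical for every measure
    avg_value        = mean(Data_Value, na.rm = TRUE)       # varies by measure
  )

toy_summary
```

In this toy case every measure sums to 3,500 people, which mirrors the flat bars in the plot, while `avg_value` gives a different height per measure.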
Updated Dataset
This version summarizes Data_Value rather than PopulationCount, taking the mean of Data_Value for each measure.
hi_summary2 <- hi_unhbeh |>
  group_by(MeasureId) |>
  summarize(avg_value = mean(Data_Value,
                             na.rm = TRUE))
Rerenovated Plot
ggplot(data = hi_summary2) +
  geom_bar(aes(x = MeasureId,
               y = avg_value),
           stat = "identity") +
  labs(title = "Average Percentage of Unhealthy Behaviors in Hawaii",
       x = "Behaviors",
       y = "Average Percentage %") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # ChatGPT assistance: hjust = 1 right-aligns the rotated x-axis labels under their bars
Leaflet Map Plot
New dataset for Leaflet: it starts from latlong, where GeoLocation is already split into lat and long columns, and keeps only Hawaii rows in the Unhealthy Behaviors category.
hi_latlong <- latlong |>
  filter(StateAbbr == "HI",
         CategoryID == "UNHBEH")
leaflet() |>
  setView(lng = hi_lon,
          lat = hi_lat,
          zoom = 9) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(data = hi_latlong,
             radius = hi_latlong$Data_Value * 10,
             color = "skyblue",
             fillColor = "grey",
             fillOpacity = 0.6)
Assuming "long" and "lat" are longitude and latitude, respectively
4. Refine your map to include a mouse-click tooltip
Create a tooltip using paste0
Put the behavior, population, and percentage into the popup box
tooltip_hi <- paste0("<b>Behavior: </b>",
                     hi_latlong$Short_Question_Text,
                     "<br>",
                     "<b>Population: </b>", hi_latlong$PopulationCount,
                     "<br>",
                     "<b>Percentage: </b>",
                     hi_latlong$Data_Value,
                     "<br>")
Create the leaflet map plot
leaflet() |>
  setView(lng = hi_lon,
          lat = hi_lat,
          zoom = 9) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(data = hi_latlong,
             radius = hi_latlong$Data_Value * 10,
             color = "skyblue",
             fillColor = "grey",
             fillOpacity = 0.6,
             popup = tooltip_hi)
Assuming "long" and "lat" are longitude and latitude, respectively
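The popup text works because paste0 is vectorized: it builds one HTML string per row, and Leaflet matches those strings to the circles in order. Here is a minimal base R sketch on a hypothetical two-row data frame (`toy`, `toy_popup`, and the values are made up for illustration). It also formats the population with a thousands separator and appends a percent sign, which are optional tweaks, not part of the map above.

```r
# Hypothetical toy rows standing in for hi_latlong
toy <- data.frame(
  Short_Question_Text = c("Obesity", "Current Smoking"),
  PopulationCount     = c(4200, 3100),
  Data_Value          = c(22.5, 14.8)
)

# paste0() is vectorized, so this returns one HTML string per row
toy_popup <- paste0(
  "<b>Behavior: </b>", toy$Short_Question_Text, "<br>",
  "<b>Population: </b>", format(toy$PopulationCount, big.mark = ","), "<br>",
  "<b>Percentage: </b>", toy$Data_Value, "%"
)

toy_popup
```

Because the strings are matched to rows by position, the popup vector should be built from the same data frame, in the same order, as the one passed to addCircles.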
5. Write a paragraph
I created a Leaflet map focusing on unhealthy behaviors in Hawaii, filtered from the 500 Cities dataset. After cleaning the GeoLocation variable and separating it into latitude and longitude, I filtered the data to include only entries from Hawaii with CategoryID “UNHBEH” (Unhealthy Behaviors). The map draws a circle at each location in Hawaii where data is available, with circle sizes scaled by the data value for each behavior. The tooltips show the behavior type, population count, and prevalence percentage. This interactive map helps visualize unhealthy behaviors in Hawaii, making it easier to identify patterns and compare areas of higher or lower concern.