Healthy Cities GIS Assignment
Load the libraries, set the working directory, and read the data
library(leaflet)
library(tidyverse)
library(tidyr)
library(plotly)
setwd("~/aaaworkingdirectory")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
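To see what the two steps above do on their own, here is a minimal sketch on two hypothetical rows (the coordinates are illustrative, not copied from the CDC file). str_replace_all() strips the parentheses, and separate() splits on the comma. Setting convert = TRUE turns the pieces into numbers, which is why lat and long print as <dbl> in the output above.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical toy rows mimicking the "(lat, long)" text format of GeoLocation
toy <- tibble(
  CityName = c("Hawthorne", "Hayward"),
  GeoLocation = c("(33.914, -118.347)", "(37.632, -122.077)")
)

toy_latlong <- toy |>
  # Remove "(" and ")" so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE parses both pieces as numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_latlong
```

separate() is marked as superseded in recent tidyr releases (separate_wider_delim() is the newer alternative), but it still works, so the sketch keeps the same call as the chunk above.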
Filter the dataset
Remove the rows whose StateDesc is "United States" (the national-level rows), keep only crude prevalence estimates, and keep only 2017.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Unhealthy Beh…
4 2017 CA California Indio Census Tract BRFSS Health Outcom…
5 2017 CA California Inglewood Census Tract BRFSS Health Outcom…
6 2017 CA California Lakewood City BRFSS Unhealthy Beh…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
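The three chained filter() calls can also be written as a single call, because filter() combines comma-separated conditions with a logical AND. A minimal sketch on hypothetical toy rows (not the real data) showing that both forms keep the same rows:

```r
library(dplyr)

# Hypothetical toy rows standing in for latlong
toy <- tibble(
  StateDesc = c("United States", "California", "California", "California"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Age-adjusted prevalence", "Crude prevalence"),
  Year = c(2017, 2017, 2017, 2016)
)

# Chained form, as in the chunk above
chained <- toy |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)

# Single call: a row is kept only if every condition is TRUE
combined <- toy |>
  filter(StateDesc != "United States",
         Data_Value_Type == "Crude prevalence",
         Year == 2017)

nrow(combined)  # only the second toy row passes all three conditions
```

filter() also drops rows where a condition evaluates to NA, so rows with a missing StateDesc, Data_Value_Type or Year would be removed as well.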
What variables are included? (Can any of them be removed?)
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract Health … 0632548… Arthri…
2 2017 CA California Hawthorne City Unhealt… 632548 Curren…
3 2017 CA California Hayward City Unhealt… 633000 Obesit…
4 2017 CA California Indio Census Tract Health … 0636448… Arthri…
5 2017 CA California Inglewood Census Tract Health … 0636546… Diagno…
6 2017 CA California Lakewood City Unhealt… 639892 Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
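If the list of dropped columns grows, it can be easier to keep the names in one character vector and drop them with any_of(). This is a sketch on a hypothetical one-row toy tibble; drop_cols is an illustrative name, not something from the assignment. Unlike a bare -Name, any_of() skips names that are not present instead of raising an error.

```r
library(dplyr)

# Hypothetical vector of columns to drop (same names as the chunk above)
drop_cols <- c("DataSource", "Data_Value_Unit", "DataValueTypeID",
               "Low_Confidence_Limit", "High_Confidence_Limit",
               "Data_Value_Footnote_Symbol", "Data_Value_Footnote")

# Hypothetical toy row containing only some of those columns
toy <- tibble(Year = 2017, DataSource = "BRFSS", Data_Value = 25.1,
              Data_Value_Footnote = NA_character_)

# any_of() ignores names in drop_cols that toy does not have
toy_small <- toy |> select(-any_of(drop_cols))
names(toy_small)
```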
The cleaned dataset, latlong_clean2, is now a manageable size.
For your assignment, work with a cleaned dataset.
1. Once you run the above code and learn how to filter in this format, filter this dataset however you choose so that you have a subset with no more than 900 observations.
Filter chunk here
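latlong_final <- latlong_clean2 |>
  # Small areas (PopulationCount < 250), the no leisure-time physical
  # activity measure (MeasureId "LPA"), and latitudes south of 36 degrees N
  filter(PopulationCount < 250, MeasureId == "LPA", lat < 36)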
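Because the assignment caps the subset at 900 observations, it is worth checking the row count right after filtering. The sketch below uses hypothetical toy rows in place of latlong_clean2. It also counts rows with a missing Data_Value, since the filter keeps those rows even though they cannot be plotted.

```r
library(dplyr)

# Hypothetical toy rows standing in for latlong_clean2
toy <- tibble(
  MeasureId = c("LPA", "LPA", "OBESITY", "LPA"),
  PopulationCount = c(120, 5000, 90, 200),
  lat = c(33.9, 32.7, 30.1, 35.0),
  Data_Value = c(31.2, 27.5, 35.0, NA)
)

toy_final <- toy |>
  filter(PopulationCount < 250, MeasureId == "LPA", lat < 36)

# Stop with an error if the subset breaks the 900-observation limit
stopifnot(nrow(toy_final) <= 900)

# Rows that survive the filter but have no estimate to plot
n_missing <- sum(is.na(toy_final$Data_Value))
```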
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
plot1 <- latlong_final |>
  ggplot(aes(Data_Value, lat)) +
  geom_point(aes(color = StateAbbr)) +
  labs(x = "Rate of Adults with No Leisure-time Physical Activity",
       y = "Latitude",
       caption = "CDC",
       title = "Adults with No Leisure-time Physical Activity in Southern States",
       color = "State")
plot1
Warning: Removed 438 rows containing missing values or values outside the scale range
(`geom_point()`).
ggplotly(plot1)
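The warning above says 438 rows were dropped, most likely because their Data_Value is missing. A minimal sketch, on hypothetical toy rows, of removing missing values before plotting and passing the plot object to ggplotly() explicitly rather than relying on the most recent plot:

```r
library(dplyr)
library(ggplot2)
library(plotly)

# Hypothetical toy rows standing in for latlong_final
toy <- tibble(
  StateAbbr = c("CA", "TX", "GA", "TX"),
  lat = c(33.9, 29.8, 33.7, 32.8),
  Data_Value = c(31.2, NA, 28.4, 30.0)
)

toy_plot <- toy |>
  # Drop rows without an estimate so geom_point() has nothing to remove
  filter(!is.na(Data_Value)) |>
  ggplot(aes(Data_Value, lat)) +
  geom_point(aes(color = StateAbbr)) +
  labs(x = "Rate of Adults with No Leisure-time Physical Activity",
       y = "Latitude", color = "State")

# Convert this specific ggplot object into an interactive plotly widget
p_interactive <- ggplotly(toy_plot)
```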
3. Now create a map of your subsetted dataset.
First map chunk here
leaflet() |>
setView(lat = 33.708035, lng = -98.533298, zoom = 4) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = latlong_final
)
Assuming "long" and "lat" are longitude and latitude, respectively
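The message above appears because addCircles() was not told which columns hold the coordinates, so leaflet guessed from the column names. A sketch with hypothetical toy coordinates that names the columns explicitly with formulas (lng = ~long, lat = ~lat), which removes the guess:

```r
library(leaflet)
library(tibble)

# Hypothetical toy coordinates using the same column names as latlong_final
toy <- tibble(lat = c(33.9, 29.8), long = c(-118.3, -95.4))

m <- leaflet(toy) |>
  setView(lat = 33.708035, lng = -98.533298, zoom = 4) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  # Formulas are evaluated against toy, so ~long and ~lat pick those columns
  addCircles(lng = ~long, lat = ~lat)

m
```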
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
popupmap <- paste0(
  "<b>State: </b>", latlong_final$StateDesc, "<br>",
  "<b>City: </b>", latlong_final$CityName, "<br>",
  "<b>Population: </b>", latlong_final$PopulationCount, "<br>",
  "<b>Final Value: </b>", latlong_final$Data_Value, "<br>"
)
leaflet() |>
setView(lat = 33.708035, lng = -98.533298, zoom = 4) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = latlong_final,
radius = sqrt(1.5^latlong_final$Data_Value)*4,
popup = popupmap
)
Assuming "long" and "lat" are longitude and latitude, respectively
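A variant of the refined map, sketched on hypothetical toy rows (the column names follow the chunk above; the values are made up). The popup text and radius are written as formulas, so leaflet evaluates them against the same rows it draws, and rows with a missing Data_Value are dropped first so every circle has a defined radius.

```r
library(dplyr)
library(leaflet)

# Hypothetical toy rows standing in for latlong_final
toy <- tibble(
  StateDesc = c("California", "Texas"),
  CityName = c("Hawthorne", "Houston"),
  PopulationCount = c(120, 240),
  Data_Value = c(31.2, NA),
  lat = c(33.9, 29.8),
  long = c(-118.3, -95.4)
)

m <- toy |>
  # Without an estimate there is no radius or value to show
  filter(!is.na(Data_Value)) |>
  leaflet() |>
  setView(lat = 33.708035, lng = -98.533298, zoom = 4) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long, lat = ~lat,
    radius = ~sqrt(1.5^Data_Value) * 4,
    # Built row by row from the map's own data, so popups cannot misalign
    popup = ~paste0(
      "<b>State: </b>", StateDesc, "<br>",
      "<b>City: </b>", CityName, "<br>",
      "<b>Population: </b>", PopulationCount, "<br>",
      "<b>Final Value: </b>", Data_Value
    )
  )

m
```

Keeping the popup as a separate vector, as in the chunk above, works as long as it is built from exactly the same rows passed as data; the formula version simply removes that requirement.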
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
The graphs I made show the rate of adults with no leisure-time physical activity in southern states. The first scatter plot was the one I found most interesting. Although it was mostly a blob, there was a slight trend for data points farther north to have lower rates. This trend did not show up clearly in the maps, which is why I point it out here. I also found it interesting that the scatter plot is visibly stratified by state, since none of the states in the subset stretch very far from north to south. The maps were less interesting, but they did show that the places with the highest rates of no leisure-time physical activity were mostly centered on large cities. That pattern could be biased, though, since the 500 Cities data cover only cities, and large cities contribute many more census tracts than small ones.
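The slight north-to-south trend described above can be put into numbers with a correlation and a simple linear fit. A minimal sketch on hypothetical toy values; running the same two calls on latlong_final (with its lat and Data_Value columns) would give the actual figures for this subset. A negative correlation or slope means rates go down as latitude goes up.

```r
library(dplyr)

# Hypothetical toy values standing in for latlong_final
toy <- tibble(
  lat = c(30.1, 31.5, 32.8, 33.9, 35.2),
  Data_Value = c(34.0, 32.5, 33.1, 30.2, 29.8)
)

# Pearson correlation; use = "complete.obs" skips rows with a missing value
r <- cor(toy$lat, toy$Data_Value, use = "complete.obs")

# Simple linear fit: estimated change in the rate per degree of latitude
fit <- lm(Data_Value ~ lat, data = toy)
slope <- coef(fit)[["lat"]]
```

cor.test() on the same two columns would additionally report whether the correlation is distinguishable from zero.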