Healthy Cities GIS Assignment
Load the libraries and set the working directory
# tidyverse attaches readr, dplyr, tidyr, stringr, and ggplot2
library(tidyverse)
library(leaflet)
# read_csv() looks for the file in the current working directory
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable is stored as text in the format "(lat, long)".
Split GeoLocation into two numeric columns: lat and long.
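latlong <- cities500 |>
  # remove the parentheses: "(lat, long)" becomes "lat, long"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split on the comma; convert = TRUE turns both pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)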
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
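A minimal sketch of the same split on two made-up rows, so the effect of each step is visible. It uses str_remove_all() (equivalent to str_replace_all(x, pattern, "")) and a slightly different separator, ",\\s*", which also drops the space after the comma; the original sep = "," works as well, since the output above shows long parsed as <dbl>. Newer tidyr versions mark separate() as superseded by separate_wider_delim(), but it still runs.

```r
library(tibble)
library(dplyr)
library(stringr)
library(tidyr)

# Made-up rows in the same "(lat, long)" text format as GeoLocation
toy <- tibble(
  CityName    = c("Waco", "Hayward"),
  GeoLocation = c("(31.55, -97.14)", "(37.67, -122.08)")
)

toy_split <- toy |>
  # strip the parentheses: "(31.55, -97.14)" -> "31.55, -97.14"
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  # split on the comma plus any following spaces; convert = TRUE parses numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```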
Filter the dataset
Remove the rows whose StateDesc is "United States" (the national-level estimates), keep Prevention as the category of interest, keep only crude-prevalence measurements, and keep only the year 2017.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Prevention") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgomery City BRFSS Prevention
2 2017 CA California Concord City BRFSS Prevention
3 2017 CA California Concord City BRFSS Prevention
4 2017 CA California Fontana City BRFSS Prevention
5 2017 CA California Richmond Census Tract BRFSS Prevention
6 2017 FL Florida Davie Census Tract BRFSS Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
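The four chained filter() calls keep only rows that satisfy all four conditions. A minimal sketch on made-up rows of the equivalent single filter() call, in which comma-separated conditions are combined with AND:

```r
library(tibble)
library(dplyr)

# Made-up rows that mimic the columns used in the filter step
toy <- tibble(
  StateDesc       = c("United States", "Alabama", "Alabama", "California"),
  Category        = c("Prevention", "Prevention", "Health Outcomes", "Prevention"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year            = c(2017, 2017, 2017, 2017)
)

# Comma-separated conditions inside one filter() are combined with AND,
# so this keeps the same rows as chaining four separate filter() calls
toy_clean <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Prevention",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

toy_clean  # only the Alabama Prevention / crude-prevalence row remains
```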
What variables are included, and can any of them be removed?
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
prevention <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgome… City Prevent… 151000 Choles…
2 2017 CA California Concord City Prevent… 616000 Visits…
3 2017 CA California Concord City Prevent… 616000 Choles…
4 2017 CA California Fontana City Prevent… 624680 Visits…
5 2017 CA California Richmond Census Tract Prevent… 0660620… Choles…
6 2017 FL Florida Davie Census Tract Prevent… 1216475… Choles…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
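The same list of unused columns is dropped again later in the Waco step, so one option is to store the names once in a character vector and drop them with all_of(). A minimal sketch on a made-up one-row table; the name drop_cols is hypothetical, not part of the assignment.

```r
library(tibble)
library(dplyr)

# Hypothetical name: the columns this analysis does not use
drop_cols <- c("DataSource", "Data_Value_Unit", "DataValueTypeID",
               "Low_Confidence_Limit", "High_Confidence_Limit",
               "Data_Value_Footnote_Symbol", "Data_Value_Footnote")

# Made-up one-row table: three kept columns plus the unused ones
toy <- tibble(
  CityName = "Concord", Measure = "Cholesterol Screening", Data_Value = 1.0,
  DataSource = "BRFSS", Data_Value_Unit = "%", DataValueTypeID = "CrdPrv",
  Low_Confidence_Limit = 0.5, High_Confidence_Limit = 1.5,
  Data_Value_Footnote_Symbol = NA_character_, Data_Value_Footnote = NA_character_
)

# all_of() errors if a listed column is missing, which catches typos early
toy_kept <- toy |> select(-all_of(drop_cols))

names(toy_kept)  # "CityName" "Measure" "Data_Value"
```

In the Waco step, the same vector could be reused as select(-all_of(c(drop_cols, "UniqueID"))).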
md <- prevention |>
  filter(StateAbbr == "MD")
head(md)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Chole…
2 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
3 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
4 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
5 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
6 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
names(md)
[1] "Year" "StateAbbr" "StateDesc"
[4] "CityName" "GeographicLevel" "Category"
[7] "UniqueID" "Measure" "Data_Value_Type"
[10] "Data_Value" "PopulationCount" "lat"
[13] "long" "CategoryID" "MeasureId"
[16] "CityFIPS" "TractFIPS" "Short_Question_Text"
md
# A tibble: 804 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Chole…
2 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Visit…
3 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Visit…
4 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Curre…
5 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Curre…
6 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Visit…
7 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Curre…
8 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Takin…
9 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Curre…
10 2017 MD Maryland Baltimore Census Tract Prevent… 2404000… "Chole…
# ℹ 794 more rows
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
unique(md$CityName)
[1] "Baltimore"
The new dataset, prevention, is now a manageable size.
For your assignment, work with a cleaned dataset.
1. Once you run the above code, filter this dataset one more time for any particular subset with no more than 900 observations.
Filter chunk here
waco <- latlong |>
  # focus on census tracts in Waco, Texas (drop the city-level rows)
  filter(CityName == "Waco" & Year == 2017 & GeographicLevel != "City") |>
  # drop unnecessary columns
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote, -UniqueID)
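The assignment caps the subset at 900 observations. A minimal sketch of checking that limit right after filtering, using a hypothetical helper check_subset_size() and made-up rows (one city-level row and three census-tract rows):

```r
library(tibble)
library(dplyr)

# Hypothetical helper: stop early if a subset exceeds the assignment's row limit
check_subset_size <- function(df, max_rows = 900) {
  n <- nrow(df)
  if (n > max_rows) {
    stop(sprintf("subset has %d rows; the limit is %d", n, max_rows))
  }
  invisible(df)  # return the data unchanged so the helper can sit in a pipe
}

# Made-up rows: one city-level row and three census-tract rows
toy <- tibble(
  CityName        = c("Waco", "Waco", "Waco", "Waco"),
  GeographicLevel = c("City", "Census Tract", "Census Tract", "Census Tract"),
  Year            = 2017
)

toy_tracts <- toy |>
  filter(CityName == "Waco", Year == 2017, GeographicLevel != "City") |>
  check_subset_size()

nrow(toy_tracts)  # 3
```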
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here; (North = + lat, South = - lat, East = + lng, West = - lng)
waco |>
  ggplot(aes(x = Category, y = Short_Question_Text)) +
  geom_point(size = 5, color = "red", shape = "x") +
  theme_bw() +  # ggplot2 layers are combined with +, not |>
  labs(title = "Waco, TX - 2017 Health Data",
       y = "Measure")
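Because each measure repeats once per census tract, points in this plot sit on top of each other, so the plot shows which measures exist in each category but not how many. A minimal sketch that puts a number on it by counting distinct measures per category; the rows are made up.

```r
library(tibble)
library(dplyr)

# Made-up rows: the same measure repeats once per tract, as in tract-level data
toy <- tibble(
  Category = c("Health Outcomes", "Health Outcomes", "Health Outcomes",
               "Prevention", "Prevention"),
  Short_Question_Text = c("Diabetes", "Diabetes", "High Blood Pressure",
                          "Health Insurance", "Health Insurance")
)

# Keep one row per (Category, measure) pair, then count measures per category
measures_per_category <- toy |>
  distinct(Category, Short_Question_Text) |>
  count(Category, name = "n_measures") |>
  arrange(desc(n_measures))

measures_per_category  # Health Outcomes: 2, Prevention: 1
```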
3. Now create a map of your subsetted dataset.
First map chunk here
leaflet() |>
setView(lng = -97.1290, lat = 31.5576, zoom = 11) |>
addProviderTiles("Stadia.AlidadeSmooth") |>
addCircles(data = waco)
Assuming "long" and "lat" are longitude and latitude, respectively
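When addCircles() gets data but no lng/lat arguments, leaflet guesses the coordinate columns and reports the guess in the "Assuming ..." message above. A minimal sketch that names the columns explicitly with one-sided formulas, so no guess is needed; the three points are made-up coordinates near Waco, not values from the dataset.

```r
library(leaflet)
library(tibble)

# Made-up tract points near Waco, TX (illustrative coordinates only)
toy_tracts <- tibble(
  lat  = c(31.55, 31.53, 31.58),
  long = c(-97.14, -97.18, -97.11)
)

# lng = ~long and lat = ~lat tell leaflet which columns hold the coordinates
m <- leaflet(toy_tracts) |>
  setView(lng = -97.1290, lat = 31.5576, zoom = 11) |>
  addProviderTiles("Stadia.AlidadeSmooth") |>
  addCircles(lng = ~long, lat = ~lat)
```

Printing m (or leaving it as the last line of a chunk) renders the map in RStudio or a knitted document.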
4. Refine your map to include a mouse-click tooltip
tooltip <- paste0(
  "<b>Measure: </b>", waco$Short_Question_Text, "<br>",
  "<b>Category: </b>", waco$Category, "<br>",
  "<b>Population: </b>", waco$PopulationCount, "<br>"
)
leaflet() |>
setView(lng = -97.1290, lat = 31.5576, zoom = 11) |>
addProviderTiles("Stadia.AlidadeSmooth") |>
addCircles(data = waco,
radius = (waco$PopulationCount / 10), # circle radius in meters, scaled by population
color = "tan",
fillOpacity = 0.05,
popup = tooltip
)
Assuming "long" and "lat" are longitude and latitude, respectively
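The tooltip is a vectorized paste0() call that builds one HTML string per row, and leaflet shows the matching string when a circle is clicked. A minimal base-R sketch wrapping it in a hypothetical make_popup() helper and adding thousands separators with formatC(); the sample values are made up.

```r
# Hypothetical helper: build one HTML popup string per row
make_popup <- function(measure, category, population) {
  paste0(
    "<b>Measure: </b>", measure, "<br>",
    "<b>Category: </b>", category, "<br>",
    # format = "d" prints whole numbers; big.mark adds thousands separators
    "<b>Population: </b>", formatC(population, format = "d", big.mark = ","), "<br>"
  )
}

popups <- make_popup(
  measure    = c("Health Insurance", "Diabetes"),
  category   = c("Prevention", "Health Outcomes"),
  population = c(4512, 12030)
)

popups[1]
```

The resulting vector can be passed straight to addCircles(popup = ...), as the tooltip object is above.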
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
From this dataset of the 500 largest US cities, I chose to base my visualizations on Waco, Texas. First, I created a point plot of each recorded health measure against its category. It shows that more measures fall under the Health Outcomes category (diseases and health conditions) than under any other category, so health outcomes make up the largest share of the indicators used to describe the city's health. Next, I mapped Waco with a circle for each census tract record; clicking a circle shows its measure, category, and population, and the circles vary in size with the tract's population count.