Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)
library(tidyr)
setwd("/Users/karenlizethpp/Library/Mobile Documents/com~apple~CloudDocs/Data 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
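As a small check of the splitting step, the sketch below applies the same two calls to a two-row toy tibble. The `toy` name, the city labels, and the coordinates are invented for illustration and are not taken from the CDC file.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

# Hypothetical toy data in the same "(lat, long)" format as GeoLocation
toy <- tibble(
  CityName    = c("City A", "City B"),
  GeoLocation = c("(33.91, -118.35)", "(37.63, -122.10)")
)

toy_split <- toy |>
  # Drop the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Checking `is.numeric(toy_split$lat)` afterwards is a quick way to confirm that the conversion worked before the columns are passed to a map.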
Filter the dataset
Remove the rows where StateDesc is "United States", keep only crude prevalence values, and keep only 2017. The instructions also mention selecting Prevention as the category of interest; the code below does not filter on Category, so all categories remain.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Unhealthy Beh…
4 2017 CA California Indio Census Tract BRFSS Health Outcom…
5 2017 CA California Inglewood Census Tract BRFSS Health Outcom…
6 2017 CA California Lakewood City BRFSS Unhealthy Beh…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
What variables are included? (can any of them be removed?)
names(latlong_clean)
 [1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
  select(
    -DataSource, -Data_Value_Unit, -DataValueTypeID,
    -Low_Confidence_Limit, -High_Confidence_Limit,
    -Data_Value_Footnote_Symbol, -Data_Value_Footnote
  )
head(latlong_clean2)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract Health … 0632548… Arthri…
2 2017 CA California Hawthorne City Unhealt… 632548 Curren…
3 2017 CA California Hayward City Unhealt… 633000 Obesit…
4 2017 CA California Indio Census Tract Health … 0636448… Arthri…
5 2017 CA California Inglewood Census Tract Health … 0636546… Diagno…
6 2017 CA California Lakewood City Unhealt… 639892 Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
#unique(md$CityName)
The new dataset, latlong_clean2, is now a manageable size.
For your assignment, work with a cleaned dataset.
1. Once you run the above code and learn how to filter in this format, filter this dataset however you choose so that you have a subset with no more than 900 observations.
Filter chunk here
I filtered the data to keep only the tracts in Florida where the diabetes rate is 15% or higher, and I made sure it only includes the diabetes question at the census tract level.
latlongfl <- latlong_clean2 |>
  filter(
    StateAbbr == "FL",
    Data_Value >= 15,
    GeographicLevel == "Census Tract",
    Short_Question_Text == "Diabetes"
  )
head(latlongfl)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 FL Florida Deerfield… Census Tract Health … 1216725… Diagno…
2 2017 FL Florida Hialeah Census Tract Health … 1230000… Diagno…
3 2017 FL Florida Gainesvil… Census Tract Health … 1225175… Diagno…
4 2017 FL Florida Hialeah Census Tract Health … 1230000… Diagno…
5 2017 FL Florida Boynton B… Census Tract Health … 1207875… Diagno…
6 2017 FL Florida Hialeah Census Tract Health … 1230000… Diagno…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
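The 900-row limit from step 1 can be confirmed in code rather than by eye. The sketch below uses a made-up four-row tibble (`toy` and its values are assumptions, not CDC data) to show the same `filter()` pattern followed by a row count.

```r
library(dplyr)
library(tibble)

# Hypothetical rows that mimic the columns used in the filter above
toy <- tibble(
  StateAbbr           = c("FL", "FL", "FL", "CA"),
  GeographicLevel     = c("Census Tract", "Census Tract", "City", "Census Tract"),
  Short_Question_Text = c("Diabetes", "Diabetes", "Diabetes", "Diabetes"),
  Data_Value          = c(18.2, 9.4, 16.0, 20.1)
)

toy_fl <- toy |>
  filter(
    StateAbbr == "FL",
    GeographicLevel == "Census Tract",
    Short_Question_Text == "Diabetes",
    Data_Value >= 15
  )

nrow(toy_fl)                    # number of rows that survive the filter
stopifnot(nrow(toy_fl) <= 900)  # stops with an error if the subset is too large
```

Running the same two last lines on `latlongfl` would give the real count for the Florida subset.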
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here:
- I filtered the dataset to get the top 5 cities with the highest diabetes percentages at the tract level using max().
# Filter top 5 cities with highest diabetes %
top_cities <- latlongfl |>
  group_by(CityName) |>
  summarize(max_diabetes = max(Data_Value, na.rm = TRUE)) |>
  arrange(desc(max_diabetes)) |>
  slice(1:5) |>
  pull(CityName)
latlong_top5 <- latlongfl |>
  filter(CityName %in% top_cities,
         Short_Question_Text == "Diabetes")
Pull function:
- https://www.statology.org/dplyr-pull/
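To see what `pull()` contributes, the sketch below runs the same kind of pipeline on invented tract values (the city names and numbers are hypothetical). `pull()` returns a plain character vector, which is what `%in%` expects on its right-hand side. `slice_max()` is shown only as one possible alternative to `arrange()` plus `slice()`; it is not what the assignment code uses.

```r
library(dplyr)
library(tibble)

# Hypothetical tract-level values for three cities
toy <- tibble(
  CityName   = c("A", "A", "B", "B", "C"),
  Data_Value = c(15.5, 22.0, 18.0, 19.0, 30.1)
)

# Same pattern as above: rank cities by their highest tract value, keep the top 2
top2 <- toy |>
  group_by(CityName) |>
  summarize(max_diabetes = max(Data_Value, na.rm = TRUE)) |>
  arrange(desc(max_diabetes)) |>
  slice(1:2) |>
  pull(CityName)

# Alternative spelling of the ranking step with slice_max()
top2_alt <- toy |>
  group_by(CityName) |>
  summarize(max_diabetes = max(Data_Value, na.rm = TRUE)) |>
  slice_max(max_diabetes, n = 2, with_ties = FALSE) |>
  pull(CityName)

# The character vector can now be used to keep only tracts in those cities
toy |> filter(CityName %in% top2)
```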
- Now that I have the 5 cities with the highest rates of diabetes in adults, I created a scatterplot of diabetes percentage against population across tracts, with one panel for each of those cities using facet_wrap.
# Create a Scatterplot of diabetes % by population, faceted by city
ggplot(latlong_top5, aes(x = PopulationCount, y = Data_Value, color = CityName)) +
  geom_point(alpha = 0.6, size = 3) +
  facet_wrap(~CityName) +
  scale_color_viridis_d() +
  labs(
    title = "Diabetes by Population in the Top 5 \nCities with Highest Rates in Florida (2017)",
    x = "Population",
    y = "Diabetes %",
    color = "City",
    caption = "Source: CDC, 500 Cities Project (2016–2019)"
  ) +
  theme_bw() +
  theme(strip.text = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
Next, I create a bar graph of the average diabetes rate for the five Florida cities with the highest averages. I will reuse the same kind of summary for the map later.
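first() function (the code below calls dplyr's first(), which tidyverse attaches; the link documents a function of the same name from the gdata package):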
- https://www.rdocumentation.org/packages/gdata/versions/3.0.1/topics/first
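The sketch below illustrates two points on invented data (city names, coordinates, and values are all hypothetical). First, `first()` inside `summarize()` simply keeps one tract's coordinates to stand in for the whole city. Second, ranking cities by their mean can give a different order from ranking by their maximum, which is why the bar graph and the scatterplot need not show the same five cities.

```r
library(dplyr)
library(tibble)

# Hypothetical tracts: city A has one extreme tract, city B is uniformly high
toy <- tibble(
  CityName   = c("A", "A", "A", "B", "B"),
  Data_Value = c(15.0, 16.0, 29.0, 21.0, 23.0),
  lat        = c(25.80, 25.81, 25.82, 29.60, 29.65),
  long       = c(-80.20, -80.21, -80.22, -82.30, -82.35)
)

city_summary <- toy |>
  group_by(CityName) |>
  summarize(
    max_diabetes  = max(Data_Value, na.rm = TRUE),
    mean_diabetes = mean(Data_Value, na.rm = TRUE),
    lat  = first(lat),   # coordinates of the first tract listed for the city
    long = first(long)
  )

by_max  <- city_summary |> arrange(desc(max_diabetes))   # A ranks first by maximum
by_mean <- city_summary |> arrange(desc(mean_diabetes))  # B ranks first by mean
```

Using the first tract's coordinates is a simplification; averaging `lat` and `long` per city would be another reasonable choice for placing one marker per city.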
#Top 5 cities with highest average diabetes %, total population summed
diabetesfl <- latlongfl |>
  group_by(CityName) |>
  summarize(
    mean_diabetes = mean(Data_Value, na.rm = TRUE),
    lat = first(lat),
    long = first(long),
    PopulationCount = sum(PopulationCount, na.rm = TRUE)
  ) |>
  arrange(desc(mean_diabetes)) |>
  slice(1:5)
# Create bar graph for average diabetes % in top 5 cities
ggplot(diabetesfl, aes(x = reorder(CityName, -mean_diabetes),
                       y = mean_diabetes,
                       fill = mean_diabetes)) +
  geom_col() +
  geom_text(aes(label = paste0(round(mean_diabetes, 1), "%")),
            vjust = -0.5, size = 3) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Top 5 Cities with the Highest Average\n Rate of Diabetes in Adults in Florida (2017)",
       x = "City",
       y = "Average % with Diabetes",
       fill = "Diabetes %",
       caption = "Source: CDC, 500 Cities Project (2016–2019)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
3. Now create a map of your subsetted dataset.
First map chunk here
Florida center: lat = 27.994402, long = -81.760254.
Lat and long for Florida:
- https://www.latlong.net/place/florida-usa-15262.html
Leaflet provider tiles preview:
- https://leaflet-extras.github.io/leaflet-providers/preview/
florida_lon <- -81.760254
florida_lat <- 27.994402
# Calculate the average diabetes rate per city across the Florida tracts with a rate of at least 15%
diabetesfl2 <- latlongfl |>
  group_by(CityName) |>
  summarize(
    mean_diabetes = mean(Data_Value, na.rm = TRUE),
    lat = first(lat),
    long = first(long),
    PopulationCount = sum(PopulationCount, na.rm = TRUE)
  ) |>
  arrange(desc(mean_diabetes))
#Create a map
library(leaflet)
leaflet() |>
  setView(florida_lon, florida_lat, zoom = 6.5) |>
  addProviderTiles("Stadia.Outdoors") |>
  addCircles(
    data = diabetesfl2,
    lat = diabetesfl2$lat,
    lng = diabetesfl2$long,
    radius = diabetesfl2$mean_diabetes)
Adding radius scale:
-https://r-graph-gallery.com/182-add-circles-rectangles-on-leaflet-map.html
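`addCircles()` takes its radius in meters, so a raw prevalence value of roughly 15 to 30 draws circles only a few tens of meters wide, which are nearly invisible at a statewide zoom. A minimal sketch of the scaling idea on invented values follows. The multiplier of 1000 is an arbitrary choice for visibility, not a recommended constant, and the map object is only built if leaflet is installed.

```r
# Hypothetical city-level averages (not CDC values)
toy <- data.frame(
  CityName      = c("A", "B"),
  mean_diabetes = c(16.2, 24.8),
  lat           = c(25.80, 29.65),
  long          = c(-80.20, -82.30)
)

scale_factor <- 1000                               # arbitrary, for visibility only
toy$radius_m <- toy$mean_diabetes * scale_factor   # circle radius in meters

# Build the map only when leaflet is available; default tiles are used here
if (requireNamespace("leaflet", quietly = TRUE)) {
  toy_map <- leaflet::leaflet(toy) |>
    leaflet::addTiles() |>
    leaflet::addCircles(lng = ~long, lat = ~lat, radius = ~radius_m)
}
```

Because the radius is a ground distance, the circles grow and shrink on screen as the map is zoomed; `addCircleMarkers()` uses pixel radii instead if a constant on-screen size is preferred.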
library(leaflet)
leaflet() |>
  setView(lng = florida_lon, lat = florida_lat, zoom = 7) |>
  addProviderTiles("Stadia.Outdoors") |>
  addCircles(
    data = diabetesfl2,
    lat = ~lat,
    lng = ~long,
    radius = ~mean_diabetes * 500, # scaled up so the circles are visible
    color = "#eb1e17",
    fillOpacity = 0.5
  )
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
Adding tooltips and using round() for better readability on the map.
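A minimal sketch of how such popup text can be assembled and inspected before it goes on a map, using invented values. The `format(..., big.mark = ",", trim = TRUE)` call is an optional addition, not part of the assignment code, that makes large population counts easier to read.

```r
# Hypothetical city summaries (not CDC values)
toy <- data.frame(
  CityName        = c("A", "B"),
  mean_diabetes   = c(16.237, 24.81),
  PopulationCount = c(120345, 98012)
)

# One HTML string per row; <b> makes the labels bold and <br> breaks the line
popup_toy <- paste0(
  "<b>City: </b>", toy$CityName, "<br>",
  "<b>Diabetes Prevalence: </b>", round(toy$mean_diabetes, 1), "%<br>",
  "<b>Population: </b>", format(toy$PopulationCount, big.mark = ",", trim = TRUE)
)

popup_toy[1]  # inspect the first popup string
```

Because `paste0()` is vectorized, the result has one element per city, in the same row order as the data passed to `addCircles()`.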
# Create popup
popupdiabetes <- paste0(
  "<b>City: </b>", diabetesfl2$CityName, "<br>",
  "<b>Diabetes Prevalence: </b>", round(diabetesfl2$mean_diabetes, 1), "%<br>",
  "<b>Population: </b>", diabetesfl2$PopulationCount
)
# Create the interactive map
leaflet() |>
  setView(lng = florida_lon, lat = florida_lat, zoom = 6.4) |>
  addProviderTiles("Stadia.Outdoors") |>
  addCircles(
    data = diabetesfl2,
    lat = ~lat,
    lng = ~long,
    radius = ~mean_diabetes * 1000, # Improving scaling
    color = "#8a0b07",
    fillColor = "#ed3434",
    fillOpacity = 0.5,
    popup = popupdiabetes
  )
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
In this assignment, I focused on Florida because it is one of the states with the largest immigrant populations, and I wanted to explore how the percentage of adults with diabetes varies across its cities. Because I first filtered to census tracts with a diabetes rate of at least 15%, every plot describes only those high-prevalence tracts. My first plot is a scatterplot of diabetes percentage against tract population in the five cities whose tracts reach the highest rates; it helped me see how much the numbers vary within each city. My second plot is a bar graph of the average diabetes rate by city. What I found interesting is that some cities with very high individual tract rates did not necessarily have the highest averages. On the map, I displayed 25 Florida cities, each with its average rate across the filtered tracts, to show how they are distributed across the state. From this, I observed that the highest rates tend to appear in South Florida, where most immigrant populations are located. A future analysis could focus on factors such as race or gender in these areas, especially for chronic illnesses like diabetes, to help identify trends and support better decisions for improving health outcomes in these communities.