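Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse) # tidyverse already attaches tidyr, so a separate library(tidyr) call is not needed
setwd("/Users/ayomidealagbada/AYOMIDE'S DATAVISUALITIOM")
# read_csv() already creates cities500; data() only loads datasets bundled with packages, so it is not needed here
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")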
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500|>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
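The GeoLocation split above can be tried on a small made-up tibble before running it on the full dataset. This is a minimal sketch: the object names toy and toy_split and the coordinate values are illustrative assumptions, not part of the assignment data.

```r
library(tidyverse)

# Hypothetical coordinates in the same "(lat, long)" text format as GeoLocation
toy <- tibble(GeoLocation = c("(39.29, -76.61)", "(33.91, -118.35)"))

toy_split <- toy |>
  # Strip the opening and closing parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the text pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Checking that lat and long come back as numeric columns is a quick way to confirm the split worked before mapping.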
Filter the dataset
Remove the rows where StateDesc is "United States", keep only the Prevention category (the category of interest), keep only crude prevalence values, and keep only 2017.
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Category == "Prevention") |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgomery City BRFSS Prevention
2 2017 CA California Concord City BRFSS Prevention
3 2017 CA California Concord City BRFSS Prevention
4 2017 CA California Fontana City BRFSS Prevention
5 2017 CA California Richmond Census Tract BRFSS Prevention
6 2017 FL Florida Davie Census Tract BRFSS Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
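The four filter() calls above can also be written as a single call, because conditions separated by commas inside filter() are combined with AND. The sketch below uses a made-up four-row tibble (toy and toy_clean are hypothetical names) so the effect of each condition is easy to see.

```r
library(dplyr)
library(tibble)

# Made-up rows; only the second row satisfies all four conditions
toy <- tibble(
  StateDesc = c("United States", "Maryland", "Maryland", "Maryland"),
  Category = c("Prevention", "Prevention", "Health Outcomes", "Prevention"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year = c(2017, 2017, 2017, 2017)
)

toy_clean <- toy |>
  filter(
    StateDesc != "United States",          # drop the national summary row
    Category == "Prevention",              # keep the category of interest
    Data_Value_Type == "Crude prevalence", # keep crude prevalence only
    Year == 2017                           # keep 2017 only
  )

toy_clean
```

Either form gives the same rows; the single call is simply shorter to read.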
What variables are included? (can any of them be removed?)
names(latlong_clean)
 [1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
prevention <- latlong_clean |>
select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgome… City Prevent… 151000 Choles…
2 2017 CA California Concord City Prevent… 616000 Visits…
3 2017 CA California Concord City Prevent… 616000 Choles…
4 2017 CA California Fontana City Prevent… 624680 Visits…
5 2017 CA California Richmond Census Tract Prevent… 0660620… Choles…
6 2017 FL Florida Davie Census Tract Prevent… 1216475… Choles…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
md <- prevention |>
filter(StateAbbr=="MD")
head(md)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Chole…
2 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
3 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
4 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
5 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
6 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
unique(md$CityName)
[1] "Baltimore"
The new dataset, prevention, is now a manageable size.
For your assignment, work with a cleaned dataset.
1. Once you run the above code, filter this dataset one more time for any particular subset with no more than 900 observations.
Filter chunk here
# Filter for adults with high blood pressure who take medication to control it, then estimate counts
# Note: the Measure text must match the dataset exactly; check unique(md$Measure) if this returns zero rows
MHBP <- md %>%
  filter(Measure == "Adults aged 18 and older with high blood pressure taking medication to control it") %>%
  mutate(
    data_value_decimal = Data_Value / 100, # Convert percentage to a proportion
    estimated_users = round(data_value_decimal * PopulationCount) # Estimated number of people taking medication
  ) %>%
  drop_na() # Remove rows with a missing value in any column
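The estimated count above is the prevalence percentage converted to a proportion and multiplied by the population. A minimal sketch of that arithmetic on two made-up rows (the values and the name toy_counts are assumptions for illustration only):

```r
library(dplyr)
library(tibble)

# Made-up prevalence percentages and population counts
toy_counts <- tibble(
  Data_Value = c(75, 60.5),
  PopulationCount = c(4000, 2000)
) |>
  mutate(
    data_value_decimal = Data_Value / 100,                        # 75% becomes 0.75
    estimated_users = round(data_value_decimal * PopulationCount) # 0.75 * 4000 = 3000
  )

toy_counts
```

Because Data_Value is an estimate, the resulting counts are approximations rather than exact head counts.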
# Uncomment and adjust if tract boundary data is required (needs the tigris and sf packages)
# Load Maryland tracts and set the coordinate reference system
# md_tracts <- tracts(state = "MD", cb = TRUE) %>%
#   st_transform(4326)
md_data <- prevention %>%
filter(StateAbbr == "MD") %>%
drop_na(long, lat, Data_Value) %>%
# Take first 900 observations as required
slice(1:900)
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
ggplot(md_data, aes(x = CityName, y = Data_Value, color = Measure)) +
geom_boxplot(alpha = 0.6, outlier.shape = NA) + # Suppress outliers to avoid overlap with jittered points
geom_jitter(width = 0.2, alpha = 0.6, size = 2) + # Show individual data points
scale_color_viridis_d() +
labs(
title = "Distribution of Health Prevention Measures Across Maryland Cities",
subtitle = "Comparing Various Measures by City",
x = "City",
y = "Prevalence (%)",
color = "Measure Type"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "right"
)
3. Now create a map of your subsetted dataset.
First map chunk here
# Ensure leaflet is loaded
library(leaflet)
# Example leaflet code
leaflet(md_data) %>%
addTiles() %>%
setView(
lng = mean(md_data$long, na.rm = TRUE),
lat = mean(md_data$lat, na.rm = TRUE),
zoom = 7
) %>%
addCircleMarkers(
lng = ~long, lat = ~lat,
radius = 8,
color = "blue",
fillOpacity = 0.6
)
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
leaflet(md_data) %>%
addTiles() %>%
setView(
lng = -76.6413, # Maryland center longitude
lat = 39.0458, # Maryland center latitude
zoom = 7
) %>%
addCircleMarkers(
~long,
~lat,
radius = 8,
fillColor = "blue",
fillOpacity = 0.6,
stroke = FALSE,
popup = paste(md_data$CityName, round(md_data$Data_Value, 1)) # popup opens on click; label would only show on hover
)
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
For this assignment, I first created a box plot with jittered points showing the prevalence of each prevention measure by city. Baltimore is the only Maryland city in the data, so the plot mainly shows how prevalence differs between measures and how widely the values are spread within each measure.
The data were filtered to 2017 only, so the plots show a single-year snapshot of preventive health measures rather than a trend over time.
Finally, I created an interactive map that plots each record at its latitude and longitude using fixed-size circle markers. This map shows where the observations are located, and I can click each marker to view the city name and the prevalence value.