Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)
library(tidyr)
setwd("C:/Users/User/Downloads/Data 110 Projects and Assignments")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500|>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
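The split can be tried on a small made-up tibble before running it on the full file. This is a minimal sketch that assumes the same "(lat, long)" text format; the `toy` data and its coordinates are invented for illustration and are not taken from the CDC file.

```r
library(tidyverse)

# Hypothetical two-row example; the coordinates are illustrative only
toy <- tibble(
  CityName    = c("A", "B"),
  GeoLocation = c("(34.05, -118.24)", "(37.77, -122.42)")
)

toy_split <- toy |>
  # drop the parentheses, leaving "lat, long"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split on the comma; convert = TRUE turns both pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Because `convert = TRUE` is set, `lat` and `long` come back as numeric columns, which is what the mapping step later relies on.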
Filter the dataset
Remove the rows where StateDesc is "United States", keep only the Prevention category, keep only crude prevalence values, and keep only 2017.
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Category == "Prevention") |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgomery City BRFSS Prevention
2 2017 CA California Concord City BRFSS Prevention
3 2017 CA California Concord City BRFSS Prevention
4 2017 CA California Fontana City BRFSS Prevention
5 2017 CA California Richmond Census Tract BRFSS Prevention
6 2017 FL Florida Davie Census Tract BRFSS Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
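The four chained `filter()` calls above can also be written as one call, since `filter()` combines comma-separated conditions with a logical AND. The sketch below shows this on a made-up tibble; the `toy` rows are invented for illustration.

```r
library(tidyverse)

# Hypothetical rows that each fail a different condition, plus one that passes
toy <- tibble(
  StateDesc       = c("United States", "California", "California", "Florida"),
  Category        = c("Prevention", "Prevention", "Health Outcomes", "Prevention"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year            = c(2017, 2017, 2017, 2017)
)

# One filter() call; every condition must be TRUE for a row to be kept
toy_clean <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Prevention",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

toy_clean
```

Either form gives the same rows; the single call is mostly a matter of taste.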
What variables are included? (can any of them be removed?)
names(latlong_clean)
 [1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
prevention <- latlong_clean |>
select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgome… City Prevent… 151000 Choles…
2 2017 CA California Concord City Prevent… 616000 Visits…
3 2017 CA California Concord City Prevent… 616000 Choles…
4 2017 CA California Fontana City Prevent… 624680 Visits…
5 2017 CA California Richmond Census Tract Prevent… 0660620… Choles…
6 2017 FL Florida Davie Census Tract Prevent… 1216475… Choles…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
md <- prevention |>
filter(StateAbbr=="MD")
head(md)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Chole…
2 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
3 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
4 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
5 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
6 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset “prevention” is now a manageable size.
For your assignment, work with the cleaned “prevention” dataset
1. Once you run the above code, filter this dataset one more time for any particular subset.
Filter chunk here
unique(latlong_clean$StateAbbr)
 [1] "AL" "CA" "FL" "CT" "IL" "MN" "NY" "PA" "NC" "OH" "OK" "OR" "TX" "RI" "SC"
[16] "SD" "TN" "UT" "VA" "WA" "AK" "WI" "AZ" "AR" "CO" "DE" "NV" "DC" "GA" "ID"
[31] "HI" "MA" "MI" "IN" "KS" "KY" "IA" "LA" "MD" "ME" "NH" "NJ" "NM" "MO" "MS"
[46] "NE" "MT" "ND" "WV" "VT" "WY"
Filter the lack of insurance measure for the five most populous cities in California.
The five most populous cities in California are: Los Angeles, San Diego, San Jose, San Francisco, and Fresno.
CA_no_insurance <- prevention |>
filter(StateAbbr == "CA") |>
# %in% keeps rows matching any of the five cities; == would recycle the vector element-wise and drop rows
filter(CityName %in% c("Los Angeles", "San Diego", "San Jose", "San Francisco", "Fresno")) |>
filter(MeasureId == "ACCESS2")
head(CA_no_insurance)# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Fresno Census Tract Preventi… 0627000… "Curre…
2 2017 CA California Fresno Census Tract Preventi… 0627000… "Curre…
3 2017 CA California Fresno Census Tract Preventi… 0627000… "Curre…
4 2017 CA California Fresno Census Tract Preventi… 0627000… "Curre…
5 2017 CA California Fresno Census Tract Preventi… 0627000… "Curre…
6 2017 CA California Fresno Census Tract Preventi… 0627000… "Curre…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
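Selecting several cities at once is a set-membership test, and the choice of operator matters. The sketch below compares `==` and `%in%` on a made-up tibble; the `cities` and `wanted` objects are invented for illustration.

```r
library(tidyverse)

# Hypothetical city column and a two-city wish list
cities <- tibble(
  CityName = c("Fresno", "Los Angeles", "Oakland", "San Diego", "Fresno", "San Jose")
)
wanted <- c("Los Angeles", "San Diego")

# == compares row 1 with wanted[1], row 2 with wanted[2], row 3 with wanted[1], ...
by_equals <- cities |> filter(CityName == wanted)

# %in% asks, for every row, whether CityName appears anywhere in wanted
by_in <- cities |> filter(CityName %in% wanted)

nrow(by_equals)
nrow(by_in)
```

Here `==` keeps only the San Diego row, because the Los Angeles row happens to be compared against "San Diego", while `%in%` keeps both cities. When the number of rows is not a multiple of the vector length, R also warns that the longer object length is not a multiple of the shorter one.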
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
plot1 <- ggplot(CA_no_insurance, aes(x = PopulationCount, y = Data_Value, color = CityName)) +
geom_point(alpha = 0.5) + # a single point layer; adding geom_jitter() as well would draw every point twice
scale_color_viridis_d() +
facet_wrap(~CityName) +
labs(title = "Percentage of Population Without Health Insurance",
subtitle = "Five most populous cities in California",
x = "Population Count",
y = "Percentage without Insurance",
color = "City Name",
caption = "Source: Centers for Disease Control and Prevention") +
theme_bw()
plot1
Warning: Removed 9 rows containing missing values or values outside the scale range
(`geom_point()`).
3. Now create a map of your subsetted dataset.
First map chunk here
# Load the libraries
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.1
library(sf)
Warning: package 'sf' was built under R version 4.4.1
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(knitr)
# Create the map
leaflet() |>
setView(lng = -118.2437, lat = 34.0522, zoom = 12) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(data = CA_no_insurance,
radius = (CA_no_insurance$PopulationCount)/10,
color = "green",
fillColor = "grey",
fillOpacity = 0.25)
Assuming "long" and "lat" are longitude and latitude, respectively
4. Refine your map to include a mouseover tooltip
Refined map chunk here
# create a popup string for an interactive map
mappopup <- paste0("<b>Year:</b> ", CA_no_insurance$Year, "<br>",
"<b>Geographic Level:</b> ", CA_no_insurance$GeographicLevel, "<br>",
"<b>City Name:</b> ", CA_no_insurance$CityName, "<br>",
"<b>Population Count:</b> ", CA_no_insurance$PopulationCount, "<br>",
"<b>Lack of Insurance Rate:</b> ", CA_no_insurance$Data_Value, "<br>")
# Add the popup to the map
leaflet() |>
setView(lng = -118.2437, lat = 34.0522, zoom = 12) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(data = CA_no_insurance,
radius = (CA_no_insurance$PopulationCount)/10,
color = "green",
fillColor = "grey",
fillOpacity = 0.25,
popup = mappopup)
Assuming "long" and "lat" are longitude and latitude, respectively
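A leaflet popup opens when a circle is clicked. If the text should appear on hover instead, one option is the `label` argument of `addCircles()`. The sketch below is a minimal illustration under stated assumptions: the `toy` data frame is a made-up stand-in for CA_no_insurance, and `htmltools::HTML()` is used so that the `<b>` and `<br>` tags are rendered rather than shown as text.

```r
library(leaflet)
library(htmltools)

# Hypothetical two-row stand-in for CA_no_insurance; values are illustrative only
toy <- data.frame(
  CityName        = c("Los Angeles", "Fresno"),
  PopulationCount = c(4000, 3000),
  Data_Value      = c(20.5, 18.2),
  lat             = c(34.05, 36.74),
  long            = c(-118.24, -119.79)
)

# One HTML string per row, built the same way as the popup text
hover_text <- paste0("<b>City Name:</b> ", toy$CityName, "<br>",
                     "<b>Lack of Insurance Rate:</b> ", toy$Data_Value)

hover_map <- leaflet(toy) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(lng = ~long, lat = ~lat,          # naming the columns avoids the "Assuming..." message
             radius = ~PopulationCount / 10,
             label = lapply(hover_text, HTML)) # label shows on hover; popup shows on click

hover_map
```

Both `popup` and `label` can be supplied in the same `addCircles()` call if click and hover behavior are both wanted.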
5. Write a paragraph
The first visualization is a scatterplot of the percentage of the population without health insurance in California's five most populous cities: Fresno, Los Angeles, San Diego, San Francisco, and San Jose. The plot is divided into five panels, one per city, with population count on the x-axis and the percentage without health insurance on the y-axis; color also distinguishes the cities. Each point is one row of the filtered data, so the plot shows how the uninsured rate varies with the population count of each area rather than with the size of the city as a whole. Areas with larger populations may have more uninsured people in absolute terms, but that does not imply a higher uninsured percentage, because the percentage is already scaled by population.
The uninsured percentage varies noticeably within each city. In Fresno, the points are spread across a wide range of population counts, with the percentage without insurance ranging from about 5% to 35%. In Los Angeles, most population counts cluster between about 2,500 and 5,000, with percentages mainly between 5% and 35%. In San Diego, population counts range from about 2,500 to 10,000, and percentages are similarly spread from 5% to 35%. In San Francisco, population counts range from about 2,500 to 7,500, with percentages generally lower, between 5% and 20%. In San Jose, population counts span a wide range but mostly cluster between 2,500 and 5,000, and percentages mainly range from 5% to 25%. The tighter clusters of population counts in Los Angeles and San Jose indicate that the areas sampled in those cities are more uniform in size.
The map gives a geographic view of the same population and insurance data. It opens centered on Los Angeles, with Esri WorldStreetMap as the base layer. Each circle marks one location, and its radius is proportional to the population count, so population sizes can be compared across areas at a glance. Note that circle size reflects population count, not population density. Each circle has a green border and a semi-transparent grey fill, which keeps overlapping circles visible. Clicking a circle opens a popup with the year, geographic level, city name, population count, and lack of insurance rate. Together, the circle sizes and popups make it easy to identify more populous areas and areas with higher or lower uninsured rates. Because the data are limited to 2017, the map shows spatial patterns rather than trends over time; the year and geographic level in each popup give the context needed to use it for public health analysis and policy-making.
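The per-city ranges read off the scatterplot can also be checked numerically with a grouped summary. This is a minimal sketch on made-up numbers; the `toy` tibble is invented for illustration, and in practice CA_no_insurance would take its place.

```r
library(tidyverse)

# Hypothetical rows; the values are illustrative only
toy <- tibble(
  CityName        = c("Fresno", "Fresno", "San Jose", "San Jose"),
  PopulationCount = c(2500, 6000, 3000, 4500),
  Data_Value      = c(8.1, 30.2, NA, 12.4)
)

city_ranges <- toy |>
  group_by(CityName) |>
  summarise(
    n_rows     = n(),                                   # rows per city, including missing values
    min_pct    = min(Data_Value, na.rm = TRUE),         # lowest uninsured percentage
    max_pct    = max(Data_Value, na.rm = TRUE),         # highest uninsured percentage
    median_pop = median(PopulationCount, na.rm = TRUE)  # typical population count
  )

city_ranges
```

Setting `na.rm = TRUE` matters here because some rows have no Data_Value, which is also why ggplot2 reports removed rows when drawing the plot.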