library(tidyverse)
library(tidyr)
library(leaflet)
setwd("C:/Users/emmap/Downloads/DATA110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
data(cities500)
Healthy Cities GIS Assignment
Load the libraries and set the working directory
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500|>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
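To see the split in isolation, here is a minimal sketch on a made-up two-row tibble. The `toy` object and its coordinates are hypothetical and not taken from the CDC file. It strips the parentheses and splits on the comma as above; the one assumed change is `sep = ",\\s*"`, which also drops any space after the comma before the values are converted.

```r
library(dplyr)
library(stringr)
library(tibble)
library(tidyr)

# Hypothetical toy data in the same "(lat, long)" text format
toy <- tibble(
  CityName = c("City A", "City B"),
  GeoLocation = c("(39.29, -76.61)", "(38.91, -77.04)")
)

toy_split <- toy |>
  # Remove the opening and closing parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma (plus optional whitespace) and convert to numeric
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```

Printing `toy_split` should show `lat` and `long` as numeric columns next to `CityName`.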
Filter the dataset
Remove the rows where StateDesc is "United States", keep only the Prevention category (the category of interest), keep only measures reported as crude prevalence, and keep only the year 2017.
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Category == "Prevention") |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgomery City BRFSS Prevention
2 2017 CA California Concord City BRFSS Prevention
3 2017 CA California Concord City BRFSS Prevention
4 2017 CA California Fontana City BRFSS Prevention
5 2017 CA California Richmond Census Tract BRFSS Prevention
6 2017 FL Florida Davie Census Tract BRFSS Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
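The four chained filter() calls can also be written as a single call, because conditions separated by commas inside filter() are combined with AND. The sketch below checks this on a small invented tibble: the column names match the dataset, but the rows are hypothetical.

```r
library(dplyr)
library(tibble)

# Invented rows; only the first one satisfies all four conditions
toy <- tribble(
  ~Year, ~StateDesc,      ~Category,         ~Data_Value_Type,
  2017,  "Maryland",      "Prevention",      "Crude prevalence",
  2017,  "United States", "Prevention",      "Crude prevalence",
  2016,  "Maryland",      "Prevention",      "Crude prevalence",
  2017,  "Maryland",      "Health Outcomes", "Crude prevalence",
  2017,  "Maryland",      "Prevention",      "Age-adjusted prevalence"
)

# Chained version, as in the chunk above
chained <- toy |>
  filter(StateDesc != "United States") |>
  filter(Category == "Prevention") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)

# Single call with comma-separated conditions
single <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Prevention",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

# Compare the two results
all.equal(chained, single)
```

Either style works; the single call is shorter, while the chained form makes it easy to comment out one condition at a time.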
What variables are included? (can any of them be removed?)
names(latlong_clean)
 [1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
prevention <- latlong_clean |>
select(-DataSource, -Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Montgome… City Prevent… 151000 Choles…
2 2017 CA California Concord City Prevent… 616000 Visits…
3 2017 CA California Concord City Prevent… 616000 Choles…
4 2017 CA California Fontana City Prevent… 624680 Visits…
5 2017 CA California Richmond Census Tract Prevent… 0660620… Choles…
6 2017 FL Florida Davie Census Tract Prevent… 1216475… Choles…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
md <- prevention |>
filter(StateAbbr=="MD")
head(md)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Chole…
2 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
3 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
4 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
5 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Curre…
6 2017 MD Maryland Baltimore Census Tract Preventi… 2404000… "Visit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
unique(md$CityName)
[1] "Baltimore"
The new dataset, prevention, is now a manageable size.
For your assignment, work with a cleaned dataset.
1. Once you run the above code, filter this dataset one more time for any particular subset with no more than 900 observations.
Filter chunk here
prevention2 <- prevention |>
filter(MeasureId == "ACCESS2") |>
filter(StateAbbr %in% c("MD", "DC", "VA")) |>
filter(GeographicLevel != "City")
head(prevention2)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 DC District o… Washing… Census Tract Prevent… 1150000… "Curre…
2 2017 DC District o… Washing… Census Tract Prevent… 1150000… "Curre…
3 2017 DC District o… Washing… Census Tract Prevent… 1150000… "Curre…
4 2017 DC District o… Washing… Census Tract Prevent… 1150000… "Curre…
5 2017 DC District o… Washing… Census Tract Prevent… 1150000… "Curre…
6 2017 DC District o… Washing… Census Tract Prevent… 1150000… "Curre…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
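Because the assignment caps the subset at 900 observations, it may be worth confirming the row count after filtering. The sketch below uses a hypothetical helper, `within_row_limit()`, and an invented stand-in data frame; neither is part of the original code.

```r
# Hypothetical helper: TRUE when a data frame has at most max_rows rows
within_row_limit <- function(df, max_rows = 900) {
  nrow(df) <= max_rows
}

# Invented stand-in for a filtered subset (300 rows)
toy_subset <- data.frame(StateAbbr = rep(c("MD", "DC", "VA"), times = 100))

# Check the stand-in against the default limit of 900
within_row_limit(toy_subset)

# The same check with a stricter, made-up limit
within_row_limit(toy_subset, max_rows = 10)
```

With the real data, the equivalent check would be `within_row_limit(prevention2)`, or simply `nrow(prevention2)`.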
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
plot1 <- prevention2 |>
ggplot(aes(x = PopulationCount, y = Data_Value))+
geom_point()+
geom_smooth(method = "lm")+
labs(x = "Population", y = "Prevalence of Adults Age 18-64 Without Health Insurance")+
theme_bw()
plot1
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 3 rows containing missing values (`geom_point()`).
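The warnings above mean that three rows had a missing value in one of the plotted columns. One way to make that explicit, and to put a number on the trend line, is to drop the incomplete rows and run a correlation test. This is a sketch on invented values, not the assignment data; with the real subset, `prevention2` would take the place of `toy`.

```r
library(tidyr)

# Invented values with two incomplete rows
toy <- data.frame(
  PopulationCount = c(1200, 2500, NA, 4100, 3300, 5000),
  Data_Value      = c(18.2, 15.1, 12.0, NA, 14.4, 10.9)
)

# Keep only rows where both plotted columns are present
toy_complete <- drop_na(toy, PopulationCount, Data_Value)

# Number of rows that were dropped
nrow(toy) - nrow(toy_complete)

# Pearson correlation and its p-value on the complete rows
result <- cor.test(toy_complete$PopulationCount, toy_complete$Data_Value)
result$estimate
result$p.value
```

The sign of `result$estimate` gives the direction of the relationship, and `result$p.value` gives a basis for judging whether it is statistically significant.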
3. Now create a map of your subsetted dataset.
First map chunk here
map1 <-
leaflet() |>
setView(lng = -76.64, lat = 39.045, zoom = 5) |>
addProviderTiles("Esri.WorldPhysical") |>
addCircles(
data = prevention2,
radius = prevention2$Data_Value
)
Assuming "long" and "lat" are longitude and latitude, respectively
map1
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
insurancepopup <- paste0(
"<b>Population: </b>", prevention2$PopulationCount, "<br>",
"<b>Lack of Insurance Prevalence: </b>", prevention2$Data_Value, "<br>"
)
pal <- colorNumeric(c("#960000", "#ae0000", "#c70000", "#e10000", "#ff0000"), domain = prevention2$Data_Value)
map2 <- leaflet() |>
setView(lng = -76.64, lat = 39.0458, zoom = 6) |>
addProviderTiles("OpenStreetMap") |>
addCircles(
data = prevention2,
radius = 5*(prevention2$Data_Value)^2,
color = pal(prevention2$Data_Value),
opacity = 1,
popup = insurancepopup
)
Assuming "long" and "lat" are longitude and latitude, respectively
map2
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
The plots I created show the prevalence of adults (ages 18-64) who lack health insurance across census tracts in the DMV region. I made the initial scatterplot to check for a relationship between tract population and the uninsured rate, and it indicated a slight negative correlation between population size and lack of insurance. Although I did not consider this correlation strong enough to be significant, I still found population size salient enough to include in the map popup alongside the prevalence value. I scaled the circles by setting the radius to 5 times the squared value, because the raw values differed too little to produce visibly different circle sizes. I found it interesting that in some cities the uninsured rates tend to be highest right in the city center, while in others they are highest on the southernmost edges. I would like to look more closely at the urban planning of these areas to identify other factors that might explain these distributions.
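The effect of the radius formula can be checked with a little arithmetic. The sketch below uses three invented prevalence values (not taken from the dataset) and compares the raw value with 5 times the squared value. In leaflet, the radius of addCircles() is interpreted in meters, so raw values of this size would all draw as very small circles.

```r
# Invented prevalence values (percent), not taken from the dataset
data_value <- c(5, 10, 20)

# Radius passed directly, as in the first map
radius_raw <- data_value

# Radius as 5 times the squared value, as in the refined map
radius_scaled <- 5 * data_value^2

# Side-by-side comparison of the two radius choices
data.frame(data_value, radius_raw, radius_scaled)
```

Because the value is squared, doubling the prevalence quadruples the radius, which exaggerates differences between tracts.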