Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)  # attaches tidyr, so a separate library(tidyr) call is not needed
library(ggthemes)
library(leaflet)
setwd("C:/Users/steve/Downloads")
# read_csv() already creates cities500, so no data() call is needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
1. Once you run the above code and learn how to filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations.
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
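To see what the two cleaning steps do, here is a minimal sketch on a made-up two-row tibble. The city names and coordinates are hypothetical illustration values, not rows from the data set; the only assumption carried over is that GeoLocation holds a parenthesized "(lat, long)" string.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical rows: names and coordinates are made up for illustration
toy <- tibble(
  CityName = c("City A", "City B"),
  GeoLocation = c("(44.97, -93.26)", "(46.78, -92.10)")
)

toy_latlong <- toy |>
  # Strip the surrounding parentheses, leaving "44.97, -93.26"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_latlong
```

After this step the GeoLocation column is gone and lat and long are numeric columns, which is the form leaflet looks for.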
Minnesota_clean <- latlong |>
  filter(StateAbbr == "MN") |>
  filter(PopulationCount >= 3000) |>  # numeric threshold; a quoted "3000" would be compared as text
  filter(Category == "Prevention") |>
  filter(Year == 2017) |>
  filter(GeographicLevel == "Census Tract")
head(Minnesota_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MN Minnesota Minneapolis Census Tract BRFSS Prevention
2 2017 MN Minnesota Brooklyn Park Census Tract BRFSS Prevention
3 2017 MN Minnesota Duluth Census Tract BRFSS Prevention
4 2017 MN Minnesota Minneapolis Census Tract BRFSS Prevention
5 2017 MN Minnesota Minneapolis Census Tract BRFSS Prevention
6 2017 MN Minnesota Duluth Census Tract BRFSS Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
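One detail worth checking in a population filter is the type of the threshold. When R compares a number with a quoted string, it converts the number to text and compares the two as strings, so the result can differ from a numeric comparison. The short base R sketch below uses made-up population values to show the difference.

```r
# Made-up population counts for illustration
population <- c(500, 2999, 3000, 12000)

# Quoted threshold: values are converted to text and compared character by character
as_text <- population >= "3000"

# Numeric threshold: values are compared as numbers
as_number <- population >= 3000

data.frame(population, as_text, as_number)
```

In the text comparison, 500 passes because "5" sorts after "3", and 12000 fails because "1" sorts before "3". The numeric comparison keeps only 3000 and 12000.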
Let’s remove the columns we don’t need.
Minnesota_clean2 <- Minnesota_clean |>
  select(-TractFIPS, -DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(Minnesota_clean2)
# A tibble: 6 × 17
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MN Minnesota Minneapol… Census Tract Prevent… 2743000… "Chole…
2 2017 MN Minnesota Brooklyn … Census Tract Prevent… 2707966… "Curre…
3 2017 MN Minnesota Duluth Census Tract Prevent… 2717000… "Visit…
4 2017 MN Minnesota Minneapol… Census Tract Prevent… 2743000… "Takin…
5 2017 MN Minnesota Minneapol… Census Tract Prevent… 2743000… "Curre…
6 2017 MN Minnesota Duluth Census Tract Prevent… 2717000… "Visit…
# ℹ 9 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, Short_Question_Text <chr>
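The same subset can also be written as a single filter() call with several conditions, followed by a check that it stays within the assignment's 900-observation limit. This is a sketch on hypothetical rows, not the real data; the column names match the data set, but every value is made up.

```r
library(dplyr)

# Hypothetical rows standing in for the full data set
toy <- tibble(
  StateAbbr = c("MN", "MN", "CA", "MN"),
  Year = c(2017, 2016, 2017, 2017),
  Category = c("Prevention", "Prevention", "Prevention", "Health Outcomes"),
  GeographicLevel = c("Census Tract", "Census Tract", "City", "Census Tract"),
  PopulationCount = c(4200, 5100, 90000, 3500)
)

# Conditions separated by commas are combined with AND
toy_subset <- toy |>
  filter(
    StateAbbr == "MN",
    PopulationCount >= 3000,
    Category == "Prevention",
    Year == 2017,
    GeographicLevel == "Census Tract"
  )

# Stop with an error if the subset exceeds the assignment limit
stopifnot(nrow(toy_subset) <= 900)
nrow(toy_subset)
```

Chaining separate filter() calls, as done above, gives the same rows; the single call only groups the conditions in one place.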
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
## Plot 1
# Non-map plot
ggplot(Minnesota_clean2, aes(x = PopulationCount, y = Data_Value, color = CityName)) +
  # geom_jitter() draws the points itself, so a separate geom_point() would plot each one twice;
  # alpha must lie between 0 and 1
  geom_jitter(alpha = 0.5) +
  scale_color_discrete() +
  labs(title = "Census tract population versus data value in Minnesota, 2017",
       caption = "Source: Cities500 Database") +
  theme_minimal()
## Plot 2
ggplot(Minnesota_clean2, aes(x = Short_Question_Text, y = Data_Value, color = CityName)) +
  geom_jitter() +  # jitter replaces geom_point() so each observation is drawn once
  scale_color_discrete() +
  labs(title = "Data value (percent) by question asked",
       caption = "Source: Cities500 Database") +
  theme_economist()
3. Now create a map of your subsetted dataset.
Here we create our map. To keep the map at a manageable size, we focus only on the city of Minneapolis.
# leaflet()
leaflet() |>
  setView(lng = -93.2, lat = 44.9, zoom = 10) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    data = Minnesota_clean2,
    radius = Minnesota_clean2$Data_Value,
    color = "green",
    fillColor = "white",
    fillOpacity = 0.5  # numeric, not a quoted string
  )
Assuming "long" and "lat" are longitude and latitude, respectively
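In leaflet, addCircles() interprets radius in meters, so passing percent values directly gives circles with a radius of at most 100 m, which can be hard to see at a city-wide zoom. One possible adjustment, offered as an assumption rather than the approach used in this assignment, is to rescale the values into a chosen range of meters. The helper rescale_radius and its 50 to 500 m range are hypothetical choices.

```r
# Hypothetical helper: linearly map values onto a radius range in meters
rescale_radius <- function(values, min_m = 50, max_m = 500) {
  rng <- range(values, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # All values equal: fall back to the midpoint of the range
    return(rep((min_m + max_m) / 2, length(values)))
  }
  min_m + (values - rng[1]) / (rng[2] - rng[1]) * (max_m - min_m)
}

# Made-up percent values for illustration
data_value <- c(10, 55, 100)
radii <- rescale_radius(data_value)
radii
```

The result could then be passed to addCircles() as radius = rescale_radius(Minnesota_clean2$Data_Value).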
4. Refine your map to include a mouse-click tooltip
Let’s create our popup first
popupminnesota <- paste0(
  "<b>Year: </b>", Minnesota_clean2$Year, "<br>",
  "<b>City: </b>", Minnesota_clean2$CityName, "<br>",
  "<b>Value (percent): </b>", Minnesota_clean2$Data_Value, "<br>",
  "<b>Population count: </b>", Minnesota_clean2$PopulationCount, "<br>",
  "<b>Question asked: </b>", Minnesota_clean2$Short_Question_Text, "<br>"
)
Here we refined our map by adding a tooltip that lets our audience see, for Minneapolis, the exact data value in percent for each category, the year, the population count, and the question that was asked.
leaflet() |>
  setView(lng = -93.26, lat = 44.97, zoom = 11) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    data = Minnesota_clean2,
    radius = sqrt(Minnesota_clean2$Data_Value),
    color = "red",
    fillColor = "white",
    fillOpacity = 0.25,
    popup = popupminnesota
  )
Assuming "long" and "lat" are longitude and latitude, respectively
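The popup text is built with paste0(), which is vectorized: it returns one HTML string per row, in the same order as the rows of the data passed to addCircles(). A small base R sketch with made-up values shows the shape of the result.

```r
# Made-up rows for illustration
city <- c("Minneapolis", "Duluth")
value <- c(71.2, 64.5)

# Each element combines the label markup with that row's values
popup <- paste0(
  "<b>City: </b>", city, "<br>",
  "<b>Value (percent): </b>", value, "<br>"
)

length(popup)  # one popup string per row
popup[1]
```

Because the strings line up with the rows by position, the popup vector should be built from the same data frame that is mapped.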
5. Write a paragraph
In one paragraph, describe the graphs you created and what they show.
For this assignment, we used the “Cities500” data set. We wanted our subset to include only observations from Minnesota in 2017, with “Prevention” as the category, census tract as the geographic level, and a population of at least 3,000.
To achieve this, we filtered the data set down to a subset of no more than 900 observations. We first kept only observations from Minnesota, then only those with a population (which constitutes our sample here) of at least 3,000. We then kept only the “Prevention” category and the census tract geographic level, and finally only observations from 2017.
Starting from a data set of over 800,000 observations, we obtained a subset of only 764 observations. To tidy the data set further, we removed the columns that were unnecessary for our analysis.
The first graph shows the relationship between the population of each census tract, colored by city, and the data value collected. The second graph compares the data values across the questions asked. As the graph shows, values for the health insurance question are below 25%, while values for annual checkups, cholesterol screening, and blood pressure medication are above 50%.
We then mapped our subset. For reasons of size, we chose to focus only on the city of Minneapolis. Finally, we refined the map by adding a tooltip that lets our audience see, for Minneapolis, the year, the city, the exact data value in percent, the population count, and the question asked.