Healthy Cities GIS Assignment
Load the libraries and set the working directory
# tidyverse attaches dplyr, tidyr, readr, stringr, and ggplot2
library(tidyverse)
library(leaflet)
setwd("~/Desktop/Data Science MC/Data Science 110")
# read_csv() already returns the data, so no data() call is needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable stores each location as a "(lat, long)" string.
Split GeoLocation (lat, long) into two columns: lat and long
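latlong <- cities500 |>
  # Strip the parentheses, leaving "lat, long"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split at the comma; convert = TRUE parses both pieces as numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)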
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
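The split can be tried on a small hand-made tibble before running it on the full CSV. This is a minimal sketch, assuming GeoLocation strings look like "(lat, long)" as described above; the two coordinate pairs are illustrative, not rows from the CDC file. It uses str_remove_all(), stringr's shorthand for replacing matches with "", and passes a regular expression to sep so the space after the comma is consumed instead of being left for the numeric conversion.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical toy data in the same "(lat, long)" format as GeoLocation;
# these are illustrative coordinates, not rows from the CDC file
toy <- tibble(
  CityName = c("Albany", "Buffalo"),
  GeoLocation = c("(42.6526, -73.7562)", "(42.8864, -78.8784)")
)

toy_split <- toy |>
  # Drop the parentheses around each coordinate pair
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  # Split on the comma plus any following spaces; convert = TRUE turns the
  # resulting character columns into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

toy_split
```

If the printed tibble shows lat and long as <dbl>, the same steps are safe to run on cities500.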
Filter the dataset
Remove the national summary rows (StateDesc equal to "United States"), keep only crude-prevalence estimates, and keep only 2017. Category is not filtered here, so measures from all three categories (Health Outcomes, Prevention, and Unhealthy Behaviors) remain.
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Unhealthy Beh…
4 2017 CA California Indio Census Tract BRFSS Health Outcom…
5 2017 CA California Inglewood Census Tract BRFSS Health Outcom…
6 2017 CA California Lakewood City BRFSS Unhealthy Beh…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
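The three filter() calls can also be written as one call, because comma-separated conditions inside filter() are combined with AND. A minimal sketch on a hypothetical four-row tibble that mimics only the filtered columns; count() then lists what survived, which is a quick way to confirm the filters match the description.

```r
library(dplyr)

# Hypothetical rows that mimic only the three columns being filtered
toy <- tibble(
  StateDesc = c("United States", "New York", "New York", "New York"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Age-adjusted prevalence", "Crude prevalence"),
  Year = c(2017, 2017, 2017, 2016)
)

# Comma-separated conditions in one filter() are combined with AND,
# so this matches three chained filter() calls
toy_clean <- toy |>
  filter(
    StateDesc != "United States",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

# Count what is left in each filtered column to confirm the filters worked
count(toy_clean, StateDesc, Data_Value_Type, Year)
```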
What variables are included, and can any of them be removed?
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
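One way to answer "can any of them be removed?" is to look for columns that hold at most one distinct value, since they carry no information for comparisons; after the filters above, Year and Data_Value_Type are guaranteed to be constant, for example. A minimal sketch on a hypothetical miniature of latlong_clean (the values are placeholders):

```r
library(dplyr)

# Hypothetical miniature of latlong_clean; values are placeholders.
# After filtering, some columns hold a single value in every row.
toy <- tibble(
  Year = c(2017, 2017, 2017),
  Data_Value_Type = rep("Crude prevalence", 3),
  CityName = c("Albany", "Buffalo", "Rochester"),
  Data_Value = c(10, 20, 30),
  Data_Value_Footnote = c(NA, NA, NA)
)

# A column is a removal candidate if it has at most one distinct non-NA value
constant_cols <- names(toy)[sapply(toy, function(x) n_distinct(x, na.rm = TRUE) <= 1)]
constant_cols
```

Running the same expression on latlong_clean would list its constant columns; whether to drop them is still a judgment call, since a column like Year may be worth keeping for labels.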
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract Health … 0632548… Arthri…
2 2017 CA California Hawthorne City Unhealt… 632548 Curren…
3 2017 CA California Hayward City Unhealt… 633000 Obesit…
4 2017 CA California Indio Census Tract Health … 0636448… Arthri…
5 2017 CA California Inglewood Census Tract Health … 0636546… Diagno…
6 2017 CA California Lakewood City Unhealt… 639892 Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset, latlong_clean2, is now a manageable size.
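Keeping the dropped column names in one character vector makes the select() step easier to read and reuse. This is a sketch of an alternative, not the method used above: any_of() (a tidyselect helper re-exported by dplyr) drops the listed columns that exist and skips any that do not, whereas writing -DataSource directly raises an error if that column is missing. The one-row tibble and the drop_cols vector are hypothetical.

```r
library(dplyr)

# Hypothetical vector of columns to drop, kept in one place for reuse
drop_cols <- c("DataSource", "Data_Value_Unit", "DataValueTypeID",
               "Low_Confidence_Limit", "High_Confidence_Limit",
               "Data_Value_Footnote_Symbol", "Data_Value_Footnote")

# Hypothetical one-row tibble with a few of the real column names
toy <- tibble(
  CityName = "Albany",
  DataSource = "BRFSS",
  Data_Value = 10,
  Data_Value_Footnote = NA
)

# any_of() drops the listed columns that exist and silently skips the rest
toy_slim <- toy |> select(-any_of(drop_cols))
names(toy_slim)
```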
For your assignment, work with a cleaned dataset.
1. Once you run the above code and learn how to filter in this format, filter this dataset however you choose so that you have a subset with no more than 900 observations.
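The 900-observation limit can be checked in code rather than by eye. A minimal sketch with a hypothetical helper, check_subset_size(), that stops with a message when a data frame has more rows than allowed; the toy tibble stands in for a real subset.

```r
library(dplyr)

# Hypothetical helper: stop with a clear message if a subset is too large,
# otherwise return the row count invisibly
check_subset_size <- function(df, max_rows = 900) {
  n <- nrow(df)
  if (n > max_rows) {
    stop(sprintf("Subset has %d rows; the limit is %d.", n, max_rows))
  }
  invisible(n)
}

# Toy data standing in for a filtered subset
toy_subset <- tibble(id = 1:250)
check_subset_size(toy_subset)  # passes silently
```

On the real data this would be called as check_subset_size(newyork_data2) after the filter chunk.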
Filter chunk here
# Filter for a specific state, then drop rows with missing values.
# Inside a pipe, refer to columns by bare name rather than latlong_clean2$...
newyork_data <- latlong_clean2 |>
  filter(StateDesc == "New York") |>
  filter(!is.na(Data_Value), !is.na(PopulationCount), !is.na(Measure))
unique(newyork_data$Short_Question_Text)
[1] "Stroke" "Cholesterol Screening" "COPD"
[4] "Diabetes" "Chronic Kidney Disease" "Physical Inactivity"
[7] "Mental Health" "Obesity" "Physical Health"
[10] "Health Insurance" "Annual Checkup" "Arthritis"
[13] "Taking BP Medication" "High Blood Pressure" "Binge Drinking"
[16] "High Cholesterol" "Coronary Heart Disease" "Current Smoking"
[19] "Cancer (except skin)" "Current Asthma"
# Keep three measures in five cities, limited to rows with PopulationCount
# under 1,500 (small census tracts; city-level rows have far larger counts)
newyork_data2 <- newyork_data |>
  filter(Short_Question_Text %in% c("Current Smoking", "Current Asthma", "Stroke")) |>
  filter(CityName %in% c("New York", "Buffalo", "Rochester", "Syracuse", "Albany"),
         PopulationCount < 1500)
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
ggplot(newyork_data2, aes(x = CityName, y = Data_Value, color = Short_Question_Text)) +
  geom_jitter(width = 0.35, alpha = 0.7, size = 2.5) +
  labs(title = "Health Measures for 5 New York Cities", x = "City",
       y = "Crude Prevalence", color = "Health Measure",
       caption = "500 Healthy Cities (CDC)") +
  theme_bw(base_size = 8)
3. Now create a map of your subsetted dataset.
First map chunk here
leaflet() |>
  setView(lng = -73.97213, lat = 40.76260, zoom = 5) |>
  addProviderTiles("OpenStreetMap.Mapnik") |>
  addCircles(
    data = newyork_data2,
    # Name the coordinate columns explicitly instead of relying on
    # leaflet guessing that "long" and "lat" are longitude and latitude
    lng = ~long,
    lat = ~lat,
    # radius is in meters, so a prevalence of 20 (%) draws a 20 m circle
    radius = ~Data_Value,
    color = "#e20f32",
    fillColor = "#713c65",
    fillOpacity = 0.6
  )
4. Refine your map to include a mouse-click tooltip
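In leaflet, a click tooltip is the popup argument of addCircles(), which displays HTML when a circle is clicked. A minimal sketch, assuming the popup text is built as a column with mutate() instead of a separate vector, so each label stays attached to its row if the data are re-sorted or filtered later; the toy rows and values are hypothetical.

```r
library(dplyr)

# Hypothetical rows with the three columns the popup uses
toy <- tibble(
  CityName = c("Albany", "Buffalo"),
  Short_Question_Text = c("Stroke", "Current Smoking"),
  Data_Value = c(1.5, 2.5)
)

# Build the popup HTML as a column, so each label stays with its row
toy_popup <- toy |>
  mutate(popup = paste0(
    "<b>City:</b> ", CityName, "<br>",
    "<b>Measure:</b> ", Short_Question_Text, "<br>",
    "<b>Value:</b> ", Data_Value, "%"
  ))

toy_popup$popup[1]
```

The column could then be passed as popup = ~popup, since leaflet evaluates one-sided formulas against the layer's data.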
Refined map chunk here
# Popup text for each row: city, measure, and value
popupNY <- paste0(
  "<b>City:</b> ", newyork_data2$CityName, "<br>",
  "<b>Measure:</b> ", newyork_data2$Short_Question_Text, "<br>",
  "<b>Value:</b> ", newyork_data2$Data_Value, "%"
)
leaflet() |>
  setView(lng = -73.97213, lat = 40.76260, zoom = 5) |>
  addProviderTiles("OpenStreetMap.Mapnik") |>
  addCircles(
    data = newyork_data2,
    lng = ~long,
    lat = ~lat,
    radius = ~Data_Value,
    color = "#e20f32",
    fillColor = "#713c65",
    fillOpacity = 0.6,
    # Show the popup text when a circle is clicked
    popup = popupNY
  )
5. Write a paragraph
Plot 1 is a scatter plot of the crude prevalence (%) of three health indicators (Current Smoking, Current Asthma, and Stroke) in five major New York cities (New York City, Buffalo, Rochester, Syracuse, and Albany). I used jittering to spread out overlapping points and make the plot more readable. The graph shows that current smoking has a higher crude prevalence than stroke across these cities. Plot 2 is an interactive map of the same subset: each circle is one census-tract value for one of the three health measures, and clicking a circle shows its city, measure, and value.
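The written comparison can be backed with numbers by summarising Data_Value per city and measure. A minimal sketch on a hypothetical tibble (the values are placeholders, not CDC estimates); with newyork_data2 in its place, the same pipeline would report the tract count, median, and range behind each cluster of points in Plot 1.

```r
library(dplyr)

# Hypothetical tract-level values for two cities and two measures
toy <- tibble(
  CityName = rep(c("Albany", "Buffalo"), each = 4),
  Short_Question_Text = rep(c("Current Smoking", "Stroke"), times = 4),
  Data_Value = c(10, 2, 20, 4, 30, 6, 40, 8)
)

# Count, median, and range of crude prevalence for each city and measure
toy_summary <- toy |>
  group_by(CityName, Short_Question_Text) |>
  summarise(
    n_tracts = n(),
    median_value = median(Data_Value),
    min_value = min(Data_Value),
    max_value = max(Data_Value),
    .groups = "drop"
  )

toy_summary
```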