Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)   # attaches dplyr, tidyr, stringr, readr, ggplot2, among others
library(leaflet)
setwd("/home/andrewarsaw/DATA 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable is stored as a "(lat, long)" string.
Split GeoLocation into two numeric columns: lat and long
latlong <- cities500 |>
  # remove the parentheses around the coordinate pair
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split at the comma; convert = TRUE turns both pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
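The split above can be checked on a couple of strings. The sketch below runs the same steps on two made-up GeoLocation values. It uses `separate_wider_delim()`, which tidyr 1.3.0 and later recommends over the superseded `separate()`. It assumes that tidyr version is installed, and the toy coordinates are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Two made-up values in the same "(lat, long)" format as GeoLocation
toy <- tibble(GeoLocation = c("(29.95, -90.07)", "(41.18, -73.19)"))

toy_split <- toy |>
  # Strip the parentheses, leaving "lat, long"
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  # Split at the comma into two character columns (needs tidyr >= 1.3.0)
  separate_wider_delim(GeoLocation, delim = ",", names = c("lat", "long")) |>
  # Convert to numbers; as.numeric() tolerates the leading space in " -90.07"
  mutate(across(c(lat, long), as.numeric))

toy_split
```

Because `separate_wider_delim()` always returns character columns, the explicit `as.numeric()` step replaces the `convert = TRUE` argument used with `separate()`.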
Filter the dataset
Remove the rows where StateDesc is "United States", keep only crude prevalence estimates from 2017, and restrict the data to Connecticut (StateAbbr "CT") and the Unhealthy Behaviors category.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>          # drop national summary rows
  filter(Data_Value_Type == "Crude prevalence") |>  # crude (not age-adjusted) estimates
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>                      # Connecticut only
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CT Connecticut Bridgeport Census Tract BRFSS Unhealthy B…
2 2017 CT Connecticut Danbury City BRFSS Unhealthy B…
3 2017 CT Connecticut Norwalk Census Tract BRFSS Unhealthy B…
4 2017 CT Connecticut Bridgeport Census Tract BRFSS Unhealthy B…
5 2017 CT Connecticut Hartford Census Tract BRFSS Unhealthy B…
6 2017 CT Connecticut Waterbury Census Tract BRFSS Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
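The five chained `filter()` calls can also be written as one `filter()` with comma-separated conditions, which dplyr combines with a logical AND. The minimal sketch below runs both forms on a hypothetical toy tibble and shows that they keep the same rows.

```r
library(dplyr)

# Hypothetical rows standing in for latlong
toy <- tibble(
  StateDesc       = c("United States", "Connecticut", "Connecticut", "Connecticut"),
  StateAbbr       = c("US", "CT", "CT", "CT"),
  Year            = c(2017, 2017, 2016, 2017),
  Data_Value_Type = "Crude prevalence",
  Category        = c("Unhealthy Behaviors", "Unhealthy Behaviors",
                      "Unhealthy Behaviors", "Prevention")
)

# One condition per call, as in the chunk above
chained <- toy |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")

# Same conditions in a single call; the commas act as AND
combined <- toy |>
  filter(
    StateDesc != "United States",
    Data_Value_Type == "Crude prevalence",
    Year == 2017,
    StateAbbr == "CT",
    Category == "Unhealthy Behaviors"
  )

identical(chained, combined)
```

Both forms give the same result. The single call is just more compact.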
What variables are included? Can any of them be removed?
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
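One way to answer the question above is to look for columns that hold only one value after filtering. In a Connecticut-only, 2017-only subset, columns such as Year or StateAbbr add no information. The sketch below flags such columns on a hypothetical toy tibble standing in for latlong_clean; the values are made up.

```r
library(dplyr)

# Hypothetical stand-in for latlong_clean after filtering
toy <- tibble(
  Year       = 2017,
  StateAbbr  = "CT",
  CityName   = c("Bridgeport", "Danbury", "Hartford"),
  DataSource = "BRFSS",
  Data_Value = c(31.2, 24.5, NA)
)

# Count distinct values per column (n_distinct counts NA as a value),
# then keep the names of columns with exactly one distinct value
constant_cols <- toy |>
  summarise(across(everything(), n_distinct)) |>
  select(where(~ .x == 1)) |>
  names()

constant_cols
```

Columns flagged this way are safe candidates for `select(-...)`. Whether other columns are needed is still a judgment call about the analysis.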
Remove the variables that will not be used in the assignment
latlong_clean2 <- latlong_clean |>
  # drop source, unit, type-ID, confidence-limit, and footnote columns
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_clean2)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CT Connecticut Bridgep… Census Tract Unhealt… 0908000… Obesit…
2 2017 CT Connecticut Danbury City Unhealt… 918430 Obesit…
3 2017 CT Connecticut Norwalk Census Tract Unhealt… 0955990… Obesit…
4 2017 CT Connecticut Bridgep… Census Tract Unhealt… 0908000… Curren…
5 2017 CT Connecticut Hartford Census Tract Unhealt… 0937000… Obesit…
6 2017 CT Connecticut Waterbu… Census Tract Unhealt… 0980000… Obesit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
Dropping these seven columns leaves latlong_clean2 with 18 variables instead of 25, a more manageable dataset.
New Orleans Unhealthy Behaviors
For the rest of this assignment, I examine unhealthy behaviors in New Orleans in 2017.
Initial Dataset Cleaning
# Note: this reuses the name latlong_clean, overwriting the Connecticut subset above
latlong_clean <- latlong |>
  filter(StateDesc == "Louisiana") |>
  filter(CityName == "New Orleans") |>
  filter(Category == "Unhealthy Behaviors") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
2 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
3 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
4 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
5 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
6 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
What Variables Are Included?
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Further Data Filtering
To keep the analysis focused, I narrow the data to two measures, current smoking (CSMOKING) and binge drinking (BINGE):
nola_subset <- latlong_clean |>
  # keep only the current smoking and binge drinking measures
  filter(MeasureId %in% c("CSMOKING", "BINGE")) |>
  arrange(MeasureId)
nrow(nola_subset)
[1] 352
head(nola_subset)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
2 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
3 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
4 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
5 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
6 2017 LA Louisiana New Orleans Census Tract BRFSS Unhealthy Be…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
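nola_subset mixes city-wide estimates (GeographicLevel "City") with census-tract rows. Counting rows by `GeographicLevel` and `MeasureId` before plotting shows that mix explicitly. The sketch below uses hypothetical toy rows with one city-wide row and two tracts per measure.

```r
library(dplyr)

# Hypothetical rows: one city-wide estimate and two tracts per measure
toy <- tibble(
  GeographicLevel = rep(c("City", "Census Tract", "Census Tract"), times = 2),
  MeasureId       = rep(c("BINGE", "CSMOKING"), each = 3)
)

# Number of rows for each geographic level and measure
level_counts <- toy |>
  count(GeographicLevel, MeasureId)

level_counts
```

Running the same `count()` on nola_subset would show how its 352 rows split between the city-wide estimates and tract-level rows.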
Bar plot of Unhealthy Behavior Prevalence in New Orleans
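# Bar chart of the city-wide crude prevalence for each measure.
# nola_subset also contains one row per census tract; with stat = "identity"
# those rows would be stacked into one bar (summing the percentages), so keep
# only the city-level estimate and drop any missing values.
nola_plot <- nola_subset |>
  filter(GeographicLevel == "City", !is.na(Data_Value)) |>
  ggplot(aes(x = Measure, y = Data_Value, fill = Measure)) +
  geom_col(show.legend = FALSE) +
  labs(
    title = "Unhealthy Behavior Prevalence\nin New Orleans (2017)",
    y = "Crude Prevalence (%)",
    x = ""
  ) +
  coord_flip() +
  theme_minimal()
nola_plot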
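A single bar per measure hides how prevalence varies across neighborhoods. One option is a per-measure summary of the tract-level values, including a count of missing values, plus a boxplot. The sketch below uses hypothetical toy data; with the real data you would start from nola_subset instead of `toy`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical tract-level rows for the two measures (values are made up)
toy <- tibble(
  GeographicLevel = "Census Tract",
  Measure    = rep(c("Binge drinking", "Current smoking"), each = 4),
  Data_Value = c(18.4, 21.0, NA, 16.2, 25.0, 19.8, 30.1, NA)
)

tracts <- toy |> filter(GeographicLevel == "Census Tract")

# Spread of tract prevalence per measure, with a count of missing values
tract_summary <- tracts |>
  group_by(Measure) |>
  summarise(
    n_tracts     = n(),
    n_missing    = sum(is.na(Data_Value)),
    median_value = median(Data_Value, na.rm = TRUE),
    min_value    = min(Data_Value, na.rm = TRUE),
    max_value    = max(Data_Value, na.rm = TRUE)
  )

# Boxplot of tract values; rows with missing values are dropped first
tract_box <- tracts |>
  filter(!is.na(Data_Value)) |>
  ggplot(aes(x = Measure, y = Data_Value)) +
  geom_boxplot() +
  labs(y = "Crude Prevalence (%)", x = "") +
  coord_flip() +
  theme_minimal()

tract_summary
```

Dropping missing values explicitly avoids ggplot2's "Removed rows" warning. The `n_missing` column also records how many tracts lacked an estimate.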
Map of New Orleans Unhealthy Behaviors
A first, basic map places a circle marker at each observation's coordinates.
leaflet(data = nola_subset) |>
  addTiles() |>
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = 5,
    stroke = FALSE,
    fillOpacity = 0.7,
    color = "steelblue")
Refined Map Including Interactivity
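Each census tract contributes one row per measure at identical coordinates, so in the basic map the smoking and binge-drinking markers sit exactly on top of each other. One possible refinement is to give each measure its own colored layer group and let the reader toggle the layers with `addLayersControl()`. The sketch below uses hypothetical points; the group labels are illustrative names.

```r
library(dplyr)
library(leaflet)

# Hypothetical points: the same two locations, once per measure
toy <- tibble(
  MeasureId  = rep(c("CSMOKING", "BINGE"), each = 2),
  lat        = rep(c(29.95, 29.97), times = 2),
  long       = rep(c(-90.07, -90.10), times = 2),
  Data_Value = c(25.0, 19.8, 18.4, 21.0)
)

layered_map <- leaflet() |>
  addTiles() |>
  # One layer per measure, so overlapping markers can be viewed separately
  addCircleMarkers(
    data = filter(toy, MeasureId == "CSMOKING"),
    lng = ~long, lat = ~lat, radius = 5, stroke = FALSE,
    fillOpacity = 0.7, color = "darkred", group = "Current smoking"
  ) |>
  addCircleMarkers(
    data = filter(toy, MeasureId == "BINGE"),
    lng = ~long, lat = ~lat, radius = 5, stroke = FALSE,
    fillOpacity = 0.7, color = "steelblue", group = "Binge drinking"
  ) |>
  # Checkboxes for switching each measure's layer on and off
  addLayersControl(
    overlayGroups = c("Current smoking", "Binge drinking"),
    options = layersControlOptions(collapsed = FALSE)
  )

layered_map
```

Splitting the data with `filter()` keeps each layer's markers and color tied to one measure. This avoids relying on a single marker color for both behaviors.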
leaflet(data = nola_subset) |>
  addProviderTiles("CartoDB.Positron") |>
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = 6,
    stroke = TRUE,
    color = "white",
    fillColor = "darkred",
    fillOpacity = 0.85,
    popup = ~paste0(
      "<strong>Category:</strong> ", Measure, "<br>",
      "<strong>Prevalence:</strong> ", round(Data_Value, 1), "%"))
Summary
For this assignment, I explored 2017 crude prevalence data for New Orleans, focusing on two key “Unhealthy Behaviors”:
- Current smoking
- Binge drinking
The bar chart compares the prevalence of the two behaviors, with smoking showing the higher rate. The first leaflet map shows where these indicators were measured across New Orleans, with one circle per location and indicator. The refined map adds interactivity: clicking a circle opens a popup showing the measure and its prevalence (%) at that location.
These findings are not enough to draw firm conclusions about New Orleans. They do, however, raise concerns about the rates of smoking and binge drinking across the city's neighborhoods and about what might be driving them, such as high stress.