Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse) # already attaches dplyr, tidyr, readr, stringr, and ggplot2
library(tidyr)
library(dplyr)
library(leaflet)
setwd("/Users/Owner/Documents/DATA 110/Week 5")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
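A minimal sketch of the same split on a hypothetical two-row tibble (the coordinates are made up for illustration), useful for checking that both new columns come out numeric. It assumes a separator of `",\\s*"` so the space after the comma is consumed too; the chunk above relies on `convert = TRUE` handling that space instead.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical rows in the same "(lat, long)" text format as GeoLocation
toy <- tibble(GeoLocation = c("(39.2904, -76.6122)", "(39.3500, -76.6300)"))

toy_split <- toy |>
  # "[()]" is a character class matching "(" or ")", so both are removed
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split at the comma plus any following spaces; convert = TRUE turns
  # the resulting character pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)

print(toy_split)
```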
Filter the dataset
Remove the rows whose StateDesc is "United States", select Prevention as the category of interest, keep only crude prevalence values, and keep only 2017.
*Please note: I filtered Category for “Unhealthy Behaviors” instead of “Prevention.”
latlong_clean <- latlong |>
filter(StateDesc != "United States") |>
filter(Category == "Unhealthy Behaviors") |> # Originally Prevention
filter(Data_Value_Type == "Crude prevalence") |>
filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne City BRFSS Unhealthy Be…
2 2017 CA California Hayward City BRFSS Unhealthy Be…
3 2017 CA California Lakewood City BRFSS Unhealthy Be…
4 2017 AL Alabama Huntsville Census Tract BRFSS Unhealthy Be…
5 2017 AZ Arizona Avondale Census Tract BRFSS Unhealthy Be…
6 2017 AZ Arizona Chandler City BRFSS Unhealthy Be…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
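The four chained filter() calls can also be written as one filter() with comma-separated conditions, which dplyr combines with AND. A small sketch on hypothetical rows (values invented for illustration) shows that only a row meeting all four conditions survives:

```r
library(dplyr)

# Hypothetical rows mimicking the four columns used in the filter above
toy <- tibble(
  StateDesc       = c("United States", "Maryland", "Maryland", "Maryland"),
  Category        = c("Unhealthy Behaviors", "Unhealthy Behaviors",
                      "Prevention", "Unhealthy Behaviors"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year            = c(2017, 2017, 2017, 2017)
)

# Comma-separated conditions are ANDed, equivalent to chaining filter() calls
toy_clean <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Unhealthy Behaviors",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

nrow(toy_clean) # only the second row passes every condition
```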
What variables are included? (Can any of them be removed?)
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
Remove the variables that will not be used in the assignment
*Please note: I filtered the data for CityName == “Baltimore” instead of StateAbbr == “MD”.
prevention <- latlong_clean |>
select(-DataSource, -Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne City Unhealt… 632548 Curren…
2 2017 CA California Hayward City Unhealt… 633000 Obesit…
3 2017 CA California Lakewood City Unhealt… 639892 Obesit…
4 2017 AL Alabama Huntsvil… Census Tract Unhealt… 0137000… Obesit…
5 2017 AZ Arizona Avondale Census Tract Unhealt… 0404720… Obesit…
6 2017 AZ Arizona Chandler City Unhealt… 412000 No lei…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
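The same column removal can be written more compactly by negating a c() of names inside select(). A sketch on a hypothetical one-row tibble (values made up) that borrows a few of the dataset's column names:

```r
library(dplyr)

# Hypothetical tibble using a few of the dataset's column names
toy <- tibble(Year = 2017, CityName = "Baltimore", DataSource = "BRFSS",
              Data_Value_Unit = "%", Data_Value = 41.6)

# -c(...) drops every listed column at once, same as listing
# -DataSource, -Data_Value_Unit as separate arguments
toy_small <- toy |>
  select(-c(DataSource, Data_Value_Unit))

names(toy_small) # remaining columns keep their original order
```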
baltimore <- prevention |>
filter(CityName == "Baltimore")
head(baltimore)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 MD Maryland Baltimore Census Tract Unhealth… 2404000… Curren…
2 2017 MD Maryland Baltimore Census Tract Unhealth… 2404000… No lei…
3 2017 MD Maryland Baltimore Census Tract Unhealth… 2404000… Obesit…
4 2017 MD Maryland Baltimore Census Tract Unhealth… 2404000… No lei…
5 2017 MD Maryland Baltimore Census Tract Unhealth… 2404000… Binge …
6 2017 MD Maryland Baltimore Census Tract Unhealth… 2404000… Curren…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new “prevention” dataset is now a manageable size.
For your assignment, work with the cleaned “prevention” dataset.
1. Once you run the above code, filter this dataset one more time for any particular subset.
Filter chunk here
*I checked the unique values of “Measure” to see which “Unhealthy Behaviors” are included in this dataset, then chose the measure “Obesity among adults aged >=18 Years.” I also removed the row for the city of Baltimore as a whole by keeping only census-tract-level rows.
unique(latlong_clean$Measure)
[1] "Current smoking among adults aged >=18 Years"
[2] "Obesity among adults aged >=18 Years"
[3] "No leisure-time physical activity among adults aged >=18 Years"
[4] "Binge drinking among adults aged >=18 Years"
baltimore_obesity <- baltimore |>
filter(GeographicLevel == "Census Tract") |>
filter(Measure == "Obesity among adults aged >=18 Years")
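Before filtering, count() gives a quick view of how many rows fall into each geographic level and measure, which makes it easy to confirm that the city-wide row is the one being dropped. A sketch on hypothetical rows (one city-wide row and three tract rows, invented for illustration):

```r
library(dplyr)

# Hypothetical Baltimore rows: one city-wide row and three tract rows
toy <- tibble(
  GeographicLevel = c("City", "Census Tract", "Census Tract", "Census Tract"),
  Measure = c("Obesity among adults aged >=18 Years",
              "Obesity among adults aged >=18 Years",
              "Binge drinking among adults aged >=18 Years",
              "Obesity among adults aged >=18 Years")
)

# Rows per level/measure combination
toy |> count(GeographicLevel, Measure)

# Same two conditions as the chunk above, combined in one filter()
toy_obesity <- toy |>
  filter(GeographicLevel == "Census Tract",
         Measure == "Obesity among adults aged >=18 Years")

nrow(toy_obesity) # the city-wide row and the binge-drinking row are dropped
```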
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
ggplot(baltimore_obesity, aes(x = Data_Value)) +
geom_density(linewidth = 0.75) +
labs(title = "Percentage of Obese Adults (18 years or older) in Population by Census Tract in the City of Baltimore, 2017",
x = "Percentage of Obese Adults (18 years or older) in Population",
y = "Density")
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_density()`).
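The warning above comes from the one census tract whose Data_Value is NA. A minimal sketch, on hypothetical values with one NA, of passing na.rm = TRUE to geom_density(), which should drop that row without the warning; the dashed median line is an optional addition, not part of the original plot:

```r
library(ggplot2)

# Hypothetical tract percentages with one missing value
toy <- data.frame(Data_Value = c(19.3, 34.2, 41.6, 44.0, 46.3, 55.3, NA))

p <- ggplot(toy, aes(x = Data_Value)) +
  # na.rm = TRUE removes the NA row quietly instead of warning
  geom_density(linewidth = 0.75, na.rm = TRUE) +
  # dashed reference line at the median, computed with the NA removed
  geom_vline(xintercept = median(toy$Data_Value, na.rm = TRUE),
             linetype = "dashed") +
  labs(x = "Percentage of obese adults (18 years or older)", y = "Density")

print(p)
```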
*I pulled a summary of the percentage values to see where they are most concentrated (quartiles, median, and mean).
summary(baltimore_obesity$Data_Value)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
19.30 34.20 41.60 39.57 46.30 55.30 1
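In the summary above the mean (39.57) sits below the median (41.60), which is consistent with a left skew: a few low-percentage tracts pull the mean down more than the median. A small worked check on hypothetical values (not the real data), remembering that mean() and median() return NA unless na.rm = TRUE:

```r
# Hypothetical tract percentages with one missing value
x <- c(19.3, 34.2, 41.6, 44.0, 46.3, 55.3, NA)

m_mean   <- mean(x, na.rm = TRUE)   # (19.3 + 34.2 + 41.6 + 44.0 + 46.3 + 55.3) / 6
m_median <- median(x, na.rm = TRUE) # average of the two middle values, 41.6 and 44.0

# A mean below the median is a rough sign of left (negative) skew
m_mean < m_median
```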
3. Now create a map of your subsetted dataset.
First map chunk here
leaflet() |>
setView(lng = -76.6, lat = 39.3, zoom = 12) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = baltimore_obesity,
radius = baltimore_obesity$Data_Value*10)
Assuming "long" and "lat" are longitude and latitude, respectively
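The message above appears because leaflet guesses which columns hold the coordinates. A sketch, on hypothetical points near downtown Baltimore (values made up), that names lng and lat explicitly with ~ formulas and drops NA tracts first. addCircles() reads radius in meters, so a value of 41.6 scaled by 10 draws a circle about 416 m across in radius:

```r
library(dplyr)
library(leaflet)

# Hypothetical tract points; coordinates and values are made up
toy <- tibble(
  lat        = c(39.29, 39.30, 39.31),
  long       = c(-76.61, -76.60, -76.62),
  Data_Value = c(41.6, NA, 30.0)
)

# Drop tracts with no estimate so every circle has a radius
toy_map_data <- toy |> filter(!is.na(Data_Value))

m <- leaflet(toy_map_data) |>
  setView(lng = -76.6, lat = 39.3, zoom = 12) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  # ~ formulas are evaluated against the data given to leaflet();
  # naming lng/lat explicitly means leaflet does not have to guess them
  addCircles(lng = ~long, lat = ~lat, radius = ~Data_Value * 10)

# Print m (or leave it as the last line of a chunk) to render the map
```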
4. Refine your map to include a mouse-click tooltip
popupobesity <- paste0(
"<b>Percentage of Obesity in Adults >= 18 years: </b>", baltimore_obesity$Data_Value, "<br>",
"<b>Population of Area: </b>", baltimore_obesity$PopulationCount, "<br>",
"<b>Latitude in degrees: </b>", baltimore_obesity$lat, "<br>",
"<b>Longitude in degrees: </b>", baltimore_obesity$long, "<br>")
Refined map chunk here
leaflet() |>
setView(lng = -76.6, lat = 39.3, zoom = 12) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = baltimore_obesity,
radius = baltimore_obesity$Data_Value* 10,
color = "#51405e",
fillColor = "#03fc28",
fillOpacity = 0.35,
popup = popupobesity)
Assuming "long" and "lat" are longitude and latitude, respectively
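In leaflet, popup content opens on a mouse click, while the separate label argument shows text on hover. A sketch on hypothetical points (values invented) that keeps the click popup used above, formats the population with a thousands separator, and adds a hover label; the label is an optional extra, not part of the original map:

```r
library(dplyr)
library(leaflet)

# Hypothetical tract points; values are made up for illustration
toy <- tibble(
  lat = c(39.29, 39.31), long = c(-76.61, -76.62),
  Data_Value = c(41.6, 30.0), PopulationCount = c(2500, 3100)
)

# Click popup HTML, built the same way as popupobesity
toy_popup <- paste0(
  "<b>Obesity (adults >= 18): </b>", toy$Data_Value, "%<br>",
  "<b>Population: </b>", format(toy$PopulationCount, big.mark = ","))

m <- leaflet(toy) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(lng = ~long, lat = ~lat, radius = ~Data_Value * 10,
             color = "#51405e", fillColor = "#03fc28", fillOpacity = 0.35,
             popup = toy_popup,                # shown on click
             label = ~paste0(Data_Value, "%")) # shown on hover
```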
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
My first plot is a simple density plot showing the distribution, across census tracts in the city of Baltimore, of the percentage of adults (18 years and older) who were obese in 2017. The distribution is bimodal and considerably left-skewed, with a mean of 39.57% that falls below the median of 41.60%. Values of roughly 40.00% to 50.00% (where the taller of the two peaks sits along the x axis) occur much more frequently than values at the low end of the range, whose minimum is 19.30%.
Next, I plotted these percentage values at the geographic locations of the corresponding census tracts to show how they vary spatially. The size of each circle corresponds to the percentage of obese adults in that census tract. For legibility, I multiplied these values by 10: addCircles() interprets the radius in meters, so at their raw values (roughly 19 to 55) the circles were so small that no difference in size was visible between them. As is common in port cities, much of the population, and therefore many of the plotted census tracts, is clustered around the harbor. Because the fill and outline of every circle share the same shade of blue, it can be difficult to distinguish individual points, particularly in the center of the city, where each circle overlaps several others.
For my final map, I changed the fill and outline colors of the circles to green and dark purple, respectively, making overlapping points easier to tell apart. I also added a mouse-click popup that displays, for each census tract, the percentage of adults who are obese, the tract's population count, and its latitude and longitude. Zooming in on the map and clicking the circles shows that many tracts have similarly sized circles, with values roughly between 40.00% and 50.00%. Notably, some of Baltimore's more expensive areas, such as the waterfront neighborhood of Fells Point and the far northern neighborhood of Roland Park, have distinctly smaller circles, and thus lower percentages of obese adults, than tracts in the center, east, or west of the city. It would be informative to see whether other health measures, such as life expectancy, follow a similar pattern.