Healthy Cities GIS Assignment

Author

A Porambo


Load the libraries and set the working directory

library(tidyverse)  # loads readr, dplyr, tidyr, stringr, and ggplot2, among others
library(leaflet)    # interactive maps
setwd("/Users/Owner/Documents/DATA 110/Week 5")  # adjust to the folder holding the CSV
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

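latlong <- cities500 |>
  # drop the parentheses around "(lat, long)"
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split on the comma and convert both halves to numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)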
head(latlong)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
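`separate()` still works, but tidyr 1.3.0 marked it as superseded in favor of `separate_wider_delim()`. The sketch below does the same split with the newer function. It runs on two made-up rows: the tract labels and coordinates are hypothetical and do not come from the CDC file. It assumes tidyr >= 1.3.0 and R >= 4.1, which is needed for the native pipe and the `\(x)` lambda.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Two hypothetical rows in the same "(lat, long)" format as GeoLocation
toy <- tibble(
  TractLabel  = c("Tract A", "Tract B"),
  GeoLocation = c("(39.29, -76.61)", "(39.31, -76.58)")
)

toy_latlong <- toy |>
  # Strip the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma into two character columns
  separate_wider_delim(GeoLocation, delim = ",", names = c("lat", "long")) |>
  # Unlike separate(convert = TRUE), this does not convert types,
  # so trim the leading space and convert to numeric explicitly
  mutate(across(c(lat, long), \(x) as.numeric(str_trim(x))))

print(toy_latlong)
```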

Filter the dataset

Remove rows where StateDesc is "United States", keep Prevention as the category of interest, keep only crude prevalence values, and keep only 2017.

*Please note: I filtered Category for “Unhealthy Behaviors” instead of “Prevention.”

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Unhealthy Behaviors") |> # Originally Prevention
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName   GeographicLevel DataSource Category     
  <dbl> <chr>     <chr>      <chr>      <chr>           <chr>      <chr>        
1  2017 CA        California Hawthorne  City            BRFSS      Unhealthy Be…
2  2017 CA        California Hayward    City            BRFSS      Unhealthy Be…
3  2017 CA        California Lakewood   City            BRFSS      Unhealthy Be…
4  2017 AL        Alabama    Huntsville Census Tract    BRFSS      Unhealthy Be…
5  2017 AZ        Arizona    Avondale   Census Tract    BRFSS      Unhealthy Be…
6  2017 AZ        Arizona    Chandler   City            BRFSS      Unhealthy Be…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
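The four chained `filter()` calls can also be written as a single call. `filter()` joins comma-separated conditions with AND. The sketch below uses a hypothetical four-row tibble with invented values to check that both forms keep the same rows.

```r
library(dplyr)

# Hypothetical rows, each failing a different one of the conditions (except row 2)
toy <- tibble(
  StateDesc       = c("United States", "Maryland", "Maryland", "Maryland"),
  Category        = c("Unhealthy Behaviors", "Unhealthy Behaviors",
                      "Prevention", "Unhealthy Behaviors"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year            = c(2017, 2017, 2017, 2017)
)

# Chained form, as in the code above
chained <- toy |>
  filter(StateDesc != "United States") |>
  filter(Category == "Unhealthy Behaviors") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)

# Single call: every comma-separated condition must be TRUE
combined <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Unhealthy Behaviors",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

print(combined)
```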

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

*Please note: I filtered the data for CityName == "Baltimore" instead of StateAbbr == "MD".

prevention <- latlong_clean |>
  # drop columns not needed for the assignment
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
# A tibble: 6 × 18
   Year StateAbbr StateDesc  CityName  GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 CA        California Hawthorne City            Unhealt… 632548   Curren…
2  2017 CA        California Hayward   City            Unhealt… 633000   Obesit…
3  2017 CA        California Lakewood  City            Unhealt… 639892   Obesit…
4  2017 AL        Alabama    Huntsvil… Census Tract    Unhealt… 0137000… Obesit…
5  2017 AZ        Arizona    Avondale  Census Tract    Unhealt… 0404720… Obesit…
6  2017 AZ        Arizona    Chandler  City            Unhealt… 412000   No lei…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
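Listing every dropped column with its own minus sign works. An equivalent form wraps the names in `-c(...)`. `any_of()` goes further and tolerates names that are absent from the data. The sketch below uses a hypothetical three-column tibble with invented values.

```r
library(dplyr)

# Hypothetical tibble with one column to keep and two to drop
toy <- tibble(
  Data_Value           = c(39.6, 41.2),
  DataSource           = c("BRFSS", "BRFSS"),
  Low_Confidence_Limit = c(38.9, 40.5)
)

drop_cols <- c("DataSource", "Low_Confidence_Limit")

# Same result as select(-DataSource, -Low_Confidence_Limit)
kept <- toy |> select(-c(DataSource, Low_Confidence_Limit))

# any_of() drops the listed names that exist and ignores the rest,
# so a missing or misspelled name does not raise an error
kept_safe <- toy |> select(-any_of(c(drop_cols, "NotAColumn")))

print(kept)
```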
baltimore <- prevention |>
  filter(CityName=="Baltimore")
head(baltimore)
# A tibble: 6 × 18
   Year StateAbbr StateDesc CityName  GeographicLevel Category  UniqueID Measure
  <dbl> <chr>     <chr>     <chr>     <chr>           <chr>     <chr>    <chr>  
1  2017 MD        Maryland  Baltimore Census Tract    Unhealth… 2404000… Curren…
2  2017 MD        Maryland  Baltimore Census Tract    Unhealth… 2404000… No lei…
3  2017 MD        Maryland  Baltimore Census Tract    Unhealth… 2404000… Obesit…
4  2017 MD        Maryland  Baltimore Census Tract    Unhealth… 2404000… No lei…
5  2017 MD        Maryland  Baltimore Census Tract    Unhealth… 2404000… Binge …
6  2017 MD        Maryland  Baltimore Census Tract    Unhealth… 2404000… Curren…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset “prevention” is now a manageable size.

For your assignment, work with the cleaned “prevention” dataset

1. Once you run the above code, filter this dataset one more time for any particular subset.

Filter chunk here

*I checked the unique values of “Measure” to see which unhealthy behaviors are included in this dataset, then chose the measure “Obesity among adults aged >=18 Years”. I also removed the city-level row for Baltimore as a whole, keeping only census tracts.

unique(latlong_clean$Measure)
[1] "Current smoking among adults aged >=18 Years"                  
[2] "Obesity among adults aged >=18 Years"                          
[3] "No leisure-time physical activity among adults aged >=18 Years"
[4] "Binge drinking among adults aged >=18 Years"                   
baltimore_obesity <- baltimore |>
  filter(GeographicLevel == "Census Tract") |>  # drop the city-wide row
  filter(Measure == "Obesity among adults aged >=18 Years")
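Before plotting, it can help to confirm that the subset contains only tract-level rows for the one chosen measure. The sketch below applies `count()` to hypothetical rows with invented numbers. A single (level, measure) pair in the result means the filters worked as intended.

```r
library(dplyr)

# Hypothetical rows: one city-level total and three census tracts
toy <- tibble(
  GeographicLevel = c("City", "Census Tract", "Census Tract", "Census Tract"),
  Measure = c("Obesity among adults aged >=18 Years",
              "Obesity among adults aged >=18 Years",
              "Obesity among adults aged >=18 Years",
              "Binge drinking among adults aged >=18 Years"),
  Data_Value = c(36.0, 34.2, 46.3, 17.5)
)

toy_obesity <- toy |>
  filter(GeographicLevel == "Census Tract") |>
  filter(Measure == "Obesity among adults aged >=18 Years")

# One row per remaining (level, measure) pair, with its row count
level_check <- toy_obesity |> count(GeographicLevel, Measure)
print(level_check)
```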

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

ggplot(baltimore_obesity, aes(x = Data_Value)) +
  geom_density(linewidth = 0.75) +
  # "\n" wraps the long title onto two lines so it is not cut off
  labs(title = "Percentage of Obese Adults (18 years or older) in Population\nby Census Tract in the City of Baltimore, 2017",
    x = "Percentage of Obese Adults (18 years or older) in Population",
    y = "Density")
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_density()`).

*I pulled a summary of the percentage values to check their center and spread (minimum, quartiles, median, mean, and maximum).

summary(baltimore_obesity$Data_Value)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  19.30   34.20   41.60   39.57   46.30   55.30       1 
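The warning under the density plot and the `NA's` entry in the summary appear to come from the same single tract with a missing `Data_Value`. The sketch below drops that row explicitly before plotting or summarizing, so ggplot2 does not have to remove it and warn. It uses a hypothetical vector with invented values that includes one `NA`.

```r
library(dplyr)

# Hypothetical tract-level percentages with one missing value
toy <- tibble(Data_Value = c(20, 30, 40, 50, NA))

# Drop the missing row; passing this to ggplot() avoids the warning
toy_complete <- toy |> filter(!is.na(Data_Value))

# A few of the statistics summary() reports, computed without the NA
toy_stats <- toy_complete |>
  summarise(
    n      = n(),
    median = median(Data_Value),
    mean   = mean(Data_Value)
  )
print(toy_stats)
```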

3. Now create a map of your subsetted dataset.

First map chunk here

leaflet() |>
  setView(lng = -76.6, lat = 39.3, zoom = 12) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = baltimore_obesity,
    radius = ~Data_Value * 10)  # radius is in meters; scaled up for visibility
Assuming "long" and "lat" are longitude and latitude, respectively
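`addCircles()` interprets `radius` in meters. Multiplying by 10 therefore turns tract percentages of roughly 19% to 55% into circles of roughly 190 m to 550 m. One alternative is to map the observed range linearly onto a chosen range of radii, which makes the smallest and largest circles easier to tell apart. The sketch below does this in base R with a hypothetical helper, `rescale_radius()`. The 100 m to 600 m bounds are an assumption, not a value from the assignment.

```r
# Hypothetical helper: linearly map values onto [min_m, max_m] meters,
# ignoring NA when finding the range (NA inputs stay NA)
rescale_radius <- function(x, min_m = 100, max_m = 600) {
  rng <- range(x, na.rm = TRUE)
  min_m + (x - rng[1]) / (rng[2] - rng[1]) * (max_m - min_m)
}

# Hypothetical tract percentages
pct <- c(20, 30, 40, 50, NA)

radius_x10    <- pct * 10              # the approach used above
radius_scaled <- rescale_radius(pct)   # spread over 100 to 600 m

print(data.frame(pct, radius_x10, radius_scaled))
```

In the map, this could replace the ×10 scaling as `radius = ~rescale_radius(Data_Value)`.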

4. Refine your map to include a mouse-click tooltip

popupobesity <- paste0(
  "<b>Percentage of Obesity in Adults &ge; 18 years: </b>", baltimore_obesity$Data_Value, "<br>",
  "<b>Population of Area: </b>", baltimore_obesity$PopulationCount, "<br>",
  "<b>Latitude in degrees: </b>", baltimore_obesity$lat, "<br>",
  "<b>Longitude in degrees: </b>", baltimore_obesity$long, "<br>")

Refined map chunk here

leaflet() |>
  setView(lng = -76.6, lat = 39.3, zoom = 12) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = baltimore_obesity,
    radius = ~Data_Value * 10,
    color = "#51405e",      # outline: dark purple
    fillColor = "#03fc28",  # fill: bright green
    fillOpacity = 0.35,
    popup = popupobesity)
Assuming "long" and "lat" are longitude and latitude, respectively
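The popup text is built by pasting vectors element-wise, so each tract gets its own HTML string. The sketch below applies the same idea with slightly tidier formatting: a percent sign, a comma-separated population, and rounded coordinates. `make_popup()` is an illustrative helper, not part of leaflet, and the values are hypothetical.

```r
# Hypothetical helper: one HTML popup string per tract
make_popup <- function(pct, pop, lat, long) {
  paste0(
    "<b>Obesity among adults aged 18 and older: </b>", pct, "%<br>",
    "<b>Tract population: </b>", format(pop, big.mark = ","), "<br>",
    "<b>Latitude: </b>", round(lat, 4), "<br>",
    "<b>Longitude: </b>", round(long, 4)
  )
}

popups <- make_popup(
  pct  = c(39.6, 45.1),
  pop  = c(2500, 4100),
  lat  = c(39.29012, 39.31234),
  long = c(-76.61234, -76.58761)
)
print(popups)
```

The resulting character vector could be passed to `addCircles(popup = ...)` in place of `popupobesity`.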

5. Write a paragraph

In a paragraph, describe the plots you created and what they show.

My first plot is a simple density plot of the percentage of adults (18 years and older) who were obese in 2017, across the census tracts of Baltimore. The distribution is bimodal and considerably left-skewed: its mean of 39.57% sits below its median of 41.60%. Values of roughly 40% to 50%, where the taller of the two peaks sits along the x axis, are much more common than values near the low end of the range, whose minimum is 19.30%.
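The location of the taller peak can also be read off numerically rather than by eye. The sketch below uses base R's `density()` on a simulated bimodal sample, not the Baltimore data. The x value at the maximum of the estimated curve marks the main peak.

```r
set.seed(1)
# Simulated bimodal sample: a smaller cluster near 25, a larger one near 45
x <- c(rnorm(40, mean = 25, sd = 3), rnorm(160, mean = 45, sd = 3), NA)

# Kernel density estimate, ignoring the missing value
d <- density(x, na.rm = TRUE)

# x position of the highest point of the curve (the taller peak)
main_peak <- d$x[which.max(d$y)]
print(round(main_peak, 1))
```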

Next, I plotted these percentage values at the locations of the corresponding census tracts to show how they vary across the city. The size of each circle corresponds to the percentage of obese adults in that tract. For legibility, I multiplied these values by 10. `addCircles()` interprets the radius in meters, and at the unscaled sizes (roughly 19 to 55 meters) the circles were so small that no difference in size was visible between them. As is common in port cities, most of the population (and therefore most of the plotted points) is clustered around the harbor. The fill and outline of every circle share the same default shade of blue, so individual circles can be hard to tell apart. This is worst in the center of the city, where each circle overlaps several others.

For my final plot, I changed the fill and outline colors of the circles to green and dark purple, respectively, which makes overlapping circles easier to distinguish. I also added a mouse-click popup that shows, for each census tract, the percentage of adults who are obese, the tract's population count, and its latitude and longitude. Zooming in and clicking the circles shows that many tracts have similarly sized circles, with percentages roughly between 40% and 50%. Notably, some of the more expensive areas of Baltimore have distinctly smaller circles, and thus lower percentages of obese adults, than tracts in the center, east, or west of the city. Examples include the waterfront neighborhood of Fells Point and the far-northern neighborhood of Roland Park. It would be informative to see whether other health measures, such as life expectancy, follow a similar pattern.