Healthy Cities GIS Assignment

Author

STEVE DONFACK

Load the libraries and set the working directory

library(tidyverse)
library(tidyr)
library(ggthemes)
library(leaflet)
setwd("C:/Users/steve/Downloads")
# read_csv() already creates cities500, so no data() call is needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

1. Once you run the above code and learn how to filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations.

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation,"[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
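To see what the two steps above do, here is a minimal sketch on a single made-up coordinate string. The `toy` tibble and its value are illustrative assumptions, not rows taken from the dataset.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical one-row example; the real GeoLocation values come from the CSV
toy <- tibble(GeoLocation = "(44.9778, -93.2650)")

parsed <- toy |>
  # strip the surrounding parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

str(parsed)
```

The result has two numeric columns, `lat` and `long`, which is the form leaflet needs later on.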

Minnesota_clean <- latlong |>
  filter(StateAbbr == "MN") |>
  # PopulationCount and Year are numeric, so compare against numbers, not strings
  filter(PopulationCount >= 3000) |>
  filter(Category == "Prevention") |>
  filter(Year == 2017) |>
  filter(GeographicLevel == "Census Tract")
head(Minnesota_clean)

# A tibble: 6 × 25
   Year StateAbbr StateDesc CityName      GeographicLevel DataSource Category  
  <dbl> <chr>     <chr>     <chr>         <chr>           <chr>      <chr>     
1  2017 MN        Minnesota Minneapolis   Census Tract    BRFSS      Prevention
2  2017 MN        Minnesota Brooklyn Park Census Tract    BRFSS      Prevention
3  2017 MN        Minnesota Duluth        Census Tract    BRFSS      Prevention
4  2017 MN        Minnesota Minneapolis   Census Tract    BRFSS      Prevention
5  2017 MN        Minnesota Minneapolis   Census Tract    BRFSS      Prevention
6  2017 MN        Minnesota Duluth        Census Tract    BRFSS      Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
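A point worth checking in a filter chain like this one is whether thresholds are compared as numbers or as text. The sketch below uses a small made-up tibble (`toy` and its values are assumptions for illustration) to show that a quoted threshold makes R compare character strings, which orders values alphabetically rather than numerically.

```r
library(dplyr)

# Hypothetical population counts, chosen to straddle the 3000 threshold
toy <- tibble(PopulationCount = c(500, 2999, 3000, 10000))

# Unquoted threshold: numeric comparison
numeric_keep <- toy |> filter(PopulationCount >= 3000)

# Quoted threshold: the numbers are coerced to text and compared as strings
text_keep <- toy |> filter(PopulationCount >= "3000")

numeric_keep$PopulationCount
text_keep$PopulationCount
```

With the quoted threshold, 500 is kept and 10000 is dropped, because "500" sorts after "3000" and "10000" sorts before it. In the assignment pipeline, `nrow(Minnesota_clean)` gives a quick check against the 900-observation limit.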

Let’s drop the columns we don’t need.

Minnesota_clean2 <- Minnesota_clean |>
  select(-TractFIPS, -DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(Minnesota_clean2)

# A tibble: 6 × 17
   Year StateAbbr StateDesc CityName   GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>     <chr>      <chr>           <chr>    <chr>    <chr>  
1  2017 MN        Minnesota Minneapol… Census Tract    Prevent… 2743000… "Chole…
2  2017 MN        Minnesota Brooklyn … Census Tract    Prevent… 2707966… "Curre…
3  2017 MN        Minnesota Duluth     Census Tract    Prevent… 2717000… "Visit…
4  2017 MN        Minnesota Minneapol… Census Tract    Prevent… 2743000… "Takin…
5  2017 MN        Minnesota Minneapol… Census Tract    Prevent… 2743000… "Curre…
6  2017 MN        Minnesota Duluth     Census Tract    Prevent… 2717000… "Visit…
# ℹ 9 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, Short_Question_Text <chr>

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

Plot 1

# non map plot

ggplot(Minnesota_clean2, aes(x = PopulationCount, y = Data_Value, color = CityName)) +
  # geom_jitter() alone draws the points; alpha must lie between 0 and 1
  geom_jitter(alpha = 0.5) +
  labs(title = "Census tract population vs. prevention data value, Minnesota, 2017",
       x = "Population count", y = "Data value (%)", color = "City",
       caption = "Source: Cities500 Database") +
  theme_minimal()

Plot 2

ggplot(Minnesota_clean2, aes(x = Short_Question_Text, y = Data_Value, color = CityName)) +
  # geom_jitter() alone draws the points, so they are not plotted twice
  geom_jitter() +
  labs(title = "Data value (%) by question asked, Minnesota, 2017",
       x = "Question", y = "Data value (%)", color = "City",
       caption = "Source: Cities500 Database") +
  theme_economist()
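What the second plot shows can also be read from a small summary table per question. The sketch below is one way to do that; the `toy` tibble and its values are made up to stand in for `Minnesota_clean2`, and the summary column names are illustrative.

```r
library(dplyr)

# Made-up values standing in for Minnesota_clean2
toy <- tibble(
  Short_Question_Text = c("Health Insurance", "Health Insurance",
                          "Annual Checkup", "Annual Checkup"),
  Data_Value = c(8.5, 21.0, 68.2, 74.9)
)

# One row per question with the range and median of the data values
question_summary <- toy |>
  group_by(Short_Question_Text) |>
  summarise(
    n_rows = n(),
    min_value = min(Data_Value, na.rm = TRUE),
    median_value = median(Data_Value, na.rm = TRUE),
    max_value = max(Data_Value, na.rm = TRUE)
  )

question_summary
```

Replacing `toy` with `Minnesota_clean2` would give the same table for the real subset, which makes it easier to back up statements such as "below 25%" or "above 50%".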

3. Now create a map of your subsetted dataset.

Now we create the map. To keep it readable, we center the view on the city of Minneapolis.

# center the view on Minneapolis and draw one circle per row
leaflet() |>
  setView(lng = -93.2, lat = 44.9, zoom = 10) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    data = Minnesota_clean2,
    radius = Minnesota_clean2$Data_Value,
    color = "green",
    fillColor = "white",
    fillOpacity = 0.5  # opacity is a number between 0 and 1, not a string
  )

Assuming "long" and "lat" are longitude and latitude, respectively

4. Refine your map to include a mouse-click tooltip

Let’s create our popup first

popupminnesota <- paste0(
      "<b>Year: </b>", Minnesota_clean2$Year, "<br>",
      "<b>City: </b>", Minnesota_clean2$CityName, "<br>",
      "<b>Value (%): </b>", Minnesota_clean2$Data_Value, "<br>",
      "<b>Population count: </b>", Minnesota_clean2$PopulationCount, "<br>",
      "<b>Question asked: </b>", Minnesota_clean2$Short_Question_Text, "<br>"
)

Here we refine the map with a tooltip. Clicking a circle shows the year, the city, the exact data value in percent, the population count, and the question that was asked.

leaflet() |>
  setView(lng = -93.26, lat = 44.97, zoom =11 ) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    data = Minnesota_clean2 ,
    radius = sqrt(Minnesota_clean2$Data_Value),
    color = "red",
    fillColor = "white",
    fillOpacity = 0.25,
    popup = popupminnesota
  )

Assuming "long" and "lat" are longitude and latitude, respectively
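If the map should contain only Minneapolis rows rather than just being centered on the city, one option is to filter first and pass the coordinates explicitly. The sketch below assumes that goal; the `toy` tibble, its values, and the names `minneapolis` and `minneapolis_map` are made up for illustration. Naming `lng` and `lat` explicitly also means leaflet no longer has to guess the coordinate columns.

```r
library(dplyr)
library(leaflet)

# Made-up rows standing in for Minnesota_clean2
toy <- tibble(
  CityName = c("Minneapolis", "Minneapolis", "Duluth"),
  Short_Question_Text = c("Annual Checkup", "Health Insurance", "Annual Checkup"),
  Data_Value = c(70.1, 12.3, 72.4),
  lat = c(44.98, 44.95, 46.79),
  long = c(-93.27, -93.25, -92.10)
)

# Keep only Minneapolis and store the popup text as a column
minneapolis <- toy |>
  filter(CityName == "Minneapolis") |>
  mutate(popup = paste0("<b>Question asked: </b>", Short_Question_Text, "<br>",
                        "<b>Value (%): </b>", Data_Value))

# Formulas (~column) are evaluated against the data passed to leaflet()
minneapolis_map <- leaflet(minneapolis) |>
  setView(lng = -93.26, lat = 44.97, zoom = 11) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(lng = ~long, lat = ~lat, radius = ~sqrt(Data_Value),
             color = "red", fillColor = "white", fillOpacity = 0.25,
             popup = ~popup)

minneapolis_map
```

Keeping the popup as a column of the filtered data ensures that each circle and its popup text come from the same row.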

5. Write a paragraph

In one paragraph, describe the graphs you created and what they show.

For this assignment, we used the “Cities500” data set. We first cleaned our data set. We wanted our subset to include only observations from Minnesota in 2017, with prevention as the category, census tract as the geographic level, and a population greater than or equal to 3,000.

To achieve this, we filtered the data set down to a subset of no more than 900 observations. We kept only observations from Minnesota, then only those with a population count greater than or equal to 3,000. We then kept only the “Prevention” category and the census tract geographic level, and finally only observations from 2017.

We started with a data set of over 800,000 observations and ended with a subset of only 764 observations. To tidy the subset further, we removed the columns that were unnecessary for our analysis.

The first graph plots the population count of each census tract against its data value, colored by city, to look for a relationship between the two. The second graph shows the data values for each question asked, so the prevention measures can be compared side by side. As the graph shows, the health insurance measure, which in this data set records the current lack of health insurance among adults aged 18 to 64, stays below 25%, while annual health checkups, cholesterol screening, and taking blood pressure medication are all above 50%.

We then mapped our subset. To keep the map readable, we centered the view on the city of Minneapolis. Finally, we refined the map with a tooltip that shows, for each circle, the year, the city, the exact data value in percent, the population count, and the question asked.