GIS Assignment - 500 Healthy Cities

Author

David Burkart

Load the libraries, set the working directory, and read the data

library(tidyverse)   # also attaches readr, dplyr, tidyr, stringr, and ggplot2
setwd("C:/Users/dburkart/Desktop/DATA 110/data")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable stores coordinates in the format (lat, long)

Split GeoLocation (lat, long) into two columns: lat and long

# Strip the parentheses, then split "lat, long" at the comma into two numeric columns
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
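To show what each step of the split does, here is a minimal base R sketch on two hypothetical GeoLocation strings (the coordinates are illustrative, not rows from the dataset). It mirrors the three stages of the pipeline above: remove the parentheses, split at the comma, and convert the pieces to numbers.

```r
# Hypothetical GeoLocation strings in the "(lat, long)" format (illustrative values)
geo <- c("(34.5, -118.25)", "(39.0, -94.5)")

# 1. Drop the parentheses (what str_replace_all(GeoLocation, "[()]", "") does)
no_parens <- gsub("[()]", "", geo)

# 2. Split each string at the comma (what separate(sep = ",") does)
parts <- strsplit(no_parens, ",", fixed = TRUE)

# 3. Trim the space after the comma and convert to numbers (like convert = TRUE)
lat  <- as.numeric(trimws(vapply(parts, `[`, character(1), 1)))
long <- as.numeric(trimws(vapply(parts, `[`, character(1), 2)))

data.frame(lat, long)
```

The tidyverse version in the chunk above is more compact; this sketch only makes the intermediate values visible.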

Filter the dataset

Remove the rows where StateDesc is "United States", keep only the Prevention category, keep only crude prevalence values, and keep only the year 2017.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Prevention") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName   GeographicLevel DataSource Category  
  <dbl> <chr>     <chr>      <chr>      <chr>           <chr>      <chr>     
1  2017 AL        Alabama    Montgomery City            BRFSS      Prevention
2  2017 CA        California Concord    City            BRFSS      Prevention
3  2017 CA        California Concord    City            BRFSS      Prevention
4  2017 CA        California Fontana    City            BRFSS      Prevention
5  2017 CA        California Richmond   Census Tract    BRFSS      Prevention
6  2017 FL        Florida    Davie      Census Tract    BRFSS      Prevention
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
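The four chained filter() calls keep a row only when every condition is TRUE, so they are equivalent to a single condition joined with &. The sketch below shows this with base R's subset() on a small hypothetical data frame that uses the same column names; in dplyr the same thing can be written as one filter() call with the conditions separated by commas.

```r
# Hypothetical rows using the same columns the filter above tests
toy <- data.frame(
  StateDesc       = c("United States", "Maryland", "Maryland", "Ohio"),
  Category        = c("Prevention", "Prevention", "Health Outcomes", "Prevention"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year            = c(2017, 2017, 2017, 2016)
)

# One condition joined with & is equivalent to the four chained filter() calls:
# a row survives only if every test is TRUE (subset() treats NA as FALSE, like filter())
kept <- subset(toy,
               StateDesc != "United States" &
                 Category == "Prevention" &
                 Data_Value_Type == "Crude prevalence" &
                 Year == 2017)
kept
```

Only the Maryland Prevention row with crude prevalence for 2017 remains.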

What variables are included? (can any of them be removed?)

names(latlong_clean)

 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"

Remove the variables that will not be used in the assignment

prevention <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)

# A tibble: 6 × 18
   Year StateAbbr StateDesc  CityName  GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>    <chr>    <chr>  
1  2017 AL        Alabama    Montgome… City            Prevent… 151000   Choles…
2  2017 CA        California Concord   City            Prevent… 616000   Visits…
3  2017 CA        California Concord   City            Prevent… 616000   Choles…
4  2017 CA        California Fontana   City            Prevent… 624680   Visits…
5  2017 CA        California Richmond  Census Tract    Prevent… 0660620… Choles…
6  2017 FL        Florida    Davie     Census Tract    Prevent… 1216475… Choles…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
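select() with minus signs drops the listed columns and keeps everything else in its original order. A minimal base R sketch of the same idea, on a hypothetical two-row data frame with a few of the dataset's column names, uses setdiff() to compute the columns to keep. When far more columns are dropped than kept, listing the kept columns directly (or writing select(-c(...)) in dplyr) can be shorter.

```r
# Hypothetical two-row data frame with a few of the dataset's column names
toy <- data.frame(Year = c(2017, 2017), DataSource = "BRFSS",
                  Data_Value = c(3.1, 2.8), Data_Value_Unit = "%")

# Columns to drop, mirroring select(-DataSource, -Data_Value_Unit)
drop_cols <- c("DataSource", "Data_Value_Unit")
trimmed <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]
names(trimmed)
```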

md <- prevention |>
  filter(StateAbbr=="MD")
head(md)

# A tibble: 6 × 18
   Year StateAbbr StateDesc CityName  GeographicLevel Category  UniqueID Measure
  <dbl> <chr>     <chr>     <chr>     <chr>           <chr>     <chr>    <chr>  
1  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Chole…
2  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Visit…
3  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Visit…
4  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Curre…
5  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Curre…
6  2017 MD        Maryland  Baltimore Census Tract    Preventi… 2404000… "Visit…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

unique(md$CityName)

[1] "Baltimore"
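unique() confirms that Baltimore is the only Maryland city in the data. Counting rows per city also shows how many records each city contributes (in dplyr, count(md, CityName) does this). A minimal base R sketch on hypothetical rows:

```r
# Hypothetical rows: three records in one Maryland city, one elsewhere
toy <- data.frame(
  StateAbbr = c("MD", "MD", "MD", "CA"),
  CityName  = c("Baltimore", "Baltimore", "Baltimore", "Concord")
)

md_toy <- subset(toy, StateAbbr == "MD")
unique(md_toy$CityName)   # distinct cities only
table(md_toy$CityName)    # distinct cities plus how many rows each contributes
```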

The new prevention dataset is now a manageable size.

For your assignment, work with a cleaned dataset.

1. Once you run the above code, filter this dataset one more time for any particular subset with no more than 900 observations.

Filter chunk here

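us <- latlong |>
  filter(GeographicLevel == "City") |>
  filter(Short_Question_Text == "Chronic Kidney Disease") |>
  filter(DataValueTypeID == "AgeAdjPrv") |>   # age-adjusted prevalence
  filter(Year == 2017) |>                      # Year is numeric, so compare with a number
  filter(StateAbbr != "AK") |>                 # keep only the contiguous US:
  filter(StateAbbr != "HI")                    # drop Alaska and Hawaii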
head(us)

# A tibble: 6 × 25
   Year StateAbbr StateDesc     CityName     GeographicLevel DataSource Category
  <dbl> <chr>     <chr>         <chr>        <chr>           <chr>      <chr>   
1  2017 CA        California    Menifee      City            BRFSS      Health …
2  2017 CT        Connecticut   New Britain  City            BRFSS      Health …
3  2017 FL        Florida       Lakeland     City            BRFSS      Health …
4  2017 PA        Pennsylvania  Pittsburgh   City            BRFSS      Health …
5  2017 SC        South Carolin Rock Hill    City            BRFSS      Health …
6  2017 TX        Texas         College Sta… City            BRFSS      Health …
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
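The assignment asks for a subset of no more than 900 observations, and a city-level subset should list each city once. The sketch below, on a hypothetical stand-in for the filtered data (city names only, no values), shows a one-condition alternative for excluding Alaska and Hawaii using %in%, plus two stopifnot() checks for those requirements. One caveat: filter(StateAbbr != "AK") drops rows where StateAbbr is NA, while !StateAbbr %in% c("AK", "HI") keeps them, so the two forms agree only when StateAbbr has no missing values.

```r
# Hypothetical stand-in for the filtered `us` data (city names only, no values)
us_toy <- data.frame(
  StateAbbr = c("CA", "CT", "AK", "HI", "PA"),
  CityName  = c("Menifee", "New Britain", "Anchorage", "Honolulu", "Pittsburgh")
)

# %in% tests set membership, so one condition excludes both states
contiguous <- subset(us_toy, !StateAbbr %in% c("AK", "HI"))

# Checks for the assignment: at most 900 rows and no city listed twice
stopifnot(nrow(contiguous) <= 900)
stopifnot(anyDuplicated(contiguous[c("StateAbbr", "CityName")]) == 0)
contiguous
```

On the real data, the same two stopifnot() lines can be run against us.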

db <- us |>
  select(-GeographicLevel, -DataSource, -Category, -UniqueID, -Data_Value_Unit,
         -DataValueTypeID, -Data_Value_Type, -Low_Confidence_Limit,
         -High_Confidence_Limit, -Data_Value_Footnote_Symbol,
         -Data_Value_Footnote, -CategoryID, -TractFIPS)
head(db)

# A tibble: 6 × 12
   Year StateAbbr StateDesc    CityName Measure Data_Value PopulationCount   lat
  <dbl> <chr>     <chr>        <chr>    <chr>        <dbl>           <dbl> <dbl>
1  2017 CA        California   Menifee  Chroni…        2.9           77519  33.7
2  2017 CT        Connecticut  New Bri… Chroni…        3.2           73206  41.7
3  2017 FL        Florida      Lakeland Chroni…        3.3           97422  28.1
4  2017 PA        Pennsylvania Pittsbu… Chroni…        3            305704  40.4
5  2017 SC        South Carol… Rock Hi… Chroni…        3.2           66154  34.9
6  2017 TX        Texas        College… Chroni…        2.9           93857  30.6
# ℹ 4 more variables: long <dbl>, MeasureId <chr>, CityFIPS <dbl>,
#   Short_Question_Text <chr>

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

p1 <- db |>
  ggplot(aes(x = lat, y = Data_Value)) +
  geom_point(alpha = 0.5, color = "#9c4016") +
  # dot-dash line at the 37th parallel separates northern from southern states
  geom_vline(xintercept = 37, linetype = "dotdash", linewidth = 0.5, color = "black") +
  # annotate() draws each label once instead of once per data row
  annotate("text", x = 43, y = 1, label = "Northern States", size = 3.5, color = "black") +
  annotate("text", x = 31, y = 1, label = "Southern States", size = 3.5, color = "black") +
  scale_y_continuous(limits = c(0, 5)) +
  labs(title = "Prevalence of Chronic Kidney Disease by Latitude in the Contiguous United States",
       x = "Latitude",
       y = "Prevalence (%)",
       caption = "Source: Centers for Disease Control and Prevention (CDC), \n Division of Population Health, Epidemiology and Surveillance Branch") +
  theme_classic()

p1
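The plot divides the cities at the 37th parallel by eye. A minimal sketch of putting numbers on that split: compute the mean prevalence in each group and run a Welch two-sample t-test (t.test() does not assume equal variances by default). The t-test is a technique not used in the original analysis, the cutoff lat >= 37 is an assumption, and the data below are only the six rows printed by head(db) (with lat rounded as printed), far too few to support any conclusion.

```r
# The six rows printed by head(db) (lat rounded to one decimal there)
first6 <- data.frame(
  lat        = c(33.7, 41.7, 28.1, 40.4, 34.9, 30.6),
  Data_Value = c(2.9, 3.2, 3.3, 3.0, 3.2, 2.9)
)

# Same split as the dot-dash line in the plot (assumed cutoff: lat >= 37 is "Northern")
first6$region <- ifelse(first6$lat >= 37, "Northern", "Southern")

# Mean prevalence (%) in each group
region_means <- tapply(first6$Data_Value, first6$region, mean)
region_means

# Welch two-sample t-test comparing the two group means
res <- t.test(Data_Value ~ region, data = first6)
res$p.value
```

For these six rows the group means are 3.1 (two northern cities) and 3.075 (four southern cities). On the full db, the same summary could be written in the document's tidyverse style with mutate(), group_by(), and summarise().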

3. Now create a map of your subsetted dataset.

First map chunk here

library(leaflet)

leaflet() |>
  setView(lng = -94.57857, lat = 39.09973, zoom = 4) |>
  addProviderTiles("OpenStreetMap.Mapnik") |>
  addCircles(
    data = db,
    lng = ~long,                    # name the coordinate columns explicitly
    lat = ~lat,
    radius = ~Data_Value * 10000,   # radius in meters (3% becomes 30 km)
    color = "#9c4016",
    fillColor = "#de9126",
    fillOpacity = 0.25
  )
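In leaflet, the radius of addCircles() is measured in meters, so multiplying prevalence by 10000 turns 2.9% into a 29 km circle. Because a circle's area grows with the square of its radius, linear radius scaling makes area differences look larger than the prevalence differences. The sketch below compares that with square-root scaling, an alternative not used in the original map, where the constant k is a hypothetical choice that keeps a 3% prevalence at 30 km.

```r
prevalence <- c(2.9, 3.3)   # lowest and highest values among the six rows of head(db)

# Linear scaling used in the map: radius in meters = prevalence * 10000
radius_linear <- prevalence * 10000
radius_linear / 1000        # radii in km

# Alternative: scale the radius by sqrt(prevalence) so that circle AREA,
# not radius, is proportional to prevalence (k is a hypothetical constant)
k <- 30000 / sqrt(3)
radius_sqrt <- k * sqrt(prevalence)

# Ratio of circle areas for the two scalings
(radius_linear[2] / radius_linear[1])^2   # about 1.29
(radius_sqrt[2] / radius_sqrt[1])^2       # equals 3.3 / 2.9, about 1.14
```

With square-root scaling the circles would look even more alike, which is consistent with the small variation in prevalence.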

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

# HTML text shown in the popup when a circle is clicked
popups <- paste0(
  "<b>State: </b>", db$StateDesc, "<br>",
  "<b>City: </b>", db$CityName, "<br>",
  "<b>Population: </b>", format(db$PopulationCount, big.mark = ",", trim = TRUE), "<br>",
  "<b>Prevalence (%): </b>", db$Data_Value
)

leaflet() |>
  setView(lng = -94.57857, lat = 39.09973, zoom = 4) |>
  addProviderTiles("OpenStreetMap.Mapnik") |>
  addCircles(
    data = db,
    lng = ~long,
    lat = ~lat,
    radius = ~Data_Value * 10000,   # radius in meters
    color = "#9c4016",
    fillColor = "#de9126",
    fillOpacity = 0.5,
    popup = popups
  )
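An alternative way to build the popup text is sprintf(), which keeps the whole HTML template in one string. The sketch below uses two rows whose values appear in head(db); note that a literal percent sign must be written as %% inside sprintf(). Popups open on click; leaflet's addCircles() also accepts a label argument for text that appears on hover.

```r
# Two rows with values as printed by head(db)
toy <- data.frame(
  StateDesc       = c("California", "Pennsylvania"),
  CityName        = c("Menifee", "Pittsburgh"),
  PopulationCount = c(77519, 305704),
  Data_Value      = c(2.9, 3.0)
)

# One HTML string per circle; %% produces a literal "%"
popups <- sprintf(
  "<b>State: </b>%s<br><b>City: </b>%s<br><b>Population: </b>%s<br><b>Prevalence (%%): </b>%s",
  toy$StateDesc, toy$CityName,
  format(toy$PopulationCount, big.mark = ",", trim = TRUE),
  format(toy$Data_Value, nsmall = 1)
)
popups[1]
```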

5. Write a paragraph

In a paragraph, describe the plots you created and what they show.

I created these plots because I wanted to know whether chronic kidney disease is more prevalent in southern US states than in northern US states (divided at the 37th parallel), possibly because of dietary differences between the two regions. Although I did not formally analyze the data, neither the graph nor the map shows an obvious difference between southern and northern states. The graph is a scatter plot with latitude on the x-axis and the age-adjusted prevalence of chronic kidney disease (the response variable) on the y-axis, and a vertical line at x = 37 divides the data into the two groups of states. The map shows the geographic distribution of the data, with each circle's size corresponding to prevalence. Because prevalence varied little from city to city, all of the circles are of similar size. Furthermore, most of the data points appear to come from metropolitan areas (such as Los Angeles, New York City, and Chicago), so they do not cover the United States evenly and do not provide even anecdotal evidence toward answering the original question. I chose orange for both plots because orange is the awareness color for kidney disease.