GIS Assignment

Author

Sam Rajabian

Load libraries and dataset

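library(tidyverse) # attaches tidyr, dplyr, stringr, readr, and ggplot2
library(leaflet)
# read_csv() already creates cities500; data() only loads datasets bundled with packages
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")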

Split GeoLocation into lat and long

latlong <- cities500 |>
  # strip the parentheses around the coordinate pair, then split on the comma
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
#head(latlong)
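
As a quick check of the splitting step, here is a minimal sketch on a two-row toy table. The coordinate values are made up for illustration and only mimic the "(lat, long)" text format; the column names match the chunk above.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical stand-in for the GeoLocation column (values are illustrative only)
toy <- tibble(GeoLocation = c("(38.9, -77.0)", "(39.3, -76.6)"))

toy_split <- toy |>
  # remove "(" and ")" so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # split on the comma; convert = TRUE turns the text pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

str(toy_split)
```

Because of convert = TRUE, both new columns should come back as numeric rather than character, which is what leaflet needs for plotting.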

Questions

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

latlong_filtered <- latlong |>
  filter(Year == 2017) |>
  filter(MeasureId == "CSMOKING") |>
  filter(StateAbbr %in% c("MD", "VA", "DC"))
head(latlong_filtered)

# A tibble: 6 × 25
   Year StateAbbr StateDesc     CityName   GeographicLevel DataSource Category  
  <dbl> <chr>     <chr>         <chr>      <chr>           <chr>      <chr>     
1  2017 VA        Virginia      Hampton    City            BRFSS      Unhealthy…
2  2017 VA        Virginia      Roanoke    Census Tract    BRFSS      Unhealthy…
3  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
4  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
5  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
6  2017 DC        District of C Washington Census Tract    BRFSS      Unhealthy…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

dim(latlong_filtered)

[1] 906  25
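
The prompt asks for no more than 900 observations, while dim() above reports 906 rows. One possible way to tighten the criteria is to keep a single geographic level, since the head() output shows both "City" and "Census Tract" rows. The sketch below uses a hypothetical toy table; whether this extra criterion suits the analysis, and how many rows it would remove from the real data, are assumptions that would need checking.

```r
library(dplyr)

# Hypothetical rows; only the column names and level labels come from the output above
toy <- tibble(
  GeographicLevel = c("City", "Census Tract", "Census Tract"),
  Data_Value = c(18.2, 21.5, 30.1)
)

# Keep tract-level rows only, dropping the city-wide summary rows
tract_only <- toy |>
  filter(GeographicLevel == "Census Tract")

# Confirm the subset respects the 900-row limit
nrow(tract_only) <= 900
```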

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

ggplot(latlong_filtered, aes(x = PopulationCount, y = Data_Value, color = StateAbbr)) +
  geom_point(alpha = 0.5) + 
  labs(title = "DMV Smoking Prevalence by Small Area Population (2017)",
       x = "Population",
       y = "% Current Smokers",
       caption = "Source: CDC") +
  facet_wrap(~StateAbbr) +
  xlim(0, 10000) + 
  theme_bw() +
  scale_color_manual(values = c("VA" = "darkred", 
                                "MD" = "darkcyan", 
                                "DC" = "darkviolet")) +
  theme(legend.position = "none") #https://www.geeksforgeeks.org/r-language/remove-legend-in-ggplot2-in-r/

3. Now create a map of your subsetted dataset.

First map chunk here

#create palette: one fixed color per state
pal <- colorFactor(palette = c("darkviolet", "darkcyan", "darkred"),
                   domain = latlong_filtered$StateAbbr,
                   levels = c("DC", "MD", "VA"))
#create map
leaflet(data = latlong_filtered) |>
  setView(lng = -77.5, lat = 38, zoom = 6.5) |>
  addProviderTiles("Stadia.AlidadeSmoothDark") |>
  addCircles(
    lng = ~long, # name the coordinate columns instead of relying on guessing
    lat = ~lat,
    radius = ~sqrt(1.333^Data_Value) * 2, # radius in meters, grows with % smokers
    color = ~pal(StateAbbr)
  )

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

#create tooltip (HTML shown when a circle is clicked)
tooltip <- paste0(
  "<b>State: </b>", latlong_filtered$StateAbbr, "<br>",
  "<b>Current Smokers: </b>", latlong_filtered$Data_Value, "%<br>",
  "<b>Area Population: </b>", latlong_filtered$PopulationCount, "<br>"
)

#apply to map
leaflet(data = latlong_filtered) |>
  setView(lng = -77.5, lat = 38, zoom = 6.5) |>
  addProviderTiles("Stadia.AlidadeSmoothDark") |>
  addCircles(
    lng = ~long, # name the coordinate columns instead of relying on guessing
    lat = ~lat,
    radius = ~sqrt(1.333^Data_Value) * 2, # radius in meters, grows with % smokers
    color = ~pal(StateAbbr),
    popup = tooltip
  )

5. Write a paragraph

In a paragraph, describe the plots you created and the insights they show.

I filtered the cities500 dataset to include only 2017 current-smoking estimates for the DMV (DC, Maryland, and Virginia). With this subset, I created a scatterplot of smoking prevalence (%) against the population of each measured area, faceted by state. I made this plot to see whether the percentage of smokers is related to the number of people in an area. The correlation is not very strong, but in DC and VA smoking prevalence tends to be lower in areas with larger populations. This pattern is less apparent in Maryland, though a weakness of this dataset is that Maryland’s data covers only the city of Baltimore.
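
The strength of that relationship could be quantified with a per-state correlation. The following is a minimal sketch on hypothetical toy values (not the CDC data); it assumes the same column names as latlong_filtered and uses Pearson correlation, which is only one reasonable choice.

```r
library(dplyr)

# Hypothetical values for illustration; the real analysis would use latlong_filtered
toy <- tibble(
  StateAbbr = rep(c("DC", "VA"), each = 4),
  PopulationCount = c(1000, 2000, 3000, 4000, 1500, 2500, 3500, 4500),
  Data_Value = c(30, 25, 22, 18, 28, 27, 21, 20)
)

# One correlation coefficient per state; rows with missing values are ignored
cor_by_state <- toy |>
  group_by(StateAbbr) |>
  summarise(
    n = n(),
    r = cor(PopulationCount, Data_Value, use = "complete.obs"),
    .groups = "drop"
  )

cor_by_state
```

A negative r for a state would be consistent with lower smoking prevalence in more populous areas, while a value near zero would suggest little linear relationship.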

I also created a map to show where the measured areas are, with color representing the state and circle size increasing with the percentage of smokers. The map shows that the highest percentages of smokers tend to be near the center of major cities, including Baltimore, Richmond, and Norfolk. I expected DC to follow the same pattern; however, the highest percentages of DC smokers lie in the southeast suburbs, at about 40%, while the percentages in the city are relatively low at about 20%.
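
Because the circle radius in the map code is an exponential function of Data_Value, circle sizes are not proportional to the percentages themselves. This short sketch applies the same formula to the two approximate values mentioned above (20% and 40%); the helper name circle_radius is hypothetical and introduced only for illustration.

```r
# Same radius formula as the map chunks, wrapped in a hypothetical helper
circle_radius <- function(pct) sqrt(1.333^pct) * 2

# Radii (in meters, as addCircles() interprets them) for ~20% and ~40% smokers
radii <- circle_radius(c(20, 40))

# How many times larger the 40% circle's radius is than the 20% circle's
ratio <- radii[2] / radii[1]
ratio
```

Doubling the percentage from 20 to 40 multiplies the radius by roughly 18, so differences between high and low prevalence areas look much larger on the map than they are in the data.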