GIS Assignment

Author

D Devkota

GIS 500 Healthy Cities Assignment

Data Source and Definition

The dataset used in this analysis comes from the CDC PLACES (500 Cities) Project. “Crude prevalence” is the percentage of people in a population who have a given health condition or behavior, without adjustment for age differences. In this analysis, physical inactivity refers to adults who report no leisure-time physical activity, such as exercise, sports, or recreational activities. For more details:

https://www.cdc.gov/places/about/500-cities-2016-2019/index.html
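
To make the definition concrete, here is a minimal sketch of the arithmetic behind a crude prevalence. The counts are made up for illustration and do not come from the CDC file. The published CDC values are model-based estimates, so this shows only what the number means, not how it was produced.

```r
# Hypothetical survey counts for one city (made-up numbers, not CDC data)
adults_surveyed <- 2000
adults_inactive <- 520

# Crude prevalence: share of respondents with the condition, as a percentage,
# with no age standardisation applied
crude_prevalence <- adults_inactive / adults_surveyed * 100
crude_prevalence
```

An age-adjusted prevalence would instead reweight age groups to a standard population, which is why the two figures can differ for the same city.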

Loading Libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
library(highcharter)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
library(maps)

Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map
latlong <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl  (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num  (1): PopulationCount

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data_subset <- latlong |>
  filter(
    Year == 2017,
    grepl("physical", Measure, ignore.case = TRUE)
  ) |>
  filter(!is.na(Data_Value)) |>
  slice(1:600)
Warning: There were 115464 warnings in `filter()`.
The first warning was:
ℹ In argument: `grepl("physical", Measure, ignore.case = TRUE)`.
Caused by warning in `grepl()`:
! unable to translate 'Mammography use among women aged 50<96>74 Years' to a wide string
ℹ Run `dplyr::last_dplyr_warnings()` to see the 115463 remaining warnings.
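
The `<96>` in the warning suggests that the file is not UTF-8: byte 0x96 is an en dash in Windows-1252. This is an assumption about the source encoding, not something the output confirms. Under that assumption, a minimal sketch of one way to avoid the warnings is to re-encode the text before matching. The second example string is illustrative.

```r
# Two example Measure strings; the first contains the raw byte 0x96, which is
# an en dash in Windows-1252 but not valid UTF-8 (assumed source encoding)
measure_raw <- c(
  "Mammography use among women aged 50\x9674 Years",
  "No leisure-time physical activity among adults aged >=18 Years"
)

# Re-encode to UTF-8 so that pattern matching sees valid text
measure_utf8 <- iconv(measure_raw, from = "CP1252", to = "UTF-8")

# The same pattern as in the filter above, now without translation warnings
hits <- grepl("physical", measure_utf8, ignore.case = TRUE)
hits
```

The same idea could be applied when the file is read, through readr's `locale(encoding = ...)` argument, so that every later step works on UTF-8 text.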
# Check data exists
nrow(data_subset)
[1] 600
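
The pattern "physical" is broad. It can match any measure whose name contains that word, and the subset can also mix city, census-tract, and national rows. The sketch below uses toy rows (made-up values) to show a narrower filter. The measure names and the `GeographicLevel` values "City", "Census Tract", and "US" are assumptions about how the CDC file is coded and should be checked against the actual data.

```r
library(dplyr)

# Toy rows standing in for the CDC table (values are made up for illustration)
toy <- data.frame(
  CityName = c("Springfield", "Springfield", "Springfield", NA),
  GeographicLevel = c("City", "City", "Census Tract", "US"),
  Measure = c(
    "No leisure-time physical activity among adults aged >=18 Years",
    "Physical health not good for >=14 days among adults aged >=18 Years",
    "No leisure-time physical activity among adults aged >=18 Years",
    "No leisure-time physical activity among adults aged >=18 Years"
  ),
  Data_Value = c(26.1, 12.4, 31.0, 25.5)
)

# The broad pattern matches every row, including a different health measure
broad <- toy |>
  filter(grepl("physical", Measure, ignore.case = TRUE))

# A narrower pattern plus a geography filter keeps only city-level
# rows for the inactivity measure
narrow <- toy |>
  filter(
    GeographicLevel == "City",
    grepl("leisure-time physical activity", Measure, ignore.case = TRUE)
  )

nrow(broad)
nrow(narrow)
```

Filtering on meaning rather than taking the first 600 rows with `slice(1:600)` would also make the subset independent of the row order in the file.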
city_summary <- data_subset |>
  group_by(CityName) |>
  summarise(avg_value = mean(Data_Value, na.rm = TRUE)) |>
  arrange(desc(avg_value)) |>
  slice(1:15)

highchart() |>
  hc_chart(type = "column") |>
  hc_add_series(
    data = city_summary,
    type = "column",
    hcaes(x = CityName, y = avg_value)
  ) |>
  hc_title(text = "Top 15 Cities: Physical Inactivity Rate") |>
  hc_xAxis(type = "category", title = list(text = "City")) |>
  hc_yAxis(title = list(text = "Prevalence (%)"))
# Approximate each state's position as the mean of its boundary vertices

state_coords <- map_data("state") |>
  group_by(region) |>
  summarise(
    lat = mean(lat),
    long = mean(long)
  )

data_subset <- data_subset |>
  mutate(region = tolower(StateDesc)) |>
  left_join(state_coords, by = "region")

# Check that the join attached coordinates
head(data_subset$lat)
[1] 34.52859 36.71313 36.71313 36.71313 36.71313 28.73102
head(data_subset$long)
[1] -113.27464 -120.70642 -120.70642 -120.70642 -120.70642  -83.35629
# Keep only rows that received coordinates from the join
data_subset <- data_subset |>
  filter(!is.na(long) & !is.na(lat))
# Check how many rows still have coordinates
nrow(data_subset)
[1] 540
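
The count falls from 600 to 540, so 60 rows found no match in the state lookup. One possible explanation, which the output above does not confirm, is that `map_data("state")` covers only the contiguous states and the District of Columbia, so rows for Alaska, Hawaii, or national totals have no `region` to join to. A minimal sketch of how to list the unmatched states, using toy tables with made-up rows and rounded coordinates:

```r
library(dplyr)

# Toy city rows (illustrative only)
cities <- data.frame(
  CityName = c("Phoenix", "Honolulu", "Anchorage", "Fresno"),
  StateDesc = c("Arizona", "Hawaii", "Alaska", "California")
)

# Toy stand-in for state_coords that holds contiguous states only
state_lookup <- data.frame(
  region = c("arizona", "california"),
  lat = c(34.5, 36.7),
  long = c(-113.3, -120.7)
)

# anti_join() returns the rows of cities that have no partner in the lookup
unmatched <- cities |>
  mutate(region = tolower(StateDesc)) |>
  anti_join(state_lookup, by = "region")

unmatched$StateDesc
```

Running the same `anti_join()` on the real tables before the `filter()` step would show exactly which states or summary rows were dropped.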
leaflet(data_subset) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = ~sqrt(Data_Value) * 2,
    color = "darkgreen",
    fillOpacity = 0.7,
    popup = ~paste0("<b>City:</b> ", CityName, "<br>",
                    "<b>Physical Inactivity:</b> ", Data_Value)
  )
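
Because `lat` and `long` come from state averages, every city in a state is drawn at the same point, even though the popup labels each marker as a city. If the CDC file carries a per-row coordinate column, it could be parsed to place each marker at the city itself. The sketch below assumes a text column, here called `GeoLocation`, holding strings of the form "(latitude, longitude)". Both the column name and the layout are assumptions to verify with `spec()`, and the example values are made up.

```r
# Example strings in the assumed "(latitude, longitude)" layout (made-up values)
geo <- c("(33.4484, -112.0740)", "(36.7378, -119.7871)")

# Strip the parentheses, then split each string on the comma
parts <- strsplit(gsub("[()]", "", geo), ",\\s*")

# Convert the two pieces of each string to numeric columns
coords <- data.frame(
  lat  = as.numeric(vapply(parts, function(p) p[1], character(1))),
  long = as.numeric(vapply(parts, function(p) p[2], character(1)))
)
coords
```

Coordinates parsed this way could replace the joined state averages in the `lng` and `lat` arguments of `addCircleMarkers()`.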