GIS Assignment

Author

D Devkota

GIS 500 Healthy Cities Assignment

Data Source and Definition

The dataset used in this analysis comes from the CDC PLACES (500 Cities) Project. “Crude prevalence” means the percentage of people in a population who have a certain health condition or behavior, without adjusting for age differences. In this analysis, physical inactivity refers to adults who do not participate in any leisure-time physical activity, such as exercise, sports, or recreational activities. For more details, see:

https://www.cdc.gov/places/about/500-cities-2016-2019/index.html
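
To make the definition concrete, here is a minimal sketch of the arithmetic behind a crude prevalence. The counts are hypothetical and are not taken from the dataset; the published values are estimates, so this only illustrates what the percentage means.

```r
# Hypothetical counts, chosen only to illustrate the definition
adults_total    <- 2000  # adults in the population of interest
adults_inactive <- 500   # adults reporting no leisure-time physical activity

# Crude prevalence: share of the population with the condition, as a
# percentage, with no adjustment for the population's age structure
crude_prevalence <- adults_inactive / adults_total * 100
crude_prevalence
```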

Loading Libraries

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(leaflet)
library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

library(maps)


Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map

latlong <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl  (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num  (1): PopulationCount

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

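data_subset <- latlong |>
  filter(
    Year == 2017,
    # Matches every measure whose label contains "physical", in any case
    grepl("physical", Measure, ignore.case = TRUE)
  ) |>
  filter(!is.na(Data_Value)) |>
  # Keeps the first 600 matching rows in file order, not a random sample
  slice(1:600)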

Warning: There were 115464 warnings in `filter()`.
The first warning was:
ℹ In argument: `grepl("physical", Measure, ignore.case = TRUE)`.
Caused by warning in `grepl()`:
! unable to translate 'Mammography use among women aged 50<96>74 Years' to a wide string
ℹ Run `dplyr::last_dplyr_warnings()` to see the 115463 remaining warnings.
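
The `<96>` in the warning suggests that some `Measure` labels contain a byte that is not valid UTF-8. Byte 0x96 is an en dash in Windows-1252, so one plausible cause is that the CSV was saved in that encoding. This is an assumption, not something the output confirms. The sketch below reproduces the situation on a toy string and shows that converting it to UTF-8 lets `grepl()` run cleanly.

```r
# Toy stand-in for a label fragment such as "50<96>74":
# the bytes are "5", "0", 0x96, "7", "4"
raw_label <- rawToChar(as.raw(c(0x35, 0x30, 0x96, 0x37, 0x34)))
validUTF8(raw_label)    # the lone 0x96 byte is not valid UTF-8

# Assumption: the source encoding is Windows-1252 (CP1252)
fixed_label <- iconv(raw_label, from = "CP1252", to = "UTF-8")
validUTF8(fixed_label)  # the label is now valid UTF-8

# Pattern matching on the converted string no longer hits the invalid byte
has_physical <- grepl("physical", fixed_label, ignore.case = TRUE)

# With readr, the same idea can be applied at import time, for example:
# read_csv("500CitiesLocalHealthIndicators.cdc.csv",
#          locale = locale(encoding = "windows-1252"))
```

If the file turns out to use a different encoding, the `from` value and the `locale()` argument would need to change accordingly.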

# Check data exists
nrow(data_subset)

[1] 600
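
Because the filter matches any label containing "physical", it can pick up more than one measure, and the rows may mix geographic levels. The sketch below uses toy data to show the difference between the loose pattern and a stricter filter. The measure labels and the `"City"` level are illustrative assumptions; the actual values can be checked with `distinct(latlong, Measure)` and `distinct(latlong, GeographicLevel)`.

```r
library(dplyr)

# Toy rows; labels and values are made up for illustration
toy <- tibble(
  CityName        = c("A", "A", "B", "B"),
  GeographicLevel = c("City", "City", "City", "Census Tract"),
  Measure = c(
    "No leisure-time physical activity among adults aged >=18 Years",
    "Physical health not good for >=14 days among adults aged >=18 Years",
    "No leisure-time physical activity among adults aged >=18 Years",
    "No leisure-time physical activity among adults aged >=18 Years"
  ),
  Data_Value = c(25.1, 12.3, 30.4, 33.0)
)

# Loose match: keeps both measures and both geographic levels
loose <- toy |>
  filter(grepl("physical", Measure, ignore.case = TRUE))

# Stricter match: one measure, city-level rows only
strict <- toy |>
  filter(
    grepl("leisure-time physical activity", Measure, fixed = TRUE),
    GeographicLevel == "City"
  )

nrow(loose)
nrow(strict)
```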

city_summary <- data_subset |>
  group_by(CityName) |>
  summarise(avg_value = mean(Data_Value, na.rm = TRUE)) |>
  arrange(desc(avg_value)) |>
  slice(1:15)

highchart() |>
  hc_chart(type = "column") |>
  hc_add_series(
    data = city_summary,
    type = "column",
    hcaes(x = CityName, y = avg_value)
  ) |>
  hc_title(text = "Top 15 Cities: Physical Inactivity Rate") |>
  hc_xAxis(title = list(text = "City")) |>
  hc_yAxis(title = list(text = "Prevalence (%)"))
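
The ranking step that feeds the chart (average per city, then keep the highest values) can also be written with `slice_max()`, which makes the "top n" intent explicit. This is a sketch on toy data, not the chart's actual input.

```r
library(dplyr)

# Toy prevalence values for three hypothetical cities
toy <- tibble(
  CityName   = c("A", "A", "B", "C", "C"),
  Data_Value = c(10, 20, 30, 5, 7)
)

top_cities <- toy |>
  group_by(CityName) |>
  summarise(avg_value = mean(Data_Value, na.rm = TRUE)) |>
  # Keep the 2 cities with the highest average, highest first
  slice_max(avg_value, n = 2, with_ties = FALSE)

top_cities
```

With `with_ties = FALSE` the result has exactly `n` rows even when cities share the same average.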

# maps is already attached above; ggplot2's map_data() uses it for the state polygons

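# Approximate each state's position by averaging its boundary vertices
# (a rough centroid, not a city location)
state_coords <- map_data("state") |>
  group_by(region) |>
  summarise(
    lat = mean(lat),
    long = mean(long)
  )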

data_subset <- data_subset |>
  mutate(region = tolower(StateDesc)) |>
  left_join(state_coords, by = "region")

# Check
head(data_subset$lat)

[1] 34.52859 36.71313 36.71313 36.71313 36.71313 28.73102

head(data_subset$long)

[1] -113.27464 -120.70642 -120.70642 -120.70642 -120.70642  -83.35629

# Drop rows whose state had no match in state_coords (lat/long are NA after the join)
data_subset <- data_subset |>
  filter(!is.na(long), !is.na(lat))
# Check
nrow(data_subset)

[1] 540
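
The drop from 600 to 540 rows means 60 rows had no matching state coordinates. The `"state"` map in the maps package covers the contiguous states and the District of Columbia, so Alaska and Hawaii are likely candidates, though the output above does not confirm which regions were lost. A minimal sketch of how to list unmatched regions with `anti_join()`, using toy tables in place of `data_subset` and `state_coords`:

```r
library(dplyr)

# Toy stand-ins for the health rows and the coordinate lookup
toy_rows <- tibble(
  CityName = c("W", "X", "Y", "Z"),
  region   = c("california", "hawaii", "alaska", "california")
)
toy_coords <- tibble(
  region = "california",
  lat    = 36.7,
  long   = -120.7
)

# Rows in toy_rows with no partner in toy_coords, counted by region
unmatched <- toy_rows |>
  anti_join(toy_coords, by = "region") |>
  count(region)

unmatched
```

Running the same pattern before the `left_join()` would show which `StateDesc` values need separate handling.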

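# Markers sit at the state-level coordinates joined above, so cities in the
# same state are drawn on top of one another
leaflet(data_subset) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircleMarkers(
    lng = ~long,
    lat = ~lat,
    radius = ~sqrt(Data_Value) * 2,  # marker area grows roughly with prevalence
    color = "darkgreen",
    fillOpacity = 0.7,
    popup = ~paste0("<b>City:</b> ", CityName, "<br>",
                    "<b>Physical Inactivity:</b> ", Data_Value, "%")
  )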
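
Since every marker uses a state-level position, a city-level map would need per-row coordinates. The sketch below assumes the dataset has a text column named `GeoLocation` holding values like `"(lat, long)"`. Both the column name and the format are assumptions here and should be checked with `names(latlong)` first. It parses such strings into numeric columns that could then be passed to `addCircleMarkers()`.

```r
library(dplyr)
library(stringr)

# Toy rows with a hypothetical "(lat, long)" text column
toy <- tibble(
  CityName    = c("X", "Y"),
  GeoLocation = c("(34.05, -118.24)", "(40.71, -74.01)")
)

# Capture the two numbers inside the parentheses;
# column 1 of the result is the full match, columns 2 and 3 are the groups
parts <- str_match(
  toy$GeoLocation,
  "\\(\\s*(-?[0-9.]+)\\s*,\\s*(-?[0-9.]+)\\s*\\)"
)

toy <- toy |>
  mutate(
    city_lat  = as.numeric(parts[, 2]),
    city_long = as.numeric(parts[, 3])
  )

toy
```

Rows whose text does not match the pattern would get `NA` coordinates and would need to be filtered out before mapping.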