Assignment 3

R Markdown

The aim is to explore the spatial distribution of hospitals in Fulton and DeKalb counties, and its relationship to various socioeconomic factors. This analysis will help us understand potential inequities in healthcare access across different neighborhoods and demographic groups.

Hospital points of interest (POI) come from Yelp, and socioeconomic variables come from the Census ACS. The ACS variables chosen for this exploratory analysis are household income, unemployment, and educational attainment for the population aged 25 and over. Household income is a determinant of economic opportunity, the unemployment rate speaks to economic stability, and the high school graduation rate is a marker of social mobility. The goal is to share insights and answer the following question: is the spatial distribution of hospitals in the area equitable?

```r
library(viridis)
library(tidyverse)  # also attaches dplyr, tidyr and ggplot2
library(sf)
library(tmap)
library(leaflet)
library(tidycensus)
library(units)
```

```r
# Prepare the data: hospital points of interest from Yelp
yelp_data_url <- "https://raw.githubusercontent.com/ujhwang/urban-analytics-2024/main/Assignment/mini_3/yelp_hospital.geojson"
yelp_data <- st_read(yelp_data_url)

# Census ACS variables
census_var <- c(hhincome = "B19019_001",         # median household income
                umeploymentrate = "B23025_005",  # unemployed persons (a count, not a rate)
                education = "B15003_001",        # population aged 25+ (universe of the education table)
                pop = "B02001_001")              # total population

census_data <- get_acs(geography = "tract", state = "GA", county = c("Fulton", "DeKalb"),
                       output = "wide", geometry = TRUE, year = 2020,
                       variables = census_var)
head(census_data)
```

```r
# Household income density and education percentage variables
census_data_clean <- census_data %>%
  mutate(
    income_density = hhincomeE / popE,
    percentage_educated = (educationE / popE) * 100)

head(census_data_clean)
```
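The introduction frames unemployment and education as rates, while ACS tables report counts. The following is a minimal sketch of turning counts into rates on made-up numbers. The column names, the values, and the choice of denominators (civilian labor force for unemployment, population aged 25 and over for education) are all assumptions for illustration, not the variables pulled in this analysis.

```r
library(dplyr)

# Hypothetical tract-level counts (made-up numbers, not ACS output)
toy_tracts <- tibble(
  GEOID        = c("A", "B", "C"),
  unemployed   = c(50, 120, 30),      # unemployed persons
  labor_force  = c(1000, 1500, 600),  # civilian labor force
  hs_or_higher = c(1800, 1400, 900),  # aged 25+ with at least a high school diploma
  pop_25_plus  = c(2000, 2000, 1000)  # population aged 25+
)

# Divide each count by its own universe to get a percentage
toy_rates <- toy_tracts %>%
  mutate(
    unemployment_rate = unemployed / labor_force * 100,
    pct_hs_or_higher  = hs_or_higher / pop_25_plus * 100
  )

toy_rates
```

Using a matching denominator for each count keeps the resulting percentages comparable across tracts of different sizes.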

```r
# Columns to average within each tract
summarise_mean <- c(str_c(names(census_var), "E"),
                    "review_count", "income_density", "percentage_educated")

census_yelp <- census_data_clean %>%
  separate(col = NAME, into = c("tract", "county", "state"), sep = ", ") %>%
  # Spatial join: attach each hospital to the tract it falls in
  st_join(yelp_data %>%
            mutate(n = 1) %>%
            st_transform(crs = st_crs(census_data_clean))) %>%
  # One group per tract
  group_by(GEOID, county) %>%
  # Mean for all census variables, sum for n
  summarise(across(all_of(summarise_mean), mean),
            n = sum(n)) %>%
  # Release grouping
  ungroup() %>%
  # Drop the trailing "E" (estimate suffix) from the census column names
  rename_with(function(x) str_sub(x, 1, nchar(x) - 1), str_c(names(census_var), "E")) %>%
  # Replace NA in n and review_count with 0 (tracts with no hospital)
  mutate(across(c(n, review_count), function(x) case_when(is.na(x) ~ 0, TRUE ~ x)))
```
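The join-then-aggregate pattern is easier to follow on toy data. The sketch below uses two made-up square tracts and three made-up hospital points with planar coordinates and no CRS. All names and values are hypothetical, and `coalesce()` stands in for the `case_when()` step as one possible way to fill the missing values.

```r
library(sf)
library(dplyr)

# Two hypothetical unit-square "tracts" side by side (planar coordinates, no CRS)
square <- function(xmin) {
  st_polygon(list(rbind(c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1),
                        c(xmin, 1), c(xmin, 0))))
}
toy_tracts <- st_sf(GEOID = c("A", "B"),
                    geometry = st_sfc(square(0), square(1)))

# Three hypothetical hospitals, all strictly inside tract A
toy_hospitals <- st_sf(review_count = c(10, 20, 30),
                       geometry = st_sfc(st_point(c(0.2, 0.2)),
                                         st_point(c(0.5, 0.5)),
                                         st_point(c(0.8, 0.3))))

toy_joined <- toy_tracts %>%
  st_join(toy_hospitals %>% mutate(n = 1)) %>%  # left join: tract B gets NA
  st_drop_geometry() %>%
  group_by(GEOID) %>%
  summarise(review_count = mean(review_count), n = sum(n)) %>%
  mutate(across(c(n, review_count), ~ coalesce(.x, 0)))  # no hospital -> 0

toy_joined
```

Because `st_join()` keeps every tract by default, a tract without hospitals survives the join with `NA` values, which is why the final replacement step is needed.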


```r
# Match the CRS of the hospital layer before measuring proximity
census_yelp <- st_transform(census_yelp, st_crs(yelp_data))

buffer_distance <- 0.25 * 1609.34  # 0.25 miles in meters

# Count the hospitals within the buffer for each tract
census_yelp <- census_yelp %>%
  mutate(hospital_count = lengths(st_intersects(st_buffer(geometry, dist = buffer_distance), yelp_data)))
```
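Buffer distances given in meters are easiest to check in a projected CRS whose units are meters. The sketch below repeats the buffer-and-count step on made-up geometries. The coordinates are hypothetical, and EPSG:32616 (WGS 84 / UTM zone 16N) is an assumed choice of CRS for illustration, not one used in this analysis.

```r
library(sf)

buffer_distance <- 0.25 * 1609.34  # 0.25 miles in meters

# Hypothetical 1 km x 1 km tract; coordinates are in meters
toy_tract <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)))),
  crs = 32616
)

# Hypothetical hospitals: inside the tract, 200 m east of it, and 1,000 m east of it
toy_hospitals <- st_sfc(
  st_point(c(500, 500)), st_point(c(1200, 500)), st_point(c(2000, 500)),
  crs = 32616
)

# Buffer the tract polygon, then count the points that intersect the buffer
toy_buffer <- st_buffer(toy_tract, dist = buffer_distance)
hospital_count <- lengths(st_intersects(toy_buffer, toy_hospitals))

# The same count without building the buffer geometry
hospital_count_alt <- lengths(
  st_is_within_distance(toy_tract, toy_hospitals, dist = buffer_distance)
)

hospital_count
hospital_count_alt
```

The count includes hospitals inside the tract as well as those just outside its boundary, since the buffer is drawn around the whole polygon.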


```r
# Distance from each census tract to its nearest hospital, using st_distance()
census_yelp <- census_yelp %>%
  rowwise() %>%
  mutate(nearest_hospital_distance = min(st_distance(geometry, yelp_data))) %>%
  ungroup()

# View the results
head(census_yelp)

summary(census_yelp$hospital_count)
summary(census_yelp$nearest_hospital_distance)
```
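A row-by-row minimum over all distances can be slow on larger layers. One possible alternative, sketched here on made-up geometries, is to look up the nearest hospital first and then measure only that distance. The tracts, points, and CRS (EPSG:32616, chosen so coordinates are in meters) are hypothetical.

```r
library(sf)

# Hypothetical 1 km square tracts and hospital points in a projected CRS (meters)
square <- function(xmin) {
  st_polygon(list(rbind(c(xmin, 0), c(xmin + 1000, 0), c(xmin + 1000, 1000),
                        c(xmin, 1000), c(xmin, 0))))
}
toy_tracts <- st_sfc(square(0), square(5000), crs = 32616)
toy_hospitals <- st_sfc(st_point(c(500, 500)), st_point(c(7000, 500)), crs = 32616)

# Index of the closest hospital for each tract, then the distance to that hospital only
nearest_idx <- st_nearest_feature(toy_tracts, toy_hospitals)
nearest_m <- st_distance(toy_tracts, toy_hospitals[nearest_idx], by_element = TRUE)

# Drop the units class and convert meters to miles
nearest_miles <- as.numeric(nearest_m) / 1609.34

nearest_m
nearest_miles
```

As in the rowwise version, the distance is measured from the tract polygon, so a tract that contains a hospital gets a distance of zero.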


```r
# Correlation of household income with each access measure
cor(census_yelp$hhincome, census_yelp$hospital_count, use = "complete.obs")
cor(census_yelp$hhincome, census_yelp$nearest_hospital_distance, use = "complete.obs")
```
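`cor()` defaults to the Pearson coefficient, which measures linear association. Hospital counts are often skewed with many zeros, so a rank-based coefficient can be a useful cross-check. The sketch below compares the two on made-up values. The data are hypothetical, and Spearman is offered as one possible alternative, not as part of the analysis in this report.

```r
# Hypothetical tract-level values (made-up numbers)
toy <- data.frame(
  hhincome       = c(30000, 45000, 60000, 75000, 90000, 120000),
  hospital_count = c(0, 1, 4, 2, 1, 0)
)

# Pearson measures linear association; Spearman works on ranks
pearson  <- cor(toy$hhincome, toy$hospital_count, use = "complete.obs")
spearman <- cor(toy$hhincome, toy$hospital_count, use = "complete.obs",
                method = "spearman")

c(pearson = pearson, spearman = spearman)
```

A large gap between the two coefficients would suggest that a few extreme tracts are driving the linear estimate.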

## Including Plots



```r
library(tmap)
tmap_mode("view")

## tmap mode set to interactive viewing

# Map for hospital counts
hospital_count_map <- tm_shape(census_yelp) +
  tm_polygons("hospital_count", palette = "Blues", title = "Number of Hospitals within 0.25 miles") +
  tm_borders()

print(hospital_count_map)

# Map for nearest hospital distance
nearest_hospital_map <- tm_shape(census_yelp) +
  tm_polygons("nearest_hospital_distance", palette = "Reds", title = "Distance to Nearest Hospital (meters)") +
  tm_borders()

tmap_arrange(hospital_count_map, nearest_hospital_map, sync = TRUE)

# Based on the first map (number of hospitals within 0.25 miles), hospitals are concentrated in the center of the study area, surrounded by tracts with few or no hospitals nearby, which could indicate inequities in healthcare access.

# The census tracts at the eastern tip are especially vulnerable, with limited access to hospitals: their nearest hospital is 9.32 to 12.43 miles away. They are followed by the tracts at the northernmost tip, at 3.11 to 6.21 miles. The lowest class covers tracts within 5,000 m (3.11 miles) of a hospital.

ggplot(data = census_yelp, 
       mapping = aes(x=percentage_educated, y=hospital_count)) +
  geom_point() + 
  geom_smooth(mapping = aes(color = county), method = lm) +
  labs(x = "Percentage educated within census tract",
       y = "Hospital Count within 0.25 miles of Census Tract",
       color = "County in Census",
       title = "Does hospital count vary within tracts based on level of education?")

## `geom_smooth()` using formula = 'y ~ x'

# In DeKalb County, hospital count appears to fall as the level of education within the tract rises. In Fulton County, there does not seem to be a relationship between the level of education within the tract and the number of hospitals within 0.25 miles of the tract.
ggplot(data = census_yelp, 
       mapping = aes(x= hhincome, y= hospital_count)) +
  geom_point() + 
  geom_smooth(mapping = aes(color = county), method = lm) +
  labs(x = "Household Income",
       y = "Hospital Count within 0.25 miles of Census Tract",
       color = "County in Census",
       title = "Does hospital count vary within tracts based on levels of household income?")

## `geom_smooth()` using formula = 'y ~ x'

# Hospital counts do not seem to increase with household income: the largest numbers of hospitals are found within 0.25 miles of tracts with household incomes ranging from $50,000 to $100,000.

# Is the spatial distribution of hospitals in Fulton and DeKalb counties equitable?
# Judged by household income and percentage of education alone, the distribution of hospitals does not seem to vary across tracts.
# However, in terms of spatial distribution, some tracts are worse off because of their location, namely the tracts in the easternmost part, which are 9 to 12 miles from the nearest hospital. For families there without a car, the problem is worse still.
# With this, I would conclude that because some tracts are isolated from hospitals, the distribution is not equitable.
```



Hina Ahmed

2024-10-03
