1. Prep Data

1.1 Import Hospital POI Data

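library(sf)          # st_read(), st_transform(), st_centroid(), st_buffer(), st_join()
library(tidycensus)  # get_acs()
library(tidyverse)   # dplyr, tidyr::separate(), stringr, ggplot2
library(scales)      # comma() labels
library(tmap)        # thematic maps

hospitals <- st_read("https://raw.githubusercontent.com/ujhwang/urban-analytics-2025/main/Assignment/mini_3/hospital_11counties.geojson")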
## Reading layer `hospital_11counties' from data source 
##   `https://raw.githubusercontent.com/ujhwang/urban-analytics-2025/main/Assignment/mini_3/hospital_11counties.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 119 features and 8 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.73147 ymin: 33.42719 xmax: -83.92052 ymax: 34.24585
## Geodetic CRS:  WGS 84

1.2 Extract Census Data for the 11 Counties in Metro Atlanta

#Extract Census data for the 4 variables
census_var <- c(mhincome = "B19019_001E", #Median Household Income in the Past 12 Months by Household Size
                totalpop = "B01003_001E", #total population
                whitepop = "B02001_002E", #white population
                blackpop = "B02001_003E") #black population

#List of the 11 counties in Metro Atlanta
census_county <- c("Cherokee", "Clayton", "Cobb", "DeKalb", "Douglas", 
                   "Fayette", "Forsyth", "Fulton", "Gwinnett", "Henry", "Rockdale")

#Retrieve Census ACS Data at census tract level
census <- get_acs(geography = "tract", state = "GA", county = census_county,
                  output = "wide", geometry = TRUE, year = 2023,
                  variables = census_var) %>%
  mutate(white_pct = (whitepop/totalpop)*100,
         #percentage of white population out of total population
         black_pct = (blackpop/totalpop)*100,
         #percentage of black population out of total population
         belowmhincome = ifelse(mhincome < median(mhincome, na.rm = TRUE), 1, 0)) %>%
         #1 if the tract's income is below the median of all tracts in the 11 counties, else 0
  separate(col = NAME, into = c("tract", "county", "state"), sep = "; ")
  #split NAME into tract, county, and state at each "; "
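The derived columns above can be checked on a few rows without downloading ACS data. The sketch below uses hypothetical toy tracts (made-up values, not ACS data) and base R in place of dplyr/tidyr to show how the percentages, the below-median indicator, and the NAME split behave, including edge cases: a zero-population tract gives NaN percentages, a missing income gives an NA indicator, and a tract exactly at the median is coded 0.

```r
# Hypothetical toy tracts (made-up values, not ACS data)
tracts <- data.frame(
  NAME = c("Census Tract 1; Fulton County; Georgia",
           "Census Tract 2; Fulton County; Georgia",
           "Census Tract 3; Cobb County; Georgia",
           "Census Tract 4; Cobb County; Georgia"),
  mhincome = c(40000, 85000, NA, 120000),
  totalpop = c(2000, 4000, 0, 5000),
  whitepop = c(500, 2000, 0, 4000),
  blackpop = c(1400, 1500, 0, 500)
)

# Percent of total population; a zero-population tract gives 0/0 = NaN
tracts$white_pct <- tracts$whitepop / tracts$totalpop * 100
tracts$black_pct <- tracts$blackpop / tracts$totalpop * 100

# 1 if income is below the median of all tracts (NA incomes skipped for the median)
med_income <- median(tracts$mhincome, na.rm = TRUE)
tracts$belowmhincome <- ifelse(tracts$mhincome < med_income, 1, 0)

# Split "tract; county; state" at "; ", as tidyr::separate(sep = "; ") does
parts <- do.call(rbind, strsplit(tracts$NAME, "; ", fixed = TRUE))
tracts$tract  <- parts[, 1]
tracts$county <- parts[, 2]
tracts$state  <- parts[, 3]

print(tracts[, c("county", "white_pct", "black_pct", "belowmhincome")])
```

Rows carrying NA or NaN values like these are dropped by glm() under its default na.action, which is worth keeping in mind when reading the regression output.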

1.2.1 Justification for each variable I selected:

• Median Household Income:
    Median household income is chosen because lower-income and lower socioeconomic groups often have fewer opportunities, and I want to test whether this also holds for hospital access. Lower-income households may need to live closer to hospitals because not all of them have cars to travel far for healthcare.
• Total Population:
    The total population variable is used as the denominator for the white and black population counts, so that racial composition is standardized across census tracts of different sizes.
• White Population & Black Population:
    Race is a key factor in evaluating equity and equality. I want to see whether race influences the number of accessible hospitals, since neighborhoods with a larger share of racial minorities tend to have poorer built infrastructure and fewer services. The white and black population counts will later be divided by total population to create the percentage white and percentage black variables.

2. Equity Metric: The number of hospitals within 10 km

2.1 Create centroids from census tract polygons and 10 km buffers to count hospitals within 10 km (straight-line distance) of each tract

#Transform to a projected coordinate system (WGS 84 / UTM zone 16N, EPSG:32616) so distances are in meters
census <- st_transform(census, 32616)
hospitals <- st_transform(hospitals, 32616)

#Convert tract polygons to centroids
census_centroids <- st_centroid(census)

#Create 10 km buffers around each centroid
census_buffers <- st_buffer(census_centroids, dist = 10000) #10km buffer

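#Spatial join to count hospitals in each buffer
#(st_join is a left join by default, so a buffer with no hospital keeps one row with n = NA and gets a count of 0)
census_hospitalcount <- census_buffers %>%
  st_join(hospitals %>% mutate(n = 1)) %>%
  group_by(GEOID) %>%
  summarise(hosp_count_10km = sum(n, na.rm = TRUE))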

#Join hospital counts back to original census dataframe
census_final <- census %>%
                  left_join(census_hospitalcount %>%
                              st_drop_geometry(), by = "GEOID")
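The buffer-and-join step amounts to counting hospitals whose straight-line distance to a tract centroid is at most 10 km. The base R sketch below illustrates that logic on hypothetical projected coordinates in meters (toy values, not the real tracts or hospitals); `count_within()` is a hypothetical helper, not part of sf.

```r
# Hypothetical projected coordinates in meters (toy values, not real data)
centroids <- data.frame(GEOID = c("A", "B", "C"),
                        x = c(0, 20000, 50000),
                        y = c(0, 0, 0))
hospitals_xy <- data.frame(x = c(3000, 9000, 15000, 26000),
                           y = c(4000, 0, 0, 0))

# Hypothetical helper: for each centroid, count points within `radius` meters
count_within <- function(cx, cy, hx, hy, radius = 10000) {
  vapply(seq_along(cx), function(i) {
    d <- sqrt((hx - cx[i])^2 + (hy - cy[i])^2)  # Euclidean distance in meters
    sum(d <= radius)
  }, integer(1))
}

centroids$hosp_count_10km <- count_within(centroids$x, centroids$y,
                                          hospitals_xy$x, hospitals_xy$y)
print(centroids)
```

Because st_buffer() approximates the circle with a polygon, counts could differ slightly for a hospital lying almost exactly 10 km away. In sf, `lengths(st_is_within_distance(census_centroids, hospitals, dist = 10000))` should give comparable counts directly from the centroids (an untested alternative here).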

3. Analyze spatial distribution of hospitals from an equity perspective

3.1 Conduct exploratory data analysis and present graphs and maps

3.1.1 Graphs on County-Wise Information - Median Household Income Bar Chart by County

#Create summary statistics by County
census_dissolve <- census_final %>%
                      group_by(county) %>%
                      summarise(med_mhincome = median(mhincome, na.rm = TRUE), #calculate median of the median household income of all tracts
                                black_pct = (sum(blackpop)/sum(totalpop))*100) %>% #calculate total black population percentage out of entire county population
                      mutate(county = str_replace_all(county, regex("county", ignore_case = TRUE), "") %>%
                              str_trim())
#Plot ggplot
ggplot(census_dissolve, aes(x = county, y = med_mhincome)) +
  geom_col(fill = "#7dd481") +
  labs(
    title = "Median Household Income by County",
    x = "County",
    y = "Median Household Income ($)"
  ) +
  scale_y_continuous(labels = scales::comma, limits = c(0, 150000)) +
  geom_text(aes(label = scales::comma(med_mhincome)), 
            vjust = -0.5, #position text above bar
            size = 4) +
  theme_minimal()

Explanation:

    Among all counties, Forsyth County has the highest median household income, followed by Fayette and Cherokee Counties, while Clayton County has the lowest. The average of the 11 county medians is $92,982.2, and Clayton, DeKalb, Douglas, Gwinnett, Henry, and Rockdale Counties fall below this average.
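The county average quoted above is presumably the unweighted mean of the 11 county medians (`mean(census_dissolve$med_mhincome)`), which is not the same as the median income of all tracts or households in the region. The toy example below, with hypothetical county values rather than the ACS results, shows how such an average and the below-average list are computed.

```r
# Hypothetical county medians (placeholder values, not the ACS results)
county_med <- data.frame(county = c("A", "B", "C", "D"),
                         med_mhincome = c(60000, 80000, 100000, 140000))

# Unweighted mean of the county-level medians
avg_of_medians <- mean(county_med$med_mhincome)

# Counties whose median falls below that average
below_avg <- county_med$county[county_med$med_mhincome < avg_of_medians]

avg_of_medians  # 95000
below_avg       # "A" "B"
```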

3.1.2 Graphs on County-Wise Information - Black population Percentage Bar Chart by County

ggplot(census_dissolve, aes(x = county, y = black_pct)) +
  geom_col(fill = "#828282") +
  labs(
    title = "Black Population Percentage by County",
    x = "County",
    y = "Black Population (%)"
  ) +
  scale_y_continuous(labels = scales::comma, limits = c(0, 70)) +
  geom_text(aes(label = scales::comma(black_pct)), 
            vjust = -0.5, #position text above bar
            size = 4) +
  theme_minimal()

Explanation:

    Among all counties, Clayton County has the highest percentage of black population, while Forsyth County has the lowest. Compared with the previous median household income chart (Forsyth highest, Clayton lowest), this suggests that racial and income disparities overlap at the county level.
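The county percentages in this chart come from `sum(blackpop)/sum(totalpop)`, a population-weighted share, rather than from averaging tract percentages. The toy example below (hypothetical counts for three tracts in one county) shows why the choice matters: small tracts with extreme shares move an unweighted average of tract percentages much more than they move the pooled share.

```r
# Hypothetical tracts in one county (toy counts, not ACS data)
blackpop <- c(900, 100, 50)
totalpop <- c(1000, 5000, 1000)

# Population-weighted (pooled) share, as in the county summary
pooled_pct <- sum(blackpop) / sum(totalpop) * 100

# Unweighted mean of tract-level shares treats every tract equally
mean_of_tract_pct <- mean(blackpop / totalpop * 100)

round(pooled_pct, 1)         # 15
round(mean_of_tract_pct, 1)  # 32.3
```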

3.1.3 Maps on Census Tract Level Analysis - Thematic Map of Black Population Percentage by Census Tract

tmap_mode("view")
tm_shape(census_final) + 
  tm_polygons("black_pct",
              palette = "purples", 
              style = "quantile", #quantile classification method
              n = 5, # 5 bins
              border.col = "grey",
              lwd = 0.5,
              title = "Black Population %") +
  tm_layout(title = "Black Population Percentage by Census Tract")

Explanation:

    Judging from the map alone, census tracts in the southern part of Metro Atlanta appear to have a higher percentage of black population.
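All three maps use `style = "quantile"` with `n = 5`, so each color class holds roughly the same number of tracts and the colors show a tract's rank rather than its absolute value. The sketch below reproduces quantile breaks with base R's `quantile()` and `cut()` on hypothetical values; it is an assumption, not confirmed here, that tmap's breaks match R's default quantile type exactly.

```r
# Hypothetical tract values (toy data, not ACS or hospital counts)
x <- c(2, 5, 8, 12, 20, 33, 41, 57, 76, 90)

# Five quantile classes: breaks at the 0th, 20th, ..., 100th percentiles
breaks <- quantile(x, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Assign each value to a class; include.lowest keeps the minimum in class 1
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
table(bins)
```

With ten toy values, each of the five classes receives two values. In the same way, the darkest class on the hospital-count map marks the top fifth of tracts, not a fixed number of hospitals.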

3.1.4 Maps on Census Tract Level Analysis - Thematic Map of Median Household Income by Census Tract

tm_shape(census_final) + 
  tm_polygons("mhincome",
              palette = "BuGn", #green palette
              style = "quantile", #quantile classification method
              n = 5, # 5 bins
              border.col = "grey",
              lwd = 0.5,
              title = "Median Household Income") +
  tm_layout(title = "Median Household Income by Census Tract")

Explanation:

    Judging from the map alone, median household income shows no strong spatial pattern across Metro Atlanta, although more census tracts in the center and north appear to have higher median household income.

3.1.5 Maps on Census Tract Level Analysis - Thematic Map of Number of Hospitals within 10km from Census Tract Centroid

tm_shape(census_final) + 
  tm_polygons("hosp_count_10km",
              palette = "teal", 
              style = "quantile", #quantile classification method
              n = 5, # 5 bins
              border.col = "white",
              lwd = 0.5,
              title = "Hospital Count") +
  tm_layout(title = "Total Hospital Count within 10km Census Tract Buffer")

Explanation:

    Judging from the map alone, the number of hospitals within 10 km of a census tract centroid appears to increase toward the city center, and tracts in the north also appear to have more hospitals nearby.

3.2 Regression model to test associations between healthcare access and income and race

m1 <- glm(hosp_count_10km ~ white_pct + black_pct + mhincome + belowmhincome, 
          data=census_final, family = "gaussian")
summary(m1)
## 
## Call:
## glm(formula = hosp_count_10km ~ white_pct + black_pct + mhincome + 
##     belowmhincome, family = "gaussian", data = census_final)
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.331e+01  1.029e+00  12.926  < 2e-16 ***
## white_pct     -6.010e-02  1.186e-02  -5.067 4.67e-07 ***
## black_pct     -9.711e-02  9.587e-03 -10.129  < 2e-16 ***
## mhincome       2.613e-07  5.610e-06   0.047    0.963    
## belowmhincome  7.005e-01  4.683e-01   1.496    0.135    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 27.28681)
## 
##     Null deviance: 36879  on 1231  degrees of freedom
## Residual deviance: 33481  on 1227  degrees of freedom
##   (16 observations deleted due to missingness)
## AIC: 7576.7
## 
## Number of Fisher Scoring iterations: 2
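A Gaussian `glm()` with the default identity link is ordinary least squares, so `m1` has the same coefficients `lm()` would give, and the reported dispersion parameter is the residual variance (33481 / 1227 ≈ 27.29, a residual standard error of about 5.2 hospitals). The sketch below checks this equivalence on simulated toy data, not on the census data.

```r
# Simulated toy data (hypothetical), only to compare the two fitting functions
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50))
d$y <- 3 - 2 * d$x1 + 0.5 * d$x2 + rnorm(50, sd = 0.1)

fit_glm <- glm(y ~ x1 + x2, data = d, family = "gaussian")
fit_lm  <- lm(y ~ x1 + x2, data = d)

# Same coefficient estimates from the identity-link Gaussian GLM and OLS
all.equal(coef(fit_glm), coef(fit_lm))

# glm's dispersion equals lm's residual variance (sigma squared)
all.equal(summary(fit_glm)$dispersion, summary(fit_lm)$sigma^2)
```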

Explanation:

    Among all independent variables, only the percentages of white and black population are statistically significant (p < 0.001). Both coefficients are negative, but the black_pct coefficient is more negative (-0.097 vs. -0.060): holding the other variables constant, a 1 percentage point increase in black population share is associated with about 0.037 fewer hospitals within 10 km than a 1 percentage point increase in white population share.
    Median household income has a positive coefficient, but it is essentially zero (about 0.003 hospitals per $10,000 of income) with a p-value of 0.963, indicating that median household income has no detectable association with hospital count.
    Tracts below the median household income are predicted to have 0.7 more hospitals than tracts above the median, but this coefficient is not statistically significant (p = 0.135).
    The conclusion from this regression model is that racial composition has a stronger and statistically significant association with hospital count, while income does not.
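The comparisons above are simple arithmetic on the printed estimates. The snippet below copies the coefficients from the `summary(m1)` output and computes the extra change for a black versus white percentage-point increase, the same gap over 10 percentage points, and the income coefficient rescaled to $10,000. Because both percentage variables are in the model, each coefficient is read holding the other variables constant.

```r
# Coefficient estimates copied from the summary(m1) output
b_white  <- -6.010e-02  # per 1 percentage point of white_pct
b_black  <- -9.711e-02  # per 1 percentage point of black_pct
b_income <-  2.613e-07  # per $1 of mhincome

# Extra change in predicted hospital count: +1 pp black share vs +1 pp white share
diff_per_pp <- b_black - b_white

# The same gap for a 10 pp shift, and the income effect per $10,000
diff_10pp <- 10 * diff_per_pp
income_per_10k <- 10000 * b_income

round(c(diff_per_pp = diff_per_pp, diff_10pp = diff_10pp,
        income_per_10k = income_per_10k), 4)
```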

4. Verdict: Is the spatial distribution of hospitals in Metro Atlanta equitable?

    My final verdict is that the spatial distribution of hospitals in Metro Atlanta is equitable with respect to median household income, but not equitable with respect to race, particularly for black residents. The county bar charts point towards lower median household income where the black population percentage is larger, and the regression model shows that each 1 percentage point increase in black population share is associated with a larger decrease in hospital count than the same increase in white population share. I suspect that median household income shows little association with hospital count because income and racial composition are intertwined (each may influence the other), which makes it difficult for the model to separate their effects.