Mini Assignment 3

Import Libraries

library(tidycensus)
library(sf)
library(tmap)
library(tidyverse)
library(here)
library(knitr)
library(kableExtra)
library(glue)
library(tigris)
library(skimr)
library(broom)

1. Import the hospital POI data and make sure it is tidy and ready for analysis.

#Import Data
hospital <- st_read("https://raw.githubusercontent.com/ujhwang/urban-analytics-2025/main/Assignment/mini_3/hospital_11counties.geojson", quiet = TRUE)

#Check Data
skim(hospital)

Data summary
Name	hospital
Number of rows	119
Number of columns	9
_______________________
Column type frequency:
character	7
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
id	1	27	27	119
name	1	11	72	104
address	1	25	66	119
primary_type	1	8	8	1
types	1	47	96	12
status	1	11	11	1
geometry	1	24	26	119

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
rating	0	1	4.04	0.86	2	3.35	4.3	4.8	5	▂▂▂▃▇
rating_count	0	1	633.93	783.33	50	91.00	285.0	838.0	3882	▇▂▁▁▁

Some rows share the same “name” (e.g., Resurgens Orthopaedics), but this appears to be coincidental. Given that there are 119 total rows and 119 unique addresses and geometries, the dataset seems well-structured with no duplicated records.

#Download Tract Boundaries
atlanta_MPO <- counties("GA", progress_bar = F) %>% 
  filter(NAME %in% c("Cherokee", "Clayton", "Cobb", "DeKalb", "Douglas", "Fayette", "Forsyth", "Fulton", "Gwinnett", "Henry", "Rockdale"))

#Check Distribution of hospitals
tmap_mode("view")
tm_shape(hospital) + tm_dots(fill="red") + 
  tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)

From the plot above, there is one hospital located in Newton County which lies outside our research area but adjacent to Rockdale County. I decided to remove this hospital (Resurgens Orthopaedics). To be thorough, I attempts to exclude all hospitals located outside our defined research area.

#Spatial Join
hospital %<>% st_transform(32616) %>% st_make_valid()
atlanta_MPO %<>% st_transform(32616) %>% st_make_valid()
hosp_in_mpo <- st_join(hospital, atlanta_MPO, join = st_within, left = FALSE)

#Check Distribution of hospitals Again
tmap_mode("view")
tm_shape(hosp_in_mpo) + tm_dots(fill="red") + 
  tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)

Now it looks good. I will continue to analyze.

2. Select variables in the Census ACS 5-year estimates that are relevant for an equity analysis

1. Total Population

Hospitals often locate in areas with larger populations to maximize potential patient volume and profit. Including total population helps assess whether hospital distribution aligns with population size or reflects inequitable access. #### 2. Population Density Population density may be an even stronger determinant than total population, as hospitals tend to cluster in densely populated urban areas. This variable captures spatial concentration of demand for healthcare services. #### 3. Population Aged 65+ and Under 5 Older adults and young children typically require more frequent medical care due to age-related health vulnerabilities. These age groups are important indicators of healthcare needs within a community. #### 4. Percentage of People with Disabilities Individuals with disabilities often face greater healthcare needs and accessibility challenges. Including this variable allows evaluation of whether hospital locations adequately serve populations with higher medical dependency. #### 5. Percentage Below Poverty Line Low-income populations may have limited access to healthcare, and hospitals might avoid locating in such areas due to lower profitability. This variable helps assess economic inequities in hospital accessibility. #### 6. Median Household Income Household income reflects overall economic capacity in an area. Higher-income regions may attract more healthcare facilities seeking financially stable patient bases, potentially creating inequities in service distribution. #### 7. Percentage of Households with No Vehicle Available Lack of vehicle ownership limits mobility and affects the ability to reach healthcare facilities. Including this variable helps identify areas where proximity to hospitals is especially critical for equitable access. #### 8. Percentage of Uninsured Population Regions with higher uninsured rates may be underserved by hospitals due to lower reimbursement potential. This variable captures a structural factor contributing to healthcare access disparities. #### 9. Percentage of Non-Hispanic White Population Racial composition can reveal potential inequities in healthcare infrastructure. Examining this variable helps determine whether hospital distribution disproportionately favors or disadvantages certain racial or ethnic groups.

3. Download ACS data for the selected variables at the Census Tract level for the 11 counties using the tidycensus

## Collect data for the control variables
tidycensus::census_api_key(Sys.getenv("CENSUS_API_KEY"))

tract_control_vars <- suppressMessages(
  get_acs(geography = "tract",
          state = "GA",
          county = c("Cherokee", "Clayton", "Cobb", "DeKalb", "Douglas", "Fayette", "Forsyth", "Fulton", "Gwinnett", "Henry", "Rockdale"),
          variables = c(population = "B01003_001E", #Population
                        age_male_under5 = "B01001_003E", #Population Under 5
                        age_male_65to66 = "B01001_020E", #Population Aged 65+
                        age_male_67to69 = "B01001_021E", #Population Aged 65+
                        age_male_70to74 = "B01001_022E", #Population Aged 65+
                        age_male_75to79 = "B01001_023E", #Population Aged 65+
                        age_male_80to84 = "B01001_024E", #Population Aged 65+
                        age_male_85to = "B01001_025E", #Population Aged 65+
                        age_female_under5 = "B01001_027E", #Population Under 5
                        age_female_65to66 = "B01001_044E", #Population Aged 65+
                        age_female_67to69 = "B01001_045E", #Population Aged 65+
                        age_female_70to74 = "B01001_046E", #Population Aged 65+
                        age_female_75to79 = "B01001_047E", #Population Aged 65+
                        age_female_80to84 = "B01001_048E", #Population Aged 65+
                        age_female_85to = "B01001_049E", #Population Aged 65+
                        nodisab_under18 = "C18108_005E", #Number of Disabilities
                        nodisab_18to64 = "C18108_009E", #Number of Disabilities
                        nodisab_over64 = "C18108_013E", #Number of Disabilities
                        poverty = "B17001_002E", #Poverty Status in the Past 12 Months by Sex by Age
                        hhincome = "B19013_001E", #Median Household Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars)
                        household = "B08201_001E", #The number of Households
                        novehicle = "B08201_002E", #Household Size by Vehicles Available (No Vehicle)
                        uninsured_under19 = "B27010_017E", #Types of Health Insurance Coverage by Age
                        uninsured_19to34 = "B27010_033E", #Types of Health Insurance Coverage by Age
                        uninsured_35to64 = "B27010_050E", #Types of Health Insurance Coverage by Age
                        uninsured_over65 = "B27010_066E", #Types of Health Insurance Coverage by Age
                        race_ethnic_total = "B03002_001E", #Ethnicity
                        non_hispanic_white = "B03002_003E" #Ethnicity
                        ),
          year = 2023,
          survey = "acs5", 
          geometry = TRUE,
          output = "wide"))

4. Prepare the data for analysis

# Check for missing values and fill them
skim(tract_control_vars)

Data summary
Name	tract_control_vars
Number of rows	1248
Number of columns	59
_______________________
Column type frequency:
character	3
numeric	56
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
GEOID	1	11	11	1248
NAME	1	38	45	1248
geometry	1	169	3960	1248

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
population	0	1.00	4011.74	1514.92	0	2928.00	3876.5	4925.00	10640	▁▇▅▁▁
B01003_001M	0	1.00	722.21	385.99	14	476.75	651.5	872.00	5042	▇▁▁▁▁
age_male_under5	0	1.00	120.52	106.13	0	44.75	95.0	167.00	1088	▇▂▁▁▁
B01001_003M	0	1.00	98.48	76.98	5	46.00	82.0	129.00	839	▇▁▁▁▁
age_male_65to66	0	1.00	35.75	40.64	0	8.00	26.0	50.00	432	▇▁▁▁▁
B01001_020M	0	1.00	40.48	37.08	3	16.00	31.0	50.00	529	▇▁▁▁▁
age_male_67to69	0	1.00	45.53	42.99	0	14.00	37.0	64.25	318	▇▂▁▁▁
B01001_021M	0	1.00	45.03	35.28	2	21.00	37.0	57.00	345	▇▁▁▁▁
age_male_70to74	0	1.00	62.59	56.20	0	22.00	50.0	89.00	553	▇▂▁▁▁
B01001_022M	0	1.00	53.10	43.07	2	25.00	44.0	66.25	543	▇▁▁▁▁
age_male_75to79	0	1.00	38.03	42.25	0	9.00	28.0	55.00	487	▇▁▁▁▁
B01001_023M	0	1.00	38.03	30.21	2	16.00	31.0	48.00	297	▇▁▁▁▁
age_male_80to84	0	1.00	20.87	29.96	0	0.00	11.5	29.25	305	▇▁▁▁▁
B01001_024M	0	1.00	28.34	25.21	2	14.00	20.0	34.00	252	▇▁▁▁▁
age_male_85to	0	1.00	17.39	30.33	0	0.00	7.0	23.00	375	▇▁▁▁▁
B01001_025M	0	1.00	27.28	27.71	2	14.00	19.0	29.00	344	▇▁▁▁▁
age_female_under5	0	1.00	116.04	104.48	0	44.00	90.0	163.00	1148	▇▁▁▁▁
B01001_027M	0	1.00	96.42	74.55	7	45.75	80.0	127.00	560	▇▂▁▁▁
age_female_65to66	0	1.00	40.88	41.42	0	11.00	30.0	58.00	280	▇▂▁▁▁
B01001_044M	0	1.00	42.74	36.04	2	19.00	33.0	54.00	332	▇▁▁▁▁
age_female_67to69	0	1.00	55.91	52.86	0	19.00	44.0	77.00	427	▇▂▁▁▁
B01001_045M	0	1.00	51.20	45.07	2	24.00	40.0	62.00	506	▇▁▁▁▁
age_female_70to74	0	1.00	78.57	69.25	0	33.00	63.0	109.00	690	▇▁▁▁▁
B01001_046M	0	1.00	60.17	52.41	2	32.00	50.0	73.00	664	▇▁▁▁▁
age_female_75to79	0	1.00	50.27	48.72	0	15.00	37.5	72.00	377	▇▂▁▁▁
B01001_047M	0	1.00	45.27	34.61	2	21.00	38.0	58.00	304	▇▂▁▁▁
age_female_80to84	0	1.00	30.22	38.00	0	0.00	18.0	43.00	457	▇▁▁▁▁
B01001_048M	0	1.00	34.85	31.32	3	14.00	24.0	43.00	316	▇▁▁▁▁
age_female_85to	0	1.00	30.30	42.65	0	0.00	16.0	41.25	407	▇▁▁▁▁
B01001_049M	0	1.00	35.72	35.37	2	14.00	22.0	45.00	398	▇▁▁▁▁
nodisab_under18	0	1.00	915.76	506.97	0	552.25	853.0	1215.25	3234	▅▇▃▁▁
C18108_005M	0	1.00	322.43	218.05	11	181.75	283.0	409.00	2587	▇▁▁▁▁
nodisab_18to64	0	1.00	2315.00	918.85	0	1639.50	2219.0	2831.00	6925	▂▇▃▁▁
C18108_009M	0	1.00	471.89	223.25	3	322.75	428.0	575.00	2207	▇▆▁▁▁
nodisab_over64	0	1.00	349.12	217.19	0	198.00	319.0	470.00	2041	▇▃▁▁▁
C18108_013M	0	1.00	140.60	92.45	2	87.00	120.0	169.00	1190	▇▁▁▁▁
poverty	0	1.00	428.87	394.12	0	151.00	306.5	595.25	2542	▇▂▁▁▁
B17001_002M	0	1.00	285.64	241.17	4	120.00	216.0	382.00	2166	▇▂▁▁▁
hhincome	16	0.99	98494.89	44960.63	14971	65883.25	89502.5	121433.75	250001	▃▇▃▂▁
B19013_001M	28	0.98	24411.56	15086.09	1396	13469.00	20793.0	31790.00	108195	▇▅▁▁▁
household	0	1.00	1493.31	548.61	0	1110.75	1454.0	1805.00	4472	▂▇▃▁▁
B08201_001M	0	1.00	237.04	112.90	14	156.00	220.0	293.00	856	▆▇▂▁▁
novehicle	0	1.00	86.58	112.45	0	13.00	48.0	117.25	853	▇▁▁▁▁
B08201_002M	0	1.00	70.43	67.82	4	20.00	51.0	94.00	545	▇▂▁▁▁
uninsured_under19	0	1.00	74.17	109.23	0	0.00	36.0	98.25	1265	▇▁▁▁▁
B27010_017M	0	1.00	82.83	105.70	2	14.00	50.0	104.25	1483	▇▁▁▁▁
uninsured_19to34	0	1.00	179.51	180.53	0	48.75	125.0	259.00	1416	▇▂▁▁▁
B27010_033M	0	1.00	132.88	111.98	4	53.75	102.5	182.00	928	▇▂▁▁▁
uninsured_35to64	0	1.00	225.53	195.64	0	75.00	173.0	323.50	1039	▇▃▁▁▁
B27010_050M	0	1.00	148.12	112.68	3	70.75	123.5	200.00	948	▇▂▁▁▁
uninsured_over65	0	1.00	7.01	21.72	0	0.00	0.0	0.00	242	▇▁▁▁▁
B27010_066M	0	1.00	21.75	23.96	2	14.00	14.0	20.00	287	▇▁▁▁▁
race_ethnic_total	0	1.00	4011.74	1514.92	0	2928.00	3876.5	4925.00	10640	▁▇▅▁▁
B03002_001M	0	1.00	722.21	385.99	14	476.75	651.5	872.00	5042	▇▁▁▁▁
non_hispanic_white	0	1.00	1531.63	1272.43	0	467.75	1304.5	2303.00	7192	▇▅▂▁▁
B03002_003M	0	1.00	382.82	288.58	3	194.75	338.0	502.25	4070	▇▁▁▁▁

Some tracts have 0 residents. But this might be true. When I checked Census Tract 9800, Fulton, GA, this tract is a runway of Hartsfield-Jackson Airport. Also, some of tracts do not have household income data accordingly.

It is still possible that other variables to have 0 value (population by age, vehicle availiablity, ethnicity), although the possibility is low. But I decided to include those variables for now.

Finally, I decided remove rows with 0 population, 0 households, and NA household income.

library(units)

tract_control_vars_noNA <- tract_control_vars %>%
  drop_na(population, household, hhincome) %>%
  filter(population != 0,
         household  != 0,
         hhincome   != 0) %>%
  mutate(area_mi2 = st_area(.) %>% set_units(mi^2))

# Create new variables
tract_control_vars_final <- tract_control_vars_noNA %>%
  transmute(
    GEOID,
    NAME,
    geometry,
    population_1000 = population/1000, # 1,000 people unit
    pop_den_1000 = (population/area_mi2)/1000, # 1,000 people unit
    under5_1000 = (age_male_under5 + age_female_under5)/1000, # 1,000 people unit
    over65_1000 = ((age_male_65to66 + age_male_67to69 + age_male_70to74 + age_male_75to79 + age_male_80to84 + age_male_85to) + (age_female_65to66 + age_female_67to69 + age_female_70to74 + age_female_75to79 + age_female_80to84 + age_female_85to))/1000, # 1,000 people unit
    disable_pct = ((population - nodisab_under18 - nodisab_18to64 - nodisab_over64)/population)*100,
    poverty_pct = (poverty/population)*100,
    hhincome_10000 = hhincome/10000, # $10,000 unit
    novehicle_pct = (novehicle/household)*100,
    uninsured_pct = ((uninsured_under19 + uninsured_19to34 + uninsured_35to64 + uninsured_over65)/population)*100,
    white_pct = (non_hispanic_white/race_ethnic_total)*100
    )

# Join the POI data with the ACS data
hosp_in_mpo %<>% st_transform(32616) %>% st_make_valid()
tract_control_vars_final %<>% st_transform(32616) %>% st_make_valid()
tract_hosp <- st_join(hosp_in_mpo, tract_control_vars_final)

# Ensure coordinate reference systems (CRS) are appropriate for distance calculations
crs_info <- st_crs(tract_control_vars_final)
cat("CRS: \n", "EPSG:", crs_info$epsg, "\n", "NAME:", crs_info$Name, "\n")

## CRS: 
##  EPSG: 32616 
##  NAME: WGS 84 / UTM zone 16N

The coordinate reference system (EPSG:32616, WGS 84 / UTM Zone 16N) is appropriate for this study, as it accurately covers the Atlanta region.

5. Define and compute operational measure of healthcare access

1. The distance to the nearest hospital.

The distance to the nearest hospital is a critical factor in ensuring timely medical care. Patients with severe injuries or acute conditions may experience worse outcomes if they cannot reach appropriate treatment quickly.

# Compute Centroid of each tract
tract_centroids <- st_centroid(tract_control_vars_final)

# The distance to the Nearest Hospital
nearest_hospital <- st_nearest_feature(tract_centroids, hosp_in_mpo)
dist_to_near_hosp <- st_distance(
  tract_centroids, 
  hosp_in_mpo[nearest_hospital,], 
  by_element = TRUE)

# Convert meter to km (Cause 1meter unit is too small to interpret)
dist_to_near_hosp_km <- set_units(
  dist_to_near_hosp, "km") %>% 
  drop_units()

# Combine
tract_control_vars_final <- tract_control_vars_final %>%
  mutate(dist_nearest_hosp_km = dist_to_near_hosp_km)

2. The number of hospitals within 10 miles

The number of hospitals within a 10-mile radius is another key indicator of healthcare accessibility. In cases of multiple or large-scale emergencies, having several nearby hospitals increases the overall treatment capacity. Additionally, because hospitals often differ in specialties and services, a higher number of nearby hospitals can also represent greater diversity and availability of medical care.

#The number of hospitals within 10 miles (15min by drive) from centroid of tract
buffer <- st_buffer(tract_centroids, dist=16093.4) #10mile = 16093.4m
hosp_count <- st_join(buffer,
                      hosp_in_mpo) %>%
  st_drop_geometry() %>%
  rename(GEOID = GEOID.x) %>%
  group_by(GEOID) %>%
  summarise(hosp_within_10mi = n())

# Combine
tract_control_vars_final <- tract_control_vars_final %>%
  left_join(hosp_count, by = "GEOID") %>%
  mutate(hosp_within_10mi = replace_na(hosp_within_10mi, 0L))

6. Analyze the spatial distribution of hospitals from an equity perspective

The number of hospitals within 10 miles

tm_shape(tract_control_vars_final) + tm_fill(fill = "hosp_within_10mi") + 
  tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)

## Registered S3 method overwritten by 'jsonify':
##   method     from    
##   print.json jsonlite

Overall, the downtown area has better access to multiple hospitals compared to the outer parts of the Atlanta MPO region.

tract_control_vars_final %>% 
  ggplot(aes(x = hhincome_10000, y = hosp_within_10mi)) +
  geom_point(alpha = 0.5, size = 1.2) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Annual Household Income", 
       y = "The number of hospitals within 10miles from tract center", 
       title = "Household Income vs. The number of hospitals") +
  ggdark::dark_theme_gray() +
  ggpubr::stat_cor(method = "pearson", label.x = 15, label.y = 35)

## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().

## `geom_smooth()` using formula = 'y ~ x'

There appears to be a slight positive correlation between household income and the number of hospitals within 10 miles of each tract center. However, given that the relationship is relatively weak, this trend may change once additional variables are incorporated into the analysis.

Distance from tract centroid to the nearest hospital

tm_shape(tract_control_vars_final) + tm_fill(fill = "dist_nearest_hosp_km") + 
  tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)

Overall, the downtown area is located closer to the nearest hospitals compared to the outskirts of the Atlanta MPO region. Residents in outlying areas generally need to drive longer distances to reach the nearest hospital. However, since this analysis includes only hospitals within the Atlanta MPO boundary, it is possible that some residents in the outskirts have access to hospitals located just outside the study area.

tract_control_vars_final %>% 
  ggplot(aes(x = hhincome_10000, y = dist_nearest_hosp_km)) +
  geom_point(alpha = 0.5, size = 1.2) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Annual Household Income", 
       y = "Distance to nearest hospital", 
       title = "Household Income vs. Distance") +
  ggdark::dark_theme_gray() +
  ggpubr::stat_cor(method = "pearson", label.x = 16, label.y = 18)

## `geom_smooth()` using formula = 'y ~ x'

There appears to be a slight positive correlation between household income and the distance to the nearest hospital. However, given that the relationship is relatively weak, this pattern may change once additional variables are included in the analysis.

7. Regression

1. The number of hospitals within 10miles from tract center

# Run the baseline model (glm, poisson)
m1 <- glm(hosp_within_10mi ~ population_1000 + pop_den_1000 + under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + novehicle_pct + uninsured_pct + white_pct,
          data=tract_control_vars_final,
          family = poisson(link = "log"))
summary(m1)

## 
## Call:
## glm(formula = hosp_within_10mi ~ population_1000 + pop_den_1000 + 
##     under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + 
##     novehicle_pct + uninsured_pct + white_pct, family = poisson(link = "log"), 
##     data = tract_control_vars_final)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      2.6571980  0.0459307  57.852  < 2e-16 ***
## population_1000 -0.1031654  0.0073274 -14.079  < 2e-16 ***
## pop_den_1000     0.0364205  0.0013247  27.493  < 2e-16 ***
## under5_1000      0.2119434  0.0533043   3.976 7.01e-05 ***
## over65_1000      0.0250713  0.0338770   0.740  0.45926    
## disable_pct     -0.0146427  0.0015056  -9.726  < 2e-16 ***
## poverty_pct      0.0070842  0.0009919   7.142 9.21e-13 ***
## hhincome_10000   0.0307579  0.0023795  12.926  < 2e-16 ***
## novehicle_pct    0.0140683  0.0010454  13.458  < 2e-16 ***
## uninsured_pct    0.0037633  0.0009123   4.125 3.70e-05 ***
## white_pct        0.0011692  0.0004025   2.904  0.00368 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7435.9  on 1231  degrees of freedom
## Residual deviance: 5374.3  on 1221  degrees of freedom
## AIC: 10922
## 
## Number of Fisher Scoring iterations: 5

dispersion <- sum(residuals(m1, type = "pearson")^2) / df.residual(m1)
dispersion

## [1] 4.057754

The dependent variable exhibits overdispersion, indicating that the initial model results may not be reliable. Therefore, I decided to use a negative binomial regression, which is a more suitable method for modeling count data that commonly display overdispersion.

library(MASS)
m_nb <- glm.nb(
  hosp_within_10mi ~ population_1000 + pop_den_1000 + under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + novehicle_pct + uninsured_pct + white_pct,
  data = tract_control_vars_final, link = log
)

summary(m_nb)

## 
## Call:
## glm.nb(formula = hosp_within_10mi ~ population_1000 + pop_den_1000 + 
##     under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + 
##     novehicle_pct + uninsured_pct + white_pct, data = tract_control_vars_final, 
##     link = log, init.theta = 4.585978058)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      2.4282721  0.1020630  23.792  < 2e-16 ***
## population_1000 -0.1054040  0.0157299  -6.701 2.07e-11 ***
## pop_den_1000     0.0735248  0.0042919  17.131  < 2e-16 ***
## under5_1000      0.1583818  0.1161037   1.364 0.172523    
## over65_1000      0.0983450  0.0717502   1.371 0.170482    
## disable_pct     -0.0119353  0.0031265  -3.817 0.000135 ***
## poverty_pct      0.0070672  0.0022707   3.112 0.001856 ** 
## hhincome_10000   0.0383790  0.0054856   6.996 2.63e-12 ***
## novehicle_pct    0.0135683  0.0025262   5.371 7.83e-08 ***
## uninsured_pct    0.0045206  0.0021073   2.145 0.031936 *  
## white_pct        0.0003326  0.0008581   0.388 0.698317    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.586) family taken to be 1)
## 
##     Null deviance: 1828.7  on 1231  degrees of freedom
## Residual deviance: 1329.3  on 1221  degrees of freedom
## AIC: 8704.4
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.586 
##           Std. Err.:  0.246 
## 
##  2 x log-likelihood:  -8680.397

For easier interpretation, I computed the incidence rate ratios (IRRs).

irr_table <- tidy(m_nb, exponentiate = TRUE, conf.int = TRUE) %>%
  dplyr::select(term, estimate, p.value) %>%
  dplyr::rename(
    IRR = estimate,
    p_value = p.value
  ) %>%
  dplyr::mutate(
    IRR = round(IRR, 3),
    p_value = round(p_value, 3)
  )

irr_table

## # A tibble: 11 × 3
##    term               IRR p_value
##    <chr>            <dbl>   <dbl>
##  1 (Intercept)     11.3     0    
##  2 population_1000  0.9     0    
##  3 pop_den_1000     1.08    0    
##  4 under5_1000      1.17    0.173
##  5 over65_1000      1.10    0.17 
##  6 disable_pct      0.988   0    
##  7 poverty_pct      1.01    0.002
##  8 hhincome_10000   1.04    0    
##  9 novehicle_pct    1.01    0    
## 10 uninsured_pct    1.00    0.032
## 11 white_pct        1       0.698

2. The distance from tract center to the nearest hospital

# Run the baseline model (ols)
m2 <- lm(dist_nearest_hosp_km ~ population_1000 + pop_den_1000 + under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + novehicle_pct + uninsured_pct + white_pct,
          data=tract_control_vars_final)

# Check multicollinearity
library(car)
vif(m2)

## population_1000    pop_den_1000     under5_1000     over65_1000     disable_pct 
##        2.392571        1.216188        1.736752        1.938854        1.486970 
##     poverty_pct  hhincome_10000   novehicle_pct   uninsured_pct       white_pct 
##        2.093447        2.731236        1.755614        1.665707        2.215819

There is no evidence of significant multicollinearity. Therefore, I proceeded to interpret the results.

summary(m2)

## 
## Call:
## lm(formula = dist_nearest_hosp_km ~ population_1000 + pop_den_1000 + 
##     under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + 
##     novehicle_pct + uninsured_pct + white_pct, data = tract_control_vars_final)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8547 -1.7864 -0.4893  1.1895 14.3458 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4.639426   0.511600   9.068  < 2e-16 ***
## population_1000  0.282976   0.078077   3.624 0.000302 ***
## pop_den_1000    -0.249536   0.023145 -10.781  < 2e-16 ***
## under5_1000      0.232153   0.578846   0.401 0.688445    
## over65_1000     -0.270603   0.355258  -0.762 0.446381    
## disable_pct      0.022933   0.015341   1.495 0.135189    
## poverty_pct     -0.012732   0.011484  -1.109 0.267788    
## hhincome_10000  -0.062405   0.027801  -2.245 0.024964 *  
## novehicle_pct   -0.038982   0.012952  -3.010 0.002669 ** 
## uninsured_pct   -0.016189   0.010699  -1.513 0.130516    
## white_pct        0.004915   0.004248   1.157 0.247521    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.654 on 1221 degrees of freedom
## Multiple R-squared:  0.1647, Adjusted R-squared:  0.1579 
## F-statistic: 24.08 on 10 and 1221 DF,  p-value: < 2.2e-16

8. Result

1. Negative Binomial Regression (The number of hospitals within 10miles of each census tract)

The negative binomial regression results for the number of hospitals within 10 miles of each census tract centroid in the 11-county Atlanta metropolitan area revealed the following associations:

Positively associated

Total population size (population) was negatively associated with hospital counts, with an additional 1,000 residents linked to an estimated 10% decrease (IRR≈0.90, p<0.001). This likely reflects the effect of tract area: larger, lower-density tracts on the metropolitan periphery tend to have fewer hospitals within a 10-mile radius despite having larger populations.
Population density (pop_den) was strongly and positively associated with hospital availability: an increase of 1,000 persons per square mile corresponded to an estimated 7.6% increase in the number of nearby hospitals (IRR≈1.076, p<0.001).
Household income (hhincome) also had a positive effect: a $10,000 increase in median household income was associated with an estimated 3.9% increase in hospitals (IRR≈1.039, p<0.001).
Households without a vehicle (novehicle_pct) showed an association: a 1 percentage-point increase in non-vehicle households was linked to an estimated 1.3% increase in hospital counts (IRR≈1.013, p<0.001).
Poverty rate (poverty_pct) and uninsured rate (uninsured_pct) were also positively associated: a 1 percentage-point increase in each was linked to an estimated 0.7% (IRR≈1.007, p=0.03) and 0.4% increase (IRR≈1.004, p=0.032) in nearby hospitals, respectively.

Negatively associated

Disability prevalence (disable_pct) was negatively associated with hospital availability: a 1 percentage-point increase in the disabled population share was associated with an 1.2% decrease in hospital counts (IRR≈0.988, p<0.001).

Not significant

Neither the population under age 5 (under5) nor the population aged 65 and over (over65) showed statistically significant associations with hospital counts (p>0.17).
Percentage of white population does not show statistically significant associations with hospital counts (p>0.69)

2. Ordinary Least Squares Regression (Distance to the nearest hospital)

The OLS regression results for the distance (in kilometers) from each census tract centroid to the nearest hospital in the 11-county Atlanta metropolitan area revealed the following associations:

Positively associated (greater distance to hospitals)

Total population (population_1000) was positively associated with the distance to the nearest hospital. Each additional 1,000 residents in a tract was linked to an estimated 0.28 km increase in distance (p < 0.001).

Negatively associated (shorter distance to hospitals)

Population density (pop_den_1000) was strongly and negatively associated with hospital distance: each increase of 1,000 persons per square mile was associated with an estimated 0.25 km decrease in distance (p < 0.001).
Household income (hhincome_10000) was also negatively associated: a $10,000 increase in median household income corresponded to an estimated 0.06 km shorter distance to the nearest hospital (p = 0.025).
Households without a vehicle (novehicle_pct) showed a significant negative association: each 1 percentage-point increase in non-vehicle households was linked to an estimated 0.039 km reduction in distance (p = 0.003).

Not significant

The proportions of young children (under5_1000) and elderly residents (over65_1000) were not statistically significant (p > 0.44).
Disability rate (disable_pct), poverty rate (poverty_pct), uninsured rate (uninsured_pct), and racial composition (white_pct) also showed no significant effects (p > 0.10).

Conclusion

The analysis reveals spatial inequities in hospital accessibility across the 11-county Atlanta metropolitan region.

In the Negative Binomial Regression, hospital availability—measured as the number of hospitals within 10 miles—was significantly greater in census tracts with larger populations, higher population density, and higher median household income. In contrast, tracts with higher disability rates had fewer hospitals within a 10-mile radius, indicating that populations with greater healthcare needs may be underserved in terms of facility access. The percentage of households without a vehicle showed a positive but modest association, while poverty and uninsured rates were also positively associated, though their effects were relatively small.

The OLS regression on distance to the nearest hospital reveals evidence of inequity in spatial accessibility. Census tracts with higher household income and greater population density are situated significantly closer to hospitals. The percentage of households without a vehicle exhibited a negative but modest association with distance, which may suggest that hospitals tend to be located in areas where residents are less dependent on private vehicles; however, this relationship remains uncertain. On the other hand, demographic indicators of healthcare need, such as the shares of young children, elderly residents, and individuals with disabilities, were not significantly related to hospital proximity. This suggests a mismatch between where hospitals are located and where medical demand is likely to be greatest.

Taken together, these findings suggest that the current spatial pattern of hospital locations in the Atlanta MPO region reflects market-oriented and urban-centric placement, rather than equitable access based on community health needs.