Import Libraries
library(tidycensus)
library(sf)
library(tmap)
library(tidyverse)
library(here)
library(knitr)
library(kableExtra)
library(glue)
library(tigris)
library(skimr)
library(broom)
#Import Data
hospital <- st_read("https://raw.githubusercontent.com/ujhwang/urban-analytics-2025/main/Assignment/mini_3/hospital_11counties.geojson", quiet = TRUE)
#Check Data
skim(hospital)
Name | hospital |
Number of rows | 119 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
id | 0 | 1 | 27 | 27 | 0 | 119 | 0 |
name | 0 | 1 | 11 | 72 | 0 | 104 | 0 |
address | 0 | 1 | 25 | 66 | 0 | 119 | 0 |
primary_type | 0 | 1 | 8 | 8 | 0 | 1 | 0 |
types | 0 | 1 | 47 | 96 | 0 | 12 | 0 |
status | 0 | 1 | 11 | 11 | 0 | 1 | 0 |
geometry | 0 | 1 | 24 | 26 | 0 | 119 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
rating | 0 | 1 | 4.04 | 0.86 | 2 | 3.35 | 4.3 | 4.8 | 5 | ▂▂▂▃▇ |
rating_count | 0 | 1 | 633.93 | 783.33 | 50 | 91.00 | 285.0 | 838.0 | 3882 | ▇▂▁▁▁ |
Some rows share the same “name” (e.g., Resurgens Orthopaedics), but this appears to be coincidental. Given that there are 119 total rows and 119 unique addresses and geometries, the dataset seems well-structured with no duplicated records.
#Download Tract Boundaries
atlanta_MPO <- counties("GA", progress_bar = F) %>%
filter(NAME %in% c("Cherokee", "Clayton", "Cobb", "DeKalb", "Douglas", "Fayette", "Forsyth", "Fulton", "Gwinnett", "Henry", "Rockdale"))
#Check Distribution of hospitals
tmap_mode("view")
tm_shape(hospital) + tm_dots(fill="red") +
tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)
From the plot above, there is one hospital located in Newton County which lies outside our research area but adjacent to Rockdale County. I decided to remove this hospital (Resurgens Orthopaedics). To be thorough, I attempts to exclude all hospitals located outside our defined research area.
#Spatial Join
hospital %<>% st_transform(32616) %>% st_make_valid()
atlanta_MPO %<>% st_transform(32616) %>% st_make_valid()
hosp_in_mpo <- st_join(hospital, atlanta_MPO, join = st_within, left = FALSE)
#Check Distribution of hospitals Again
tmap_mode("view")
tm_shape(hosp_in_mpo) + tm_dots(fill="red") +
tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)
Now it looks good. I will continue to analyze.
Hospitals often locate in areas with larger populations to maximize potential patient volume and profit. Including total population helps assess whether hospital distribution aligns with population size or reflects inequitable access. #### 2. Population Density Population density may be an even stronger determinant than total population, as hospitals tend to cluster in densely populated urban areas. This variable captures spatial concentration of demand for healthcare services. #### 3. Population Aged 65+ and Under 5 Older adults and young children typically require more frequent medical care due to age-related health vulnerabilities. These age groups are important indicators of healthcare needs within a community. #### 4. Percentage of People with Disabilities Individuals with disabilities often face greater healthcare needs and accessibility challenges. Including this variable allows evaluation of whether hospital locations adequately serve populations with higher medical dependency. #### 5. Percentage Below Poverty Line Low-income populations may have limited access to healthcare, and hospitals might avoid locating in such areas due to lower profitability. This variable helps assess economic inequities in hospital accessibility. #### 6. Median Household Income Household income reflects overall economic capacity in an area. Higher-income regions may attract more healthcare facilities seeking financially stable patient bases, potentially creating inequities in service distribution. #### 7. Percentage of Households with No Vehicle Available Lack of vehicle ownership limits mobility and affects the ability to reach healthcare facilities. Including this variable helps identify areas where proximity to hospitals is especially critical for equitable access. #### 8. Percentage of Uninsured Population Regions with higher uninsured rates may be underserved by hospitals due to lower reimbursement potential. This variable captures a structural factor contributing to healthcare access disparities. #### 9. Percentage of Non-Hispanic White Population Racial composition can reveal potential inequities in healthcare infrastructure. Examining this variable helps determine whether hospital distribution disproportionately favors or disadvantages certain racial or ethnic groups.
## Collect data for the control variables
tidycensus::census_api_key(Sys.getenv("CENSUS_API_KEY"))
tract_control_vars <- suppressMessages(
get_acs(geography = "tract",
state = "GA",
county = c("Cherokee", "Clayton", "Cobb", "DeKalb", "Douglas", "Fayette", "Forsyth", "Fulton", "Gwinnett", "Henry", "Rockdale"),
variables = c(population = "B01003_001E", #Population
age_male_under5 = "B01001_003E", #Population Under 5
age_male_65to66 = "B01001_020E", #Population Aged 65+
age_male_67to69 = "B01001_021E", #Population Aged 65+
age_male_70to74 = "B01001_022E", #Population Aged 65+
age_male_75to79 = "B01001_023E", #Population Aged 65+
age_male_80to84 = "B01001_024E", #Population Aged 65+
age_male_85to = "B01001_025E", #Population Aged 65+
age_female_under5 = "B01001_027E", #Population Under 5
age_female_65to66 = "B01001_044E", #Population Aged 65+
age_female_67to69 = "B01001_045E", #Population Aged 65+
age_female_70to74 = "B01001_046E", #Population Aged 65+
age_female_75to79 = "B01001_047E", #Population Aged 65+
age_female_80to84 = "B01001_048E", #Population Aged 65+
age_female_85to = "B01001_049E", #Population Aged 65+
nodisab_under18 = "C18108_005E", #Number of Disabilities
nodisab_18to64 = "C18108_009E", #Number of Disabilities
nodisab_over64 = "C18108_013E", #Number of Disabilities
poverty = "B17001_002E", #Poverty Status in the Past 12 Months by Sex by Age
hhincome = "B19013_001E", #Median Household Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars)
household = "B08201_001E", #The number of Households
novehicle = "B08201_002E", #Household Size by Vehicles Available (No Vehicle)
uninsured_under19 = "B27010_017E", #Types of Health Insurance Coverage by Age
uninsured_19to34 = "B27010_033E", #Types of Health Insurance Coverage by Age
uninsured_35to64 = "B27010_050E", #Types of Health Insurance Coverage by Age
uninsured_over65 = "B27010_066E", #Types of Health Insurance Coverage by Age
race_ethnic_total = "B03002_001E", #Ethnicity
non_hispanic_white = "B03002_003E" #Ethnicity
),
year = 2023,
survey = "acs5",
geometry = TRUE,
output = "wide"))
# Check for missing values and fill them
skim(tract_control_vars)
Name | tract_control_vars |
Number of rows | 1248 |
Number of columns | 59 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 56 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
GEOID | 0 | 1 | 11 | 11 | 0 | 1248 | 0 |
NAME | 0 | 1 | 38 | 45 | 0 | 1248 | 0 |
geometry | 0 | 1 | 169 | 3960 | 0 | 1248 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
population | 0 | 1.00 | 4011.74 | 1514.92 | 0 | 2928.00 | 3876.5 | 4925.00 | 10640 | ▁▇▅▁▁ |
B01003_001M | 0 | 1.00 | 722.21 | 385.99 | 14 | 476.75 | 651.5 | 872.00 | 5042 | ▇▁▁▁▁ |
age_male_under5 | 0 | 1.00 | 120.52 | 106.13 | 0 | 44.75 | 95.0 | 167.00 | 1088 | ▇▂▁▁▁ |
B01001_003M | 0 | 1.00 | 98.48 | 76.98 | 5 | 46.00 | 82.0 | 129.00 | 839 | ▇▁▁▁▁ |
age_male_65to66 | 0 | 1.00 | 35.75 | 40.64 | 0 | 8.00 | 26.0 | 50.00 | 432 | ▇▁▁▁▁ |
B01001_020M | 0 | 1.00 | 40.48 | 37.08 | 3 | 16.00 | 31.0 | 50.00 | 529 | ▇▁▁▁▁ |
age_male_67to69 | 0 | 1.00 | 45.53 | 42.99 | 0 | 14.00 | 37.0 | 64.25 | 318 | ▇▂▁▁▁ |
B01001_021M | 0 | 1.00 | 45.03 | 35.28 | 2 | 21.00 | 37.0 | 57.00 | 345 | ▇▁▁▁▁ |
age_male_70to74 | 0 | 1.00 | 62.59 | 56.20 | 0 | 22.00 | 50.0 | 89.00 | 553 | ▇▂▁▁▁ |
B01001_022M | 0 | 1.00 | 53.10 | 43.07 | 2 | 25.00 | 44.0 | 66.25 | 543 | ▇▁▁▁▁ |
age_male_75to79 | 0 | 1.00 | 38.03 | 42.25 | 0 | 9.00 | 28.0 | 55.00 | 487 | ▇▁▁▁▁ |
B01001_023M | 0 | 1.00 | 38.03 | 30.21 | 2 | 16.00 | 31.0 | 48.00 | 297 | ▇▁▁▁▁ |
age_male_80to84 | 0 | 1.00 | 20.87 | 29.96 | 0 | 0.00 | 11.5 | 29.25 | 305 | ▇▁▁▁▁ |
B01001_024M | 0 | 1.00 | 28.34 | 25.21 | 2 | 14.00 | 20.0 | 34.00 | 252 | ▇▁▁▁▁ |
age_male_85to | 0 | 1.00 | 17.39 | 30.33 | 0 | 0.00 | 7.0 | 23.00 | 375 | ▇▁▁▁▁ |
B01001_025M | 0 | 1.00 | 27.28 | 27.71 | 2 | 14.00 | 19.0 | 29.00 | 344 | ▇▁▁▁▁ |
age_female_under5 | 0 | 1.00 | 116.04 | 104.48 | 0 | 44.00 | 90.0 | 163.00 | 1148 | ▇▁▁▁▁ |
B01001_027M | 0 | 1.00 | 96.42 | 74.55 | 7 | 45.75 | 80.0 | 127.00 | 560 | ▇▂▁▁▁ |
age_female_65to66 | 0 | 1.00 | 40.88 | 41.42 | 0 | 11.00 | 30.0 | 58.00 | 280 | ▇▂▁▁▁ |
B01001_044M | 0 | 1.00 | 42.74 | 36.04 | 2 | 19.00 | 33.0 | 54.00 | 332 | ▇▁▁▁▁ |
age_female_67to69 | 0 | 1.00 | 55.91 | 52.86 | 0 | 19.00 | 44.0 | 77.00 | 427 | ▇▂▁▁▁ |
B01001_045M | 0 | 1.00 | 51.20 | 45.07 | 2 | 24.00 | 40.0 | 62.00 | 506 | ▇▁▁▁▁ |
age_female_70to74 | 0 | 1.00 | 78.57 | 69.25 | 0 | 33.00 | 63.0 | 109.00 | 690 | ▇▁▁▁▁ |
B01001_046M | 0 | 1.00 | 60.17 | 52.41 | 2 | 32.00 | 50.0 | 73.00 | 664 | ▇▁▁▁▁ |
age_female_75to79 | 0 | 1.00 | 50.27 | 48.72 | 0 | 15.00 | 37.5 | 72.00 | 377 | ▇▂▁▁▁ |
B01001_047M | 0 | 1.00 | 45.27 | 34.61 | 2 | 21.00 | 38.0 | 58.00 | 304 | ▇▂▁▁▁ |
age_female_80to84 | 0 | 1.00 | 30.22 | 38.00 | 0 | 0.00 | 18.0 | 43.00 | 457 | ▇▁▁▁▁ |
B01001_048M | 0 | 1.00 | 34.85 | 31.32 | 3 | 14.00 | 24.0 | 43.00 | 316 | ▇▁▁▁▁ |
age_female_85to | 0 | 1.00 | 30.30 | 42.65 | 0 | 0.00 | 16.0 | 41.25 | 407 | ▇▁▁▁▁ |
B01001_049M | 0 | 1.00 | 35.72 | 35.37 | 2 | 14.00 | 22.0 | 45.00 | 398 | ▇▁▁▁▁ |
nodisab_under18 | 0 | 1.00 | 915.76 | 506.97 | 0 | 552.25 | 853.0 | 1215.25 | 3234 | ▅▇▃▁▁ |
C18108_005M | 0 | 1.00 | 322.43 | 218.05 | 11 | 181.75 | 283.0 | 409.00 | 2587 | ▇▁▁▁▁ |
nodisab_18to64 | 0 | 1.00 | 2315.00 | 918.85 | 0 | 1639.50 | 2219.0 | 2831.00 | 6925 | ▂▇▃▁▁ |
C18108_009M | 0 | 1.00 | 471.89 | 223.25 | 3 | 322.75 | 428.0 | 575.00 | 2207 | ▇▆▁▁▁ |
nodisab_over64 | 0 | 1.00 | 349.12 | 217.19 | 0 | 198.00 | 319.0 | 470.00 | 2041 | ▇▃▁▁▁ |
C18108_013M | 0 | 1.00 | 140.60 | 92.45 | 2 | 87.00 | 120.0 | 169.00 | 1190 | ▇▁▁▁▁ |
poverty | 0 | 1.00 | 428.87 | 394.12 | 0 | 151.00 | 306.5 | 595.25 | 2542 | ▇▂▁▁▁ |
B17001_002M | 0 | 1.00 | 285.64 | 241.17 | 4 | 120.00 | 216.0 | 382.00 | 2166 | ▇▂▁▁▁ |
hhincome | 16 | 0.99 | 98494.89 | 44960.63 | 14971 | 65883.25 | 89502.5 | 121433.75 | 250001 | ▃▇▃▂▁ |
B19013_001M | 28 | 0.98 | 24411.56 | 15086.09 | 1396 | 13469.00 | 20793.0 | 31790.00 | 108195 | ▇▅▁▁▁ |
household | 0 | 1.00 | 1493.31 | 548.61 | 0 | 1110.75 | 1454.0 | 1805.00 | 4472 | ▂▇▃▁▁ |
B08201_001M | 0 | 1.00 | 237.04 | 112.90 | 14 | 156.00 | 220.0 | 293.00 | 856 | ▆▇▂▁▁ |
novehicle | 0 | 1.00 | 86.58 | 112.45 | 0 | 13.00 | 48.0 | 117.25 | 853 | ▇▁▁▁▁ |
B08201_002M | 0 | 1.00 | 70.43 | 67.82 | 4 | 20.00 | 51.0 | 94.00 | 545 | ▇▂▁▁▁ |
uninsured_under19 | 0 | 1.00 | 74.17 | 109.23 | 0 | 0.00 | 36.0 | 98.25 | 1265 | ▇▁▁▁▁ |
B27010_017M | 0 | 1.00 | 82.83 | 105.70 | 2 | 14.00 | 50.0 | 104.25 | 1483 | ▇▁▁▁▁ |
uninsured_19to34 | 0 | 1.00 | 179.51 | 180.53 | 0 | 48.75 | 125.0 | 259.00 | 1416 | ▇▂▁▁▁ |
B27010_033M | 0 | 1.00 | 132.88 | 111.98 | 4 | 53.75 | 102.5 | 182.00 | 928 | ▇▂▁▁▁ |
uninsured_35to64 | 0 | 1.00 | 225.53 | 195.64 | 0 | 75.00 | 173.0 | 323.50 | 1039 | ▇▃▁▁▁ |
B27010_050M | 0 | 1.00 | 148.12 | 112.68 | 3 | 70.75 | 123.5 | 200.00 | 948 | ▇▂▁▁▁ |
uninsured_over65 | 0 | 1.00 | 7.01 | 21.72 | 0 | 0.00 | 0.0 | 0.00 | 242 | ▇▁▁▁▁ |
B27010_066M | 0 | 1.00 | 21.75 | 23.96 | 2 | 14.00 | 14.0 | 20.00 | 287 | ▇▁▁▁▁ |
race_ethnic_total | 0 | 1.00 | 4011.74 | 1514.92 | 0 | 2928.00 | 3876.5 | 4925.00 | 10640 | ▁▇▅▁▁ |
B03002_001M | 0 | 1.00 | 722.21 | 385.99 | 14 | 476.75 | 651.5 | 872.00 | 5042 | ▇▁▁▁▁ |
non_hispanic_white | 0 | 1.00 | 1531.63 | 1272.43 | 0 | 467.75 | 1304.5 | 2303.00 | 7192 | ▇▅▂▁▁ |
B03002_003M | 0 | 1.00 | 382.82 | 288.58 | 3 | 194.75 | 338.0 | 502.25 | 4070 | ▇▁▁▁▁ |
Some tracts have 0 residents. But this might be true. When I checked Census Tract 9800, Fulton, GA, this tract is a runway of Hartsfield-Jackson Airport. Also, some of tracts do not have household income data accordingly.
It is still possible that other variables to have 0 value (population by age, vehicle availiablity, ethnicity), although the possibility is low. But I decided to include those variables for now.
Finally, I decided remove rows with 0 population, 0 households, and NA household income.
library(units)
tract_control_vars_noNA <- tract_control_vars %>%
drop_na(population, household, hhincome) %>%
filter(population != 0,
household != 0,
hhincome != 0) %>%
mutate(area_mi2 = st_area(.) %>% set_units(mi^2))
# Create new variables
tract_control_vars_final <- tract_control_vars_noNA %>%
transmute(
GEOID,
NAME,
geometry,
population_1000 = population/1000, # 1,000 people unit
pop_den_1000 = (population/area_mi2)/1000, # 1,000 people unit
under5_1000 = (age_male_under5 + age_female_under5)/1000, # 1,000 people unit
over65_1000 = ((age_male_65to66 + age_male_67to69 + age_male_70to74 + age_male_75to79 + age_male_80to84 + age_male_85to) + (age_female_65to66 + age_female_67to69 + age_female_70to74 + age_female_75to79 + age_female_80to84 + age_female_85to))/1000, # 1,000 people unit
disable_pct = ((population - nodisab_under18 - nodisab_18to64 - nodisab_over64)/population)*100,
poverty_pct = (poverty/population)*100,
hhincome_10000 = hhincome/10000, # $10,000 unit
novehicle_pct = (novehicle/household)*100,
uninsured_pct = ((uninsured_under19 + uninsured_19to34 + uninsured_35to64 + uninsured_over65)/population)*100,
white_pct = (non_hispanic_white/race_ethnic_total)*100
)
# Join the POI data with the ACS data
hosp_in_mpo %<>% st_transform(32616) %>% st_make_valid()
tract_control_vars_final %<>% st_transform(32616) %>% st_make_valid()
tract_hosp <- st_join(hosp_in_mpo, tract_control_vars_final)
# Ensure coordinate reference systems (CRS) are appropriate for distance calculations
crs_info <- st_crs(tract_control_vars_final)
cat("CRS: \n", "EPSG:", crs_info$epsg, "\n", "NAME:", crs_info$Name, "\n")
## CRS:
## EPSG: 32616
## NAME: WGS 84 / UTM zone 16N
The coordinate reference system (EPSG:32616, WGS 84 / UTM Zone 16N) is appropriate for this study, as it accurately covers the Atlanta region.
The distance to the nearest hospital is a critical factor in ensuring timely medical care. Patients with severe injuries or acute conditions may experience worse outcomes if they cannot reach appropriate treatment quickly.
# Compute Centroid of each tract
tract_centroids <- st_centroid(tract_control_vars_final)
# The distance to the Nearest Hospital
nearest_hospital <- st_nearest_feature(tract_centroids, hosp_in_mpo)
dist_to_near_hosp <- st_distance(
tract_centroids,
hosp_in_mpo[nearest_hospital,],
by_element = TRUE)
# Convert meter to km (Cause 1meter unit is too small to interpret)
dist_to_near_hosp_km <- set_units(
dist_to_near_hosp, "km") %>%
drop_units()
# Combine
tract_control_vars_final <- tract_control_vars_final %>%
mutate(dist_nearest_hosp_km = dist_to_near_hosp_km)
The number of hospitals within a 10-mile radius is another key indicator of healthcare accessibility. In cases of multiple or large-scale emergencies, having several nearby hospitals increases the overall treatment capacity. Additionally, because hospitals often differ in specialties and services, a higher number of nearby hospitals can also represent greater diversity and availability of medical care.
#The number of hospitals within 10 miles (15min by drive) from centroid of tract
buffer <- st_buffer(tract_centroids, dist=16093.4) #10mile = 16093.4m
hosp_count <- st_join(buffer,
hosp_in_mpo) %>%
st_drop_geometry() %>%
rename(GEOID = GEOID.x) %>%
group_by(GEOID) %>%
summarise(hosp_within_10mi = n())
# Combine
tract_control_vars_final <- tract_control_vars_final %>%
left_join(hosp_count, by = "GEOID") %>%
mutate(hosp_within_10mi = replace_na(hosp_within_10mi, 0L))
tm_shape(tract_control_vars_final) + tm_fill(fill = "hosp_within_10mi") +
tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)
## Registered S3 method overwritten by 'jsonify':
## method from
## print.json jsonlite
Overall, the downtown area has better access to multiple hospitals compared to the outer parts of the Atlanta MPO region.
tract_control_vars_final %>%
ggplot(aes(x = hhincome_10000, y = hosp_within_10mi)) +
geom_point(alpha = 0.5, size = 1.2) +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Annual Household Income",
y = "The number of hospitals within 10miles from tract center",
title = "Household Income vs. The number of hospitals") +
ggdark::dark_theme_gray() +
ggpubr::stat_cor(method = "pearson", label.x = 15, label.y = 35)
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().
## `geom_smooth()` using formula = 'y ~ x'
There appears to be a slight positive correlation between household income and the number of hospitals within 10 miles of each tract center. However, given that the relationship is relatively weak, this trend may change once additional variables are incorporated into the analysis.
tm_shape(tract_control_vars_final) + tm_fill(fill = "dist_nearest_hosp_km") +
tm_shape(atlanta_MPO) + tm_borders(col = "black", lwd = 2)
Overall, the downtown area is located closer to the nearest hospitals compared to the outskirts of the Atlanta MPO region. Residents in outlying areas generally need to drive longer distances to reach the nearest hospital. However, since this analysis includes only hospitals within the Atlanta MPO boundary, it is possible that some residents in the outskirts have access to hospitals located just outside the study area.
tract_control_vars_final %>%
ggplot(aes(x = hhincome_10000, y = dist_nearest_hosp_km)) +
geom_point(alpha = 0.5, size = 1.2) +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Annual Household Income",
y = "Distance to nearest hospital",
title = "Household Income vs. Distance") +
ggdark::dark_theme_gray() +
ggpubr::stat_cor(method = "pearson", label.x = 16, label.y = 18)
## `geom_smooth()` using formula = 'y ~ x'
There appears to be a slight positive correlation between household income and the distance to the nearest hospital. However, given that the relationship is relatively weak, this pattern may change once additional variables are included in the analysis.
# Run the baseline model (glm, poisson)
m1 <- glm(hosp_within_10mi ~ population_1000 + pop_den_1000 + under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + novehicle_pct + uninsured_pct + white_pct,
data=tract_control_vars_final,
family = poisson(link = "log"))
summary(m1)
##
## Call:
## glm(formula = hosp_within_10mi ~ population_1000 + pop_den_1000 +
## under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 +
## novehicle_pct + uninsured_pct + white_pct, family = poisson(link = "log"),
## data = tract_control_vars_final)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.6571980 0.0459307 57.852 < 2e-16 ***
## population_1000 -0.1031654 0.0073274 -14.079 < 2e-16 ***
## pop_den_1000 0.0364205 0.0013247 27.493 < 2e-16 ***
## under5_1000 0.2119434 0.0533043 3.976 7.01e-05 ***
## over65_1000 0.0250713 0.0338770 0.740 0.45926
## disable_pct -0.0146427 0.0015056 -9.726 < 2e-16 ***
## poverty_pct 0.0070842 0.0009919 7.142 9.21e-13 ***
## hhincome_10000 0.0307579 0.0023795 12.926 < 2e-16 ***
## novehicle_pct 0.0140683 0.0010454 13.458 < 2e-16 ***
## uninsured_pct 0.0037633 0.0009123 4.125 3.70e-05 ***
## white_pct 0.0011692 0.0004025 2.904 0.00368 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 7435.9 on 1231 degrees of freedom
## Residual deviance: 5374.3 on 1221 degrees of freedom
## AIC: 10922
##
## Number of Fisher Scoring iterations: 5
dispersion <- sum(residuals(m1, type = "pearson")^2) / df.residual(m1)
dispersion
## [1] 4.057754
The dependent variable exhibits overdispersion, indicating that the initial model results may not be reliable. Therefore, I decided to use a negative binomial regression, which is a more suitable method for modeling count data that commonly display overdispersion.
library(MASS)
m_nb <- glm.nb(
hosp_within_10mi ~ population_1000 + pop_den_1000 + under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + novehicle_pct + uninsured_pct + white_pct,
data = tract_control_vars_final, link = log
)
summary(m_nb)
##
## Call:
## glm.nb(formula = hosp_within_10mi ~ population_1000 + pop_den_1000 +
## under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 +
## novehicle_pct + uninsured_pct + white_pct, data = tract_control_vars_final,
## link = log, init.theta = 4.585978058)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.4282721 0.1020630 23.792 < 2e-16 ***
## population_1000 -0.1054040 0.0157299 -6.701 2.07e-11 ***
## pop_den_1000 0.0735248 0.0042919 17.131 < 2e-16 ***
## under5_1000 0.1583818 0.1161037 1.364 0.172523
## over65_1000 0.0983450 0.0717502 1.371 0.170482
## disable_pct -0.0119353 0.0031265 -3.817 0.000135 ***
## poverty_pct 0.0070672 0.0022707 3.112 0.001856 **
## hhincome_10000 0.0383790 0.0054856 6.996 2.63e-12 ***
## novehicle_pct 0.0135683 0.0025262 5.371 7.83e-08 ***
## uninsured_pct 0.0045206 0.0021073 2.145 0.031936 *
## white_pct 0.0003326 0.0008581 0.388 0.698317
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.586) family taken to be 1)
##
## Null deviance: 1828.7 on 1231 degrees of freedom
## Residual deviance: 1329.3 on 1221 degrees of freedom
## AIC: 8704.4
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.586
## Std. Err.: 0.246
##
## 2 x log-likelihood: -8680.397
For easier interpretation, I computed the incidence rate ratios (IRRs).
irr_table <- tidy(m_nb, exponentiate = TRUE, conf.int = TRUE) %>%
dplyr::select(term, estimate, p.value) %>%
dplyr::rename(
IRR = estimate,
p_value = p.value
) %>%
dplyr::mutate(
IRR = round(IRR, 3),
p_value = round(p_value, 3)
)
irr_table
## # A tibble: 11 × 3
## term IRR p_value
## <chr> <dbl> <dbl>
## 1 (Intercept) 11.3 0
## 2 population_1000 0.9 0
## 3 pop_den_1000 1.08 0
## 4 under5_1000 1.17 0.173
## 5 over65_1000 1.10 0.17
## 6 disable_pct 0.988 0
## 7 poverty_pct 1.01 0.002
## 8 hhincome_10000 1.04 0
## 9 novehicle_pct 1.01 0
## 10 uninsured_pct 1.00 0.032
## 11 white_pct 1 0.698
# Run the baseline model (ols)
m2 <- lm(dist_nearest_hosp_km ~ population_1000 + pop_den_1000 + under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 + novehicle_pct + uninsured_pct + white_pct,
data=tract_control_vars_final)
# Check multicollinearity
library(car)
vif(m2)
## population_1000 pop_den_1000 under5_1000 over65_1000 disable_pct
## 2.392571 1.216188 1.736752 1.938854 1.486970
## poverty_pct hhincome_10000 novehicle_pct uninsured_pct white_pct
## 2.093447 2.731236 1.755614 1.665707 2.215819
There is no evidence of significant multicollinearity. Therefore, I proceeded to interpret the results.
summary(m2)
##
## Call:
## lm(formula = dist_nearest_hosp_km ~ population_1000 + pop_den_1000 +
## under5_1000 + over65_1000 + disable_pct + poverty_pct + hhincome_10000 +
## novehicle_pct + uninsured_pct + white_pct, data = tract_control_vars_final)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8547 -1.7864 -0.4893 1.1895 14.3458
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.639426 0.511600 9.068 < 2e-16 ***
## population_1000 0.282976 0.078077 3.624 0.000302 ***
## pop_den_1000 -0.249536 0.023145 -10.781 < 2e-16 ***
## under5_1000 0.232153 0.578846 0.401 0.688445
## over65_1000 -0.270603 0.355258 -0.762 0.446381
## disable_pct 0.022933 0.015341 1.495 0.135189
## poverty_pct -0.012732 0.011484 -1.109 0.267788
## hhincome_10000 -0.062405 0.027801 -2.245 0.024964 *
## novehicle_pct -0.038982 0.012952 -3.010 0.002669 **
## uninsured_pct -0.016189 0.010699 -1.513 0.130516
## white_pct 0.004915 0.004248 1.157 0.247521
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.654 on 1221 degrees of freedom
## Multiple R-squared: 0.1647, Adjusted R-squared: 0.1579
## F-statistic: 24.08 on 10 and 1221 DF, p-value: < 2.2e-16
The negative binomial regression results for the number of hospitals within 10 miles of each census tract centroid in the 11-county Atlanta metropolitan area revealed the following associations:
The OLS regression results for the distance (in kilometers) from each census tract centroid to the nearest hospital in the 11-county Atlanta metropolitan area revealed the following associations:
The analysis reveals spatial inequities in hospital accessibility across the 11-county Atlanta metropolitan region.
In the Negative Binomial Regression, hospital availability—measured as the number of hospitals within 10 miles—was significantly greater in census tracts with larger populations, higher population density, and higher median household income. In contrast, tracts with higher disability rates had fewer hospitals within a 10-mile radius, indicating that populations with greater healthcare needs may be underserved in terms of facility access. The percentage of households without a vehicle showed a positive but modest association, while poverty and uninsured rates were also positively associated, though their effects were relatively small.
The OLS regression on distance to the nearest hospital reveals evidence of inequity in spatial accessibility. Census tracts with higher household income and greater population density are situated significantly closer to hospitals. The percentage of households without a vehicle exhibited a negative but modest association with distance, which may suggest that hospitals tend to be located in areas where residents are less dependent on private vehicles; however, this relationship remains uncertain. On the other hand, demographic indicators of healthcare need, such as the shares of young children, elderly residents, and individuals with disabilities, were not significantly related to hospital proximity. This suggests a mismatch between where hospitals are located and where medical demand is likely to be greatest.
Taken together, these findings suggest that the current spatial pattern of hospital locations in the Atlanta MPO region reflects market-oriented and urban-centric placement, rather than equitable access based on community health needs.