Is the spatial distribution of hospitals in Fulton and DeKalb counties equitable?
From the data exploration detailed below, it does not appear that hospitals in Fulton and Dekalb counties are equitably distributed. The distance from the nearest hospital increases as the percentage of households without cars increases, suggesting that those who have the most accessibility challenges (given the location in Atlanta) also have to travel the furthest to get to a hospital. Additionally, as the percentage of non-white residents increases, distance to the nearest hospital increases. This relationship is stronger in Fulton county than in Dekalb county, suggesting less equitable distribution in Fulton. Those census tracts further from the City of Atlanta also have to travel further to get to a hospital. Strangely, at the highest population numbers, as the population of the census tract increases, the number of hospitals within 0.25 miles decreases.

Import Data

yelp <- st_read("https://raw.githubusercontent.com/ujhwang/urban-analytics-2024/main/Assignment/mini_3/yelp_hospital.geojson")

# Census data 
# I have chosen hhincome "B19013_001" (hospitals could be located in higher income areas), Race - using non-hispanic white "B03002_003" and pop "B01003_001" to calculate a percentage of non-white (see if they are equitably distributed between neighborhoods with different racial distributions), and percent of people who take public transit or walk to work as a proxy of accessibility 

tract <- get_acs(geography = "tract",
          state = "GA",
          county = c("Fulton","Dekalb"),
          variables = c(non_hispanic_white = "B03002_003",
                        pop = "B01003_001",
                        hhincome = "B19013_001",
                        cars_total_hh= "B25044_001",
                        cars_none = "B25044_002"),
          year = 2022,
          survey = "acs5", 
          geometry = TRUE,
          output = "wide") %>%
  mutate(minority_pct = (popE - non_hispanic_whiteE)/popE,
         noncar_pct = cars_noneE/cars_total_hhE,
         pop = popE,
         hhincome = hhincomeE) %>% 
  separate(col = "NAME", 
           into = c("tract", "county", "state"), 
           sep = "; ")  %>%
    select(GEOID, county, minority_pct, noncar_pct, hhincome, pop) %>%
  st_transform(4326)
## Getting data from the 2018-2022 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

Check the data

library(skimr)
skim(yelp)
## Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined
## `sfl` provided. Falling back to `character`.
Data summary
Name yelp
Number of rows 129
Number of columns 24
_______________________
Column type frequency:
character 18
logical 1
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id 0 1.00 22 22 0 129 0
alias 0 1.00 17 73 0 129 0
name 0 1.00 8 64 0 115 0
image_url 0 1.00 0 68 86 44 0
url 0 1.00 174 230 0 129 0
categories 0 1.00 9 62 0 21 0
transactions 0 1.00 0 0 129 1 0
phone 0 1.00 0 12 5 107 0
display_phone 0 1.00 0 14 5 107 0
location.address1 2 0.98 0 34 14 87 0
location.address2 13 0.90 0 7 100 15 0
location.address3 23 0.82 0 52 104 3 0
location.city 0 1.00 6 14 0 13 0
location.zip_code 0 1.00 0 5 1 33 0
location.country 0 1.00 2 2 0 1 0
location.state 0 1.00 2 2 0 1 0
location.display_address 0 1.00 17 93 0 93 0
geometry 0 1.00 21 38 0 103 0

Variable type: logical

skim_variable n_missing complete_rate mean count
is_closed 0 1 0 FAL: 129

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
review_count 0 1 12.90 42.98 0.00 0.00 0.00 2.00 319.00 ▇▁▁▁▁
rating 0 1 1.05 1.47 0.00 0.00 0.00 2.00 5.00 ▇▁▂▁▁
distance 0 1 1188.10 735.72 204.09 564.81 1199.68 1647.70 4098.37 ▇▇▂▁▁
coordinates.latitude 0 1 33.86 0.12 33.60 33.77 33.81 33.92 34.07 ▁▇▅▅▅
coordinates.longitude 0 1 -84.34 0.06 -84.56 -84.39 -84.35 -84.32 -84.09 ▁▆▇▁▁
yelp %>% sapply(class) %>% print()
## $id
## [1] "character"
## 
## $alias
## [1] "character"
## 
## $name
## [1] "character"
## 
## $image_url
## [1] "character"
## 
## $is_closed
## [1] "logical"
## 
## $url
## [1] "character"
## 
## $review_count
## [1] "integer"
## 
## $categories
## [1] "character"
## 
## $rating
## [1] "numeric"
## 
## $transactions
## [1] "character"
## 
## $phone
## [1] "character"
## 
## $display_phone
## [1] "character"
## 
## $distance
## [1] "numeric"
## 
## $coordinates.latitude
## [1] "numeric"
## 
## $coordinates.longitude
## [1] "numeric"
## 
## $location.address1
## [1] "character"
## 
## $location.address2
## [1] "character"
## 
## $location.address3
## [1] "character"
## 
## $location.city
## [1] "character"
## 
## $location.zip_code
## [1] "character"
## 
## $location.country
## [1] "character"
## 
## $location.state
## [1] "character"
## 
## $location.display_address
## [1] "character"
## 
## $geometry
## [1] "sfc_POINT" "sfc"
yelp %>% head()
## Simple feature collection with 6 features and 23 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.39409 ymin: 33.81016 xmax: -84.26324 ymax: 34.0701
## Geodetic CRS:  WGS 84
##                       id                                            alias
## 1 fiSgDb29cZO-MffxwrT_LA northside-alpharetta-medical-campus-alpharetta-2
## 2 hWXJFx9zCVDmiO6eldgHBA                      regency-hospital-alpharetta
## 3 kuE7B_HLP57KnSIgn-jzpw northside-alpharetta-preventive-medicine-atlanta
## 4 0B80Az6DXX8ZLXZXhU1bCQ                         health-direct-alpharetta
## 5 EaD2lKlKh4NpeSD25_vrHw                 ultimatemedicalhousecall-atlanta
## 6 O4DoAVRc6oTjUe9ufieTjA                          shepherd-center-atlanta
##                                       name
## 1      Northside Alpharetta Medical Campus
## 2                         Regency Hospital
## 3 Northside Alpharetta Preventive Medicine
## 4                            Health Direct
## 5                 Ultimatemedicalhousecall
## 6                          Shepherd Center
##                                                              image_url
## 1 https://s3-media4.fl.yelpcdn.com/bphoto/8Pe7ZlobOywD_CakTalTdg/o.jpg
## 2                                                                     
## 3                                                                     
## 4                                                                     
## 5                                                                     
## 6 https://s3-media2.fl.yelpcdn.com/bphoto/7H4uUz1eDFV1a2fM-5-e2A/o.jpg
##   is_closed
## 1     FALSE
## 2     FALSE
## 3     FALSE
## 4     FALSE
## 5     FALSE
## 6     FALSE
##                                                                                                                                                                                                             url
## 1 https://www.yelp.com/biz/northside-alpharetta-medical-campus-alpharetta-2?adjust_creative=kfv-AqbqsBxwk_npQxknWQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kfv-AqbqsBxwk_npQxknWQ
## 2                      https://www.yelp.com/biz/regency-hospital-alpharetta?adjust_creative=kfv-AqbqsBxwk_npQxknWQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kfv-AqbqsBxwk_npQxknWQ
## 3 https://www.yelp.com/biz/northside-alpharetta-preventive-medicine-atlanta?adjust_creative=kfv-AqbqsBxwk_npQxknWQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kfv-AqbqsBxwk_npQxknWQ
## 4                         https://www.yelp.com/biz/health-direct-alpharetta?adjust_creative=kfv-AqbqsBxwk_npQxknWQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kfv-AqbqsBxwk_npQxknWQ
## 5                 https://www.yelp.com/biz/ultimatemedicalhousecall-atlanta?adjust_creative=kfv-AqbqsBxwk_npQxknWQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kfv-AqbqsBxwk_npQxknWQ
## 6                          https://www.yelp.com/biz/shepherd-center-atlanta?adjust_creative=kfv-AqbqsBxwk_npQxknWQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=kfv-AqbqsBxwk_npQxknWQ
##   review_count                                                     categories
## 1            1                                                      Hospitals
## 2            0                                                      Hospitals
## 3            0                                                      Hospitals
## 4            0                                                      Hospitals
## 5            0                                                      Hospitals
## 6           34 Hospitals, Rehabilitation Center, Community Service/Non-Profit
##   rating transactions        phone  display_phone  distance
## 1    4.0              +17706674000 (770) 667-4000 1496.5068
## 2    0.0              +16786940553 (678) 694-0553 2125.5366
## 3    0.0              +17706674155 (770) 667-4155 1496.5068
## 4    0.0              +17708702500 (770) 870-2500 2460.2880
## 5    0.0                                           595.1797
## 6    3.2              +14043522020 (404) 352-2020  769.8365
##   coordinates.latitude coordinates.longitude          location.address1
## 1             34.07010             -84.26324 3400 A & C Old Milton Pkwy
## 2             34.05106             -84.27857            11175 Cicero Dr
## 3             34.07010             -84.26324     3400 A Old Milton Pkwy
## 4             34.06029             -84.28281       2550 Northwinds Pkwy
## 5             33.84957             -84.37021        3495 Buckhead Lp NE
## 6             33.81016             -84.39409       2020 Peachtree Rd NW
##   location.address2 location.address3 location.city location.zip_code
## 1                                        Alpharetta             30004
## 2           Ste 300              <NA>    Alpharetta             30022
## 3                                           Atlanta             30303
## 4                                        Alpharetta             30004
## 5                                <NA>       Atlanta             30305
## 6                                           Atlanta             30309
##   location.country location.state
## 1               US             GA
## 2               US             GA
## 3               US             GA
## 4               US             GA
## 5               US             GA
## 6               US             GA
##                           location.display_address                   geometry
## 1 3400 A & C Old Milton Pkwy, Alpharetta, GA 30004  POINT (-84.26324 34.0701)
## 2   11175 Cicero Dr, Ste 300, Alpharetta, GA 30022 POINT (-84.27857 34.05106)
## 3        3400 A Old Milton Pkwy, Atlanta, GA 30303  POINT (-84.26324 34.0701)
## 4       2550 Northwinds Pkwy, Alpharetta, GA 30004 POINT (-84.28281 34.06029)
## 5           3495 Buckhead Lp NE, Atlanta, GA 30305 POINT (-84.37021 33.84957)
## 6          2020 Peachtree Rd NW, Atlanta, GA 30309 POINT (-84.39409 33.81016)
skim(tract)
## Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
## user-defined `sfl` provided. Falling back to `character`.
Data summary
Name tract
Number of rows 530
Number of columns 7
_______________________
Column type frequency:
character 3
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
GEOID 0 1 11 11 0 530 0
county 0 1 13 13 0 2 0
geometry 0 1 169 3727 0 530 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
minority_pct 2 1.00 0.65 0.29 0.02 0.37 0.69 0.94 1 ▂▃▃▃▇
noncar_pct 4 0.99 0.55 0.27 0.00 0.33 0.57 0.78 1 ▃▆▇▇▇
hhincome 11 0.98 94833.94 52684.71 15625.00 56193.00 81096.00 119728.50 250001 ▆▇▃▂▁
pop 0 1.00 3439.91 1307.62 0.00 2515.50 3319.00 4210.00 7857 ▁▇▇▃▁
tract <- tract %>% 
  drop_na(hhincome,minority_pct,noncar_pct)
skim(tract)
## Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
## user-defined `sfl` provided. Falling back to `character`.
Data summary
Name tract
Number of rows 519
Number of columns 7
_______________________
Column type frequency:
character 3
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
GEOID 0 1 11 11 0 519 0
county 0 1 13 13 0 2 0
geometry 0 1 169 3727 0 519 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
minority_pct 0 1 0.64 0.29 0.02 0.36 0.69 0.94 1 ▂▃▃▃▇
noncar_pct 0 1 0.56 0.27 0.00 0.34 0.57 0.78 1 ▃▆▇▇▇
hhincome 0 1 94833.94 52684.71 15625.00 56193.00 81096.00 119728.50 250001 ▆▇▃▂▁
pop 0 1 3478.01 1283.94 624.00 2567.50 3357.00 4232.00 7857 ▂▇▆▂▁
tract %>% sapply(class) %>% print()
## $GEOID
## [1] "character"
## 
## $county
## [1] "character"
## 
## $minority_pct
## [1] "numeric"
## 
## $noncar_pct
## [1] "numeric"
## 
## $hhincome
## [1] "numeric"
## 
## $pop
## [1] "numeric"
## 
## $geometry
## [1] "sfc_MULTIPOLYGON" "sfc"
tract %>% head()
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.47194 ymin: 33.65339 xmax: -84.16538 ymax: 33.77463
## Geodetic CRS:  WGS 84
##         GEOID        county minority_pct noncar_pct hhincome  pop
## 1 13089023212 DeKalb County    0.9868970  0.6307460    64698 3358
## 2 13089020400 DeKalb County    0.1738649  0.7973422   176250 2709
## 3 13089020500 DeKalb County    0.5122017  0.5865003    83594 3688
## 4 13089022800 DeKalb County    0.3480134  0.7578786   159821 4178
## 5 13121010601 Fulton County    0.7274374  0.3304749    62671 3313
## 6 13121008301 Fulton County    0.9439439  0.4377224    31543 2997
##                         geometry
## 1 MULTIPOLYGON (((-84.18651 3...
## 2 MULTIPOLYGON (((-84.34921 3...
## 3 MULTIPOLYGON (((-84.34919 3...
## 4 MULTIPOLYGON (((-84.2965 33...
## 5 MULTIPOLYGON (((-84.4686 33...
## 6 MULTIPOLYGON (((-84.47144 3...
## tmap mode set to interactive viewing
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(tract) + tm_polygons(col = "hhincome")
tm_shape(yelp) + tm_dots(col="red") + tm_shape(tract) + tm_borders()
# Join data sets
## Check to see that the CRS for both the sf files are the same
head(tract$geometry)
## Geometry set for 6 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.47194 ymin: 33.65339 xmax: -84.16538 ymax: 33.77463
## Geodetic CRS:  WGS 84
## First 5 geometries:
## MULTIPOLYGON (((-84.18651 33.72737, -84.18381 3...
## MULTIPOLYGON (((-84.34921 33.76496, -84.34915 3...
## MULTIPOLYGON (((-84.34919 33.7481, -84.34919 33...
## MULTIPOLYGON (((-84.2965 33.76991, -84.29264 33...
## MULTIPOLYGON (((-84.4686 33.66347, -84.46678 33...
head(yelp$geometry)
## Geometry set for 6 features 
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.39409 ymin: 33.81016 xmax: -84.26324 ymax: 34.0701
## Geodetic CRS:  WGS 84
## First 5 geometries:
## POINT (-84.26324 34.0701)
## POINT (-84.27857 34.05106)
## POINT (-84.26324 34.0701)
## POINT (-84.28281 34.06029)
## POINT (-84.37021 33.84957)
# same geometry

# Join tract geometry with hospitals 
tract_hosp <- st_join(tract, yelp %>% mutate(count = 1))

# Calculate the count of hospitals at the tract level
tract_hosp_count <- tract_hosp %>%
  group_by(GEOID) %>%
  summarise(count = sum(count, na.rm = T))

## Join back the counts of hospitals to the Tract data
tract1 <- tract %>%
  left_join(tract_hosp_count %>% st_drop_geometry(), 
            by = "GEOID")

# check data 
skim(tract)
## Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
## user-defined `sfl` provided. Falling back to `character`.
Data summary
Name tract
Number of rows 519
Number of columns 7
_______________________
Column type frequency:
character 3
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
GEOID 0 1 11 11 0 519 0
county 0 1 13 13 0 2 0
geometry 0 1 169 3727 0 519 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
minority_pct 0 1 0.64 0.29 0.02 0.36 0.69 0.94 1 ▂▃▃▃▇
noncar_pct 0 1 0.56 0.27 0.00 0.34 0.57 0.78 1 ▃▆▇▇▇
hhincome 0 1 94833.94 52684.71 15625.00 56193.00 81096.00 119728.50 250001 ▆▇▃▂▁
pop 0 1 3478.01 1283.94 624.00 2567.50 3357.00 4232.00 7857 ▂▇▆▂▁

Exploratory Analysis

# Exploratory analysis 
## Are hospitals where total population is higher?
tm_shape(tract) + tm_polygons(col="pop") +
  tm_shape(yelp) +tm_dots()
## Are hospitals in census tracts with higher income?
tm_shape(tract) + tm_polygons(col="hhincome") +
  tm_shape(yelp) +tm_dots()
### they don't appear to be distributed in higher income neighborhoods, and actually seem to be in lower income neighborhoods 

## Are hospitals in census tracts with higher non-white population?
tm_shape(tract) + tm_polygons(col="minority_pct") +
  tm_shape(yelp) +tm_dots()
### there does appear to be a lack of hospitals in the areas of the city that have  the highest percentage of minority residents, though they also don't seem to be distributed in areas with the highest percentage of white residents 

## Are hospitals in census tracts where fewer people have access to cars?
tm_shape(tract) + tm_polygons(col="noncar_pct") +
  tm_shape(yelp) +tm_dots()
### it looks like areas with the highest percentage of households without cars, there are few hospitals 

Number of hospitals within 0.25 mile from each Tract

## create buffer around tracts 
tract_buffer = st_buffer(tract, dist=402.336) # 0.25 miles to meters

## Find intersections between hospitals and buffer
intersections <- st_intersects(tract_buffer, yelp)

# Count the number of hospitals within the buffer for each census tract
tract_buffer$hospital_count <- sapply(intersections, length)

tm_shape(tract_buffer) + tm_polygons(col = "hospital_count") 
# a lot of census tracts do not have a hospital within 0.25 miles, but others have up to 20! 

Do the number of hospitals within .25 miles differ based on total population of the census tract?

# look at distance to hospital and minority population 
ggplot(data = tract_buffer) +
  geom_smooth(mapping = aes(x=pop, y=hospital_count))+
  labs(x = "Population of the census tract",
       y = "Count of hospitals within 0.25 miles")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# see if this differs by county
ggplot(data = tract_buffer) +
  geom_smooth(mapping = aes(x=pop, y=hospital_count))+
    facet_wrap(~county)+
  labs(x = "Population of the census tract",
       y = "Count of hospitals within 0.25 miles")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Distance from each census tract to the nearest hospital

## Calculate the centroid of each tract
tracts_centroid <- st_centroid(tract)
## Warning: st_centroid assumes attributes are constant over geometries
## Calculate the distance from each tract centroid to all hospitals
distances <- st_distance(tracts_centroid, yelp)

## Find the minimum distance for each census tract
tract$nearest_hospital_dist <- apply(distances, 1, min)

# Step 5: View the results
head(tract)
## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.47194 ymin: 33.65339 xmax: -84.16538 ymax: 33.77463
## Geodetic CRS:  WGS 84
##         GEOID        county minority_pct noncar_pct hhincome  pop
## 1 13089023212 DeKalb County    0.9868970  0.6307460    64698 3358
## 2 13089020400 DeKalb County    0.1738649  0.7973422   176250 2709
## 3 13089020500 DeKalb County    0.5122017  0.5865003    83594 3688
## 4 13089022800 DeKalb County    0.3480134  0.7578786   159821 4178
## 5 13121010601 Fulton County    0.7274374  0.3304749    62671 3313
## 6 13121008301 Fulton County    0.9439439  0.4377224    31543 2997
##                         geometry nearest_hospital_dist
## 1 MULTIPOLYGON (((-84.18651 3...              4103.495
## 2 MULTIPOLYGON (((-84.34921 3...              1557.793
## 3 MULTIPOLYGON (((-84.34919 3...              1472.414
## 4 MULTIPOLYGON (((-84.2965 33...              1633.319
## 5 MULTIPOLYGON (((-84.4686 33...              2562.960
## 6 MULTIPOLYGON (((-84.47144 3...              3017.808
# visualize
tm_shape(tract) + tm_polygons(col = "nearest_hospital_dist") 

Does the distance to the nearest hospital differ based on the percentage of non-white residents in the census tract? Does this differ by county?

# look at distance to hospital and minority population 
ggplot(data = tract) +
  geom_smooth(mapping = aes(x=minority_pct, y=nearest_hospital_dist))+
  labs(x = "Percent of non-white residents",
       y = "Distance to a hospital (meters)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# see if this differs by county
ggplot(data = tract) +
  geom_smooth(mapping = aes(x=minority_pct, y=nearest_hospital_dist))+
  facet_wrap(~county)+
  labs(x = "Percent of non-white residents",
       y = "Distance to a hospital (meters)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Does the distance to the nearest hospital differ based on household income? Does this differ by county?

# look at distance to hospital and minority population 
ggplot(data = tract) +
  geom_smooth(mapping = aes(x=hhincome, y=nearest_hospital_dist))+
  labs(x = "Household income",
       y = "Distance to a hospital (meters)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# see if this differs by county
ggplot(data = tract) +
  geom_smooth(mapping = aes(x=hhincome, y=nearest_hospital_dist))+
  facet_wrap(~county)+
  labs(x = "Household income",
       y = "Distance to a hospital (meters)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Does the distance to the nearest hospital differ based on where residents have access to cars?

# look at distance to hospital and minority population 
ggplot(data = tract) +
  geom_smooth(mapping = aes(x=noncar_pct, y=nearest_hospital_dist))+
  labs(x = "Percent of households without a car",
       y = "Distance to a hospital (meters)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'