Healthcare Equity Analysis in Fulton and DeKalb, GA

CP8883 Intro to Urban Analytics Fall 2024 - Mini Assignment 3

Thanawit Suwannikom

2024-10-03

Introduction

This R Markdown document provides a brief spatial analysis of healthcare equity in Fulton and DeKalb counties, GA, using hospital POIs from Yelp along with Census Tract level 5 years estimates from ACS API.

The analysis is divided into 6 steps:

Load Data: Yelp Hospital POIs, Census Tract Estimates
Visualize Hospital Distribution on Map
Prepare Potential Meaningful Measures
Analysis on Count of Hospital per Census Tract
Analysis on Distance to the Nearest Hospital per Tract
Conclusion

1. Load Data

# Load Yelp Hospital Data
yelp_hospital <- st_read("https://raw.githubusercontent.com/ujhwang/urban-analytics-2024/main/Assignment/mini_3/yelp_hospital.geojson")

## Reading layer `yelp_hospital' from data source 
##   `https://raw.githubusercontent.com/ujhwang/urban-analytics-2024/main/Assignment/mini_3/yelp_hospital.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 129 features and 23 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.56242 ymin: 33.60009 xmax: -84.08677 ymax: 34.0701
## Geodetic CRS:  WGS 84

# Select Census Variables
## household income
## total population
## population by races
## population by sex
## median age
## population by education attainment

census_var = c(hhincome = 'B19019_001',
                pop = "B02001_001",
                race.white = "B02001_002", 
                race.black = 'B02001_003',
                race.indian = 'B02001_004',
                race.asian = 'B02001_005',
                age = 'B05004_001',
                sex.male = 'B01001_002',
                sex.female = 'B01001_026',
               edu.less_than_highschool = 'B06009_002',
               edu.high_school = 'B06009_003',
               edu.bachelor = 'B06009_005',
               edu.higher_bachelor = 'B06009_006')

# Load census tract data
census <- get_acs(geography = "tract", state = "GA", county = c("Fulton", "DeKalb"),
                 output = "wide", geometry = TRUE, year = 2022,
                 variables = census_var)

## Warning: • You have not set a Census API key. Users without a key are limited to 500
## queries per day and may experience performance limitations.
## ℹ For best results, get a Census API key at
## http://api.census.gov/data/key_signup.html and then supply the key to the
## `census_api_key()` function to use it throughout your tidycensus session.
## This warning is displayed once per session.

##   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   1%  |                                                                              |==                                                                    |   2%  |                                                                              |==                                                                    |   3%  |                                                                              |===                                                                   |   4%  |                                                                              |====                                                                  |   5%  |                                                                              |====                                                                  |   6%  |                                                                              |=====                                                                 |   7%  |                                                                              |======                                                                |   8%  |                                                                              |======                                                                |   9%  |                                                                              |=======                                                               |  10%  |                                                                              |========                                                              |  11%  |                                                                              |========                                                              |  12%  |                                                                              |=========                                                             |  13%  |                                                                              |==========                                                            |  14%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  21%  |                                                                              |===============                                                       |  21%  |                                                                              |================                                                      |  23%  |                                                                              |==================                                                    |  26%  |                                                                              |===================                                                   |  27%  |                                                                              |=====================                                                 |  29%  |                                                                              |======================                                                |  32%  |                                                                              |========================                                              |  35%  |                                                                              |==========================                                            |  37%  |                                                                              |===========================                                           |  39%  |                                                                              |============================                                          |  40%  |                                                                              |===============================                                       |  44%  |                                                                              |================================                                      |  46%  |                                                                              |==================================                                    |  49%  |                                                                              |====================================                                  |  51%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  77%  |                                                                              |=======================================================               |  78%  |                                                                              |=======================================================               |  79%  |                                                                              |========================================================              |  80%  |                                                                              |=========================================================             |  82%  |                                                                              |==========================================================            |  83%  |                                                                              |================================================================      |  91%  |                                                                              |======================================================================| 100%

skim(yelp_hospital)

## Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined
## `sfl` provided. Falling back to `character`.

Data summary
Name	yelp_hospital
Number of rows	129
Number of columns	24
_______________________
Column type frequency:
character	18
logical	1
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique
id	0	1.00	22	22	0	129
alias	0	1.00	17	73	0	129
name	0	1.00	8	64	0	115
image_url	0	1.00	0	68	86	44
url	0	1.00	174	230	0	129
categories	0	1.00	9	62	0	21
transactions	0	1.00	0	0	129	1
phone	0	1.00	0	12	5	107
display_phone	0	1.00	0	14	5	107
location.address1	2	0.98	0	34	14	87
location.address2	13	0.90	0	7	100	15
location.address3	23	0.82	0	52	104	3
location.city	0	1.00	6	14	0	13
location.zip_code	0	1.00	0	5	1	33
location.country	0	1.00	2	2	0	1
location.state	0	1.00	2	2	0	1
location.display_address	0	1.00	17	93	0	93
geometry	0	1.00	21	38	0	103

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
is_closed	0	1	0	FAL: 129

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
review_count	1	12.90	42.98	0.00	0.00	0.00	2.00	319.00	▇▁▁▁▁
rating	1	1.05	1.47	0.00	0.00	0.00	2.00	5.00	▇▁▂▁▁
distance	1	1188.10	735.72	204.09	564.81	1199.68	1647.70	4098.37	▇▇▂▁▁
coordinates.latitude	1	33.86	0.12	33.60	33.77	33.81	33.92	34.07	▁▇▅▅▅
coordinates.longitude	1	-84.34	0.06	-84.56	-84.39	-84.35	-84.32	-84.09	▁▆▇▁▁

skim(census)

## Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
## user-defined `sfl` provided. Falling back to `character`.

Data summary
Name	census
Number of rows	530
Number of columns	29
_______________________
Column type frequency:
character	3
numeric	26
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
GEOID	1	11	11	530
NAME	1	38	43	530
geometry	1	169	3727	530

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
hhincomeE	11	0.98	94833.94	52684.71	15625.0	56193.00	81096.00	119728.50	250001.0	▆▇▃▂▁
hhincomeM	21	0.96	25574.79	18745.90	149.0	13117.00	21335.00	32504.00	162485.0	▇▂▁▁▁
popE	0	1.00	3439.91	1307.62	0.0	2515.50	3319.00	4210.00	7857.0	▁▇▇▃▁
popM	0	1.00	672.81	409.22	14.0	419.00	601.00	802.25	3256.0	▇▅▁▁▁
race.whiteE	0	1.00	1279.43	1160.18	0.0	248.75	1008.50	2031.50	5156.0	▇▃▂▁▁
race.whiteM	0	1.00	322.13	252.86	2.0	141.25	280.50	440.75	2429.0	▇▂▁▁▁
race.blackE	0	1.00	1627.35	1554.90	0.0	383.75	1052.50	2666.50	7165.0	▇▂▂▁▁
race.blackM	0	1.00	503.10	430.45	14.0	206.50	398.50	679.25	3266.0	▇▂▁▁▁
race.indianE	0	1.00	14.29	52.81	0.0	0.00	0.00	7.00	658.0	▇▁▁▁▁
race.indianM	0	1.00	26.90	46.33	1.0	14.00	14.00	20.00	535.0	▇▁▁▁▁
race.asianE	0	1.00	241.52	385.04	0.0	4.00	96.00	272.75	2601.0	▇▁▁▁▁
race.asianM	0	1.00	137.27	153.44	2.0	20.00	88.50	192.75	918.0	▇▂▁▁▁
ageE	2	1.00	37.54	7.29	17.6	32.80	36.95	42.00	68.9	▁▇▆▁▁
ageM	2	1.00	5.37	3.41	0.3	3.10	4.60	6.80	18.9	▇▇▂▁▁
sex.maleE	0	1.00	1660.42	659.75	0.0	1196.75	1583.50	2064.00	4158.0	▂▇▆▂▁
sex.maleM	0	1.00	408.83	229.68	14.0	263.25	368.50	483.75	2230.0	▇▃▁▁▁
sex.femaleE	0	1.00	1779.50	735.23	0.0	1268.25	1697.50	2207.50	4423.0	▂▇▆▂▁
sex.femaleM	0	1.00	419.86	249.65	14.0	268.50	367.00	499.00	2040.0	▇▅▁▁▁
edu.less_than_highschoolE	0	1.00	175.78	182.22	0.0	43.00	121.00	254.00	1086.0	▇▂▁▁▁
edu.less_than_highschoolM	0	1.00	116.25	99.28	1.0	46.00	94.00	162.75	784.0	▇▂▁▁▁
edu.high_schoolE	0	1.00	421.71	341.77	0.0	144.25	318.00	635.50	1571.0	▇▃▂▁▁
edu.high_schoolM	0	1.00	201.48	146.26	14.0	94.00	165.50	272.00	1060.0	▇▃▁▁▁
edu.bachelorE	0	1.00	715.72	420.80	0.0	389.00	673.00	987.50	2094.0	▆▇▆▂▁
edu.bachelorM	0	1.00	241.40	136.96	14.0	152.00	218.00	291.50	1176.0	▇▅▁▁▁
edu.higher_bachelorE	0	1.00	530.86	386.35	0.0	214.00	457.00	767.00	1812.0	▇▆▃▂▁
edu.higher_bachelorM	0	1.00	191.58	112.46	12.0	114.00	172.00	244.00	814.0	▇▆▁▁▁

From data skimming, there are some missing data, lets remove them since there are only small portion of our data.

print(paste0("Before dropping NA: ", nrow(census)))

## [1] "Before dropping NA: 530"

census <- census %>% drop_na()
print(paste0("After dropping NA: ", nrow(census)))

## [1] "After dropping NA: 509"

2. Visualize Hospital Distribution on Map

tmap_mode("view");

## tmap mode set to interactive viewing

tm_shape(census) + 
  tm_borders(col = "grey") + 
  tm_shape(yelp_hospital) + 
  tm_dots(col = "orange", size = 0.1, id="name") +
  tm_layout(title = "Hospitals in Fulton and DeKalb Counties")

By looking at the map, we can see clusters of hospitals at the center of Fulton and DeKalb counties and some places spread outside the center.

3. Calculate Potential Meaningful Measures

# Population Density
# Proportion of Population by variables
census <- census %>%
  mutate(pop_density = popE / st_area(geometry),
         pct_white_pop = race.whiteE / popE,
         pct_black_pop = race.blackE / popE,
         pct_indian_pop = race.indianE / popE,
         pct_asian_pop = race.asianE / popE,
         pct_male = sex.maleE / popE,
         pct_female = sex.femaleE / popE,
         pct_edu_less_than_hs = edu.less_than_highschoolE / popE,
         pct_edu_hs = edu.high_schoolE / popE,
         pct_edu_ba = edu.bachelorE / popE,
         pct_edu_hh = edu.higher_bachelorE / popE
         )

Visualize Population Density vs Hospital POIs

tm_shape(census) + 
  tm_polygons("pop_density", style = "quantile", palette = "Blues", 
              title = "Population Density") +
  tm_shape(yelp_hospital) + 
  tm_dots(col = "orange", size = 0.1) +
  tm_layout(title = "Population Density and Hospitals")

We can see that most of hospitals are located in dense population areas.

4. Analysis on Count of Hospital per Census Tract

# Count Hospital by Census Tract

census_var_new = c(
    'pop_density',
    'pct_white_pop',
    'pct_black_pop',
    'pct_indian_pop',
    'pct_asian_pop',
    'pct_male',
    'pct_female',
    'pct_edu_less_than_hs',
    'pct_edu_hs',
    'pct_edu_ba',
    'pct_edu_hh',
    'ageE'
)

# Convert CRS to be the same as census data
yelp_hospital <- st_transform(yelp_hospital, st_crs(census))

# Ensure that the census tracts cover the point locations of the hospitals
hospitals_in_tract <- st_join(yelp_hospital, census, join = st_within)

# Count hospitals per tract based on the spatial join
hospital_count_per_tract <- hospitals_in_tract %>%
  group_by(GEOID) %>%  
  summarise(hospital_count = n()) %>% 
  st_drop_geometry()

# Merge hospital counts with the original census data
census_hospital <- census %>%
  left_join(hospital_count_per_tract, by = "GEOID")

# Replace NA with 0
census_hospital$hospital_count[is.na(census_hospital$hospital_count)] <- 0

# Create count group
census_hospital <- census_hospital %>%
  mutate(
    count_group = case_when(
      hospital_count == 0 ~ "1) No hospital",
      hospital_count <= 5 ~ "2) 1 - 5 hospitals",
      hospital_count <= 10 ~ "3) 6 - 10 hospitals",
      hospital_count <= 20 ~ "4) 11 - 20 hospitals",
      hospital_count > 20 ~ "5) more than 20 hospitals"
    )
  )

Visualize Count of Hospitals by Census Tract

tm_shape(census_hospital) + 
  tm_polygons("count_group", palette = "Greens", 
              title = "Hospital Count per Census Tract") +
  tm_layout(title = "Hospital Count per Census Tract")

From the map we can see that most of the census tracts don’t have hospitals within. Let see correlation between counts of hospitals and census variables

census_cnt_corr <- census_hospital %>%
  select(
    hospital_count,
    pop_density,
    pct_white_pop,
    pct_black_pop,
    pct_indian_pop,
    pct_asian_pop,
    pct_male,
    pct_female,
    pct_edu_less_than_hs,
    pct_edu_hs,
    pct_edu_ba,
    pct_edu_hh,
    ageE,
    hhincomeE
  ) %>% st_drop_geometry()

# Create a correlation matrix
correlation_matrix <- cor(census_cnt_corr)

print(correlation_matrix)

##                      hospital_count pop_density pct_white_pop pct_black_pop
## hospital_count          1.000000000 -0.01517637    0.08690291   -0.08157072
## pop_density            -0.015176374  1.00000000    0.11716126   -0.15914544
## pct_white_pop           0.086902912  0.11716126    1.00000000   -0.92540128
## pct_black_pop          -0.081570716 -0.15914544   -0.92540128    1.00000000
## pct_indian_pop         -0.014409910  0.04910967    0.04203995   -0.15706092
## pct_asian_pop           0.033687199  0.11371898    0.21186665   -0.47733750
## pct_male                0.026848667  0.17320322    0.18404511   -0.25342064
## pct_female             -0.026848667 -0.17320322   -0.18404511    0.25342064
## pct_edu_less_than_hs   -0.055405366 -0.06028347   -0.44634269    0.32938105
## pct_edu_hs             -0.055914499 -0.21209119   -0.71153261    0.73416799
## pct_edu_ba              0.062953096  0.07161504    0.74232583   -0.68043879
## pct_edu_hh              0.031349363  0.11890940    0.72732894   -0.69317017
## ageE                   -0.014432403 -0.33492742    0.29014762   -0.19874556
## hhincomeE              -0.009453004 -0.09521656    0.71356444   -0.66714310
##                      pct_indian_pop pct_asian_pop     pct_male   pct_female
## hospital_count          -0.01440991    0.03368720  0.026848667 -0.026848667
## pop_density              0.04910967    0.11371898  0.173203224 -0.173203224
## pct_white_pop            0.04203995    0.21186665  0.184045111 -0.184045111
## pct_black_pop           -0.15706092   -0.47733750 -0.253420639  0.253420639
## pct_indian_pop           1.00000000    0.01132637  0.075523654 -0.075523654
## pct_asian_pop            0.01132637    1.00000000  0.211386743 -0.211386743
## pct_male                 0.07552365    0.21138674  1.000000000 -1.000000000
## pct_female              -0.07552365   -0.21138674 -1.000000000  1.000000000
## pct_edu_less_than_hs     0.28427544   -0.13450174  0.001241394 -0.001241394
## pct_edu_hs              -0.01112542   -0.39493137 -0.180774036  0.180774036
## pct_edu_ba              -0.08589402    0.24474620  0.131508698 -0.131508698
## pct_edu_hh              -0.05607817    0.32785808  0.147637292 -0.147637292
## ageE                    -0.10860071   -0.01488170 -0.003952685  0.003952685
## hhincomeE               -0.06055940    0.25895970  0.134381219 -0.134381219
##                      pct_edu_less_than_hs   pct_edu_hs  pct_edu_ba  pct_edu_hh
## hospital_count               -0.055405366 -0.055914499  0.06295310  0.03134936
## pop_density                  -0.060283474 -0.212091191  0.07161504  0.11890940
## pct_white_pop                -0.446342692 -0.711532613  0.74232583  0.72732894
## pct_black_pop                 0.329381048  0.734167991 -0.68043879 -0.69317017
## pct_indian_pop                0.284275442 -0.011125424 -0.08589402 -0.05607817
## pct_asian_pop                -0.134501745 -0.394931365  0.24474620  0.32785808
## pct_male                      0.001241394 -0.180774036  0.13150870  0.14763729
## pct_female                   -0.001241394  0.180774036 -0.13150870 -0.14763729
## pct_edu_less_than_hs          1.000000000  0.455303677 -0.56762486 -0.55234627
## pct_edu_hs                    0.455303677  1.000000000 -0.66574144 -0.69979673
## pct_edu_ba                   -0.567624858 -0.665741440  1.00000000  0.68358518
## pct_edu_hh                   -0.552346272 -0.699796728  0.68358518  1.00000000
## ageE                         -0.172619036 -0.007637321  0.28885331  0.32479420
## hhincomeE                    -0.472201217 -0.634200847  0.59676166  0.61659244
##                              ageE    hhincomeE
## hospital_count       -0.014432403 -0.009453004
## pop_density          -0.334927416 -0.095216561
## pct_white_pop         0.290147617  0.713564445
## pct_black_pop        -0.198745562 -0.667143099
## pct_indian_pop       -0.108600714 -0.060559401
## pct_asian_pop        -0.014881702  0.258959695
## pct_male             -0.003952685  0.134381219
## pct_female            0.003952685 -0.134381219
## pct_edu_less_than_hs -0.172619036 -0.472201217
## pct_edu_hs           -0.007637321 -0.634200847
## pct_edu_ba            0.288853311  0.596761659
## pct_edu_hh            0.324794202  0.616592444
## ageE                  1.000000000  0.400222739
## hhincomeE             0.400222739  1.000000000

# Melt the correlation matrix into long format for plotting heatmap
melted_corr_matrix <- melt(correlation_matrix)

# Plot the heatmap
ggplot(data = melted_corr_matrix, aes(Var1, Var2, fill = value)) +
  geom_tile(color = "white") + 
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name="Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, 
                                   size = 8, hjust = 1)) +
  coord_fixed() +
  labs(title = "Correlation Heatmap - Count", x = "", y = "")

From heatmap of the correlation, there is no strong correlation between number of hospital and other variables.

We should select a more suitable measure for determining the distribution of hospital per tract.

6. Analysis on Distance to the Nearest Hospital per Tract

Calculate distance to the nearest hospital per census tract

# Convert geometries to centroids for census tracts
census_centroids <- st_centroid(census) %>%
  st_transform(crs = 32616) # Convert CRS to 32616 for measuring distance in meters

## Warning: st_centroid assumes attributes are constant over geometries

# Calculate distance to nearest hospital
distances <- st_distance(census_centroids, 
                         yelp_hospital %>% st_transform(crs = 32616))

# Get minimum distance to nearest hospital for each tract
census_hospital$min_distance_hospital <- apply(distances, 1, min)

Visualize nearest distance of tract’s centroids to the nearest hospital

tm_shape(census_hospital) + 
  tm_polygons("min_distance_hospital", style = "quantile", palette = "Blues", 
              title = "Distance to Nearest Hospital") +
  tm_shape(yelp_hospital) + 
  tm_dots(col = "orange", size = 0.1) +
  tm_layout(title = "Distance to Nearest Hospital")

Look at the distribution of the distance

ggplot(census_hospital) +
  geom_histogram(aes(x = min_distance_hospital), bins = 20) + 
  labs(title = "Histogram Distance to Nearest Hospital", x = "Distance (m)", y = "Count") +
  theme_minimal()

Most of the tracts have less than 6 miles (10km) nearest distance to hospital while there are some tract that have distance higher and is up to 15.53 miles (25km).

Take a look at the correlations between census variables and distance to hospital.

census_dist_corr <- census_hospital %>%
  select(
    min_distance_hospital,
    pop_density,
    pct_white_pop,
    pct_black_pop,
    pct_indian_pop,
    pct_asian_pop,
    pct_male,
    pct_female,
    pct_edu_less_than_hs,
    pct_edu_hs,
    pct_edu_ba,
    pct_edu_hh,
    ageE,
    hhincomeE
  ) %>% st_drop_geometry()

# Create a correlation matrix
correlation_matrix_dist <- cor(census_dist_corr)

print(correlation_matrix_dist)

##                       min_distance_hospital pop_density pct_white_pop
## min_distance_hospital            1.00000000 -0.30439882   -0.34641883
## pop_density                     -0.30439882  1.00000000    0.11716126
## pct_white_pop                   -0.34641883  0.11716126    1.00000000
## pct_black_pop                    0.38426955 -0.15914544   -0.92540128
## pct_indian_pop                  -0.11567113  0.04910967    0.04203995
## pct_asian_pop                   -0.20624509  0.11371898    0.21186665
## pct_male                        -0.10043312  0.17320322    0.18404511
## pct_female                       0.10043312 -0.17320322   -0.18404511
## pct_edu_less_than_hs             0.04567122 -0.06028347   -0.44634269
## pct_edu_hs                       0.22501376 -0.21209119   -0.71153261
## pct_edu_ba                      -0.23643005  0.07161504    0.74232583
## pct_edu_hh                      -0.27934219  0.11890940    0.72732894
## ageE                             0.08340910 -0.33492742    0.29014762
## hhincomeE                       -0.05594126 -0.09521656    0.71356444
##                       pct_black_pop pct_indian_pop pct_asian_pop     pct_male
## min_distance_hospital     0.3842695    -0.11567113   -0.20624509 -0.100433120
## pop_density              -0.1591454     0.04910967    0.11371898  0.173203224
## pct_white_pop            -0.9254013     0.04203995    0.21186665  0.184045111
## pct_black_pop             1.0000000    -0.15706092   -0.47733750 -0.253420639
## pct_indian_pop           -0.1570609     1.00000000    0.01132637  0.075523654
## pct_asian_pop            -0.4773375     0.01132637    1.00000000  0.211386743
## pct_male                 -0.2534206     0.07552365    0.21138674  1.000000000
## pct_female                0.2534206    -0.07552365   -0.21138674 -1.000000000
## pct_edu_less_than_hs      0.3293810     0.28427544   -0.13450174  0.001241394
## pct_edu_hs                0.7341680    -0.01112542   -0.39493137 -0.180774036
## pct_edu_ba               -0.6804388    -0.08589402    0.24474620  0.131508698
## pct_edu_hh               -0.6931702    -0.05607817    0.32785808  0.147637292
## ageE                     -0.1987456    -0.10860071   -0.01488170 -0.003952685
## hhincomeE                -0.6671431    -0.06055940    0.25895970  0.134381219
##                         pct_female pct_edu_less_than_hs   pct_edu_hs
## min_distance_hospital  0.100433120          0.045671223  0.225013757
## pop_density           -0.173203224         -0.060283474 -0.212091191
## pct_white_pop         -0.184045111         -0.446342692 -0.711532613
## pct_black_pop          0.253420639          0.329381048  0.734167991
## pct_indian_pop        -0.075523654          0.284275442 -0.011125424
## pct_asian_pop         -0.211386743         -0.134501745 -0.394931365
## pct_male              -1.000000000          0.001241394 -0.180774036
## pct_female             1.000000000         -0.001241394  0.180774036
## pct_edu_less_than_hs  -0.001241394          1.000000000  0.455303677
## pct_edu_hs             0.180774036          0.455303677  1.000000000
## pct_edu_ba            -0.131508698         -0.567624858 -0.665741440
## pct_edu_hh            -0.147637292         -0.552346272 -0.699796728
## ageE                   0.003952685         -0.172619036 -0.007637321
## hhincomeE             -0.134381219         -0.472201217 -0.634200847
##                        pct_edu_ba  pct_edu_hh         ageE   hhincomeE
## min_distance_hospital -0.23643005 -0.27934219  0.083409101 -0.05594126
## pop_density            0.07161504  0.11890940 -0.334927416 -0.09521656
## pct_white_pop          0.74232583  0.72732894  0.290147617  0.71356444
## pct_black_pop         -0.68043879 -0.69317017 -0.198745562 -0.66714310
## pct_indian_pop        -0.08589402 -0.05607817 -0.108600714 -0.06055940
## pct_asian_pop          0.24474620  0.32785808 -0.014881702  0.25895970
## pct_male               0.13150870  0.14763729 -0.003952685  0.13438122
## pct_female            -0.13150870 -0.14763729  0.003952685 -0.13438122
## pct_edu_less_than_hs  -0.56762486 -0.55234627 -0.172619036 -0.47220122
## pct_edu_hs            -0.66574144 -0.69979673 -0.007637321 -0.63420085
## pct_edu_ba             1.00000000  0.68358518  0.288853311  0.59676166
## pct_edu_hh             0.68358518  1.00000000  0.324794202  0.61659244
## ageE                   0.28885331  0.32479420  1.000000000  0.40022274
## hhincomeE              0.59676166  0.61659244  0.400222739  1.00000000

# Melt the correlation matrix into long format for ploting heatmap
melted_corr_matrix_dist <- melt(correlation_matrix_dist)

# Plot the heatmap
ggplot(data = melted_corr_matrix_dist, aes(Var1, Var2, fill = value)) +
  geom_tile(color = "white") + 
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name="Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, 
                                   size = 8, hjust = 1)) + 
  coord_fixed() +
  labs(title = "Correlation Heatmap - Distance", x = "", y = "")

From the Correlation Matrix, the blue cells indicate negative correlation (more values -> short distance) and the red cells show positive correlation (more value -> far away from hospitals). The variables with negative correlations are population density, percentage of white/indian/asian population, percentage of population with bachelor degree and higher, and percentage of male population.

ggplot(census_hospital, aes(x = hhincomeE, y = min_distance_hospital)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", col = "red") +
  labs(title = "Income vs Distance to Nearest Hospital", 
       x = "Household Income", 
       y = "Distance to Hospital")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(census_hospital, aes(x = pct_black_pop, y = min_distance_hospital)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", col = "blue") +
  labs(title = "Pct. of Black Population vs Distance to Nearest Hospital", 
       x = "% Black Population", 
       y = "Distance to Hospital")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(census_hospital, aes(x = pct_female, y = min_distance_hospital)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", col = "pink") +
  labs(title = "Pct. of Female Population vs Distance to Nearest Hospital", 
       x = "% Female Population", 
       y = "Distance to Hospital")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(census_hospital, aes(x = pct_edu_hs + pct_edu_less_than_hs, y = min_distance_hospital)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", col = "Green") +
  labs(title = "Pct. of High School and Less than High School Eduation Attainment \n vs Distance to Nearest Hospital", 
       x = "% Population with High School Attainment or Lower", 
       y = "Distance to Hospital")

## `geom_smooth()` using formula = 'y ~ x'

By looking at the correlation and the scatter plots with linear trends, we can see some trends on % black population and % population with high school or less than high school education attainment that increases along with the distance to hospital. This could indicate inequity of spatial distribution of hospitals.

6. Conclusion

In conclusion, when looking at the number of hospitals per census tract we cannot see any pattern or correlation because a small numbers of hospitals which leads to small dataset (most of the value is NA). While looking at the distance to the nearest hospital, we see some correlations between the distance and census variables that suggest the inequity of spatial distribution of hospitals in Fulton and DeKalb counties. Two indicators from this analysis are proportion of black population and population with high school or less than high school education attainment which can be seen in both correlation matrix heatmap and scatter plots with linear trends.