Introduction
This R Markdown document provides a brief spatial analysis of healthcare equity in Fulton and DeKalb counties, GA, using hospital POIs from Yelp along with Census Tract level 5 years estimates from ACS API.
The analysis is divided into 6 steps:
- Load Data: Yelp Hospital POIs, Census Tract Estimates
- Visualize Hospital Distribution on Map
- Prepare Potential Meaningful Measures
- Analysis on Count of Hospital per Census Tract
- Analysis on Distance to the Nearest Hospital per Tract
- Conclusion
1. Load Data
# Load Yelp Hospital Data
yelp_hospital <- st_read("https://raw.githubusercontent.com/ujhwang/urban-analytics-2024/main/Assignment/mini_3/yelp_hospital.geojson")
## Reading layer `yelp_hospital' from data source
## `https://raw.githubusercontent.com/ujhwang/urban-analytics-2024/main/Assignment/mini_3/yelp_hospital.geojson'
## using driver `GeoJSON'
## Simple feature collection with 129 features and 23 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -84.56242 ymin: 33.60009 xmax: -84.08677 ymax: 34.0701
## Geodetic CRS: WGS 84
# Select Census Variables
## household income
## total population
## population by races
## population by sex
## median age
## population by education attainment
census_var = c(hhincome = 'B19019_001',
pop = "B02001_001",
race.white = "B02001_002",
race.black = 'B02001_003',
race.indian = 'B02001_004',
race.asian = 'B02001_005',
age = 'B05004_001',
sex.male = 'B01001_002',
sex.female = 'B01001_026',
edu.less_than_highschool = 'B06009_002',
edu.high_school = 'B06009_003',
edu.bachelor = 'B06009_005',
edu.higher_bachelor = 'B06009_006')
# Load census tract data
census <- get_acs(geography = "tract", state = "GA", county = c("Fulton", "DeKalb"),
output = "wide", geometry = TRUE, year = 2022,
variables = census_var)
## Warning: • You have not set a Census API key. Users without a key are limited to 500
## queries per day and may experience performance limitations.
## ℹ For best results, get a Census API key at
## http://api.census.gov/data/key_signup.html and then supply the key to the
## `census_api_key()` function to use it throughout your tidycensus session.
## This warning is displayed once per session.
## | | | 0% | |= | 1% | |== | 2% | |== | 3% | |=== | 4% | |==== | 5% | |==== | 6% | |===== | 7% | |====== | 8% | |====== | 9% | |======= | 10% | |======== | 11% | |======== | 12% | |========= | 13% | |========== | 14% | |========== | 15% | |============== | 21% | |=============== | 21% | |================ | 23% | |================== | 26% | |=================== | 27% | |===================== | 29% | |====================== | 32% | |======================== | 35% | |========================== | 37% | |=========================== | 39% | |============================ | 40% | |=============================== | 44% | |================================ | 46% | |================================== | 49% | |==================================== | 51% | |===================================================== | 76% | |====================================================== | 77% | |======================================================= | 78% | |======================================================= | 79% | |======================================================== | 80% | |========================================================= | 82% | |========================================================== | 83% | |================================================================ | 91% | |======================================================================| 100%
skim(yelp_hospital)
## Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined
## `sfl` provided. Falling back to `character`.
Name | yelp_hospital |
Number of rows | 129 |
Number of columns | 24 |
_______________________ | |
Column type frequency: | |
character | 18 |
logical | 1 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
id | 0 | 1.00 | 22 | 22 | 0 | 129 | 0 |
alias | 0 | 1.00 | 17 | 73 | 0 | 129 | 0 |
name | 0 | 1.00 | 8 | 64 | 0 | 115 | 0 |
image_url | 0 | 1.00 | 0 | 68 | 86 | 44 | 0 |
url | 0 | 1.00 | 174 | 230 | 0 | 129 | 0 |
categories | 0 | 1.00 | 9 | 62 | 0 | 21 | 0 |
transactions | 0 | 1.00 | 0 | 0 | 129 | 1 | 0 |
phone | 0 | 1.00 | 0 | 12 | 5 | 107 | 0 |
display_phone | 0 | 1.00 | 0 | 14 | 5 | 107 | 0 |
location.address1 | 2 | 0.98 | 0 | 34 | 14 | 87 | 0 |
location.address2 | 13 | 0.90 | 0 | 7 | 100 | 15 | 0 |
location.address3 | 23 | 0.82 | 0 | 52 | 104 | 3 | 0 |
location.city | 0 | 1.00 | 6 | 14 | 0 | 13 | 0 |
location.zip_code | 0 | 1.00 | 0 | 5 | 1 | 33 | 0 |
location.country | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
location.state | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
location.display_address | 0 | 1.00 | 17 | 93 | 0 | 93 | 0 |
geometry | 0 | 1.00 | 21 | 38 | 0 | 103 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
is_closed | 0 | 1 | 0 | FAL: 129 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
review_count | 0 | 1 | 12.90 | 42.98 | 0.00 | 0.00 | 0.00 | 2.00 | 319.00 | ▇▁▁▁▁ |
rating | 0 | 1 | 1.05 | 1.47 | 0.00 | 0.00 | 0.00 | 2.00 | 5.00 | ▇▁▂▁▁ |
distance | 0 | 1 | 1188.10 | 735.72 | 204.09 | 564.81 | 1199.68 | 1647.70 | 4098.37 | ▇▇▂▁▁ |
coordinates.latitude | 0 | 1 | 33.86 | 0.12 | 33.60 | 33.77 | 33.81 | 33.92 | 34.07 | ▁▇▅▅▅ |
coordinates.longitude | 0 | 1 | -84.34 | 0.06 | -84.56 | -84.39 | -84.35 | -84.32 | -84.09 | ▁▆▇▁▁ |
skim(census)
## Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
## user-defined `sfl` provided. Falling back to `character`.
Name | census |
Number of rows | 530 |
Number of columns | 29 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 26 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
GEOID | 0 | 1 | 11 | 11 | 0 | 530 | 0 |
NAME | 0 | 1 | 38 | 43 | 0 | 530 | 0 |
geometry | 0 | 1 | 169 | 3727 | 0 | 530 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
hhincomeE | 11 | 0.98 | 94833.94 | 52684.71 | 15625.0 | 56193.00 | 81096.00 | 119728.50 | 250001.0 | ▆▇▃▂▁ |
hhincomeM | 21 | 0.96 | 25574.79 | 18745.90 | 149.0 | 13117.00 | 21335.00 | 32504.00 | 162485.0 | ▇▂▁▁▁ |
popE | 0 | 1.00 | 3439.91 | 1307.62 | 0.0 | 2515.50 | 3319.00 | 4210.00 | 7857.0 | ▁▇▇▃▁ |
popM | 0 | 1.00 | 672.81 | 409.22 | 14.0 | 419.00 | 601.00 | 802.25 | 3256.0 | ▇▅▁▁▁ |
race.whiteE | 0 | 1.00 | 1279.43 | 1160.18 | 0.0 | 248.75 | 1008.50 | 2031.50 | 5156.0 | ▇▃▂▁▁ |
race.whiteM | 0 | 1.00 | 322.13 | 252.86 | 2.0 | 141.25 | 280.50 | 440.75 | 2429.0 | ▇▂▁▁▁ |
race.blackE | 0 | 1.00 | 1627.35 | 1554.90 | 0.0 | 383.75 | 1052.50 | 2666.50 | 7165.0 | ▇▂▂▁▁ |
race.blackM | 0 | 1.00 | 503.10 | 430.45 | 14.0 | 206.50 | 398.50 | 679.25 | 3266.0 | ▇▂▁▁▁ |
race.indianE | 0 | 1.00 | 14.29 | 52.81 | 0.0 | 0.00 | 0.00 | 7.00 | 658.0 | ▇▁▁▁▁ |
race.indianM | 0 | 1.00 | 26.90 | 46.33 | 1.0 | 14.00 | 14.00 | 20.00 | 535.0 | ▇▁▁▁▁ |
race.asianE | 0 | 1.00 | 241.52 | 385.04 | 0.0 | 4.00 | 96.00 | 272.75 | 2601.0 | ▇▁▁▁▁ |
race.asianM | 0 | 1.00 | 137.27 | 153.44 | 2.0 | 20.00 | 88.50 | 192.75 | 918.0 | ▇▂▁▁▁ |
ageE | 2 | 1.00 | 37.54 | 7.29 | 17.6 | 32.80 | 36.95 | 42.00 | 68.9 | ▁▇▆▁▁ |
ageM | 2 | 1.00 | 5.37 | 3.41 | 0.3 | 3.10 | 4.60 | 6.80 | 18.9 | ▇▇▂▁▁ |
sex.maleE | 0 | 1.00 | 1660.42 | 659.75 | 0.0 | 1196.75 | 1583.50 | 2064.00 | 4158.0 | ▂▇▆▂▁ |
sex.maleM | 0 | 1.00 | 408.83 | 229.68 | 14.0 | 263.25 | 368.50 | 483.75 | 2230.0 | ▇▃▁▁▁ |
sex.femaleE | 0 | 1.00 | 1779.50 | 735.23 | 0.0 | 1268.25 | 1697.50 | 2207.50 | 4423.0 | ▂▇▆▂▁ |
sex.femaleM | 0 | 1.00 | 419.86 | 249.65 | 14.0 | 268.50 | 367.00 | 499.00 | 2040.0 | ▇▅▁▁▁ |
edu.less_than_highschoolE | 0 | 1.00 | 175.78 | 182.22 | 0.0 | 43.00 | 121.00 | 254.00 | 1086.0 | ▇▂▁▁▁ |
edu.less_than_highschoolM | 0 | 1.00 | 116.25 | 99.28 | 1.0 | 46.00 | 94.00 | 162.75 | 784.0 | ▇▂▁▁▁ |
edu.high_schoolE | 0 | 1.00 | 421.71 | 341.77 | 0.0 | 144.25 | 318.00 | 635.50 | 1571.0 | ▇▃▂▁▁ |
edu.high_schoolM | 0 | 1.00 | 201.48 | 146.26 | 14.0 | 94.00 | 165.50 | 272.00 | 1060.0 | ▇▃▁▁▁ |
edu.bachelorE | 0 | 1.00 | 715.72 | 420.80 | 0.0 | 389.00 | 673.00 | 987.50 | 2094.0 | ▆▇▆▂▁ |
edu.bachelorM | 0 | 1.00 | 241.40 | 136.96 | 14.0 | 152.00 | 218.00 | 291.50 | 1176.0 | ▇▅▁▁▁ |
edu.higher_bachelorE | 0 | 1.00 | 530.86 | 386.35 | 0.0 | 214.00 | 457.00 | 767.00 | 1812.0 | ▇▆▃▂▁ |
edu.higher_bachelorM | 0 | 1.00 | 191.58 | 112.46 | 12.0 | 114.00 | 172.00 | 244.00 | 814.0 | ▇▆▁▁▁ |
From data skimming, there are some missing data, lets remove them since there are only small portion of our data.
print(paste0("Before dropping NA: ", nrow(census)))
## [1] "Before dropping NA: 530"
census <- census %>% drop_na()
print(paste0("After dropping NA: ", nrow(census)))
## [1] "After dropping NA: 509"
2. Visualize Hospital Distribution on Map
tmap_mode("view");
## tmap mode set to interactive viewing
tm_shape(census) +
tm_borders(col = "grey") +
tm_shape(yelp_hospital) +
tm_dots(col = "orange", size = 0.1, id="name") +
tm_layout(title = "Hospitals in Fulton and DeKalb Counties")
By looking at the map, we can see clusters of hospitals at the center of Fulton and DeKalb counties and some places spread outside the center.
3. Calculate Potential Meaningful Measures
# Population Density
# Proportion of Population by variables
census <- census %>%
mutate(pop_density = popE / st_area(geometry),
pct_white_pop = race.whiteE / popE,
pct_black_pop = race.blackE / popE,
pct_indian_pop = race.indianE / popE,
pct_asian_pop = race.asianE / popE,
pct_male = sex.maleE / popE,
pct_female = sex.femaleE / popE,
pct_edu_less_than_hs = edu.less_than_highschoolE / popE,
pct_edu_hs = edu.high_schoolE / popE,
pct_edu_ba = edu.bachelorE / popE,
pct_edu_hh = edu.higher_bachelorE / popE
)
Visualize Population Density vs Hospital POIs
tm_shape(census) +
tm_polygons("pop_density", style = "quantile", palette = "Blues",
title = "Population Density") +
tm_shape(yelp_hospital) +
tm_dots(col = "orange", size = 0.1) +
tm_layout(title = "Population Density and Hospitals")
We can see that most of hospitals are located in dense population areas.
4. Analysis on Count of Hospital per Census Tract
# Count Hospital by Census Tract
census_var_new = c(
'pop_density',
'pct_white_pop',
'pct_black_pop',
'pct_indian_pop',
'pct_asian_pop',
'pct_male',
'pct_female',
'pct_edu_less_than_hs',
'pct_edu_hs',
'pct_edu_ba',
'pct_edu_hh',
'ageE'
)
# Convert CRS to be the same as census data
yelp_hospital <- st_transform(yelp_hospital, st_crs(census))
# Ensure that the census tracts cover the point locations of the hospitals
hospitals_in_tract <- st_join(yelp_hospital, census, join = st_within)
# Count hospitals per tract based on the spatial join
hospital_count_per_tract <- hospitals_in_tract %>%
group_by(GEOID) %>%
summarise(hospital_count = n()) %>%
st_drop_geometry()
# Merge hospital counts with the original census data
census_hospital <- census %>%
left_join(hospital_count_per_tract, by = "GEOID")
# Replace NA with 0
census_hospital$hospital_count[is.na(census_hospital$hospital_count)] <- 0
# Create count group
census_hospital <- census_hospital %>%
mutate(
count_group = case_when(
hospital_count == 0 ~ "1) No hospital",
hospital_count <= 5 ~ "2) 1 - 5 hospitals",
hospital_count <= 10 ~ "3) 6 - 10 hospitals",
hospital_count <= 20 ~ "4) 11 - 20 hospitals",
hospital_count > 20 ~ "5) more than 20 hospitals"
)
)
Visualize Count of Hospitals by Census Tract
tm_shape(census_hospital) +
tm_polygons("count_group", palette = "Greens",
title = "Hospital Count per Census Tract") +
tm_layout(title = "Hospital Count per Census Tract")
From the map we can see that most of the census tracts don’t have hospitals within. Let see correlation between counts of hospitals and census variables
census_cnt_corr <- census_hospital %>%
select(
hospital_count,
pop_density,
pct_white_pop,
pct_black_pop,
pct_indian_pop,
pct_asian_pop,
pct_male,
pct_female,
pct_edu_less_than_hs,
pct_edu_hs,
pct_edu_ba,
pct_edu_hh,
ageE,
hhincomeE
) %>% st_drop_geometry()
# Create a correlation matrix
correlation_matrix <- cor(census_cnt_corr)
print(correlation_matrix)
## hospital_count pop_density pct_white_pop pct_black_pop
## hospital_count 1.000000000 -0.01517637 0.08690291 -0.08157072
## pop_density -0.015176374 1.00000000 0.11716126 -0.15914544
## pct_white_pop 0.086902912 0.11716126 1.00000000 -0.92540128
## pct_black_pop -0.081570716 -0.15914544 -0.92540128 1.00000000
## pct_indian_pop -0.014409910 0.04910967 0.04203995 -0.15706092
## pct_asian_pop 0.033687199 0.11371898 0.21186665 -0.47733750
## pct_male 0.026848667 0.17320322 0.18404511 -0.25342064
## pct_female -0.026848667 -0.17320322 -0.18404511 0.25342064
## pct_edu_less_than_hs -0.055405366 -0.06028347 -0.44634269 0.32938105
## pct_edu_hs -0.055914499 -0.21209119 -0.71153261 0.73416799
## pct_edu_ba 0.062953096 0.07161504 0.74232583 -0.68043879
## pct_edu_hh 0.031349363 0.11890940 0.72732894 -0.69317017
## ageE -0.014432403 -0.33492742 0.29014762 -0.19874556
## hhincomeE -0.009453004 -0.09521656 0.71356444 -0.66714310
## pct_indian_pop pct_asian_pop pct_male pct_female
## hospital_count -0.01440991 0.03368720 0.026848667 -0.026848667
## pop_density 0.04910967 0.11371898 0.173203224 -0.173203224
## pct_white_pop 0.04203995 0.21186665 0.184045111 -0.184045111
## pct_black_pop -0.15706092 -0.47733750 -0.253420639 0.253420639
## pct_indian_pop 1.00000000 0.01132637 0.075523654 -0.075523654
## pct_asian_pop 0.01132637 1.00000000 0.211386743 -0.211386743
## pct_male 0.07552365 0.21138674 1.000000000 -1.000000000
## pct_female -0.07552365 -0.21138674 -1.000000000 1.000000000
## pct_edu_less_than_hs 0.28427544 -0.13450174 0.001241394 -0.001241394
## pct_edu_hs -0.01112542 -0.39493137 -0.180774036 0.180774036
## pct_edu_ba -0.08589402 0.24474620 0.131508698 -0.131508698
## pct_edu_hh -0.05607817 0.32785808 0.147637292 -0.147637292
## ageE -0.10860071 -0.01488170 -0.003952685 0.003952685
## hhincomeE -0.06055940 0.25895970 0.134381219 -0.134381219
## pct_edu_less_than_hs pct_edu_hs pct_edu_ba pct_edu_hh
## hospital_count -0.055405366 -0.055914499 0.06295310 0.03134936
## pop_density -0.060283474 -0.212091191 0.07161504 0.11890940
## pct_white_pop -0.446342692 -0.711532613 0.74232583 0.72732894
## pct_black_pop 0.329381048 0.734167991 -0.68043879 -0.69317017
## pct_indian_pop 0.284275442 -0.011125424 -0.08589402 -0.05607817
## pct_asian_pop -0.134501745 -0.394931365 0.24474620 0.32785808
## pct_male 0.001241394 -0.180774036 0.13150870 0.14763729
## pct_female -0.001241394 0.180774036 -0.13150870 -0.14763729
## pct_edu_less_than_hs 1.000000000 0.455303677 -0.56762486 -0.55234627
## pct_edu_hs 0.455303677 1.000000000 -0.66574144 -0.69979673
## pct_edu_ba -0.567624858 -0.665741440 1.00000000 0.68358518
## pct_edu_hh -0.552346272 -0.699796728 0.68358518 1.00000000
## ageE -0.172619036 -0.007637321 0.28885331 0.32479420
## hhincomeE -0.472201217 -0.634200847 0.59676166 0.61659244
## ageE hhincomeE
## hospital_count -0.014432403 -0.009453004
## pop_density -0.334927416 -0.095216561
## pct_white_pop 0.290147617 0.713564445
## pct_black_pop -0.198745562 -0.667143099
## pct_indian_pop -0.108600714 -0.060559401
## pct_asian_pop -0.014881702 0.258959695
## pct_male -0.003952685 0.134381219
## pct_female 0.003952685 -0.134381219
## pct_edu_less_than_hs -0.172619036 -0.472201217
## pct_edu_hs -0.007637321 -0.634200847
## pct_edu_ba 0.288853311 0.596761659
## pct_edu_hh 0.324794202 0.616592444
## ageE 1.000000000 0.400222739
## hhincomeE 0.400222739 1.000000000
# Melt the correlation matrix into long format for plotting heatmap
melted_corr_matrix <- melt(correlation_matrix)
# Plot the heatmap
ggplot(data = melted_corr_matrix, aes(Var1, Var2, fill = value)) +
geom_tile(color = "white") +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), space = "Lab",
name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 8, hjust = 1)) +
coord_fixed() +
labs(title = "Correlation Heatmap - Count", x = "", y = "")
From heatmap of the correlation, there is no strong correlation between
number of hospital and other variables.
We should select a more suitable measure for determining the distribution of hospital per tract.
6. Analysis on Distance to the Nearest Hospital per Tract
Calculate distance to the nearest hospital per census tract
# Convert geometries to centroids for census tracts
census_centroids <- st_centroid(census) %>%
st_transform(crs = 32616) # Convert CRS to 32616 for measuring distance in meters
## Warning: st_centroid assumes attributes are constant over geometries
# Calculate distance to nearest hospital
distances <- st_distance(census_centroids,
yelp_hospital %>% st_transform(crs = 32616))
# Get minimum distance to nearest hospital for each tract
census_hospital$min_distance_hospital <- apply(distances, 1, min)
Visualize nearest distance of tract’s centroids to the nearest hospital
tm_shape(census_hospital) +
tm_polygons("min_distance_hospital", style = "quantile", palette = "Blues",
title = "Distance to Nearest Hospital") +
tm_shape(yelp_hospital) +
tm_dots(col = "orange", size = 0.1) +
tm_layout(title = "Distance to Nearest Hospital")
Look at the distribution of the distance
ggplot(census_hospital) +
geom_histogram(aes(x = min_distance_hospital), bins = 20) +
labs(title = "Histogram Distance to Nearest Hospital", x = "Distance (m)", y = "Count") +
theme_minimal()
Most of the tracts have less than 6 miles (10km) nearest distance to
hospital while there are some tract that have distance higher and is up
to 15.53 miles (25km).
Take a look at the correlations between census variables and distance to hospital.
census_dist_corr <- census_hospital %>%
select(
min_distance_hospital,
pop_density,
pct_white_pop,
pct_black_pop,
pct_indian_pop,
pct_asian_pop,
pct_male,
pct_female,
pct_edu_less_than_hs,
pct_edu_hs,
pct_edu_ba,
pct_edu_hh,
ageE,
hhincomeE
) %>% st_drop_geometry()
# Create a correlation matrix
correlation_matrix_dist <- cor(census_dist_corr)
print(correlation_matrix_dist)
## min_distance_hospital pop_density pct_white_pop
## min_distance_hospital 1.00000000 -0.30439882 -0.34641883
## pop_density -0.30439882 1.00000000 0.11716126
## pct_white_pop -0.34641883 0.11716126 1.00000000
## pct_black_pop 0.38426955 -0.15914544 -0.92540128
## pct_indian_pop -0.11567113 0.04910967 0.04203995
## pct_asian_pop -0.20624509 0.11371898 0.21186665
## pct_male -0.10043312 0.17320322 0.18404511
## pct_female 0.10043312 -0.17320322 -0.18404511
## pct_edu_less_than_hs 0.04567122 -0.06028347 -0.44634269
## pct_edu_hs 0.22501376 -0.21209119 -0.71153261
## pct_edu_ba -0.23643005 0.07161504 0.74232583
## pct_edu_hh -0.27934219 0.11890940 0.72732894
## ageE 0.08340910 -0.33492742 0.29014762
## hhincomeE -0.05594126 -0.09521656 0.71356444
## pct_black_pop pct_indian_pop pct_asian_pop pct_male
## min_distance_hospital 0.3842695 -0.11567113 -0.20624509 -0.100433120
## pop_density -0.1591454 0.04910967 0.11371898 0.173203224
## pct_white_pop -0.9254013 0.04203995 0.21186665 0.184045111
## pct_black_pop 1.0000000 -0.15706092 -0.47733750 -0.253420639
## pct_indian_pop -0.1570609 1.00000000 0.01132637 0.075523654
## pct_asian_pop -0.4773375 0.01132637 1.00000000 0.211386743
## pct_male -0.2534206 0.07552365 0.21138674 1.000000000
## pct_female 0.2534206 -0.07552365 -0.21138674 -1.000000000
## pct_edu_less_than_hs 0.3293810 0.28427544 -0.13450174 0.001241394
## pct_edu_hs 0.7341680 -0.01112542 -0.39493137 -0.180774036
## pct_edu_ba -0.6804388 -0.08589402 0.24474620 0.131508698
## pct_edu_hh -0.6931702 -0.05607817 0.32785808 0.147637292
## ageE -0.1987456 -0.10860071 -0.01488170 -0.003952685
## hhincomeE -0.6671431 -0.06055940 0.25895970 0.134381219
## pct_female pct_edu_less_than_hs pct_edu_hs
## min_distance_hospital 0.100433120 0.045671223 0.225013757
## pop_density -0.173203224 -0.060283474 -0.212091191
## pct_white_pop -0.184045111 -0.446342692 -0.711532613
## pct_black_pop 0.253420639 0.329381048 0.734167991
## pct_indian_pop -0.075523654 0.284275442 -0.011125424
## pct_asian_pop -0.211386743 -0.134501745 -0.394931365
## pct_male -1.000000000 0.001241394 -0.180774036
## pct_female 1.000000000 -0.001241394 0.180774036
## pct_edu_less_than_hs -0.001241394 1.000000000 0.455303677
## pct_edu_hs 0.180774036 0.455303677 1.000000000
## pct_edu_ba -0.131508698 -0.567624858 -0.665741440
## pct_edu_hh -0.147637292 -0.552346272 -0.699796728
## ageE 0.003952685 -0.172619036 -0.007637321
## hhincomeE -0.134381219 -0.472201217 -0.634200847
## pct_edu_ba pct_edu_hh ageE hhincomeE
## min_distance_hospital -0.23643005 -0.27934219 0.083409101 -0.05594126
## pop_density 0.07161504 0.11890940 -0.334927416 -0.09521656
## pct_white_pop 0.74232583 0.72732894 0.290147617 0.71356444
## pct_black_pop -0.68043879 -0.69317017 -0.198745562 -0.66714310
## pct_indian_pop -0.08589402 -0.05607817 -0.108600714 -0.06055940
## pct_asian_pop 0.24474620 0.32785808 -0.014881702 0.25895970
## pct_male 0.13150870 0.14763729 -0.003952685 0.13438122
## pct_female -0.13150870 -0.14763729 0.003952685 -0.13438122
## pct_edu_less_than_hs -0.56762486 -0.55234627 -0.172619036 -0.47220122
## pct_edu_hs -0.66574144 -0.69979673 -0.007637321 -0.63420085
## pct_edu_ba 1.00000000 0.68358518 0.288853311 0.59676166
## pct_edu_hh 0.68358518 1.00000000 0.324794202 0.61659244
## ageE 0.28885331 0.32479420 1.000000000 0.40022274
## hhincomeE 0.59676166 0.61659244 0.400222739 1.00000000
# Melt the correlation matrix into long format for ploting heatmap
melted_corr_matrix_dist <- melt(correlation_matrix_dist)
# Plot the heatmap
ggplot(data = melted_corr_matrix_dist, aes(Var1, Var2, fill = value)) +
geom_tile(color = "white") +
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1, 1), space = "Lab",
name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 8, hjust = 1)) +
coord_fixed() +
labs(title = "Correlation Heatmap - Distance", x = "", y = "")
From the Correlation Matrix, the blue cells indicate negative
correlation (more values -> short distance) and the red cells show
positive correlation (more value -> far away from hospitals). The
variables with negative correlations are population density, percentage
of white/indian/asian population, percentage of population with bachelor
degree and higher, and percentage of male population.
ggplot(census_hospital, aes(x = hhincomeE, y = min_distance_hospital)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", col = "red") +
labs(title = "Income vs Distance to Nearest Hospital",
x = "Household Income",
y = "Distance to Hospital")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(census_hospital, aes(x = pct_black_pop, y = min_distance_hospital)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", col = "blue") +
labs(title = "Pct. of Black Population vs Distance to Nearest Hospital",
x = "% Black Population",
y = "Distance to Hospital")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(census_hospital, aes(x = pct_female, y = min_distance_hospital)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", col = "pink") +
labs(title = "Pct. of Female Population vs Distance to Nearest Hospital",
x = "% Female Population",
y = "Distance to Hospital")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(census_hospital, aes(x = pct_edu_hs + pct_edu_less_than_hs, y = min_distance_hospital)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", col = "Green") +
labs(title = "Pct. of High School and Less than High School Eduation Attainment \n vs Distance to Nearest Hospital",
x = "% Population with High School Attainment or Lower",
y = "Distance to Hospital")
## `geom_smooth()` using formula = 'y ~ x'
By looking at the correlation and the scatter plots with linear trends,
we can see some trends on % black population and % population with high
school or less than high school education attainment that increases
along with the distance to hospital. This could indicate inequity of
spatial distribution of hospitals.
6. Conclusion
In conclusion, when looking at the number of hospitals per census tract we cannot see any pattern or correlation because a small numbers of hospitals which leads to small dataset (most of the value is NA). While looking at the distance to the nearest hospital, we see some correlations between the distance and census variables that suggest the inequity of spatial distribution of hospitals in Fulton and DeKalb counties. Two indicators from this analysis are proportion of black population and population with high school or less than high school education attainment which can be seen in both correlation matrix heatmap and scatter plots with linear trends.