Setup
library(knitr)
library(rmdformats)
library(broom)
## Global options
opts_chunk$set(comment=NA)
opts_knit$set(width=75)
Package Loading
library(Rcpp)
library(tmap)
library(tmaptools)
library(sf)
library(biscale)
library(insurancerating)
library(stringr)
library(cowplot)
library(cartogram)
library(Hmisc)
library(magrittr)
library(janitor)
# add other packages here as desired
library(tibble)
library(tidyverse)
Import the oh_counties_2020 Data and apply clean_names()
Here, I imported the data from the csv file, used clean_names() from the janitor package to remove “-” from the variable names , replaced them with “_” and ensured all of the names are in snake case, which is the default option of the clean_names function. I also set fips as a character data type because it an id string and not a numeric value. Finally, I selected the variables county,food_insecure, african_am, population and displayed the tibble, which has 88 observations and 4 variable.
oh20 <- read_csv("/Users/ddmsel/Desktop/PQHS432/data/oh_counties_2020.csv")
oh20 <- oh20 %>%
mutate(fips = as.character(fips))%>%
clean_names()%>%
arrange(county)%>%
select(county,food_insecure, african_am,population)
oh20
1 Map
1.1 Import and Filter County Map Data (.shp) from census.gov
Below, is the code used to make a county level map comparing race and income in Ohio. First, I imported the lines or county borders for the entire US in 2020 from the .shp file I downloaded from census.gov.[1] I then filtered the dataframe by the STATEFP variable == to “39” to get only the county borders for Ohio. Finally, I applied the clean names function to the dateframe to convert all of the variable names to snake_case.
oh_county_shp <- st_read(dsn ="/Users/ddmsel/Desktop/PQHS432/data/cb_2020_us_county_500k.shp")
Reading layer `cb_2020_us_county_500k' from data source
`/Users/ddmsel/Desktop/PQHS432/data/cb_2020_us_county_500k.shp'
using driver `ESRI Shapefile'
Simple feature collection with 3234 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
Geodetic CRS: NAD83
oh_county_shp <- oh_county_shp %>%
filter(., STATEFP == "39") %>%
clean_names()%>%
arrange(name)%>%
select(name,geometry)
Below is the dataframe with 88 rows corresponing to the 88 counties in Ohio and 18 variables including the geometry variable, which shows the boundaries for the counties.
oh_county_shp
Here, I did a sanity check to see if the counties were in the same order, had the same amount of elements and that the elements in each dataframe were identical.
str(oh_county_shp$name)
chr [1:88] "Adams" "Allen" "Ashland" "Ashtabula" "Athens" "Auglaize" ...
str(oh20$county)
chr [1:88] "Adams" "Allen" "Ashland" "Ashtabula" "Athens" "Auglaize" ...
identical(oh_county_shp$name,oh20$county)
[1] TRUE
1.2 Join oh_county_shp and oh20
Next, I joined the oh20, County Health Data, to the oh_county_shp using the dplyr package lefft_join. Note in this case left_join was needed so that the resulting mergeded dataframe was not a tibble becuase tibbles do not support the geometry column needed for mapping.
oh20 <- oh_county_shp %>%
left_join(oh20, by = c("name"="county"))
oh20
Using the describe fuction of the Hmisc package, I generated some summary statistics of the food_insecure population and the african-am varaible. Thre is no missing data for any variable and the rages of values seem reasonable for each percentage.
describe(oh20$food_insecure)
oh20$food_insecure
n missing distinct Info Mean Gmd .05 .10
88 0 55 0.999 13.46 2.722 9.60 10.40
.25 .50 .75 .90 .95
11.80 13.50 15.03 16.43 17.20
lowest : 8.1 8.9 9.2 9.6 9.7, highest: 17.2 17.4 17.5 18.4 19.3
describe(oh20$african_am)
oh20$african_am
n missing distinct Info Mean Gmd .05 .10
88 0 83 1 4.254 5.078 0.4775 0.6040
.25 .50 .75 .90 .95
0.9075 2.1200 4.4150 8.9120 18.0075
lowest : 0.34 0.37 0.38 0.44 0.46, highest: 19.60 21.09 22.90 26.21 29.61
describe(oh20$population)
oh20$population
n missing distinct Info Mean Gmd .05 .10
88 0 88 1 132835 160359 16429 27084
.25 .50 .75 .90 .95
36760 58295 123498 255359 496480
lowest : 13139 13790 14354 14604 15174
highest: 532331 541918 816684 1243857 1310300
Below, I use the bi_class fuction of the biscale package to convert the percentiles of food_insecure and african_am to a biscale ranging from 1-1 indicating a low percentage of African Americans (Black) and a low percentage of food insecurity and 3-3 indicating a high percentage of food insecurity and a high level of African Americans. Breaks for the classification can be specificied with the style parameter as “quintile” or “fisher”. To better approximate the natural clustering in race and food insecurity I used fisher breaks.
data <- bi_class(oh20, x = african_am, y = food_insecure, style = "fisher", dim = 3)
data <- st_transform(data, 2163)
data%>% arrange(desc(bi_class))
The frequncy of each biclass is listed below.
describe(data$bi_class)
data$bi_class
n missing distinct
88 0 7
lowest : 1-1 1-2 1-3 2-1 2-2, highest: 1-3 2-1 2-2 2-3 3-3
Value 1-1 1-2 1-3 2-1 2-2 2-3 3-3
Frequency 32 23 12 2 11 3 5
Proportion 0.364 0.261 0.136 0.023 0.125 0.034 0.057
The levels of the breaks for food_insecure are [8.1,12.7] (12.7,15.3] (15.3,19.3] and are [0.34,4.91] (4.91,17.3] (17.3,29.6] for african_am.
fisher(oh20$food_insecure, n = 3, diglab = 3)[[1]]
[1] (15.3,19.3]
Levels: [8.1,12.7] (12.7,15.3] (15.3,19.3]
fisher(oh20$african_am, n = 3, diglab = 3)[[1]]
[1] [0.34,4.91]
Levels: [0.34,4.91] (4.91,17.3] (17.3,29.6]
1.3 Map of Race and Food Insecurity in 2020 by County
Below is the plot of the bivarite legend and the map using the the geom_sf package of ggplot.
legend <- bi_legend(pal = "DkBlue",
dim = 3,
xlab = "% Black",
ylab = "% Food Insecure",
size = 7)
# create map
map <- ggplot() +
geom_sf(data = data, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
bi_scale_fill(pal = "DkBlue", dim = 3) +
bi_theme()
To draw the “final plot” the cowplot package was used to combine the plot and the legend and the the first final plot was displayed below. While this map did show the relationship among race, food insecurity and the location of the county, population of the county also seemed to be an important variable and was not conveyed by this plot.
# combine map with legend
finalPlot1 <- ggdraw() +
draw_plot(map, x = 0, y = .1, width = .9, height = .9, scale = 1) +
draw_plot(legend, 0.2, 0.1, 1, .2)
finalPlot1
1.4 Map of Race, Food Insecurity and Polulation
The plot above conveyed three variables: percentage of Blacks, percentage of food insecurity and the spatial location of the county. Population seemed to be important to the trend among these variables. To add population to the map I decided to resize the shapes of the county bondaries based on population using the cartogram pacakge. Below is the code to transform the map polygons from a geographic to a projected coordinate system using st_transform(), which is needed for the cartogram. Then I reassigned the bi_class to the new dataframe as I did above. Finally, I used the cartogram_ncont() package to resize the county polygons proportionally based on the “population” of the county.
oh_map = st_transform(oh20, 2163)
oh_map <- bi_class(oh_map, x = african_am, y = food_insecure, style = "fisher", dim = 3)
oh_map_cart = cartogram_ncont(oh_map, "population", k = 1, inplace = TRUE)
The plot of the new map is created using ggplot in a method similar to above.
map2 <- ggplot() +
geom_sf(data = oh_map_cart, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
bi_scale_fill(pal = "DkBlue", dim = 3) +
bi_theme()
# combine map with legend
finalPlot2 <- ggdraw() +
draw_plot(map2, x = 0, y = .1, width = .9, height = .9, scale = 1) +
draw_plot(legend, 0.2, 0.1, 1, 0.2)
finalPlot2
1.5 Figure1 Final Combined and Labeled Map
The code below usese the cowplot() package to combine the plots, legend and add a title to the plot. The final combined plot is displayed below and labled Figure1.
plot_grid<- plot_grid(map, map2, legend, ncol = 3, scale = 1, labels = c('A', 'B', ''), rel_widths= c(1, 1, .5))
# now add the title
title <- ggdraw() +
draw_label(
"Figure 1: Spatial Relationship among Race, Food Insecurity and Population Level\nfrom Ohio County Health Ranking Data 2020 ",
fontface = 'bold',
x = 0,vjust=.5,
hjust = 0,
size = 13
) +
theme(
# add margin on the left of the drawing canvas,
# so title is aligned with left edge of first plot
plot.margin = margin(0, 0, 0, 7)
)
caption <- ggdraw() +
draw_label(str_wrap(
"Percentage Black and percentage food insecure are shown as a biscale. Blue indicates high Black % and high % of food insecurity. Population of the county is shown by the relative size of the county outline in B. High percentages of food insecurity and high percentages Black residents are in populous counties. Additionally, there is a cluster of largely white counties with high food insecurity South-East Ohio along the West Virginia border. (Breaks for % Black are 4.9% and 17.3% and breaks for % food insecure are 12.7% and 15.3%)",width = 140),
x = 0,
hjust = 0,
size = 8
)
plot_grid(
title, plot_grid,caption,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.3, 1)
)
Figure 1: Percentage Black and percentage food insecure are shown as a biscale. Blue indicates high Black % and high % of food insecurity. Population of the county is shown by the relative size of the county outline in B. High percentages of food insecurity and high percentages Black residents are in populous counties. Additionally, there is a cluster of largely white counties with high food insecurity South-East Ohio along the West Virginia border. (Fisher Breaks for % Black are 4.9% and 17.3% and breaks for % food insecure are 12.7% and 15.3%)
#References
[1] https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html
Session Information
xfun::session_info()
R version 4.1.3 (2022-03-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.2.1
Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
Package version:
abind_1.4-5 ape_5.6.2 arm_1.12-2
askpass_1.1 assertthat_0.2.1 backports_1.4.1
base64enc_0.1-3 biscale_0.2.0 bit_4.0.4
bit64_4.0.5 blob_1.2.2 bookdown_0.24
boot_1.3-28 brio_1.1.3 broom_0.7.12
bslib_0.3.1 cachem_1.0.6 callr_3.7.0
cartogram_0.2.2 cellranger_1.1.0 checkmate_2.0.0
ciTools_0.6.1 class_7.3-20 classInt_0.4-3
cli_3.2.0 clipr_0.8.0 cluster_2.1.2
coda_0.19-4 codetools_0.2-18 colorspace_2.0-3
commonmark_1.8.0 compiler_4.1.3 cowplot_1.1.1
cpp11_0.4.2 crayon_1.5.0 crosstalk_1.2.0
curl_4.3.2 data.table_1.14.2 DBI_1.1.2
dbplyr_2.1.1 desc_1.4.1 DHARMa_0.4.5
dichromat_2.0-0 diffobj_0.3.5 digest_0.6.29
doParallel_1.0.17 dplyr_1.0.8 dtplyr_1.2.1
e1071_1.7-9 ellipsis_0.3.2 evaluate_0.15
evtree_1.0-8 fansi_1.0.2 farver_2.1.0
fastmap_1.1.0 fitdistrplus_1.1-8 fontawesome_0.2.2
forcats_0.5.1 foreach_1.5.2 foreign_0.8-82
Formula_1.2-4 fs_1.5.2 gap_1.2.3.1
gargle_1.2.0 generics_0.1.2 geojsonsf_2.0.2
geometries_0.2.0 ggplot2_3.3.5 glue_1.6.2
googledrive_2.0.0 googlesheets4_1.0.0 graphics_4.1.3
grDevices_4.1.3 grid_4.1.3 gridExtra_2.3
gtable_0.3.0 haven_2.4.3 highr_0.9
Hmisc_4.6-0 hms_1.1.1 htmlTable_2.4.0
htmltools_0.5.2 htmlwidgets_1.5.4 httpuv_1.6.5
httr_1.4.2 ids_1.0.1 insight_0.16.0
insurancerating_0.6.9 inum_1.0-4 isoband_0.2.5
iterators_1.0.14 janitor_2.1.0 jpeg_0.1-9
jquerylib_0.1.4 jsonify_1.2.1 jsonlite_1.8.0
KernSmooth_2.23-20 knitr_1.37 labeling_0.4.2
later_1.3.0 lattice_0.20-45 latticeExtra_0.6-29
lazyeval_0.2.2 leafem_0.1.6 leaflet_2.1.0
leaflet.providers_1.9.0 leafsync_0.1.0 libcoin_1.0-9
lifecycle_1.0.1 lme4_1.1-28 lmtest_0.9.39
lubridate_1.8.0 lwgeom_0.2-8 magrittr_2.0.2
markdown_1.1 MASS_7.3-55 Matrix_1.4-0
methods_4.1.3 mgcv_1.8-39 mime_0.12
minqa_1.2.4 modelr_0.1.8 munsell_0.5.0
mvtnorm_1.1-3 nlme_3.1-155 nloptr_2.0.0
nnet_7.3-17 openssl_2.0.0 packcircles_0.3.4
parallel_4.1.3 partykit_1.2-15 patchwork_1.1.1
pillar_1.7.0 pkgconfig_2.0.3 pkgload_1.2.4
plyr_1.8.6 png_0.1-7 praise_1.0.0
prettyunits_1.1.1 processx_3.5.2 progress_1.2.2
promises_1.2.0.1 proxy_0.4-26 ps_1.6.0
purrr_0.3.4 qgam_1.3.4 R6_2.5.1
rapidjsonr_1.2.0 rappdirs_0.3.3 raster_3.5-15
RColorBrewer_1.1-2 Rcpp_1.0.8 RcppEigen_0.3.3.9.1
readr_2.1.2 readxl_1.3.1 rematch_1.0.1
rematch2_2.1.2 reprex_2.0.1 rlang_1.0.2
rmarkdown_2.13 rmdformats_1.0.3 rpart_4.1.16
rprojroot_2.0.2 rstudioapi_0.13 rvest_1.0.2
s2_1.0.7 sass_0.4.0 scales_1.1.1
selectr_0.4.2 sf_1.0-7 sfheaders_0.4.0
shiny_1.7.1 snakecase_0.11.0 sourcetools_0.1.7
sp_1.4-6 splines_4.1.3 stars_0.5-5
stats_4.1.3 stringi_1.7.6 stringr_1.4.0
survival_3.3-1 sys_3.4 terra_1.5-21
testthat_3.1.2 tibble_3.1.6 tidyr_1.2.0
tidyselect_1.1.2 tidyverse_1.3.1 tinytex_0.37
tmap_3.3-3 tmaptools_3.1-1 tools_4.1.3
tzdb_0.2.0 units_0.8-0 utf8_1.2.2
utils_4.1.3 uuid_1.0.3 vctrs_0.3.8
viridis_0.6.2 viridisLite_0.4.0 vroom_1.5.7
waldo_0.3.1 widgetframe_0.3.1 withr_2.5.0
wk_0.6.0 xfun_0.30 XML_3.99-0.9
xml2_1.3.3 xtable_1.8.4 yaml_2.3.5
zoo_1.8.9