An individual’s socioeconomic status will often indicate their access (or lack there of) to preventative health measures. In this project, the relationship between the amount of mammograms per a given population to the median income of the given population. It is important to understand this relationship in order to promote policy changes in favor of those who lack access to preventative healthcare. Furthermore, this may encourage further research into healthcare access in general to North Carolinians.
#Call on libraries
library(tidyverse)
library(sf)
library(tigris)
library(ggplot2)
library(tmap)
library(spdep)
library(knitr)
library(dplyr)
library(tidyr)### Relative File Path
dat_path <- file.path("../Code", "analytic_data2023_0.csv")
### Read data
dat <- read_csv(dat_path)
### Ensure all data used is in numeric form
dat$`Mammography Screening raw value` <- as.numeric(dat$`Mammography Screening raw value`)
### Rename columns
dat <- dat |>
rename(mamm_raw = `Mammography Screening raw value`,
pop = `Population raw value`,
state = `State Abbreviation`,
county = `Name`)
### Select columns that you need to use
dat<- dat |>
select(mamm_raw,
pop,
state,
county)
### Filter by state
dat <- dat |>
filter(state == "NC")
### Mutate data to create equation for the percentage of the population that has gotten a mammography
dat <- dat |>
mutate(per_pop_mamm = mamm_raw * 100)
### Select spatial data layer
nc_cnty <- read_sf("../Code/nc_county_bound.gpkg")
## Table join spatial layer with main data layer
dat_nc_counties <- left_join(nc_cnty, dat[-1,], by = c("NAMELSAD" = "county"))The main data for this project comes from the US County Health Rankings. The secondary, spatial data layer, comes from the tigris package’s census data for counties. After downloading the Health Rankings data, I decided that I would focus on the mammography screening raw value. To create an equation that exemplified their relationship, I also needed information on population, state, and county. The US County Health Ranking data table is huge so I had to narrow it down to only show data about NC. The final equation for the mammography-screening-rate to population can be easily broken down. It is the mammography screenings rate decimal in the county multiplied by 100. This value is the the percentage of the population which has received a mammography. This data was then joined with the tigris county data.
dat_summary <- dat |> # create data summary
summarize(
mean_per_pop_mamm = mean(per_pop_mamm, na.rm = TRUE), #Calculate mean.
min_per_pop_mamm = min(per_pop_mamm, na.rm = TRUE), #Calculate min.
max_per_pop_mamm = max(per_pop_mamm, na.rm = TRUE), #Calculate max.
na_count = sum(is.na(per_pop_mamm)), #Calculate NAs
total_ob = n() #Calculate the total amount of observations
)
dat |> ## Histogram
ggplot(mapping = aes(x = per_pop_mamm)) + # Ggplot to create histogram
geom_histogram(binwidth = 3.62,
fill = "forestgreen",
color = "black") + ### Customize histogram with green fill and black borders
labs(title = "Percent of Population who Received Mammogram", # Large title
x = "Mammography to Population Ratio",
y = "Frequency") ### Customize title of x and y axisThere are 101 total observations in the data set. 0 are NAs. 41.3267327 is the average value for mammography screening rate. 25 is the minimum value, and 54 is the maximum value. Each value is a percentage of that county’s population.
dat_nc_counties <- st_sf(dat_nc_counties) #Convert data to sf
dat_nc_counties_tmap <- tm_shape(dat_nc_counties) +
tm_polygons("per_pop_mamm", # variable being mapped
palette = "Reds", #color palette
style = "jenks", #Style of the map
title = "Mammography Rates per County", # Title of legend
border.col = "Black", # Polygon line color
border.alpha = 0.25, # Polygon line opacity
legend.hist = TRUE) + # Show legend
tm_layout(frame = FALSE, # No frame around map
main.title = "North Carolina Women's Preventative Health Disparities", # Big title of the map
main.title.size = 1.25, # Font size of title
legend.outside = TRUE) # Show legend### Subset data to ONLY observations with values
### Remove NAs prior to Moran's I and LISA analysis
dat_nc_counties <- dat_nc_counties |>
filter(!is.na(per_pop_mamm))
## Create Queen case
dnc_nb_queen <- poly2nb(dat_nc_counties,
queen = TRUE)
## Print summary to screen
dnc_nb_queen## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 490
## Percentage nonzero weights: 4.9
## Average number of links: 4.9
## Convert object to weight matrix
dnc_wm_queen <- nb2listw(dnc_nb_queen,
style = "B", # For binary
zero.policy = TRUE) # Allow obs with 0
### Moran's I
dat_nc_counties_moran <- moran.test(dat_nc_counties$per_pop_mamm, # Variable being analyzed
listw = dnc_wm_queen, # Sp weights matrix
alternative = "two.sided", # Clustering or Dispersion
randomisation = TRUE, # Compare to randomized values
zero.policy = TRUE) # Allow obs with 0
## Summary
dat_nc_counties_moran ##
## Moran I test under randomisation
##
## data: dat_nc_counties$per_pop_mamm
## weights: dnc_wm_queen
##
## Moran I statistic standard deviate = 4.8024, p-value = 1.568e-06
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.286301377 -0.010101010 0.003809355
### LISA
#
dat_nc_counties_lisa <- localmoran(dat_nc_counties$per_pop_mamm, # The variable which I am analyzing, per_pop_mamm
listw = dnc_wm_queen, # Weights object
alternative = "two.sided", # Clustering or Dispersion
zero.policy = TRUE) |> # no policy
as_tibble() |> # object type
mutate(across(everything(), as.vector)) # Remove uneccesary from localmoran output
# Values required for LISA category
dat_nc_counties_lisa <- dat_nc_counties_lisa |>
mutate(SCVAR = scale(dat_nc_counties$per_pop_mamm) |> as.vector(), # Original data column
LAGVAR = lag.listw(dnc_wm_queen, scale(dat_nc_counties$per_pop_mamm)), # Lag of original data column
LISACAT = case_when(SCVAR >= 0 & LAGVAR >= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 1,
SCVAR <= 0 & LAGVAR <= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 2,
SCVAR >= 0 & LAGVAR <= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 3,
SCVAR <= 0 & LAGVAR >= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 4,
TRUE ~ NA_real_))
# Label based on the values
dat_nc_counties_lisa <- dat_nc_counties_lisa |>
mutate(CATNAME = case_when(LISACAT == 1 ~ "HH",
LISACAT == 2 ~ "LL",
LISACAT == 3 ~ "HL",
LISACAT == 4 ~ "LH",
LISACAT == 5 ~ "Not Significant"))
## Summary of LISA output
lisa_summary <- table(dat_nc_counties_lisa$CATNAME)
## Add LISA category column to the spatial data
dat_nc_counties <- dat_nc_counties |>
mutate(LISACAT = dat_nc_counties_lisa$LISACAT,
CATNAME = dat_nc_counties_lisa$CATNAME)
## LISA kable table
kable(
as.data.frame(lisa_summary),
caption = "Summary of LISA Analysis Results",
col.names = c("Category", "Frequency"),
format = "html",
table.attr = 'class="table table-striped" style="font-size: 14px;"',
align = c("l", "r")
)| Category | Frequency |
|---|---|
| HH | 6 |
| LL | 5 |
### MAPS
# Second, the LISA map
lisa_tmap <- tm_shape(dat_nc_counties) +
tm_polygons("LISACAT",
title = "LISA Category", ## Title of the key
breaks = c(1, 2, 3, 4, 5, 6),
palette = c("red", ## Color scheme for the legend
"blue",
"lightpink",
"skyblue",
"grey90"),
colorNA = "white",
labels = c("High-High", ## Labels for the legend
"Low-Low",
"High-Low",
"Low-High",
"Not significant"),
border.col = "black", ## Color of the polygon lines
border.alpha = 0.25) + ## Transparency for polygon lines
tm_layout(frame = FALSE, ## No frame
legend.outside = TRUE) ## Legend on the outside of the map
### Map them together
tmap_arrange(dat_nc_counties_tmap,
lisa_tmap,
ncol = 1,
nrow = 2,
sync = TRUE) Based on the choropleth map, coastal NC has higher mammography rates. Western-most NC and central NC have lower mammography rates. Areas which are more rural have low mammography rates. Inversely, the more metropolitan areas have higher mammography rates.
The Moran’s I statistic is 0.2863014. The p value is 1.5679576^{-6}. The Moran’s I suggests a positive spatial autocorrelation between counties with high mammography rates. The low p-value suggests that this is a pattern and not just a random occurrence.
The LISA analysis reveals the spatial distribution of mammography in each county in North Carolina.6 counties are high-high clusters and are mainly throughout the southeast. Low-low clusters have a count of 5, which is mostly distributed in the western tip of the state. The fact that there are no LH or HL clusters suggests that there are not significant outlier areas where they are bordered by counties with the opposite value. Areas with higher mammography rates on the choropleth map coincide with the HH clusters on the LISA model. The same is true for areas with lower mammography rates, which coincide with the LL clusters. Both HH and LL indicate that these values are clustered spatially.
After collecting, wrangling, and doing a spatial analysis on the mammography rates in North Carolina counties, there appears to be a very high statistical significance. Based on my computations, the p-value is very close to zero. Furthermore, based on the choropleth map and LISA map having a similar spatial pattern, it is safe to say the disparities in mammography rates are worth exploring. If I were to add another variable/spatial layer, it would be the median income in each county. I would hypothesize that low income areas would correlate with low mammography rates.
| Method | koRpus | stringi |
|---|---|---|
| Word count | 562 | 568 |
| Character count | 3550 | 3550 |
| Sentence count | 38 | Not available |
| Reading time | 2.8 minutes | 2.8 minutes |