Goal: Explore whether participants' rurality, income, insurance, and educational attainment are associated with reporting more financial sacrifices for their cancer care.
MCGI 1.0 data set: MCGI_all_data_clean_2022-05-1
RUCA code to zip code crosswalk: RUCA2010zipcode_data.csv
Area Deprivation Index for the state of Maine
1,603 participants consented, registered, and enrolled in the MCGI study between July 2017 and October 2020. The study sample consisted of patients of participating Maine oncologists, including neuro-oncologists and gynecologic oncologists. Clinicians were recruited to MCGI by the research team via site visits, telephone, and personal contact. Once enrolled, oncology clinicians were able to offer free large-panel genomic tumor testing (GTT) to their patients through the MCGI. Clinicians offered GTT to their patients as part of their options for care in an unscripted manner and were provided a brochure describing the nature, benefits, and limitations of GTT. Patients who agreed to GTT were then invited to participate in the MCGI study, which offered GTT free of charge to the patient. Participation in the MCGI study consisted of completing periodic surveys and sharing abstracted data from medical records. Interested patients provided informed consent for participation. The MCGI study protocol was reviewed and approved by the Western IRB.
Financial sacrifices: Has your family had to make sacrifices to pay for your cancer treatments (such as cancel a vacation or take out extra loans)?
Response options: Yes, No, Unsure
Descriptive analysis of variables: check distributions and missingness.
Use Leaflet and ggplot to create maps that describe the cohort’s 1) rurality (using WWAMI classification of RUCA codes), 2) mean ADI by zip code (>10 participants per zip code), and 3) distance traveled to receive care (based on zip code).
Use logistic regression models to predict the outcome variable (financial sacrifices) from each independent variable (educational attainment, income level, rurality, insurance). Fit the bivariate (unadjusted) models first, then add the covariates age and gender.
GTT provided to participants for free (Generalizability). May influence patients’ expectations and/or attitudes towards GTT.
Patient population is from a mostly rural state with little racial and ethnic diversity. (Generalizability)
### LIBRARIES ####
# Install and load CRAN packages with pacman::p_load(). This installs any packages that are not already installed, then loads everything.
pacman::p_load(
plyr,
tidyverse,
grid,
readxl,
gtsummary,
broom,
knitr,
RColorBrewer,
purrr,
here,
data.table,
ggplot2,
descr,
summarytools,
printr,
officer,
flextable,
stringr,
tidyr,
lubridate,
rstatix,
aplpack,
reticulate,
gapminder,
viridis,
sf,
patchwork,
gridExtra,
latex2exp
)
# remotes::install_version("ggplot2", version = "3.4.4", repos = "http://cran.us.r-project.org")
# Set the number of digits after decimal point to 3
options(digits = 3)
### FUNCTIONS ####
# Create a function using case_when() that has a factor output and preserves the argument order as the order of factor levels
# Source: https://stackoverflow.com/questions/49572416/r-convert-to-factor-with-order-of-levels-same-with-case-when
fct_case_when <- function(...) {
args <- as.list(match.call())
levels <- sapply(args[-1], function(f)
f[[3]])
levels <- levels[!is.na(levels)]
factor(dplyr::case_when(...), levels = levels)
}
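As a quick check of what `fct_case_when()` does, the sketch below applies it to a small made-up vector. The values and labels are illustrative only (not MCGI data), dplyr is assumed to be installed, and the function definition is repeated so the snippet runs on its own.

```r
library(dplyr)

# Same definition as above, repeated so this snippet is self-contained
fct_case_when <- function(...) {
  args <- as.list(match.call())
  levels <- sapply(args[-1], function(f) f[[3]])
  levels <- levels[!is.na(levels)]
  factor(dplyr::case_when(...), levels = levels)
}

# Hypothetical toy input
x <- c(5, 1, 3)

size <- fct_case_when(
  x < 2 ~ "small",
  x < 4 ~ "medium",
  TRUE ~ "large"
)

levels(size)        # level order follows the order of the conditions
as.character(size)  # the recoded values themselves
```

With a plain `case_when()` followed by `factor()`, the levels would instead be sorted alphabetically, which matters later when the first level serves as the regression reference category.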
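# Flag values more than 1.5 * IQR below the first quartile or above the third quartile.
# NAs are ignored when computing the quartiles and stay NA in the result.
is_outlier <- function(x) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  x < (q1 - 1.5 * iqr) | x > (q3 + 1.5 * iqr)
}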
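A small worked example of the 1.5 * IQR rule used by `is_outlier()`, on made-up numbers rather than MCGI data. The function is repeated here (with `na.rm = TRUE`) so the snippet is self-contained.

```r
# Same 1.5 * IQR rule as is_outlier() above
is_outlier <- function(x) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  x < (q1 - 1.5 * iqr) | x > (q3 + 1.5 * iqr)
}

# Hypothetical travel distances in miles
miles <- c(1, 2, 3, 4, 100)

# Quartiles are 2 and 4, so IQR = 2 and the fences are -1 and 7
flags <- is_outlier(miles)
miles[flags]  # only the value beyond the upper fence is flagged
```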
# import adi calculation function from reports
source(here::here("reports", "clean_adi_data.R"))
# import plotting functions
source(here::here("reports", "plotting_functions.R"))
source(here::here("reports", "mapping_code_ADI.R"))
MCGI_1_0_GMTYN_LG <- read_csv(here::here("data","MCGI_1.0_GMTYN_LG.csv"))
# Create a histogram function taking arguments for the x aesthetic (column name as a string) and fill color
myhist <- function(var_names, col_vec) {
  # var_names is a string, so it is converted with !!dplyr::sym() before being used inside aes()
  median_value <- median(bsl_complete_recode %>% pull(!!var_names), na.rm = TRUE)
  plot <-
    ggplot(bsl_complete_recode, aes(x = !!dplyr::sym(var_names))) +
    geom_histogram(fill = col_vec) +
    geom_vline(xintercept = median_value, color = "grey", linetype = "dotted") +
    # annotate() draws the label once; geom_text() with aes() would redraw it for every row
    annotate("text", x = median_value, y = Inf,
             label = paste("Median =", round(median_value, 2)),
             hjust = -0.1, vjust = 1.1, color = col_vec, size = 3.2) +
    labs(title = var_names)
  print(plot)
}
### DATA IMPORT ####
bsl_complete_subset <-
read_rds(here::here("data", "bsl_complete_data.rds"))
### RECODE DATA FOR ANALYSIS ####
# Use the following categories for Race: White; African/African-American; American Indian/Alaska Native; Asian; Multiple; Other/Not Given
bsl_complete_subset <-
mutate(
bsl_complete_subset,
race_ct = pt_bl_g02_05rev___1 + pt_bl_g02_05rev___2 + pt_bl_g02_05rev___3 + pt_bl_g02_05rev___4 +
pt_bl_g02_05rev___5 + pt_bl_g02_05rev___6 + pt_bl_g02_05rev___7
)
# Recode RUCA2 values to the factor categories "Urban", "Large rural", and "Small & isolated rural".
# Start from the WWAMI four-level categorization of RUCA codes, with small rural and isolated rural combined into one level.
# See WWAMI site for these categorizations: https://depts.washington.edu/uwruca/ruca-maps.php
bsl_complete_recode <- bsl_complete_subset %>% mutate(
Rurality = fct_case_when(
RUCA2 <= 3 |
RUCA2 == 4.1 | RUCA2 == 5.1 | RUCA2 == 7.1 | RUCA2 == 8.1 |
RUCA2 == 10.1 ~ "Urban",
RUCA2 == 4 |
RUCA2 == 4.2 | RUCA2 == 5 | RUCA2 == 5.2 | RUCA2 == 6.0 |
RUCA2 == 6.1 ~ "Large rural",
RUCA2 == 7 |
RUCA2 >= 7.2 & RUCA2 <= 8 | RUCA2 >= 8.2 & RUCA2 <= 9.2 |
RUCA2 == 10 |
RUCA2 >= 10.2 ~ "Small & isolated rural"
),
# Recode pt_bl_g02_12rev (income) to factor categories; code 6 and NA are set to missing
Income = fct_case_when(
pt_bl_g02_12rev == 1 ~ "Less than $25,000",
pt_bl_g02_12rev == 2 ~ "$25,000 to $49,999",
pt_bl_g02_12rev == 3 ~ "$50,000 to $74,999",
pt_bl_g02_12rev == 4 ~ "$75,000 to $100,000",
pt_bl_g02_12rev == 5 ~ "More than $100,000",
pt_bl_g02_12rev == 6 | is.na(pt_bl_g02_12rev) ~ NA_character_
),
# Recode pt_bl_g02_10rev (education) to factor categories
Education = fct_case_when(
pt_bl_g02_10rev == 1 ~ "Less than high school",
pt_bl_g02_10rev == 2 ~ "High school graduate/GED",
pt_bl_g02_10rev == 3 ~ "Some college/Trade school",
pt_bl_g02_10rev == 4 ~ "Bachelor’s Degree",
pt_bl_g02_10rev == 5 ~ "Graduate Degree"
),
Gender = fct_case_when(pt_bl_g02_03rev == 1 ~ "Male",
pt_bl_g02_03rev == 2 ~ "Female"),
Ethnicity = fct_case_when(
pt_bl_g02_04rev == FALSE ~ "Non-Hispanic",
pt_bl_g02_04rev == TRUE ~ "Hispanic"
),
Race = case_when(
race_ct == 0 ~ "Not Given/Other",
# Code source: MCGI_Outcomes_Results_calc.R
race_ct > 1 ~ "Multiple",
# More than one race
TRUE ~ case_when(
# One race
pt_bl_g02_05rev___1 == 1 ~ "White",
pt_bl_g02_05rev___2 == 1 ~ "African or African-American",
pt_bl_g02_05rev___3 == 1 ~ "African or African-American",
pt_bl_g02_05rev___4 == 1 ~ "Asian",
pt_bl_g02_05rev___5 == 1 ~ "American Indian or Alaskan Native",
pt_bl_g02_05rev___6 == 1 ~ "Native Hawaiian or other Pacific Islander",
TRUE ~ "Not Given/Other"
)
),
# INSURANCE (the first matching condition wins, so the order of the conditions matters)
Insurance = case_when(
  # Both Medicare and Medicaid
  as.numeric(insurance_revised___1) + as.numeric(insurance_revised___2) == 2 ~ "Medicare and Medicaid",
  # Includes Medicare
  as.numeric(insurance_revised___1) == 1 ~ "Medicare",
  # Includes Medicaid
  as.numeric(insurance_revised___2) == 1 ~ "Medicaid",
  # Includes Private
  as.numeric(insurance_revised___3) == 1 ~ "Private",
  # Includes Other
  as.numeric(insurance_revised___4) == 1 ~ "Other",
  # Includes I don't know
  as.numeric(insurance_revised___5) == 1 ~ "I don't know",
  TRUE ~ NA_character_
),
# FINANCIAL SACRIFICES
Financial_Sacrifices = fct_case_when(
pt_bl_g02_14rev == 2 ~ "No",
pt_bl_g02_14rev == 1 ~ "Yes",
pt_bl_g02_14rev == 3 ~ "Unsure",
is.na(pt_bl_g02_14rev) ~ NA_character_
),
# FINANCIAL SACRIFICES 2.0: two-level version with "Unsure" grouped with "No"
Financial_Sacrifices_ = case_when(
pt_bl_g02_14rev %in% c(2,3) ~ "No",
pt_bl_g02_14rev == 1 ~ "Yes",
is.na(pt_bl_g02_14rev) ~ NA_character_
))
bsl_complete_recode <- bsl_complete_recode %>%
  mutate(
    # Binary outcome with "Unsure" counted as "No" (0)
    Financial_Sacrifices_bin = case_when(
      pt_bl_g02_14rev %in% c(2, 3) ~ 0,
      pt_bl_g02_14rev == 1 ~ 1,
      is.na(pt_bl_g02_14rev) ~ NA_real_
    ),
    # Binary outcome with "Unsure" excluded (set to missing)
    Financial_Sacrifices_bin_un = case_when(
      pt_bl_g02_14rev == 2 ~ 0,
      pt_bl_g02_14rev == 1 ~ 1,
      pt_bl_g02_14rev == 3 ~ NA_real_,
      is.na(pt_bl_g02_14rev) ~ NA_real_
    )
  )
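The two binary codings differ only in how "Unsure" is handled. A minimal sketch on made-up response codes (1 = Yes, 2 = No, 3 = Unsure, as in the recode above), assuming dplyr is available, shows the effect on the number of usable observations:

```r
library(dplyr)

# Hypothetical responses: 1 = Yes, 2 = No, 3 = Unsure, NA = not answered
resp <- c(1, 2, 3, NA, 1)

# "Unsure" grouped with "No"
bin <- case_when(
  resp %in% c(2, 3) ~ 0,
  resp == 1 ~ 1,
  is.na(resp) ~ NA_real_
)

# "Unsure" excluded (set to missing)
bin_un <- case_when(
  resp == 2 ~ 0,
  resp == 1 ~ 1,
  TRUE ~ NA_real_
)

sum(!is.na(bin))     # observations kept when "Unsure" counts as "No"
sum(!is.na(bin_un))  # observations kept when "Unsure" is excluded
```

Fitting the same model on both versions is one way to check whether conclusions are sensitive to the treatment of "Unsure" responses.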
bsl_complete_recode$Insurance <-
factor(
bsl_complete_recode$Insurance,
levels = c(
"Medicare and Medicaid",
"Medicare",
"Medicaid",
"Private",
"Other",
"I don't know"
)
)
bsl_complete_recode$Race <-
factor(
bsl_complete_recode$Race,
levels = c(
"White",
"African or African-American",
"Asian",
"American Indian or Alaskan Native",
"Native Hawaiian or other Pacific Islander",
"Not Given/Other",
"Multiple"
)
)
# identify any non-true outliers (data entry errors), review together if needed, correct the data entry errors.
bsl_complete_recode_without_na_time <-
bsl_complete_recode[!is.na(bsl_complete_recode$pt_bl_g02_09a_rev),]
bsl_complete_recode_without_na_distance <-
bsl_complete_recode[!is.na(bsl_complete_recode$pt_bl_g02_09b_rev),]
# Both methods use the 1.5 * IQR rule. There are 96 outliers (9%) for travel time in hours.
## Using identify_outliers() from rstatix
out_travel_time_met1 <-
identify_outliers(data = bsl_complete_recode, variable = "pt_bl_g02_09a_rev")
## Using boxplot stats
out_travel_time_met2 <-
boxplot.stats(bsl_complete_recode$pt_bl_g02_09a_rev)$out
outliers_indices_time <-
which(bsl_complete_recode$pt_bl_g02_09a_rev %in% out_travel_time_met2)
# Both methods use the 1.5 * IQR rule. There are 63 outliers (5.9%) for distance in miles.
out_travel_miles_met1 <-
identify_outliers(data = bsl_complete_recode, variable = "pt_bl_g02_09b_rev")
out_travel_miles_met2 <-
boxplot.stats(bsl_complete_recode$pt_bl_g02_09b_rev)$out
outliers_indices_distance <-
which(bsl_complete_recode$pt_bl_g02_09b_rev %in% out_travel_miles_met2)
# join both outlier detection list of indices
all_outliers_indices <-
union(outliers_indices_time, outliers_indices_distance)
filtered_outlier_data <- bsl_complete_recode[all_outliers_indices,]
# TIME, DISTANCE & RURALITY_ROUNDTABLE
bsl_complete_recode <- bsl_complete_recode %>% mutate(
  # Reported times of 10 or more are treated as minutes and converted to hours
  Time_Traveled_Hours = case_when(
    as.numeric(pt_bl_g02_09a_rev) >= 10 ~ as.numeric(pt_bl_g02_09a_rev) / 60,
    TRUE ~ as.numeric(pt_bl_g02_09a_rev)
  ),
  # Bin distance in miles. Values that fall between bins (e.g. 25.5) match no condition and become NA.
  Distance_Traveled_Miles = case_when(
    as.numeric(pt_bl_g02_09b_rev) >= 0 &
      as.numeric(pt_bl_g02_09b_rev) <= 25 ~ "0-25",
    as.numeric(pt_bl_g02_09b_rev) >= 26 &
      as.numeric(pt_bl_g02_09b_rev) <= 50 ~ "26-50",
    as.numeric(pt_bl_g02_09b_rev) >= 51 &
      as.numeric(pt_bl_g02_09b_rev) <= 100 ~ "51-100",
    as.numeric(pt_bl_g02_09b_rev) > 100 ~ ">100"
  ),
  Rurality_Roundtable = fct_case_when(
    RUCA2 %in% c(1, 1.1) ~ "Urban",
    RUCA2 %in% c(2, 2.1, 3, 4, 4.1, 5, 5.1, 6) ~ "Large rural",
    RUCA2 %in% c(7, 7.1, 7.2, 8, 8.1, 8.2, 9, 10, 10.1, 10.2, 10.3) ~ "Small & isolated rural",
    TRUE ~ NA_character_
  ),
  Age = as.numeric(pt_bl_g02_02rev),
  Age_Divided_By_Ten = as.numeric(pt_bl_g02_02rev) / 10
)
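One possible alternative for the distance bins, shown as a sketch rather than the approach used in this analysis: base R `cut()` with right-closed intervals assigns fractional mileages (for example 25.5) to a bin and returns a factor whose levels are already in ascending order. The input values are made up.

```r
# Hypothetical distances in miles (not MCGI data)
miles <- c(0, 25, 25.5, 50, 75, 150, NA)

distance_bin <- cut(
  miles,
  breaks = c(0, 25, 50, 100, Inf),
  labels = c("0-25", "26-50", "51-100", ">100"),
  include.lowest = TRUE  # so that 0 falls in the first bin
)

table(distance_bin, useNA = "ifany")
```

Switching to this binning would change the reported counts only if the data contain fractional mileages, and it would list the ">100" category last instead of first in summary tables.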
adi_raw <- process_adi_data()
# left outer join between bsl_complete_recode dataframe & ADI index by zip
bsl_complete_recode <-
merge(x = bsl_complete_recode,
y = adi_raw,
by = "zip",
all.x = TRUE)
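To illustrate what `all.x = TRUE` does in the join above, here is a minimal sketch with two made-up data frames (the names `participants` and `adi_lookup` and all values are hypothetical). Rows of `x` without a matching zip are kept and receive `NA` for the ADI columns.

```r
# Hypothetical participants and ADI lookup (values are made up)
participants <- data.frame(
  participant_id = 1:3,
  zip = c("04101", "04401", "04999"),
  stringsAsFactors = FALSE
)
adi_lookup <- data.frame(
  zip = c("04101", "04401"),
  adi = c(3.2, 6.1),
  stringsAsFactors = FALSE
)

joined <- merge(x = participants, y = adi_lookup, by = "zip", all.x = TRUE)

nrow(joined) == nrow(participants)  # a left join on a unique key keeps the row count
sum(is.na(joined$adi))              # participants whose zip had no ADI match
```

Comparing row counts before and after the merge is a cheap guard against duplicated zip codes in the lookup table, which would silently multiply participant rows. Note that `merge()` also sorts the result by the key column.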
# left_outer_join to get GMT
# bsl_complete_recode <-
# merge(x = bsl_complete_recode,
# y = MCGI_1_0_GMTYN_LG,
# by = "participant_id",
# all.x = TRUE)
# Reorder factor levels so that the first level is the reference category in the regression models
bsl_complete_recode$Education <- factor(bsl_complete_recode$Education,
                                        levels = c("Bachelor’s Degree", "Less than high school",
                                                   "High school graduate/GED", "Some college/Trade school",
                                                   "Graduate Degree"))
bsl_complete_recode$Income <- factor(bsl_complete_recode$Income,
                                     levels = c("$50,000 to $74,999", "Less than $25,000",
                                                "$25,000 to $49,999", "$75,000 to $100,000",
                                                "More than $100,000"))
#### Export Dataframe for Mapping ####
saveRDS(bsl_complete_recode, file = here::here("data","bsl_complete_recode.rds"))
# Summary statistics for selected covariates and predictor variables
gt_table_selected_variables <-
bsl_complete_recode %>%
tbl_summary(
by = Financial_Sacrifices_,
include = c(
Age,
Gender,
rhx_current_stage,
Cancer_Site_Category,
Rurality,
adi,
Education,
Income,
Insurance,
Distance_Traveled_Miles
),
label = list(
rhx_current_stage ~ "Cancer Stage",
Cancer_Site_Category ~ "Cancer Site Category",
adi ~ "Area Deprivation Index",
Education ~ "Education Attainment Level",
Distance_Traveled_Miles ~ "Distance Traveled (Miles)"
),
missing_text = "Missing"
) %>%
modify_spanning_header(c("stat_1", "stat_2") ~ "**Financial Sacrifice Response**") %>%
#add_p() %>%
modify_caption("**Table 1. MCGI 1.0 Demographics**") %>%
bold_labels()
gt_table_selected_variables
| Characteristic | Financial Sacrifice Response: No, N = 882¹ | Financial Sacrifice Response: Yes, N = 222¹ |
|---|---|---|
| **Age** | 66 (59, 73) | 60 (53, 66) |
| **Gender** | | |
| Male | 371 (42%) | 82 (37%) |
| Female | 510 (58%) | 140 (63%) |
| Missing | 1 | 0 |
| **Cancer Stage** | | |
| Stage I | 38 (4.3%) | 7 (3.2%) |
| Stage II | 27 (3.1%) | 7 (3.2%) |
| Stage III | 138 (16%) | 27 (12%) |
| Stage IV | 654 (74%) | 173 (78%) |
| Unknown | 24 (2.7%) | 8 (3.6%) |
| Missing | 1 | 0 |
| **Cancer Site Category** | | |
| Lung | 123 (14%) | 31 (14%) |
| Gynecologic | 189 (21%) | 44 (20%) |
| Breast | 93 (11%) | 17 (7.7%) |
| Colon | 87 (9.9%) | 29 (13%) |
| Brain | 52 (5.9%) | 23 (10%) |
| Prostate | 59 (6.7%) | 9 (4.1%) |
| Other | 279 (32%) | 69 (31%) |
| **Rurality** | | |
| Urban | 369 (43%) | 99 (47%) |
| Large rural | 187 (22%) | 34 (16%) |
| Small & isolated rural | 297 (35%) | 76 (36%) |
| Missing | 29 | 13 |
| **Area Deprivation Index** | 5.00 (2.99, 7.47) | 4.90 (2.89, 7.47) |
| Missing | 27 | 13 |
| **Education Attainment Level** | | |
| Bachelor’s Degree | 150 (17%) | 29 (13%) |
| Less than high school | 59 (6.9%) | 9 (4.1%) |
| High school graduate/GED | 269 (31%) | 64 (29%) |
| Some college/Trade school | 262 (31%) | 90 (41%) |
| Graduate Degree | 118 (14%) | 25 (12%) |
| Missing | 24 | 5 |
| **Income** | | |
| $50,000 to $74,999 | 162 (20%) | 36 (18%) |
| Less than $25,000 | 224 (28%) | 61 (30%) |
| $25,000 to $49,999 | 243 (31%) | 71 (35%) |
| $75,000 to $100,000 | 83 (10%) | 15 (7.5%) |
| More than $100,000 | 80 (10%) | 18 (9.0%) |
| Missing | 90 | 21 |
| **Insurance** | | |
| Medicare and Medicaid | 196 (22%) | 31 (14%) |
| Medicare | 338 (38%) | 123 (56%) |
| Medicaid | 250 (28%) | 37 (17%) |
| Private | 39 (4.4%) | 15 (6.8%) |
| Other | 31 (3.5%) | 7 (3.2%) |
| I don't know | 25 (2.8%) | 8 (3.6%) |
| Missing | 3 | 1 |
| **Distance Traveled (Miles)** | | |
| >100 | 21 (2.9%) | 9 (4.7%) |
| 0-25 | 469 (65%) | 122 (64%) |
| 26-50 | 151 (21%) | 36 (19%) |
| 51-100 | 78 (11%) | 25 (13%) |
| Missing | 163 | 30 |

¹ Median (IQR); n (%)
# More summary statistics
gt_table_selected_variables_2 <-
bsl_complete_recode %>%
tbl_summary(
#by = Financial_Sacrifices,
include = c(
Age,
Race,
Gender,
rhx_current_stage,
Cancer_Site_Category,
Rurality,
Rurality_Roundtable,
adi,
Education,
Income,
Insurance,
Distance_Traveled_Miles,
Financial_Sacrifices
),
label = list(
rhx_current_stage ~ "Cancer Stage",
Cancer_Site_Category ~ "Cancer Site Category",
adi ~ "Area Deprivation Index",
Education ~ "Education Attainment Level",
Distance_Traveled_Miles ~ "Distance Traveled (Miles)"
),
missing_text = "Missing"
) %>%
#modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Financial Sacrifice Response**") %>%
#add_p() %>%
modify_caption("**Table 2. MCGI 1.0 Demographics**") %>%
bold_labels()
### Generate multiple plots for exploratory analysis
cont_vars <-
c("Age",
"adi",
"Time_Traveled_Hours")
cat_vars <-
c(
"rhx_current_stage",
"Cancer_Site_Category",
"Rurality",
"Income",
"Education",
"Insurance",
"Distance_Traveled_Miles",
"Financial_Sacrifices"
)
| Characteristic | Unadjusted OR¹ | Unadjusted 95% CI¹ | Unadjusted p-value | Age & Gender Adjusted OR¹ | Age & Gender Adjusted 95% CI¹ | Age & Gender Adjusted p-value |
|---|---|---|---|---|---|---|
| **Education** | | | 0.027 | | | 0.13 |
| Bachelor’s Degree | — | — | | — | — | |
| Less than high school | 0.84 | 0.35, 1.83 | 0.7 | 1.03 | 0.42, 2.30 | >0.9 |
| High school graduate/GED | 1.26 | 0.77, 2.09 | 0.4 | 1.24 | 0.75, 2.10 | 0.4 |
| Some college/Trade school | 1.85 | 1.16, 3.03 | 0.012 | 1.74 | 1.06, 2.90 | 0.031 |
| Graduate Degree | 1.12 | 0.61, 2.04 | 0.7 | 1.10 | 0.59, 2.06 | 0.8 |
| Age | | | | 0.94 | 0.93, 0.96 | <0.001 |
| Gender | | | | | | 0.5 |
| Male | | | | — | — | |
| Female | | | | 1.11 | 0.80, 1.54 | 0.5 |
| **Income** | | | 0.4 | | | 0.077 |
| $50,000 to $74,999 | — | — | | — | — | |
| Less than $25,000 | 1.34 | 0.84, 2.17 | 0.2 | 1.29 | 0.79, 2.11 | 0.3 |
| $25,000 to $49,999 | 1.35 | 0.85, 2.16 | 0.2 | 1.56 | 0.97, 2.54 | 0.072 |
| $75,000 to $100,000 | 0.87 | 0.44, 1.67 | 0.7 | 0.77 | 0.38, 1.50 | 0.4 |
| More than $100,000 | 1.05 | 0.54, 1.98 | 0.9 | 0.84 | 0.42, 1.61 | 0.6 |
| Age | | | | 0.94 | 0.93, 0.96 | <0.001 |
| Gender | | | | | | 0.2 |
| Male | | | | — | — | |
| Female | | | | 1.23 | 0.88, 1.74 | 0.2 |
| **Rurality** | | | 0.2 | | | 0.3 |
| Urban | — | — | | — | — | |
| Large rural | 0.68 | 0.44, 1.03 | 0.074 | 0.74 | 0.47, 1.15 | 0.2 |
| Small & isolated rural | 0.95 | 0.68, 1.33 | 0.8 | 1.02 | 0.72, 1.44 | >0.9 |
| Age | | | | 0.94 | 0.93, 0.96 | <0.001 |
| Gender | | | | | | 0.4 |
| Male | | | | — | — | |
| Female | | | | 1.14 | 0.82, 1.58 | 0.4 |
| **Insurance** | | | <0.001 | | | 0.7 |
| Medicare and Medicaid | — | — | | — | — | |
| Medicare | 2.40 | 1.55, 3.82 | <0.001 | 1.18 | 0.72, 1.99 | 0.5 |
| Medicaid | 1.35 | 0.85, 2.16 | 0.9 | 0.84 | 0.49, 1.46 | 0.5 |
| Private | 0.87 | 0.44, 1.67 | 0.013 | 1.04 | 0.46, 2.29 | >0.9 |
| Other | | | | 0.80 | 0.28, 2.03 | 0.7 |
| I don’t know | 1.05 | 0.54, 1.98 | 0.3 | 0.93 | 0.34, 2.33 | 0.9 |
| Age | | | | 0.94 | 0.93, 0.96 | <0.001 |
| Gender | | | | | | 0.5 |
| Male | | | | — | — | |
| Female | | | | 1.14 | 0.82, 1.58 | 0.4 |

¹ OR = Odds Ratio, CI = Confidence Interval
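The code that produced the odds-ratio table is not shown in this section. As a rough sketch of the unadjusted and age-and-gender-adjusted models described in the analysis plan, the example below fits both with base R `glm()` on simulated data. The data frame `toy`, its columns, the helper `or_table()`, and the simulated effect sizes are all hypothetical. Wald intervals from `confint.default()` are used for simplicity and may differ slightly from profile-likelihood intervals.

```r
set.seed(1)
n <- 500

# Simulated stand-in data (not MCGI data)
toy <- data.frame(
  age = round(rnorm(n, mean = 64, sd = 10)),
  gender = factor(sample(c("Male", "Female"), n, replace = TRUE),
                  levels = c("Male", "Female")),
  rurality = factor(sample(c("Urban", "Large rural", "Small & isolated rural"),
                           n, replace = TRUE),
                    levels = c("Urban", "Large rural", "Small & isolated rural"))
)
# Outcome simulated so that the odds of reporting a sacrifice fall with age
toy$sacrifice <- rbinom(n, 1, plogis(2.5 - 0.06 * toy$age))

# Hypothetical helper: odds ratios with Wald 95% confidence intervals, intercept dropped
or_table <- function(fit) {
  est <- cbind(OR = coef(fit), confint.default(fit))
  round(exp(est[-1, , drop = FALSE]), 2)
}

# Unadjusted model: a single predictor
fit_unadj <- glm(sacrifice ~ rurality, data = toy, family = binomial)

# Adjusted model: the same predictor plus age and gender
fit_adj <- glm(sacrifice ~ rurality + age + gender, data = toy, family = binomial)

or_table(fit_unadj)
or_table(fit_adj)
```

The first level of each factor acts as the reference category, which is why the reference rows in the table carry no estimate. Repeating the two fits for education, income, and insurance would give one block of rows per predictor.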