1 Introduction to the paper

Understanding the multifaceted impact of socioeconomic factors on well-being remains a pivotal challenge in the developmental and social sciences. Recent research has illuminated the significant role that economic disparities play in shaping health outcomes and developmental trajectories, particularly among children and their families. Notably, Weissman et al. (2023) showed that state-level economic policies and cost of living can moderate the association between family income and child brain development and mental health. Building on this foundation, our study examines how income relates to child and parent well-being across varying levels of material hardship and economic assistance, incorporating both environmental and policy-driven moderators.

The detrimental effects of poverty extend beyond simple measures of income, often manifesting through material hardships that include food insecurity, inadequate housing, and unstable living conditions. These hardships can exert profound psychological and physical effects on families, potentially exacerbating the stressors associated with low income. Moreover, the assistance sought by families, whether in the form of direct financial aid or supportive services, may provide a buffering effect against these hardships. Our analysis employs a dual-model approach to dissect these dynamics further: the first model explores how the cost of living interacts with income and material hardships to affect well-being, while the second model assesses the impact of the type and extent of aid requested by families on their overall well-being.

By integrating these dimensions, this study aims to provide a more comprehensive understanding of the complex interplay between economic factors and family well-being. This approach not only contributes to the theoretical discourse on socioeconomic impacts but also holds practical implications for policy-making and the design of interventions aimed at alleviating poverty’s effects on vulnerable populations. Through this research, we aspire to refine the metrics and models used to assess poverty, advocating for policies that recognize and respond to the nuanced realities faced by low-income families.

1.1 Research Questions and Hypotheses

This study addresses the following critical questions:

How does the cost of living moderate the relationship between income and well-being among families experiencing varying levels of material hardship?
What is the impact of the types and amounts of aid requested by families on their well-being, across different income levels?

Study 1 aims to explore

1.1.1 Hypotheses

Based on the literature and the conceptual framework guiding this study, we propose the following hypotheses: (1) Higher cost of living will exacerbate the negative effects of low income on family well-being, particularly for families experiencing greater material hardships. However, these effects will be mitigated in contexts where material hardships are less severe. (2) Families receiving more comprehensive aid, in terms of both scope and magnitude, will exhibit better well-being outcomes, with this effect being more pronounced among those facing higher levels of material hardship and living in areas with a higher cost of living.
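As a hedged illustration of how the first hypothesis could be operationalized, the sketch below fits a logistic regression with an income-by-cost-of-living interaction on simulated data. The variable names mirror those used later in this report (centered_logINR, COLI, fs_any), but the simulated effect sizes and the model form are assumptions made for illustration, not the study's fitted model.

```r
# Illustrative only: simulated data with assumed effect sizes
set.seed(1)
n <- 500
sim <- data.frame(
  centered_logINR = rnorm(n),        # grand-mean-centered log income-to-needs
  COLI = runif(n, 0.7, 1.5)          # hypothetical cost-of-living index values
)

# Assumed data-generating process: low income raises hardship risk,
# and more so where cost of living is high (negative interaction)
eta <- -0.5 - 0.8 * sim$centered_logINR +
  0.6 * (sim$COLI - 1) -
  0.5 * sim$centered_logINR * (sim$COLI - 1)
sim$fs_any <- rbinom(n, 1, plogis(eta))

# `a * b` expands to a + b + a:b; the a:b coefficient is the moderation term
fit <- glm(fs_any ~ centered_logINR * COLI, data = sim, family = binomial)
round(coef(summary(fit)), 3)
```

Under this specification, moderation by cost of living would show up in the centered_logINR:COLI coefficient rather than in either main effect.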

2 Methods

Data for this study were obtained from the Rapid Assessment of Pandemic Impact on Development–Early Childhood (RAPID-EC) project. This national project conducts weekly or biweekly surveys to explore how the pandemic affects households with children aged 0 to 5 years. The University of Oregon’s and Stanford University’s institutional review boards approved all study procedures. Participants were recruited via community organization email lists, Facebook advertisements, and panel services. Initially, families completed an online survey to verify eligibility. Those who qualified provided consent online and filled out a baseline survey covering demographics, employment and financial challenges, health and well-being, and access to childcare. After completing this initial survey, families joined a participant pool and received email invitations for follow-up surveys. These follow-up surveys revisited core baseline topics and introduced new special topics on a weekly or biweekly schedule. Each sampling point for the follow-up surveys was demographically representative of the U.S. population concerning race, income, and geographic location. Families were compensated $5 for each survey they completed.

########################## Clean Data - County Data ###########################

# Step 1: Clean the master_dem zip code column

master_dem <- master_dem %>%
  # Work on a copy (zipcode1) so the raw zipcode column stays available for QA
  mutate(zipcode1 = as.character(zipcode)) %>%
  mutate(zipcode1 = case_when(
    # 5-digit zip code: keep as-is
    grepl("^\\d{5}$", zipcode1) ~ zipcode1,
    # ZIP+4 format: keep only the 5-digit prefix
    grepl("^\\d{5}-\\d{4}$", zipcode1) ~ sub("-.*", "", zipcode1),
    # Anything else (malformed or missing) becomes NA
    TRUE ~ NA_character_
  )) %>%
  # QA flag: TRUE when a non-missing original zipcode could not be parsed
  # and was therefore set to NA (truncated ZIP+4 entries are not flagged)
  mutate(zipcode_modified = is.na(zipcode1) & !is.na(zipcode))
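To make the cleaning rule concrete, the following base-R sketch applies the same logic to a small vector of made-up zip codes (the values are hypothetical and only chosen to cover each case).

```r
# Hypothetical raw inputs covering each case
raw_zip <- c("97403", "97403-1234", "9740", "abcde", NA, "02139")

# Valid if 5 digits, optionally followed by a 4-digit extension
is_valid <- grepl("^\\d{5}(-\\d{4})?$", raw_zip)

# Keep the 5-digit prefix of valid entries; everything else becomes NA
clean_zip <- ifelse(is_valid, substr(raw_zip, 1, 5), NA)

# Flag entries that were present but could not be parsed
data.frame(raw_zip, clean_zip,
           flagged = is.na(clean_zip) & !is.na(raw_zip))
```

Keeping zip codes as character strings matters here: a numeric column would silently drop the leading zero in codes such as "02139".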

# Step 2: Merge zip codes and fips codes 

# The major issue with this step is that 20% of the data will have a single zip
# code with multiple fips codes which will make merging RAPID dataset a problem
# later down the line. So first, only use zip codes that are relevant in the data

# Step 2.1: Extract all zip codes in rapid_data into a vector

zip_codes_in_rapid <- unique(master_dem$zipcode1) # use zipcode1 because it is transformed

# Step 2.2: Subset zip_data to only include data with zip codes in RAPID

zip_data <- zip_data %>%
  filter(zipcode %in% zip_codes_in_rapid)

# Step 2.3: figure out zip codes for each county fips code 

zipcodes_by_county <- zip_data %>%
  group_by(county_fips, County, State) %>%
  summarise(zipcodes = paste(unique(zipcode), collapse = ", "), # column with zips
            n_zipcodes = n_distinct(zipcode)) %>% # this column tells us how many zips per county
  ungroup()

# Step 2.4: Add zipcodes column from zipcodes_by_county to county_data

county_data <- county_data %>%
  left_join(zipcodes_by_county, by = c("county_fips", "State", "County")) %>%
  drop_na(zipcodes) # if NA = no counties represented in RAPID data (confirm!)

# Step 3: Merge county_data with master_dem data

# We need to be careful with this merge. Most zip codes match county data on a
# unique combination of zip code, nchild, and nfamily, which is fine.
# Zip codes that span multiple counties produce multiple matches, so for those
# we take the AVERAGE of the county values, grouped by zip code, nchild, and nfamily.

# Step 3.1: Figure out which zip codes have multiple county fips codes

zip_multi_fips <- zip_data %>%
  group_by(zipcode) %>%
  summarise(county_fips_codes = paste(unique(county_fips), collapse = ", "),
            n_fips = n_distinct(county_fips)) %>%
  filter(n_fips > 1) %>% 
  ungroup()

# Step 3.2: Create a list of all zip codes with multiple fips
zip_codes_with_issues <- zip_multi_fips$zipcode
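The check for zip codes that span several counties can be sketched in base R on a toy crosswalk (all zip and FIPS values below are made up).

```r
# Hypothetical crosswalk: zip 12345 spans two counties
toy_zip_data <- data.frame(
  zipcode     = c("12345", "12345", "67890", "24680"),
  county_fips = c("01001", "01003", "02001", "03001")
)

# Number of distinct county FIPS codes per zip code
n_fips <- tapply(toy_zip_data$county_fips, toy_zip_data$zipcode,
                 function(x) length(unique(x)))

# Zip codes that map to more than one county
multi_fips_zips <- names(n_fips)[n_fips > 1]
multi_fips_zips
```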

# Step 3.3: Create a column that signals which zip codes need to be modified
# this was necessary to help ensure we are modifying the right ones correctly for QA

master_dem <- master_dem %>%
  mutate(zip_issues = zipcode1 %in% zip_codes_with_issues)

county_data <- county_data %>%
  separate_rows(zipcodes, sep = ",\\s*") %>%
  mutate(zip_issues = zipcodes %in% zip_codes_with_issues)

# Step 3.4: Ensure single unique combination by zip code, nchild, and nfamily

# Cost-of-living columns to average across counties that share a zip code
cost_cols <- c("Housing_Monthly", "Food_Monthly", "Transportation_Monthly",
               "Healthcare_Monthly", "OtherNecessities_Monthly",
               "Childcare_Monthly", "Taxes_Monthly", "Total_Monthly",
               "Housing_Yearly", "Food_Yearly", "Transportation_Yearly",
               "Healthcare_Yearly", "OtherNecessities._Yearly",
               "Childcare_Yearly", "Taxes_Yearly", "Total_Yearly",
               "median_family_income")

county_data_final <- county_data %>%
  group_by(zipcodes, nchild, nfamily) %>%
  # one row per (zip, nchild, nfamily); values are means over matching counties
  summarise(across(all_of(cost_cols), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  rename(zipcode = zipcodes)
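The effect of this averaging step can be seen on a toy table in which one zip code appears under two counties (a base-R sketch with made-up values and a single cost column).

```r
# Hypothetical county rows: zip 12345 appears under two counties
toy_county <- data.frame(
  zipcodes      = c("12345", "12345", "67890"),
  nchild        = c(1, 1, 1),
  nfamily       = c(3, 3, 3),
  Total_Monthly = c(5000, 6000, 4000)
)

# One row per (zip, nchild, nfamily); duplicated zips are averaged
toy_final <- aggregate(Total_Monthly ~ zipcodes + nchild + nfamily,
                       data = toy_county, FUN = mean)
toy_final
```

Note that a simple mean weights each county equally, regardless of how much of the zip code's population lives in it.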

# Step 3.5: Add county data to master_dem

# Step 3.5.1: Group by nchild and nfamily in master_dem and count the number of rows in each group
combo_count <- master_dem %>%
  group_by(nchild, nfamily) %>%
  summarise(count = n(), .groups = "drop") # Ungroup after summarising

# Step 3.5.2: Extract distinct combinations from county_data_final
distinct_county_combinations <- county_data_final %>%
  ungroup() %>%
  dplyr::select(nchild, nfamily) %>%
  distinct()

# Step 3.5.3: Identify the combinations missing in county_data_final
missing_combinations <- combo_count %>%
  anti_join(distinct_county_combinations, by = c("nchild", "nfamily"))

# Step 3.5.4: Flag missing combinations in master_dem, then join the county data
master_dem <- master_dem %>%
  left_join(missing_combinations, by = c("nchild", "nfamily")) %>%
  mutate(missing_combination = !is.na(count)) %>%
  dplyr::select(-count) %>%
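  # Join on the cleaned zip code (zipcode1) so that ZIP+4 entries still match
  left_join(county_data_final,
            by = c("zipcode1" = "zipcode", "nchild" = "nchild", "nfamily" = "nfamily"))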
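A left join like this only preserves one row per respondent if the lookup table has unique keys. The base-R sketch below shows one way to check this on toy tables (the table and column values are hypothetical).

```r
# Hypothetical respondent and county tables
toy_dem <- data.frame(CaregiverID = 1:3,
                      zipcode1 = c("12345", "67890", NA),
                      nchild = c(1, 2, 1), nfamily = c(3, 4, 2))
toy_cty <- data.frame(zipcode = c("12345", "67890"),
                      nchild = c(1, 2), nfamily = c(3, 4),
                      Total_Monthly = c(5500, 4000))

# Lookup keys must be unique, otherwise the join duplicates respondents
stopifnot(!anyDuplicated(toy_cty[c("zipcode", "nchild", "nfamily")]))

merged <- merge(toy_dem, toy_cty,
                by.x = c("zipcode1", "nchild", "nfamily"),
                by.y = c("zipcode", "nchild", "nfamily"),
                all.x = TRUE)

# A left join on unique keys keeps exactly one row per respondent
stopifnot(nrow(merged) == nrow(toy_dem))
sum(is.na(merged$Total_Monthly))  # respondents with no county match
```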

##  aid_inv$ReceivedAid    n   percent valid_percent
##                    0 8924 0.5192901     0.7019034
##                    1 3790 0.2205412     0.2980966
##                   NA 4471 0.2601688            NA

##  aid_inv$ReceivedAid_basic1    n   percent valid_percent
##                           0 9207 0.5357579     0.7241623
##                           1 3507 0.2040733     0.2758377
##                          NA 4471 0.2601688            NA

##  aid_inv$ReceivedAid_basic2    n   percent valid_percent
##                           0 9059 0.5271458     0.7125216
##                           1 3655 0.2126855     0.2874784
##                          NA 4471 0.2601688            NA

##  aid_inv$ReceivedAid_other     n    percent valid_percent
##                          0 12151 0.70707012    0.95571811
##                          1   563 0.03276113    0.04428189
##                         NA  4471 0.26016875            NA

##  aid_inv$Health_Medical_Services     n   percent valid_percent
##                                0 10511 0.6116381     0.8267264
##                                1  2203 0.1281932     0.1732736
##                               NA  4471 0.2601688            NA

##  aid_inv$Food_Benefits     n   percent valid_percent
##                      0 10199 0.5934827     0.8021866
##                      1  2515 0.1463486     0.1978134
##                     NA  4471 0.2601688            NA

##  aid_inv$Income_Benefits     n   percent valid_percent
##                        0 12148 0.7068955    0.95548215
##                        1   566 0.0329357    0.04451785
##                       NA  4471 0.2601688            NA

##  aid_inv$Disability_Benefits     n    percent valid_percent
##                            0 12361 0.71929008    0.97223533
##                            1   353 0.02054117    0.02776467
##                           NA  4471 0.26016875            NA

##  aid_inv$Military_Benefits     n     percent valid_percent
##                          0 12637 0.735350596   0.993943684
##                          1    77 0.004480652   0.006056316
##                         NA  4471 0.260168752            NA

##  aid_inv$Housing_Benefits     n    percent valid_percent
##                         0 12338 0.71795170     0.9704263
##                         1   376 0.02187955     0.0295737
##                        NA  4471 0.26016875            NA

##  aid_inv$Childcare_Subsidy     n    percent valid_percent
##                          0 12287 0.71498400    0.96641498
##                          1   427 0.02484725    0.03358502
##                         NA  4471 0.26016875            NA

##  aid_inv$Transportation_Benefits     n     percent valid_percent
##                                0 12630 0.734943264    0.99339311
##                                1    84 0.004887984    0.00660689
##                               NA  4471 0.260168752            NA

##  aid_inv$Training_Benefits     n     percent valid_percent
##                          0 12630 0.734943264    0.99339311
##                          1    84 0.004887984    0.00660689
##                         NA  4471 0.260168752            NA

##  aid_inv$Clothing_Benefits     n     percent valid_percent
##                          0 12670 0.737270876   0.996539248
##                          1    44 0.002560372   0.003460752
##                         NA  4471 0.260168752            NA

##  aid_inv$Unemployment_Benefits     n    percent valid_percent
##                              0 12259 0.71335467    0.96421268
##                              1   455 0.02647658    0.03578732
##                             NA  4471 0.26016875            NA

##  aid_inv$Other_Benefits     n    percent valid_percent
##                       0 12322 0.71702066    0.96916785
##                       1   392 0.02281059    0.03083215
##                      NA  4471 0.26016875            NA
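In the tables above, percent uses all rows (including NA) as the denominator, whereas valid_percent excludes the NA rows. The short sketch below reproduces both figures for ReceivedAid from the counts shown in the first table.

```r
# Counts copied from the ReceivedAid table above
n <- c(`0` = 8924, `1` = 3790, `NA` = 4471)

# percent: share of all rows, missing included
percent <- n / sum(n)

# valid_percent: share of rows with a non-missing response
valid <- n[c("0", "1")]
valid_percent <- valid / sum(valid)

round(percent, 7)
round(valid_percent, 7)
```

Because the same 4471 rows are missing for every aid variable, the two columns always differ by the same factor.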

# Code the FSTR module to derive a few variables for follow-up analyses:
## fs_any: binary variable indicating whether the caregiver reported at least one type of hardship
## fs_num: numeric count of how many types of hardship were reported in total
## fs_hardship: taken directly from the question "how hard it is to pay for basic needs?" with a 4-point Likert scale response.

# Here we take the most recent response (when multiple are available)
# to be consistent with the operation of taking the most recent income levels. 

fs <- rapid_data %>%
  # Select the necessary variables; contains() picks up all FSTR.002 items
  dplyr::select(CaregiverID, StartDate, SurveyType, Week,
                FSTR.001, contains("FSTR.002"),
                contains("JOB.015.a.2"), JOB.008.2,
                STRESS.002) %>%
  mutate(fs_hardship = FSTR.001,
         fs_food = FSTR.002_1,
         fs_housing = FSTR.002_2,
         fs_utility = FSTR.002_3,
         fs_healthcare = FSTR.002_4,
         fs_childcare = FSTR.002_7,
         # 1 if any of items 5, 6, or 10 is endorsed; NA otherwise (never 0)
         fs_wellbeing = case_when(FSTR.002_5 == 1 | FSTR.002_6 == 1 | FSTR.002_10 == 1 ~ 1),
         # Because fs_wellbeing is 1 or NA, this evaluates to 1 or NA here;
         # zeros are assigned from fs_hardship in a later step
         fs_any = ifelse(fs_food == 1 |
                           fs_housing == 1 |
                           fs_utility == 1 |
                           fs_healthcare == 1 |
                           fs_childcare == 1 |
                           fs_wellbeing == 1, 1, 0))

fs_items <- c ("fs_food", "fs_housing", "fs_utility", 
               "fs_healthcare", "fs_childcare", "fs_wellbeing")

# Count endorsed hardship types; NA items are ignored (contribute 0)
fs$fs_num <- rowSums(fs[, fs_items], na.rm = TRUE)

# Recode fs_any: keep 1s, assign 0 only when fs_hardship == 0, otherwise NA
fs <- fs %>%
  mutate(fs_any = case_when(fs_any == 1 ~ 1,
                            fs_hardship == 0 ~ 0,
                            TRUE ~ NA_real_))

# Keep each caregiver's most recent response
fs_rct <- fs %>%
  group_by(CaregiverID) %>%
#   filter (is.na(fs_any) == F)%>% (why was this here?)
  filter(Week == max(Week)) %>%
  ungroup() %>%
  dplyr::select(CaregiverID, Week,
                fs_any, fs_num, fs_hardship, fs_food, fs_housing,
                fs_utility, fs_healthcare, fs_childcare, fs_wellbeing,
                STRESS.002, contains("JOB.015.a.2"), JOB.008.2) %>%
  # Treat hardship items that are still NA as not endorsed (0)
  mutate(across(all_of(fs_items), ~ ifelse(is.na(.x), 0, .x)))
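The interplay between fs_num and fs_any is easiest to see on a toy example. The base-R sketch below uses made-up responses and only two hardship items (a simplification of the six items used above) to show how a respondent can have fs_num equal to 0 while fs_any is NA.

```r
# Hypothetical responses, simplified to two hardship items
toy <- data.frame(
  fs_hardship = c(0, 2, 1, 2),
  fs_food     = c(0, 1, NA, 0),
  fs_housing  = c(0, 0, 0, 0)
)
items <- c("fs_food", "fs_housing")

# Count of endorsed items; missing items contribute 0
toy$fs_num <- rowSums(toy[, items], na.rm = TRUE)

# Raw indicator: 1 if any item endorsed, NA if undetermined
any_raw <- ifelse(toy$fs_food == 1 | toy$fs_housing == 1, 1, 0)

# Recode: keep 1s; 0 only when fs_hardship == 0; otherwise NA
toy$fs_any <- ifelse(!is.na(any_raw) & any_raw == 1, 1,
                     ifelse(!is.na(toy$fs_hardship) & toy$fs_hardship == 0,
                            0, NA))
toy
```

Rows 3 and 4 end up with fs_num of 0 but fs_any of NA, so analyses based on fs_any and on fs_num do not necessarily cover the same respondents.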

### Final Data Prep

final_data_prep <- fs_rct %>%
  # Left-join demographic data onto fs_rct by CaregiverID
  merge(master_dem_final,
        by = "CaregiverID",
        all.x = TRUE) %>%
  # Drop INRs that are too high or too low (|log_INR| > 4; N = 125) or missing (9%);
  # filter() also removes rows where the condition evaluates to NA
  dplyr::filter(abs(log_INR) <= 4) %>%
  # Grand-mean center log INR (no NAs remain after the filter above)
  mutate(centered_logINR = log_INR - mean(log_INR),
         # factor income group
         income_group = factor(income_group, 
                               levels = c("Below FPL",
                                          "100-200% Above FPL",
                                          "200-400% Above FPL",
                                          "400%+ Above FPL")),
         aid_group = factor(ReceivedAid_basic1,
                            levels = c("0", "1")),
         Health_Medical_Services = factor(Health_Medical_Services,
                                          levels = c("0", "1")),
         Food_Benefits = factor(Food_Benefits,
                                levels = c("0", "1"))
         ) 

final_data <- final_data_prep %>%
  # dplyr::select() is used explicitly, as elsewhere in this script,
  # to avoid clashes with select() from other loaded packages
  dplyr::select(CaregiverID,
         # select potential dependent variables
         fs_num, fs_any, fs_food, fs_healthcare,
         # select independent variables for model and plots
         log_INR, centered_logINR, income_group, aid_group,
         COLI, COLI_Merge, COLI_Merge_food, COLI_Merge_health,
         # select COLI median variables
         contains("median"),
         Health_Medical_Services, Food_Benefits,
         # select potential demo + control variables
         race_ethnic, Page_impute, Pgender, Pedu, region
  ) %>%
  # condense Pedu variable
  mutate(Pedu_condensed = factor(
    case_when(
      Pedu %in% c("Less than high school", "Some high school", "High school diploma/GED") ~ "High School/GED or Below",
      Pedu %in% c("Some college", "Associate degree", "other") ~ "Some College/Associate Degree",
      Pedu == "Bachelor's degree" ~ "Bachelor's Degree",
      Pedu %in% c("Master's degree", "Doctorate/Professional") ~ "Postgraduate Degrees",
      TRUE ~ "Other"
    ),
    levels = c("High School/GED or Below", "Some College/Associate Degree", "Bachelor's Degree", "Postgraduate Degrees", "Other"),
    ordered = TRUE
  )) %>%
  mutate(aid_group = as.factor(aid_group),
         Page_impute = as.numeric(Page_impute),
         race_ethnic = factor(race_ethnic,
                              levels = c("White", "Black", "Latinx", "Other minorities"),
                              ordered = FALSE))

# Summary of variables in analysis

sum(is.na(final_data$fs_num))

## [1] 0

sum(is.na(final_data$COLI)) # 3368 (20%)

## [1] 3368

sum(is.na(final_data$log_INR))

## [1] 0

sum(is.na(final_data$aid_group)) # 3873 (25%)

## [1] 3873
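Rather than checking one variable at a time, the same missingness summary can be produced for all analysis variables at once. The sketch below uses a small made-up data frame in place of final_data.

```r
# Hypothetical stand-in for final_data
toy <- data.frame(
  fs_num    = c(0, 2, 1, 0),
  COLI      = c(1.1, NA, 0.9, NA),
  aid_group = factor(c("0", NA, "1", "0"))
)

# Number and percentage of missing values per column
n_missing   <- colSums(is.na(toy))
pct_missing <- round(100 * colMeans(is.na(toy)), 1)

data.frame(n_missing, pct_missing)
```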

################################ IMPUTATION ###################################

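# Impute the missing data
imputed_data <- final_data %>%
  # keep only the variables needed for imputation
  dplyr::select(fs_num, COLI, log_INR, aid_group,
         Page_impute, Pedu_condensed, race_ethnic) %>%
  # pass a plain data.frame, since missForest may not handle tibble input
  as.data.frame()

# Use the missForest package to impute (based on Mateus's suggestions / code).
# missForest is random-forest based, so a seed is set for reproducibility
# (the seed value is arbitrary).
set.seed(2024)
imputed_data <- missForest(imputed_data)$ximp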

imputed_data_final <- imputed_data %>%
  # Add groups for new graph with imputed data
  mutate(coli_group = case_when(COLI >= 1 ~ "High CoL",
                                COLI < 1 ~ "Low CoL"),
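         # Rebuild income groups from log_INR so they match the original groups.
         # Cutoffs: 0 = log(1), 0.693 ~ log(2), 1.386 ~ log(4),
         # i.e., income at 100%, 200%, and 400% of the FPL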
         income_group = factor(
           case_when(
             log_INR < 0 ~ "Below FPL",
             log_INR >= 0 & log_INR < 0.693 ~ "100-200% Above FPL",
             log_INR >= 0.693 & log_INR < 1.386 ~ "200-400% Above FPL",
             log_INR >= 1.386 ~ "400%+ Above FPL"
           ),
           levels = c("Below FPL",
                      "100-200% Above FPL",
                      "200-400% Above FPL",
                      "400%+ Above FPL")),
         fs_any = ifelse(fs_num > 0, 1, 0)
  )
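The log_INR cutoffs used above correspond to income-to-needs ratios of 1, 2, and 4. The base-R sketch below verifies this and shows an equivalent grouping with cut(), applied to made-up ratios. Using exact log(2) and log(4) breaks rather than the rounded 0.693 and 1.386 is a choice made for this illustration.

```r
# The cutoffs are the logs of the income-to-needs ratios 1, 2, and 4
round(log(c(1, 2, 4)), 3)

inr <- c(0.5, 1, 1.5, 2, 3, 4, 8)   # hypothetical income-to-needs ratios
income_group <- cut(log(inr),
                    breaks = c(-Inf, 0, log(2), log(4), Inf),
                    labels = c("Below FPL", "100-200% Above FPL",
                               "200-400% Above FPL", "400%+ Above FPL"),
                    right = FALSE)   # intervals closed on the left: [a, b)

data.frame(inr, income_group)
```

Since 0.693 is slightly below log(2), a family with a ratio just under 2 could land in a different group under the rounded cutoff than under the exact one.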

3 Graphs

3.1 1-way Graphs (Pre-Imputation)

The following graphs show the proportion of families experiencing material hardship by each of the following groupings:

Income Groups by Federal Poverty Line (FPL)
Cost of Living (High vs Low)
Received Aid (Yes - 1 or No - 0)

3.2 2-way graphs

Next, we present 2-way graphs for the following combinations of groupings:

Income and cost of living (CoL)
Income and aid receipt
CoL and aid receipt

3.3 3-way graph

Finally, we show a 3-way graph that combines the three primary variables: income, CoL, and aid receipt.

4 Graphs Using Imputed Data

Given the large percentage of missing data for CoL and Aid, we also provide graphs using the imputed data.

Exploring the effect of Income on Material Hardship in Families Across Diverse Cost of Living Settings and Receiving Aid

Gabriel Reyes

2024-09-16