Introduction to the Care Board Methodology

The Care Board serves as an online dashboard designed to present comprehensive statistics and insights into the Care Economy, a critical yet often overlooked sector encompassing both paid and unpaid care giving activities. The care economy includes all tasks related to caring for oneself and others, such as nursing, teaching, child-rearing, and assisting elderly relatives. These responsibilities form a significant portion of individuals’ daily lives, whether through professional roles or unpaid domestic labor.

Despite its essential role in sustaining individuals, families, and society, much of this care work remains largely invisible within formal economic statistics. For instance, the Bureau of Labor Statistics (BLS) tracks child care provided by paid professionals, yet identical activities performed by parents remain unaccounted for. This discrepancy highlights the broader issue of how care work is valued and measured within traditional economic frameworks.

The Care Board aims to bridge this gap by providing a centralized platform for measuring, analyzing, and studying care giving activities, regardless of whether they are compensated. By offering novel data and statistics, the Care Board seeks to foster meaningful discussions among researchers, policymakers, and the general public, bringing greater visibility to the challenges faced by caregivers nationwide.

This document acts as a methodology review for all statistics presented on the Care Board. Many of the statistics presented in the Care Board are novel in that they were developed specifically for the Care Board. These statistics thus represent a new way of viewing the economy through the lens of care giving. While many of these statistics will be paired with peer-reviewed academic papers, this document offers a single location to review all statistics and methodologies for the Care Board. Where relevant, links to working papers or peer-reviewed articles are included when available.

This R Markdown file will act as the final methods repository for all analysis on the Care Board website. All data presented and available for download in the Care Board will be discussed in this document. This document will provide all needed information and code to replicate any data available on the Care Board. For each statistic, this file will specifically walk users through the formation of the statistics starting with the raw data and ending with the visible statistic as it is displayed on the Care Board. Any choices, hurdles, and assumptions made along the way will be laid out for critical analysis.

To use this document, move to the section of the statistic that you are most interested in learning about. When in this section, start by looking over the raw data input requirements and then review the code and explanations. For certain areas of the code, there will be references to appendix sections. The sections of the appendix represent preliminary coding decisions, data, and methods used prior to the formation of the statistics. These preliminary codes often feed into multiple statistics and thus are referenced at the end of this document in an appendix.

If you use any data or code from this page, we kindly ask that the data are appropriately cited. Publications and reports should include the appropriate version of the citation as follows:

Misty Heggeness, Joseph Bommarito, Lucie Prewett, Pilar Mcdonald. CareBoard: Version 1.0 [dataset]. Lawrence, KS: Careboard, 2025.

Preliminary tasks

Before running any code, the following preliminary tasks will need to be done. The code provided at the beginning of this document must be run before any code in any other section. This code installs the required packages and sets the working directory used by all other sections. Ensure that you run this code prior to any others or else you are likely to receive errors. Additionally, please ensure that you change the working directory to match the location of your own data. The change of the working directory should be the only change needed to successfully run the code in this document.

The first step is to install the required packages. While some statistics require specific packages to run, other packages are needed for more general data handling. These packages are loaded and described here.

if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(
  ipumsr,
  tidyverse,
  haven,
  data.table,
  Hmisc,
  DT,
  forcats,
  DescTools
)
  • pacman: is a package used to load other packages. This package checks to see if the other package is installed on the user’s computer. In the case it is not installed, pacman will install it prior to loading it from the library. In the case it is installed, pacman will skip installation and load the package directly from the library.

  • tidyverse: is a commonly used data handling package in the R environment. Tidyverse is used to provide more streamlined and readable coding with the goal of allowing easier access to replication files. Whenever possible, code in this document is conducted via the tidyverse methodology as opposed to base R.

  • haven: is a package used for reading and writing certain data formats. For the purpose of this documentation, this package is mostly used to write data files as Stata .dta files.

  • data.table: is a package used to efficiently load and write csv files. Large csv files can be resource intensive to load in as a dataset. This package allows them to be loaded in as a table and then worked with directly in the R environment.

  • Hmisc: is used for survey research and is primarily used in the code below to apply survey weights to statistics, creating population-valid estimates.

  • DT: is used exclusively for this RMD file to provide more readable, interactive tables of the data within the HTML output.

  • DescTools: provides a variety of functions used to describe datasets; most notably, we utilize the Gini function from this package. A short illustration of the Hmisc and DescTools helpers follows this list.
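
As a quick illustration of these two helpers, the short example below uses made-up values (not Care Board data) to show a weighted 75th percentile from Hmisc and a Gini coefficient from DescTools.

# Illustration only: hypothetical minutes of care and hypothetical survey weights
x <- c(30, 60, 90, 120, 480)
w <- c(1.2, 0.8, 1.5, 1.0, 0.5)

Hmisc::wtd.quantile(x, weights = w, probs = 0.75) # weighted 75th percentile
DescTools::Gini(x)                                # Gini coefficient of the values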

setwd("C:/Users/sc363/OneDrive/Work Items/Workspace/CareBoard/CareBoard/")

To load data, you must set the working directory to the file location where your data is stored. Modify the code chunk above with the correct file path on your personal machine. This is the only code in this document that you will need to personalize to ensure it runs properly.

Care needs and provision

The first section of the Care Board provides a measurement of the need and the provision of care throughout the US population. Care need represents the amount of time an individual requires care giving during a day. Care provision represents the amount of time an individual is able to provide care during a day. The need and provision of care is split among three categories: developmental, health, and daily living. Developmental care relates to child development and education. Health care relates to physical and mental care that assists with the health of another individual; elder care falls under this category. Finally, daily living care relates to general daily activities such as cooking and cleaning. This section of the methodology provides all data and code used to create the tables that feed into the first figure on the Care Economy page. The final results of this data and code will be available for download as a CSV, DTA, and Excel file via the data tab on the Care Board.

The data and methodology in this section is specifically used to feed into the area chart on the Care Economy page.

Age Data

The first section of the Care Economy page outlines the market of care giving in the US by age. To display this data we need to start off with the following two variables.

  • age - to represent each individual age group in the United States.
  • population - to represent the population of the US at each age.
age_data <- fread("02_data-prep-and-cleaning/02_ASECdata.csv") %>%
  filter(YEAR == 2024) %>%
  select(AGE, ASECWT)

ipums_conditions()
## No conditions available.

The data for these variables come from the most recent ASEC survey (the Annual Social and Economic Supplement to the CPS) accessed through the IPUMS interface. For more information on how the ASEC data is coded and transformed, please see the Appendix.

Steven Ruggles, Sarah Flood, Matthew Sobek, Daniel Backman, Annie Chen, Grace Cooper, Stephanie Richards, Renae Rodgers, and Megan Schouweiler. IPUMS USA: Version 15.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D010.V15.0

This datafile has the following format.

summary(age_data)
##       AGE            ASECWT        
##  Min.   : 0.00   Min.   :   99.43  
##  1st Qu.:18.00   1st Qu.: 1211.79  
##  Median :38.00   Median : 2199.17  
##  Mean   :38.97   Mean   : 2303.97  
##  3rd Qu.:58.00   3rd Qu.: 3076.02  
##  Max.   :85.00   Max.   :21488.61

Using this data, we apply small mutations to create a framework that fits our visualization. We first rename the column AGE to age to suit our variable naming conventions. Many datasets we use for this project follow the convention of top coding ages at 85; by matching this convention we have smoother transitions between visuals. The ASEC data has a maximum value of 85, meaning anyone aged 85 or older is included in this age bracket.

After conducting these mutations, we summarise the data by calculating the weighted sum of the population for each age. Using the survey weight, we are able to get a population-level estimate of the population by age. The weighted sum is calculated using the ASECWT variable from the original data, which represents the respondent's population survey weight.

age_data <- age_data %>%
  rename(age = AGE) %>%
  group_by(age) %>%
  summarise(
    population = sum(ASECWT, na.rm = TRUE)
  )

The results of this transformation look like this. We’ve restricted this data to having only two columns, one for age and one for the associated population.

datatable(age_data, options = list(pageLength = 100))

We then need to double-check this data to ensure that it is in fact correct. We know roughly what the US population is; to ensure the data fits our understanding, we can take the sum of the population column. If this number is far from the expected US population, it is a sign of an issue in the code that was run.

message("Total US population estimate: ", sum(age_data$population))
## Total US population estimate: 332382485.08

Market Datum

If the above number differs significantly from your understanding of the US population, please double-check the code that you ran before proceeding. Please note as well that we utilize IPUMS ASEC data pulls to gather our numbers, which tend to lag behind official Census reports. Thus, the number presented above could be up to one year older than the current official reports, depending on the timing of the release. For example, the current release represents 2024 data despite being published in 2025.

Now that we have the above population data, we want to add a few more variables representing the need and provision of care throughout the population.

  • care_focus - valued at developmental, daily_living, or health, this variable determines the different care focuses that are analyzed in the dashboard and provides a different need and provision valuation for each.

  • need_interval - provides the estimated number of minutes in a day a person of the associated age needs care for each specific focus.

  • provision_interval - provides the estimated number of minutes in a day a person of the associated age provides care for each specific focus.

To start, let’s create a blank data frame of the ages and care focuses that we will populate with the need and provision information. This code expands each age to have an observation for each care focus using the expand.grid command. It then creates two new interval columns, each of which is valued at NA. We will populate these NA columns in the upcoming code chunks.

age <- age_data$age
care_focus <- c("developmental", "daily_living", "health")

market_datum <- expand.grid(
  age = age,
  care_focus = care_focus
) %>%
  mutate(
    need_interval = NA,
    provision_interval = NA
  )

summary(market_datum)
##       age                care_focus need_interval  provision_interval
##  Min.   : 0.00   developmental:82   Mode:logical   Mode:logical      
##  1st Qu.:20.00   daily_living :82   NA's:246       NA's:246          
##  Median :40.50   health       :82                                    
##  Mean   :40.55                                                       
##  3rd Qu.:61.00                                                       
##  Max.   :85.00

After creating a blank data frame, we will load in the data extract from the American Time Use Survey (ATUS). For more information on the creation of this data extract and coding of variables see Appendix A: ATUS Methods. When loading in the ATUS data, we restrict the data to the years 2023, 2022, 2021, 2019, and 2018. Using a five-year rolling average protects us from potential outliers that may influence estimates, due in large part to small sample sizes in the yearly ATUS survey. However, a five-year rolling average also makes the estimates less responsive to sudden shifts in community behavior. Additionally, the year 2020 is excluded from analysis because the COVID-19 pandemic severely disrupted the ATUS survey implementation.

After loading in the ATUS data, we need to split responses for primary active care giving and secondary care giving. The variable SCC_ALL_LN provides time spent during an activity on secondary care giving to a child. For instance, a primary activity might be cooking, and during this activity the respondent was also providing care and supervision for their child. These secondary care giving times account for a large amount of the care giving that exists in the data. The variable SEC_ALL_LN provides the same information for secondary care giving of elderly adults in the household.

In the data, these secondary time uses are separate columns from the main activities. We want to pull out these observations and make them their own activity observations. The chunk below does this and then binds all data together so that both primary and secondary activities are coded as distinct observations. This code also recodes NAs in the ElderCare and ChildCare columns as 0s. NAs in these columns represent 0 values (see the ATUS Methods appendix) and thus must be updated to allow us to correctly filter observations.

atus <- fread("02_data-prep-and-cleaning/02_ATUSdata.csv") %>%
  select(CASEID, ACTLINE, YEAR, HH_SIZE, formal_care_focus, marst, nchild, ChildCare, ElderCare, SCC_ALL_LN, SEC_ALL_LN, FOCUS, DURATION, AGE, PaidWork, WT06) %>%
  filter(YEAR != 2020) %>%
  filter(YEAR >= 2018) %>%
  mutate(care_job = ifelse(formal_care_focus == "none", 0, 1))%>%
  mutate(ChildCare = ifelse(is.na(ChildCare), 0, ChildCare))%>%
  mutate(ElderCare = ifelse(is.na(ElderCare), 0, ElderCare))

atus$time_use <- "primary"

atus_secondary <- atus %>%
  filter(SCC_ALL_LN > 0) %>%
  mutate(
    FOCUS = "developmental",
    DURATION = SCC_ALL_LN,
    time_use = "secondary"
  )

atus <- bind_rows(atus, atus_secondary)

atus_secondary <- atus %>%
  filter(SEC_ALL_LN > 0) %>%
  mutate(
    FOCUS = "health",
    DURATION = SEC_ALL_LN,
    time_use = "secondary"
  )

# Append modified rows to the original dataset
atus <- bind_rows(atus, atus_secondary)

summary(atus)
##      CASEID                  ACTLINE           YEAR         HH_SIZE      
##  Min.   :20180101180006   Min.   : 1.00   Min.   :2018   Min.   : 1.000  
##  1st Qu.:20190201191515   1st Qu.: 5.00   1st Qu.:2019   1st Qu.: 2.000  
##  Median :20210303212088   Median :10.00   Median :2021   Median : 2.000  
##  Mean   :20204886382012   Mean   :11.53   Mean   :2020   Mean   : 2.706  
##  3rd Qu.:20220806221223   3rd Qu.:16.00   3rd Qu.:2022   3rd Qu.: 4.000  
##  Max.   :20231212232280   Max.   :72.00   Max.   :2023   Max.   :14.000  
##                                                                          
##  formal_care_focus     marst               nchild         ChildCare      
##  Length:918510      Length:918510      Min.   :0.0000   Min.   :0.00000  
##  Class :character   Class :character   1st Qu.:0.0000   1st Qu.:0.00000  
##  Mode  :character   Mode  :character   Median :0.0000   Median :0.00000  
##                                        Mean   :0.8531   Mean   :0.05508  
##                                        3rd Qu.:2.0000   3rd Qu.:0.00000  
##                                        Max.   :9.0000   Max.   :1.00000  
##                                                                          
##    ElderCare        SCC_ALL_LN        SEC_ALL_LN         FOCUS          
##  Min.   :0.0000   Min.   :  0.000   Min.   :  0.000   Length:918510     
##  1st Qu.:0.0000   1st Qu.:  0.000   1st Qu.:  0.000   Class :character  
##  Median :0.0000   Median :  0.000   Median :  0.000   Mode  :character  
##  Mean   :0.0137   Mean   :  9.908   Mean   :  1.001                     
##  3rd Qu.:0.0000   3rd Qu.:  0.000   3rd Qu.:  0.000                     
##  Max.   :1.0000   Max.   :900.000   Max.   :922.000                     
##                                                                         
##     DURATION            AGE          PaidWork           WT06          
##  Min.   :   1.00   Min.   :16.0   Min.   :1        Min.   :   719247  
##  1st Qu.:  15.00   1st Qu.:37.0   1st Qu.:1        1st Qu.:  4650918  
##  Median :  30.00   Median :50.0   Median :1        Median :  7735671  
##  Mean   :  73.97   Mean   :51.3   Mean   :1        Mean   : 10277023  
##  3rd Qu.:  90.00   3rd Qu.:66.0   3rd Qu.:1        3rd Qu.: 12779696  
##  Max.   :1310.00   Max.   :85.0   Max.   :1        Max.   :194366930  
##                                   NA's   :859632                      
##     care_job        time_use        
##  Min.   :0.0000   Length:918510     
##  1st Qu.:0.0000   Class :character  
##  Median :0.0000   Mode  :character  
##  Mean   :0.1945                     
##  3rd Qu.:0.0000                     
##  Max.   :1.0000                     
## 

Now that we have the loaded time use data, we need to add and mutate a few of the variables to get them in the correct format. First, for the weight variable, to get a daily weight we need to divide by 365 and then divide again by 5. The WT06 variable gives us a population value suitable for a calendar year; dividing by 365 (366 in a leap year) converts this to a daily weight. We then divide by 5 to account for the fact that we have five years of data.

We then create a variable called “worktime” that is the product DURATION × PaidWork × care_job. DURATION is a numeric count of minutes spent in an activity, PaidWork is a binary variable valued at 1 if the activity consists of paid work and 0 otherwise, and care_job is a binary variable valued at 1 when the formal work is coded as a care economy job and 0 otherwise. This new variable thus represents the time spent in an activity only if that time is spent both in paid work and in a care sector job. This will be important for calculating the amount of time different groups provide care later on.

For more information about how care economy jobs are coded, see Appendix: Coding Care Economy Occupations.

Following this we load in and merge a hierarchical version of the ATUS data. ATUS data comes in two formats. The format used in the first data load is rectangular activity data, which contains activity records with the requested person information attached to each record. Hierarchical data contains a distinct household record followed by separate person records for other individuals within the household. We utilize the hierarchical data to understand with whom activities are conducted. The variable RELATEW codes who each activity was conducted with, for instance alone, with a child, or with a spouse. We load in this hierarchical data and then merge the RELATEW variable into the other dataset by matching on the CASEID person identifier and the ACTLINE unique activity identifier. For instance, person id 1 activity 1 (representing the first activity recorded by the first person in the data) will be paired between the two datasets and RELATEW will be added to the atus data.

atus$weight = atus$WT06/365/5

atus$worktime <- atus$DURATION*atus$PaidWork*atus$care_job

ddi_file <- read_ipums_ddi("02_data-prep-and-cleaning/atus_00026.xml")
atus_H <- read_ipums_micro(ddi_file)
## Use of data from IPUMS ATUS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
atus$CASEID <- as.numeric(atus$CASEID)

atus <- atus %>%
  left_join(atus_H %>% select(CASEID, ACTLINEW, RELATEW),
            by = c("CASEID" = "CASEID", "ACTLINE" = "ACTLINEW"))
## Warning in left_join(., atus_H %>% select(CASEID, ACTLINEW, RELATEW), by = c(CASEID = "CASEID", : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 47 of `x` matches multiple rows in `y`.
## ℹ Row 381 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
summary(atus)
##      CASEID             ACTLINE           YEAR         HH_SIZE      
##  Min.   :2.018e+13   Min.   : 1.00   Min.   :2018   Min.   : 1.000  
##  1st Qu.:2.019e+13   1st Qu.: 6.00   1st Qu.:2019   1st Qu.: 2.000  
##  Median :2.021e+13   Median :10.00   Median :2021   Median : 3.000  
##  Mean   :2.020e+13   Mean   :11.79   Mean   :2020   Mean   : 3.059  
##  3rd Qu.:2.022e+13   3rd Qu.:16.00   3rd Qu.:2022   3rd Qu.: 4.000  
##  Max.   :2.023e+13   Max.   :72.00   Max.   :2023   Max.   :14.000  
##                                                                     
##  formal_care_focus     marst               nchild        ChildCare      
##  Length:1198564     Length:1198564     Min.   :0.000   Min.   :0.00000  
##  Class :character   Class :character   1st Qu.:0.000   1st Qu.:0.00000  
##  Mode  :character   Mode  :character   Median :1.000   Median :0.00000  
##                                        Mean   :1.143   Mean   :0.07324  
##                                        3rd Qu.:2.000   3rd Qu.:0.00000  
##                                        Max.   :9.000   Max.   :1.00000  
##                                                                         
##    ElderCare         SCC_ALL_LN       SEC_ALL_LN         FOCUS          
##  Min.   :0.00000   Min.   :  0.00   Min.   :  0.000   Length:1198564    
##  1st Qu.:0.00000   1st Qu.:  0.00   1st Qu.:  0.000   Class :character  
##  Median :0.00000   Median :  0.00   Median :  0.000   Mode  :character  
##  Mean   :0.01369   Mean   : 15.98   Mean   :  1.005                     
##  3rd Qu.:0.00000   3rd Qu.: 15.00   3rd Qu.:  0.000                     
##  Max.   :1.00000   Max.   :900.00   Max.   :922.000                     
##                                                                         
##     DURATION            AGE           PaidWork            WT06          
##  Min.   :   1.00   Min.   :16.00   Min.   :1         Min.   :   719247  
##  1st Qu.:  15.00   1st Qu.:35.00   1st Qu.:1         1st Qu.:  4567399  
##  Median :  30.00   Median :45.00   Median :1         Median :  7539175  
##  Mean   :  70.55   Mean   :48.86   Mean   :1         Mean   : 10070500  
##  3rd Qu.:  85.00   3rd Qu.:63.00   3rd Qu.:1         3rd Qu.: 12449980  
##  Max.   :1310.00   Max.   :85.00   Max.   :1         Max.   :194366930  
##                                    NA's   :1128987                      
##     care_job        time_use             weight            worktime      
##  Min.   :0.0000   Length:1198564     Min.   :   394.1   Min.   :   0.0   
##  1st Qu.:0.0000   Class :character   1st Qu.:  2502.7   1st Qu.:   0.0   
##  Median :0.0000   Mode  :character   Median :  4131.1   Median :   0.0   
##  Mean   :0.2057                      Mean   :  5518.1   Mean   :  40.8   
##  3rd Qu.:0.0000                      3rd Qu.:  6821.9   3rd Qu.:  15.0   
##  Max.   :1.0000                      Max.   :106502.4   Max.   :1310.0   
##                                                         NA's   :1128987  
##     RELATEW    
##  Min.   : 100  
##  1st Qu.: 100  
##  Median : 202  
##  Mean   :1545  
##  3rd Qu.: 401  
##  Max.   :9998  
## 

The next step is determining the need for care giving throughout the population. For each age, we want to know how much care in each of the different focuses is needed. While we could make a variety of assumptions about how much care is needed, we want to be as data driven as possible. Thus, to measure care needs we look at the amount of care individuals give to themselves when they have no one else they are responsible for or to help them. Within a household of multiple people, allotting care giving can be complicated and is often an important economic arrangement. For instance, economies of scale hold for most households, resulting in different activities being conducted by only one person instead of both. As an example, each person needs to spend time preparing food during the day (assuming they don’t eat out). When two people live together, the time spent cooking might increase but is unlikely to double and might be done by only one of the two individuals, freeing the other up to spend less time on care activities or to focus on other care activities (such as doing the dishes). These household arrangements can make it difficult to determine exactly how much care is needed at an individual level.

We need to avoid measuring these households. To do so, we filter the data to only include individuals, living alone, without a spouse or child, who spend no time in the day on child or elder care giving. We further filter these individuals to only include tasks that they did alone, so this data will not include tasks conducted with any outside household members. These are the individuals who have no one to balance out workloads with to apply economies of scale. Thus we can relatively safely assume these individuals are doing 100% of the care giving that they need in their household. We can also safely assume that these individuals are the sole recipients of care giving within their household. These individuals are not providing care to other people or receiving it from other people, and thus provide accurate measures of care need when looking at their daily activities. While the above assumptions might be justified, it is important to note they are nonetheless assumptions, and there are likely activities that are missed. For example, a person spending time making food which they plan to deliver at a future date to their elderly parents who do not live with them would likely not be coded as providing elder care. At the same time, eating food that a mother dropped off the previous day, and thus spending less time cooking, would not be coded as receiving care. While a concern, these instances are likely uncommon in the data, and thus it is fair to proceed.

The code below provides a loop to go through each age. In each loop the data is first filtered via the above criteria. Following that, we calculate the average time spent in each care_focus using a weighted mean approach. Utilization of weighted means allows us to produce population-level estimates. This estimate is then used as our measure of how much time is needed by people within this age group. It is important to note that this measure represents the average for the population, including people of all health statuses. It is likely that some individuals who suffer health issues will require significantly more care than this estimate provides. However, this code seeks to provide only population-level estimates without conditioning on specific characteristics.

for(a in age){
  data <- atus %>%
    filter(HH_SIZE == 1) %>% # Look at people living by themselves
    group_by(CASEID) %>%
      filter(all(SCC_ALL_LN == 0)) %>%
      filter(all(SEC_ALL_LN == 0)) %>%
      filter(all(ChildCare == 0)) %>%
      filter(all(ElderCare == 0)) %>%
    ungroup() %>%
    filter(RELATEW == 100) %>%
    filter(AGE == a | AGE == a-1 | AGE == a+1 | AGE == a-2 | AGE == a+2) %>% # 5-year lag group
    group_by(FOCUS, CASEID) %>% # Estimates for each individual
    summarise(
      Duration = sum(DURATION, na.rm = TRUE),
      weight = first(weight),
      .groups = "drop"
    ) %>%
    ungroup() %>%
    filter(FOCUS != "None") %>% # Aggregate by care focus
    group_by(FOCUS) %>%
    summarise(
      need_interval = weighted.mean(Duration, w = weight, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    rename(care_focus = FOCUS) %>%
    filter(care_focus != "none")
  
  data$age <- a
  
  market_datum <- market_datum %>%
    left_join(data %>% select(age, care_focus, need_interval), 
              by = c("age", "care_focus")) %>%
    mutate(need_interval = coalesce(need_interval.x, need_interval.y)) %>%
    select(age, care_focus, need_interval, provision_interval)
}

market_datum$need_interval[is.na(market_datum$need_interval)] <- 0

While the above code gives us information for adults, it isn’t useful for those under the age of 15. The ATUS survey specifically limits its sample to those aged 15 and above. We thus need to utilize a series of assumptions to create estimates for those below this age. At the same time, we doubt that respondents aged 15 to 17 in this sample are representative of their age group, since they represent people who report living alone. This is a very small group in the data, and we believe it to be unrepresentative of the broader population.

We therefore use a variety of assumptions to determine the time needs for these groups. For those under the age of 12, we assume that 24 hours of total care time is necessary. For those over the age of 12 but still under the age of 18, we assume that 16 hours of total care time is needed. Many states have laws forbidding those under the age of 12 from being left alone, which is why we make this assumption for that group. These states argue that anyone under the age of 12 requires constant supervision and caregiving. While it is true that these individuals might not require 24 hours of direct primary care, they do likely require any time not spent in primary care activities to be spent in secondary care activities. For those between 12 and 18, our number comes from a middle ground between what an 18 year old seems to need and what a child who is not yet fully independent might need.

The code below assigns values to the age ranges 0-5, 6-12, and 13-18 for each of the three care focuses based on these assumptions. These assumptions come from a review of data, conversations with caregivers, and a review of the literature. However, it is important to understand that these assumptions are not fully data driven and thus present potential bias. Future research should seek to better understand the exact care demands faced by children in the US.

under5_health <- 300
five_twelve_health <- 160
twelve_eighteen_health <- 120

under5_develop <- 420
five_twelve_develop <- 480
twelve_eighteen_develop <- 360

under5_daily <- 1440 - under5_health - under5_develop
five_twelve_daily <- 1440 - five_twelve_health - five_twelve_develop
twelve_eighteen_daily <- 960 - twelve_eighteen_health - twelve_eighteen_develop

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- under5_health
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "health"] <- five_twelve_health
market_datum$need_interval[market_datum$age > 12 & market_datum$age <= 18 & market_datum$care_focus == "health"] <- twelve_eighteen_health

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- under5_develop
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- five_twelve_develop
market_datum$need_interval[market_datum$age > 12 & market_datum$age <= 18 & market_datum$care_focus == "developmental"] <- twelve_eighteen_develop

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- under5_daily
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- five_twelve_daily
market_datum$need_interval[market_datum$age > 12 & market_datum$age <= 18 & market_datum$care_focus == "daily_living"] <- twelve_eighteen_daily

Running all the above code will fill in the columns related to need for care.

summary(market_datum)
##       age         care_focus        need_interval    provision_interval
##  Min.   : 0.00   Length:246         Min.   :  0.00   Mode:logical      
##  1st Qu.:20.00   Class :character   1st Qu.:  0.00   NA's:246          
##  Median :40.50   Mode  :character   Median : 96.52                     
##  Mean   :40.55                      Mean   :156.75                     
##  3rd Qu.:61.00                      3rd Qu.:178.22                     
##  Max.   :85.00                      Max.   :800.00

We still have to fill in the columns related to the provision of care. Measuring care provision can be done in a similar way to the above, where we look at the average ability of a person by age to provide care. However, unlike above, we do not subset our data to only include people living alone. We instead look at the total amount of time in a day that an individual is able to provide care.

To do this we add up a person’s time spent in the following categories: 1) paid work in the formal care economy, 2) unpaid work in the informal care economy, and 3) secondary child care and elder care.

This gives us a distribution of the total time spent on care provision. For each of these categories we specifically look at the 75th percentile of individuals to pull out care provision. This represents the amount of care that only 25% of individuals exceed. We believe this is a conservative estimate of how much care someone is able to provide, allowing them to have some time for leisure and self-care along with paid work. While some people provide more care than this, meaning it is technically possible, research on the mental health burden of caregiving shows that many people spend more than a healthy amount of time providing care. It is also possible that people at the maximum end of this distribution are not representative of what the average person is capable of doing. We thus want to use a value lower than the maximum to determine what is possible. The 75th percentile is a reasonable estimate for this statistic. It is nonetheless important to have better discussions on the appropriate amount of time that a caregiver should be expected to spend on care throughout their day.

The code below loops through ages in the same manner as the previous chunk. Within each loop, this code calculates the time spent in the three care giving categories using a weighted quantile (75th percentile) approach. This code then adds this data to the dataset.

for(a in age){
  # Calculate work hours
  work_hours <- atus %>%
    filter(time_use == "primary") %>%
    filter(AGE %in% c(a-2, a-1, a, a+1, a+2)) %>%
    group_by(CASEID, formal_care_focus) %>%
    summarise(
      Duration = sum(worktime, na.rm = TRUE),
      weight = first(weight),
      .groups = "drop"
    ) %>%
    reframe( # Use reframe instead of summarise to allow multiple rows
      Duration = wtd.quantile(Duration, weights = weight, probs = 0.75, normwt = FALSE)
    )
  
  # Dividing by 3 spreads this paid care-work time across the three care focuses when it is added to each focus below
  work_hours <- as.numeric(work_hours / 3)
  
  # Calculate care provision intervals
  data <- atus %>%
    filter(time_use == "primary") %>%
    filter(AGE %in% c(a-2, a-1, a, a+1, a+2)) %>%
    group_by(FOCUS, CASEID) %>%
    summarise(
      Duration = sum(DURATION, na.rm = TRUE),
      weight = first(weight),
      .groups = "drop"
    ) %>%
    filter(FOCUS != "None") %>%
    group_by(FOCUS) %>%
    reframe( # Use reframe to allow multiple rows
      provision_interval = wtd.quantile(Duration, weights = weight, probs = 0.75, normwt = FALSE)
    ) %>%
    rename(care_focus = FOCUS)
  
  # Adjust provision_interval with work_hours
  data$provision_interval <- data$provision_interval + as.numeric(work_hours)  
  data$age <- a
  
  # Merge with market_datum
  market_datum <- market_datum %>%
    left_join(data %>% select(age, care_focus, provision_interval), 
              by = c("age", "care_focus")) %>%
    mutate(provision_interval = coalesce(provision_interval.x, provision_interval.y)) %>%
    select(age, care_focus, need_interval, provision_interval)
}

Just as with care need, this provides information only for adults in the data. We need to generate assumptions for the amount of care provision children are doing. We assume that individuals under the age of 18 are providing 0 care. We know this is likely not exactly true, but we believe it is a safe assumption, as it is best not to rely on these age groups and many individuals in them do not provide care. We discuss in the next section how this assumption can be relaxed, but for now we insert a zero value for these age categories.

## Now we need to do the same thing for children.
## For now we will do the same as with elderly and assume they on average provide 0 care.

under5_health <- 0
five_twelve_health <- 0
twelve_eighteen_health <- 0

under5_develop <- 0
five_twelve_develop <- 0
twelve_eighteen_develop <- 0

under5_daily <- 0
five_twelve_daily <- 0
twelve_eighteen_daily <- 0

market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- under5_health
market_datum$provision_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "health"] <- five_twelve_health
market_datum$provision_interval[market_datum$age > 12 & market_datum$age < 18 & market_datum$care_focus == "health"] <- twelve_eighteen_health

market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- under5_develop
market_datum$provision_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- five_twelve_develop
market_datum$provision_interval[market_datum$age > 12 & market_datum$age < 18 & market_datum$care_focus == "developmental"] <- twelve_eighteen_develop

market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- under5_daily
market_datum$provision_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- five_twelve_daily
market_datum$provision_interval[market_datum$age > 12 & market_datum$age < 18 & market_datum$care_focus == "daily_living"] <- twelve_eighteen_daily

One final transformation relates to the fact that the ASEC data combines all individuals aged 80-84 into the category of “80” and all individuals 85+ into the category of “85”. We thus need to add observations for the missing years of 81, 82, 83, and 84. To do this, we set these values equal to the values at age 80.

missing_ages <- 81:84
care_focus_values <- unique(market_datum$care_focus)

reference_values <- market_datum %>% 
  filter(age == 80) %>% 
  select(care_focus, need_interval, provision_interval)

new_rows <- expand.grid(age = missing_ages, care_focus = care_focus_values) %>%
  left_join(reference_values, by = "care_focus")

market_datum <- bind_rows(market_datum, new_rows) %>%
  arrange(age, care_focus) 

We now have fully populated data.

summary(market_datum)
##       age        care_focus        need_interval    provision_interval
##  Min.   : 0.0   Length:258         Min.   :  0.00   Min.   :  0.0     
##  1st Qu.:21.0   Class :character   1st Qu.:  0.00   1st Qu.: 93.0     
##  Median :42.5   Mode  :character   Median : 95.01   Median :190.0     
##  Mean   :42.5                      Mean   :153.31   Mean   :187.8     
##  3rd Qu.:64.0                      3rd Qu.:178.84   3rd Qu.:310.0     
##  Max.   :85.0                      Max.   :800.00   Max.   :422.0

We conduct one more step to finalize our estimates of care need and provision, which involves applying a smoothing function. The data estimates above have some sharp peaks and valleys caused by outliers within specific age groups. One issue with bringing the data down to single age bins, even when using five-year averages, is that some ages have very few individuals. A smoothing function helps smooth over these outliers by letting us learn from the data around an observation to identify and decrease the impact of these values.

Smoothing also allows us to fill in the areas around our assumption borders. For instance, we know that those aged 12-18 provide some care giving and likely need less care giving than a 12 year old. We also know that this group likely has more need and less provision than those aged 18 and above. Smoothing thus provides a smoother transition across the 12-18 age range, blending this age group with the existing data and allowing for a more accurate estimate. This also keeps us from being overly reliant on our assumptions. We specifically utilize a LOESS methodology in the following code chunk.

The smoothing function is defined below. Of note, we bound the smoothed values at the minimum and maximum of the observed data, meaning that no age group is changed to fall above or below the current bounds of the data.

smooth_data <- function(df) {
  df %>%
    group_by(care_focus) %>%
    arrange(age) %>%  # Ensure data is ordered before smoothing
    mutate(
      smoothed_need = predict(loess(need_interval ~ age, data = cur_data(), span = 0.2), newdata = data.frame(age = age)), 
      min_val_need = min(need_interval, na.rm = TRUE),
      max_val_need = max(need_interval, na.rm = TRUE),
      smoothed_need = pmax(pmin(smoothed_need, max_val_need), min_val_need), # Ensure within bounds
      smoothed_prov = predict(loess(provision_interval ~ age, data = cur_data(), span = 0.3), newdata = data.frame(age = age)), 
      min_val_prov = min(provision_interval, na.rm = TRUE),
      max_val_prov = max(provision_interval, na.rm = TRUE),
      smoothed_prov = pmax(pmin(smoothed_prov, max_val_prov), min_val_prov) # Ensure within bounds
    ) %>%
    ungroup()
}

# Apply smoothing function to your dataset
market_datum <- smooth_data(market_datum) %>%
  select(age, care_focus, smoothed_need, smoothed_prov) %>%
  rename("need_interval" = smoothed_need) %>%
  rename("provision_interval" = smoothed_prov)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `smoothed_need = predict(...)`.
## ℹ In group 1: `care_focus = "daily_living"`.
## Caused by warning:
## ! `cur_data()` was deprecated in dplyr 1.1.0.
## ℹ Please use `pick()` instead.

While the smoothing function is helpful for smoothing over our estimates and assumptions, we nonetheless want to make sure that some of the assumptions are firmly held. We believe it is essential to reiterate that individuals under the age of 12 require 24 hours of care and provide no care. The smoothing function adds a little bit of data here, for example implying that 1-year-olds provide 5 minutes of health care. By repopulating these values, we overwrite the smoothed estimates for them while keeping the smoothed estimates for all other ages. We choose to keep the smoothed values for ages above 12.

under5_health <- 300
five_twelve_health <- 150

under5_develop <- 420
five_twelve_develop <- 480

under5_daily <- 1440 - under5_health - under5_develop
five_twelve_daily <- 1440 - five_twelve_health - five_twelve_develop

no_provision <- 0

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- under5_health
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "health"] <- five_twelve_health

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- under5_develop
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- five_twelve_develop

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- under5_daily
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- five_twelve_daily


market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 13 & market_datum$care_focus == "health"] <- no_provision
market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- no_provision
market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- no_provision

We can now look through the data for this table.

summary(market_datum)
##       age        care_focus        need_interval    provision_interval
##  Min.   : 0.0   Length:258         Min.   :  0.00   Min.   :  0.0     
##  1st Qu.:21.0   Class :character   1st Qu.: 26.61   1st Qu.:105.6     
##  Median :42.5   Mode  :character   Median :100.62   Median :182.1     
##  Mean   :42.5                      Mean   :154.61   Mean   :188.3     
##  3rd Qu.:64.0                      3rd Qu.:180.11   3rd Qu.:307.9     
##  Max.   :85.0                      Max.   :810.00   Max.   :411.1

Finalize Care Economy Data

Finally, we merge this data with the previously created age_data to add the population value to this data frame. This gives us the following final data frame for analysis.

Care_Economy <- market_datum %>%
  left_join(age_data, by = "age")

datatable(Care_Economy, options = list(pageLength = 100))

This data provides all information by age on the amount of time spent needing care and providing care across our different care focuses. This information is useful for a variety of reasons, but most notably can be used to understand the time constraints faced by different people, which can be combined with models of employment and other activities.
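
As noted at the start of this section, the final results are made available for download as CSV, DTA, and Excel files. The exact export chunk is not shown here, but a minimal sketch of how this table could be written out might look as follows; fwrite() comes from data.table and write_dta() from haven (both loaded above), while writexl is an additional package assumed for the Excel output, and the file names are illustrative.

# Sketch only: write the final Care Economy table in the three download formats
fwrite(Care_Economy, "Care_Economy.csv")                # CSV via data.table
haven::write_dta(Care_Economy, "Care_Economy.dta")      # Stata .dta via haven
writexl::write_xlsx(Care_Economy, "Care_Economy.xlsx")  # Excel; writexl assumed installed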

Finally let’s view this data as a plot.

market_datum_long <- market_datum %>%
  pivot_longer(cols = c(need_interval, provision_interval), 
               names_to = "care_type", 
               values_to = "minutes")

plot <- ggplot(market_datum_long, aes(x = age, y = minutes, color = care_focus)) +
  geom_line(size = 1) + 
  geom_point(size = 2) + 
  facet_wrap(~care_type, scales = "free_y", labeller = labeller(care_type = c(need_interval = "Care Need", 
                                                                              provision_interval = "Care Provision"))) +
  scale_color_manual(values = c("daily_living" = "blue", "developmental" = "green", "health" = "red")) +
  labs(title = "Care Needs and Provision by Age",
       x = "Age",
       y = "Minutes",
       color = "Care Focus") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom",
        panel.grid.major = element_line(color = "grey85"),
        panel.grid.minor = element_blank())
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
print(plot)

Lives of Care (Provider Demographics)

One of the key questions we had when entering this project is: who is providing care within society? A wide variety of previous literature has shown that care giving within society is often conducted by specific groups, such as women. In this project, we specifically seek to measure the time spent by different groups within society and to understand how care giving has shifted across groups and over time. This section provides all code and replication files needed to understand how we acquired these estimates.

Provider Groups

The first step of this process is to identify the groups we think are of most interest. We call these groups provider groups. We reviewed the scholarly literature and public discussion to understand which groups might be expected to see imbalances and shifts in caregiving load and expectations. While this list of potential providers can be updated whenever desired, we start by focusing on what we have identified as the key groups of interest:

  • Care Status Reflects an intersection of gender and parenthood. Differences are expected based on gendered patterns in care giving, as well as parents likely needing to provide more care than non-parents.

  • Race Represents the respondent’s identified race/ethnicity. Past literature has shown differences in care giving and access to formal care infrastructure across races.

  • Marital Status Represents the respondent’s relationship status. Depending on relationship status, individuals might have access to more help with care within their household.

  • Poverty Status Represents the respondent’s poverty level. Past literature has shown that individuals in poverty are likely to have decreased access to formal care infrastructure, which might affect informal caregiving time.

The code chunk below creates the initial parts of a data frame that will be populated with data on these providers. We have a column called name, which represents the name of each potential provider group, and a column called id, which is the name in kebab case and is useful for anyone seeking to work programmatically with this data.

name <- c("Care Status", "Race", "Marital Status", "Poverty Status")
id <- c("care-status", "race", "marital-status", "poverty")

provider_group <- data.frame(id = id, name = name, stringsAsFactors = FALSE)

datatable(provider_group, options = list(pageLength = 100))

Providers

For each of these groups we next need to determine the specific categories of each group and the characteristics of the group that we want to calculate. For now, the only group characteristic that we are interested in is the population. This section will thus look at each provider group and calculate the population of each specific category within that group.

To accomplish this task, we utilize the yearly ATUS survey to calculate populations. The use of this survey is a significant decision, as the Census Bureau prefers the monthly CPS survey for analyzing demographic shifts in the US. We utilize the ATUS survey because, in the next code section, we impute time use statistics relative to these providers. In future updates to the Care Board, we might seek to develop ways to fuse the monthly updates of the CPS survey and the yearly updates of the ATUS survey, but for now we focus on using the ATUS survey to create these estimates.

The first step is to load the ATUS data, which we do via the code chunk below. We then filter it to include the most recent five years of data (excluding the year 2020). The year 2020 is excluded because the COVID-19 pandemic significantly affected the implementation of the ATUS survey.

The data and methodology in this section is specifically used to feed into the string chart on the Care Economy page.

atus <- read.csv("02_data-prep-and-cleaning/02_ATUSdata.csv") %>%
  filter(YEAR >= 2018) %>%
  filter(YEAR != 2020)

Following this, we create a dataset for each of our provider groups that includes the individual categories and the population within each. We do this using the survey weight variable WT06 to get population-level estimates. We divide this by 365 in order to get daily-level estimates. We further divide this by 5 to account for the fact that we have five years of aggregate data. For each of the provider groups identified above, we provide the code to create the population estimates and the results below. Please note that individuals under the age of 15 are not included in this data. As such, the sum of the population will not add up to the total US population.

The first provider group we analyze is the care status group which represents Mothers, Fathers, Childless Men, and Childless Women. The code below compiles the data for this group.

care_pop <- atus %>%
  group_by(care_status, CASEID) %>%
  summarise(
    care = first(care_status),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(care) %>%
  summarise(
    population = sum(weight)/365/5
  ) %>%
  rename(name = care)
## `summarise()` has grouped output by 'care_status'. You can override using the
## `.groups` argument.
care_pop$id <- str_to_lower(str_replace_all(care_pop$name, "[^a-zA-Z0-9]+", "-"))
care_pop$provider_group_id <- "care-status"

datatable(care_pop, options = list(pageLength = 100))

The next group that we include is race and ethnicity. We provide population estimates for these groups in the below code.

race_pop <- atus %>%
  group_by(race_ethnicity, CASEID) %>%
  summarise(
    race = first(race_ethnicity),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(race) %>%
  summarise(
    population = sum(weight)/365/5
  ) %>%
  rename(name = race)
## `summarise()` has grouped output by 'race_ethnicity'. You can override using
## the `.groups` argument.
race_pop$id <- str_to_lower(str_replace_all(race_pop$name, "[^a-zA-Z0-9]+", "-"))
race_pop$provider_group_id <- "race"

datatable(race_pop, options = list(pageLength = 100))

The next provider group that we want to look at is poverty status. The code below provides the information related to the poverty status group. The poverty question asked by ATUS is not available in every year of the data. For this five-year interval, the question is only asked in 2022 and 2023, meaning we need to filter the data to only include these years.

poverty_pop <- atus %>%
  filter(YEAR >= 2022) %>%
  group_by(poverty, CASEID) %>%
  summarise(
    poverty = first(poverty),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(poverty) %>%
  summarise(
    population = sum(weight)/365/2
  ) %>%
  rename(name = poverty) %>%
  filter(name != "NIU")
## `summarise()` has grouped output by 'poverty'. You can override using the
## `.groups` argument.
poverty_pop$id <- str_to_lower(str_replace_all(poverty_pop$name, "[^a-zA-Z0-9]+", "-"))
poverty_pop$provider_group_id <- "poverty"

datatable(poverty_pop, options = list(pageLength = 100))

The final group that we develop statistics for is marital status. The code below provides information related to this provider group.

marital_pop <- atus %>%
  group_by(marst, CASEID) %>%
  summarise(
    marital = first(marst),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(marital) %>%
  summarise(
    population = sum(weight)/365/5
  ) %>%
  rename(name = marital)
## `summarise()` has grouped output by 'marst'. You can override using the
## `.groups` argument.
marital_pop$id <- str_to_lower(str_replace_all(marital_pop$name, "[^a-zA-Z0-9]+", "-"))
marital_pop$provider_group_id <- "marital-status"

datatable(marital_pop, options = list(pageLength = 100))

Finally we will take these datasets and bind them together into a single provider table. This final table can be viewed below.

provider <- rbind(care_pop, race_pop, poverty_pop, marital_pop)
datatable(provider, options = list(pageLength = 100))

Provider Informal

The next set of code focuses on each individual provider and calculates the time spent providing informal care giving. The code below focuses first on the care status provider group and looks at the time spent in each care focus for each of its categories. The provision_interval column below represents the total amount of time, in minutes, spent by this group providing care. As can be seen, this is a very large number. In many figures we might divide this by 60 to present the estimates in hours; however, it will remain a very large number because it represents a population-level estimate.

Of importance: this column does NOT represent the average amount of minutes this group spends in care giving, but instead represents the TOTAL amount of time spent by this group providing care throughout the entirety of the population.

care_formal <- atus %>%
  group_by(care_status, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365/5
  ) %>%
  ungroup() %>%
  group_by(care_status, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = care_status)
## `summarise()` has grouped output by 'care_status', 'FOCUS'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'care_status'. You can override using the
## `.groups` argument.
care_formal$provider_id <- str_to_lower(str_replace_all(care_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(care_formal, options = list(pageLength = 100))
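
For intuition, these population level totals can be converted back into an average number of minutes per person per day by dividing the provision interval by the group population. The minimal sketch below does this for the table above; the result is an average among people who reported time in that focus on their diary day, and it is an illustration only, not a statistic displayed on the Care Board.

#Illustration only: average daily minutes of informal care per person
#within each care_status and care_focus cell
care_avg <- care_formal %>%
  mutate(avg_minutes_per_person = provision_interval / population)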

The next code provides the same information for the race ethnicity provider groups.

race_formal <- atus %>%
  group_by(race_ethnicity, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365/5
  ) %>%
  ungroup() %>%
  group_by(race_ethnicity, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = race_ethnicity)
## `summarise()` has grouped output by 'race_ethnicity', 'FOCUS'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'race_ethnicity'. You can override using
## the `.groups` argument.
race_formal$provider_id <- str_to_lower(str_replace_all(race_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(race_formal, options = list(pageLength = 100))

The next table provides provision intervals for the poverty status provider groups. As a reminder, these data are only available for the years 2022 and 2023.

poverty_formal <- atus %>%
  filter(YEAR >= 2022) %>%
  group_by(poverty, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365/2
  ) %>%
  ungroup() %>%
  group_by(poverty, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = poverty)
## `summarise()` has grouped output by 'poverty', 'FOCUS'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'poverty'. You can override using the
## `.groups` argument.
poverty_formal$provider_id <- str_to_lower(str_replace_all(poverty_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(poverty_formal, options = list(pageLength = 100))

The final table provides provision interval estimates for marital status provider groups.

marital_formal <- atus %>%
  group_by(marst, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365/5
  ) %>%
  ungroup() %>%
  group_by(marst, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = marst)
## `summarise()` has grouped output by 'marst', 'FOCUS'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'marst'. You can override using the
## `.groups` argument.
marital_formal$provider_id <- str_to_lower(str_replace_all(marital_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(marital_formal, options = list(pageLength = 100))

To finalize the provider_informal table we bind all of these different tables together using the rbind function. We also filter out cases that we do not need to include on the Care Board, such as rows where the provider id is valued at niu or the care_focus is valued at none.

provider_informal_datum <- rbind(care_formal, race_formal, poverty_formal, marital_formal) %>%
  filter(provider_id != "niu") %>%
  filter(care_focus != "none")

datatable(provider_informal_datum, options = list(pageLength = 100))

Provider Formal Data

The final step of creating the provider data involves looking at the time spent by each provider group in formal care giving. Formal care giving represents time spent engaged in paid care work. Jobs such as nurse, teacher, chef, and barber, among many others, represent paid caregiving work. The code below looks at these different groups and determines the amount of time each provider group spends in the formal care sector.

The first group that we look at is care status.

care_formal <- atus %>%
  group_by(care_status, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365/5
  ) %>%
  ungroup() %>%
  group_by(care_status, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = care_status)
## `summarise()` has grouped output by 'care_status', 'formal_care_focus'. You can
## override using the `.groups` argument.
## `summarise()` has grouped output by 'care_status'. You can override using the
## `.groups` argument.
care_formal$provider_id <- str_to_lower(str_replace_all(care_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(care_formal, options = list(pageLength = 100))

The next provider we look at is time spent in the formal economy based on race and ethnicity.

race_formal <- atus %>%
  group_by(race_ethnicity, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365/5
  ) %>%
  ungroup() %>%
  group_by(race_ethnicity, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = race_ethnicity)
## `summarise()` has grouped output by 'race_ethnicity', 'formal_care_focus'. You
## can override using the `.groups` argument.
## `summarise()` has grouped output by 'race_ethnicity'. You can override using
## the `.groups` argument.
race_formal$provider_id <- str_to_lower(str_replace_all(race_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(race_formal, options = list(pageLength = 100))

The next group that we want to look at is poverty status. As a reminder, this variable is only available for the years 2022 and 2023, so the data are again filtered to those years.

poverty_formal <- atus %>%
  filter(YEAR >= 2022) %>%
  group_by(poverty, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365/2
  ) %>%
  ungroup() %>%
  group_by(poverty, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = poverty)
## `summarise()` has grouped output by 'poverty', 'formal_care_focus'. You can
## override using the `.groups` argument.
## `summarise()` has grouped output by 'poverty'. You can override using the
## `.groups` argument.
poverty_formal$provider_id <- str_to_lower(str_replace_all(poverty_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(poverty_formal, options = list(pageLength = 100))

Finally we will look at the time spent on formal care activities for individuals based on marital status.

marital_formal <- atus %>%
  group_by(marst, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365/5
  ) %>%
  ungroup() %>%
  group_by(marst, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = marst)
## `summarise()` has grouped output by 'marst', 'formal_care_focus'. You can
## override using the `.groups` argument.
## `summarise()` has grouped output by 'marst'. You can override using the
## `.groups` argument.
marital_formal$provider_id <- str_to_lower(str_replace_all(marital_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(marital_formal, options = list(pageLength = 100))

Finally, we bind these observations together and filter out all cases where the provider id is niu or the care_focus is none. These observations are not needed for the Care Board and as such are not included in the final data.

provider_formal_datum <- rbind(care_formal, race_formal, poverty_formal, marital_formal) %>%
  filter(provider_id != "niu") %>%
  filter(care_focus != "none")

datatable(provider_formal_datum, options = list(pageLength = 100))

The code above provided a series of information about the different care providers studied and analyzed on the Care Board. For each care provider group we calculated its societal population along with the time it spends in both formal and informal care giving. This information is vital for understanding the demographics of caregivers throughout society and for answering the question of who is providing care. We finalize this section by creating a single dataset combining the provider formal and provider informal data. This is done by adding a care_type column to both and then binding them together.

provider_formal_datum$care_type <- "formal"
provider_informal_datum$care_type <- "informal"
provider_datum <- rbind(provider_formal_datum, provider_informal_datum)

datatable(provider_datum, options = list(pageLength = 100))

The Circle of Care (Activity Timeuse)

The next data that we compiled for the Care Board represent information at the activity level on the time spent on different individual activities. In the formal economy these activities represent paid jobs; at the informal level they represent unpaid time use. One of the goals of the Care Board is to analyze all of these individual activities together, so it is important to create statistics fusing them into a single dataframe for analysis.

The first step of this process is the creation of crosswalks that 1) identify jobs that are part of the care economy, 2) identify informal time use activities that are part of the care economy, and 3) create crosswalks between formal and informal activities. The crosswalks, and a discussion of how they were created, are in the appendix section of this document. It is important to note, however, that these crosswalks are somewhat subjective. What counts as a care economy job is a difficult question to answer. Some jobs, such as nursing, are clearly care jobs. But what about barbers and salon workers, janitors, and chefs? We often consider informally cooking or cleaning for others to be care giving, yet we often do not count these as caregiving at the formal level. Additionally, in both the health and developmental formal industries, a large share of activities fall outside of direct caregiving. Should the accountant at the hospital be considered to have a care economy job even though they never directly interact with patients? These questions are not easy to answer, and we discuss our methods of resolving them in the appendix.

Formal Care Activities

The first step of this process is to identify all activities that we want to label as formal care activities. This is done using the yearly ASEC, a supplement to the CPS. This survey provides more robust information on individuals' economic situations and is thus useful for understanding statistics about the formal economy. The first step is to load the ASEC data for the most recent year, 2024, along with the formal occupation crossover created for this project. We also filter the data to only include employed individuals, because the CPS considers someone to hold a specific job even if they were recently laid off, and we do not want unemployed people to bias down our numbers on income or hours worked.

#Load in the asec data that has information on the formal activities
asec <- fread("02_data-prep-and-cleaning/02_ASECdata.csv") %>%
  filter(empstat == "Employed") %>%
  filter(INCWAGE != 0) %>%
  filter((YEAR == 2024)) %>%
  select(YEAR, OCC2010, UHRSWORKT, EARNWT, INCWAGE, care_focus, ASECWT) %>%
  mutate(UHRSWRKT = ifelse(UHRSWORKT == 999, 0, UHRSWORKT)) #Recode the 999 NIU code to 0 in a cleaned usual weekly hours variable

#Load in the crossover information dataset about formal jobs
job_name <- fread("01_preliminary-code-and-data/01_FormalOccs_Crossover.csv") %>% 
  filter(FOCUS != "none")

The raw asec data has the following format.

summary(asec)
##       YEAR         OCC2010       UHRSWORKT          EARNWT     
##  Min.   :2024   Min.   :  10   Min.   :  1.00   Min.   :    0  
##  1st Qu.:2024   1st Qu.:1650   1st Qu.: 40.00   1st Qu.:    0  
##  Median :2024   Median :4000   Median : 40.00   Median :    0  
##  Mean   :2024   Mean   :3920   Mean   : 89.48   Mean   : 2355  
##  3rd Qu.:2024   3rd Qu.:5550   3rd Qu.: 40.00   3rd Qu.:    0  
##  Max.   :2024   Max.   :9750   Max.   :999.00   Max.   :54484  
##     INCWAGE         care_focus            ASECWT           UHRSWRKT     
##  Min.   :      2   Length:63168       Min.   :  124.1   Min.   :  0.00  
##  1st Qu.:  30000   Class :character   1st Qu.: 1242.3   1st Qu.: 40.00  
##  Median :  51000   Mode  :character   Median : 2229.6   Median : 40.00  
##  Mean   :  71201                      Mean   : 2357.1   Mean   : 88.88  
##  3rd Qu.:  85000                      3rd Qu.: 3090.6   3rd Qu.: 40.00  
##  Max.   :1399999                      Max.   :16587.3   Max.   :997.00

The crossover document then provides the following information used to identify jobs as part of the care economy or not. The Code column is used to link with the OCC2010 column, Label is used as the name of the occupation, and FOCUS is used to identify the care focus.

datatable(job_name, options = list(pageLength = 50))

Our next step is to create a formatted list of all unique care job names, codes, and ids. First, we look at all the unique jobs in the ASEC data. We then filter the crosswalk to only include information on these jobs; if a job is not in the ASEC data, we do not want to bring over information about it. We then create a list of unique care economy job names and use these names to create a kebab case id variable.

#Let's create a list of all the jobs in the asec data
job_code <- unique(asec$OCC2010)

#Let's take these codes and get the name
name <- job_name %>%
  filter(Code %in% job_code) %>%
  pull(Label)

#We only want unique names
name <- unique(name)

#Let's transform the name into an id variable in kebab format
id <- str_to_lower(str_replace_all(name, "[^a-zA-Z0-9]+", "-"))
id <- gsub("-$", "", id)

Following this, we create two new vectors that we will populate with data for each individual care occupation. The first represents the median wage for the occupation. The medians are calculated using the INCWAGE variable and the EARNWT survey weight to create a weighted median. The second vector, called care_focus, records which of the three care focuses each occupation belongs to.

The code below, after creating these blank vectors, goes through each of the unique names, gathers the code from the crosswalk, and creates a subset of the ASEC data where OCC2010 is valued at this code. care_focus is then populated with the value of this data's care_focus, which will be identical for all observations, and median_wage is calculated by taking the weighted median of INCWAGE. Following this we bind all of this data together into an activity_formal dataframe.

median_wage <- {}
care_focus <- {}

#Let's loop through the different names to get information on wage and the care focus
for (n in name) {
  #Convert the name back to code
  code <- job_name %>%
    filter(Label == n) %>%
    pull(Code)
  #Subset the asec data to only have observations with that code
  temp <- asec %>%
    filter(OCC2010 %in% code)
  #We can take the care focus from this list
  care_focus <- append(care_focus, first(temp$care_focus))
  #We calculate the weighted median wage from this group
  median_wage <- append(median_wage, wtd.quantile(temp$INCWAGE, weights = temp$EARNWT, probs = 0.5))
}

activity_formal <- data.frame(
  activity_id = id,
  name = name,
  care_focus = care_focus,
  median_wage = median_wage,
  stringsAsFactors = FALSE  # Ensures character columns are not converted to factors
)

datatable(activity_formal, options = list(pageLength = 50))

Informal Care Activities

The next step involves doing the exact same thing as above, but for all informal activities. To analyze time use and information on informal care activities we use ATUS data. Just as with previous ATUS usage, we restrict the data to the most recent five years. We also rename FOCUS to care_focus to ensure it fits together with the formal format.

atus <- fread("02_data-prep-and-cleaning/02_ATUSdata.csv") %>%
  rename("care_focus" = "FOCUS") %>%
  filter(care_focus != "none") %>%
  filter(YEAR >= 2018) %>%
  filter(YEAR != 2020) %>%
  select(YEAR, CASEID, WT06, ACTIVITY, DURATION, SCC_ALL_LN, SEC_ALL_LN, care_focus, Activity)

summary(atus)
##       YEAR          CASEID                    WT06              ACTIVITY     
##  Min.   :2018   Min.   :20180101180006   Min.   :   719247   Min.   : 10201  
##  1st Qu.:2019   1st Qu.:20190201192382   1st Qu.:  4636922   1st Qu.: 20101  
##  Median :2021   Median :20210403210888   Median :  7691275   Median : 20201  
##  Mean   :2020   Mean   :20205139505577   Mean   : 10057258   Mean   : 27206  
##  3rd Qu.:2022   3rd Qu.:20220807221037   3rd Qu.: 12569738   3rd Qu.: 30101  
##  Max.   :2023   Max.   :20231212232271   Max.   :194366930   Max.   :159999  
##     DURATION         SCC_ALL_LN       SEC_ALL_LN        care_focus       
##  Min.   :   1.00   Min.   :  0.00   Min.   :  0.0000   Length:275446     
##  1st Qu.:  10.00   1st Qu.:  0.00   1st Qu.:  0.0000   Class :character  
##  Median :  20.00   Median :  0.00   Median :  0.0000   Mode  :character  
##  Mean   :  38.12   Mean   :  5.44   Mean   :  0.6742                     
##  3rd Qu.:  45.00   3rd Qu.:  0.00   3rd Qu.:  0.0000                     
##  Max.   :1070.00   Max.   :788.00   Max.   :922.0000                     
##    Activity        
##  Length:275446     
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Just as there exists a formal occupation crossover that codes formal occupations as care or not, the same crossover exists for informal activities. For a broader discussion of the methodology used to create this information, see the appendix section on this topic. Just as with the formal care occupations, this crosswalk involves a variety of subjective assumptions that can be tweaked; more information on these assumptions is found in the appendix.

act_name <- fread("01_preliminary-code-and-data/01_ATUSActivityCrossover.csv")

datatable(act_name, options = list(pageLength = 50))

Once all the data are loaded we create the name and id vectors which will be used to represent the activities. The name vector looks at the crosswalk to get the activity codes and then finds all names where the code is labeled as a care economy code.

act_code <- unique(atus$ACTIVITY)

name <- act_name %>%
  filter(Code %in% act_code) %>%
  pull(Activity)

name <- unique(name)


id <- str_to_lower(str_replace_all(name, " ", "-"))
id <- gsub("-$", "", id)

Just as with the previous data we need to calculate the care_focus and the median wage values. The care focus is easily done, but the median wage is more complicated. What does a median wage for an unpaid activity even represent? When talking about the median wage, we want to know what someone doing this activity in the formal sector might get paid. In the data above we already calculated all the relevant information about wages in the formal sector; the hard part is linking the informal activities to formal occupations. We do this through the utilization of another crosswalk. The discussion of the methods and implications of the occupations-to-activities crosswalk is available in the relevant appendix. This crosswalk provides the relevant occupation(s) related to each care activity.

We thus start by first loading in and presenting this crosswalk.

act_cross <- read.csv("01_preliminary-code-and-data/01_Informal_Formal_Crosswalk.csv", fileEncoding = "latin1")

#Some Code_Formal values store a range of occupation codes ("start-end") rather than a single code;
#this helper expands those ranges into individual codes in steps of 10
expand_range <- function(vec) {
  expanded_vec <- c()  # Initialize empty vector to store results
  
  for (item in vec) {
    if (grepl("-", item)) {  # Check if the item contains a "-"
      range_vals <- as.numeric(strsplit(item, "-")[[1]])  # Split by "-"
      expanded_vec <- c(expanded_vec, seq(range_vals[1], range_vals[2], by = 10))  # Expand range
    } else {
      expanded_vec <- c(expanded_vec, as.numeric(item))  # Append single numbers
    }
  }
  
  return(expanded_vec)
}
  
  
datatable(act_cross, options = list(pageLength = 50))
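
The helper above handles values in the Code_Formal column that store a range of occupation codes rather than a single code. As a small illustration of its behavior (the input strings here are made up and are not values taken from the crosswalk):

#Example only: a range string expands in steps of 10, single codes pass through unchanged
expand_range(c("3600-3630", "4000"))
## [1] 3600 3610 3620 3630 4000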

The code below now provides the care focus and the median wage. To calculate the median wage, we utilize the crosswalk to identify all observations in the ASEC data that are crosswalked to the individual informal activity. We then use the same weighted median approach as above to get the typical wage for this group.

care_focus <- {}
median_wage <- {}


for (n in name) {
  code <- act_name %>%
    filter(Activity == n) %>%
    pull(Code)
  temp <- atus %>%
    filter(ACTIVITY %in% code)
  cross <- act_cross %>%
    filter(Code_Informal %in% code)
  formal_codes <- unique(expand_range(cross$Code_Formal))
  formal_temp <- asec %>%
    filter(OCC2010 %in% formal_codes)
  
  care_focus <- append(care_focus, first(temp$care_focus))
  median_wage <- append(median_wage, wtd.quantile(formal_temp$INCWAGE, weights = formal_temp$EARNWT, probs = 0.5))
}

activity_informal <- data.frame(
  activity_id = id,
  name = name,
  care_focus = care_focus,
  median_wage = median_wage,
  stringsAsFactors = FALSE  # Ensures character columns are not converted to factors
)

summary(activity_informal)
##  activity_id            name            care_focus         median_wage   
##  Length:68          Length:68          Length:68          Min.   :24000  
##  Class :character   Class :character   Class :character   1st Qu.:28200  
##  Mode  :character   Mode  :character   Mode  :character   Median :30000  
##                                                           Mean   :32878  
##                                                           3rd Qu.:32000  
##                                                           Max.   :80000

Using these median wage values to understand informal activities is an interesting method for valuing the time spent in informal activity. Before we finish, there are two additional activities that we need to add: secondary childcare and secondary eldercare. These secondary activities are not included as ATUS activities and thus need to be added separately. For secondary childcare we code the care focus as developmental due to the work involved with children. For secondary eldercare we code the care focus as health due to the increased health monitoring that occurs when caring for an elder. For secondary childcare we set the median wage equal to the median wage for the activity "providing physical care for children." For secondary eldercare we set the median wage equal to the median wage for the activity "Providing medical care to adults."

activity_informal <- rbind(activity_informal, 
                           data.frame(activity_id = "secondary-childcare", 
                                      name = "Secondary Childcare", 
                                      care_focus = "developmental",
                                      median_wage = 30566.09))

activity_informal <- rbind(activity_informal, 
                           data.frame(activity_id = "secondary-eldercare", 
                                      name = "Secondary Eldercare", 
                                      care_focus = "health",
                                      median_wage = 34937.61))
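
As a side note, rather than hard coding these two wage values, they could in principle be looked up from the table built above. The sketch below assumes the activity names shown; the exact name strings would need to be checked against the crosswalk labels.

#Illustration only: pull the reference wages by activity name instead of hard coding them
#(the name strings below are assumed and may not match the crosswalk exactly)
scc_wage <- activity_informal$median_wage[activity_informal$name == "Providing physical care for children"]
sec_wage <- activity_informal$median_wage[activity_informal$name == "Providing medical care to adults"]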

datatable(activity_informal, options = list(pageLength = 100))

We finally want to combine all of this data into a single activity dataframe that includes a value for whether each activity is in the formal or informal sector.

activity_formal$care_type <- "formal"
activity_informal$care_type <- "informal"

activity_final <- rbind(activity_formal, activity_informal)

datatable(activity_final, options = list(pageLength = 100))

Activity Formal Datum

The next step is to calculate a few more pieces of data for each of the formal care activities. Specifically, we want to add two variables to this information.

Population will represent the total number of people in the US engaged in this job. Provision_Interval will represent the total amount of time, in minutes, that this activity was performed daily across society.

Let’s start by creating these new variables as blank vectors.

ids <- activity_formal$activity_id
provision_interval <- {}
population <- {}

The first step in this will involve getting a list of the formal occupation codes that we need to calculate the values for. This is done the same way as in the activity_formal subsection above.

job <- activity_formal$name

job <- job_name %>%
  filter(Label %in% job)

code <- job$Code

asec <- asec %>%
  filter(OCC2010 %in% code)

We now need to summarise the ASEC data to get population and provision interval. We start by looking at each individual occupation code. For each code we calculate the population of people in that occupation by summing the weight variable ASECWT. We then calculate the population level time spent in this activity by taking this weight and multiplying it by the usual weekly hours variable (UHRSWRKT, the recoded version of UHRSWORKT created above). This variable is measured in hours per week, so to get minutes we multiply by 60, and to get to the daily level we divide by 7; for example, a usual 40 hour work week corresponds to 40 × 60 / 7, or roughly 343 minutes per day.

active_formal <- asec %>%
  group_by(OCC2010) %>%
  summarise(
    population = sum(ASECWT, na.rm = TRUE),
    provision_interval = sum(ASECWT*UHRSWRKT*60/7, na.rm = TRUE) #Use the recoded usual hours (999 NIU set to 0)
  ) %>%
  rename("activity_id" = "OCC2010")

active_formal$provider_attention <- "active"

active_formal <- active_formal %>%
  left_join(job_name, by = c("activity_id" = "Code")) %>%
  select(Label, population, provision_interval, provider_attention)

active_formal$activity_id<- str_to_lower(str_replace_all(active_formal$Label, "[^a-zA-Z0-9]+", "-"))
active_formal$activity_id <- gsub("-$", "", active_formal$activity_id)

active_formal <- active_formal %>%
  select(-Label)

active_formal <- active_formal %>%
  group_by(activity_id) %>%
  summarise(
    population = sum(population),
    provision_interval = sum(provision_interval),
    provider_attention = first(provider_attention)
  )

datatable(active_formal, options = list(pageLength = 100))

Activity Informal Datum

The next step in this section is to calculate the exact same information for the informal activities. That is for each informal activity we want to calculate the population of individuals who participated in that activity and the time spent across the population participating in that activity.

Population will represent the total number of people in the US who engaged in this job. Provision_Interval will represent the total amount of time in minutes daily that this activity was performed across society.

Just as with the previous work on the informal sector, we utilize the ATUS data when analyzing this information. One issue with looking at the informal activities is that we need to do it in three parts: the first part for primary activities, the second part for secondary childcare, and the third part for secondary eldercare. The first code below conducts the analysis for primary care activities.

The way this code works is that for each individual it first calculates the total time spent in each unique activity. This is needed because some people might spend time on the same activity multiple times in the day. For instance, someone might cook breakfast, cook lunch, and then cook dinner; we first need to aggregate all of this time for each individual and activity. When doing this we also calculate the individual weight by using the WT06 variable, dividing by the 365 days in the year and the 5 years in our data, and we create a single value for the care focus of each observation. This first step gives us one observation for every unique activity and individual combination. For this section we only care about care activities, so we then filter out the activities where the focus is equal to none. Following this, we group the data by activity and calculate the number of people within that activity by summing the weights of all individuals who spent at least one minute in that activity, and calculate the interval by summing the weight of each individual multiplied by the number of minutes they spent in that activity.

active_primary <- atus %>%
  group_by(Activity, CASEID) %>%
  summarise(
    weight = first(WT06/365/5),
    duration = sum(DURATION),
    focus = first(care_focus)
  ) %>%
  ungroup() %>%
  filter(Activity != "") %>%
  filter(focus != "none") %>%
  group_by(Activity) %>%
  summarise(
    population = sum(weight),
    provision_interval = sum(weight*duration)
  ) %>%
  rename("id" = "Activity")
## `summarise()` has grouped output by 'Activity'. You can override using the
## `.groups` argument.
active_primary$provider_attention <- "active"

datatable(active_primary, options = list(pageLength = 100))

It is important to note that the population column in this data will add up to significantly more than the population of the US. The population column represents the number of people who participated in the associated activity; a given individual might take part in 10, 20, or even more activities throughout the day and would be counted in each of them.
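
A quick check makes this concrete. The sketch below simply sums the activity level populations; because each person is counted once per activity, the total is expected to exceed the US population (illustration only, not a Care Board statistic).

#Illustration only: this total double counts individuals across activities
sum(active_primary$population)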

The next step is to repeat the above methodology but to capture secondary time use activities. First we look at time spent in secondary childcare. For this we are only looking at one specific activity, so we can skip the first step from the previous chunk. The only other difference is that we use the SCC_ALL_LN variable instead of DURATION. This variable represents time spent in secondary childcare for any child. We also manually set the focus to developmental.

passive_child <- atus %>%
  group_by(CASEID) %>%
  summarise(
    weight = first(WT06/365/5),
    duration = sum(SCC_ALL_LN),
    focus = "developmental"
  ) %>%
  ungroup() %>%
  filter(duration != 0) %>% 
  summarise(
    population = sum(weight),
    provision_interval = sum(weight*duration)
  )
passive_child$provider_attention <- "passive_child"
passive_child$id <- "secondary-childcare"

datatable(passive_child, options = list(pageLength = 100))

Finally, we repeat the exact same step as above but for secondary eldercare activities.

passive_elder <- atus %>%
  group_by(CASEID) %>%
  summarise(
    weight = first(WT06/365/5),
    duration = sum(SEC_ALL_LN),
    focus = "health"
  ) %>%
  ungroup() %>%
  filter(duration != 0) %>% 
  summarise(
    population = sum(weight),
    provision_interval = sum(weight*duration)
  )
passive_elder$provider_attention <- "passive_elder"
passive_elder$id <- "secondary-eldercare"

datatable(passive_elder, options = list(pageLength = 100))

We now have all the information needed for the informal activities. We end by binding these together into a single table and presenting it. We then bind the informal and formal activities together into a final activity datum table.

active_informal <- rbind(active_primary,passive_child,passive_elder)

active_informal <- active_informal %>%
  rename(activity_id = id) %>%
  mutate(activity_id = tolower(str_replace_all(activity_id, " ", "-")))

active_informal <- active_informal %>%
  select(activity_id, provider_attention, provision_interval, population)

active_informal$activity_id <- gsub("'", "", active_informal$activity_id)

datatable(active_informal, options = list(pageLength = 100))

Next let’s bind the activity informal datum to the activity formal datum to create an activity datum dataset. This gives us population and provision interval estimates for every formal and informal activity.

activity_datum <- rbind(active_formal, active_informal)

datatable(activity_datum, options = list(pageLength = 100))

The final step in this section is to combine the activity table with the activity_datum table. This step gives us the final version combining all the different variables for all formal and informal activities.

activities <- left_join(activity_final, activity_datum, by = "activity_id")

datatable(activities, options = list(pageLength = 100))

Care Gini

The next statistic that we compute for the Care Board is called the Care GINI. A GINI coefficient is a measure of inequality that quantifies the distribution of a resource within a population on a range of 0 to 1. A GINI of 0 represents perfect equality while a GINI of 1 represents perfect inequality; as such, lower values are better for our purposes. The GINI is most often discussed relative to the distribution of income or wealth within society as a measure of inequality, but it need not be income or wealth: the methodology behind a GINI coefficient can be applied to the distribution of any resource throughout a society.

We utilize the GINI methodology to create a care industry GINI looking at the distribution of care jobs throughout society compared to the population. A significant literature across a variety of industries, including healthcare, food, and child care, discusses the negative impacts of deserts. A desert is a location with a sizeable population that does not have necessary services; these deserts are a symptom of inequality in those services. The GINI coefficient can be used to study the overall distribution of care resources throughout a population and progress toward eliminating these deserts.
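
To build intuition for the measure before applying it to the county data, the snippet below shows how the Gini function from the DescTools package behaves on made-up employment counts (an illustration only, not data used on the Care Board).

#Illustration only: an even distribution of jobs across locations yields a GINI of 0,
#while concentrating all jobs in a single location pushes the GINI toward 1
Gini(c(100, 100, 100, 100))
Gini(c(400, 0, 0, 0))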

We start by loading in the data from the Quarterly Census of Employment and Wages, which provides information on industry employment at the county level by year.

data <- fread('01_preliminary-code-and-data/County_Employment_By_IndustryYear.csv')

summary(data)
##        V1              year        area_fips     industry_code 
##  Min.   :     1   Min.   :2010   Min.   : 1001   Min.   :  10  
##  1st Qu.:146432   1st Qu.:2015   1st Qu.:19013   1st Qu.:6111  
##  Median :292863   Median :2018   Median :29207   Median :6214  
##  Mean   :292863   Mean   :2018   Mean   :31172   Mean   :5381  
##  3rd Qu.:439293   3rd Qu.:2021   3rd Qu.:45999   3rd Qu.:6244  
##  Max.   :585724   Max.   :2024   Max.   :72999   Max.   :8129  
##  IndustryEmployment     Area             Industry        
##  Min.   :      0    Length:585724      Length:585724     
##  1st Qu.:      0    Class :character   Class :character  
##  Median :     32    Mode  :character   Mode  :character  
##  Mean   :   4034                                         
##  3rd Qu.:    364                                         
##  Max.   :4523297

We also want information on the distribution of the population throughout the United States, also at the county level. We load the AgeByCounty2020Plus data from the Census Bureau to gather this. For now we just want the total population; in future iterations, and in the working paper related to this, we calculate specific GINIs related to childcare and eldercare. We pull in this information and then join it to the employment data by year and county FIPS code (the FIPS code is built from the state and county codes, e.g. state 6 and county 37 become 6037).

ages <- fread("01_preliminary-code-and-data/AgeByCounty2020Plus.csv") %>%
  filter(YEAR != 1) %>%
  mutate(year = case_when(
    YEAR == 2 ~ 2020,
    YEAR == 3 ~ 2021,
    YEAR == 4 ~ 2022,
    YEAR == 5 ~ 2023,
    TRUE ~ NA_real_  # Assigns NA to any other values not specified
  )) %>%
  select(STATE, COUNTY, STNAME, CTYNAME, year, POPESTIMATE)

ages$area_fips <- as.numeric(sprintf("%02d%03d", ages$STATE, ages$COUNTY))

data <- data %>%
  inner_join(ages, by = c("year", "area_fips"))

summary(data)
##        V1              year        area_fips     industry_code 
##  Min.   : 45361   Min.   :2020   Min.   : 1001   Min.   :  10  
##  1st Qu.: 89887   1st Qu.:2021   1st Qu.:18179   1st Qu.:6111  
##  Median :135381   Median :2022   Median :29147   Median :6214  
##  Mean   :135922   Mean   :2022   Mean   :30395   Mean   :5487  
##  3rd Qu.:180859   3rd Qu.:2023   3rd Qu.:45019   3rd Qu.:6244  
##  Max.   :226326   Max.   :2023   Max.   :56045   Max.   :8129  
##  IndustryEmployment     Area             Industry             STATE     
##  Min.   :      0    Length:174513      Length:174513      Min.   : 1.0  
##  1st Qu.:      0    Class :character   Class :character   1st Qu.:18.0  
##  Median :     35    Mode  :character   Mode  :character   Median :29.0  
##  Mean   :   3739                                          Mean   :30.3  
##  3rd Qu.:    355                                          3rd Qu.:45.0  
##  Max.   :4523297                                          Max.   :56.0  
##      COUNTY          STNAME            CTYNAME           POPESTIMATE     
##  Min.   :  1.00   Length:174513      Length:174513      Min.   :     43  
##  1st Qu.: 33.00   Class :character   Class :character   1st Qu.:  15173  
##  Median : 75.00   Mode  :character   Mode  :character   Median :  33777  
##  Mean   : 94.93                                         Mean   : 124532  
##  3rd Qu.:129.00                                         3rd Qu.:  88637  
##  Max.   :775.00                                         Max.   :9992813

Now we calculate our GINI coefficients. For now we calculate three separate GINI coefficients, although the national GINI is the only one displayed on the Care Board directly. The first, called the national GINI, represents the distribution of all care jobs throughout the United States relative to the distribution of the US population. Perfect equality in this statistic would represent a perfectly equal ratio of care jobs to population across counties. It is important to note that this statistic currently does not take into account the fact that different counties might have more or less care need than others; currently we use a simple population total. Two counties might be identical in population, but different demographics might create more need for care in one than in the other. Future versions will seek to combine the methods from the market_datum section with this section to form a more robust Care GINI measure.

The code below takes the data, filters out observations with the industry code of 10 (which represents the measure for all industries), and then, for each year and county, sums total employment in the care industries and takes the county population. We then use the Gini command from the DescTools package to calculate the GINI coefficient, using care employment as the resource and county population as the weights.

national_gini <- data %>%
  filter(industry_code != 10) %>%
  group_by(year, area_fips) %>%
  summarise(
    careemployment = sum(IndustryEmployment),
    population = first(POPESTIMATE)
  ) %>%
  ungroup() %>%
  group_by(year) %>%
  summarise(
    gini_nat = Gini(careemployment, weights = population)
  )
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
datatable(national_gini, options = list(pageLength = 100))

As can be seen, the GINI coefficient for these years is relatively constant with a value of around 0.69. However, it is slightly decreasing, which represents a stable but minute improvement.

For further analysis of this measure we can calculate it for each individual state. This uses the same method as above but calculates the GINI coefficient for each state separately.

state_gini <- data %>%
  filter(industry_code != 10) %>%
  group_by(year, STNAME) %>%
  summarise(
    gini_state = Gini(IndustryEmployment, weights = POPESTIMATE)
  )
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
datatable(state_gini, options = list(pageLength = 100))

The final GINI statistic that we want to calculate is at the industry level. We might, for instance, have concerns that one industry is more or less evenly distributed than another; for example, it is possible that nursing jobs are more or less evenly distributed than childcare jobs. Calculating the GINI for each individual industry allows us to answer such questions.

industry_gini <- data %>%
  group_by(year, STNAME, Industry) %>%
  summarise(
    gini_state = Gini(IndustryEmployment, weights = POPESTIMATE)
  )
## `summarise()` has grouped output by 'year', 'STNAME'. You can override using
## the `.groups` argument.
datatable(industry_gini, options = list(pageLength = 100))

Care Ratio

The next statistic table we create is called the Care Ratio. This is a novel statistic developed directly for the Care Board. The Care Ratio is our take on a dependency ratio. A dependency ratio is a demographic measure comparing the proportion of dependent individuals to the working-age population, used to understand the economic burden placed on productive workers throughout society. A common dependency ratio is the number of dependents (young + elderly) divided by the working age population.

We change this method in a few ways. First, we flip it and look at the ratio of the people providing care over the people in need of care. Second, we adjust each of these quantities via a variety of weighting techniques to create unique values for the numerator and denominator. The code below walks through the procedure for creating these weighting mechanisms.
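
In rough notation, the statistic built in the remainder of this section takes the following form for a given year (a summary of the code that follows, where $w_p$ denotes the caregiver type weights and $\bar{v}_g$ the mean need weight of each age or disability group):

$$\text{Care Ratio}_y = \frac{\text{Numerator}_y}{\text{Denominator}_y} = \frac{\sum_{p \in \text{caregiver types}} w_p \, C_{p,y}}{\sum_{g \in \text{need groups}} \bar{v}_g \, P_{g,y}}$$

Here $C_{p,y}$ is the number of caregivers of type $p$ in year $y$ and $P_{g,y}$ is the population of need group $g$ in year $y$.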

The first step, though, is to load in the data. The data we load is identical to the data in the previous section: we need the distribution of the population by age in the US, the distribution of care jobs in the US, and the population of disabled individuals in the US.

formalsector <- fread("01_preliminary-code-and-data/County_Employment_By_IndustryYear.csv") %>%
  filter(!grepl("0$", area_fips)) %>%
  filter(!grepl("^C", area_fips)) %>%
  filter(area_fips != "USMSA" & area_fips != "USCMS" & area_fips != "USNMS")

ages <- fread("01_preliminary-code-and-data/AgeByCounty2020Plus.csv") %>%
  filter(YEAR != 1) %>%
  mutate(year = case_when(
    YEAR == 2 ~ 2020,
    YEAR == 3 ~ 2021,
    YEAR == 4 ~ 2022,
    YEAR == 5 ~ 2023,
    TRUE ~ NA_real_  # Assigns NA to any other values not specified
  ))
YEARS <- unique(ages$year)
disability <- fread("01_preliminary-code-and-data/Disability_By_County.csv")

We also need to load in the market_datum table which we calculated above. This table was used to create the need and provision metrics by age category. After loading it we create a variable of care need called weight. This variable measures the share of care an age group needs relative to the maximum amount of needed care. For example, if the maximum amount of time spent needing care is 24 hours and an age group needs 12 hours of care, the value of this weight will be 0.5, and that age group is valued at half as much as the age group with the greatest care need.

#To understand the weights of different populations we need to use the market_Datum data
market_datum <- fread("03_metric-tables/market_datum/market_datum.csv") %>%
  group_by(age) %>%
  summarise(
    need = sum(need_interval)
  ) %>%
  #Weight refers to for each age group, on a scale of 0-1 how much care is needed with 1 representing the highest need group.
  mutate(
    weight = need/max(need)
  )


After we load in this data, we normalize the weight to have a mean of 1 and a standard deviation of 1. This is useful for understanding shifts in need from a standardized perspective: a value of 1 represents the average amount of care needed, with each additional unit representing a one standard deviation increase.

market_datum$weight <- (market_datum$weight - mean(market_datum$weight, na.rm = TRUE)) / sd(market_datum$weight, na.rm = TRUE) + 1

Using the data that we loaded in, the next step is to create the denominator of the Care Ratio. The denominator represents the total amount of care needed in society. This is not simply the population of individuals, but the weighted need based on age and disability characteristics. For each group of individuals in the data we do the following transformations. First, we sum the values in the ages dataset for the associated age group to get the population of that age group. We then filter the market_datum data to only include observations within that age group and take the mean weight value for that age group. To create the need value for that group, we multiply the population of that age group by its weight. The denominator is the sum of all of these individual values. For disabled individuals we base the weight on the highest need category within their age range and add one; for instance, children with disabilities are weighted using ages under 5, disabled elders using ages over 80, and disabled adults using ages 65 to 75. In the future, better data are needed to identify the time use needs of individuals with disabilities.

Denominators = {}
Years = {}
for (yr in YEARS) {
  ages_temp <- ages %>%
    filter(year == yr)
  disability_temp <- disability %>%
    filter(Year == yr)
  under5 <- sum(ages_temp$UNDER5_TOT)
  
  under5_W <- market_datum %>%
    filter(age < 5)
  under5_W <- mean(under5_W$weight)
  
  five_thirteen <- sum(ages_temp$AGE513_TOT)
  
  five_thirteen_W <- market_datum %>%
    filter(age > 4 & age < 14)
  five_thirteen_W <- mean(five_thirteen_W$weight)
  
  fourteen_seventeen <- sum(ages_temp$AGE1417_TOT)
  
  fourteen_seventeen_W <- market_datum %>%
    filter(age > 13 & age < 18)
  fourteen_seventeen_W <- mean(fourteen_seventeen_W$weight)
  
  sixtyfive_sixtynine <- sum(ages_temp$AGE6569_TOT)
  
  sixtyfive_sixtynine_W <- market_datum %>%
    filter(age > 64 & age < 70)
  sixtyfive_sixtynine_W <- mean(sixtyfive_sixtynine_W$weight)
  
  seventy_seventyfour <- sum(ages_temp$AGE7074_TOT)
  
  seventy_seventyfour_W <- market_datum %>%
    filter(age > 69 & age < 75)
  seventy_seventyfour_W <- mean(seventy_seventyfour_W$weight)
  
  seventyfive_plus <- sum(ages_temp$AGE7579_TOT) + sum(ages_temp$AGE8084_TOT)
  
  seventyfive_plus_W <- market_datum %>%
    filter(age > 74)
  seventyfive_plus_W <- mean(seventyfive_plus_W$weight)
  
  child_disabled <- sum(as.numeric(disability_temp$DisabUnd18), na.rm = TRUE)
  
  child_disabled_W <- market_datum %>%
    filter(age < 5)
  child_disabled_W <- mean(child_disabled_W$weight) + 1
  
  
  adult_disabled <- sum(disability_temp$DisabAdult)
  
  adult_disabled_W <- market_datum %>%
    filter(age > 64 & age < 75)
  adult_disabled_W <- mean(adult_disabled_W$weight) + 1
  
  elder_disabled <- sum(disability_temp$DisabElder)
  
  elder_disabled_W <- market_datum %>%
    filter(age > 80)
  elder_disabled_W <- mean(elder_disabled_W$weight) + 1
  
  Denom = under5*under5_W + five_thirteen*five_thirteen_W + child_disabled*child_disabled_W + adult_disabled*adult_disabled_W + elder_disabled*elder_disabled_W + sixtyfive_sixtynine*sixtyfive_sixtynine_W + seventy_seventyfour*seventy_seventyfour_W + seventyfive_plus*seventyfive_plus_W
  Denominators = append(Denominators, Denom)
  Years = append(Years, yr)
}

The next step in creating the Care Ratio involves creating the numerator. The numerator is the number of caregivers within society. We delineate three specific types of caregivers: first, formal sector care workers; second, homemakers; and third, people working in the formal sector outside of care who can still provide some informal caregiving. We use the QCEW employment data loaded above to measure formal sector employment and the monthly CPS to identify homemakers.

data <- fread("02_data-prep-and-cleaning/02_CPSdata.csv") %>%
  filter(YEAR >= 2020) %>%
  filter(nilf_activity == "Homemaker") %>%
  select(YEAR, month, nilf_activity, WTFINL)

Numerators = {}
Years = {}
W = c(1.5, 0.5, 1)

To calculate the numerator we count the number of care workers, homemakers, and individuals working in a non-care industry throughout society. We weight these groups relative to the value of homemakers. Homemakers have a standardized value of providing 1 unit of caregiving. Care workers, defined as those working in the formal care economy, provide 1.5 units of caregiving; the formal economy is more efficient than the informal economy, leading these individuals to be weighted higher. Finally, those working in non-care jobs are considered able to provide 0.5 units of caregiving. The actual values of these weights are subjective, and future research should examine what the ideal values are.

for (yr in YEARS) {
  ages_temp <- ages %>%
    filter(year == yr)
  disability_temp <- disability %>%
    filter(Year == yr)
  formal_temp <- formalsector %>%
    filter(year == yr)
  cps_temp <- data %>%
    filter(YEAR == yr)
  population <- sum(ages_temp$POPESTIMATE)
  careworkers <- formal_temp %>%
    filter(industry_code != 10)
  careworkers <- sum(careworkers$IndustryEmployment)
  workingnoncare <- formal_temp %>%
    filter(industry_code == 10)
  workingnoncare <- sum(workingnoncare$IndustryEmployment)-careworkers
  Homemakers <- sum(cps_temp$WTFINL/length(unique(cps_temp$month)))
  Years = append(Years, yr)
  Numer <- careworkers*W[1] + workingnoncare*W[2] + Homemakers*W[3]
  Numerators<- append(Numerators, Numer)
}

Finally, we create the care_ratio table by dividing the numerators by the denominators. This gives us the finalized ratio. A value of 1 represents one homemaker-equivalent caregiver for every one unit of care need. A value below 1 represents progressively greater unmet care need, while a value above 1 represents progressively greater care supply.

An important question is what the optimal value of the Care Ratio is. While we center it on 1, it is not necessarily the case that 1 is the optimal value within society. We create this measure to understand the relative nature of caregiving throughout society, and it should be interpreted relative to itself, especially as new geographical comparisons are added.

data <- data.frame(
  Year = Years,
  Care_Ratio = Numerators / Denominators
)

datatable(data, options = list(pageLength = 100))

Labor Force Participation

The next data on the Care Board are related to labor force participation. We calculate two main statistics from these data. First, we replicate the BLS statistics on labor force participation but subset them by gender and parenthood status; this is useful for providing a more detailed understanding of labor trends. The second statistic we calculate is called Care Force Participation, which is a count of the people in the care labor force, both formal and informal. We use the code below to calculate all of this information.

Labor Force Participation by Care Giver Status

To start, we want to understand the labor force participation rates of people based on their care status. To get this information we follow the procedures used by the Bureau of Labor Statistics in their monthly labor force updates. We utilize the monthly CPS survey to do this.

cps <- fread("02_data-prep-and-cleaning/02_CPSdata.csv") %>%
  filter(YEAR >= 2003)%>%
  filter(AGE >= 18 & AGE <= 65) %>%
  select(YEAR, month, AHRSWORKT, WTFINL, care_focus, labor_status, care_status) %>%
  mutate(AHRSWORKT = ifelse(AHRSWORKT == 999, 0, AHRSWORKT))%>%
  mutate(care_focus = ifelse(care_focus == "", "none", care_focus))

summary(cps)
##       YEAR         month             AHRSWORKT          WTFINL     
##  Min.   :2003   Length:19810784    Min.   :  0.00   Min.   :    0  
##  1st Qu.:2007   Class :character   1st Qu.:  0.00   1st Qu.: 1268  
##  Median :2012   Mode  :character   Median : 37.00   Median : 2660  
##  Mean   :2013                      Mean   : 27.15   Mean   : 2501  
##  3rd Qu.:2018                      3rd Qu.: 40.00   3rd Qu.: 3449  
##  Max.   :2024                      Max.   :198.00   Max.   :34716  
##   care_focus        labor_status       care_status       
##  Length:19810784    Length:19810784    Length:19810784   
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 

To calculate the labor force participation rates, we group all observations by date via the year and month variables. Following this, we group the data by each unique care status. The data have four unique labor_status values that we want to divide the data over, and we want to calculate the population and participation rate for each. The first step is to calculate the population of each of these groups. We then get the participation rate by dividing each population by the total population for that care status group. This gives us the labor force participation rate for each care status group.

LFP <- cps %>%
  group_by(YEAR, month, care_status, labor_status) %>%
  filter(labor_status != "Armed Forces") %>%
  summarise(population = sum(WTFINL), .groups = "drop") %>%
  group_by(YEAR, month, care_status) %>%
  mutate(participation_rate = (population / sum(population)) * 100) %>%
  ungroup()

datatable(LFP, options = list(pageLength = 100))

Formal Care Force

The next statistic is the population of each care status group that is engaged in formal caregiving. For each month, we use the CPS data to calculate the total population of people in each care status group engaged in paid care work. We then calculate the proportion of each care status group that is engaged in formal care work.

formal_lfp <- cps %>%
  group_by(YEAR, month, care_status) %>%
  mutate(total_WTFINL = sum(WTFINL, na.rm = TRUE)) %>%  # Total WTFINL for each YEAR and month
  summarise(
    population = sum(WTFINL[care_focus %in% c("health", "daily_living", "developmental") & (labor_status == "Full Time" | labor_status == "Part Time")], na.rm = TRUE),
    total_WTFINL = first(total_WTFINL)  # Retain total sum for proportion calculation
  ) %>%
  mutate(proportion = population / total_WTFINL) %>%
  ungroup()
## `summarise()` has grouped output by 'YEAR', 'month'. You can override using the
## `.groups` argument.
formal_lfp <- formal_lfp %>%
  mutate(
    date = as.Date(paste0(YEAR, "-", month, "-01"), format = "%Y-%B-%d"), # Convert to YYYY-MM-01 format
    metric_id = "formal-careforce" # Add the metric_id column
  ) %>%
  select(date, care_status, population, proportion, metric_id)

datatable(formal_lfp, options = list(pageLength = 100))

Informal Care Force

Just like with the formal care force, the next statistic is the population and percentage of each care status group working in the informal care force. We calculate this statistic using the same method as above but use the ATUS data as our data source.

atus <- fread("02_data-prep-and-cleaning/02_ATUSdata.csv") %>%
  filter(YEAR >= 2003)%>%
  filter(AGE >= 18 & AGE <= 65) %>%
  select(YEAR, DURATION, WT06, FOCUS, CASEID, care_status)

summary(atus)
##       YEAR         DURATION            WT06              FOCUS          
##  Min.   :2003   Min.   :   1.00   Min.   :   433020   Length:3653827    
##  1st Qu.:2006   1st Qu.:  15.00   1st Qu.:  3085939   Class :character  
##  Median :2011   Median :  30.00   Median :  5548997   Mode  :character  
##  Mean   :2011   Mean   :  73.54   Mean   :  7699313                     
##  3rd Qu.:2016   3rd Qu.:  90.00   3rd Qu.:  9408743                     
##  Max.   :2023   Max.   :1350.00   Max.   :209010030                     
##                                   NA's   :109935                        
##      CASEID               care_status       
##  Min.   :20030100013280   Length:3653827    
##  1st Qu.:20060807062459   Class :character  
##  Median :20110302111426   Mode  :character  
##  Mean   :20114077246830                     
##  3rd Qu.:20160705162329                     
##  Max.   :20231212232280                     
## 

The code below applies the same method as above but calculates labor force participation by care status using the ATUS data. One note: we define someone as being in the informal care force if they spent at least 6 hours (360 minutes) of their diary day on caregiving. This amount of time is roughly equivalent to a second job and is therefore an important threshold to measure.

individual_summary <- atus %>%
  filter(FOCUS %in% c("developmental", "daily_living", "health")) %>% # Keep relevant activities
  group_by(YEAR, care_status, CASEID) %>%
  summarise(
    total_duration = sum(DURATION, na.rm = TRUE), # Total time spent in relevant activities
    WT06 = first(WT06) # Retain the individual's weight (only one per person)
  ) %>%
  ungroup()
## `summarise()` has grouped output by 'YEAR', 'care_status'. You can override
## using the `.groups` argument.
# Identify individuals who spent at least 360 minutes in these activities
eligible_individuals <- individual_summary %>%
  filter(total_duration >= 360)

# Calculate the population and proportion by YEAR
informal_lfp <- eligible_individuals %>%
  group_by(YEAR, care_status) %>%
  summarise(
    population = sum(WT06 / 365, na.rm = TRUE) # Sum of weighted individuals
  ) %>%
  left_join(
    atus %>%
      group_by(YEAR, CASEID) %>%
      summarise(WT06 = first(WT06)) %>%
      summarise(total_WT06 = sum(WT06 / 365, na.rm = TRUE)), # Total weighted population for each year
    by = "YEAR"
  ) %>%
  mutate(proportion = population / total_WT06) %>%
  select(YEAR, care_status, population, proportion)
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
informal_lfp <- informal_lfp %>%
  mutate(
    date = as.Date(paste0(YEAR, "-01", "-01"), format = "%Y-%m-%d"), # Annual data: use January 1 of each year
    metric_id = "informal-laborforce" 
  ) %>%
  select(date, care_status, population, proportion, metric_id)
## Adding missing grouping variables: `YEAR`
datatable(informal_lfp, options = list(pageLength = 100))
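
Because the 360-minute cutoff is ultimately a judgment call, it can be useful to see how sensitive the informal care force count is to that choice. The sketch below is not part of the Care Board pipeline; it simply re-counts the weighted population at a few alternative cutoffs using the individual_summary table built above.

thresholds <- c(120, 240, 360)

threshold_sensitivity <- lapply(thresholds, function(t) {
  individual_summary %>%
    filter(total_duration >= t) %>%
    group_by(YEAR) %>%
    summarise(population = sum(WT06 / 365, na.rm = TRUE), .groups = "drop") %>%
    mutate(threshold_minutes = t)
}) %>%
  bind_rows()

datatable(threshold_sensitivity, options = list(pageLength = 100))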

The Sandwich Generation

The next statistic that we are interested in capturing relates to the sandwich generation: the growing number of people who must care for both children and elderly relatives. These individuals are considered sandwiched because they carry dual caregiving responsibilities. We create statistics to track the growth of this group in recent years.

We utilize atus data to understand the demographic breakdown of this group. The variable we use to identify eldercare is only available for 2011 onward, so we restrict the data to those years (2020 is also excluded). We define someone as sandwiched if they have a child aged 10 or younger AND they spent at least one minute of their diary day providing primary or secondary eldercare.

atus <- fread("02_data-prep-and-cleaning/02_ATUSdata.csv")%>%
  filter(YEAR != 2020) %>%
  filter(YEAR >= 2011) %>%
  filter(YNGCH <= 10) %>%
  group_by(CASEID) %>%
  filter(any(SEC_ALL_LN > 0 | DURATION*ElderCare > 0)) %>%
  ungroup()

summary(atus)
##       YEAR          CASEID                   SERIAL          STRATA     
##  Min.   :2011   Min.   :20110101110850   Min.   :   12   Min.   :  -1   
##  1st Qu.:2013   1st Qu.:20130403131272   1st Qu.: 2535   1st Qu.: 800   
##  Median :2015   Median :20150907152507   Median : 5118   Median :2500   
##  Mean   :2016   Mean   :20156653520078   Mean   : 5286   Mean   :2540   
##  3rd Qu.:2018   3rd Qu.:20180404180663   3rd Qu.: 7901   3rd Qu.:4100   
##  Max.   :2023   Max.   :20231212231056   Max.   :12470   Max.   :5601   
##                                                          NA's   :53523  
##     STATEFIP       HH_SIZE        FAMINCOME      HH_NUMADULTS  
##  Min.   : 1.0   Min.   : 2.00   Min.   : 1.00   Min.   :1.000  
##  1st Qu.:16.0   1st Qu.: 3.00   1st Qu.: 9.00   1st Qu.:2.000  
##  Median :29.0   Median : 4.00   Median :13.00   Median :2.000  
##  Mean   :28.9   Mean   : 4.17   Mean   :11.72   Mean   :2.037  
##  3rd Qu.:42.0   3rd Qu.: 5.00   3rd Qu.:15.00   3rd Qu.:2.000  
##  Max.   :56.0   Max.   :15.00   Max.   :16.00   Max.   :8.000  
##                                                                
##  KIDWAKETIME         KIDBEDTIME          POVERTY185        PERNUM      LINENO 
##  Length:93306       Length:93306       Min.   :11.00   Min.   :1   Min.   :1  
##  Class :character   Class :character   1st Qu.:11.00   1st Qu.:1   1st Qu.:1  
##  Mode  :character   Mode  :character   Median :20.00   Median :1   Median :1  
##                                        Mean   :19.62   Mean   :1   Mean   :1  
##                                        3rd Qu.:20.00   3rd Qu.:1   3rd Qu.:1  
##                                        Max.   :99.00   Max.   :1   Max.   :1  
##                                        NA's   :56397                          
##       DAY             WT06               WT20               AGE      
##  Min.   :1.000   Min.   :  840264   Min.   :       0   Min.   :16.0  
##  1st Qu.:2.000   1st Qu.: 3461059   1st Qu.: 2953740   1st Qu.:32.0  
##  Median :4.000   Median : 6178999   Median : 5762189   Median :37.0  
##  Mean   :4.111   Mean   : 7796595   Mean   : 7515437   Mean   :37.1  
##  3rd Qu.:6.000   3rd Qu.: 9583156   3rd Qu.:10929281   3rd Qu.:42.0  
##  Max.   :7.000   Max.   :87439218   Max.   :33705738   Max.   :85.0  
##                                     NA's   :86311                    
##       SEX             RACE           HISPAN          MARST      
##  Min.   :1.000   Min.   :100.0   Min.   :100.0   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:100.0   1st Qu.:100.0   1st Qu.:1.000  
##  Median :2.000   Median :100.0   Median :100.0   Median :1.000  
##  Mean   :1.647   Mean   :104.6   Mean   :118.6   Mean   :2.071  
##  3rd Qu.:2.000   3rd Qu.:100.0   3rd Qu.:100.0   3rd Qu.:3.000  
##  Max.   :2.000   Max.   :320.0   Max.   :250.0   Max.   :6.000  
##                                                                 
##       EDUC          EMPSTAT          OCC2              OCC_CPS8    
##  Min.   :10.00   Min.   :1.000   Length:93306       Min.   :   10  
##  1st Qu.:21.00   1st Qu.:1.000   Class :character   1st Qu.: 2310  
##  Median :32.00   Median :1.000   Mode  :character   Median : 4700  
##  Mean   :31.91   Mean   :2.108                      Mean   :25306  
##  3rd Qu.:40.00   3rd Qu.:4.000                      3rd Qu.: 9130  
##  Max.   :43.00   Max.   :5.000                      Max.   :99999  
##                                                                    
##     EARNWEEK      HRSWORKT_CPS8    SPEMPSTAT         CPSIDP              
##  Min.   :     0   Min.   :   1   Min.   : 1.00   Min.   :20090606051502  
##  1st Qu.:   650   1st Qu.:  40   1st Qu.: 1.00   1st Qu.:20111102795001  
##  Median :  1500   Median :  40   Median : 1.00   Median :20140600957601  
##  Mean   : 36369   Mean   :3248   Mean   :20.51   Mean   :20142634333663  
##  3rd Qu.:100000   3rd Qu.:9999   3rd Qu.: 3.00   3rd Qu.:20161204828901  
##  Max.   :100000   Max.   :9999   Max.   :99.00   Max.   :20230805307301  
##                                                                          
##     ECPRIOR           YNGCH            NCHILD         ACTLINE     
##  Min.   : 0.000   Min.   : 0.000   Min.   :1.000   Min.   : 1.00  
##  1st Qu.: 0.000   1st Qu.: 2.000   1st Qu.:1.000   1st Qu.: 7.00  
##  Median : 0.000   Median : 4.000   Median :2.000   Median :14.00  
##  Mean   : 1.739   Mean   : 4.568   Mean   :2.155   Mean   :15.76  
##  3rd Qu.: 1.000   3rd Qu.: 7.000   3rd Qu.:3.000   3rd Qu.:22.00  
##  Max.   :99.000   Max.   :10.000   Max.   :8.000   Max.   :85.00  
##                                                                   
##     ACTIVITY       DURATION_EXT        DURATION        SCC_ALL_LN    
##  Min.   : 10101   Min.   :   1.00   Min.   :  1.00   Min.   :  0.00  
##  1st Qu.: 20602   1st Qu.:  10.00   1st Qu.: 10.00   1st Qu.:  0.00  
##  Median : 70101   Median :  20.00   Median : 20.00   Median :  0.00  
##  Mean   : 88439   Mean   :  59.57   Mean   : 53.61   Mean   : 15.23  
##  3rd Qu.:160101   3rd Qu.:  60.00   3rd Qu.: 60.00   3rd Qu.: 15.00  
##  Max.   :500107   Max.   :1439.00   Max.   :960.00   Max.   :960.00  
##                                                                      
##    SCC_OWN_LN       SEC_ALL_LN         START               STOP          
##  Min.   :  0.00   Min.   :  0.000   Length:93306       Length:93306      
##  1st Qu.:  0.00   1st Qu.:  0.000   Class :character   Class :character  
##  Median :  0.00   Median :  0.000   Mode  :character   Mode  :character  
##  Mean   : 15.02   Mean   :  1.033                                        
##  3rd Qu.: 15.00   3rd Qu.:  0.000                                        
##  Max.   :960.00   Max.   :880.000                                        
##                                                                          
##      race             poverty             hispan          race_ethnicity    
##  Length:93306       Length:93306       Length:93306       Length:93306      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     marst               sex              empstat              educ          
##  Length:93306       Length:93306       Length:93306       Length:93306      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  age_category           day             prime_age          child_age        
##  Length:93306       Length:93306       Length:93306       Length:93306      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      nchild      care_status          Activity            PaidWork    
##  Min.   :1.000   Length:93306       Length:93306       Min.   :1      
##  1st Qu.:1.000   Class :character   Class :character   1st Qu.:1      
##  Median :2.000   Mode  :character   Mode  :character   Median :1      
##  Mean   :2.155                                         Mean   :1      
##  3rd Qu.:3.000                                         3rd Qu.:1      
##  Max.   :8.000                                         Max.   :1      
##                                                        NA's   :88442  
##    FormalWork      ChildCare       ElderCare     Householdcare  
##  Min.   :1       Min.   :1       Min.   :1       Min.   :1      
##  1st Qu.:1       1st Qu.:1       1st Qu.:1       1st Qu.:1      
##  Median :1       Median :1       Median :1       Median :1      
##  Mean   :1       Mean   :1       Mean   :1       Mean   :1      
##  3rd Qu.:1       3rd Qu.:1       3rd Qu.:1       3rd Qu.:1      
##  Max.   :1       Max.   :1       Max.   :1       Max.   :1      
##  NA's   :90350   NA's   :76008   NA's   :88481   NA's   :75935  
##     Selfcare        Leisure         Sleeping      Volunteering  
##  Min.   :1       Min.   :1       Min.   :1       Min.   :1      
##  1st Qu.:1       1st Qu.:1       1st Qu.:1       1st Qu.:1      
##  Median :1       Median :1       Median :1       Median :1      
##  Mean   :1       Mean   :1       Mean   :1       Mean   :1      
##  3rd Qu.:1       3rd Qu.:1       3rd Qu.:1       3rd Qu.:1      
##  Max.   :1       Max.   :1       Max.   :1       Max.   :1      
##  NA's   :76136   NA's   :78811   NA's   :85707   NA's   :92674  
##    Education        FOCUS           formal_care_focus 
##  Min.   :1       Length:93306       Length:93306      
##  1st Qu.:1       Class :character   Class :character  
##  Median :1       Mode  :character   Mode  :character  
##  Mean   :1                                            
##  3rd Qu.:1                                            
##  Max.   :1                                            
##  NA's   :93110

To count the sandwich generation, we first calculate the total time each person spends on child care and the total time each person spends on eldercare. We then compute each person's population weight using the WT06 variable and record their sex. Please note that, unlike the other statistics, we use sex rather than care_status to break out the values: a childless man or childless woman cannot be sandwiched, by definition, because they have no child. Thus, we only present this statistic by sex.

For each year and sex we calculate the weighted mean time spent on child care and elder care, as well as the total weighted population of individuals in this group. We present this data below.

data <- atus %>%
  group_by(YEAR, CASEID) %>%
  summarise(
    child_care = sum(ChildCare*DURATION + SCC_ALL_LN, na.rm = TRUE),
    elder_care = sum(ElderCare*DURATION + SEC_ALL_LN, na.rm = TRUE),
    weight = first(WT06)/365,
    sex=first(sex)
  )
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
# Calculate weighted mean care times and population counts by year and sex
yearly_stats <- data %>%
  group_by(YEAR, sex) %>%
  summarise(
    mean_child_care = weighted.mean(child_care, w = weight, na.rm = TRUE),
    mean_elder_care = weighted.mean(elder_care, w = weight, na.rm = TRUE),
    weighted_pop = sum(weight, na.rm = TRUE)/5
  )
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
datatable(yearly_stats, options = list(pageLength = 100))
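
A quick visual check of these trends can also be helpful. The sketch below is illustrative only and is not a Care Board chart; it assumes ggplot2 and tidyr are available (for example, via the packages loaded in the preliminary tasks).

yearly_stats %>%
  ungroup() %>%
  pivot_longer(c(mean_child_care, mean_elder_care),
               names_to = "care_type", values_to = "minutes") %>%
  ggplot(aes(x = YEAR, y = minutes, colour = sex, linetype = care_type)) +
  geom_line() +
  labs(x = "Year", y = "Mean minutes per day",
       title = "Daily child care and elder care time among the sandwich generation, by sex")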

The Share of the Formal/Informal Care Economy

The next statistic that we are interested in capturing is the amount of time, across society, that is spent in the formal and informal care economies. This is calculated as a raw number and as a proportion of total time (total work time for the formal measure, total diary time for the informal measure). This statistic is useful for understanding the share of the population's time that is spent on caregiving activities, both formal and informal.

Minutes Worked in the Formal Care Economy

To calculate the formal care time, we utilize the same cps data as above. For each month and year we measure time spent using the variable AHRSWORKT, which asks individuals the average amount of time they spend working in a week. We multiply this variable by 60 to convert to minutes and divide by 7 for the days in a week, giving a daily estimate of minutes spent working. We then filter to activities within the care economy using the care_focus variable and calculate both the total minutes spent and the proportion of those minutes relative to all formal economic work.
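
As a quick worked example of this conversion, a respondent reporting 40 weekly hours contributes roughly 343 minutes of work per day:

40 * 60 / 7
## [1] 342.8571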

formal_hours <- cps %>%
  group_by(YEAR, month) %>%
  summarise(
    total_hours_weighted = sum(AHRSWORKT/7*60 * WTFINL, na.rm = TRUE), # Total weighted hours for all individuals
    minutes = sum(AHRSWORKT[care_focus %in% c("developmental", "daily_living", "health")] * WTFINL[care_focus %in% c("developmental", "daily_living", "health")]/7*60, na.rm = TRUE) # Weighted hours for target group
  ) %>%
  mutate(proportion = minutes / total_hours_weighted) %>%
  select(YEAR, month, minutes, proportion) %>%
  ungroup()
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
formal_hours <- formal_hours %>%
  mutate(
    date = as.Date(paste0(YEAR, "-", month, "-01"), format = "%Y-%B-%d"), # Convert to YYYY-MM-01 format
    metric_id = "formal-hours" 
  ) %>%
  select(date, minutes, proportion, metric_id)

datatable(formal_hours, options = list(pageLength = 100))

Minutes Worked in the Informal Care Economy

Next we look at the same statistic for the informal care economy. We calculate the total number of minutes spent on informal caregiving and the proportion of that time relative to the total time recorded across the day. To calculate this we utilize the atus data and look at the amount of time spent in care activities.

# Filter the dataset for relevant activities
filtered_atus <- atus[atus$FOCUS %in% c("developmental", "daily_living", "health"), ]

# Calculate total DURATION per individual (CASEID) within each YEAR
individual_totals <- aggregate(DURATION ~ CASEID + WT06 + YEAR, data = filtered_atus, sum)

# Apply weighting by WT06 divided by 365
individual_totals$weighted_time <- individual_totals$DURATION * (individual_totals$WT06 / 365)

population_totals <- aggregate(weighted_time ~ YEAR, data = individual_totals, sum, na.rm = TRUE)
colnames(population_totals)[2] <- "population"

total_time_all_activities <- aggregate(DURATION ~ CASEID + WT06 + YEAR, data = atus, sum)
total_time_all_activities$weighted_time <- total_time_all_activities$DURATION * (total_time_all_activities$WT06 / 365)

total_time_population <- aggregate(weighted_time ~ YEAR, data = total_time_all_activities, sum, na.rm = TRUE)
colnames(total_time_population)[2] <- "total_time_all"

informal_hours <- merge(population_totals, total_time_population, by = "YEAR")

informal_hours$proportion <- informal_hours$population / informal_hours$total_time_all

informal_hours$total_time_all <- NULL
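
The base-R aggregate() calls above can be hard to scan. Purely as a readability aid, and not as part of the production pipeline, the dplyr sketch below should reproduce the same year-level population and proportion figures.

informal_hours_check <- atus %>%
  group_by(YEAR, CASEID, WT06) %>%
  summarise(
    care_minutes = sum(DURATION[FOCUS %in% c("developmental", "daily_living", "health")], na.rm = TRUE),
    total_minutes = sum(DURATION, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(YEAR) %>%
  summarise(
    population = sum(care_minutes * WT06 / 365, na.rm = TRUE),
    total_time_all = sum(total_minutes * WT06 / 365, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(proportion = population / total_time_all)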

informal_hours <- informal_hours %>%
  mutate(
    date = as.Date(paste0(YEAR, "-01", "-01"), format = "%Y-%m-%d"), # Annual data: use January 1 of each year
    metric_id = "informal-hours" 
  ) %>%
  select(date, population, proportion, metric_id)

datatable(informal_hours , options = list(pageLength = 100))

Valuing the Care Economy

The next statistic that we want to create is a valuation of the care economy. The care economy represents a sizeable share of the US economy, and we want to produce estimates of the value of both the formal and the informal care economies. To do this we combine the time-use information calculated above with wage information: the informal care economy is valued using a flat rate of $7.25 (the federal minimum wage), and the formal care economy using reported wage income, as calculated below. The proportion variable below represents the value as a proportion of US GDP for that year.

Informal Care Economy Valuation.

# Filter the dataset for relevant activities
filtered_atus <- atus[atus$FOCUS %in% c("developmental", "daily_living", "health"), ]

# Calculate total DURATION per individual (CASEID) within each YEAR
individual_totals <- aggregate(DURATION ~ CASEID + WT06 + YEAR, data = filtered_atus, sum)

# Apply weighting by WT06 divided by 365
individual_totals$weighted_time <- individual_totals$DURATION * (individual_totals$WT06 / 365)

population_totals <- aggregate(weighted_time ~ YEAR, data = individual_totals, sum, na.rm = TRUE)
colnames(population_totals)[2] <- "population"

total_time_all_activities <- aggregate(DURATION ~ CASEID + WT06 + YEAR, data = atus, sum)
total_time_all_activities$weighted_time <- total_time_all_activities$DURATION * (total_time_all_activities$WT06 / 365)

total_time_population <- aggregate(weighted_time ~ YEAR, data = total_time_all_activities, sum, na.rm = TRUE)
colnames(total_time_population)[2] <- "total_time_all"

informal_hours <- merge(population_totals, total_time_population, by = "YEAR")

informal_hours$proportion <- informal_hours$population / informal_hours$total_time_all

informal_hours$total_time_all <- NULL

GDP <- data.frame(
  YEAR = c(2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 
           2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023),
  GDP_Billions = c(11460, 12220, 13040, 13820, 14470, 14770, 14480, 15050, 15600, 16250, 
                   16880, 17610, 18300, 18800, 19610, 20660, 21540, 21350, 23680, 26010, 27720)
)

informal_value <- informal_hours %>%
  left_join(GDP, by = "YEAR") %>%
  mutate(
    value= population * 7.25 * 365,
    proportion = value / (GDP_Billions*1000000000)
    )

informal_value <- informal_value %>%
  mutate(
    date = as.Date(paste0(YEAR, "-01", "-01"), format = "%Y-%m-%d"), # Annual data: use January 1 of each year
    metric_id = "informal-value" 
  ) %>%
  select(date, value, proportion, metric_id)

datatable(informal_value, options = list(pageLength = 100))
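
The valuation above applies a flat $7.25 rate to the weighted care time. For readers who want to experiment with other wage assumptions, a minimal sketch follows; it is not part of the Care Board pipeline, the $15 figure is purely illustrative, and it assumes the population column is measured in person-minutes per day (hence the division by 60 to reach person-hours).

value_informal_care <- function(hours_df, hourly_wage) {
  hours_df %>%
    left_join(GDP, by = "YEAR") %>%
    mutate(
      value = population / 60 * hourly_wage * 365, # person-minutes/day -> person-hours/day -> annual dollars
      proportion = value / (GDP_Billions * 1e9)    # value as a share of nominal GDP
    ) %>%
    select(YEAR, value, proportion)
}

value_informal_care(informal_hours, hourly_wage = 15)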

Formal Care Economy Valuation.

Finally we want to create the total valuation of the formal care economy. To calculate this we utilize the INCWAGE variable from the asec data, which records the wage and salary income each individual earns. We value the formal care economy as the sum of this wage income, weighted by EARNWT, across all individuals working in care-focused occupations.

asec <- fread("02_data-prep-and-cleaning/02_ASECdata.csv") %>%
  filter(YEAR >= 2003)%>%
  filter(AGE >= 18 & AGE <= 65) %>%
  select(YEAR, INCWAGE, EARNWT, care_focus) %>%
  mutate(care_focus = ifelse(care_focus == "", "none", care_focus))

formal_value <- asec %>%
  filter(care_focus %in% c("developmental", "health", "daily_living")) %>%
  group_by(YEAR) %>%
  summarise(value = sum(INCWAGE * EARNWT, na.rm = TRUE)) %>%
  left_join(GDP, by = "YEAR") %>%
  mutate(proportion = value / (GDP_Billions*1e9))

formal_value <- formal_value %>%
  mutate(
    date = as.Date(paste0(YEAR, "-01", "-01"), format = "%Y-%m-%d"), # Annual data: use January 1 of each year
    metric_id = "formal-value" 
  ) %>%
  select(date, value, proportion, metric_id)

datatable(formal_value, options = list(pageLength = 100))

This provides the total value of the formal care economy in dollars, as well as that value expressed as a proportion of GDP.
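
If the two valuation series need to sit in a single table (for example, for one combined download), they share the same columns and can simply be stacked. This is a sketch of one possible layout rather than the Care Board's actual export format.

care_economy_value <- bind_rows(informal_value, formal_value)

datatable(care_economy_value, options = list(pageLength = 100))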

APPENDIX: RAW ATUS

# Set Working Directory
setwd("C:/Users/sc363/OneDrive/Work Items/Workspace/CareBoard/CareBoard/02_data-prep-and-cleaning/")
Sys.sleep(30)
# Load IPUMS data
ddi_file <- read_ipums_ddi("./atus_00027.xml")
Sys.sleep(30)
micro_data <- read_ipums_micro(ddi_file)
## Use of data from IPUMS ATUS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
Sys.sleep(30)

micro_data$OCC2 <- as.character(as_factor(micro_data$OCC2))
Sys.sleep(30)
# Recode variables to match desired categories
micro_data <- micro_data %>%
  mutate(
    race = dplyr::recode(as.character(RACE),
                         `100` = "White",
                         `110` = "Black",
                         `120` = "American Indian",
                         `131` = "Asian/Pacific Island",
                         `132` = "Asian/Pacific Island",
                         .default = "Two or More Races"),
    poverty = dplyr::recode(as.character(POVERTY185),
                            `11` = "Below Poverty",
                            `12` = "Below Poverty",
                            `20` = "Above Poverty",
                            .default = "NIU"),
    hispan = dplyr::recode(as.character(HISPAN),
                           `100` = "Not Hispanic",
                           `901` = "NIU",
                           `902` = "NIU",
                           .default = "Hispanic"),
    race_ethnicity = ifelse(hispan == "Hispanic", "Hispanic", race),
    marst = dplyr::recode(as.character(MARST),
                          `1` = "Married",
                          `2` = "Married",
                          `3` = "Separated, Widowed, or Divorced",
                          `4` = "Separated, Widowed, or Divorced",
                          `5` = "Separated, Widowed, or Divorced",
                          `6` = "Single-Never-Married",
                          `7` = "Separated, Widowed, or Divorced",
                          `9` = "NIU"),
    sex = dplyr::recode(as.character(SEX),
                        `1` = "Male", `2` = "Female", `9` = "NIU"),
    empstat = dplyr::recode(as.character(EMPSTAT),
                            `1` = "Employed", `2` = "Employed",
                            `3` = "Unemployed", `4` = "Unemployed",
                            `5` = "NILF"),
    educ = dplyr::recode(as.character(EDUC),
                         `0` = "NIU",
                         `10` = "No HS Diploma",
                         `11` = "No HS Diploma",
                         `12` = "No HS Diploma",
                         `13` = "No HS Diploma",
                         `14` = "No HS Diploma",
                         `15` = "No HS Diploma",
                         `16` = "No HS Diploma",
                         `17` = "No HS Diploma",
                         `20` = "High School",
                         `21` = "High School",
                         `30` = "Some College",
                         `31` = "Some College",
                         `32` = "Some College",
                         `80` = "Some College",
                         `110` = "Some College",
                         `40` = "Bachelor's Degree",
                         `41` = "Graduate Degree",
                         `42` = "Graduate Degree",
                         `43` = "Graduate Degree",
                         .default = as.character(EDUC)),
    age_category = case_when(
      AGE < 18 ~ "Under 18",
      AGE >= 18 & AGE < 25 ~ "Eighteen/Twenty-Four",
      AGE >= 25 & AGE < 35 ~ "Twenty-Five/Thirty-Five",
      AGE >= 35 & AGE < 45 ~ "Thirty-Five/Forty-Five",
      AGE >= 45 & AGE < 55 ~ "Forty-Five/Fifty-Five",
      AGE >= 55 & AGE < 65 ~ "Fifty-Five/Sixty-Five",
      AGE >= 65 ~ "Sixty-Five Plus"
    ),
    day = case_when(
      DAY == 1 ~ "Sunday",
      DAY == 2 ~ "Monday",
      DAY == 3 ~ "Tuesday",
      DAY == 4 ~ "Wednesday",
      DAY == 5 ~ "Thursday",
      DAY == 6 ~ "Friday",
      DAY == 7 ~ "Saturday"
    ),
    prime_age = case_when(
      AGE < 25 ~ "Under Twenty-Five",
      AGE >= 25 & AGE < 55 ~ "Prime Age",
      AGE >= 55 ~ "Fifty-Five Plus"
    ),
    child_age = case_when(
      YNGCH < 5 ~ "Under Five",
      YNGCH >= 5 & YNGCH < 12 ~ "Five_Eleven",
      YNGCH >= 12 & YNGCH < 18 ~ "Twelve_Eighteen",
      YNGCH >= 18 & YNGCH < 99 ~ "Eighteen Plus",
      YNGCH == 99 ~ "NIU"
    ),
    nchild = as.numeric(NCHILD)
  )

setwd("C:/Users/sc363/OneDrive/Work Items/Workspace/CareBoard/CareBoard/")
  Sys.sleep(60)
# Create Mothers and Fathers samples
mothers <- micro_data %>%
  filter(sex == "Female" & YNGCH != 99) %>%
  mutate(care_status = "Mothers")

fathers <- micro_data %>%
  filter(sex == "Male" & YNGCH != 99) %>%
  mutate(care_status = "Fathers")

# Create samples for childless men and women
men_no_child <- micro_data %>%
  filter(sex == "Male" & AGE > 17 & !CASEID %in% fathers$CASEID) %>%
  mutate(care_status = "Childless Men")

women_no_child <- micro_data %>%
  filter(sex == "Female" & AGE > 17 & !CASEID %in% mothers$CASEID) %>%
  mutate(care_status = "Childless Women")
# Combine all samples into one dataframe
micro_data <- bind_rows(mothers, fathers, men_no_child, women_no_child)

# Load activity codes and define activity categories
act_df <- fread("01_preliminary-code-and-data/01_ATUSActivityCrossover.csv")


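# Attach the activity crossover file to each diary line and assign a care FOCUS
# (developmental, health, or daily_living); all other activities are coded "none".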
micro_data <- micro_data %>%
  left_join(act_df, by = c("ACTIVITY" = "Code")) %>%
  mutate(FOCUS = case_when(
    developmental == 1 ~ "developmental",
    health == 1 ~ "health",
    daily_living == 1 ~ "daily_living",
    TRUE ~ "none"
  )) %>%
  select(-developmental, -health, -daily_living)


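# Classify formal (paid) care work by major occupation group (OCC2) into the same
# three care focuses; occupations outside these groups are coded "none".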
micro_data <- micro_data %>%
  mutate(formal_care_focus = case_when(
    OCC2 == "Education, training, and library occupations" ~ "developmental",
    OCC2 == "Healthcare practitioner and technical occupations" ~ "health",
    OCC2 == "Healthcare support occupations" ~ "health",
    OCC2 == "Protective service occupations" ~ "health",
    OCC2 == "Personal care and service occupations" ~ "daily_living",
    OCC2 == "Food preparation and serving related occupations" ~ "daily_living",
    OCC2 == "Building and grounds cleaning and maintenance occupations" ~ "daily_living",
    OCC2 == "Community and social service occupations" ~ "daily_living",
    TRUE ~ "none"
  ))

micro_data <- micro_data %>%
  mutate(formal_care_focus = ifelse(OCC_CPS8 >= 4200 & OCC_CPS8 <= 4250, "daily_living", formal_care_focus))


fwrite(micro_data, "02_data-prep-and-cleaning/02_ATUSdata.csv")

summary(micro_data)
##       YEAR          CASEID              SERIAL          STRATA       
##  Min.   :2003   Min.   :2.003e+13   Min.   :    1   Min.   :  -1     
##  1st Qu.:2006   1st Qu.:2.006e+13   1st Qu.: 2947   1st Qu.:1200     
##  Median :2011   Median :2.011e+13   Median : 5842   Median :2700     
##  Mean   :2012   Mean   :2.012e+13   Mean   : 6158   Mean   :2727     
##  3rd Qu.:2017   3rd Qu.:2.017e+13   3rd Qu.: 8823   3rd Qu.:4100     
##  Max.   :2023   Max.   :2.023e+13   Max.   :20720   Max.   :5604     
##                                                     NA's   :1566818  
##     STATEFIP        HH_SIZE         FAMINCOME       HH_NUMADULTS   
##  Min.   : 1.00   Min.   : 1.000   Min.   :  1.00   Min.   : 0.000  
##  1st Qu.:13.00   1st Qu.: 2.000   1st Qu.:  8.00   1st Qu.: 1.000  
##  Median :28.00   Median : 2.000   Median : 12.00   Median : 2.000  
##  Mean   :28.28   Mean   : 2.749   Mean   : 65.57   Mean   : 1.892  
##  3rd Qu.:42.00   3rd Qu.: 4.000   3rd Qu.: 15.00   3rd Qu.: 2.000  
##  Max.   :56.00   Max.   :16.000   Max.   :998.00   Max.   :12.000  
##                                                                    
##  KIDWAKETIME         KIDBEDTIME          POVERTY185          PERNUM 
##  Length:4570050     Length:4570050     Min.   :10.0      Min.   :1  
##  Class :character   Class :character   1st Qu.:11.0      1st Qu.:1  
##  Mode  :character   Mode  :character   Median :20.0      Median :1  
##                                        Mean   :20.6      Mean   :1  
##                                        3rd Qu.:20.0      3rd Qu.:1  
##                                        Max.   :99.0      Max.   :1  
##                                        NA's   :2933156              
##      LINENO       DAY             WT06                WT20          
##  Min.   :1   Min.   :1.000   Min.   :   419472   Min.   :        0  
##  1st Qu.:1   1st Qu.:2.000   1st Qu.:  3057593   1st Qu.:  3579908  
##  Median :1   Median :4.000   Median :  5447414   Median :  6530155  
##  Mean   :1   Mean   :3.971   Mean   :  7455690   Mean   :  8700946  
##  3rd Qu.:1   3rd Qu.:6.000   3rd Qu.:  9157729   3rd Qu.: 11190731  
##  Max.   :1   Max.   :7.000   Max.   :209010030   Max.   :137151708  
##                              NA's   :151274      NA's   :4240222    
##       AGE             SEX           RACE           HISPAN        MARST      
##  Min.   :15.00   Min.   :1.0   Min.   :100.0   Min.   :100   Min.   :1.000  
##  1st Qu.:36.00   1st Qu.:1.0   1st Qu.:100.0   1st Qu.:100   1st Qu.:1.000  
##  Median :47.00   Median :2.0   Median :100.0   Median :100   Median :1.000  
##  Mean   :49.12   Mean   :1.6   Mean   :103.9   Mean   :115   Mean   :2.696  
##  3rd Qu.:62.00   3rd Qu.:2.0   3rd Qu.:100.0   3rd Qu.:100   3rd Qu.:4.000  
##  Max.   :85.00   Max.   :2.0   Max.   :599.0   Max.   :250   Max.   :6.000  
##                                                                             
##       EDUC         EMPSTAT         OCC2              OCC_CPS8    
##  Min.   :10.0   Min.   :1.00   Length:4570050     Min.   :   10  
##  1st Qu.:21.0   1st Qu.:1.00   Class :character   1st Qu.: 2630  
##  Median :30.0   Median :1.00   Mode  :character   Median : 5240  
##  Mean   :30.4   Mean   :2.46                      Mean   :35009  
##  3rd Qu.:40.0   3rd Qu.:5.00                      3rd Qu.:99999  
##  Max.   :43.0   Max.   :5.00                      Max.   :99999  
##                                                                  
##     EARNWEEK      HRSWORKT_CPS8    SPEMPSTAT        CPSIDP         
##  Min.   :     0   Min.   :   1   Min.   : 1.0   Min.   :2.001e+13  
##  1st Qu.:   683   1st Qu.:  40   1st Qu.: 1.0   1st Qu.:2.005e+13  
##  Median :  1923   Median :  50   Median : 3.0   Median :2.010e+13  
##  Mean   : 44566   Mean   :4091   Mean   :43.1   Mean   :2.010e+13  
##  3rd Qu.:100000   3rd Qu.:9999   3rd Qu.:99.0   3rd Qu.:2.015e+13  
##  Max.   :100000   Max.   :9999   Max.   :99.0   Max.   :2.023e+13  
##                                                                    
##     ECPRIOR            YNGCH           NCHILD          ACTLINE     
##  Min.   : 0.0      Min.   : 0.00   Min.   :0.0000   Min.   : 1.00  
##  1st Qu.: 0.0      1st Qu.: 8.00   1st Qu.:0.0000   1st Qu.: 5.00  
##  Median : 0.0      Median :99.00   Median :0.0000   Median :10.00  
##  Mean   : 1.5      Mean   :57.26   Mean   :0.8918   Mean   :11.89  
##  3rd Qu.: 0.0      3rd Qu.:99.00   3rd Qu.:2.0000   3rd Qu.:17.00  
##  Max.   :99.0      Max.   :99.00   Max.   :9.0000   Max.   :91.00  
##  NA's   :2130230                                                   
##     ACTIVITY       DURATION_EXT        DURATION         SCC_ALL_LN     
##  Min.   : 10101   Min.   :   1.00   Min.   :   1.00   Min.   :   0.00  
##  1st Qu.: 20201   1st Qu.:  15.00   1st Qu.:  15.00   1st Qu.:   0.00  
##  Median :110101   Median :  30.00   Median :  30.00   Median :   0.00  
##  Mean   : 88624   Mean   :  82.86   Mean   :  74.25   Mean   :   6.86  
##  3rd Qu.:120312   3rd Qu.:  90.00   3rd Qu.:  90.00   3rd Qu.:   0.00  
##  Max.   :509999   Max.   :1472.00   Max.   :1350.00   Max.   :1195.00  
##                                                                        
##    SCC_OWN_LN       SEC_ALL_LN         START               STOP          
##  Min.   :   0.0   Min.   :   0.0    Length:4570050     Length:4570050    
##  1st Qu.:   0.0   1st Qu.:   0.0    Class :character   Class :character  
##  Median :   0.0   Median :   0.0    Mode  :character   Mode  :character  
##  Mean   :   5.9   Mean   :   0.5                                         
##  3rd Qu.:   0.0   3rd Qu.:   0.0                                         
##  Max.   :1195.0   Max.   :1097.0                                         
##  NA's   :395092   NA's   :2130230                                        
##      race             poverty             hispan          race_ethnicity    
##  Length:4570050     Length:4570050     Length:4570050     Length:4570050    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     marst               sex              empstat              educ          
##  Length:4570050     Length:4570050     Length:4570050     Length:4570050    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  age_category           day             prime_age          child_age        
##  Length:4570050     Length:4570050     Length:4570050     Length:4570050    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      nchild       care_status          Activity            PaidWork      
##  Min.   :0.0000   Length:4570050     Length:4570050     Min.   :1        
##  1st Qu.:0.0000   Class :character   Class :character   1st Qu.:1        
##  Median :0.0000   Mode  :character   Mode  :character   Median :1        
##  Mean   :0.8918                                         Mean   :1        
##  3rd Qu.:2.0000                                         3rd Qu.:1        
##  Max.   :9.0000                                         Max.   :1        
##                                                         NA's   :4227387  
##    FormalWork        ChildCare         ElderCare       Householdcare    
##  Min.   :1         Min.   :1         Min.   :1         Min.   :1        
##  1st Qu.:1         1st Qu.:1         1st Qu.:1         1st Qu.:1        
##  Median :1         Median :1         Median :1         Median :1        
##  Mean   :1         Mean   :1         Mean   :1         Mean   :1        
##  3rd Qu.:1         3rd Qu.:1         3rd Qu.:1         3rd Qu.:1        
##  Max.   :1         Max.   :1         Max.   :1         Max.   :1        
##  NA's   :4368311   NA's   :4236140   NA's   :4516044   NA's   :3673030  
##     Selfcare          Leisure           Sleeping        Volunteering    
##  Min.   :1         Min.   :1         Min.   :1         Min.   :1        
##  1st Qu.:1         1st Qu.:1         1st Qu.:1         1st Qu.:1        
##  Median :1         Median :1         Median :1         Median :1        
##  Mean   :1         Mean   :1         Mean   :1         Mean   :1        
##  3rd Qu.:1         3rd Qu.:1         3rd Qu.:1         3rd Qu.:1        
##  Max.   :1         Max.   :1         Max.   :1         Max.   :1        
##  NA's   :3480732   NA's   :3602525   NA's   :4053438   NA's   :4540087  
##    Education          FOCUS           formal_care_focus 
##  Min.   :1         Length:4570050     Length:4570050    
##  1st Qu.:1         Class :character   Class :character  
##  Median :1         Mode  :character   Mode  :character  
##  Mean   :1                                              
##  3rd Qu.:1                                              
##  Max.   :1                                              
##  NA's   :4553297

APPENDIX: RAW CPS

rm(list = ls()) # clear objects created above so gc() can reclaim memory
gc()
##             used   (Mb) gc trigger    (Mb)   max used    (Mb)
## Ncells   2614634  139.7   13629111   727.9   17036388   909.9
## Vcells 637219444 4861.6 2272750495 17339.8 4438168095 33860.6
# Load required packages
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(
               ipumsr,
               tidyverse,
               data.table)

# Set Working Directory
setwd("C:/Users/sc363/OneDrive/Work Items/Workspace/CareBoard/CareBoard/")

# Set API Key
set_ipums_api_key("59cba10d8a5da536fc06b59dda3a0cbbbd724585ad9e6db5db34408f",
                  save = TRUE,
                  overwrite = TRUE)
## Existing .Renviron file copied to C:\Users\sc363\OneDrive\Documents/.Renviron_backup for backup purposes.
## The environment variable IPUMS_API_KEY has been set and saved for future sessions.
Sys.sleep(15)
# Define variables
vars <- c("AGE",
          "MARST",
          "OCC2010",
          "RACE",
          "SEX",
          "YNGCH",
          "EMPSTAT",
          "LABFORCE",
          "ABSENT",
          "WHYABSNT",
          "MOMLOC",
          "POPLOC",
          "PERNUM",
          "HISPAN",
          "AHRSWORKT",
          "IND1990",
          "WKSTAT",
          "SPLOC",
          "COMPWT",
          "TELWRKPAY",
          "CLASSWKR",
          "EDUC",
          "REGION",
          "STATEFIP",
          "FAMSIZE",
          "NCHILD",
          "DIFFCARE",
          "NILFACT")

# Load Sample IDs
samples <- read.csv("01_preliminary-code-and-data/01_CPSSampleIDs.csv")$CPS_Sample_IDs
Sys.sleep(15)
# Create and Submit Data Extract
cps_ext_def <- define_extract_cps(
  description = "CPS Care Variable Extract",
  samples = samples,
  variables = vars
)
## Warning: `define_extract_cps()` was deprecated in ipumsr 0.8.0.
## ℹ Please use `define_extract_micro()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Sys.sleep(30)
cps_ext_submitted <- submit_extract(cps_ext_def)
## Successfully submitted IPUMS CPS extract number 413
Sys.sleep(30)
cps_ext_complete <- wait_for_extract(cps_ext_submitted)
## Checking extract status...
## Waiting 10 seconds...
## (repeated status checks and waits omitted)
## Checking extract status...
## IPUMS CPS extract 413 is ready to download.
Sys.sleep(60)
filepath <- download_extract(cps_ext_submitted, overwrite = TRUE)
## (download progress bar output omitted)
     |==============================================                        |  66%  |                                                                              |===============================================                       |  66%  |                                                                              |===============================================                       |  67%  |                                                                              |===============================================                       |  68%  |                                                                              |================================================                      |  68%  |                                                                              |================================================                      |  69%  |                                                                              |=================================================                     |  69%  |                                                                              |=================================================                     |  70%  |                                                                              |=================================================                     |  71%  |                                                                              |==================================================                    |  71%  |                                                                              |==================================================                    |  72%  |                                                                              |===================================================                   |  72%  |                                                                              |===================================================                   |  73%  |                                                                              |===================================================                   |  74%  |                                                                              |====================================================                  |  74%  |                                                                              |====================================================                  |  75%  |                                                                              |=====================================================                 |  75%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  76%  |                                                                              |======================================================                |  77%  |                                                                              |======================================================                |  78%  |                                                                              |=======================================================               |  78%  |                                                                              |=======================================================               |  
79%  |                                                                              |========================================================              |  79%  |                                                                              |========================================================              |  80%  |                                                                              |========================================================              |  81%  |                                                                              |=========================================================             |  81%  |                                                                              |=========================================================             |  82%  |                                                                              |==========================================================            |  82%  |                                                                              |==========================================================            |  83%  |                                                                              |==========================================================            |  84%  |                                                                              |===========================================================           |  84%  |                                                                              |===========================================================           |  85%  |                                                                              |============================================================          |  85%  |                                                                              |============================================================          |  86%  |                                                                              |=============================================================         |  86%  |                                                                              |=============================================================         |  87%  |                                                                              |=============================================================         |  88%  |                                                                              |==============================================================        |  88%  |                                                                              |==============================================================        |  89%  |                                                                              |===============================================================       |  89%  |                                                                              |===============================================================       |  90%  |                                                                              |===============================================================       |  91%  |                                                                              |================================================================      |  91%  |                                                                              |================================================================      |  92%  |                                                                         
     |=================================================================     |  92%  |                                                                              |=================================================================     |  93%  |                                                                              |=================================================================     |  94%  |                                                                              |==================================================================    |  94%  |                                                                              |==================================================================    |  95%  |                                                                              |===================================================================   |  95%  |                                                                              |===================================================================   |  96%  |                                                                              |====================================================================  |  96%  |                                                                              |====================================================================  |  97%  |                                                                              |====================================================================  |  98%  |                                                                              |===================================================================== |  98%  |                                                                              |===================================================================== |  99%  |                                                                              |======================================================================|  99%  |                                                                              |======================================================================| 100%
## DDI codebook file saved to C:/Users/sc363/OneDrive/Work Items/Workspace/CareBoard/CareBoard/cps_00413.xml
## Data file saved to C:/Users/sc363/OneDrive/Work Items/Workspace/CareBoard/CareBoard/cps_00413.dat.gz
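# Pause so the API download has fully completed and the files are written before they are read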
Sys.sleep(30)
print("API work has finished")
## [1] "API work has finished"
# Load IPUMS data
ddi <- read_ipums_ddi(filepath)
Sys.sleep(30)
micro_data <- read_ipums_micro(ddi)
## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
Sys.sleep(30)
micro_data <- micro_data %>% filter(YEAR >= 1990)
print("Data has Downloaded")
## [1] "Data has Downloaded"
# Recode variables
micro_data <- micro_data %>%
  mutate(
    state_fip = as_factor(STATEFIP),
    region = as_factor(REGION),
    nchild = as.numeric(NCHILD),
    famsize = as.numeric(FAMSIZE),
    month = recode(as.character(MONTH),
                   `1` = "January", `2` = "February", `3` = "March",
                   `4` = "April", `5` = "May", `6` = "June",
                   `7` = "July", `8` = "August", `9` = "September",
                   `10` = "October", `11` = "November", `12` = "December"),
    hispan = recode(as.character(HISPAN),
                    `0` = "Not Hispanic", `901` = "NIU", `902` = "NIU",
                    .default = "Hispanic"),
    classwkr = recode(as.character(CLASSWKR),
                      `00` = "NIU",
                      `13` = "Self_Employed",
                      `14` = "Self_Employed",
                      `22` = "Wage/Salary",
                      `23` = "Wage/Salary",
                      `25` = "Government",
                      `26` = "Government",
                      `27` = "Government",
                      `28` = "Government",
                      `29` = "Unpaid"),
    race = recode(as.character(RACE),
                  `100` = "White",
                  `200` = "Black",
                  `300` = "American Indian",
                  `651` = "Asian/Pacific Island",
                  `652` = "Asian/Pacific Island",
                  `999` = "NIU",
                  .default = "Two or More Races"),
    race_ethnicity = ifelse(hispan == "Hispanic", "Hispanic", race),
    marst = recode(as.character(MARST),
                   `1` = "Married", `2` = "Married",
                   `3` = "Separated, Widowed, or Divorced",
                   `4` = "Separated, Widowed, or Divorced",
                   `5` = "Separated, Widowed, or Divorced",
                   `6` = "Single-Never-Married",
                   `7` = "Separated, Widowed, or Divorced",
                   `9` = "NIU"),
    labforce = recode(as.character(LABFORCE),
                      `0` = "NIU",
                      `1` = "Not in the Labor Force",
                      `2` = "In the Labor Force"),
    telwrkpay = recode(as.character(TELWRKPAY),
                       `0` = "NIU",
                       `1` = "Teleworked",
                       `2` = "No Telework"),
    sex = recode(as.character(SEX),
                 `1` = "Male", `2` = "Female", `9` = "NIU"),
    absent = recode(as.character(ABSENT),
                    `0` = "NIU",
                    `1` = "No",
                    `2` = "Yes, Laid Off",
                    `3` = "Yes, Other"),
    why_absnt = recode(as.character(WHYABSNT),
                       `0` = "NIU",
                       `5` = "Vacation/Personal days",
                       `6` = "Own illness/medical problem",
                       `7` = "Care Reason",
                       `8` = "Care Reason",
                       `9` = "Care Reason",
                       `10` = "Non-Care Reason",
                       `11` = "Non-Care Reason",
                       `12` = "Non-Care Reason",
                       `13` = "Non-Care Reason",
                       `15` = "Other"),
    empstat = recode(as.character(EMPSTAT),
                     `0` = "NIU",
                     `1` = "Armed Forces",
                     `10` = "Employed",
                     `12` = "Employed",
                     `20` = "Unemployed",
                     `21` = "Unemployed",
                     `22` = "Unemployed",
                     `30` = "NILF",
                     `31` = "NILF",
                     `32` = "NILF",
                     `33` = "NILF",
                     `34` = "NILF",
                     `35` = "NILF",
                     `36` = "NILF"),
    wkstat = recode(as.character(WKSTAT),
                    `10` = "Full Time",
                    `11` = "Full Time",
                    `12` = "Full Time",
                    `13` = "Full Time",
                    `14` = "Full Time",
                    `15` = "Full Time",
                    `20` = "Part Time",
                    `21` = "Part Time",
                    `22` = "Part Time",
                    `40` = "Part Time",
                    `41` = "Part Time",
                    `42` = "Part Time",
                    `50` = "Unemployed",
                    `60` = "Unemployed",
                    `99` = "NIU"),
    labor_status = ifelse(empstat == "Employed", wkstat, empstat),
    educ = case_when(
                     EDUC <= 1 ~ "NIU",
                     EDUC >= 2 & EDUC <= 72 ~ "No HS Diplomma",
                     EDUC == 73 ~ "High School",
                     EDUC >= 80 & EDUC <= 110  ~ "Some College",
                     EDUC >= 121 & EDUC <= 122  ~ "Some College",
                     EDUC == 111 ~ "Bachelor's Degree",
                     EDUC >= 123 ~ "Graduate Degree"),
    age_category = case_when(
      AGE < 18 ~ "Under 18",
      AGE >= 18 & AGE < 25 ~ "Eighteen/Twenty-Four",
      AGE >= 25 & AGE < 35 ~ "Twenty-Five/Thirty-Five",
      AGE >= 35 & AGE < 45 ~ "Thirty-Five/Forty-Five",
      AGE >= 45 & AGE < 55 ~ "Forty-Five/Fifty-Five",
      AGE >= 55 & AGE < 65 ~ "Fifty-Five/Sixty-Five",
      AGE >= 65 ~ "Sixty-Five Plus"
    ),
    prime_age = case_when(
      AGE < 25 ~ "Under Twenty-Five",
      AGE >= 25 & AGE < 55 ~ "Prime Age",
      AGE >= 55 ~ "Fifty-Five Plus"
    ),
    child_age = case_when(
      YNGCH < 5 ~ "Under Five",
      YNGCH >= 5 & YNGCH < 12 ~ "Five_Eleven",
      YNGCH >= 12 & YNGCH < 18 ~ "Twelve_Eighteen",
      YNGCH >= 18 & YNGCH < 99 ~ "Eighteen Plus",
      YNGCH == 99 ~ "NIU"
    ),
    nilf_activity = case_when(
      NILFACT == 1 ~ "Disabled",
      NILFACT == 2 ~ "Ill",
      NILFACT == 3 ~ "School",
      NILFACT == 4 ~ "Homemaker",
      NILFACT == 6 ~ "Other",
      NILFACT == 99 ~ "NIU"
    )
  )
Sys.sleep(10)
print("Descriptives are Done")
## [1] "Descriptives are Done"
# Categorize occupations and industries
micro_data <- micro_data %>%
  mutate(
    care_job = case_when(
      OCC2010 %in% c(230, 325, 420, 350, 310) ~ "Care_Occ",
      OCC2010 >= 3000 & OCC2010 <= 3655 ~ "Care_Occ",
      OCC2010 >= 2200 & OCC2010 <= 2550 ~ "Care_Occ",
      OCC2010 >= 2000 & OCC2010 <= 2060 ~ "Care_Occ",
      OCC2010 >= 4000 & OCC2010 <= 4160 ~ "Care_Occ",
      OCC2010 %in% c(4460, 4465, 4500, 4510, 4520, 4600, 4610) ~ "Care_Occ",
      TRUE ~ "NonCare_Occ"
    ),
    health_care_occ = ifelse(OCC2010 >= 3000 & OCC2010 <= 3655 |
                               OCC2010 == 350,
                             "HealthCare_Occ",
                             "NonHealthCare_Occ"),
    education_occ = ifelse(OCC2010 >= 2200 & OCC2010 <= 2550 |
                             OCC2010 == 230,
                           "Education_Occ",
                           "NonEducation_Occ"),
    social_services_occ = ifelse(OCC2010 >= 2000 & OCC2010 <= 2060 |
                                   OCC2010 == 420,
                                 "Social_Occ",
                                 "NonSocial_Occ"),
    child_care_occ = ifelse(OCC2010 == 4600,
                            "Childcare_Occ",
                            "NonChildcare_Occ"),
    death_care_occ = ifelse(OCC2010 %in% c(4460, 4465, 325),
                            "DeathCare_Occ",
                            "NonDeathCare_Occ"),
    personal_care = ifelse(OCC2010 == 4610,
                           "PersonalCare_Occ",
                           "NonPersonalCare_Occ"),
    self_care = ifelse(OCC2010 >= 4500 & OCC2010 <= 4520,
                       "Selfcare_Occ",
                       "NonSelfcare_Occ"),
    food_care = ifelse(OCC2010 >= 4000 & OCC2010 <= 4160 |
                         OCC2010 == 310,
                       "FoodCare_Occ",
                       "NonFoodCare_Occ"),
    care_focus = case_when(
      education_occ == "Education_Occ" ~ "developmental",
      social_services_occ == "Social_Occ" ~ "health",
      health_care_occ == "HealthCare_Occ" ~ "health",
      child_care_occ == "Childcare_Occ" ~ "developmental",
      death_care_occ == "DeathCare_Occ" ~ "daily_living",
      personal_care == "PersonalCare_Occ" ~ "health",
      self_care == "Selfcare_Occ" ~ "daily_living",
      food_care == "FoodCare_Occ" ~ "daily_living",
      TRUE ~ NA_character_
    )
  )
Sys.sleep(10)
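# Create samples of mothers and fathers (respondents with own children in the household)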
mothers <- micro_data %>%
  filter(sex == "Female" & YNGCH != 99) %>%
  mutate(care_status = "Mothers")
Sys.sleep(10)
fathers <- micro_data %>%
  filter(sex == "Male" & YNGCH != 99) %>%
  mutate(care_status = "Fathers")
Sys.sleep(10)
# Create samples for childless men and women
men_no_child <- micro_data %>%
  filter(sex == "Male" & AGE > 17 & !CPSIDP %in% fathers$CPSIDP) %>%
  mutate(care_status = "Childless Men")
Sys.sleep(10)
women_no_child <- micro_data %>%
  filter(sex == "Female" & AGE > 17 & !CPSIDP %in% mothers$CPSIDP) %>%
  mutate(care_status = "Childless Women")
Sys.sleep(10)
# Combine all samples into one dataframe
micro_data <- bind_rows(mothers, fathers, men_no_child, women_no_child)
Sys.sleep(10)
# Save the final dataframe to a CSV file
fwrite(micro_data, "02_data-prep-and-cleaning/02_CPSdata.csv")

summary(micro_data)
##       YEAR          SERIAL          MONTH           HWTFINL     
##  Min.   :1990   Min.   :    1   Min.   : 1.000   Min.   :    0  
##  1st Qu.:1998   1st Qu.:17262   1st Qu.: 3.000   1st Qu.: 1196  
##  Median :2006   Median :34430   Median : 6.000   Median : 2183  
##  Mean   :2006   Mean   :34597   Mean   : 6.477   Mean   : 2233  
##  3rd Qu.:2015   3rd Qu.:51789   3rd Qu.: 9.000   3rd Qu.: 3119  
##  Max.   :2024   Max.   :74625   Max.   :12.000   Max.   :34716  
##                                                                 
##      CPSID              ASECFLAG            REGION         STATEFIP    
##  Min.   :1.988e+13   Min.   :2          Min.   :11.00   Min.   : 1.00  
##  1st Qu.:1.997e+13   1st Qu.:2          1st Qu.:21.00   1st Qu.:13.00  
##  Median :2.006e+13   Median :2          Median :31.00   Median :29.00  
##  Mean   :2.006e+13   Mean   :2          Mean   :27.66   Mean   :28.25  
##  3rd Qu.:2.014e+13   3rd Qu.:2          3rd Qu.:33.00   3rd Qu.:41.00  
##  Max.   :2.024e+13   Max.   :2          Max.   :42.00   Max.   :56.00  
##                      NA's   :36231133                                  
##      PERNUM           WTFINL          CPSIDP              CPSIDV         
##  Min.   : 1.000   Min.   :    0   Min.   :1.988e+13   Min.   :1.988e+14  
##  1st Qu.: 1.000   1st Qu.: 1207   1st Qu.:1.997e+13   1st Qu.:1.997e+14  
##  Median : 1.000   Median : 2225   Median :2.006e+13   Median :2.006e+14  
##  Mean   : 1.676   Mean   : 2272   Mean   :2.006e+13   Mean   :2.006e+14  
##  3rd Qu.: 2.000   3rd Qu.: 3170   3rd Qu.:2.014e+13   3rd Qu.:2.014e+14  
##  Max.   :23.000   Max.   :34716   Max.   :2.024e+13   Max.   :2.024e+14  
##                                                                          
##       AGE             SEX             RACE           MARST      
##  Min.   : 0.00   Min.   :1.000   Min.   :100.0   Min.   :1.000  
##  1st Qu.:32.00   1st Qu.:1.000   1st Qu.:100.0   1st Qu.:1.000  
##  Median :45.00   Median :2.000   Median :100.0   Median :1.000  
##  Mean   :46.85   Mean   :1.525   Mean   :144.7   Mean   :2.832  
##  3rd Qu.:60.00   3rd Qu.:2.000   3rd Qu.:100.0   3rd Qu.:5.000  
##  Max.   :90.00   Max.   :2.000   Max.   :830.0   Max.   :9.000  
##                                                                 
##      MOMLOC            POPLOC           SPLOC            FAMSIZE      
##  Min.   : 0.0000   Min.   : 0.000   Min.   : 0.0000   Min.   : 1.000  
##  1st Qu.: 0.0000   1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.: 2.000  
##  Median : 0.0000   Median : 0.000   Median : 1.0000   Median : 2.000  
##  Mean   : 0.1804   Mean   : 0.115   Mean   : 0.9457   Mean   : 2.831  
##  3rd Qu.: 0.0000   3rd Qu.: 0.000   3rd Qu.: 2.0000   3rd Qu.: 4.000  
##  Max.   :16.0000   Max.   :16.000   Max.   :21.0000   Max.   :25.000  
##                                                                       
##      NCHILD           YNGCH           HISPAN          EMPSTAT     
##  Min.   :0.0000   Min.   : 0.00   Min.   :  0.00   Min.   : 0.00  
##  1st Qu.:0.0000   1st Qu.:13.00   1st Qu.:  0.00   1st Qu.:10.00  
##  Median :0.0000   Median :99.00   Median :  0.00   Median :10.00  
##  Mean   :0.7611   Mean   :63.67   Mean   : 29.23   Mean   :18.82  
##  3rd Qu.:1.0000   3rd Qu.:99.00   3rd Qu.:  0.00   3rd Qu.:34.00  
##  Max.   :9.0000   Max.   :99.00   Max.   :902.00   Max.   :36.00  
##                                                                   
##     LABFORCE        OCC2010        IND1990         CLASSWKR       AHRSWORKT  
##  Min.   :0.000   Min.   :  10   Min.   :  0.0   Min.   : 0.00   Min.   :  1  
##  1st Qu.:1.000   1st Qu.:3640   1st Qu.:  0.0   1st Qu.: 0.00   1st Qu.: 40  
##  Median :2.000   Median :5860   Median :410.0   Median :22.00   Median : 50  
##  Mean   :1.649   Mean   :6221   Mean   :398.6   Mean   :18.37   Mean   :429  
##  3rd Qu.:2.000   3rd Qu.:9999   3rd Qu.:760.0   3rd Qu.:22.00   3rd Qu.:999  
##  Max.   :2.000   Max.   :9999   Max.   :952.0   Max.   :99.00   Max.   :999  
##                                                                              
##      ABSENT          WHYABSNT           WKSTAT         NILFACT       
##  Min.   :0.0000   Min.   : 0.0000   Min.   :10.00   Min.   : 1       
##  1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:11.00   1st Qu.:99       
##  Median :0.0000   Median : 0.0000   Median :13.00   Median :99       
##  Mean   :0.4477   Mean   : 0.1894   Mean   :45.72   Mean   :89       
##  3rd Qu.:1.0000   3rd Qu.: 0.0000   3rd Qu.:99.00   3rd Qu.:99       
##  Max.   :3.0000   Max.   :15.0000   Max.   :99.00   Max.   :99       
##                                                     NA's   :5005317  
##       EDUC          DIFFCARE            COMPWT          TELWRKPAY       
##  Min.   :  1.0   Min.   :0          Min.   :    0     Min.   :0         
##  1st Qu.: 73.0   1st Qu.:1          1st Qu.: 1251     1st Qu.:0         
##  Median : 81.0   Median :1          Median : 2477     Median :1         
##  Mean   : 83.2   Mean   :1          Mean   : 2401     Mean   :1         
##  3rd Qu.:111.0   3rd Qu.:1          3rd Qu.: 3319     3rd Qu.:2         
##  Max.   :125.0   Max.   :2          Max.   :34709     Max.   :2         
##                  NA's   :21452109   NA's   :9464479   NA's   :37639716  
##         state_fip                                region            nchild      
##  California  : 3209470   South Atlantic Division    :6957089   Min.   :0.0000  
##  New York    : 2026559   Pacific Division           :5407090   1st Qu.:0.0000  
##  Texas       : 1936879   East North Central Division:5003107   Median :0.0000  
##  Florida     : 1754061   Middle Atlantic Division   :4474668   Mean   :0.7611  
##  Pennsylvania: 1401855   Mountain Division          :4279590   3rd Qu.:1.0000  
##  Illinois    : 1330737   West North Central Division:3985983   Max.   :9.0000  
##  (Other)     :27840506   (Other)                    :9392540                   
##     famsize          month              hispan            classwkr        
##  Min.   : 1.000   Length:39500067    Length:39500067    Length:39500067   
##  1st Qu.: 2.000   Class :character   Class :character   Class :character  
##  Median : 2.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 2.831                                                           
##  3rd Qu.: 4.000                                                           
##  Max.   :25.000                                                           
##                                                                           
##      race           race_ethnicity        marst             labforce        
##  Length:39500067    Length:39500067    Length:39500067    Length:39500067   
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   telwrkpay             sex               absent           why_absnt        
##  Length:39500067    Length:39500067    Length:39500067    Length:39500067   
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    empstat             wkstat          labor_status           educ          
##  Length:39500067    Length:39500067    Length:39500067    Length:39500067   
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  age_category        prime_age          child_age         nilf_activity     
##  Length:39500067    Length:39500067    Length:39500067    Length:39500067   
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    care_job         health_care_occ    education_occ      social_services_occ
##  Length:39500067    Length:39500067    Length:39500067    Length:39500067    
##  Class :character   Class :character   Class :character   Class :character   
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  child_care_occ     death_care_occ     personal_care       self_care        
##  Length:39500067    Length:39500067    Length:39500067    Length:39500067   
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   food_care          care_focus        care_status       
##  Length:39500067    Length:39500067    Length:39500067   
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
## 

APPENDIX: RAW ASEC

The code below produces a major source of raw data used in the production of Care Board statistics: the ASEC microdata. The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) is a special supplement to the monthly Current Population Survey (CPS), which is conducted by the U.S. Census Bureau and the Bureau of Labor Statistics (BLS). The monthly CPS focuses primarily on labor force characteristics, such as employment, unemployment, and workforce participation. The CPS ASEC goes beyond this by collecting detailed information on income, poverty, health insurance coverage, and demographic characteristics, making it the primary source of data for measuring income inequality and economic well-being in the U.S.

The monthly CPS is fielded every month. The CPS ASEC is conducted once a year, typically in March, and includes both regular CPS respondents and additional oversampled households to improve estimates for specific population groups. The CPS ASEC therefore expands the sample size relative to the monthly CPS, improving data accuracy, especially for poverty and income statistics. The monthly CPS data are used for labor force statistics such as the unemployment rate, while the CPS ASEC data are used for official poverty estimates, income distribution studies, and health insurance coverage statistics. We use the ASEC data to compile income and earnings for those working in the formal care economy.

The code below uses an IPUMS key to download the IPUMS microdata files that contain this information. IPUMS (Integrated Public Use Microdata Series) is a project that provides harmonized microdata from various national and international surveys and censuses. It is maintained by the Minnesota Population Center at the University of Minnesota. IPUMS makes large-scale individual- and household-level databases more accessible and comparable over time and across geographic regions.

The data available from IPUMS can be accessed through an API key. To replicate the code chunk below, you will need to insert your personal key into the set_ipums_api_key call. For information on how to obtain a personal key, visit https://www.ipums.org/. An IPUMS API key is free to the public and to researchers.
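For illustration, a minimal sketch of registering a key with the ipumsr package is shown below; the placeholder string is not a real key and should be replaced with your own.

# Register a personal IPUMS API key with the ipumsr package
# (the string below is a placeholder, not a real key)
library(ipumsr)
set_ipums_api_key("paste-your-personal-key-here", save = TRUE)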

IPUMS CPS data, including the ASEC supplements, can be cited as follows.

Sarah Flood, Miriam King, Renae Rodgers, Steven Ruggles, J. Robert Warren, Daniel Backman, Annie Chen, Grace Cooper, Stephanie Richards, Megan Schouweiler, and Michael Westberry. IPUMS CPS: Version 12.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D030.V12.0

After setting our API key, we define the variables and the samples we want to pull. The vars vector lists all variables to request from the data, while the samples vector uses a preliminary datasheet listing all ASEC samples to pull the unique sample IDs. These vectors are then inserted into an API extract request that is submitted to IPUMS. The request can take a few minutes to process, and the wait_for_extract command provides updates on when the extract is finished. We include a Sys.sleep command in the code because there is a known issue with large API pulls where the code moves on to the next line before the download has finished, which can cause errors. The Sys.sleep command slows down the process.
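For orientation only, the sketch below shows the general shape of such a request using the ipumsr workflow; the sample IDs and variables are illustrative placeholders, not the full Care Board extract definition.

# Illustrative ASEC extract request (placeholder samples and variables,
# not the full Care Board specification)
asec_extract <- define_extract_micro(
  collection  = "cps",
  description = "Example ASEC pull",
  samples     = c("cps2023_03s", "cps2024_03s"),  # ASEC sample IDs
  variables   = c("AGE", "SEX", "OCC2010", "INCWAGE")
)
submitted <- submit_extract(asec_extract)
ready     <- wait_for_extract(submitted)   # reports progress until the extract is ready
filepath  <- download_extract(ready)       # returns the path to the DDI (.xml) file
Sys.sleep(30)                              # pause so the download fully finishes before reading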

     |=================================================================     |  92%  |                                                                              |=================================================================     |  93%  |                                                                              |=================================================================     |  94%  |                                                                              |==================================================================    |  94%  |                                                                              |==================================================================    |  95%  |                                                                              |===================================================================   |  95%  |                                                                              |===================================================================   |  96%  |                                                                              |====================================================================  |  96%  |                                                                              |====================================================================  |  97%  |                                                                              |====================================================================  |  98%  |                                                                              |===================================================================== |  98%  |                                                                              |===================================================================== |  99%  |                                                                              |======================================================================|  99%  |                                                                              |======================================================================| 100%
summary(micro_data)
##       YEAR          SERIAL          MONTH       CPSID              ASECFLAG
##  Min.   :1990   Min.   :    1   Min.   :3   Min.   :0.000e+00   Min.   :1  
##  1st Qu.:2000   1st Qu.:22449   1st Qu.:3   1st Qu.:0.000e+00   1st Qu.:1  
##  Median :2008   Median :44685   Median :3   Median :1.999e+13   Median :1  
##  Mean   :2007   Mean   :45647   Mean   :3   Mean   :1.435e+13   Mean   :1  
##  3rd Qu.:2015   3rd Qu.:67543   3rd Qu.:3   3rd Qu.:2.010e+13   3rd Qu.:1  
##  Max.   :2024   Max.   :99986   Max.   :3   Max.   :2.024e+13   Max.   :1  
##                                                                            
##      HFLAG            ASECWTH            REGION         STATEFIP    
##  Min.   :0         Min.   :    0.0   Min.   :11.00   Min.   : 1.00  
##  1st Qu.:0         1st Qu.:  890.2   1st Qu.:21.00   1st Qu.:13.00  
##  Median :0         Median : 1534.2   Median :31.00   Median :28.00  
##  Mean   :0         Mean   : 1672.9   Mean   :28.22   Mean   :27.88  
##  3rd Qu.:1         3rd Qu.: 2187.7   3rd Qu.:41.00   3rd Qu.:41.00  
##  Max.   :1         Max.   :28654.3   Max.   :42.00   Max.   :56.00  
##  NA's   :6007501                                                    
##      PERNUM           CPSIDP              CPSIDV              ASECWT       
##  Min.   : 1.000   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :    0.0  
##  1st Qu.: 1.000   1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:  892.8  
##  Median : 2.000   Median :1.999e+13   Median :1.999e+14   Median : 1547.9  
##  Mean   : 2.261   Mean   :1.435e+13   Mean   :1.435e+14   Mean   : 1706.9  
##  3rd Qu.: 3.000   3rd Qu.:2.010e+13   3rd Qu.:2.010e+14   3rd Qu.: 2242.2  
##  Max.   :26.000   Max.   :2.024e+13   Max.   :2.024e+14   Max.   :44423.8  
##                                                                            
##       AGE             SEX             RACE           MARST      
##  Min.   : 0.00   Min.   :1.000   Min.   :100.0   Min.   :1.000  
##  1st Qu.:16.00   1st Qu.:1.000   1st Qu.:100.0   1st Qu.:1.000  
##  Median :34.00   Median :2.000   Median :100.0   Median :4.000  
##  Mean   :35.21   Mean   :1.515   Mean   :155.8   Mean   :3.702  
##  3rd Qu.:52.00   3rd Qu.:2.000   3rd Qu.:100.0   3rd Qu.:6.000  
##  Max.   :90.00   Max.   :2.000   Max.   :830.0   Max.   :6.000  
##                                                                 
##      MOMLOC            POPLOC           FAMSIZE           NCHILD      
##  Min.   : 0.0000   Min.   : 0.0000   Min.   : 1.000   Min.   :0.0000  
##  1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 0.0000   Median : 0.0000   Median : 3.000   Median :0.0000  
##  Mean   : 0.5859   Mean   : 0.4155   Mean   : 3.413   Mean   :0.6187  
##  3rd Qu.: 1.0000   3rd Qu.: 1.0000   3rd Qu.: 4.000   3rd Qu.:1.0000  
##  Max.   :17.0000   Max.   :18.0000   Max.   :25.000   Max.   :9.0000  
##                                                                       
##      YNGCH           HISPAN          EMPSTAT         OCC2010    
##  Min.   : 0.00   Min.   :  0.00   Min.   : 0.00   Min.   :  10  
##  1st Qu.:17.00   1st Qu.:  0.00   1st Qu.:10.00   1st Qu.:4510  
##  Median :99.00   Median :  0.00   Median :10.00   Median :9999  
##  Mean   :70.64   Mean   : 42.44   Mean   :14.61   Mean   :7179  
##  3rd Qu.:99.00   3rd Qu.:  0.00   3rd Qu.:32.00   3rd Qu.:9999  
##  Max.   :99.00   Max.   :902.00   Max.   :36.00   Max.   :9999  
##                                                                 
##    UHRSWORKT        AHRSWORKT       ABSENT          WHYABSNT      
##  Min.   :  0.0    Min.   :  1   Min.   :0.0000   Min.   : 0.0000  
##  1st Qu.: 40.0    1st Qu.: 40   1st Qu.:0.0000   1st Qu.: 0.0000  
##  Median :997.0    Median :999   Median :0.0000   Median : 0.0000  
##  Mean   :516.6    Mean   :570   Mean   :0.3482   Mean   : 0.1254  
##  3rd Qu.:999.0    3rd Qu.:999   3rd Qu.:1.0000   3rd Qu.: 0.0000  
##  Max.   :999.0    Max.   :999   Max.   :3.0000   Max.   :15.0000  
##  NA's   :627549                                                   
##      WKSTAT           EDUC            EARNWT         INCWAGE        
##  Min.   :10.00   Min.   :  1.00   Min.   :    0   Min.   :       0  
##  1st Qu.:11.00   1st Qu.: 20.00   1st Qu.:    0   1st Qu.:       0  
##  Median :99.00   Median : 73.00   Median :    0   Median :   25000  
##  Mean   :59.31   Mean   : 61.79   Mean   : 1366   Mean   :23346325  
##  3rd Qu.:99.00   3rd Qu.: 91.00   3rd Qu.:    0   3rd Qu.:  127000  
##  Max.   :99.00   Max.   :125.00   Max.   :85013   Max.   :99999999  
##                                                                     
##     POVERTY     
##  Min.   : 0.00  
##  1st Qu.:23.00  
##  Median :23.00  
##  Mean   :21.14  
##  3rd Qu.:23.00  
##  Max.   :23.00  
## 

The data above are raw in the sense that they are the direct output of the coded survey. Many of the categorical variables are stored as numeric codes rather than the labels we actually want, so we spend some time cleaning the data to make the categories meaningful. The code below performs all of this cleaning, and it goes beyond simply mapping numeric codes to labels: for some variables we combine categories, and for others we change how NAs are coded. Anyone with a question about a specific variable should review this code alongside the Data Dictionary table provided below.

ASEC Data Dictionary

| Variable | Type | Description | Example Values |
|----------|------|-------------|----------------|
| statefip | Factor | State FIPS code | 20 (Kansas), 06 (California) |
| region | Factor | Census-defined region | “Midwest”, “South”, “Northeast” |
| nchild | Numeric | Number of own children in household | 0, 1, 2, 3 |
| famsize | Numeric | Total number of people in family | 1, 2, 4, 6 |
| month | Factor | Month of survey (January-December) | “January”, “July”, “December” |
| hispan | Factor | Hispanic origin classification | “Not Hispanic”, “Hispanic”, “NIU” |
| race | Factor | Race classification | “White”, “Black”, “Asian/Pacific” |
| race_ethnicity | Factor | Race/Ethnicity combined category | “Hispanic”, “White”, “Black” |
| marst | Factor | Marital status | “Married”, “Single-Never-Married” |
| sex | Factor | Gender classification | “Male”, “Female” |
| absent | Factor | Employment absence status | “No”, “Yes, Laid Off” |
| whyabsnt | Factor | Reason for absence from work | “Vacation”, “Own illness”, “Other” |
| educ | Factor | Educational attainment | “No HS Diploma”, “Bachelor’s Degree” |
| empstat | Factor | Employment status | “Employed”, “Unemployed”, “NILF” |
| wkstat | Factor | Work status | “Full Time”, “Part Time”, “Unemployed” |
| laborstatus | Factor | Employment status combining empstat and wkstat | “Full Time”, “Unemployed” |
| poverty | Factor | Poverty status classification | “Below Poverty”, “150+ Percent” |
| age_category | Factor | Categorized age groups | “Under 18”, “Eighteen/Twenty-Four” |
| child_age | Factor | Categorized age groups of children | “Under Five”, “Twelve_Eighteen” |
| prime_age | Factor | Prime working age classification | “Under Twenty-Five”, “Prime Age” |
| pernum | Numeric | Person number within household | 1, 2, 3 |
| momloc | Numeric | Mother’s location (person number) within household | 0 (No mother present), 1, 2 |
| date | Date | Survey date (YYYY-MM-DD) | “2023-03-01”, “2022-07-01” |
| care_job | Factor | Indicator for care-related occupations | “Care_Occ”, “NonCare_Occ” |
| health_care_occ | Factor | Indicator for healthcare occupations | “HealthCare_Occ”, “NonHealthCare_Occ” |
| education_occ | Factor | Indicator for education-related occupations | “Education_Occ”, “NonEducation_Occ” |
| social_services_occ | Factor | Indicator for social service occupations | “Social_Occ”, “NonSocial_Occ” |
| child_care_occ | Factor | Indicator for childcare occupations | “Childcare_Occ”, “NonChildcare_Occ” |
| death_care_occ | Factor | Indicator for death care occupations | “DeathCare_Occ”, “NonDeathCare_Occ” |
| personal_care | Factor | Indicator for personal care occupations | “PersonalCare_Occ”, “NonPersonalCare_Occ” |
| self_care | Factor | Indicator for self-care occupations | “Selfcare_Occ”, “NonSelfcare_Occ” |
| food_care | Factor | Indicator for food-related care occupations | “FoodCare_Occ”, “NonFoodCare_Occ” |
| care_focus | Factor | Type of care focus | “health”, “developmental”, “daily_living” |
| care_status | Factor | Type of care giver | “Mothers”, “Fathers”, “Childless Men”, “Childless Women” |
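
The cleaning code below relies on named lookup vectors (the *_map objects) that are spliced into dplyr::recode() with the !!! operator. As a brief, hypothetical illustration of that pattern (toy values only, not part of the Care Board pipeline):

# Minimal sketch of splicing a named map into recode(); toy_map and toy_values are hypothetical
library(dplyr)

toy_map    <- c("1" = "Male", "2" = "Female", "9" = "NIU")  # mirrors the structure of sex_map below
toy_values <- c("2", "1", "9")                              # raw character-coded survey responses
recode(toy_values, !!!toy_map)                              # "Female" "Male" "NIU"

The same splicing pattern is applied to each of the *_map objects defined next.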
month_map <- c("1" = "January", "2" = "February", "3" = "March", "4" = "April",
               "5" = "May", "6" = "June", "7" = "July", "8" = "August", 
               "9" = "September", "10" = "October", "11" = "November", "12" = "December")

hispan_map <- c("0" = "Not Hispanic", "901" = "NIU", "902" = "NIU")
race_map <- c("100" = "White", "200" = "Black", "300" = "American Indian",
              "651" = "Asian/Pacific Island", "652" = "Asian/Pacific Island",
              "999" = "NIU")
marst_map <- c("1" = "Married", "2" = "Married",
               "3" = "Separated, Widowed, or Divorced", "4" = "Separated, Widowed, or Divorced",
               "5" = "Separated, Widowed, or Divorced", "6" = "Single-Never-Married",
               "7" = "Separated, Widowed, or Divorced", "9" = "NIU")
sex_map <- c("1" = "Male", "2" = "Female", "9" = "NIU")
absent_map <- c("0" = "NIU", "1" = "No", "2" = "Yes, Laid Off", "3" = "Yes, Other")
whyabsnt_map <- c("0" = "NIU", "5" = "Vacation/Personal days", "6" = "Own illness/medical problem",
                  "7" = "Care Reason", "8" = "Care Reason", "9" = "Care Reason",
                  "10" = "Non-Care Reason", "11" = "Non-Care Reason", "12" = "Non-Care Reason",
                  "13" = "Non-Care Reason", "15" = "Other")

educ_map <- c("1" = "NIU", "2" = "No HS Diploma", "11" = "No HS Diploma", "12" = "No HS Diploma",
              "13" = "No HS Diploma", "14" = "No HS Diploma", "20" = "No HS Diploma",
              "21" = "No HS Diploma", "22" = "No HS Diploma", "30" = "No HS Diploma",
              "31" = "No HS Diploma", "32" = "No HS Diploma", "40" = "No HS Diploma",
              "50" = "No HS Diploma", "60" = "No HS Diploma", "70" = "No HS Diploma",
              "72" = "No HS Diploma", "73" = "High School", "80" = "Some College",
              "110" = "Some College", "111" = "Bachelor's Degree", "121" = "Graduate Degree",
              "122" = "Graduate Degree", "123" = "Graduate Degree", "124" = "Graduate Degree",
              "125" = "Graduate Degree")

empstat_map <- c("0" = "NIU", "1" = "Armed Forces", "10" = "Employed", "12" = "Employed",
                 "20" = "Unemployed", "21" = "Unemployed", "22" = "Unemployed",
                 "30" = "NILF", "31" = "NILF", "32" = "NILF", "33" = "NILF",
                 "34" = "NILF", "35" = "NILF", "36" = "NILF")

wkstat_map <- c("10" = "Full Time", "11" = "Full Time", "12" = "Full Time",
                "13" = "Full Time", "14" = "Full Time", "15" = "Full Time",
                "20" = "Part Time", "21" = "Part Time", "22" = "Part Time",
                "40" = "Part Time", "41" = "Part Time", "42" = "Part Time",
                "50" = "Unemployed", "60" = "Unemployed", "99" = "NIU")

poverty_map <- c("0" = "NIU", "10" = "Below Poverty", "20" = "Above Poverty",
                 "21" = "100-124 Percent of Poverty", "22" = "125-149 Percent of Poverty",
                 "23" = "150+ Percent of Poverty")

# Transform micro_data
micro_data <- micro_data %>%
  # Convert categorical variables to factors
  mutate(
    statefip = as_factor(STATEFIP),
    region = as_factor(REGION),
    
    # Convert numeric variables
    nchild = as.numeric(NCHILD),
    famsize = as.numeric(FAMSIZE),
    
    # Recode categorical variables using predefined maps
    month = recode(as.character(MONTH), !!!month_map),
    hispan = recode(as.character(HISPAN), !!!hispan_map, .default = "Hispanic"),
    race = recode(as.character(RACE), !!!race_map, .default = "Two or More Races"),
    race_ethnicity = ifelse(hispan == "Hispanic", "Hispanic", race),
    marst = recode(as.character(MARST), !!!marst_map),
    sex = recode(as.character(SEX), !!!sex_map),
    absent = recode(as.character(ABSENT), !!!absent_map),
    whyabsnt = recode(as.character(WHYABSNT), !!!whyabsnt_map),
    educ = recode(as.numeric(EDUC), !!!educ_map, .default = as.character(EDUC)),
    empstat = recode(as.character(EMPSTAT), !!!empstat_map),
    wkstat = recode(as.character(WKSTAT), !!!wkstat_map),
    laborstatus = ifelse(empstat == "Employed", wkstat, empstat),
    poverty = recode(as.character(POVERTY), !!!poverty_map),
    
    # Create new categorical age group variables
    age_category = case_when(
      AGE < 18 ~ "Under 18",
      AGE >= 18 & AGE < 25 ~ "Eighteen/Twenty-Four",
      AGE >= 25 & AGE < 35 ~ "Twenty-Five/Thirty-Five",
      AGE >= 35 & AGE < 45 ~ "Thirty-Five/Forty-Five",
      AGE >= 45 & AGE < 55 ~ "Forty-Five/Fifty-Five",
      AGE >= 55 & AGE < 65 ~ "Fifty-Five/Sixty-Five",
      AGE >= 65 ~ "Sixty-Five Plus"
    ),
    child_age = case_when(
      YNGCH < 5 ~ "Under Five",
      YNGCH >= 5 & YNGCH < 12 ~ "Five_Eleven",
      YNGCH >= 12 & YNGCH < 18 ~ "Twelve_Eighteen",
      YNGCH >= 18 & YNGCH < 99 ~ "Eighteen Plus",
      YNGCH == 99 ~ "NIU"
    ),
    prime_age = case_when(
      AGE < 25 ~ "Under Twenty-Five",
      AGE >= 25 & AGE < 55 ~ "Prime Age",
      AGE >= 55 ~ "Fifty-Five Plus"
    ),
    
    # Convert numeric variables
    pernum = as.numeric(PERNUM),
    momloc = as.numeric(MOMLOC),
    
    # Create formatted date variable
    date = as.Date(paste(YEAR, month, "01", sep = "-"), "%Y-%B-%d")
  )

micro_data <- micro_data %>%
  select(-STATEFIP, -REGION, -NCHILD, -FAMSIZE, -MONTH, -HISPAN, -RACE, 
         -MARST, -SEX, -ABSENT, -WHYABSNT, -EDUC, -EMPSTAT, -WKSTAT, 
         -POVERTY, -PERNUM, -MOMLOC, -POPLOC)

summary(micro_data)
##       YEAR          SERIAL          CPSID              ASECFLAG
##  Min.   :1990   Min.   :    1   Min.   :0.000e+00   Min.   :1  
##  1st Qu.:2000   1st Qu.:22449   1st Qu.:0.000e+00   1st Qu.:1  
##  Median :2008   Median :44685   Median :1.999e+13   Median :1  
##  Mean   :2007   Mean   :45647   Mean   :1.435e+13   Mean   :1  
##  3rd Qu.:2015   3rd Qu.:67543   3rd Qu.:2.010e+13   3rd Qu.:1  
##  Max.   :2024   Max.   :99986   Max.   :2.024e+13   Max.   :1  
##                                                                
##      HFLAG            ASECWTH            CPSIDP              CPSIDV         
##  Min.   :0         Min.   :    0.0   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0         1st Qu.:  890.2   1st Qu.:0.000e+00   1st Qu.:0.000e+00  
##  Median :0         Median : 1534.2   Median :1.999e+13   Median :1.999e+14  
##  Mean   :0         Mean   : 1672.9   Mean   :1.435e+13   Mean   :1.435e+14  
##  3rd Qu.:1         3rd Qu.: 2187.7   3rd Qu.:2.010e+13   3rd Qu.:2.010e+14  
##  Max.   :1         Max.   :28654.3   Max.   :2.024e+13   Max.   :2.024e+14  
##  NA's   :6007501                                                            
##      ASECWT             AGE            YNGCH          OCC2010    
##  Min.   :    0.0   Min.   : 0.00   Min.   : 0.00   Min.   :  10  
##  1st Qu.:  892.8   1st Qu.:16.00   1st Qu.:17.00   1st Qu.:4510  
##  Median : 1547.9   Median :34.00   Median :99.00   Median :9999  
##  Mean   : 1706.9   Mean   :35.21   Mean   :70.64   Mean   :7179  
##  3rd Qu.: 2242.2   3rd Qu.:52.00   3rd Qu.:99.00   3rd Qu.:9999  
##  Max.   :44423.8   Max.   :90.00   Max.   :99.00   Max.   :9999  
##                                                                  
##    UHRSWORKT        AHRSWORKT       EARNWT         INCWAGE        
##  Min.   :  0.0    Min.   :  1   Min.   :    0   Min.   :       0  
##  1st Qu.: 40.0    1st Qu.: 40   1st Qu.:    0   1st Qu.:       0  
##  Median :997.0    Median :999   Median :    0   Median :   25000  
##  Mean   :516.6    Mean   :570   Mean   : 1366   Mean   :23346325  
##  3rd Qu.:999.0    3rd Qu.:999   3rd Qu.:    0   3rd Qu.:  127000  
##  Max.   :999.0    Max.   :999   Max.   :85013   Max.   :99999999  
##  NA's   :627549                                                   
##          statefip                               region            nchild      
##  California  : 583792   South Atlantic Division    :1064521   Min.   :0.0000  
##  Texas       : 356065   Pacific Division           : 940960   1st Qu.:0.0000  
##  New York    : 309136   East North Central Division: 751093   Median :0.0000  
##  Florida     : 273960   Mountain Division          : 708900   Mean   :0.6187  
##  Illinois    : 208190   Middle Atlantic Division   : 665014   3rd Qu.:1.0000  
##  Pennsylvania: 198147   West North Central Division: 610567   Max.   :9.0000  
##  (Other)     :4277767   (Other)                    :1466002                   
##     famsize          month              hispan              race          
##  Min.   : 1.000   Length:6207057     Length:6207057     Length:6207057    
##  1st Qu.: 2.000   Class :character   Class :character   Class :character  
##  Median : 3.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 3.413                                                           
##  3rd Qu.: 4.000                                                           
##  Max.   :25.000                                                           
##                                                                           
##  race_ethnicity        marst               sex               absent         
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    whyabsnt             educ             empstat             wkstat         
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  laborstatus          poverty          age_category        child_age        
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   prime_age             pernum           momloc             date           
##  Length:6207057     Min.   : 1.000   Min.   : 0.0000   Min.   :1990-03-01  
##  Class :character   1st Qu.: 1.000   1st Qu.: 0.0000   1st Qu.:2000-03-01  
##  Mode  :character   Median : 2.000   Median : 0.0000   Median :2008-03-01  
##                     Mean   : 2.261   Mean   : 0.5859   Mean   :2007-07-22  
##                     3rd Qu.: 3.000   3rd Qu.: 1.0000   3rd Qu.:2015-03-01  
##                     Max.   :26.000   Max.   :17.0000   Max.   :2024-03-01  
## 
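
For readers with a question about a single recoded variable, a quick tabulation is an easy way to confirm that the labels came through as intended. The lines below are an optional check rather than part of the pipeline; any recoded variable from the Data Dictionary can be substituted.

# Optional check: distribution of a recoded variable (base R, no extra packages needed)
table(micro_data$educ, useNA = "ifany")
table(micro_data$laborstatus, useNA = "ifany")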

The next step transforms how occupations are treated. We use the OCC2010 code to label each occupation as a care occupation or not, drawing these values from the Formal_Occupation_crosswalk document that we compiled. For more information on how specific care occupations were chosen, please see the appendix section on Labeling Care Activities. The code below first flags OCC2010 values as care occupations or not and then further assigns them to specific care subcategories. Following this, the care_focus variable is created, assigning one of three possible care giving focuses. Because the classification uses case_when(), the conditions are evaluated in order, so an occupation matching more than one subcategory receives the focus of the first matching rule. Where an occupation is not labeled as a care occupation, care_focus is set to “none”.

care_occ_codes <- c(230, 325, 420, 350, 310, 
                    3000:3655, 2200:2550, 2000:2060, 4000:4160, 
                    4460, 4465, 4500, 4510, 4520, 4600, 4610, 
                    4230, 4220, 4250, 4200, 4210, 4240)

health_care_codes <- c(3000:3655, 350)
education_codes <- c(2200:2550, 230)
social_services_codes <- c(2000:2060, 420)
child_care_codes <- 4600
death_care_codes <- c(4460, 4465, 325)
personal_care_codes <- 4610
self_care_codes <- 4200:4520
food_care_codes <- c(4000:4160, 310)

# Transform micro_data
micro_data <- micro_data %>%
  mutate(
    # General care job classification
    care_job = ifelse(OCC2010 %in% unlist(care_occ_codes), "Care_Occ", "NonCare_Occ"),
    
    # Specific care occupation categories
    health_care_occ = ifelse(OCC2010 %in% unlist(health_care_codes), "HealthCare_Occ", "NonHealthCare_Occ"),
    education_occ = ifelse(OCC2010 %in% unlist(education_codes), "Education_Occ", "NonEducation_Occ"),
    social_services_occ = ifelse(OCC2010 %in% unlist(social_services_codes), "Social_Occ", "NonSocial_Occ"),
    child_care_occ = ifelse(OCC2010 %in% child_care_codes, "Childcare_Occ", "NonChildcare_Occ"),
    death_care_occ = ifelse(OCC2010 %in% unlist(death_care_codes), "DeathCare_Occ", "NonDeathCare_Occ"),
    personal_care = ifelse(OCC2010 %in% personal_care_codes, "PersonalCare_Occ", "NonPersonalCare_Occ"),
    self_care = ifelse(OCC2010 %in% unlist(self_care_codes), "Selfcare_Occ", "NonSelfcare_Occ"),
    food_care = ifelse(OCC2010 %in% unlist(food_care_codes), "FoodCare_Occ", "NonFoodCare_Occ"),
    
    # Care focus classification
    care_focus = case_when(
      education_occ == "Education_Occ" ~ "developmental",
      social_services_occ == "Social_Occ" ~ "health",
      health_care_occ == "HealthCare_Occ" ~ "health",
      child_care_occ == "Childcare_Occ" ~ "developmental",
      death_care_occ == "DeathCare_Occ" ~ "daily_living",
      personal_care == "PersonalCare_Occ" ~ "health",
      self_care == "Selfcare_Occ" ~ "daily_living",
      food_care == "FoodCare_Occ" ~ "daily_living",
      TRUE ~ "none"
    )
  )

summary(micro_data)
##       YEAR          SERIAL          CPSID              ASECFLAG
##  Min.   :1990   Min.   :    1   Min.   :0.000e+00   Min.   :1  
##  1st Qu.:2000   1st Qu.:22449   1st Qu.:0.000e+00   1st Qu.:1  
##  Median :2008   Median :44685   Median :1.999e+13   Median :1  
##  Mean   :2007   Mean   :45647   Mean   :1.435e+13   Mean   :1  
##  3rd Qu.:2015   3rd Qu.:67543   3rd Qu.:2.010e+13   3rd Qu.:1  
##  Max.   :2024   Max.   :99986   Max.   :2.024e+13   Max.   :1  
##                                                                
##      HFLAG            ASECWTH            CPSIDP              CPSIDV         
##  Min.   :0         Min.   :    0.0   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0         1st Qu.:  890.2   1st Qu.:0.000e+00   1st Qu.:0.000e+00  
##  Median :0         Median : 1534.2   Median :1.999e+13   Median :1.999e+14  
##  Mean   :0         Mean   : 1672.9   Mean   :1.435e+13   Mean   :1.435e+14  
##  3rd Qu.:1         3rd Qu.: 2187.7   3rd Qu.:2.010e+13   3rd Qu.:2.010e+14  
##  Max.   :1         Max.   :28654.3   Max.   :2.024e+13   Max.   :2.024e+14  
##  NA's   :6007501                                                            
##      ASECWT             AGE            YNGCH          OCC2010    
##  Min.   :    0.0   Min.   : 0.00   Min.   : 0.00   Min.   :  10  
##  1st Qu.:  892.8   1st Qu.:16.00   1st Qu.:17.00   1st Qu.:4510  
##  Median : 1547.9   Median :34.00   Median :99.00   Median :9999  
##  Mean   : 1706.9   Mean   :35.21   Mean   :70.64   Mean   :7179  
##  3rd Qu.: 2242.2   3rd Qu.:52.00   3rd Qu.:99.00   3rd Qu.:9999  
##  Max.   :44423.8   Max.   :90.00   Max.   :99.00   Max.   :9999  
##                                                                  
##    UHRSWORKT        AHRSWORKT       EARNWT         INCWAGE        
##  Min.   :  0.0    Min.   :  1   Min.   :    0   Min.   :       0  
##  1st Qu.: 40.0    1st Qu.: 40   1st Qu.:    0   1st Qu.:       0  
##  Median :997.0    Median :999   Median :    0   Median :   25000  
##  Mean   :516.6    Mean   :570   Mean   : 1366   Mean   :23346325  
##  3rd Qu.:999.0    3rd Qu.:999   3rd Qu.:    0   3rd Qu.:  127000  
##  Max.   :999.0    Max.   :999   Max.   :85013   Max.   :99999999  
##  NA's   :627549                                                   
##          statefip                               region            nchild      
##  California  : 583792   South Atlantic Division    :1064521   Min.   :0.0000  
##  Texas       : 356065   Pacific Division           : 940960   1st Qu.:0.0000  
##  New York    : 309136   East North Central Division: 751093   Median :0.0000  
##  Florida     : 273960   Mountain Division          : 708900   Mean   :0.6187  
##  Illinois    : 208190   Middle Atlantic Division   : 665014   3rd Qu.:1.0000  
##  Pennsylvania: 198147   West North Central Division: 610567   Max.   :9.0000  
##  (Other)     :4277767   (Other)                    :1466002                   
##     famsize          month              hispan              race          
##  Min.   : 1.000   Length:6207057     Length:6207057     Length:6207057    
##  1st Qu.: 2.000   Class :character   Class :character   Class :character  
##  Median : 3.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 3.413                                                           
##  3rd Qu.: 4.000                                                           
##  Max.   :25.000                                                           
##                                                                           
##  race_ethnicity        marst               sex               absent         
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    whyabsnt             educ             empstat             wkstat         
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  laborstatus          poverty          age_category        child_age        
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   prime_age             pernum           momloc             date           
##  Length:6207057     Min.   : 1.000   Min.   : 0.0000   Min.   :1990-03-01  
##  Class :character   1st Qu.: 1.000   1st Qu.: 0.0000   1st Qu.:2000-03-01  
##  Mode  :character   Median : 2.000   Median : 0.0000   Median :2008-03-01  
##                     Mean   : 2.261   Mean   : 0.5859   Mean   :2007-07-22  
##                     3rd Qu.: 3.000   3rd Qu.: 1.0000   3rd Qu.:2015-03-01  
##                     Max.   :26.000   Max.   :17.0000   Max.   :2024-03-01  
##                                                                            
##    care_job         health_care_occ    education_occ      social_services_occ
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057     
##  Class :character   Class :character   Class :character   Class :character   
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  child_care_occ     death_care_occ     personal_care       self_care        
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   food_care          care_focus       
##  Length:6207057     Length:6207057    
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
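
As an optional sanity check (not a published Care Board statistic), the new classifications can be tabulated to see how respondents fall across the care occupation groups and care focuses. A minimal sketch, with dplyr loaded as in the preliminary tasks:

# Optional check: unweighted counts of the occupation classifications
micro_data %>% count(care_job)
micro_data %>% count(care_focus, sort = TRUE)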

The final step of cleaning and processing this data involves identifying people by care status. We identify four care status values: mothers, fathers, childless men, and childless women. We fully acknowledge that the status of care providers is far more complicated than this simple breakdown. However, we think it is a useful starting point for analysis because it fits with common discourse on care giving, which often frames the discussion along gender and parenthood dimensions.

We identify mothers and fathers based on their reported sex and the value of the YNGCH variable, which reports the age of the youngest own child residing with an individual. If YNGCH takes an in-sample value (i.e., not 99, which is coded as NIU), the person reported living with at least one of their own children. This is an important decision: we only code someone as a mother or father if they live with their child, which excludes parents who do not live with their children but may nonetheless be involved in care giving activities. Splitting the data on these values is not perfect, but given the limitations of the data it is currently the best approach available. Future work should seek to further elaborate on these care status dimensions.

An individual is identified as a childless man or childless woman based on their reported sex and the fact that they do not appear in the mothers or fathers categories. In essence, we label someone a childless man or childless woman if they are not identified as a mother or father. This ensures that the four categories together cover the total population of the United States, making the groups mutually exclusive and collectively exhaustive.

Following this, we bind the rows back together to produce our final data frame of ASEC micro data.

micro_data <- micro_data %>%
  mutate(indiv_id = row_number())

mothers <- micro_data %>%
  filter(sex == "Female" & YNGCH != 99) %>%
  mutate(care_status = "Mothers")

fathers <- micro_data %>%
  filter(sex == "Male" & YNGCH != 99) %>%
  mutate(care_status = "Fathers")

men_no_child <- micro_data %>%
  filter(sex == "Male" & !indiv_id %in% fathers$indiv_id) %>%
  mutate(care_status = "Childless Men")

women_no_child <- micro_data %>%
  filter(sex == "Female" & !indiv_id %in% mothers$indiv_id) %>%
  mutate(care_status = "Childless Women")

micro_data <- bind_rows(mothers, fathers, men_no_child, women_no_child)

summary(micro_data)
##       YEAR          SERIAL          CPSID              ASECFLAG
##  Min.   :1990   Min.   :    1   Min.   :0.000e+00   Min.   :1  
##  1st Qu.:2000   1st Qu.:22449   1st Qu.:0.000e+00   1st Qu.:1  
##  Median :2008   Median :44685   Median :1.999e+13   Median :1  
##  Mean   :2007   Mean   :45647   Mean   :1.435e+13   Mean   :1  
##  3rd Qu.:2015   3rd Qu.:67543   3rd Qu.:2.010e+13   3rd Qu.:1  
##  Max.   :2024   Max.   :99986   Max.   :2.024e+13   Max.   :1  
##                                                                
##      HFLAG            ASECWTH            CPSIDP              CPSIDV         
##  Min.   :0         Min.   :    0.0   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0         1st Qu.:  890.2   1st Qu.:0.000e+00   1st Qu.:0.000e+00  
##  Median :0         Median : 1534.2   Median :1.999e+13   Median :1.999e+14  
##  Mean   :0         Mean   : 1672.9   Mean   :1.435e+13   Mean   :1.435e+14  
##  3rd Qu.:1         3rd Qu.: 2187.7   3rd Qu.:2.010e+13   3rd Qu.:2.010e+14  
##  Max.   :1         Max.   :28654.3   Max.   :2.024e+13   Max.   :2.024e+14  
##  NA's   :6007501                                                            
##      ASECWT             AGE            YNGCH          OCC2010    
##  Min.   :    0.0   Min.   : 0.00   Min.   : 0.00   Min.   :  10  
##  1st Qu.:  892.8   1st Qu.:16.00   1st Qu.:17.00   1st Qu.:4510  
##  Median : 1547.9   Median :34.00   Median :99.00   Median :9999  
##  Mean   : 1706.9   Mean   :35.21   Mean   :70.64   Mean   :7179  
##  3rd Qu.: 2242.2   3rd Qu.:52.00   3rd Qu.:99.00   3rd Qu.:9999  
##  Max.   :44423.8   Max.   :90.00   Max.   :99.00   Max.   :9999  
##                                                                  
##    UHRSWORKT        AHRSWORKT       EARNWT         INCWAGE        
##  Min.   :  0.0    Min.   :  1   Min.   :    0   Min.   :       0  
##  1st Qu.: 40.0    1st Qu.: 40   1st Qu.:    0   1st Qu.:       0  
##  Median :997.0    Median :999   Median :    0   Median :   25000  
##  Mean   :516.6    Mean   :570   Mean   : 1366   Mean   :23346325  
##  3rd Qu.:999.0    3rd Qu.:999   3rd Qu.:    0   3rd Qu.:  127000  
##  Max.   :999.0    Max.   :999   Max.   :85013   Max.   :99999999  
##  NA's   :627549                                                   
##          statefip                               region            nchild      
##  California  : 583792   South Atlantic Division    :1064521   Min.   :0.0000  
##  Texas       : 356065   Pacific Division           : 940960   1st Qu.:0.0000  
##  New York    : 309136   East North Central Division: 751093   Median :0.0000  
##  Florida     : 273960   Mountain Division          : 708900   Mean   :0.6187  
##  Illinois    : 208190   Middle Atlantic Division   : 665014   3rd Qu.:1.0000  
##  Pennsylvania: 198147   West North Central Division: 610567   Max.   :9.0000  
##  (Other)     :4277767   (Other)                    :1466002                   
##     famsize          month              hispan              race          
##  Min.   : 1.000   Length:6207057     Length:6207057     Length:6207057    
##  1st Qu.: 2.000   Class :character   Class :character   Class :character  
##  Median : 3.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 3.413                                                           
##  3rd Qu.: 4.000                                                           
##  Max.   :25.000                                                           
##                                                                           
##  race_ethnicity        marst               sex               absent         
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    whyabsnt             educ             empstat             wkstat         
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  laborstatus          poverty          age_category        child_age        
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   prime_age             pernum           momloc             date           
##  Length:6207057     Min.   : 1.000   Min.   : 0.0000   Min.   :1990-03-01  
##  Class :character   1st Qu.: 1.000   1st Qu.: 0.0000   1st Qu.:2000-03-01  
##  Mode  :character   Median : 2.000   Median : 0.0000   Median :2008-03-01  
##                     Mean   : 2.261   Mean   : 0.5859   Mean   :2007-07-22  
##                     3rd Qu.: 3.000   3rd Qu.: 1.0000   3rd Qu.:2015-03-01  
##                     Max.   :26.000   Max.   :17.0000   Max.   :2024-03-01  
##                                                                            
##    care_job         health_care_occ    education_occ      social_services_occ
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057     
##  Class :character   Class :character   Class :character   Class :character   
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  child_care_occ     death_care_occ     personal_care       self_care        
##  Length:6207057     Length:6207057     Length:6207057     Length:6207057    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   food_care          care_focus           indiv_id       care_status       
##  Length:6207057     Length:6207057     Min.   :      1   Length:6207057    
##  Class :character   Class :character   1st Qu.:1551765   Class :character  
##  Mode  :character   Mode  :character   Median :3103529   Mode  :character  
##                                        Mean   :3103529                     
##                                        3rd Qu.:4655293                     
##                                        Max.   :6207057                     
## 
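
Because the four care status groups are constructed to be mutually exclusive and collectively exhaustive, their weighted totals should sum to the full weighted population in any given year. The sketch below is an optional, illustrative check along those lines using the ASECWT person weight; it is not one of the published statistics, and the year filter is arbitrary.

# Optional check: weighted population and share by care status for a single year
micro_data %>%
  filter(YEAR == 2024) %>%
  group_by(care_status) %>%
  summarise(weighted_pop = sum(ASECWT, na.rm = TRUE)) %>%
  mutate(share = weighted_pop / sum(weighted_pop))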

APPENDIX: LABELING CARE ACTIVITIES