Introduction to the Care Board Methodology

The Care Board is an online dashboard designed to present comprehensive statistics and insights on the care economy - a critical yet overlooked sector encompassing both paid and unpaid care activities. The care economy includes all tasks related to caring for oneself and others. This includes jobs like nursing, teaching, childcare, and assisting elderly relatives, among others. It also includes informal, unpaid care work we do in our homes like making dinner, caring for children, or washing clothes. These responsibilities form a significant portion of individuals’ daily lives, whether through professional roles or unpaid domestic labor.

Despite its essential role in sustaining individuals, families, and society, care work remains largely invisible within formal economic statistics. For instance, while the Bureau of Labor Statistics (BLS) tracks childcare provided by paid professionals, identical activities performed by parents and relatives remain unaccounted for. This discrepancy highlights the broader issue of how care work is valued and measured within traditional economic frameworks.

The Care Board aims to bridge this gap by providing a centralized platform for measuring, analyzing, and studying caregiving activities in both the formal and informal sectors. By developing novel statistics from publicly available data, the Care Board seeks to foster meaningful discussions among researchers, policymakers, and the general public, bringing greater visibility to the challenges faced by caregivers nationwide.

This document serves as the primary detailed methodology and repository for all statistics presented on The Care Board. The statistics developed for The Care Board offer a new perspective on the economy through the lens of care. Where applicable, links to working papers or peer-reviewed articles will be provided.

All data presented and available for download on The Care Board, along with code and necessary information for replication, are discussed in this document. Each section guides users through the formation of a given statistic, from raw data to its final presentation. Any methodological choices, hurdles, and assumptions are documented for transparency.

To use this document, navigate to the section for the statistic in which you are most interested. In each section, you will find the raw data input requirements, code, and relevant explanations. If you use any data or code from The Care Board, please ensure proper attribution. Publications and reports should cite the appropriate version of the following:

Misty L. Heggeness, Joseph Bommarito, and Lucie Prewitt. The Care Board: Version 1.0 [dataset]. Lawrence, KS: Kansas Population Center, 2025. https://thecareboard.org

Preliminary tasks

Before running any code, the following preliminary tasks must be completed. The code provided at the beginning of this document must be run before any code in any other section. This code installs the relevant packages and sets the working directory used by all other sections.

Failure to run this code may result in errors.

Ensure that the working directory is updated to fit your data file location. Changing the working directory is needed to successfully run the code in this document.

The first step is to install the required packages. While some statistics require some specific packages to run, other packages are needed for more general data handling. These packages are loaded and described below.

Packages

pacman: a package used to load other packages. It checks whether each requested package is installed on the user’s computer. If a package is not installed, pacman installs it before loading it from the library; if it is already installed, pacman skips installation and loads it directly.
tidyverse: a commonly used collection of data handling packages in the R environment. Tidyverse is used to provide more streamlined and readable code, with the goal of making the replication files easier to follow. Whenever possible, code in this document follows the tidyverse approach rather than base R.
haven: a package used for reading and writing certain data formats. In this documentation it is mostly used to write data files as Stata .dta files.
data.table: a package used to efficiently load and write csv files. Large csv files can be resource intensive to load as a dataset. This package allows them to be loaded in as a table and then worked with directly in the R environment.
Hmisc: used in the code below primarily to apply survey weights to statistics, producing population-representative estimates.
janitor: provides simple functions for cleaning and formatting data, especially useful for cleaning column names, detecting and removing empty rows or columns, and summarizing tabular data.
DT: used exclusively for this RMD file to provide more readable tables of the data within the HTML output.
skimr: used specifically for this RMD file to provide descriptions of the appropriate datasets.
DescTools: provides a variety of functions used to describe datasets; most notably, we use the Gini function in this package.
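
A minimal sketch of the loading step the list above describes. The vector name `care_board_packages` and the wrapper `load_care_board_packages()` are hypothetical names introduced here for illustration; `pacman::p_load()` and its `char` argument are part of the pacman package. Nothing is installed until the wrapper is called.

```r
# Packages named in the list above (pacman itself is handled separately).
care_board_packages <- c(
  "tidyverse", "haven", "data.table", "Hmisc",
  "janitor", "DT", "skimr", "DescTools"
)

# Hypothetical wrapper: install pacman if needed, then let pacman
# install (when missing) and load every package in `pkgs`.
load_care_board_packages <- function(pkgs = care_board_packages) {
  if (!requireNamespace("pacman", quietly = TRUE)) {
    install.packages("pacman")
  }
  pacman::p_load(char = pkgs)
  invisible(pkgs)
}
```

Calling `load_care_board_packages()` once at the start of a session should be enough; later calls skip installation and only load.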

Working Directory

To load data, you must set the working directory to the file location where your data is stored. This code utilizes multiple folders based out of a single CareBoard directory. Set your working directory to a general folder where the folders you download will be stored. The code in this document will switch between directories as needed given the assumption they are all in the correct repository. This step is required to execute the code without errors.
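
As a rough illustration of this step, the sketch below wraps `setwd()` in a small check so that a wrong path fails with a clear message. The helper name `set_care_board_dir()` and the folder name used in the demonstration are assumptions, not part of the project code; replace the temporary folder with the location of your own CareBoard directory.

```r
# Hypothetical helper: switch to the top-level CareBoard folder,
# stopping early with a clear message if the path does not exist.
set_care_board_dir <- function(path) {
  if (!dir.exists(path)) {
    stop("Directory not found: ", path)
  }
  setwd(path)
  invisible(getwd())
}

# Demonstration with a temporary folder standing in for your own path.
demo_dir <- file.path(tempdir(), "CareBoard")
dir.create(demo_dir, showWarnings = FALSE)
old_dir <- getwd()
set_care_board_dir(demo_dir)
current_dir <- getwd()
setwd(old_dir)  # restore the previous directory after the demonstration
```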

Data Processing

Before developing the specific statistics needed for The Care Board, the raw microdata files need to be compiled and converted into a proper format. The code in this section provides a methodology to pull in all required data, clean it as necessary, and export it to the required locations. This project draws on a wide range of data to compile its statistics, but the core data are microdata from surveys conducted by the Census Bureau and the Bureau of Labor Statistics. The monthly Current Population Survey (CPS), the yearly Annual Social and Economic Supplement (ASEC), and the annual American Time Use Survey (ATUS) are the largest data sources for this project. We pull these data from the Minnesota Population Center (MPC) Integrated Public Use Microdata Series (IPUMS) project. The first step is introducing how these data are loaded in and transformed.

CPS ASEC Data Download

The code below produces a major source of raw data used in the production of The Care Board statistics: CPS monthly and CPS ASEC yearly microdata. The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) is a special supplement to the monthly Current Population Survey (CPS), which is conducted by the U.S. Census Bureau and the Bureau of Labor Statistics (BLS). The monthly CPS is primarily focused on labor force characteristics, such as employment, unemployment, and workforce participation. The CPS ASEC goes beyond this by collecting detailed information on income, poverty, health insurance coverage, and demographic characteristics, making it the primary source of data for measuring income inequality and economic well-being in the U.S.

The monthly CPS is a regular survey of around 60,000 households conducted every month. The CPS ASEC is conducted once a year, typically in March, and includes both regular CPS respondents and additional oversampled households to improve estimates for specific population groups. This larger sample improves data accuracy, especially for poverty and income statistics. The CPS Monthly data is used for labor force statistics like the unemployment rate, while the CPS ASEC data is used for official poverty estimates, income distribution studies, and health insurance coverage statistics. We use the CPS ASEC data to compile data on income and earnings for those working in the formal care economy.

The code below uses an IPUMS API key to download the IPUMS microdata files, which include the relevant information needed to develop statistics for The Care Board. IPUMS (Integrated Public Use Microdata Series) is a project that provides harmonized microdata from various national and international surveys and censuses. It is maintained by the Minnesota Population Center at the University of Minnesota. IPUMS makes large-scale individual- and household-level databases more accessible and comparable over time and across geographic regions.

The data available from IPUMS can be accessed through an API key. In order to replicate the code chunk below you will need to insert your personal key into the slot for set_ipums_api_key. For information on how to get a personal key visit https://www.ipums.org/. An IPUMS API key is free to the public and researchers.

IPUMS CPS data, including the ASEC supplements, can be cited as follows.

Sarah Flood, Miriam King, Renae Rodgers, Steven Ruggles, J. Robert Warren, Daniel Backman, Annie Chen, Grace Cooper, Stephanie Richards, Megan Schouweiler, and Michael Westberry. IPUMS CPS: Version 12.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D030.V12.0

When downloading from this repository, we first need to identify the sample IDs of the required samples. A set of files in the 01_preliminary-code-and-data folder stores the names of the required samples for both the ASEC and the CPS Monthly data pulls. The key difference between these sample lists is that the asec list contains only the yearly CPS ASEC samples, while the cps list contains all monthly samples.
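
To give a sense of what such sample lists contain, the sketch below builds sample IDs directly rather than reading them from the stored files. It assumes the IPUMS CPS naming convention in which an ASEC sample is written as `cpsYYYY_03s` and a basic monthly sample as `cpsYYYY_MMb`; the year range and the object names `samples_asec` and `samples_cps` are illustrative only.

```r
# Illustrative year range; the project files define the actual samples.
years <- 2022:2023

# One ASEC sample per year (March supplement).
samples_asec <- paste0("cps", years, "_03s")

# Twelve basic monthly samples per year.
month_grid <- expand.grid(month = 1:12, year = years)
samples_cps <- sprintf("cps%d_%02db", month_grid$year, month_grid$month)
```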

Following the creation of the samples, we also need to list the variables that we want to pull from the IPUMS repository. We create three sets of variables. var_common refers to variables that are present in both the CPS ASEC and the CPS Monthly data. var_asec refers to variables that are present only in the CPS ASEC data. These variables refer mostly to income and earnings data. var_cps refers to variables that are present in the CPS Monthly data alone. These variables refer mostly to workforce classification variables. The chunk below populates these lists.
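
A small sketch of how the three lists might be combined for each data pull. The variable names are IPUMS CPS mnemonics chosen to match a few of the labels reported later in this document, and they form an illustrative subset only; the helper objects `vars_for_asec` and `vars_for_cps` are hypothetical names.

```r
# Illustrative subsets only; the full lists are longer.
var_common <- c("YEAR", "SERIAL", "MONTH", "AGE", "SEX",
                "RACE", "MARST", "HISPAN", "EMPSTAT")
var_asec <- c("ASECWT", "INCWAGE")               # ASEC-only: weight, earnings
var_cps  <- c("WTFINL", "LABFORCE", "CLASSWKR")  # Monthly-only: workforce

# Each pull requests the common variables plus its own additions.
vars_for_asec <- union(var_common, var_asec)
vars_for_cps  <- union(var_common, var_cps)
```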

The code below uses the IPUMS API interface to pull data from the yearly CPS ASEC and the CPS Monthly. This R chunk is currently set NOT to run when this markdown file is run. The final results of this data extract can be found in the GitHub repository. The chunk is disabled so that all the following code uses the same iteration of the API data. If you wish to modify the data downloaded from IPUMS, change any of the samples or variables as desired and then run the chunk below in an R script. If you simply wish to replicate the data used by The Care Board, you can skip this step and load the ddi files already in the GitHub repository.
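
The extract workflow can be sketched roughly as follows, assuming a recent release of the ipumsr package that provides `define_extract_micro()` (older releases used `define_extract_cps()` instead). The wrapper name `download_cps_extract()` and its arguments are hypothetical; nothing is requested from IPUMS until the function is called with a valid key.

```r
# Hypothetical wrapper around the ipumsr extract workflow.
download_cps_extract <- function(samples, variables, description,
                                 api_key, download_dir = getwd()) {
  if (!requireNamespace("ipumsr", quietly = TRUE)) {
    stop("The ipumsr package is required for this step.")
  }
  # Register the personal API key for this session.
  ipumsr::set_ipums_api_key(api_key)

  # Describe the extract: which samples and which variables.
  extract <- ipumsr::define_extract_micro(
    collection = "cps",
    description = description,
    samples = samples,
    variables = variables
  )

  # Submit, wait until IPUMS has prepared the files, then download.
  submitted <- ipumsr::submit_extract(extract)
  ready <- ipumsr::wait_for_extract(submitted)
  ipumsr::download_extract(ready, download_dir = download_dir)
}
```

Large extracts can take a long time to prepare, which is one reason to keep this step separate from the rest of the pipeline.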

ATUS Data Note

As a note, the IPUMS data API does not currently fully support the download of ATUS data. We provide the .xml and .dat.gz files associated with the data in the GitHub repository for The Care Board. To modify the ATUS data download by changing samples or variables, users will need to conduct a manual extract from IPUMS. The interface for this manual extract, along with instructions, can be found at https://timeuse.ipums.org/. IPUMS kindly requests that usage of this data be cited as follows.

Sarah M. Flood, Liana C. Sayer, Daniel Backman, and Annie Chen. American Time Use Survey Data Extract Builder: Version 3.2 [dataset]. College Park, MD: University of Maryland and Minneapolis, MN: IPUMS, 2023. https://doi.org/10.18128/D060.V3.2

Data Prep Functions

After running the above code OR downloading the data from the GitHub repository, we should have three sets of .xml and .dat.gz files. These files hold the metadata and the compressed microdata downloaded from IPUMS. The code below is used to load this data into the R environment.

After loading the data into the R environment, we get the variable labels for each of the files. Before creating the statistics, we need to clean the data and ensure consistency between different samples. The CPS Monthly and CPS ASEC data often have variables that measure the same thing as the ATUS data but are coded slightly differently. Thus, we need to ensure that all variables are coded correctly. This section does that while also providing information on the variety of variables throughout the samples.
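
A minimal sketch of loading one extract and listing its variable labels, assuming the ipumsr package and an .xml file stored next to its .dat.gz file. The helper name `load_ipums_extract()` and the file name in the usage note are hypothetical.

```r
# Hypothetical helper: read the DDI metadata, then the microdata it
# describes, and return both the data and the variable labels.
load_ipums_extract <- function(ddi_path) {
  if (!file.exists(ddi_path)) {
    stop("DDI file not found: ", ddi_path)
  }
  if (!requireNamespace("ipumsr", quietly = TRUE)) {
    stop("The ipumsr package is required for this step.")
  }
  ddi <- ipumsr::read_ipums_ddi(ddi_path)
  list(
    data = ipumsr::read_ipums_micro(ddi),
    labels = ipumsr::ipums_var_info(ddi)$var_label
  )
}
```

For example, `asec <- load_ipums_extract("cps_asec.xml")` would return the microdata in `asec$data` and a character vector of labels in `asec$labels`.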

ASEC Raw Variables

The tables below represent the different variables gathered from the CPS ASEC data. These variables have not yet been modified and hold the exact values received when downloaded directly from the IPUMS repository.

##  [1] "Survey year"                                           
##  [2] "Household serial number"                               
##  [3] "Month"                                                 
##  [4] "CPSID, household record"                               
##  [5] "Flag for ASEC"                                         
##  [6] "Flag for the 3/8 file 2014"                            
##  [7] "Annual Social and Economic Supplement Household weight"
##  [8] "Region and division"                                   
##  [9] "State (FIPS code)"                                     
## [10] "Person number in sample unit"                          
## [11] "CPSID, person record"                                  
## [12] "Validated Longitudinal Identifier"                     
## [13] "Annual Social and Economic Supplement Weight"          
## [14] "Age"                                                   
## [15] "Sex"                                                   
## [16] "Race"                                                  
## [17] "Marital status"                                        
## [18] "Person number of first mother (from programming)"      
## [19] "Person number of first father (from programming)"      
## [20] "Number of own family members in hh"                    
## [21] "Number of own children in household"                   
## [22] "Age of youngest own child in household"                
## [23] "Hispanic origin"                                       
## [24] "Employment status"                                     
## [25] "Occupation, 2010 basis"                                
## [26] "Industry, 1990 basis"                                  
## [27] "Hours usually worked per week at all jobs"             
## [28] "Hours worked last week"                                
## [29] "Absent from work last week"                            
## [30] "Reason for absence from work"                          
## [31] "Full or part time status"                              
## [32] "Educational attainment recode"                         
## [33] "Earnings weight"                                       
## [34] "Wage and salary income"                                
## [35] "Original poverty status (PUMS original)"

CPS Raw Variables

The tables below represent the different variables gathered from the CPS Monthly data. These variables have not yet been modified and hold the exact values received when downloaded directly from the IPUMS repository.

##  [1] "Survey year"                                               
##  [2] "Household serial number"                                   
##  [3] "Month"                                                     
##  [4] "Household weight, Basic Monthly"                           
##  [5] "CPSID, household record"                                   
##  [6] "Flag for ASEC"                                             
##  [7] "Region and division"                                       
##  [8] "State (FIPS code)"                                         
##  [9] "Person number in sample unit"                              
## [10] "Final Basic Weight"                                        
## [11] "CPSID, person record"                                      
## [12] "Validated Longitudinal Identifier"                         
## [13] "Age"                                                       
## [14] "Sex"                                                       
## [15] "Race"                                                      
## [16] "Marital status"                                            
## [17] "Person number of first mother (from programming)"          
## [18] "Person number of first father (from programming)"          
## [19] "Person number of spouse (from programming)"                
## [20] "Number of own family members in hh"                        
## [21] "Number of own children in household"                       
## [22] "Age of youngest own child in household"                    
## [23] "Hispanic origin"                                           
## [24] "Employment status"                                         
## [25] "Labor force status"                                        
## [26] "Occupation, 2010 basis"                                    
## [27] "Industry, 1990 basis"                                      
## [28] "Class of worker "                                          
## [29] "Hours worked last week"                                    
## [30] "Absent from work last week"                                
## [31] "Reason for absence from work"                              
## [32] "Full or part time status"                                  
## [33] "Major activity (NILF)"                                     
## [34] "Educational attainment recode"                             
## [35] "Personal care limitation"                                  
## [36] "Composite Weight for replicating BLS labor force estimates"
## [37] "In the last week, telework or work at home for pay"

ATUS Raw Variables

The tables below represent the different variables gathered from the ATUS data. These variables have not yet been modified and hold the exact values received when downloaded directly from the IPUMS repository.

##  [1] "Survey year"                                                                              
##  [2] "ATUS Case ID"                                                                             
##  [3] "Household serial number"                                                                  
##  [4] "Scrambled pseudo primary sampling unit (PSU) collapsed stratum "                          
##  [5] "FIPS State Code"                                                                          
##  [6] "Number of people in household"                                                            
##  [7] "Family income"                                                                            
##  [8] "Number of adults in household"                                                            
##  [9] "Time first household child woke up"                                                       
## [10] "Time last household child went to bed"                                                    
## [11] "Household income greater or less than 185% of poverty level (EHM)"                        
## [12] "Person number (general)"                                                                  
## [13] "Person line number"                                                                       
## [14] "ATUS interview day of the week"                                                           
## [15] "Person weight, 2006 methodology"                                                          
## [16] "Person weight, 2020 methodology"                                                          
## [17] "Age"                                                                                      
## [18] "Sex"                                                                                      
## [19] "Race"                                                                                     
## [20] "Hispanic origin"                                                                          
## [21] "Marital status"                                                                           
## [22] "Highest level of school completed"                                                        
## [23] "Labor force status"                                                                       
## [24] "General occupation category, main job"                                                    
## [25] "Detailed occupation category, main job (CPS)"                                             
## [26] "Weekly earnings"                                                                          
## [27] "Hours worked last week (CPS)"                                                             
## [28] "Employment status (spouse or partner)"                                                    
## [29] "Unique Longitudinal CPS Identifier"                                                       
## [30] "Eldercare provided in last 3 months"                                                      
## [31] "Age of youngest own child (from programming)"                                             
## [32] "Number of own children (from programming)"                                                
## [33] "Activity line number"                                                                     
## [34] "Activity"                                                                                 
## [35] "Duration of activity (extended version)"                                                  
## [36] "Duration of activity"                                                                     
## [37] "Time spent during activity on secondary child care of all children"                       
## [38] "Time spent during activity on secondary child care of own children"                       
## [39] "Time spent during activity on secondary eldercare for household and non-household members"
## [40] "Activity start time"                                                                      
## [41] "Activity stop time"

The function below presents a methodology for comparing variables across datasets. This function takes the variable name in each dataset and lines up the values and their labels side by side. For example, each dataset has a variable for Hispanic origin, but they code it slightly differently. When the variable Hispanic is passed in, the check_lookups function looks at the different values to check for consistency. Where different samples have different values, we need to recode them before moving forward.
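
One way such a comparison could work is sketched below in base R, so that it runs without extra packages. The function name `compare_value_labels()` is hypothetical and is not the project's own function; it assumes each source supplies a named vector in which the names are labels and the values are codes, as in haven-style value labels.

```r
# Hypothetical sketch: line up value labels from several sources by code.
compare_value_labels <- function(...) {
  lookups <- list(...)
  tables <- Map(function(lookup, source) {
    out <- data.frame(val = unname(lookup), lbl = names(lookup),
                      stringsAsFactors = FALSE)
    names(out)[2] <- paste0("lbl_", source)
    out
  }, lookups, names(lookups))
  # A full outer join keeps codes that appear in only one source.
  merged <- Reduce(function(x, y) merge(x, y, by = "val", all = TRUE), tables)
  merged[order(merged$val), , drop = FALSE]
}

sex_cps  <- c(Male = 1, Female = 2, NIU = 9)
sex_atus <- c(Male = 1, Female = 2, "NIU (Not in universe)" = 99)
sex_compare <- compare_value_labels(asec = sex_cps, cps = sex_cps,
                                    atus = sex_atus)
```

Codes that exist in only one source show up with `NA` in the other label columns, which flags them for recoding.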

Creation of new variables

In addition to the variables generated directly from IPUMS, we create a few other variables of interest. These variables represent recoding numeric variables into categorical variables or combining multiple variables into a single variable for analysis. Each of these is coded specifically for The Care Board project.

Age Category

This variable groups individual respondents by age and acts as a categorical classifier for the different ages. Respondents under the age of 18 fall in the category “Under 18”, and those over the age of sixty-five fall in the category “Over65”. Apart from the 18-24 group, all remaining categories cover ten-year increments.
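
The grouping can be sketched with base R's `cut()`. The function name `age_category()` is hypothetical, and the sketch assumes that age 65 itself belongs to “Over65” so that the ten-year bins end at 64; adjust the breaks if the project draws that boundary differently.

```r
# Hypothetical sketch: bin ages into the categories described above.
# Assumption: whole-number ages, with 65 and older labelled "Over65".
age_category <- function(age) {
  as.character(cut(
    age,
    breaks = c(-Inf, 17, 24, 34, 44, 54, 64, Inf),
    labels = c("Under 18", "18-24", "25-34", "35-44",
               "45-54", "55-64", "Over65")
  ))
}

age_groups <- age_category(c(5, 18, 24, 25, 64, 65, 80))
```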

Prime Age

This variable identifies individuals who meet the labor economics definition of being in a “prime age” bin. Labor economists define prime age as those individuals aged 25 to 54. This age category represents people who tend to be most productive within the workforce. The ages are typically after higher education and before retirement.
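
A short sketch of the flag, using the 25 to 54 range stated above; the function name `is_prime_age()` is hypothetical.

```r
# TRUE for ages 25 through 54 inclusive, FALSE otherwise.
is_prime_age <- function(age) {
  age >= 25 & age <= 54
}

prime_age_flags <- is_prime_age(c(24, 25, 54, 55))
```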

Child Age

This variable looks within a household, identifies the age of the youngest child, and places that value in an age bin. The bins are under 5, 5-11, and 12-17, representing different stages of a child’s growth. An additional category of eighteen plus represents adult children living with their parents, while the value NIU represents households without any children. This variable can be used as a categorical variable in place of the numeric child age variable when desired.
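
A possible implementation is sketched below. It assumes the youngest-child age variable uses 99 to mark households without own children, which is an assumption about the raw coding; the function name `child_age_category()` and the exact label strings are hypothetical.

```r
# Hypothetical sketch: bin the age of the youngest own child.
# Assumption: `niu_code` (99 here) marks households without children.
child_age_category <- function(youngest_age, niu_code = 99) {
  out <- as.character(cut(
    youngest_age,
    breaks = c(-Inf, 4, 11, 17, Inf),
    labels = c("Under 5", "5-11", "12-17", "18+")
  ))
  # Overwrite the not-in-universe code after binning.
  out[which(youngest_age == niu_code)] <- "NIU"
  out
}

child_groups <- child_age_category(c(0, 4, 5, 11, 12, 17, 18, 99))
```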

Gender_parent

This variable represents an interaction between the sex and parenthood status of an individual. This can be one of four unique values representing both the case where a respondent is male or female and the case where the respondent is a parent living with their children in the home or childless (including parents whose children live elsewhere). Parenthood includes step-parents and parents of both biological and adopted children.
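
The interaction can be sketched as below, assuming parenthood is identified by a count of own children living in the home. The function name and the label wording are hypothetical; the project may label the four values differently.

```r
# Hypothetical sketch: combine sex with co-resident parenthood status.
gender_parent <- function(sex, n_children_in_home) {
  parent_status <- ifelse(n_children_in_home > 0, "parent", "childless")
  paste(sex, parent_status)
}

gender_parent_values <- gender_parent(
  sex = c("Male", "Female", "Female", "Male"),
  n_children_in_home = c(0, 2, 0, 1)
)
```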

Race Ethnicity

This variable coalesces the race variable and the Hispanic variable to create a single value of race_ethnicity. It is common practice to merge these variables, adding Hispanic origin to the race categories as a category separate from the other races. Where the respondent is not Hispanic, this variable takes their reported survey race.
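
A minimal sketch of the merge, assuming both inputs have already been recoded to text labels; the function name and the example labels are hypothetical.

```r
# Hypothetical sketch: Hispanic origin takes precedence over race.
race_ethnicity <- function(race, hispanic) {
  ifelse(hispanic == "Hispanic", "Hispanic", race)
}

race_ethnicity_values <- race_ethnicity(
  race = c("White", "Black", "White"),
  hispanic = c("Not Hispanic", "Not Hispanic", "Hispanic")
)
```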

Laborstatus

This variable represents a combination of the variables wkstat and empstat. The variable empstat identifies a respondent’s employment status as employed, unemployed, or not in the labor force (NILF). The variable wkstat identifies a worker as full- or part-time on the condition that they are in the labor force. Labor status has four unique categories: full-time, part-time, unemployed, and NILF.
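
The combination can be sketched as follows. The sketch assumes both inputs are already text labels, that working respondents carry the label "Employed", and that full- or part-time status is only consulted for them; the function name and label strings are hypothetical.

```r
# Hypothetical sketch: workers take their full/part-time status,
# everyone else keeps their employment status label.
labor_status <- function(empstat, wkstat) {
  ifelse(empstat == "Employed", wkstat, empstat)
}

labor_status_values <- labor_status(
  empstat = c("Employed", "Employed", "Unemployed", "NILF"),
  wkstat = c("Full-time", "Part-time", NA, NA)
)
```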

Month

This variable provides the name of the month as opposed to a numerical representation of the month for easier readability. This variable is most important for the CPS Monthly data that has monthly iterations of the data as opposed to the yearly CPS ASEC and ATUS.
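
A brief sketch using R's built-in `month.name` constant; the object names are illustrative. Storing the result as a factor keeps the months in calendar order rather than alphabetical order when tabulating or plotting.

```r
# Numeric months as they might appear in the raw data.
month_number <- c(1, 3, 12)

# Convert to month names, ordered January through December.
month_label <- factor(month.name[month_number], levels = month.name)
```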

Categorical Variable Comparisons

Now that we have coded our major categorical variables, we also need to ensure they are coded the same way across the different surveys. To do this we use the lookup_compare function that we created previously. Using this function, we see the values of the different variables in the CPS ASEC, CPS Monthly, and ATUS data.

For each variable, the lookup_compare function provides the value as coded in CPS ASEC, CPS Monthly, and ATUS. For CPS ASEC and CPS Monthly the coding is generally the same, but for ATUS it is often different. For example, the first value we look at is the variable representing Hispanic origin.

Hispanic

## # A tibble: 30 × 4
##      val lbl_asec           lbl_cps            lbl_atus    
##    <dbl> <chr>              <chr>              <chr>       
##  1     0 Not Hispanic       Not Hispanic       <NA>        
##  2   100 Mexican            Mexican            Not Hispanic
##  3   102 Mexican American   Mexican American   <NA>        
##  4   103 Mexicano/Mexicana  Mexicano/Mexicana  <NA>        
##  5   104 Chicano/Chicana    Chicano/Chicana    <NA>        
##  6   108 Mexican (Mexicano) Mexican (Mexicano) <NA>        
##  7   109 Mexicano/Chicano   Mexicano/Chicano   <NA>        
##  8   200 Puerto Rican       Puerto Rican       <NA>        
##  9   300 Cuban              Cuban              <NA>        
## 10   400 Dominican          Dominican          <NA>        
## # ℹ 20 more rows

As can be seen, the CPS ASEC and CPS Monthly data share one coding scheme, while the ATUS data assigns different meanings to the same values. For example, the value 100 is labeled “Mexican” in the CPS files but “Not Hispanic” in ATUS. We thus need to ensure all data is coded in the same format. The functions below provide the methodology for converting the ATUS and CPS data into the final values for the Hispanic origin variable.

## # A tibble: 30 × 6
##      val lbl_asec           lbl_cps            hispan_cps   lbl_atus hispan_atus
##    <dbl> <chr>              <chr>              <chr>        <chr>    <chr>      
##  1     0 Not Hispanic       Not Hispanic       Not Hispanic <NA>     Hispanic   
##  2   100 Mexican            Mexican            Hispanic     Not His… Not Hispan…
##  3   102 Mexican American   Mexican American   Hispanic     <NA>     Hispanic   
##  4   103 Mexicano/Mexicana  Mexicano/Mexicana  Hispanic     <NA>     Hispanic   
##  5   104 Chicano/Chicana    Chicano/Chicana    Hispanic     <NA>     Hispanic   
##  6   108 Mexican (Mexicano) Mexican (Mexicano) Hispanic     <NA>     Hispanic   
##  7   109 Mexicano/Chicano   Mexicano/Chicano   Hispanic     <NA>     Hispanic   
##  8   200 Puerto Rican       Puerto Rican       Hispanic     <NA>     Hispanic   
##  9   300 Cuban              Cuban              Hispanic     <NA>     Hispanic   
## 10   400 Dominican          Dominican          Hispanic     <NA>     Hispanic   
## # ℹ 20 more rows
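
The recode visible in the rows above can be sketched as a two-category collapse, where the only thing that differs between surveys is which code means “Not Hispanic” (0 in the CPS files, 100 in ATUS). The function name `recode_hispanic()` is hypothetical, and the sketch covers only the pattern in the displayed rows; codes in the remaining rows, such as any missing-value codes, may need separate handling.

```r
# Hypothetical sketch: collapse detailed origin codes to two categories.
recode_hispanic <- function(val, not_hispanic_code) {
  ifelse(val == not_hispanic_code, "Not Hispanic", "Hispanic")
}

# CPS ASEC / CPS Monthly: 0 is "Not Hispanic".
hispan_cps <- recode_hispanic(c(0, 100, 200), not_hispanic_code = 0)

# ATUS: 100 is "Not Hispanic"; 250 stands in for any other code.
hispan_atus <- recode_hispanic(c(100, 250), not_hispanic_code = 100)
```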

Race

The code below applies the same methodology to recode the race variables so that they are identical.

## # A tibble: 62 × 4
##      val lbl_asec                       lbl_cps                        lbl_atus 
##    <dbl> <chr>                          <chr>                          <chr>    
##  1   100 White                          White                          White on…
##  2   200 Black                          Black                          White-Bl…
##  3   300 American Indian/Aleut/Eskimo   American Indian/Aleut/Eskimo   White-Bl…
##  4   650 Asian or Pacific Islander      Asian or Pacific Islander      <NA>     
##  5   651 Asian only                     Asian only                     <NA>     
##  6   652 Hawaiian/Pacific Islander only Hawaiian/Pacific Islander only <NA>     
##  7   700 Other (single) race, n.e.c.    Other (single) race, n.e.c.    <NA>     
##  8   801 White-Black                    White-Black                    <NA>     
##  9   802 White-American Indian          White-American Indian          <NA>     
## 10   803 White-Asian                    White-Asian                    <NA>     
## # ℹ 52 more rows

## # A tibble: 62 × 6
##      val lbl_asec                       lbl_cps      race_cps lbl_atus race_atus
##    <dbl> <chr>                          <chr>        <chr>    <chr>    <chr>    
##  1   100 White                          White        White    White o… White    
##  2   200 Black                          Black        Black    White-B… Two or M…
##  3   300 American Indian/Aleut/Eskimo   American In… America… White-B… Two or M…
##  4   650 Asian or Pacific Islander      Asian or Pa… Asian/P… <NA>     Two or M…
##  5   651 Asian only                     Asian only   Asian/P… <NA>     Two or M…
##  6   652 Hawaiian/Pacific Islander only Hawaiian/Pa… Asian/P… <NA>     Two or M…
##  7   700 Other (single) race, n.e.c.    Other (sing… Two or … <NA>     Two or M…
##  8   801 White-Black                    White-Black  Two or … <NA>     Two or M…
##  9   802 White-American Indian          White-Ameri… Two or … <NA>     Two or M…
## 10   803 White-Asian                    White-Asian  Two or … <NA>     Two or M…
## # ℹ 52 more rows

Sex

The code below provides the same methodology for recoding the sex variable.

## # A tibble: 4 × 4
##     val lbl_asec lbl_cps lbl_atus             
##   <dbl> <chr>    <chr>   <chr>                
## 1     1 Male     Male    Male                 
## 2     2 Female   Female  Female               
## 3     9 NIU      NIU     <NA>                 
## 4    99 <NA>     <NA>    NIU (Not in universe)

## # A tibble: 4 × 5
##     val lbl_asec lbl_cps lbl_atus              sex   
##   <dbl> <chr>    <chr>   <chr>                 <chr> 
## 1     1 Male     Male    Male                  Male  
## 2     2 Female   Female  Female                Female
## 3     9 NIU      NIU     <NA>                  NIU   
## 4    99 <NA>     <NA>    NIU (Not in universe) NIU

Marital Status

The code below provides the same methodology for recoding the marital status variable.

## # A tibble: 9 × 4
##     val lbl_asec                lbl_cps                 lbl_atus                
##   <dbl> <chr>                   <chr>                   <chr>                   
## 1     1 Married, spouse present Married, spouse present Married - spouse present
## 2     2 Married, spouse absent  Married, spouse absent  Married - spouse absent 
## 3     3 Separated               Separated               Widowed                 
## 4     4 Divorced                Divorced                Divorced                
## 5     5 Widowed                 Widowed                 Separated               
## 6     6 Never married/single    Never married/single    Never married           
## 7     7 Widowed or Divorced     Widowed or Divorced     <NA>                    
## 8     9 NIU                     NIU                     <NA>                    
## 9    99 <NA>                    <NA>                    NIU (Not in universe)

## # A tibble: 9 × 5
##     val lbl_asec                lbl_cps                 lbl_atus           marst
##   <dbl> <chr>                   <chr>                   <chr>              <chr>
## 1     1 Married, spouse present Married, spouse present Married - spouse … Marr…
## 2     2 Married, spouse absent  Married, spouse absent  Married - spouse … Marr…
## 3     3 Separated               Separated               Widowed            Sepa…
## 4     4 Divorced                Divorced                Divorced           Sepa…
## 5     5 Widowed                 Widowed                 Separated          Sepa…
## 6     6 Never married/single    Never married/single    Never married      Sing…
## 7     7 Widowed or Divorced     Widowed or Divorced     <NA>               Sepa…
## 8     9 NIU                     NIU                     <NA>               NIU  
## 9    99 <NA>                    <NA>                    NIU (Not in unive… NIU

Education

The code below provides the same methodology for recoding the education variable.

## # A tibble: 42 × 4
##      val lbl_asec             lbl_cps              lbl_atus                     
##    <dbl> <chr>                <chr>                <chr>                        
##  1     0 NIU or no schooling  NIU or no schooling  <NA>                         
##  2     1 NIU or blank         NIU or blank         <NA>                         
##  3     2 None or preschool    None or preschool    <NA>                         
##  4    10 Grades 1, 2, 3, or 4 Grades 1, 2, 3, or 4 Less than 1st grade          
##  5    11 Grade 1              Grade 1              1st, 2nd, 3rd, or 4th grade  
##  6    12 Grade 2              Grade 2              5th or 6th grade             
##  7    13 Grade 3              Grade 3              7th or 8th grade             
##  8    14 Grade 4              Grade 4              9th grade                    
##  9    20 Grades 5 or 6        Grades 5 or 6        High school graduate - GED   
## 10    21 Grade 5              Grade 5              High school graduate - diplo…
## # ℹ 32 more rows

## # A tibble: 42 × 6
##      val lbl_asec             lbl_cps              educ_cps   lbl_atus educ_atus
##    <dbl> <chr>                <chr>                <chr>      <chr>    <chr>    
##  1     0 NIU or no schooling  NIU or no schooling  NIU        <NA>     0        
##  2     1 NIU or blank         NIU or blank         NIU        <NA>     1        
##  3     2 None or preschool    None or preschool    No HS Dip… <NA>     2        
##  4    10 Grades 1, 2, 3, or 4 Grades 1, 2, 3, or 4 No HS Dip… Less th… No HS Di…
##  5    11 Grade 1              Grade 1              No HS Dip… 1st, 2n… No HS Di…
##  6    12 Grade 2              Grade 2              No HS Dip… 5th or … No HS Di…
##  7    13 Grade 3              Grade 3              No HS Dip… 7th or … No HS Di…
##  8    14 Grade 4              Grade 4              No HS Dip… 9th gra… No HS Di…
##  9    20 Grades 5 or 6        Grades 5 or 6        No HS Dip… High sc… High Sch…
## 10    21 Grade 5              Grade 5              No HS Dip… High sc… High Sch…
## # ℹ 32 more rows

Poverty

The code below provides the same methodology for the poverty variable.

## # A tibble: 13 × 4
##      val lbl_asec                                   lbl_cps lbl_atus            
##    <dbl> <chr>                                      <lgl>   <chr>               
##  1     0 NIU                                        NA      <NA>                
##  2    10 Below poverty                              NA      HH income less than…
##  3    20 Above poverty                              NA      HH income greater t…
##  4    21 100-124 percent of the low-income level    NA      <NA>                
##  5    22 125-149 percent of the low-income level    NA      <NA>                
##  6    23 150 percent and above the low-income level NA      <NA>                
##  7    NA <NA>                                       NA      <NA>                
##  8    11 <NA>                                       NA      HH income less than…
##  9    12 <NA>                                       NA      HH income equal to …
## 10    96 <NA>                                       NA      Refused             
## 11    97 <NA>                                       NA      Don't know          
## 12    98 <NA>                                       NA      Blank               
## 13    99 <NA>                                       NA      NIU (Not in univers…

## # A tibble: 13 × 6
##      val lbl_asec                             pov_asec lbl_cps lbl_atus pov_atus
##    <dbl> <chr>                                <chr>    <lgl>   <chr>    <chr>   
##  1     0 NIU                                  NIU      NA      <NA>     NIU     
##  2    10 Below poverty                        Below P… NA      HH inco… Below P…
##  3    20 Above poverty                        Above P… NA      HH inco… Above P…
##  4    21 100-124 percent of the low-income l… 100-124… NA      <NA>     NIU     
##  5    22 125-149 percent of the low-income l… 125-149… NA      <NA>     NIU     
##  6    23 150 percent and above the low-incom… 150+ Pe… NA      <NA>     NIU     
##  7    NA <NA>                                 <NA>     NA      <NA>     <NA>    
##  8    11 <NA>                                 11       NA      HH inco… Below P…
##  9    12 <NA>                                 12       NA      HH inco… Below P…
## 10    96 <NA>                                 96       NA      Refused  NIU     
## 11    97 <NA>                                 97       NA      Don't k… NIU     
## 12    98 <NA>                                 98       NA      Blank    NIU     
## 13    99 <NA>                                 99       NA      NIU (No… NIU

Labor Force Status

The code below provides the same methodology for the Labor Force Status variable.

##   val lbl_asec                    lbl_cps               lbl_atus
## 1  NA       NA                       <NA>                   <NA>
## 2   0       NA                        NIU                   <NA>
## 3   1       NA No, not in the labor force     Employed - at work
## 4   2       NA    Yes, in the labor force      Employed - absent
## 5   3       NA                       <NA> Unemployed - on layoff
## 6   4       NA                       <NA>   Unemployed - looking
## 7   5       NA                       <NA>     Not in labor force
## 8  99       NA                       <NA>  NIU (Not in universe)

##   val lbl_asec                    lbl_cps           labforce_cps
## 1  NA       NA                       <NA>                   <NA>
## 2   0       NA                        NIU                    NIU
## 3   1       NA No, not in the labor force Not in the Labor Force
## 4   2       NA    Yes, in the labor force     In the Labor Force
## 5   3       NA                       <NA>                   <NA>
## 6   4       NA                       <NA>                   <NA>
## 7   5       NA                       <NA>                   <NA>
## 8  99       NA                       <NA>                   <NA>
##                 lbl_atus          labforce_atus
## 1                   <NA>                   <NA>
## 2                   <NA>                   <NA>
## 3     Employed - at work     In the Labor Force
## 4      Employed - absent     In the Labor Force
## 5 Unemployed - on layoff     In the Labor Force
## 6   Unemployed - looking     In the Labor Force
## 7     Not in labor force Not in the Labor Force
## 8  NIU (Not in universe)                    NIU
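
Because the CPS Monthly and ATUS samples use different numeric codes for labor force status, the recode has to be sample-specific. The sketch below is one way to express the mapping shown above in base R; the function name `recode_labforce` and its `sample` argument are assumptions made for illustration.

```r
# Hypothetical helper: sample-specific recode of labor force status.
recode_labforce <- function(val, sample = c("cps", "atus")) {
  sample <- match.arg(sample)
  lookup <- switch(
    sample,
    # CPS Monthly: 0 = NIU, 1 = not in the labor force, 2 = in the labor force
    cps = c("0" = "NIU",
            "1" = "Not in the Labor Force",
            "2" = "In the Labor Force"),
    # ATUS: codes 1-4 are employed or unemployed, 5 = not in labor force
    atus = c("1" = "In the Labor Force",
             "2" = "In the Labor Force",
             "3" = "In the Labor Force",
             "4" = "In the Labor Force",
             "5" = "Not in the Labor Force",
             "99" = "NIU")
  )
  # Codes that do not exist in the chosen sample are returned as NA.
  unname(lookup[as.character(val)])
}

recode_labforce(c(0, 1, 2), sample = "cps")
recode_labforce(c(1, 4, 5, 99), sample = "atus")
```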

Employment Status

The code below provides the same methodology for the Employment Status variable.

## # A tibble: 23 × 4
##      val lbl_asec                       lbl_cps                        lbl_atus 
##    <dbl> <chr>                          <chr>                          <chr>    
##  1     0 NIU                            NIU                            <NA>     
##  2     1 Armed Forces                   Armed Forces                   Employed…
##  3    10 At work                        At work                        <NA>     
##  4    12 Has job, not at work last week Has job, not at work last week <NA>     
##  5    20 Unemployed                     Unemployed                     <NA>     
##  6    21 Unemployed, experienced worker Unemployed, experienced worker <NA>     
##  7    22 Unemployed, new worker         Unemployed, new worker         <NA>     
##  8    30 Not in labor force             Not in labor force             <NA>     
##  9    31 NILF, housework                NILF, housework                <NA>     
## 10    32 NILF, unable to work           NILF, unable to work           <NA>     
## # ℹ 13 more rows

## # A tibble: 23 × 6
##      val lbl_asec                      lbl_cps empstat_cps lbl_atus empstat_atus
##    <dbl> <chr>                         <chr>   <chr>       <chr>    <chr>       
##  1     0 NIU                           NIU     NIU         <NA>     0           
##  2     1 Armed Forces                  Armed … Armed Forc… Employe… Employed    
##  3    10 At work                       At work Employed    <NA>     10          
##  4    12 Has job, not at work last we… Has jo… Employed    <NA>     12          
##  5    20 Unemployed                    Unempl… Unemployed  <NA>     20          
##  6    21 Unemployed, experienced work… Unempl… Unemployed  <NA>     21          
##  7    22 Unemployed, new worker        Unempl… Unemployed  <NA>     22          
##  8    30 Not in labor force            Not in… NILF        <NA>     30          
##  9    31 NILF, housework               NILF, … NILF        <NA>     31          
## 10    32 NILF, unable to work          NILF, … NILF        <NA>     32          
## # ℹ 13 more rows

Work Status

The code below provides the same methodology for the variables identifying full-time or part-time status.

## # A tibble: 16 × 4
##      val lbl_asec                                               lbl_cps lbl_atus
##    <dbl> <chr>                                                  <chr>   <lgl>   
##  1    10 Full-time schedules                                    Full-t… NA      
##  2    11 Full-time hours (35+), usually full-time               Full-t… NA      
##  3    12 Part-time for non-economic reasons, usually full-time  Part-t… NA      
##  4    13 Not at work, usually full-time                         Not at… NA      
##  5    14 Full-time hours, usually part-time for economic reaso… Full-t… NA      
##  6    15 Full-time hours, usually part-time for non-economic r… Full-t… NA      
##  7    20 Part-time for economic reasons                         Part-t… NA      
##  8    21 Part-time for economic reasons, usually full-time      Part-t… NA      
##  9    22 Part-time hours, usually part-time for economic reaso… Part-t… NA      
## 10    40 Part-time for non-economic reasons, usually part-time  Part-t… NA      
## 11    41 Part-time hours, usually part-time for non-economic r… Part-t… NA      
## 12    42 Not at work, usually part-time                         Not at… NA      
## 13    50 Unemployed, seeking full-time work                     Unempl… NA      
## 14    60 Unemployed, seeking part-time work                     Unempl… NA      
## 15    99 NIU, blank, or not in labor force                      NIU, b… NA      
## 16    NA <NA>                                                   <NA>    NA

## # A tibble: 16 × 5
##      val lbl_asec                                        lbl_cps wkstat lbl_atus
##    <dbl> <chr>                                           <chr>   <chr>  <lgl>   
##  1    10 Full-time schedules                             Full-t… Full … NA      
##  2    11 Full-time hours (35+), usually full-time        Full-t… Full … NA      
##  3    12 Part-time for non-economic reasons, usually fu… Part-t… Full … NA      
##  4    13 Not at work, usually full-time                  Not at… Full … NA      
##  5    14 Full-time hours, usually part-time for economi… Full-t… Full … NA      
##  6    15 Full-time hours, usually part-time for non-eco… Full-t… Full … NA      
##  7    20 Part-time for economic reasons                  Part-t… Part … NA      
##  8    21 Part-time for economic reasons, usually full-t… Part-t… Part … NA      
##  9    22 Part-time hours, usually part-time for economi… Part-t… Part … NA      
## 10    40 Part-time for non-economic reasons, usually pa… Part-t… Part … NA      
## 11    41 Part-time hours, usually part-time for non-eco… Part-t… Part … NA      
## 12    42 Not at work, usually part-time                  Not at… Part … NA      
## 13    50 Unemployed, seeking full-time work              Unempl… Unemp… NA      
## 14    60 Unemployed, seeking part-time work              Unempl… Unemp… NA      
## 15    99 NIU, blank, or not in labor force               NIU, b… NIU    NA      
## 16    NA <NA>                                            <NA>    <NA>   NA

Class of Worker

The code below provides the same methodology for analyzing worker classes.

##    val lbl_asec                         lbl_cps lbl_atus
## 1   NA       NA                            <NA>       NA
## 2    0       NA                             NIU       NA
## 3   10       NA                   Self-employed       NA
## 4   13       NA Self-employed, not incorporated       NA
## 5   14       NA     Self-employed, incorporated       NA
## 6   20       NA       Works for wages or salary       NA
## 7   21       NA            Wage/salary, private       NA
## 8   22       NA             Private, for profit       NA
## 9   23       NA              Private, nonprofit       NA
## 10  24       NA         Wage/salary, government       NA
## 11  25       NA     Federal government employee       NA
## 12  26       NA                    Armed forces       NA
## 13  27       NA       State government employee       NA
## 14  28       NA       Local government employee       NA
## 15  29       NA            Unpaid family worker       NA
## 16  99       NA                 Missing/Unknown       NA

##    val lbl_asec                         lbl_cps        classwkr lbl_atus
## 1   NA       NA                            <NA>            <NA>       NA
## 2    0       NA                             NIU             NIU       NA
## 3   10       NA                   Self-employed   Self_Employed       NA
## 4   13       NA Self-employed, not incorporated   Self_Employed       NA
## 5   14       NA     Self-employed, incorporated   Self_Employed       NA
## 6   20       NA       Works for wages or salary     Wage/Salary       NA
## 7   21       NA            Wage/salary, private     Wage/Salary       NA
## 8   22       NA             Private, for profit     Wage/Salary       NA
## 9   23       NA              Private, nonprofit     Wage/Salary       NA
## 10  24       NA         Wage/salary, government      Government       NA
## 11  25       NA     Federal government employee      Government       NA
## 12  26       NA                    Armed forces      Government       NA
## 13  27       NA       State government employee      Government       NA
## 14  28       NA       Local government employee      Government       NA
## 15  29       NA            Unpaid family worker          Unpaid       NA
## 16  99       NA                 Missing/Unknown Missing/Unknown       NA
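
The class of worker recode collapses several detailed codes into a few broad groups. A minimal sketch of that grouping, using the mapping shown above, could look like the following; the function name `recode_classwkr` is hypothetical.

```r
# Hypothetical helper: collapse detailed class-of-worker codes into groups.
recode_classwkr <- function(val) {
  out <- rep(NA_character_, length(val))
  # %in% treats NA as "no match", so missing codes stay NA.
  out[val %in% 0] <- "NIU"
  out[val %in% c(10, 13, 14)] <- "Self_Employed"
  out[val %in% 20:23] <- "Wage/Salary"
  out[val %in% 24:28] <- "Government"
  out[val %in% 29] <- "Unpaid"
  out[val %in% 99] <- "Missing/Unknown"
  out
}

recode_classwkr(c(0, 13, 22, 26, 29, 99, NA))
```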

NILF

The code below uses the same methodology for the variable recording why someone is not in the labor force.

##   val lbl_asec                        lbl_cps lbl_atus
## 1  NA       NA                           <NA>       NA
## 2   1       NA                       Disabled       NA
## 3   2       NA                            Ill       NA
## 4   3       NA                      In school       NA
## 5   4       NA Taking care of house or family       NA
## 6   6       NA          Something else/ Other       NA
## 7  99       NA                          Blank       NA

##   val lbl_asec                        lbl_cps nilf_activity lbl_atus
## 1  NA       NA                           <NA>          <NA>       NA
## 2   1       NA                       Disabled      Disabled       NA
## 3   2       NA                            Ill           Ill       NA
## 4   3       NA                      In school        School       NA
## 5   4       NA Taking care of house or family     Homemaker       NA
## 6   6       NA          Something else/ Other         Other       NA
## 7  99       NA                          Blank           NIU       NA

Telework

The code below provides the same methodology for the variable asking whether someone teleworks.

##   val lbl_asec lbl_cps lbl_atus
## 1  NA       NA    <NA>       NA
## 2   0       NA     NIU       NA
## 3   1       NA     Yes       NA
## 4   2       NA      No       NA

Absenteeism

## # A tibble: 5 × 4
##     val lbl_asec                                             lbl_cps    lbl_atus
##   <dbl> <chr>                                                <chr>      <lgl>   
## 1     0 NIU                                                  NIU        NA      
## 2     1 No                                                   No         NA      
## 3     2 Yes, laid off                                        Yes, laid… NA      
## 4     3 Yes, other reason (vacation, illness, labor dispute) Yes, othe… NA      
## 5    NA <NA>                                                 <NA>       NA

## # A tibble: 5 × 5
##     val lbl_asec                                         lbl_cps absent lbl_atus
##   <dbl> <chr>                                            <chr>   <chr>  <lgl>   
## 1     0 NIU                                              NIU     NIU    NA      
## 2     1 No                                               No      No     NA      
## 3     2 Yes, laid off                                    Yes, l… Yes, … NA      
## 4     3 Yes, other reason (vacation, illness, labor dis… Yes, o… Yes, … NA      
## 5    NA <NA>                                             <NA>    <NA>   NA

Reason for Absent

The code below provides the same methodology for the variable representing the reason why someone was not at work.

## # A tibble: 17 × 4
##      val lbl_asec                            lbl_cps                    lbl_atus
##    <dbl> <chr>                               <chr>                      <lgl>   
##  1     0 NIU                                 NIU                        NA      
##  2     1 On temporary layoff (under 30 days) On temporary layoff (unde… NA      
##  3     2 On indefinite layoff (30+ days)     On indefinite layoff (30+… NA      
##  4     3 Slack work/business conditions      Slack work/business condi… NA      
##  5     4 Waiting for a new job to begin      Waiting for a new job to … NA      
##  6     5 Vacation/personal days              Vacation/personal days     NA      
##  7     6 Own illness/injury/medical problems Own illness/injury/medica… NA      
##  8     7 Child care problems                 Child care problems        NA      
##  9     8 Other family/personal obligation    Other family/personal obl… NA      
## 10     9 Maternity/paternity leave           Maternity/paternity leave  NA      
## 11    10 Labor dispute                       Labor dispute              NA      
## 12    11 Weather affected job                Weather affected job       NA      
## 13    12 School/training                     School/training            NA      
## 14    13 Civic/military duty                 Civic/military duty        NA      
## 15    14 Does not work in the business       Does not work in the busi… NA      
## 16    15 Other                               Other                      NA      
## 17    NA <NA>                                <NA>                       NA

## # A tibble: 17 × 5
##      val lbl_asec                            lbl_cps           whyabsnt lbl_atus
##    <dbl> <chr>                               <chr>             <chr>    <lgl>   
##  1     0 NIU                                 NIU               NIU      NA      
##  2     1 On temporary layoff (under 30 days) On temporary lay… 1        NA      
##  3     2 On indefinite layoff (30+ days)     On indefinite la… 2        NA      
##  4     3 Slack work/business conditions      Slack work/busin… 3        NA      
##  5     4 Waiting for a new job to begin      Waiting for a ne… 4        NA      
##  6     5 Vacation/personal days              Vacation/persona… Vacatio… NA      
##  7     6 Own illness/injury/medical problems Own illness/inju… Own ill… NA      
##  8     7 Child care problems                 Child care probl… Care Re… NA      
##  9     8 Other family/personal obligation    Other family/per… Care Re… NA      
## 10     9 Maternity/paternity leave           Maternity/patern… Care Re… NA      
## 11    10 Labor dispute                       Labor dispute     Non-Car… NA      
## 12    11 Weather affected job                Weather affected… Non-Car… NA      
## 13    12 School/training                     School/training   Non-Car… NA      
## 14    13 Civic/military duty                 Civic/military d… Non-Car… NA      
## 15    14 Does not work in the business       Does not work in… 14       NA      
## 16    15 Other                               Other             Other    NA      
## 17    NA <NA>                                <NA>              <NA>     NA

Recoding all Variables

The functions created above provide the methodology to recode all needed variables. The code chunk below creates a general function that applies them so that all CPS Monthly, CPS ASEC, and ATUS samples have identical variable values. The code also ensures that numeric variables are correctly coded and mutates data variables so that they all share the same format.

The functions in the chunk below are split for variables in all samples, variables in both the ASEC and CPS Monthly, and variables in each of the unique samples. For a reminder of which variables are in which, see the variable classifications previously discussed. Finally, this code chunk creates the final column order which will be used to ensure that all data sets have their variables in the same order.
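
To make the structure concrete, here is a minimal sketch of what such a general function might look like. Everything named here (`harmonize_sample`, `final_columns`, the toy recoder) is hypothetical, and filling absent columns with `NA` is our assumption for the illustration, not a documented feature of The Care Board code.

```r
# Hypothetical final column order; the real list is much longer.
final_columns <- c("id", "sex", "labforce")

# Hypothetical wrapper: apply a named list of recode functions, then put
# the columns in a fixed order so every sample has the same layout.
harmonize_sample <- function(df, recoders, column_order = final_columns) {
  # Each list name is a raw (upper-case) column; the recoded values are
  # stored under the lower-case version of that name.
  for (raw_name in names(recoders)) {
    if (raw_name %in% names(df)) {
      df[[tolower(raw_name)]] <- recoders[[raw_name]](df[[raw_name]])
    }
  }
  # Assumption: variables absent from a sample are added as NA columns.
  for (col in setdiff(column_order, names(df))) {
    df[[col]] <- NA
  }
  df[, column_order, drop = FALSE]
}

# Toy usage with a single recoder for SEX.
toy <- data.frame(id = 1:3, SEX = c(1, 2, 9))
recoders <- list(
  SEX = function(x) {
    unname(c("1" = "Male", "2" = "Female", "9" = "NIU")[as.character(x)])
  }
)
harmonized <- harmonize_sample(toy, recoders)
harmonized
```

Passing the recoders as a list lets each sample supply only the functions that apply to it while sharing the same ordering step.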

Loading the Activity Data

In The Care Board methodology, we develop methods to compare formal paid and informal unpaid activities and time use. We create crosswalks that code every activity as either a care activity or not, and that assign a specific care focus to each care-related activity. The classification of jobs and activities as part of the care economy or not represents a major source of assumptions and decision points. We acknowledge that some of these detailed activities can be classified in more than one way and that others may hold different opinions about how best to classify them. We therefore provide the crosswalks for full transparency, analysis, and review.

The first crosswalk classifies formal occupations as part of the care economy or not. This crosswalk uses federal Standard Occupational Classification (SOC) codes and labels each code as developmental care, daily living care, health care, or none.

##       code                                 occ_category
##      <int>                                       <char>
##   1:    10      MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
##   2:    20      MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
##   3:    30      MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
##   4:   100      MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
##   5:   110      MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
##  ---                                                   
## 454:  9800                            MILITARY SPECIFIC
## 455:  9810                            MILITARY SPECIFIC
## 456:  9820                            MILITARY SPECIFIC
## 457:  9830                            MILITARY SPECIFIC
## 458:  9999 NOT IN UNIVERSE (UNEMPLOYED OR NEVER WORKED)
##                                                                                occ_name
##                                                                                  <char>
##   1:                             Chief executives and legislators/public administration
##   2:                                                    General and Operations Managers
##   3:                           Managers in Marketing, Advertising, and Public Relations
##   4:                                                   Administrative Services Managers
##   5:                                          Computer and Information Systems Managers
##  ---                                                                                   
## 454:                           Military Officer Special and Tactical Operations Leaders
## 455:                                           First-Line Enlisted Military Supervisors
## 456: Military Enlisted Tactical Operations and Air/Weapons Specialists and Crew Members
## 457:                                                       Military, Rank Not Specified
## 458:                                                                                NIU
##                                                   occ_label occ_care_focus
##                                                      <char>         <char>
##   1: Chief executives and legislators/public administration           none
##   2:                                      Business Managers           none
##   3:                                      Business Managers           none
##   4:                                      Business Managers           none
##   5:                                      Business Managers           none
##  ---                                                                      
## 454:                                               Military           none
## 455:                                               Military           none
## 456:                                               Military           none
## 457:                                               Military           none
## 458:                                                   NULL           none

The second crosswalk classifies informal time-use activities as part of the care economy or not. This crosswalk uses the ATUS activity codes and labels each activity as developmental care, daily living care, health care, or none.

##        code            activity developmental health daily_living paid_work
##       <int>              <char>         <int>  <int>        <int>     <int>
##   1:  10101            Sleeping            NA     NA           NA        NA
##   2:  10102            Sleeping            NA     NA           NA        NA
##   3:  10199            Sleeping            NA     NA           NA        NA
##   4:  10201       Self Grooming            NA     NA            1        NA
##   5:  10299       Self Grooming            NA     NA            1        NA
##  ---                                                                       
## 457: 181801           Traveling            NA     NA           NA        NA
## 458: 181899           Traveling            NA     NA           NA        NA
## 459: 189999           Traveling            NA     NA           NA        NA
## 460:     NA Secondary Childcare             1     NA           NA        NA
## 461:     NA Secondary Eldercare            NA      1           NA        NA
##      formal_work child_care elder_care householdcare selfcare leisure sleeping
##            <int>      <int>      <int>         <int>    <int>   <int>    <int>
##   1:          NA         NA         NA            NA       NA      NA        1
##   2:          NA         NA         NA            NA       NA      NA        1
##   3:          NA         NA         NA            NA       NA      NA        1
##   4:          NA         NA         NA            NA        1      NA       NA
##   5:          NA         NA         NA            NA        1      NA       NA
##  ---                                                                          
## 457:          NA         NA         NA            NA       NA      NA       NA
## 458:          NA         NA         NA            NA       NA      NA       NA
## 459:          NA         NA         NA            NA       NA      NA       NA
## 460:          NA         NA         NA            NA       NA      NA       NA
## 461:          NA         NA         NA            NA       NA      NA       NA
##      volunteering education
##             <int>     <int>
##   1:           NA        NA
##   2:           NA        NA
##   3:           NA        NA
##   4:           NA        NA
##   5:           NA        NA
##  ---                       
## 457:           NA        NA
## 458:           NA        NA
## 459:           NA        NA
## 460:           NA        NA
## 461:           NA        NA
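
The activity crosswalk stores each category as a flag column (1 or NA). As a hedged sketch of how such flags could be used, the example below attaches the flags to a toy set of diary records and totals the minutes per person in each flagged category. The two crosswalk rows are copied from the output above; the `diary` data, its column names, and the aggregation step are hypothetical and only illustrate one possible use.

```r
# Two rows copied from the activity crosswalk shown above.
activity_crosswalk <- data.frame(
  code = c(10101L, 10201L),
  activity = c("Sleeping", "Self Grooming"),
  daily_living = c(NA, 1L),
  selfcare = c(NA, 1L),
  sleeping = c(1L, NA)
)

# Hypothetical diary records: one row per activity spell, in minutes.
diary <- data.frame(
  person = c(1L, 1L, 2L),
  code = c(10101L, 10201L, 10101L),
  minutes = c(480, 30, 450)
)

# Look up each diary row in the crosswalk by activity code.
idx <- match(diary$code, activity_crosswalk$code)
flags <- c("daily_living", "selfcare", "sleeping")
for (f in flags) {
  flag <- activity_crosswalk[[f]][idx]
  # Count the spell's minutes toward a category only when its flag is set.
  diary[[f]] <- ifelse(is.na(flag), 0, diary$minutes)
}

# Total minutes per person in each category.
totals <- aggregate(diary[flags], by = list(person = diary$person), FUN = sum)
totals
```

Because an activity can carry more than one flag (self grooming is both daily living care and self care here), the category totals are not mutually exclusive.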

CPS Monthly Variable Processing

Now that we have investigated the variables across our different samples, we need to apply the functions above to each of our datasets to recode them into the proper format. We start with the CPS Monthly data. The code below uses the DDI file to load all CPS Monthly data and then applies the functions to recode the variables, assemble them in the correct order, and merge them with the activity coding data. It then saves the result as an RDS file for future use.

## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
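
For readers who want to reproduce the loading step, a minimal sketch using the `ipumsr` package is given below. The function name `load_cps_monthly` and both path arguments are hypothetical, and the recoding and crosswalk merge are only marked by a comment; this is an outline under those assumptions rather than the exact code behind the output.

```r
# Hypothetical outline of the load-and-save step for an IPUMS CPS extract.
# Requires the ipumsr package and an extract (DDI .xml plus data file).
load_cps_monthly <- function(ddi_path, out_path) {
  # Read the DDI codebook, which describes the layout of the data file.
  ddi <- ipumsr::read_ipums_ddi(ddi_path)
  # Read the microdata described by the DDI.
  micro_cps <- ipumsr::read_ipums_micro(ddi)

  # ... recode variables, order columns, and merge crosswalks here ...

  # Save the processed data for later use.
  saveRDS(micro_cps, out_path)
  invisible(micro_cps)
}

# Example call (paths are placeholders, not real files):
# load_cps_monthly("path/to/cps_extract.xml", "path/to/micro_cps.rds")
```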

Data summary
Name	micro_cps
Number of rows	53950525
Number of columns	66
_______________________
Column type frequency:
character	23
Date	1
factor	2
numeric	40
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
month	0	1.00	3	9	12
age_category	0	1.00	8	23	7
prime_age	0	1.00	9	17	3
sex	0	1.00	4	6	2
hispan	0	1.00	3	12	3
race	0	1.00	5	20	5
race_ethnicity	0	1.00	5	20	6
marst	0	1.00	3	31	4
gender_parent	0	1.00	5	11	5
child_age	0	1.00	3	15	5
educ	0	1.00	3	17	6
empstat	0	1.00	3	12	5
laborstatus	0	1.00	3	12	6
absent	0	1.00	2	13	4
whyabsnt	0	1.00	1	27	9
wkstat	0	1.00	3	10	4
labforce	0	1.00	3	22	3
nilf_activity	7061598	0.87	3	9	6
telwrkpay	51565932	0.04	3	11	3
occ_category	0	1.00	5	46	27
occ_name	0	1.00	3	156	454
occ_label	0	1.00	4	63	82
occ_care_focus	0	1.00	4	13	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	1990-01-01	2024-09-01	2006-07-01	417

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
region	0	1	FALSE	9	Sou: 9277084, Pac: 7471882, Eas: 6873523, Mid: 6011210
statefip	0	1	FALSE	51	Cal: 4449160, Tex: 2756512, New: 2730754, Flo: 2308199

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	0	1.00	26975263.00	15574175.21	1	13487632.00	26975263.00	40462894.00	53950525.00	▇▇▇▇▇
YEAR	0	1.00	2006.16	9.86	1990	1998.00	2006.00	2014.00	2024.00	▇▇▇▇▆
SERIAL	0	1.00	34682.60	20192.19	1	17396.00	34562.00	51935.00	74625.00	▇▇▇▇▅
MONTH	0	1.00	6.48	3.45	1	3.00	6.00	9.00	12.00	▇▆▅▆▇
CPSID	0	1.00	20056711201795.79	98466309399.00	19881000000200	19970702770200.00	20051206270800.00	20140603148800.00	20240906935800.00	▇▇▇▇▆
ASECFLAG	49482961	0.08	2.00	0.00	2	2.00	2.00	2.00	2.00	▁▁▇▁▁
COMPWT	13329726	0.75	1891.60	1567.45	0	358.41	1903.34	3102.71	34708.83	▇▁▁▁▁
HWTFINL	0	1.00	2234.37	1274.08	0	1200.53	2181.91	3116.30	34716.21	▇▁▁▁▁
WTFINL	0	1.00	2268.72	1336.03	0	1203.12	2201.96	3149.89	44747.54	▇▁▁▁▁
REGION	0	1.00	27.78	10.72	11	21.00	31.00	33.00	42.00	▅▆▁▇▆
STATEFIP	0	1.00	28.26	15.74	1	13.00	29.00	41.00	56.00	▇▆▆▇▆
PERNUM	0	1.00	2.16	1.33	1	1.00	2.00	3.00	25.00	▇▁▁▁▁
CPSIDP	0	1.00	20056711201798.03	98466309398.97	19881000000201	19970702770201.00	20051206270801.00	20140603148801.00	20240906935804.00	▇▇▇▇▆
CPSIDV	0	1.00	200567112017981.47	984663093989.67	198810000002011	199707027702011.00	200512062708011.00	201406031488011.00	202409069358041.00	▇▇▇▇▆
AGE	0	1.00	37.41	22.76	0	18.00	37.00	55.00	90.00	▇▇▇▆▂
SEX	0	1.00	1.52	0.50	1	1.00	2.00	2.00	2.00	▇▁▁▁▇
HISPAN	0	1.00	31.64	118.97	0	0.00	0.00	0.00	902.00	▇▁▁▁▁
RACE	0	1.00	148.56	142.48	100	100.00	100.00	100.00	830.00	▇▁▁▁▁
MARST	0	1.00	4.23	3.14	1	1.00	4.00	6.00	9.00	▇▂▁▅▃
MOMLOC	0	1.00	0.52	0.90	0	0.00	0.00	1.00	16.00	▇▁▁▁▁
POPLOC	0	1.00	0.36	0.75	0	0.00	0.00	0.00	16.00	▇▁▁▁▁
SPLOC	0	1.00	0.71	0.91	0	0.00	0.00	1.00	21.00	▇▁▁▁▁
FAMSIZE	0	1.00	3.21	1.68	1	2.00	3.00	4.00	25.00	▇▁▁▁▁
famsize	0	1.00	3.21	1.68	1	2.00	3.00	4.00	25.00	▇▁▁▁▁
NCHILD	0	1.00	0.56	1.01	0	0.00	0.00	1.00	9.00	▇▂▁▁▁
nchild	0	1.00	0.56	1.01	0	0.00	0.00	1.00	9.00	▇▂▁▁▁
YNGCH	0	1.00	73.13	40.08	0	21.00	99.00	99.00	99.00	▃▁▁▁▇
EDUC	0	1.00	64.78	39.58	1	32.00	73.00	91.00	125.00	▆▂▇▆▆
EMPSTAT	0	1.00	15.31	13.05	0	10.00	10.00	34.00	36.00	▃▇▁▁▅
OCC2010	0	1.00	7096.88	3409.72	10	4250.00	9620.00	9999.00	9999.00	▂▂▂▁▇
IND1990	0	1.00	307.07	358.86	0	0.00	20.00	701.00	952.00	▇▁▁▂▃
AHRSWORKT	0	1.00	560.62	478.34	1	40.00	999.00	999.00	999.00	▇▁▁▁▇
ABSENT	0	1.00	0.37	0.59	0	0.00	0.00	1.00	3.00	▇▃▁▁▁
WHYABSNT	0	1.00	0.15	1.14	0	0.00	0.00	0.00	15.00	▇▁▁▁▁
WKSTAT	0	1.00	58.16	41.60	10	11.00	60.00	99.00	99.00	▆▁▁▁▇
LABFORCE	0	1.00	1.30	0.79	0	1.00	2.00	2.00	2.00	▃▁▅▁▇
CLASSWKR	0	1.00	17.37	23.70	0	0.00	21.00	22.00	99.00	▇▇▁▁▁
NILFACT	7061598	0.87	88.46	29.91	1	99.00	99.00	99.00	99.00	▁▁▁▁▇
DIFFCARE	29975790	0.44	0.83	0.42	0	1.00	1.00	1.00	2.00	▂▁▇▁▁
TELWRKPAY	51565932	0.04	0.82	0.94	0	0.00	0.00	2.00	2.00	▇▁▂▁▆

CPS ASEC Variable Processing

The code below does the same for the CPS ASEC data: it applies the recoding functions defined above to the CPS ASEC variables, assembles the data in the proper order, and merges it with the activity data.

Data summary
Name	micro_asec
Number of rows	6207057
Number of columns	64
_______________________
Column type frequency:
character	21
Date	1
factor	2
numeric	40
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
month	1	5	5	1
age_category	1	8	23	7
prime_age	1	9	17	3
sex	1	4	6	2
hispan	1	3	12	3
race	1	5	20	5
race_ethnicity	1	5	20	6
marst	1	7	31	3
gender_parent	1	5	11	5
child_age	1	3	15	5
educ	1	3	17	6
empstat	1	3	12	5
laborstatus	1	3	12	6
absent	1	2	13	4
whyabsnt	1	1	27	9
wkstat	1	3	10	4
poverty	1	3	26	5
occ_category	1	5	46	27
occ_name	1	3	156	454
occ_label	1	4	63	82
occ_care_focus	1	4	13	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	1990-03-01	2024-03-01	2008-03-01	35

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
region	0	1	FALSE	9	Sou: 1064521, Pac: 940960, Eas: 751093, Mou: 708900
statefip	0	1	FALSE	51	Cal: 583792, Tex: 356065, New: 309136, Flo: 273960

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	0	1.00	3103529.00	1791823.16	1	1551765.00	3103529.00	4655293.00	6207057.00	▇▇▇▇▇
YEAR	0	1.00	2007.39	9.56	1990	2000.00	2008.00	2015.00	2024.00	▆▆▇▇▆
SERIAL	0	1.00	45647.39	26974.60	1	22449.00	44685.00	67543.00	99986.00	▇▇▇▇▅
MONTH	0	1.00	3.00	0.00	3	3.00	3.00	3.00	3.00	▁▁▇▁▁
CPSID	0	1.00	14348026771078.58	9049746509586.89	0	0.00	19990105297800.00	20101204144600.00	20240306932200.00	▃▁▁▁▇
ASECFLAG	0	1.00	1.00	0.00	1	1.00	1.00	1.00	1.00	▁▁▇▁▁
HFLAG	6007501	0.03	0.30	0.46	0	0.00	0.00	1.00	1.00	▇▁▁▁▃
ASECWTH	0	1.00	1672.94	1127.23	0	890.23	1534.18	2187.67	28654.31	▇▁▁▁▁
pernum	0	1.00	2.26	1.38	1	1.00	2.00	3.00	26.00	▇▁▁▁▁
REGION	0	1.00	28.22	10.74	11	21.00	31.00	41.00	42.00	▅▆▁▇▆
STATEFIP	0	1.00	27.88	15.91	1	13.00	28.00	41.00	56.00	▇▅▆▇▆
PERNUM	0	1.00	2.26	1.38	1	1.00	2.00	3.00	26.00	▇▁▁▁▁
CPSIDP	0	1.00	14347651772919.27	9049921692380.36	0	0.00	19990105295102.00	20101204119002.00	20240306932201.00	▃▁▁▁▇
CPSIDV	0	1.00	143476517729193.41	90499216923804.03	0	0.00	199901052951021.00	201012041190021.00	202403069322011.00	▃▁▁▁▇
ASECWT	0	1.00	1706.90	1174.71	0	892.81	1547.88	2242.25	44423.83	▇▁▁▁▁
AGE	0	1.00	35.21	22.33	0	16.00	34.00	52.00	90.00	▇▆▇▅▂
SEX	0	1.00	1.52	0.50	1	1.00	2.00	2.00	2.00	▇▁▁▁▇
HISPAN	0	1.00	42.44	131.56	0	0.00	0.00	0.00	902.00	▇▁▁▁▁
RACE	0	1.00	155.77	153.22	100	100.00	100.00	100.00	830.00	▇▁▁▁▁
MARST	0	1.00	3.70	2.34	1	1.00	4.00	6.00	6.00	▇▁▁▁▇
MOMLOC	0	1.00	0.59	0.97	0	0.00	0.00	1.00	17.00	▇▁▁▁▁
momloc	0	1.00	0.59	0.97	0	0.00	0.00	1.00	17.00	▇▁▁▁▁
POPLOC	0	1.00	0.42	0.84	0	0.00	0.00	1.00	18.00	▇▁▁▁▁
FAMSIZE	0	1.00	3.41	1.71	1	2.00	3.00	4.00	25.00	▇▁▁▁▁
famsize	0	1.00	3.41	1.71	1	2.00	3.00	4.00	25.00	▇▁▁▁▁
NCHILD	0	1.00	0.62	1.06	0	0.00	0.00	1.00	9.00	▇▂▁▁▁
nchild	0	1.00	0.62	1.06	0	0.00	0.00	1.00	9.00	▇▂▁▁▁
YNGCH	0	1.00	70.64	41.29	0	17.00	99.00	99.00	99.00	▃▁▁▁▇
EDUC	0	1.00	61.79	40.58	1	20.00	73.00	91.00	125.00	▇▂▇▆▆
EMPSTAT	0	1.00	14.61	12.97	0	10.00	10.00	32.00	36.00	▅▇▁▁▅
OCC2010	0	1.00	7179.08	3386.32	10	4510.00	9999.00	9999.00	9999.00	▂▂▂▁▇
IND1990	0	1.00	300.44	358.13	0	0.00	10.00	700.00	952.00	▇▁▁▂▂
UHRSWORKT	627549	0.90	516.62	482.77	0	40.00	997.00	999.00	999.00	▇▁▁▁▇
AHRSWORKT	0	1.00	569.96	477.54	1	40.00	999.00	999.00	999.00	▆▁▁▁▇
ABSENT	0	1.00	0.35	0.57	0	0.00	0.00	1.00	3.00	▇▃▁▁▁
WHYABSNT	0	1.00	0.13	1.08	0	0.00	0.00	0.00	15.00	▇▁▁▁▁
WKSTAT	0	1.00	59.31	41.45	10	11.00	99.00	99.00	99.00	▆▁▁▁▇
EARNWT	0	1.00	1365.70	3818.99	0	0.00	0.00	0.00	85013.19	▇▁▁▁▁
INCWAGE	0	1.00	23346324.69	42280815.29	0	0.00	25000.00	127000.00	99999999.00	▇▁▁▁▂
POVERTY	0	1.00	21.14	4.39	0	23.00	23.00	23.00	23.00	▁▁▁▁▇

ATUS Variable Processing

The code below uses the functions defined above to recode the ATUS variables. It then assembles the data in the proper order and merges it with the activity data. Following this, the code merges the ATUS data with occupation data from the CPS Monthly data. To understand formal care economy work, we rely on responses in the CPS Monthly data and use the CPSIDP variable to merge the ATUS and CPS Monthly datasets. The ATUS is conducted among a subset of individuals in the month when they leave the CPS Monthly rotation. We use the last month in which an individual is present in the CPS Monthly data to identify their formal occupation status for the ATUS data.
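
The linkage rule described above can be sketched on toy data. This is a minimal base R illustration under stated assumptions, not the project's code: the data frames `cps_toy` and `atus_toy`, their lowercase column names, and all values are hypothetical. Only the CPSIDP key and the "last month present in the CPS Monthly data" rule come from the text.

```r
# Toy CPS Monthly records: one row per person-month (hypothetical values).
cps_toy <- data.frame(
  cpsidp         = c(101, 101, 101, 202, 202),
  cps_date       = as.Date(c("2022-01-01", "2022-02-01", "2022-03-01",
                             "2021-11-01", "2021-12-01")),
  occ_care_focus = c("none", "none", "health", "developmental", "developmental"),
  stringsAsFactors = FALSE
)

# Toy ATUS respondents (hypothetical values); person 303 has no CPS match.
atus_toy <- data.frame(
  caseid = c(1, 2, 3),
  cpsidp = c(101, 202, 303)
)

# Keep each person's last CPS month: sort by person and date,
# then keep the final row within each person.
cps_sorted <- cps_toy[order(cps_toy$cpsidp, cps_toy$cps_date), ]
cps_last   <- cps_sorted[!duplicated(cps_sorted$cpsidp, fromLast = TRUE), ]

# Left join on CPSIDP: every ATUS respondent is kept, and respondents
# without a CPS match receive NA for the occupation fields.
atus_linked <- merge(atus_toy, cps_last, by = "cpsidp", all.x = TRUE)
print(atus_linked)
```

Using a left join keeps unmatched ATUS respondents visible, which makes it easy to count how many records lack occupation information after the merge.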

## Use of data from IPUMS ATUS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.

Data summary
Name	micro_atus
Number of rows	4740486
Number of columns	73
_______________________
Column type frequency:
character	19
Date	2
numeric	52
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
age_category	0	1.00	8	23	7
prime_age	0	1.00	9	17	3
sex	0	1.00	4	6	2
hispan	0	1.00	8	12	2
race	0	1.00	5	20	5
race_ethnicity	0	1.00	5	20	6
marst	0	1.00	7	31	3
gender_parent	0	1.00	5	11	5
child_age	0	1.00	3	15	5
educ	0	1.00	11	17	5
empstat	0	1.00	4	10	3
poverty	3040773	0.36	3	13	3
KIDWAKETIME	0	1.00	8	8	263
KIDBEDTIME	0	1.00	8	8	583
START	0	1.00	8	8	1440
STOP	0	1.00	8	8	1440
activity	146100	0.97	7	49	109
act_care_focus	0	1.00	6	13	4
occ_care_focus	0	1.00	4	13	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	2003-01-01	2023-01-01	2011-01-01	21
cps_date	0	1	2002-08-01	2023-10-01	2011-04-01	255

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	0	1.00	2370243.50	1368460.58	1.0	1185122	2370243.50	3555364.75	4740486.00	▇▇▇▇▇
YEAR	0	1.00	2011.64	5.99	2003.0	2006	2011.00	2017.00	2023.00	▇▆▅▅▃
SERIAL	0	1.00	6164.44	3964.58	1.0	2945	5848.00	8841.00	20720.00	▇▇▆▁▁
DAY	0	1.00	3.97	2.31	1.0	2	4.00	6.00	7.00	▇▂▂▂▇
WT06	155109	0.97	7577532.28	7235883.79	419471.7	3104725	5515035.66	9330912.03	209010030.47	▇▁▁▁▁
WT20	4402397	0.07	8913317.08	8762416.19	0.0	3617274	6635796.94	11389433.39	137151707.75	▇▁▁▁▁
CASEID	0	1.00	20117019506409.80	59891260707.51	20030100013280.0	20061110060893	20110707112061.00	20170101170920.75	20231212232280.00	▇▆▆▅▅
STRATA	1607822	0.66	2725.88	1632.89	-1.0	1200	2700.00	4100.00	5604.00	▇▇▆▇▆
STATEFIP	0	1.00	28.26	15.82	1.0	13	28.00	42.00	56.00	▇▆▆▇▆
PERNUM	0	1.00	1.00	0.00	1.0	1	1.00	1.00	1.00	▁▁▇▁▁
CPSIDP	0	1.00	20102795918058.71	60071242310.74	20010500956102.0	20050604643604	20100206669002.00	20150706792401.00	20230806450301.00	▇▆▆▅▃
AGE	0	1.00	47.94	17.69	15.0	34	46.00	62.00	85.00	▅▇▇▆▃
SEX	0	1.00	1.60	0.49	1.0	1	2.00	2.00	2.00	▆▁▁▁▇
HISPAN	0	1.00	115.22	40.37	100.0	100	100.00	100.00	250.00	▇▁▁▁▁
RACE	0	1.00	104.01	14.38	100.0	100	100.00	100.00	599.00	▇▁▁▁▁
MARST	0	1.00	2.81	2.07	1.0	1	1.00	4.00	6.00	▇▁▂▁▃
HH_SIZE	0	1.00	2.80	1.53	1.0	2	2.00	4.00	16.00	▇▁▁▁▁
FAMINCOME	0	1.00	65.86	225.94	1.0	8	12.00	15.00	998.00	▇▁▁▁▁
HH_NUMADULTS	0	1.00	1.90	0.79	0.0	1	2.00	2.00	12.00	▇▁▁▁▁
NCHILD	0	1.00	0.86	1.14	0.0	0	0.00	2.00	9.00	▇▃▁▁▁
nchild	0	1.00	0.86	1.14	0.0	0	0.00	2.00	9.00	▇▃▁▁▁
YNGCH	0	1.00	58.76	44.86	0.0	9	99.00	99.00	99.00	▆▁▁▁▇
EDUC	0	1.00	29.84	9.57	10.0	21	30.00	40.00	43.00	▂▆▁▆▇
EMPSTAT	0	1.00	2.50	1.88	1.0	1	1.00	5.00	5.00	▇▁▁▁▅
POVERTY185	3040773	0.36	20.96	17.52	10.0	11	20.00	20.00	99.00	▇▁▁▁▁
LINENO	0	1.00	1.00	0.00	1.0	1	1.00	1.00	1.00	▁▁▇▁▁
OCC2	0	1.00	3893.18	4788.38	110.0	127	150.00	9999.00	9999.00	▇▁▁▁▅
OCC_CPS8	0	1.00	36496.01	45575.41	10.0	2830	5400.00	99999.00	99999.00	▇▁▁▁▅
EARNWEEK	0	1.00	45587.07	49297.07	0.0	680	2019.23	99999.99	99999.99	▇▁▁▁▆
HRSWORKT_CPS8	0	1.00	4243.91	4919.92	1.0	40	50.00	9999.00	9999.00	▇▁▁▁▆
SPEMPSTAT	0	1.00	45.10	48.35	1.0	1	3.00	99.00	99.00	▇▁▁▁▆
ECPRIOR	2228561	0.53	1.46	10.86	0.0	0	0.00	0.00	99.00	▇▁▁▁▁
ACTLINE	0	1.00	11.84	8.41	1.0	5	10.00	16.00	91.00	▇▂▁▁▁
ACTIVITY	0	1.00	89041.53	76474.29	10101.0	20201	110101.00	120312.00	509999.00	▇▇▁▁▁
DURATION_EXT	0	1.00	83.22	126.56	1.0	15	30.00	90.00	1472.00	▇▁▁▁▁
DURATION	0	1.00	74.46	100.87	1.0	15	30.00	90.00	1350.00	▇▁▁▁▁
SCC_ALL_LN	0	1.00	6.69	27.27	0.0	0	0.00	0.00	1195.00	▇▁▁▁▁
SCC_OWN_LN	412611	0.91	5.71	24.66	0.0	0	0.00	0.00	1195.00	▇▁▁▁▁
SEC_ALL_LN	2228561	0.53	0.47	8.27	0.0	0	0.00	0.00	1097.00	▇▁▁▁▁
developmental	0	1.00	0.05	0.22	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
health	0	1.00	0.01	0.08	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
daily_living	0	1.00	0.27	0.44	0.0	0	0.00	1.00	1.00	▇▁▁▁▃
paid_work	0	1.00	0.07	0.26	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
formal_work	0	1.00	0.04	0.20	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
child_care	0	1.00	0.07	0.26	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
elder_care	0	1.00	0.01	0.11	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
householdcare	0	1.00	0.19	0.39	0.0	0	0.00	0.00	1.00	▇▁▁▁▂
selfcare	0	1.00	0.24	0.43	0.0	0	0.00	0.00	1.00	▇▁▁▁▂
leisure	0	1.00	0.21	0.41	0.0	0	0.00	0.00	1.00	▇▁▁▁▂
sleeping	0	1.00	0.11	0.32	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
volunteering	0	1.00	0.01	0.08	0.0	0	0.00	0.00	1.00	▇▁▁▁▁
education	0	1.00	0.01	0.08	0.0	0	0.00	0.00	1.00	▇▁▁▁▁

Data Processing Conclusion

The code in this section documents the methodology for downloading, cleaning, summarizing, and saving the data used to develop the statistics in The Care Board project. This section is essential for replicating the path from raw data to the data used to compile our statistics. The code above saves three separate datasets: the CPS Monthly files, the yearly CPS ASEC files, and the yearly ATUS files. Once the code has run in full, these three datasets should be written to the working directory.

Care and care provision

Understanding how much time in a given day a person requires care is essential for accurately assessing the scale and structure of the care economy. While caregiving can be measured using a variety of methods, time-based measurements provide a more granular and human-centered view of care than simple headcounts or categorical designations of dependency. They help differentiate between levels of care intensity and the allocation of resources across health and social service systems. Furthermore, time-use data allows researchers and policymakers to model scenarios of unmet care needs and evaluate how demographic shifts, such as population aging or rising rates of disability, will affect demand for care services in both formal and informal sectors.

This information is also foundational for estimating the economic value of caregiving. Many individuals who require care do not receive it through formal markets but rely instead on family members or community networks. Without quantifying the time demands associated with caregiving needs, it is difficult to assess the hidden costs borne by households or to design equitable social support programs. Accurately capturing time needs can reveal care deficits and stress points in existing systems, thus informing policies aimed at improving accessibility, equity, and wellbeing outcomes for care recipients.

Equally important is understanding how much time the average person spends providing care on a daily basis. Capturing this data highlights the often-invisible labor that sustains households and communities, particularly the unpaid and gendered work frequently carried out by women. By quantifying caregiving as a time commitment, researchers can estimate its opportunity costs—such as foregone earnings, education, or leisure—and more comprehensively assess its impact on individual wellbeing and economic productivity. This information is crucial for designing interventions, from tax credits to caregiver respite programs, that acknowledge and support the vital contributions of care providers.

Moreover, time-use data on caregiving offers a powerful tool for comparative policy analysis. It enables cross-population comparisons, tracking how caregiving varies by age, gender, socioeconomic status, and family structure. It also facilitates longitudinal studies of how caregiving responsibilities evolve over the life course or in response to social policy changes. Embedding time-based care metrics into national surveys and economic accounts can help integrate care work into the broader understanding of labor markets, social reproduction, and economic development, thereby strengthening the case for investing in the care economy as both a moral and strategic priority.

The first section of The Care Board, What Is the Care Economy, uses a variety of methods to measure the average time we need care and the time we have available to provide care. The code in the next few sections relies on many assumptions and simplifications. It is vital that more data be collected in the future to provide better estimates of these outcomes, but for now, this code represents our most complete work on estimating care need and provision by individuals across society.

Age Data

The first piece of data we need is the number of people in the U.S. at each age. Care need and the provision of care differ dramatically by life stage. We use the code below and the 2024 CPS ASEC data to create population estimates for ages 0 through 85. This R code chunk is doing the following steps:

Creating an age reference list:
It creates a dataframe called age_list with one column (age) containing every age from 0 to 85, one row per age.
Loading and filtering age data:
- Reads a CSV file (ASECdata.csv) into a dataframe called age_data.
- Filters the data to keep only rows from the most recent year (max(YEAR)).
- Selects just the age (AGE) and person-level weight (ASECWT) columns.
- Cleans column names to standardize them (e.g., lowercase with underscores).
Calculating weighted population by age:
- Groups the cleaned data by age and sums the weights to estimate the total population for each age.
Joining with full age list:
Performs a full join between age_list (ages 0 to 85) and the summarized population data
Handling missing population values:
Fills in any missing population values with 0 using coalesce().

In plain language:
This code creates a complete list of ages from 0 to 85 and attaches population estimates from the latest year of survey data.
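
The steps above can be sketched as follows. This is a minimal base R illustration on a toy stand-in for the ASEC extract, written in base R so that it runs without additional packages. The toy values and the output names `latest`, `pop_by_age`, and `age_pop` are assumptions made for the example; `age_list`, `age_data`, `YEAR`, `AGE`, and `ASECWT` follow the description in the text.

```r
# Toy stand-in for the ASEC extract (hypothetical values; the text reads
# the real data from ASECdata.csv).
age_data <- data.frame(
  YEAR   = c(2023, 2024, 2024, 2024, 2024),
  AGE    = c(0, 0, 0, 1, 85),
  ASECWT = c(900, 1000, 1500, 2000, 500)
)

# Reference list: one row per age from 0 to 85.
age_list <- data.frame(age = 0:85)

# Keep the most recent year and the two columns of interest,
# then standardize the column names to lowercase.
latest <- age_data[age_data$YEAR == max(age_data$YEAR), c("AGE", "ASECWT")]
names(latest) <- tolower(names(latest))

# Weighted population by age: sum the person weights within each age.
pop_by_age <- aggregate(asecwt ~ age, data = latest, FUN = sum)
names(pop_by_age)[names(pop_by_age) == "asecwt"] <- "population"

# Full join onto the reference list, then set ages with no respondents to 0.
age_pop <- merge(age_list, pop_by_age, by = "age", all = TRUE)
age_pop$population[is.na(age_pop$population)] <- 0

head(age_pop, 3)
```

Joining onto a fixed reference list guarantees one row per age even when an age has no respondents in the extract, so later merges by age do not silently drop rows.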

To ensure as much accuracy as possible, we look at the age distribution from this using the plot below.

Market Datum

Now that we have the population count for each age, our next step is to pair this data with information on care and care provision for each age. We use a combination of assumptions and data-informed analysis to create these values. We call the resulting table the market datum table. We start by creating a blank table in which each age is paired with the three possible care focuses.
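
A blank table of this shape can be sketched with `expand.grid()`. This is an illustration under assumptions: the placeholder column names `need` and `provision` and the label spellings are hypothetical, while the three care focuses (developmental, health, daily living) and the 0 to 85 age range come from the text.

```r
# Ages covered by the analysis and the three care focuses named in the text.
ages         <- 0:85
care_focuses <- c("developmental", "health", "daily_living")

# One row per (age, care_focus) pair.
market_datum <- expand.grid(
  age        = ages,
  care_focus = care_focuses,
  stringsAsFactors = FALSE
)

# Hypothetical placeholder columns, to be filled in by later steps.
market_datum$need      <- NA_real_
market_datum$provision <- NA_real_

# Sort so that the three focuses for each age sit together.
market_datum <- market_datum[order(market_datum$age), ]
head(market_datum, 3)
```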

Once we have this information, we need to load in the ATUS data that we organized. When studying time use, ATUS data will act as our primary source of analysis. We use a variety of methods to convert this data into our desired format. This R code chunk is doing the following steps:

Loading time use data:
It reads a CSV file (ATUSdata.csv) into a dataframe called atus, keeping only selected columns related to time use, demographic info, and care activities.
Cleaning column names:
Standardizes column names to lowercase with underscores.
Identifying recent years (excluding 2020):
- Extracts a list of unique years from the dataset.
- Removes the year 2020 (due to COVID-19 pandemic survey issues).
- Sorts the years in descending order and keeps the 5 most recent years.
- Saves this list as years_include.
Filtering to recent years and creating new variables:
- Keeps only rows where the year is in the most recent 5 years (excluding 2020).
- Renames the act_care_focus column to care_focus for simplicity.
- Creates new columns:
  - care_job: a binary indicator for whether the activity involved care (1) or not (0), based on the focus column.
  - weight: adjusts the person-level weight to reflect a daily average over 5 years.
  - work_time: calculates time spent in paid care work by multiplying duration with the paid_work and care_job indicators.

In plain language:
This code loads and filters time-use survey data to include only the last 5 valid years, calculates daily weights, flags care-related paid work, and computes how much time individuals spent on that work.
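
The filtering and variable construction can be sketched on toy data. Two points in this base R sketch are assumptions rather than confirmed details. First, `care_job` is derived here from `occ_care_focus`, because the text says only that it is "based on the focus column". Second, the daily weight is computed as `wt06 / (365 * 5)`; this is inferred from the `atus` summary table in this section, where the mean of `weight` equals the mean of `wt06` divided by 1,825. All values and the name `atus_recent` are hypothetical.

```r
# Toy ATUS activity records (hypothetical values).
atus <- data.frame(
  year           = c(2017, 2018, 2019, 2020, 2021, 2022, 2023, 2023),
  wt06           = c(3650000, 1825000, 1825000, 1825000,
                     3650000, 1825000, 1825000, 1825000),
  duration       = c(60, 120, 30, 45, 480, 15, 240, 90),
  paid_work      = c(1, 1, 0, 1, 1, 0, 1, 0),
  act_care_focus = c("non-care", "health", "daily_living", "non-care",
                     "non-care", "non-care", "developmental", "daily_living"),
  occ_care_focus = c("none", "health", "none", "health",
                     "none", "none", "developmental", "developmental"),
  stringsAsFactors = FALSE
)

# Five most recent survey years, skipping 2020.
years         <- unique(atus$year)
years         <- years[years != 2020]
years_include <- head(sort(years, decreasing = TRUE), 5)

# Filter to those years and rename the activity-level focus column.
atus_recent <- atus[atus$year %in% years_include, ]
names(atus_recent)[names(atus_recent) == "act_care_focus"] <- "care_focus"

# ASSUMPTION: care_job flags respondents whose occupation has a care focus.
atus_recent$care_job <- as.integer(atus_recent$occ_care_focus != "none")

# ASSUMPTION: annual weights pooled over 5 years are divided by 365 * 5
# to represent an average day.
atus_recent$weight <- atus_recent$wt06 / (365 * length(years_include))

# Minutes of paid care work: nonzero only for paid work done in a care job.
atus_recent$work_time <- atus_recent$duration *
  atus_recent$paid_work * atus_recent$care_job

print(atus_recent[, c("year", "care_focus", "care_job", "weight", "work_time")])
```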

Data summary
Name	atus
Number of rows	831170
Number of columns	21
_______________________
Column type frequency:
character	4
numeric	17
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
marst	0	1.00	7	31	3
activity	27630	0.97	7	49	109
care_focus	0	1.00	6	13	4
occ_care_focus	0	1.00	4	13	4

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2020.44	1.86	2018.00	2019.00	2021.00	2022.00	2023.0	▇▁▃▃▃
caseid	1	20205036905740.19	18635631251.34	20180101180006.00	20190201191853.00	20210403210257.00	20220807220724.00	20231212232280.0	▇▅▃▆▆
wt06	1	10750091.20	9687672.98	719246.62	4787913.42	7996184.57	13275764.22	194366929.6	▇▁▁▁▁
actline	1	11.35	7.99	1.00	5.00	10.00	16.00	72.0	▇▂▁▁▁
hh_size	1	2.60	1.45	1.00	2.00	2.00	4.00	14.0	▇▃▁▁▁
age	1	51.69	18.19	15.00	37.00	52.00	67.00	85.0	▃▇▆▇▅
nchild	1	0.71	1.08	0.00	0.00	0.00	1.00	9.0	▇▂▁▁▁
paid_work	1	0.07	0.25	0.00	0.00	0.00	0.00	1.0	▇▁▁▁▁
child_care	1	0.06	0.24	0.00	0.00	0.00	0.00	1.0	▇▁▁▁▁
elder_care	1	0.01	0.11	0.00	0.00	0.00	0.00	1.0	▇▁▁▁▁
sleeping	1	0.12	0.32	0.00	0.00	0.00	0.00	1.0	▇▁▁▁▁
duration	1	77.61	103.75	1.00	15.00	30.00	90.00	1310.0	▇▁▁▁▁
scc_all_ln	1	5.48	24.81	0.00	0.00	0.00	0.00	900.0	▇▁▁▁▁
sec_all_ln	1	0.53	8.74	0.00	0.00	0.00	0.00	922.0	▇▁▁▁▁
care_job	1	0.20	0.40	0.00	0.00	0.00	0.00	1.0	▇▁▁▁▂
weight	1	5890.46	5308.31	394.11	2623.51	4381.47	7274.39	106502.4	▇▁▁▁▁
work_time	1	2.37	25.73	0.00	0.00	0.00	0.00	1195.0	▇▁▁▁▁

We then use the ATUS hierarchical data to attach additional variables to the individuals and activities outlined above. Specifically, we link each activity to the RELATEW variable, which identifies with whom an activity was done. This R code chunk is doing the following steps:

Reading IPUMS metadata:
Loads the IPUMS DDI (data description) XML file using read_ipums_ddi(). This file describes the structure and variables of the household microdata.
Reading IPUMS household microdata:
Loads the actual household-level data using read_ipums_micro() based on the structure defined in the DDI file. The resulting dataframe is called atus_hh.
Cleaning column names:
Standardizes the column names in the household data (atus_hh) for consistency and easier use.
Merging household data with individual-level ATUS data:
- Joins selected columns (caseid, actlinew, relatew) from atus_hh into the existing atus dataframe.
- Matches rows based on both caseid and actline (from the individual-level data) aligning with actlinew (from the household data).

In plain language:
This code reads in additional household-level data from IPUMS and merges it with individual time-use data, allowing each activity record to be linked with household relationship information.
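
The join step can be sketched on toy data. This base R sketch skips the IPUMS file reading and starts from two small data frames with hypothetical values; the column names `caseid`, `actline`, `actlinew`, and `relatew` follow the text, and the name `atus_joined` is an assumption. The key detail is that the activity line number has a different name in each table.

```r
# Toy activity records (hypothetical values).
atus <- data.frame(
  caseid   = c(1, 1, 2),
  actline  = c(1, 2, 1),
  duration = c(30, 60, 45)
)

# Toy hierarchical records: who was present for each activity line.
atus_hh <- data.frame(
  caseid   = c(1, 1, 2),
  actlinew = c(1, 2, 1),
  relatew  = c(100, 200, 100)
)

# Left join on caseid plus the activity line, matching actline to actlinew.
atus_joined <- merge(
  atus,
  atus_hh[, c("caseid", "actlinew", "relatew")],
  by.x  = c("caseid", "actline"),
  by.y  = c("caseid", "actlinew"),
  all.x = TRUE
)
print(atus_joined)
```

If an activity line has more than one matching record in the second table, a join like this returns one row per match, so it is worth comparing row counts before and after the merge.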

Next, we create a function that overrides selected values with established figures. The function takes a care interval, a set of age ranges, and a column name, and returns the override values in a form that can be merged into the data. We need this function because we are not confident that the ATUS data alone can yield reliable estimates without significant assumptions. For example, the ATUS does not survey respondents under the age of 15. Our methodology is also likely somewhat biased for individuals at very high ages, such as those in their 80s. In these cases, we override the data with informed assumptions to improve the distribution. This R function is doing the following steps:

Defining a function:
It defines a function called prepare_overrides with three inputs:
- care_interval: a named list where each element corresponds to an age range and contains care-related interval data (e.g., time spent on care by type).
- age_ranges: a dataframe that maps age_range labels to specific ages.
- col_name: the name to give to the new output column created from the interval values.
Reshaping nested data into a dataframe:
- Uses lapply() to loop over each named age_range in care_interval.
- For each age range, it turns the nested data into a dataframe with:
  - age_range: the name of the group (e.g., “0–4”, “65+”)
  - care_focus: the specific type of care activity
  - interval: the numeric value (e.g., minutes per day)
- Combines all these small dataframes into one (care_override) using rbind.
Joining with actual ages:
- Joins the care_override dataframe with age_ranges to expand each age range into its individual ages (e.g., mapping “0–4” to 0, 1, 2, 3, 4).
- Uses relationship = "many-to-many" to allow for multiple age mappings per group.
Final formatting:
- Keeps only the age, care_focus, and interval columns.
- Renames the interval column to whatever is passed in via col_name.
Returning the result:
Returns a tidy dataframe that assigns the given col_name value to each age and care focus type.

In plain language:
This function transforms age-group-based care interval data into a long-form table that maps each individual age to a specific type of care activity and its associated value, labeling that value with a customizable column name.
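
A function with this behavior can be sketched in base R. This is an approximation under assumptions, not the project's implementation: it uses `merge()` in place of the join with `relationship = "many-to-many"` mentioned above, and the example age ranges and minute values are hypothetical.

```r
prepare_overrides <- function(care_interval, age_ranges, col_name) {
  # Build one small data frame per age range, then stack them with rbind.
  pieces <- lapply(names(care_interval), function(rng) {
    data.frame(
      age_range  = rng,
      care_focus = names(care_interval[[rng]]),
      interval   = unlist(care_interval[[rng]], use.names = FALSE),
      stringsAsFactors = FALSE
    )
  })
  care_override <- do.call(rbind, pieces)

  # Expand each age range into its individual ages.
  out <- merge(care_override, age_ranges, by = "age_range")

  # Keep age, care_focus, and the value, renamed to col_name.
  out <- out[order(out$age, out$care_focus), c("age", "care_focus", "interval")]
  names(out)[names(out) == "interval"] <- col_name
  rownames(out) <- NULL
  out
}

# Usage with hypothetical ranges and minutes per day.
age_ranges <- data.frame(
  age_range = c("age_0to1", "age_0to1", "age_2to3", "age_2to3"),
  age       = 0:3,
  stringsAsFactors = FALSE
)
need_interval <- list(
  age_0to1 = list(developmental = 100, health = 20),
  age_2to3 = list(developmental = 80,  health = 10)
)
overrides <- prepare_overrides(need_interval, age_ranges, "need_override")
print(overrides)
```

Passing the output column name as an argument lets the same function serve both the care need overrides and the care provision overrides.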

Care

Now that the preparatory work is done, we use the code below to assign care to different age groups. Our methodology for computing this is not simple. At its core, the ATUS provides information on care provision, not on the amount of care an individual needs. We need to be able to use this data to understand how much care a person needs. To do this, we look at cases where the amount of care supplied equals the amount of care demanded. We filter the data for each age group to look only at individuals who live alone and who, over their entire day, spend 0 total minutes providing care to children or elderly adults. We further limit the data to only those activities done alone.

This subset of individuals is most likely to be in a situation where care supply and care demand are equal within the home, in the sense that all care for the individual is done by the individual. These individuals provide care only to themselves and receive care only from themselves. As such, by measuring the amount of care provided by this subset, we are also measuring the amount of care they might demand. We can therefore use this methodology to estimate care need for each age group. This R code chunk is doing the following steps:

Setting up age loop:
It creates a list of ages from the age_modified dataframe and initializes an empty list (needs_atus_calc) to store results.
Looping through each age:
For each age a, it filters the atus dataset to records that meet all of the following conditions:
- No time spent in secondary care activities over the day (scc_all_ln and sec_all_ln are all 0)
- No time spent in child or elder care over the day (child_care and elder_care are all 0)
- Activity is labeled as care-related (care_focus != "non-care")
- Lives alone (hh_size == 1)
- Activity was done alone (relatew == 100)
- Falls within a 5-year age band around age a (ages a-2 to a+2)
Summarizing individual-level data:
- For each caseid and care_focus group:
  - Sums the total duration of care-related activities for each individual.
  - Keeps the first available weight value.
Calculating need interval estimates:
- Within each care_focus group, calculates the weighted mean duration across individuals to estimate how much care time is associated with that activity for that age group.
- Attaches the current age to the summary row.
Storing results:
- If the filtered and summarized data is not empty, it adds the result to the needs_atus_calc list for that age.
Combining all results:
- After looping through all ages, it binds the list of dataframes into a single dataframe (needs_atus_calc).

In plain language:
This code estimates how much time people living alone at each age (in 5-year bands) typically spend on care-related activities for themselves, restricted to people who provide no care to others. It uses this as a proxy for care needs across different age groups.
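
The loop can be sketched on toy data. This base R sketch makes several assumptions: all values are hypothetical, the output column name `need_atus` is invented for the example, the loop runs over two ages rather than the full list, and grouping by `weight` stands in for "keeps the first available weight value" because the weight is constant within a person.

```r
# Toy activity records (hypothetical values); one row per activity.
atus <- data.frame(
  caseid     = c(1, 1, 1, 1, 2, 2, 2, 3, 4),
  age        = c(30, 30, 30, 30, 31, 31, 31, 30, 30),
  hh_size    = c(1, 1, 1, 1, 1, 1, 1, 1, 3),
  weight     = c(1000, 1000, 1000, 1000, 3000, 3000, 3000, 2000, 1500),
  care_focus = c("daily_living", "daily_living", "health", "non-care",
                 "daily_living", "health", "daily_living",
                 "daily_living", "daily_living"),
  duration   = c(60, 30, 20, 200, 120, 40, 50, 500, 300),
  relatew    = c(100, 100, 100, 100, 100, 100, 200, 100, 100),
  child_care = c(0, 0, 0, 0, 0, 0, 0, 1, 0),
  elder_care = 0, scc_all_ln = 0, sec_all_ln = 0,
  stringsAsFactors = FALSE
)

# Person-level flag: TRUE when every activity in the day has zero care minutes.
care_any    <- atus$scc_all_ln + atus$sec_all_ln + atus$child_care + atus$elder_care
no_care_day <- tapply(care_any, atus$caseid, function(x) all(x == 0))
keep_person <- atus$caseid %in% as.numeric(names(no_care_day)[no_care_day])

ages <- c(30, 40)
needs_atus_calc <- list()
for (a in ages) {
  d <- atus[keep_person & atus$care_focus != "non-care" & atus$hh_size == 1 &
              atus$relatew == 100 & atus$age >= a - 2 & atus$age <= a + 2, ]
  if (nrow(d) == 0) next  # nothing to summarize for this age band

  # Total minutes per person and care focus.
  per_person <- aggregate(duration ~ caseid + care_focus + weight,
                          data = d, FUN = sum)

  # Weighted mean across people within each care focus.
  by_focus <- lapply(split(per_person, per_person$care_focus), function(g) {
    data.frame(age = a, care_focus = g$care_focus[1],
               need_atus = weighted.mean(g$duration, g$weight))
  })
  needs_atus_calc[[as.character(a)]] <- do.call(rbind, by_focus)
}
needs_atus_calc <- do.call(rbind, needs_atus_calc)
print(needs_atus_calc)
```

In the toy data, person 3 is dropped for providing child care, person 4 for not living alone, and one of person 2's activities for not being done alone, which mirrors the three restrictions described in the text.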

Finally, we use the override function defined above to input our assumptions. The code below inserts minutes for health, developmental, and daily living care for the age groups 0-5, 6-12, 13-17, 75-84, and over 85. These assumed values are based on informed judgment rather than observed data. For instance, many state laws require that children under the age of 8 be supervised 24 hours a day. This R code chunk is doing the following steps:

Defining care time needs for specific age groups:
- It sets values (in minutes per day) for health-related and developmental care needs for specific age ranges.
- It then calculates the remaining time in the day allocated to daily living by subtracting health and developmental care from the total available time (usually 1440 minutes, or 24 hours—though some groups like teenagers or elderly have reduced totals, likely representing actual active time).
Creating age-to-age-range mapping:
- Builds a dataframe called age_ranges that maps specific ages to age range categories (e.g., age 0–5 is “age_0to5”, age 6–12 is “age_6to12”, etc.).
Organizing care need intervals:
- Constructs a list called need_interval where each entry corresponds to an age range and specifies the time (in minutes) needed per day for:
  - Developmental care
  - Health-related care
  - Daily living support
Converting structured input into a long-form dataframe:
- Calls the previously defined function prepare_overrides() to transform the structured need_interval list into a tidy dataframe (needs_ku_override) where:
  - Each row represents one age and one type of care focus (e.g., “developmental” for age 3)
  - Includes the time required (called need_override) for that type of care

In plain language:
This code defines how much care time people in specific age groups typically need each day for health, development, and daily living. It converts that information into a usable table that links each age to specific care needs—ready for analysis or visualization.
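
The arithmetic behind the override inputs can be sketched as follows. Every number in this base R sketch is hypothetical and chosen only to show the structure; the project's actual assumption values are not reproduced here. The labels `age_0to5` and `age_6to12` follow the text, while the remaining labels, and the choice to map the oldest group to age 85, are assumptions.

```r
total_minutes <- 1440  # minutes in a day

# HYPOTHETICAL minutes per day for one age group.
health_0to5        <- 30
developmental_0to5 <- 600

# Daily living takes whatever time remains in the day.
daily_living_0to5 <- total_minutes - health_0to5 - developmental_0to5

# Map each individual age to its age range label.
age_ranges <- data.frame(
  age_range = c(rep("age_0to5", 6), rep("age_6to12", 7), rep("age_13to17", 5),
                rep("age_75to84", 10), "age_85plus"),
  age       = c(0:5, 6:12, 13:17, 75:84, 85),
  stringsAsFactors = FALSE
)

# Nested list in the shape described in the text: one entry per age range,
# each holding minutes per day by care focus (only one range shown here).
need_interval <- list(
  age_0to5 = list(
    developmental = developmental_0to5,
    health        = health_0to5,
    daily_living  = daily_living_0to5
  )
)

str(need_interval)
```

Computing daily living as a remainder guarantees that the three focuses sum to the chosen daily total for each age range.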

Care Provision

Now that we have our data on the average amount of care required within age groups, we replicate the above methodology to measure the average amount of time a group provides care in a day. The main difference is that we no longer limit ourselves to individuals living alone. Instead, we look at all individuals in an age group and measure the time they spend in all care-related activities. These activities can include self-care, primary care to others, care in a job, and secondary care to a child or elder. This R code chunk is doing the following steps:

Setting up for age-based analysis:
- Uses a list of individual ages (age) from age_modified.
- Initializes an empty list (provision_atus_calc) to store results.
Looping through each age:
For each age a, it creates a combined dataset (data) representing all types of care provision, pulling from the atus dataset.
Combining different types of care time:
- Formal care: Includes paid care-related work activities where focus is not "none".
- Informal care: Includes any activity already marked as care-related.
- Secondary child care:
  - Uses scc_all_ln (secondary child care).
  - Filters out people already counted as doing formal or informal care to avoid double-counting.
  - Labels the care as "developmental" and uses scc_all_ln as the care duration.
- Secondary elder care:
  - Same logic as child care but using sec_all_ln (secondary elder care).
  - Labels it as "health" care.
Filtering to 5-year age bands:
- Keeps only observations where the person is within 2 years of the target age a (i.e., ages a-2 to a+2).
Summarizing care time per individual:
- Groups by individual (caseid) and care type (care_focus), summing care durations and keeping the weight.
Calculating care provision estimates:
- Computes the weighted median (wtd.quantile()) of care time per care_focus type, across all individuals in that age band.
- Adds the current age to the result.
Storing results:
- If the resulting summary for that age is not empty, it adds it to the provision_atus_calc list.
Combining all results:
After looping through all ages, it merges the results into a single dataframe.

In plain language:
This code estimates how much care people of each age typically provide—including formal work, informal care, and secondary care (like watching kids or elders while doing something else). It uses weighted medians to summarize how care provision varies by age and type.
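
The core of this calculation can be sketched on toy data. This base R sketch is an illustration under assumptions: `weighted_median()` is a simple stand-in for the `wtd.quantile()` call named above and may differ from it in how ties and interpolation are handled, formal care and secondary elder care are left out for brevity, the output name `provision_atus` is invented, and all values are hypothetical.

```r
# ASSUMPTION: simple weighted median, defined as the smallest value whose
# cumulative weight share reaches 50 percent.
weighted_median <- function(x, w) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  x[which(cumsum(w) / sum(w) >= 0.5)[1]]
}

# Toy activity records (hypothetical values).
atus <- data.frame(
  caseid     = c(1, 1, 2, 2, 3, 3),
  age        = c(40, 40, 41, 41, 39, 39),
  weight     = c(1000, 1000, 2000, 2000, 1000, 1000),
  care_focus = c("daily_living", "non-care", "daily_living", "non-care",
                 "daily_living", "non-care"),
  duration   = c(60, 100, 90, 200, 30, 50),
  scc_all_ln = c(0, 45, 0, 0, 0, 20),
  stringsAsFactors = FALSE
)

# Informal care: activities already labelled with a care focus.
informal <- atus[atus$care_focus != "non-care",
                 c("caseid", "age", "weight", "care_focus", "duration")]

# Secondary child care: non-care activities with secondary child care minutes,
# relabelled as developmental care and measured by scc_all_ln.
sec <- atus[atus$care_focus == "non-care" & atus$scc_all_ln > 0, ]
secondary <- data.frame(
  caseid     = sec$caseid,
  age        = sec$age,
  weight     = sec$weight,
  care_focus = rep("developmental", nrow(sec)),
  duration   = sec$scc_all_ln,
  stringsAsFactors = FALSE
)

data <- rbind(informal, secondary)

# One target age: keep the 5-year band, total minutes per person and focus,
# then take the weighted median across people within each focus.
a    <- 40
band <- data[data$age >= a - 2 & data$age <= a + 2, ]
per_person <- aggregate(duration ~ caseid + care_focus + weight,
                        data = band, FUN = sum)
provision <- do.call(rbind, lapply(
  split(per_person, per_person$care_focus),
  function(g) {
    data.frame(age = a, care_focus = g$care_focus[1],
               provision_atus = weighted_median(g$duration, g$weight))
  }
))
print(provision)
```

Drawing secondary care only from activities without a care focus is what keeps the same minutes from being counted twice.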

Just as with assumptions related to care needs for children and the elderly, we make assumptions for care provision. The code chunk below outlines the assumptions and utilizes the override function to create the values.

Finalize Care Economy Data

We now have our final data on care need and care provision by group. A few more steps are needed to prepare these data for presentation. We start by combining the various data frames into a single market datum file.

We then provide this same information as a plot to help visualise it.

One of the weaknesses of our methodology is that it is prone to outliers. We can see this in the hump around age 54 above. The method is also highly reliant on our assumptions at the low and high ends of the age range. To help account for this, we apply a smoothing function to create a smoother density curve. We generally keep the smoothing function relatively weak to preserve as much of the underlying pattern as possible.

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `smoothed_need = predict(...)`.
## ℹ In group 1: `care_focus = "daily_living"`.
## Caused by warning:
## ! `cur_data()` was deprecated in dplyr 1.1.0.
## ℹ Please use `pick()` instead.
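A weak smoother of the kind described above might look like the following sketch. The text does not name the smoothing function, so the use of `loess()` with a small `span` is an assumption, and the `curve` data (a V-shaped need profile with an artificial spike at age 54) is invented for illustration.

```r
# Hypothetical care-need curve by age with an artificial outlier at age 54.
curve <- data.frame(age = 0:80)
curve$need <- 100 + 2 * abs(curve$age - 40)
curve$need[curve$age == 54] <- curve$need[curve$age == 54] + 150

# Assumption: loess with a small span acts as a "weak" smoother, so only
# a narrow window of neighbouring ages informs each fitted value.
fit <- loess(need ~ age, data = curve, span = 0.2)
curve$smoothed_need <- predict(fit)

# Compare raw and smoothed values around the outlier.
print(curve[curve$age %in% 52:56, ])
```

A smaller `span` keeps the fitted curve closer to the raw data, while a larger one flattens more of the underlying pattern along with the outliers.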

Our final step is to merge this market datum information with the data on age. We merge the two datasets and export the result for final analysis.

Providers of Care

The next section of statistics we calculate is our “care providers” section. This section aims to identify who within society provides care. The main outcome of this section is to identify care providers across both the informal and formal sectors of care and across the different care focuses (developmental, daily living, and health).

Understanding who provides care within society is a cornerstone for analyzing and strengthening the care economy. Care takes place across a continuum of formal and informal settings, including professional healthcare and social service environments, as well as unpaid support provided by family, friends, and community members. Quantifying the individuals engaged in caregiving—particularly by gender and parenthood status—offers critical insight into how care responsibilities are distributed and how this distribution reflects broader social, economic, and policy structures. Without this demographic detail, policymakers risk overlooking the inequities embedded in caregiving systems and the hidden labor that sustains households and communities.

Gender and parenthood status are especially salient categories for measuring who provides care, as these factors strongly influence both the likelihood of providing care and the intensity of caregiving responsibilities both in the home and within formal sector care jobs. Women, for example, disproportionately shoulder unpaid caregiving work, whether as mothers, daughters, or other kin, contributing to persistent gender disparities in income, career advancement, and retirement security. Similarly, parents—especially single parents—may juggle dual care responsibilities for children and aging relatives, compounding their time and emotional burdens. By tracking the number of caregivers across these dimensions, researchers and policymakers can better understand the structural pressures facing different populations and design targeted interventions that support caregivers, such as paid leave policies, subsidized care services, or caregiver tax credits. Accurate demographic data on caregiving is essential not only for recognizing and valuing care work but also for fostering a more equitable and resilient care economy.

Formal Providers

The first section of code below looks at the formal care economy. The code chunk below loads in the CPS ASEC data which is our primary data source for understanding the formal economy. We load in this data and select the correct columns as needed below.

Following this, we calculate the amount of time spent and the number of people working in each specific care focus by each of our gender and parenthood combinations. This R code chunk is doing the following steps:

Creating a dataset of formal care providers:
- Filters the asec dataset to include only employed individuals.
- Creates new variables for categorization:
  - gender: lowercase version of the sex column.
  - provider_status: groups people into those with children, without children, or other, based on the gender_parent label.
  - time_use: labels work as “care” if it has a focus, and “non_care” otherwise.
  - care_focus: directly copies the focus variable.
  - care_type: assigns "formal" to indicate this is paid labor.
  - provider_attention: sets "active" to describe the type of care (vs. passive or secondary).
Summarizing formal care stats:
- Removes entries where usual hours worked (uhrsworkt) are coded as 997 (hours vary).
- Replaces the placeholder value 999 in uhrsworkt (not in universe, for example people not in the labor force) with 0 so these records do not inflate the results.
- Summarizes the data by multiple grouping variables:
  - population: total weighted number of people in each group (using asecwt).
  - provision_interval: total daily minutes of formal care work, calculated as uhrsworkt / 7 * 60 * asecwt (i.e., average minutes worked per day, weighted).

In plain language:
This code prepares and summarizes data on people who provide formal (paid) care work. It calculates the total number of providers and how much care time they give per day, grouped by gender, parent status, and type of care provided.
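The summary step can be illustrated with a small base R sketch. The `asec_toy` rows and their values are invented, and `aggregate()` stands in for the grouped summary described above. Only the treatment of codes 997 and 999 and the `uhrsworkt / 7 * 60 * asecwt` formula come from the text.

```r
# Hypothetical ASEC-style records; values are invented for illustration.
asec_toy <- data.frame(
  gender     = c("female", "female", "male", "male"),
  care_focus = c("health", "health", "developmental", "none"),
  uhrsworkt  = c(40, 997, 35, 999),
  asecwt     = c(1000, 1500, 2000, 500)
)

# Drop records coded 997, then treat the 999 placeholder as zero hours.
asec_toy <- asec_toy[asec_toy$uhrsworkt != 997, ]
asec_toy$uhrsworkt[asec_toy$uhrsworkt == 999] <- 0

# Weekly hours -> daily minutes, scaled by the survey weight.
asec_toy$provision_interval <- asec_toy$uhrsworkt / 7 * 60 * asec_toy$asecwt

# Weighted head count and total daily minutes per group.
formal_stats <- aggregate(
  cbind(population = asecwt, provision_interval) ~ gender + care_focus,
  data = asec_toy,
  FUN  = sum
)
print(formal_stats)
```

Because the 997 record is dropped before aggregation, it contributes to neither the population nor the time total in this sketch.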

Informal Providers

The next step is to replicate the above table but to look at informal activities as opposed to only formal activities. To do this we use the ATUS data for informal care activities.

We now use the code below to calculate the number of people and the amount of time spent by these providers for each informal care category. Secondary child care is included as developmental care, and secondary elder care is included as health care. This R code chunk is doing the following steps:

Get the most recent 5-year time window:

yr_range <- atus_yr_range(atus) |> 
  filter(year == max(year))

Uses a helper function atus_yr_range() to get a range of valid years.
Keeps only the entry for the most recent year, likely as a reference for filtering.

Prepare informal care data:

cp_informal <- atus |> 
  pivot_longer(...) |> 
  filter(!is.na(duration))

Reshapes the atus dataset so that columns duration, scc_all_ln (secondary child care), and sec_all_ln (secondary elder care) are stacked into a long format.
This allows consistent handling of different types of care time.
Removes rows with missing duration values.

Create standardized variables for app use:

cp_informal <- cp_informal |> 
  mutate(...)

Creates several new columns to match what the app expects:
- gender: lowercase version of respondent’s sex.
- provider_status: identifies if a person has children or not, based on gender_parent.
- care_type: all entries here are labeled as "informal" (unpaid or household-based care).
- provider_attention: labels care as "active" or passive ("passive_child" or "passive_elder") depending on the source of the care time.
- care_focus: sets the type of care (e.g., "developmental", "health", or "none").
- time_use: categorizes each activity as "care" or "non_care" based on care_focus.
- weight: adjusts the person-level weight (wt06) to a daily scale.

Summarize time by person and activity:

cp_informal <- cp_informal |> 
  summarise(...) |> 
  summarise(...)

First summarization:
- Groups by each person (caseid) and activity category to calculate their total time in each care type.
- Multiplies time by daily weights.
Second summarization:
- Aggregates across individuals to get:
  - provision_interval: total care time in minutes, scaled to a 5-year average.
  - population: the weighted count of individuals in each group.
- Grouped by gender, parental status, care type, care focus, and whether the care was active or passive.

In plain language:
This code organizes and summarizes all types of informal care (including active and passive care) across recent years. It groups people by gender and parenting status and calculates how much time, on average, they spend providing different types of care.

Care Provider Demographics

The next step is to create a demographics table for our care providers. Specifically, for The Care Board we are interested in the association between gender and parenthood status. We use the code below to create a table showing the distribution of these demographic groups across society. This R code chunk is doing the following steps:

Preparing demographic categories:
- Takes the asec dataset and creates two new classification variables:
  - gender: converts the sex variable to lowercase for consistency.
  - provider_status: categorizes individuals as:
    - "with_children" if they’re labeled as mothers or fathers,
    - "without_children" if labeled as non-mothers or non-fathers,
    - "other" for anyone else (e.g., ambiguous or missing category).
Calculating population totals:
- Groups the data by gender and provider_status.
- Sums the survey weights (asecwt) to estimate the total population size of each group.
Sorting results:
- Arranges the resulting summary by gender and provider_status for easier reading or display.

In plain language:
This code estimates the size of different demographic groups (by gender and parental status) using survey weights, providing a breakdown of how many people fall into each category.
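A compact base R sketch of this categorization follows. The `gender_parent` label strings used here ("mother", "non-mother", and so on) are assumptions, since the text describes the categories but not their exact spelling, and the weights are invented.

```r
# Hypothetical records; the gender_parent labels are assumed spellings.
asec_toy <- data.frame(
  sex           = c("Female", "Female", "Male", "Male", "Female"),
  gender_parent = c("mother", "non-mother", "father", "non-father", NA),
  asecwt        = c(1200, 800, 1000, 900, 100)
)

asec_toy$gender <- tolower(asec_toy$sex)

# Parents -> with_children, non-parents -> without_children, rest -> other.
asec_toy$provider_status <- ifelse(
  asec_toy$gender_parent %in% c("mother", "father"),
  "with_children",
  ifelse(
    asec_toy$gender_parent %in% c("non-mother", "non-father"),
    "without_children",
    "other"
  )
)

# Sum the survey weights within each gender x provider_status cell.
demo <- aggregate(asecwt ~ gender + provider_status, data = asec_toy, FUN = sum)
names(demo)[names(demo) == "asecwt"] <- "population"
demo <- demo[order(demo$gender, demo$provider_status), ]
print(demo)
```

Using `%in%` means a missing label falls through to "other" rather than producing a missing category.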

Activity Time Data

Collecting data on specific care-related activities across both the formal and informal sectors is vital for a comprehensive understanding of the care economy. By disaggregating care work into its constituent activities—such as nursing, home health assistance, cleaning, teaching, feeding, or emotional support—we gain a clearer picture of the full range and diversity of labor that underpins human well-being and social reproduction. Calculating metrics like the average time spent on each care activity and the corresponding wages earned allows us to evaluate not only the volume of care being provided, but also the economic value placed on different forms of care. This granular approach is particularly important in revealing the undervaluation of essential work, much of which is disproportionately performed by women, people of color, and immigrants.

Detailed activity-level data is also critical for crafting targeted and effective policy interventions. It allows researchers and decision-makers to identify gaps in compensation, exposure to physical and emotional strain, or mismatches between time demands and available support systems. For example, data showing long hours in unpaid eldercare alongside low wages for professional caregivers may point to the need for investment in long-term care infrastructure or wage floors in care occupations. Similarly, tracking time spent on caregiving tasks in educational or domestic contexts can highlight the blurred lines between formal employment and informal labor. Without this specificity, the care economy remains abstract, obscuring the real conditions of care workers and the needs of care recipients. Collecting and analyzing this level of detail helps make visible the full scope of care labor and strengthens the foundation for equitable and sustainable care systems.

Formal Care Activities

The first step is to look at the formal care economy and gather the data needed for it. Like most analyses of the formal care economy, this section uses CPS ASEC data. The first step is thus to load the CPS ASEC data and put it into the proper format. This R code chunk is doing the following steps:

Loading cleaned ASEC data:
- Reads in a .csv file (ASECdata.csv) into the asec dataframe.
- Filters the data to include only:
  - Adults (age 18 or older),
  - Data from the most recent year available,
  - People working in care-related occupations (occ_care_focus != "none"),
  - People who are employed (empstat == "Employed").
- Keeps only selected columns related to occupation, income, hours worked, and weights.
Cleaning and transforming variables:
- Replaces the placeholder value 999 in uhrsworkt (usual hours worked) with 0.
- Creates a new activity_id:
  - Based on the occ_label (occupation label), it replaces non-alphanumeric characters with hyphens and converts the result to lowercase.
  - Then it removes any trailing hyphen from the resulting string.
- Renames:
  - label → name (likely for cleaner display or consistency),
  - focus → care_focus.

In plain language:
This code loads and filters survey data to keep only employed adults working in care occupations, cleans up the occupation labels for use as unique identifiers, and standardizes some column names for further analysis or visualization.
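The identifier clean-up can be sketched with base R string functions. The `make_activity_id()` helper and the example labels are hypothetical. Collapsing a run of several non-alphanumeric characters into a single hyphen is also an assumption, since the text only says such characters are replaced with hyphens.

```r
# Hypothetical helper: turn an occupation label into a URL-style identifier.
make_activity_id <- function(label) {
  # Assumption: a run of non-alphanumeric characters becomes one hyphen.
  id <- gsub("[^[:alnum:]]+", "-", label)
  id <- tolower(id)
  # Drop a trailing hyphen left behind by closing punctuation.
  sub("-$", "", id)
}

ids <- make_activity_id(c("Registered Nurses", "Childcare workers (home)"))
print(ids)
```

Keeping the identifier separate from the display name means the label can change for presentation without breaking joins on `activity_id`.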

After prepping the data, we calculate three main statistics for each formal care economy activity. First, we calculate the number of people who engage in this activity on a given day. Second, we calculate the total amount of time spent on this activity throughout society. Third, we calculate the median wage for people employed in this activity. This R code chunk is doing the following steps:

Calculate population by care occupation:

act_formal_population <- asec |> 
  summarise(
    population = sum(asecwt),
    .by = c(activity_id, name, care_focus)
  )

Sums the survey weights (asecwt) to estimate the total number of employed individuals working in each care-related occupation.
Groups by:
- activity_id (a cleaned-up identifier for the occupation),
- name (the occupation label),
- care_focus (type of care: developmental, health, etc.).

Calculate time spent in formal care work:

act_formal_time <- asec |> 
  filter(uhrsworkt != 997) |> 
  summarise(...)

Removes entries where usual work hours are coded as varying (uhrsworkt == 997), since no weekly hours can be assigned to them.
Calculates the total number of daily minutes worked per occupation group using:
- asecwt * uhrsworkt * 60 / 7 (to convert weekly hours to daily minutes).
Groups by the same occupation-related columns.

Calculate median wage by occupation:

act_formal_med_wage <- asec |>
  filter(incwage != 0 & incwage != 99999999) |> 
  summarise(...)

Filters out invalid or placeholder income values.
Computes the weighted median wage (wtd.quantile) for each occupation group.
Again groups by activity_id, name, and care_focus.

Combine all formal care occupation stats:

act_formal_stats <- full_join(...) |> 
  full_join(...) |> 
  arrange(activity_id)

Merges the three separate summaries (population, time, and median_wage) into a single table (act_formal_stats) using full joins.
Ensures that all occupation groups are included, even if one of the summaries is missing.
Sorts the final result by activity_id.

In plain language:
This code summarizes key statistics for each formal care occupation, including how many people work in it, how much time they spend on care work per day, and what the median wage is. The result is a comprehensive dataset ready for reporting or visualization.
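The final combination step can be sketched with base R `merge()`, where `all = TRUE` plays the role of a full join. The three small tables and their values are invented for illustration.

```r
keys <- c("activity_id", "name", "care_focus")

# Hypothetical per-occupation summaries; the numbers are invented.
population <- data.frame(
  activity_id = c("nurses", "teachers"),
  name        = c("Nurses", "Teachers"),
  care_focus  = c("health", "developmental"),
  population  = c(3000, 5000)
)
time <- data.frame(
  activity_id = "nurses", name = "Nurses", care_focus = "health",
  provision_interval = 1e6
)
wage <- data.frame(
  activity_id = c("nurses", "teachers"),
  name        = c("Nurses", "Teachers"),
  care_focus  = c("health", "developmental"),
  median_wage = c(70000, 55000)
)

# all = TRUE keeps occupations that are missing from one of the summaries.
stats <- Reduce(
  function(a, b) merge(a, b, by = keys, all = TRUE),
  list(population, time, wage)
)
stats <- stats[order(stats$activity_id), ]
print(stats)
```

Here the teachers row survives with a missing `provision_interval`, which is the behavior a full join is chosen for.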

We provide a few descriptive statistics below to show the total numbers of the final results.

## [1] "Number of formal care workers: 47,601,123"

## [1] "Total daily hours per day: 239,572,357"

## [1] "Average care hours per day per worker: 5"

We also provide below a plot showing specific statistics related to specific occupations.

Informal Care Activities

After compiling the data on the formal care economy, we calculate the corresponding data for the informal care economy. As elsewhere when analyzing the informal care economy, we use the ATUS data. The first step is thus to load and correctly format the data. This R code chunk is doing the following steps:

Loading and filtering ATUS data:

atus <- read.csv(...) |> 
  filter(activity != "Formal Work") |>   
  filter(YEAR >= 2018 & YEAR != 2020) |> 
  filter(AGE >= 18) |> 
  select(...) |> 
  clean_names()

Reads time-use survey data (ATUSdata.csv) into a dataframe called atus.
Filters out:
- Any records labeled as "Formal Work" (keeping only non-paid activities).
- The year 2020 (often excluded due to COVID-related disruptions).
- Years before 2018.
- Individuals under age 18.
Selects only relevant columns related to time use and care activity.
Cleans column names to a consistent lowercase format with underscores.

Determine the most recent 5-year window:

yr_range <- atus_yr_range(atus) |> 
  filter(year == max(year))

Uses a helper function atus_yr_range() to get a 5-year rolling window.
Selects the most recent year in that window, storing both the start and end years.

Filter and transform the dataset:

atus <- atus |> 
  filter(year >= yr_range$yr_start & year <= yr_range$year) |> 
  rename(...) |> 
  mutate(...)

Keeps only data within the most recent 5-year range.
Renames:
- act_care_focus → care_focus
- activity_2 (presumably a derived activity name column) → activity_name
Creates new variables:
- activity_name: Sets this to "non-care" if care_focus is "non-care", otherwise keeps the original activity name.
- activity_id: Cleans up activity_name to be lowercase and hyphen-separated (removing special characters), and removes trailing hyphens. This creates a consistent identifier for each activity.
- weight: Converts annual person weights (wt06) to a daily average over a 5-year period.

In plain language:
This code prepares recent time-use data for analysis by cleaning, filtering, and standardizing activity names. It focuses on unpaid or informal care activities and creates identifiers and daily weights for later use in summaries or visualizations.

We then calculate selected statistics for the informal care economy activities. To start with we calculate the number of people who engage in each activity. Then we calculate the time spent across the population in each category. This R code chunk is doing the following steps:

1. Create case_stats: time-use totals per person and care activity

case_stats <- bind_rows(...)

It builds a combined dataset of individual-level care time from three sources:

Primary care activities:

Sums duration for each individual (caseid) by:
- activity_id
- activity_name
- care_focus
- Includes their weight for later aggregation

Secondary child care:

Creates synthetic activity labels:
- Sets activity_id to "secondary-childcare"
- Sets activity_name to "Secondary Childcare"
- Labels the care_focus as "developmental"
Sums scc_all_ln (secondary child care minutes) by individual
Keeps only those with positive time spent

Secondary elder care:

Similar to child care, but:
- Uses sec_all_ln
- Sets activity_id to "secondary-eldercare"
- Sets care_focus to "health"
Again, keeps only those with positive time spent

These three blocks are stacked together using bind_rows() to create a single long-form dataset of care activity time by individual.

2. Create activity_stats: aggregate care provision stats

activity_stats <- case_stats |> summarise(...) |> filter(...) |> arrange(...)

Aggregates case_stats to the activity level, calculating:
- provision_interval: total weighted time spent on that activity across all people
- population: total weighted number of people represented
Groups by:
- activity_id, activity_name, and care_focus
Filters out "non-care" activities to focus on actual caregiving
Sorts the results by activity_id

In plain language:
This code calculates how much time people spend on different types of unpaid or informal caregiving activities—including both direct and secondary care—and summarizes it by activity type. It produces a table that shows total care minutes and population size for each category.

Just as with the formal care economy, we want to add a column to this section for median wages. However, people working in the informal care economy do not earn a wage. To address this, we create a shadow wage or “expected income” for each activity. A shadow wage represents the roughly equivalent wage that this activity earns when done in the formal care economy as a full-time, year-round job. The shadow wage or “expected income” is useful for comparing tasks between the formal and informal care economies.

In order to create this shadow wage or “expected income,” we need to pair activities in the formal and informal economy. We use a crosswalk that pairs activity codes with occupation codes. This crosswalk can be found below.

We then load in a new version of the CPS ASEC data, which we use to find the wages associated with the crosswalked activities.

Finally, we use the crosswalk along with the CPS ASEC data to assign a wage for each of the informal care activities. This R code chunk is doing the following steps:

1. Initialize storage and loop over activity types

df <- list()
activity <- act_cross$activity

Prepares an empty list df to hold results.
Extracts the list of unique activity names from act_cross, a lookup table that likely maps each care activity to an occupational code range.

2. Loop through each activity and calculate median wage

for(sel_activity in activity) {
  ...
}

For each care activity (sel_activity), it:

Gets the occupational code range:

codes <- act_cross |> filter(activity == sel_activity)

Looks up the start and end OCC2010 codes associated with the current activity.

Filters the asec data to those occupations:

filter(occ2010 >= codes$occ_code_start & occ2010 <= codes$occ_code_end)

Selects only individuals working in occupations within the defined range for that activity.

Calculates the median wage:

summarise(
  median_wage = wtd.quantile(incwage, weights = asecwt, probs = 0.5)
)

Uses weighted quantiles to compute the median income (incwage) for those workers, using survey weights (asecwt).

Labels the result:

mutate(activity_name = sel_activity) |> 
relocate(activity_name)

Adds the name of the activity to the result and moves it to the front of the dataframe.

Stores the result in the list:

df[[sel_activity]] <- ...

3. Combine all results into one dataframe

median_wage <- bind_rows(df)

Merges all the individual activity median wage summaries into one dataframe.

In plain language:
This code calculates the median wage for each care activity type by mapping activities to occupation codes, filtering for those jobs in the dataset, and summarizing income data. The result is a table showing what workers in each type of care activity typically earn.
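The crosswalk loop can be sketched in base R as follows. Everything in the toy inputs is hypothetical: the activity names, the occupation code ranges, and the wages are invented and are not the project's crosswalk. The `weighted_median()` helper is a simplified stand-in for `wtd.quantile()`.

```r
# Simplified weighted median (stand-in for wtd.quantile with probs = 0.5).
weighted_median <- function(x, w) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= 0.5)[1]]
}

# Hypothetical crosswalk: each informal activity maps to a code range.
act_cross <- data.frame(
  activity       = c("Physical care for children", "Food preparation"),
  occ_code_start = c(4600, 4000),
  occ_code_end   = c(4600, 4030)
)

# Hypothetical workers with occupation code, wage income, and weight.
asec_toy <- data.frame(
  occ2010 = c(4600, 4600, 4600, 4010, 4020, 4030, 9999),
  incwage = c(20000, 25000, 30000, 28000, 32000, 36000, 90000),
  asecwt  = c(1, 1, 1, 1, 1, 1, 1)
)

df <- list()
for (sel_activity in act_cross$activity) {
  codes <- act_cross[act_cross$activity == sel_activity, ]
  in_range <- asec_toy$occ2010 >= codes$occ_code_start &
    asec_toy$occ2010 <= codes$occ_code_end
  workers <- asec_toy[in_range, ]
  df[[sel_activity]] <- data.frame(
    activity_name = sel_activity,
    median_wage   = weighted_median(workers$incwage, workers$asecwt)
  )
}
median_wage <- do.call(rbind, df)
print(median_wage)
```

The worker coded 9999 falls outside both ranges, so that wage does not enter either shadow wage.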

Finally, we combine the different data together, write it to the app, and prepare it for download.

Care Gini Coefficient

The Gini coefficient is a widely used statistical measure of inequality within a distribution, commonly applied to income or wealth. It ranges from 0 to 1, where 0 represents perfect equality (everyone has the same amount) and 1 indicates perfect inequality (one person has everything, and everyone else has nothing). The Gini coefficient is often visualized through a Lorenz curve, which plots the cumulative share of a resource (like income or jobs) against the cumulative share of the population. The further the Lorenz curve deviates from the line of perfect equality, the higher the Gini coefficient, and the more unequal the distribution.

In the context of the care economy, the Gini coefficient offers a powerful lens for understanding the geographic or demographic distribution of care-related jobs (e.g., childcare workers, home health aides, elder care providers) relative to the population in need of care (such as young children, elderly individuals, or people with disabilities). A low Gini coefficient in this setting would suggest that care jobs are relatively evenly distributed among communities based on their level of need, indicating a more equitable alignment of service availability. Conversely, a high Gini coefficient implies that care jobs are concentrated in certain areas or populations, leaving other high-need areas underserved. This kind of analysis is especially important for identifying care deserts (areas with a high demand for care services but few available workers) so that policymakers and planners can better target resources and interventions to reduce inequality and improve access to essential care.

The Gini Coefficient of Formal Care measures how formal care jobs are geographically distributed across the U.S. relative to the individuals at risk of needing care. A value of 0 would indicate perfect equality, meaning formal care jobs are located in proportion to where at-risk individuals reside. A higher value signals a greater mismatch in the availability of care by location. Although the Gini coefficient usually measures income inequality, here we use it to measure the spatial inequality of care services relative to the population most at risk of needing those services.

To calculate the Gini coefficient, we use county-level data on employment in the care economy coupled with the distribution of the population. We start by loading data tables obtained from the U.S. Census Bureau and data from the Quarterly Census of Employment and Wages (QCEW). Both data sources include county-level data.

After creating these statistics, we calculate the Gini coefficient from these data using the gini function loaded in previously.
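One way to compute a Gini coefficient of jobs relative to an at-risk population is through the Lorenz curve, sketched below. The `lorenz_gini()` helper is hypothetical and is not the gini function referred to in the text, whose exact definition is not shown here. The county values are invented, and the sketch assumes every county has a positive at-risk population.

```r
# Hypothetical helper: Gini of care jobs relative to at-risk population.
# Counties are ordered from fewest to most jobs per at-risk person, and
# the area under the Lorenz curve is computed with the trapezoid rule.
lorenz_gini <- function(jobs, at_risk) {
  ord <- order(jobs / at_risk)
  x <- c(0, cumsum(at_risk[ord]) / sum(at_risk))  # cumulative population share
  y <- c(0, cumsum(jobs[ord]) / sum(jobs))        # cumulative job share
  1 - sum(diff(x) * (head(y, -1) + tail(y, -1)))
}

# Jobs proportional to need in every county: no spatial inequality.
g_equal <- lorenz_gini(jobs = c(10, 20, 30), at_risk = c(100, 200, 300))

# All jobs in one of three equally sized counties: strong concentration.
g_concentrated <- lorenz_gini(jobs = c(0, 0, 100), at_risk = c(100, 100, 100))

print(c(equal = g_equal, concentrated = g_concentrated))
```

With a finite number of counties the coefficient cannot reach exactly 1, so comparisons are most meaningful across years or regions that use the same set of geographic units.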

Care Ratio

The Care Ratio is a novel demographic and economic metric designed to quantify the balance between care providers and care recipients within a given population. Building on the logic of a traditional dependency ratio used in demographic studies, the Care Ratio advances the concept by incorporating both the diversity of caregiving roles and the differentiated needs of care recipients. The numerator represents the population of potential caregivers, stratified and weighted according to their caregiving contributions. This includes formal sector care workers, unpaid caregivers such as homemakers, and a residual category of individuals not formally engaged in care but who may nonetheless contribute to informal care networks. The denominator consists of the at-risk care-dependent population, including children, the elderly, and individuals with disabilities, with each subgroup weighted by the intensity or frequency of care they typically require. This framework provides a more nuanced picture of the care landscape than traditional economic or demographic indicators.

The Care Ratio is critical for the economics of care and related demographic analyses because it brings into focus the structural balance—or imbalance—between those who provide care and those who depend on it. In an era of aging populations, declining fertility rates, and shifting labor market dynamics, the burden of care is increasingly a central challenge for societies. The Care Ratio offers a standardized, comparative tool that can be used to assess the sustainability of care systems, identify regions or groups at risk of care deficits, and inform social policy aimed at redistributing care labor more equitably. By moving beyond simple counts and integrating the complexity of care work and need, this measure helps bridge gaps between demographic modeling, social policy, and lived experience, providing a foundation for developing more responsive and equitable care infrastructures.

We start by loading in a variety of datasets related to age, disability, and employment by county.

We then load in data from previous steps to understand the distribution of need and provision of these groups. This comes in part from the CPS Monthly data which has a specific variable to code homemakers and in part from the market datum, which was calculated above and discusses care need and provision by age.

The first thing we do is calculate the denominator, which represents the weighted population of care recipients. The code below provides the methods used to create this denominator. This R code chunk is doing the following steps:

1. Initial setup:

Denominators = {}
Years = {}

Initializes two empty lists to store:
- Denominators: calculated need-based population values for each year
- Years: the corresponding years

2. Looping over each year in YEARS: For each year yr, the script performs the following operations:

3. Filter relevant data for that year:

ages_temp <- ages |> filter(year == yr)
disability_temp <- disability |> filter(Year == yr)

Gets the age group population counts and disability statistics specific to the current year.

4. Get population counts and average weights for each care-needing group:

Children under 5:

Gets total population (under5) and average care weight (under5_W)

Children aged 5–13

Same logic: total count and weight for this age range.

Teens aged 14–17

Collects values but notably does not use them in the final Denom (might be unused or omitted intentionally).

Older adults:

65–69, 70–74, and 75+ groups are handled separately:
- Sums the total populations
- Gets mean weights for each age bracket

Disabled populations:

Sums counts of:
- Children with disabilities (DisabUnd18)
- Adults with disabilities (DisabAdult)
- Elderly with disabilities (DisabElder)
For each, calculates a corresponding average care weight from market_datum
Adds 1 to each average weight (possibly to ensure no zero values or to buffer the care burden)

5. Calculate the overall denominator:

Denom = under5*under5_W + 
        five_thirteen*five_thirteen_W + 
        sixtyfive_sixtynine*sixtyfive_sixtynine_W + 
        seventy_seventyfour*seventy_seventyfour_W + 
        seventyfive_plus*seventyfive_plus_W +
        child_disabled*child_disabled_W + 
        adult_disabled*adult_disabled_W +
        elder_disabled*elder_disabled_W

This is a weighted sum that reflects total care demand across age groups and disability types, scaled by an average care weight per group.

6. Store results:

Denominators = append(Denominators, Denom)
Years = append(Years, yr)

Adds the calculated value and corresponding year to the tracking lists.

In plain language:
This code calculates a year-by-year estimate of total care need in the population by summing weighted population counts across key age and disability groups. It assigns higher weight to those expected to need more care (like young children or disabled individuals) and stores the results for use in later analyses or visualizations.

## [1] 277389077 276814969 278836176 280975623
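The weighted sum behind each denominator can be sketched for a single year as follows. All counts and care weights are invented for illustration and do not reproduce the values printed above. Only the structure follows the text: the same groups enter the sum, and 1 is added to each disability weight.

```r
# Hypothetical population counts for one year.
counts <- c(
  under5 = 1000, five_thirteen = 2000,
  sixtyfive_sixtynine = 800, seventy_seventyfour = 600, seventyfive_plus = 500,
  child_disabled = 50, adult_disabled = 300, elder_disabled = 200
)

# Hypothetical average care weights by age group.
age_weights <- c(
  under5 = 2.0, five_thirteen = 1.5,
  sixtyfive_sixtynine = 0.5, seventy_seventyfour = 0.75, seventyfive_plus = 1.0
)

# Disability weights get 1 added, as described in the steps above.
disab_weights <- c(
  child_disabled = 0.5, adult_disabled = 0.25, elder_disabled = 0.5
) + 1

# Match counts to weights by name, then take the weighted sum.
weights <- c(age_weights, disab_weights)
denom <- sum(counts[names(weights)] * weights)
print(denom)
```

Matching by name rather than by position guards against a silent misalignment if a group is added or reordered.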

Now that we have the denominators we need to calculate the numerators. The code below is utilized to do this. This R code chunk is doing the following steps:

1. Setup:

Numerators = {}
Years = {}
W = c(1.5, 0.5, 1)

Initializes empty lists:
- Numerators to store the total care provision capacity for each year.
- Years to track which year each value corresponds to.
Defines a vector of weights W:
- 1.5 for formal care workers
- 0.5 for non-care industry workers
- 1 for homemakers
  These weights reflect relative care-providing potential by group.

2. Loop through each year in YEARS: For each year yr, it performs:

3. Filter data to that year:

ages_temp <- ages %>% filter(year == yr)
disability_temp <- disability %>% filter(Year == yr)
formal_temp <- formalsector %>% filter(year == yr)
cps_temp <- data %>% filter(YEAR == yr)

Gets age estimates, disability counts, formal sector employment, and CPS data for the selected year.

4. Estimate total population:

population <- sum(ages_temp$POPESTIMATE)

Totals the population estimate for that year.

5. Categorize and count care-relevant workforce:

Formal care workers:

careworkers <- formal_temp %>%
  filter(industry_code != 10)
careworkers <- sum(careworkers$IndustryEmployment)

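Sums employment across every industry code other than 10. In QCEW coding, industry code 10 is the all-industries total, so the remaining rows in this table are presumably the care industries.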

Non-care industry workers:

workingnoncare <- formal_temp %>%
  filter(industry_code == 10)
workingnoncare <- sum(workingnoncare$IndustryEmployment) - careworkers

Takes total employment across all industries (industry code 10).
Subtracts careworkers so that care workers are not counted twice, leaving the number of non-care workers.

Homemakers (unpaid labor):

Homemakers <- sum(cps_temp$WTFINL / length(unique(cps_temp$month)))

Sums person weights across all months and averages them, estimating the number of homemakers (likely using a pre-filtered cps_temp dataset that only includes them).

6. Calculate and store care supply:

Numer <- careworkers*W[1] + workingnoncare*W[2] + Homemakers*W[3]
Numerators <- append(Numerators, Numer)
Years <- append(Years, yr)

Multiplies each group count by its weight to compute a weighted care provision score for that year.
Appends the result to Numerators and the year to Years.

In plain language:
This code estimates the supply of potential caregivers in each year by counting formal care workers, other workers, and homemakers—then weighting them based on how much care they are assumed to provide. It builds a year-by-year summary of care capacity across the workforce and households.

## [1] 112386298 107788401 113401382 115482838
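The weighting step can be checked by hand on a toy example. The sketch below is illustrative only: the employment figures, the homemaker count, and industry codes 61 and 62 are made up, and it assumes that industry code 10 holds the all-industries total, as the subtraction in the chunk above suggests.

```r
# Weights from the setup step: formal care workers, non-care workers, homemakers.
W <- c(1.5, 0.5, 1)

# Made-up employment rows; code 10 is ASSUMED to be the all-industries total.
formal_temp <- data.frame(
  industry_code      = c(10, 61, 62),
  IndustryEmployment = c(1000, 50, 150)
)

# Care workers: every row except the code 10 total.
careworkers <- sum(formal_temp$IndustryEmployment[formal_temp$industry_code != 10])

# Non-care workers: the total minus the care workers.
workingnoncare <- sum(formal_temp$IndustryEmployment[formal_temp$industry_code == 10]) -
  careworkers

# Made-up homemaker count standing in for the CPS weight sum.
Homemakers <- 100

# Weighted care supply for this toy year.
Numer <- careworkers * W[1] + workingnoncare * W[2] + Homemakers * W[3]
Numer
```

With these toy inputs there are 200 care workers and 800 non-care workers, so the weighted supply is 200 * 1.5 + 800 * 0.5 + 100 * 1 = 800.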

We finalize the Care Ratio by dividing the numerators by the denominators. The result is the Care Ratio that is exported.
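As a minimal sketch of this last step, the division can be reproduced from the two printed vectors above. The variable names here are chosen for illustration and are not necessarily those used in the original chunk.

```r
# Values copied from the printed output above, one entry per year, in order.
denominators <- c(277389077, 276814969, 278836176, 280975623)
numerators   <- c(112386298, 107788401, 113401382, 115482838)

# Care Ratio: weighted care supply divided by weighted care need.
care_ratio <- numerators / denominators
round(care_ratio, 3)
```

A ratio below 1 means that the weighted supply of potential caregivers is smaller than the weighted care need in that year.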

The Sandwich Generation

The Sandwich Generation refers to a group of adults who are simultaneously caring for their own children while also providing care or support to aging parents. This dual responsibility places unique emotional, financial, and time burdens on caregivers, often leading to stress, work-life conflict, and economic strain. In the economics of care, the sandwich generation exemplifies how unpaid and invisible care labor supports the functioning of both the family and broader society. Understanding this is crucial as demographic shifts, such as increased life expectancy and delayed childbearing, intensify these care demands.

Measuring the size and characteristics of the sandwich generation is essential for informing public policy, labor protections, and social support systems. Capturing accurate numbers and understanding the demographic profile of this group, such as gender, income, employment status, and race/ethnicity, can help reveal the hidden costs of informal care and shape interventions that better support multigenerational caregivers. Recognizing their role is vital, not just for their wellbeing, but also for sustaining the broader care economy.

This section provides the code and outputs behind the sandwich generation statistics. To measure the sandwich generation, we use ATUS data. We only use data from 2011 onward because 2011 is the first year in which the survey asks about secondary elder care activities. The chunk below loads the data and selects the required variables.

After loading the data, we need to identify sandwiched individuals. To do this, we calculate the total amount of time that each individual with a young child spends on elder care. We consider someone to be sandwiched when all of the following conditions are met.

They are aged 18 or older.
They have an own child living with them aged 10 or younger.
They spend at least 1 minute in their day providing care to an elderly household member.

These conditions provide a conservative estimate because they exclude individuals whose children are older or adults. Including these groups would lead to larger estimates of the size of the sandwich generation.
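The three conditions can be sketched as a single logical flag. This is a minimal base R illustration on made-up data; the column names (age, youngest_child, eldercare_min) are hypothetical stand-ins and not the ATUS variable names used in the original chunk.

```r
# Toy respondent-level data; all values and column names are illustrative.
resp <- data.frame(
  caseid         = 1:5,
  age            = c(34, 45, 17, 52, 29),
  youngest_child = c(4, 12, NA, 9, 2),   # age of youngest own child at home (NA = none)
  eldercare_min  = c(30, 60, 15, 0, 1)   # minutes of elder care on the diary day
)

# Sandwiched: adult, own child aged 10 or younger at home, at least 1 minute of elder care.
resp$sandwiched <- with(
  resp,
  age >= 18 &
    !is.na(youngest_child) & youngest_child <= 10 &
    eldercare_min >= 1
)

resp[, c("caseid", "sandwiched")]
```

Only respondents 1 and 5 meet all three conditions in this toy data. Respondent 2 has a child older than 10, respondent 3 is under 18, and respondent 4 reports no elder care.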

## `summarise()` has grouped output by 'year', 'date', 'caseid', 'wt06', 'yngch'.
## You can override using the `.groups` argument.

Next, we calculate the count and proportion of individuals who are labeled as “sandwiched.” The code below uses a 5-year rolling average to move through the ATUS data. ATUS sample sizes are small enough that estimates for specific demographic groups within a single year become unreliable. To get around this, it is standard practice to use 5-year rolling averages.

For each five-year group, this code calculates the population of individuals, the total time spent providing care across the population, and the weighted median of care provision across the sandwiched individuals.
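A rolling five-year calculation of this kind might look like the sketch below. It is not the original chunk: the data are made up, the column names are hypothetical, the weighted median helper is a simple hand-written version, and dividing the pooled weights by the window length is an assumption about how the five years are averaged.

```r
# Weighted median: smallest value at which the cumulative weight reaches half the total.
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

# Toy data: sandwiched respondents only, one row per person (made-up values).
sw <- data.frame(
  year     = c(2011, 2012, 2013, 2014, 2015, 2016),
  weight   = c(100, 200, 100, 300, 100, 200),  # person weight
  care_min = c(20, 60, 30, 45, 10, 90)         # elder care minutes on the diary day
)

window    <- 5
end_years <- (min(sw$year) + window - 1):max(sw$year)

rolling <- do.call(rbind, lapply(end_years, function(end_year) {
  # Pool the five survey years ending in end_year.
  d <- sw[sw$year > end_year - window & sw$year <= end_year, ]
  data.frame(
    end_year   = end_year,
    population = sum(d$weight) / window,               # ASSUMED averaging over the window
    total_min  = sum(d$weight * d$care_min) / window,  # weighted care minutes
    median_min = weighted_median(d$care_min, d$weight)
  )
}))

rolling
```

Each row of the result describes one five-year window, labeled by its final year.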

Care Labor Force Broad Indicators Prep

The Bureau of Labor Statistics (BLS) releases monthly labor force statistics through the Current Population Survey (CPS), a nationally representative household survey conducted in collaboration with the U.S. Census Bureau. These statistics offer a timely and comprehensive overview of employment, unemployment, labor force participation, and other key indicators that shape our understanding of the U.S. economy. Policymakers, researchers, and the public rely on these data to assess economic conditions, identify trends, and inform decisions ranging from interest rate adjustments to workforce development initiatives. Monthly updates facilitate the close monitoring of labor market dynamics, enabling early detection of economic downturns or recoveries.

In its monthly reports, the BLS provides detailed labor force participation tables disaggregated by variables such as sex and race. Building on this framework, our analysis extends the focus to include parenthood status and employment in care-related industries. We further examine labor force participation not only within the formal care economy but also across informal care roles, capturing a more comprehensive view of caregiving and its impact on labor dynamics.

Prepping Economy Data

In the broad indicators section, we conclude by examining the value of the care economy in relation to the U.S. gross domestic product (GDP). To do so, we draw on GDP data from the Federal Reserve Bank of St. Louis’s online dashboard (FRED: https://fred.stlouisfed.org/). The dataset used reports annual GDP figures for the entire U.S. economy, expressed in billions of dollars. This R code chunk is doing the following steps:

Setting a minimum year:
It defines a variable min_year and sets it to 1994. This will be used later to filter the data.
Loading GDP data:
It reads the CSV file called FYGDP.csv into a dataframe called us_gdp.
Cleaning column names:
It standardizes the column names to a consistent, tidy format (e.g., lowercase with underscores).
Creating and modifying columns:
- date: Extracts the year from the observation_date column and converts it into a Date format representing January 1st of that year.
- fygdp: Converts GDP figures to actual dollar amounts by multiplying by 1 billion.
- gdp_daily: Calculates an estimated daily GDP by dividing the annual GDP by 365.
Filtering by year:
It keeps only the rows where the year of the date is greater than or equal to 1994.
Selecting relevant columns:
It keeps only the date and gdp_daily columns for display.
Displaying the data as a table:
It uses datatable() to show an interactive table of us_gdp, with options for:
- Showing 10 rows per page
- Enabling horizontal scrolling
- Automatically adjusting column widths
  The table is given the caption “Table: US GDP By Year” and excludes row names.

In plain language:
This code reads in U.S. GDP data, adjusts it to reflect daily values, filters for years from 1994 onward, and displays it in an interactive table format for easy viewing.
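The transformation steps above can be sketched in base R as follows. The sketch uses a small made-up data frame in place of FYGDP.csv, so the GDP figures are illustrative only, and it leaves out the interactive datatable() display.

```r
min_year <- 1994

# Toy stand-in for FYGDP.csv; the GDP values are made up for illustration.
us_gdp <- data.frame(
  observation_date = c("1993-01-01", "1994-01-01", "1995-01-01"),
  FYGDP            = c(6000, 7000, 7300)  # billions of dollars
)

# Standardize column names to lowercase.
names(us_gdp) <- tolower(names(us_gdp))

# January 1st of the observation year, GDP in dollars, and GDP per day.
us_gdp$date      <- as.Date(paste0(substr(us_gdp$observation_date, 1, 4), "-01-01"))
us_gdp$fygdp     <- us_gdp$fygdp * 1e9
us_gdp$gdp_daily <- us_gdp$fygdp / 365

# Keep years from min_year onward and only the two columns used later.
keep   <- as.integer(format(us_gdp$date, "%Y")) >= min_year
us_gdp <- us_gdp[keep, c("date", "gdp_daily")]
us_gdp
```

Dividing by 365 puts GDP on the same per-day basis as the time-use and earnings measures that it is compared with later.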

Prepping CPS ASEC Data

In this section we prepare the CPS ASEC data. We use this prepared dataset for all remaining statistics in the broad indicators analysis. The code below loads the data and structures it as needed for analysis. This R code chunk is doing the following steps:

Loading data: It reads a CSV file called ASECdata.csv into a dataframe named asec.
Filtering for valid responses: It keeps only the rows where HFLAG equals 1 or is missing. This accounts for a survey redesign starting in 2014, ensuring consistent data.
Filtering by year: It keeps only the data from min_year onward (min_year is defined as 1994 above).
Filtering by age: It keeps only people aged 18 and over.
Selecting variables: It keeps only the listed columns, which are relevant to the analysis (like year, wages, employment status, education, etc.).
Cleaning column names: It standardizes column names to a consistent format (usually lowercase with underscores).
Creating new variables:
- date: Adds a new column that sets the date to January 1st of the given year.
- uhrsworkt: Replaces a special code (999, meaning NILF) with 0 in the work hours column.
- occ_type: Classifies each person as working in “care” or “non-care” based on the occ_care_focus variable.
- overall: Adds a column with the same value “overall” for every row, to simplify grouping and plotting later.

In plain language:
This code prepares and cleans survey data, keeping adults aged 18 and over, dealing with some inconsistencies, and creating new variables to distinguish between care and non-care occupations.

Data summary
Name	asec
Number of rows	3904352
Number of columns	10
_______________________
Column type frequency:
character	5
Date	1
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
empstat	1	4	12	4
occ_care_focus	1	4	13	4
gender_parent	1	5	11	5
occ_type	1	4	8	2
overall	1	7	7	1

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	1994-01-01	2024-01-01	2009-01-01	31

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2009.20	8.49	1994	2002.00	2009.00	2016.00	2024.00	▆▇▇▆▆
asecwt	1	1795.76	1234.61	0	930.65	1625.94	2394.35	44423.83	▇▁▁▁▁
uhrsworkt	1	63.96	193.60	0	0.00	37.00	40.00	997.00	▇▁▁▁▁
incwage	1	28027.18	50059.09	0	0.00	14560.00	40000.00	2099999.00	▇▁▁▁▁
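The ASEC preparation steps described in this section can be sketched on toy data as follows. The rows are made up, and the occ_care_focus labels (including the "none" value used to mark non-care occupations) are hypothetical, since the actual category labels are not shown in this section.

```r
min_year <- 1994

# Toy stand-in for ASECdata.csv; values and occ_care_focus labels are illustrative.
asec <- data.frame(
  YEAR           = c(1993, 2010, 2014, 2014, 2020),
  HFLAG          = c(NA, NA, 1, 0, NA),
  AGE            = c(40, 17, 35, 35, 60),
  UHRSWORKT      = c(40, 20, 999, 40, 38),
  occ_care_focus = c("none", "none", "health", "none", "none")
)

# Keep HFLAG == 1 or missing, years from min_year onward, and adults aged 18 and over.
keep <- (is.na(asec$HFLAG) | asec$HFLAG == 1) &
  asec$YEAR >= min_year &
  asec$AGE >= 18
asec <- asec[keep, ]

# Standardize column names to lowercase.
names(asec) <- tolower(names(asec))

# New variables described in the steps above.
asec$date      <- as.Date(paste0(asec$year, "-01-01"))
asec$uhrsworkt <- ifelse(asec$uhrsworkt == 999, 0, asec$uhrsworkt)
asec$occ_type  <- ifelse(asec$occ_care_focus == "none", "non-care", "care")  # ASSUMED mapping
asec$overall   <- "overall"
asec
```

Two of the five toy rows survive the filters: the 2014 respondent with HFLAG equal to 1 and the 2020 respondent.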

Prepping ATUS Data

In this section we prepare the ATUS data. We will use this set of ATUS data for the remainder of the analysis representing the statistics in the broad indicator data. The code below loads the data and structures it as needed for analysis. This R code chunk is doing the following steps:

Loading data:
It reads the CSV file called ATUSdata.csv into a dataframe called atus.
Filtering data:
It keeps only:
- Rows where the activity is not “Formal Work”
- People aged 18 or older
- Years other than 2020 (Removed due to pandemic-related anomalies)
Selecting variables:
It keeps only a specific set of columns related to time use, demographics, and care-related activities.
Cleaning column names:
It standardizes column names to a consistent format (e.g., lowercase with underscores).
Reshaping the data:
It transforms three columns (duration, scc_all_ln, and sec_all_ln) into a long format so that they all go into a single duration column with an associated metric label.
Handling missing data and categorizing activities:
- Replaces missing values in duration with 0.
- Creates a new care_flag column that labels each row as either “care” or “non-care” based on the type of time-use metric or whether the activity was marked as care-related.
- Adds an “overall” column with the same value (“overall”) for grouping purposes.
Summarizing individual-level data:
It creates a new dataframe case_year that:
- Converts annual weights to daily weights by dividing by 365.
- Aggregates total time spent on care and non-care activities for each person per year.
- Adds the same “overall” label to this summary dataset.
Defining the time range:
It uses a custom function atus_yr_range(), defined above, to calculate the range of years represented in the dataset.

In plain language:
This code prepares and summarizes time-use data by filtering adult respondents, reshaping and cleaning time records, flagging care-related activities, and calculating total time spent on care versus non-care activities for each person and year.

Data summary
Name	case_year
Number of rows	454094
Number of columns	7
_______________________
Column type frequency:
character	3
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
gender_parent	1	5	11	5
care_flag	1	4	8	2
overall	1	7	7	1

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2011.56	5.93	2003.00	2006.00	2011.00	2016.00	2023.0	▇▆▅▅▃
caseid	1	20116272616793.08	59325034611.31	20030100013280.00	20061111062480.75	20110706112012.00	20160808162075.00	20231212232280.0	▇▇▆▆▃
weight	1	20688.95	19980.40	1149.24	8367.67	14926.71	25419.15	572630.2	▇▁▁▁▁
total_time	1	694.77	433.82	0.00	270.00	750.00	1060.00	2827.0	▇▇▃▁▁
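The reshaping and summarizing steps described in this section might be sketched as below. This is a base R illustration on made-up activity records, not the original chunk: only duration, scc_all_ln, and sec_all_ln are names taken from the text, while wt and is_care are hypothetical columns standing in for the annual weight and the care marker on the primary activity.

```r
# Toy activity-level records; wt and is_care are hypothetical column names.
atus <- data.frame(
  caseid     = c(1, 1, 2),
  year       = c(2015, 2015, 2015),
  wt         = c(3650, 3650, 7300),    # annual weight
  is_care    = c(TRUE, FALSE, FALSE),  # primary activity marked as care-related
  duration   = c(60, 120, 200),
  scc_all_ln = c(0, 30, NA),
  sec_all_ln = c(0, 0, 10)
)

# Stack the three time columns into one long duration column with a metric label.
metrics <- c("duration", "scc_all_ln", "sec_all_ln")
long <- do.call(rbind, lapply(metrics, function(m) {
  data.frame(atus[c("caseid", "year", "wt", "is_care")],
             metric = m, duration = atus[[m]])
}))

# Missing durations become 0; the two secondary metrics always count as care.
long$duration[is.na(long$duration)] <- 0
long$care_flag <- ifelse(long$metric != "duration" | long$is_care, "care", "non-care")

# One row per person, year, and care flag, with a daily weight.
case_year <- aggregate(duration ~ caseid + year + wt + care_flag, data = long, FUN = sum)
names(case_year)[names(case_year) == "duration"] <- "total_time"
case_year$weight  <- case_year$wt / 365
case_year$overall <- "overall"
case_year
```

Because secondary care minutes are added on top of primary activity minutes, a person's care and non-care totals can sum to more than the 1,440 minutes in a day, which is consistent with the maximum of total_time shown in the summary above.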

Labor Force Participation by Care Giver Status

While demographic factors such as sex are essential to labor force analysis, further disaggregating data by parenthood status offers deeper insight into how caregiving responsibilities intersect with employment. Parenthood—particularly in the context of childrearing—can significantly influence labor force participation, working hours, and occupational choices. For example, mothers may reduce their working hours or leave the labor force entirely due to caregiving demands, while fathers may face societal expectations to maintain or increase their labor force involvement. Conversely, some mothers, particularly those from lower socio-economic backgrounds, may be compelled to enter or remain in the workforce after childbirth to help meet caregiving needs. Disaggregating labor force data by both sex and parenthood status reveals these nuanced dynamics, enabling policymakers and researchers to design more targeted interventions that support families, promote gender equity, and illuminate the structural forces shaping labor market outcomes.

Data summary
Name	cps
Number of rows	42083505
Number of columns	6
_______________________
Column type frequency:
character	2
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
month	0	1	3	9	0	12	0
gender_parent	0	1	5	11	0	5	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2006.41	9.88	1990	1998.00	2006.00	2015.00	2024.00	▇▇▇▇▆
age	1	45.79	18.40	16	31.00	44.00	60.00	90.00	▇▇▇▅▂
wtfinl	1	2272.64	1315.15	0	1207.62	2225.02	3169.79	34716.21	▇▁▁▁▁
empstat	1	19.06	11.69	0	10.00	10.00	34.00	36.00	▁▇▁▁▅

## `summarise()` has grouped output by 'year', 'month'. You can override using the
## `.groups` argument.

Formal Care Force

The formal care force is defined as individuals working in a paid job that has been identified as part of the care economy. Nurses, teachers, and janitors, for example, all provide services that would be considered caregiving if performed at home, and we therefore code these as care jobs in the formal economy. The formal care force statistics on the dashboard provide the overall and gender/parenthood count of individuals in the formal care economy.

The code below uses the yearly CPS ASEC data from the CPS March supplement. Given a variable representing a demographic group, it returns the count and percent of that group in the formal care force. Proportions are calculated within each group: the overall proportion is the share of all individuals who work in the care force, while the proportion for a demographic group is the share of that group, for example fathers, who work in the care force.

The function takes a demographic variable as input and returns the needed statistics. This R function is doing the following steps:

Defining a function:
It creates a function called get_formal_lfp that takes two inputs:
- df: a dataframe (asec for this purpose)
- demo_group: a variable representing the demographic category to group by (e.g., race, gender, education)
Preparing the data:
- It uses the asec dataset (or the passed df).
- Creates two new columns:
  - category_id: stores the name of the demographic group variable passed in.
  - subcategory_id: stores the actual value of that demographic variable for each person.
Grouping the data:
It groups the data by date, the demographic group name (category_id), and each group’s value (subcategory_id).
Summarizing labor force participation:
- formal_care_labor_force: Calculates the weighted number of people employed in care occupations using the person-level weight (asecwt).
- formal_care_force_proportion: Divides the care labor force total by the total weighted population in the group to get a proportion.
Returning the results:
It returns the summarized dataframe, showing the number and proportion of employed care workers for each demographic group over time.

In plain language:
This function calculates the size and share of the formal care labor force across different demographic groups using employment and occupation data. It outputs totals and proportions by group and year.
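A function along these lines might look like the base R sketch below. It is not the original get_formal_lfp: it takes the demographic variable as a character string, the toy data are made up, and the "employed" label for empstat is a hypothetical value.

```r
# Sketch of a formal care force summary; demo_group is passed as a column name string.
get_formal_lfp_sketch <- function(df, demo_group) {
  df$category_id    <- demo_group
  df$subcategory_id <- df[[demo_group]]

  # Weight counts toward the care force only if employed in a care occupation.
  df$formal_care_labor_force <- ifelse(
    df$empstat == "employed" & df$occ_type == "care", df$asecwt, 0
  )
  df$population <- df$asecwt

  out <- aggregate(
    cbind(formal_care_labor_force, population) ~ date + category_id + subcategory_id,
    data = df, FUN = sum
  )
  out$formal_care_force_proportion <- out$formal_care_labor_force / out$population
  out
}

# Toy data; all values and the "employed" label are illustrative.
asec_toy <- data.frame(
  date          = as.Date("2020-01-01"),
  gender_parent = c("mother", "mother", "father", "father"),
  empstat       = c("employed", "employed", "employed", "nilf"),
  occ_type      = c("care", "non-care", "care", "non-care"),
  asecwt        = c(100, 300, 50, 50)
)

lfp <- get_formal_lfp_sketch(asec_toy, "gender_parent")
lfp
```

In this toy data, mothers have a weighted care force of 100 out of 400, and fathers have 50 out of 100, so the within-group proportions are 0.25 and 0.5.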

The table below shows the resulting data as it is used to feed the Care Board.

## `summarise()` has grouped output by 'date', 'category_id'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'date', 'category_id'. You can override
## using the `.groups` argument.

Informal Care Force

To estimate the number of individuals participating in the informal care workforce, we use data from the ATUS. The function below takes a specified demographic group and calculates the number of individuals within that group engaged in informal labor. For the purposes of this analysis, we define informal labor force participation as providing three or more hours of unpaid care per day. This threshold aligns with the Bureau of Labor Statistics’ criteria for classifying individuals as unpaid family workers on family farms. By adopting this definition, we establish a consistent framework for identifying and analyzing unpaid family care workers within the broader labor force. This R function is doing the following steps:

Defining a function:
It creates a function called get_informal_lfp that takes one input:
- demo_group: the name of a demographic variable (e.g., race, gender, etc.) to group by.
Setting up for looping over years:
- It initializes an empty list called informal_lfp to store results for each year.
- Sets a counter i to track the index during the loop.
Looping through each year in the dataset:
For each year in the dataset (sel_year), using year ranges from yr_range:
- Defines year_min, which sets the lower bound for the moving 5-year window (used in the analysis).
Calculating total population in the group:
- Filters atus to include data from the 5-year window ending in sel_year.
- Keeps distinct individuals with their weights and demographic values.
- Calculates the total population by summing weights (converted to daily and averaged over 5 years).
Identifying informal caregivers:
- Filters case_year for the same 5-year window.
- Keeps only those doing more than 3 hours of care work per day.
- Groups by the demographic variable and calculates the size of the informal care labor force (again, using 5-year averaging of daily weights).
Combining the two datasets:
- Joins the care labor force data with the total population for the group.
- Adds a date column for that year (January 1st).
- Calculates the proportion of the population doing informal caregiving.
- Stores the result in the list for the given year.
Combining and returning results:
After looping through all years, it combines all yearly dataframes into one (df) and returns it.

In plain language:
This function calculates the share of the population doing significant amounts of informal care work (3+ hours/day) within each demographic group over time, using a rolling 5-year window. It returns a dataset showing how informal caregiving participation changes across groups and years.
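For a single five-year window, the core of this calculation can be sketched as follows. The data are made up, the weights are treated as already converted to daily weights, and the cutoff is written as 180 minutes or more to match the "three or more hours" definition given above; the original function may handle these details differently.

```r
# Toy person-level care totals for one window (made-up values).
care_rows <- data.frame(
  caseid        = 1:4,
  gender_parent = c("mother", "father", "mother", "father"),
  weight        = c(10, 20, 30, 20),     # daily weights
  total_time    = c(240, 60, 120, 300)   # care minutes on the diary day
)

window <- 5

# Total population per group, averaged over the window.
population <- aggregate(weight ~ gender_parent, data = care_rows,
                        FUN = function(w) sum(w) / window)
names(population)[2] <- "population"

# Informal care force: people providing three or more hours (180 minutes) of care.
carers <- care_rows[care_rows$total_time >= 180, ]
force  <- aggregate(weight ~ gender_parent, data = carers,
                    FUN = function(w) sum(w) / window)
names(force)[2] <- "informal_care_force"

# Join the two pieces and compute the participation share.
informal_lfp <- merge(force, population, by = "gender_parent")
informal_lfp$date <- as.Date("2015-01-01")  # January 1st of the window's final year
informal_lfp$informal_care_force_proportion <-
  informal_lfp$informal_care_force / informal_lfp$population
informal_lfp
```

Note that merge() performs an inner join by default, so a group with no qualifying caregivers would drop out of this sketch rather than appear with a share of zero.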

As with the formal care force, we present these data as an overall estimate and as estimates by gender and parenthood.

Time spent working in the Care Economy

The second major set of broad indicators created for the Care Board represents the time that the workers described above spend in the care economy. We use both the CPS ASEC and ATUS data to calculate the total number of minutes in a day that workers spend providing care. These statistics show the size of care economy work relative to other parts of the formal economy and to informal time use. It is possible that some people are doing care activities during work hours (e.g. secondary care or washing clothes while teleworking). To the extent this happens, and since we are using multiple sources of data to estimate daily care activities, our estimates may be inflated because time spent multitasking is counted twice. This may be more prevalent in times when flexible work patterns are more common.

Minutes Worked in the Formal Care Economy

The formal care economy constitutes a significant portion of the labor force, as demonstrated in the preceding section. However, measuring its size by participation alone does not fully capture its scope. Time spent working offers a complementary perspective, particularly in care-related roles where intensity and duration of labor vary widely. To address this, we calculate the total minutes worked in the formal care economy as a proportion of all minutes worked across the broader labor force. We implement this through a function, get_formal_time, which takes a specified demographic group and computes the total number of minutes this group spends working in the formal care economy. This R function is doing the following steps:

Defining a function:
It defines a function called get_formal_time that takes two inputs:
- df: a dataframe (most likely asec)
- demo_group: a demographic variable to group by (e.g., race, gender, etc.)
Filtering out invalid data:
It removes rows where reported work hours (uhrsworkt) equals 997, which indicates flexible hours.
Creating grouping variables:
- Adds a category_id column that stores the name of the demographic variable.
- Adds a subcategory_id column that stores the value of that variable for each individual.
Grouping data:
It groups the dataset by year (date), the demographic group name (category_id), and the demographic subgroup (subcategory_id).
Calculating time spent in formal care work:
- formal_care_time: Totals the number of minutes per day spent in care-related paid work, using reported weekly hours and survey weights. The formula:
```
asecwt * uhrsworkt * 60 / 7
```
  converts weekly work hours into minutes per day.
- formal_care_time_proportion: Divides that total by the total number of weighted work minutes for the group to get a proportion.
Returning the result:
It returns a dataframe with the total and proportion of time spent in formal care work, broken down by demographic group and year.

In plain language:
This function calculates how much time people spend in paid care jobs across different demographic groups, using survey data on work hours. It returns both total care time and the share of all work time spent in care occupations.
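The conversion and the resulting proportion can be checked on a toy example. The sketch below uses made-up rows and groups only by gender_parent; it illustrates the arithmetic and is not the original get_formal_time function.

```r
# Toy ASEC-style rows (made-up values); 997 marks hours that vary.
asec_toy <- data.frame(
  date          = as.Date("2020-01-01"),
  gender_parent = c("mother", "mother", "father", "father"),
  occ_type      = c("care", "non-care", "care", "non-care"),
  asecwt        = c(100, 100, 100, 100),
  uhrsworkt     = c(35, 40, 997, 42)
)

# Drop the 997 code before using hours as a number.
df <- asec_toy[asec_toy$uhrsworkt != 997, ]

# Weighted minutes per day: weekly hours * 60 minutes, spread over 7 days.
df$minutes      <- df$asecwt * df$uhrsworkt * 60 / 7
df$care_minutes <- ifelse(df$occ_type == "care", df$minutes, 0)

formal_time <- aggregate(cbind(care_minutes, minutes) ~ date + gender_parent,
                         data = df, FUN = sum)
formal_time$formal_care_time_proportion <- formal_time$care_minutes / formal_time$minutes
formal_time
```

For the mother rows, 35 weekly hours at a weight of 100 gives 100 * 35 * 60 / 7 = 30,000 weighted minutes per day, which is 35/75 of that group's total work minutes. The father's care row is dropped by the 997 filter, so that group's care share is 0 in this toy data.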

For the care board data itself we present an overall estimate and an estimate broken down by gender and parenthood.

## `summarise()` has grouped output by 'date', 'category_id'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'date', 'category_id'. You can override
## using the `.groups` argument.

Minutes Worked in the Informal Care Economy

To measure time spent in the informal care economy, we draw on data from the ATUS. Using the activity crosswalks described above, we categorize time use entries into specific types of care activities. We then calculate the total time spent on informal caregiving across the population. The function below is designed to compute the average amount of time dedicated to informal caregiving, providing a clearer picture of the scope and intensity of unpaid care work. This R function is doing the following steps:

Defining a function:
It defines a function called get_informal_time that takes one input:
- demo_group: the name of a demographic variable (e.g., gender, education) used to group the data.
Setting up for a loop across years:
- It initializes an empty list called informal_time to store yearly results.
- Starts a counter i for tracking the index during the loop.
Looping through each year:
For each year (sel_year) in yr_range$year, using a 5-year window:
- Sets year_min as the lower limit of that 5-year range.
Calculating informal care time:
- Filters case_year to include records in the 5-year window ending with sel_year.
- Adds two new columns:
  - category_id: the name of the demographic variable.
  - subcategory_id: the value of that variable for each person.
- Groups by care status (care_flag) and demographic category/subgroup.
- Calculates total informal care time using weighted total minutes of care (scaled to a 5-year average).
Calculating proportions:
- Within each demographic subgroup, it calculates the proportion of time spent on each care type (care vs. non-care).
- Adds a date column representing January 1st of the sel_year.
Storing and returning results:
- Each year’s results are stored in the informal_time list.
- After the loop, all results are combined into one dataframe (df) and returned.

In plain language:
This function calculates how much time people in each demographic group spend on informal care work, over rolling 5-year periods. It returns both the total and proportion of informal care time for each group and year.
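For one window, the total and proportion steps can be sketched as below. The rows are made up, and scaling by the window length is an assumption about how the five-year average is formed.

```r
# Toy case_year-style rows for one five-year window (made-up values).
cy <- data.frame(
  gender_parent = c("mother", "mother", "father", "father"),
  care_flag     = c("care", "non-care", "care", "non-care"),
  weight        = c(10, 10, 20, 20),       # daily weights
  total_time    = c(300, 900, 100, 1100)   # minutes per day
)

window <- 5

# Weighted minutes, scaled to a five-year average (ASSUMED scaling).
cy$weighted_min <- cy$weight * cy$total_time / window

informal_time <- aggregate(weighted_min ~ care_flag + gender_parent, data = cy, FUN = sum)

# Share of each group's time that falls in each care_flag category.
informal_time$proportion <- informal_time$weighted_min /
  ave(informal_time$weighted_min, informal_time$gender_parent, FUN = sum)
informal_time
```

The ave() call returns each group's total alongside every row, so the proportions for care and non-care sum to 1 within each demographic subgroup.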

As with the formal sector, for the Care Board statistics we present an overall estimate and a breakdown by gender and parenthood.

Valuing the Care Economy

Formal Care Economy Valuation.

Thus far, we have examined both the number of individuals employed in the formal care economy and the time they spend in care-related work. A final dimension of analysis involves assigning a monetary value to this labor. The code chunk below performs this valuation based on several assumptions. Specifically, we identify individuals working within the formal care economy and associate their roles with corresponding wage data. We then estimate the total value of formal care labor by multiplying the median wage for care-related occupations by the number of individuals employed in these roles. The resulting valuation is compared to U.S. GDP figures to contextualize the economic weight of the formal care sector. For the Care Board, we present both an overall valuation and disaggregated estimates by gender and parenthood status. This R function is doing the following steps:

Defining a function:
It creates a function called get_formal_value that takes two inputs:
- df: a dataframe (usually asec data)
- demo_group: a demographic variable to group by (e.g., race, gender, education)
Filtering the data:
- Removes rows where reported work hours are fluctuating (uhrsworkt == 997)
- Removes rows where income from wages (incwage) is zero (to focus on active earners)
Creating demographic labels:
- Adds two columns:
  - category_id: the name of the demographic variable
  - subcategory_id: the actual value of that demographic variable for each person
Grouping and calculating value of formal care work:
- Groups by year (date) and demographic group
- Calculates formal_value as the total income earned from formal care work (wage * weight), only for people employed in care occupations
Joining with GDP data:
- Joins the result with the us_gdp dataset to add daily GDP information for the corresponding year
Adjusting and scaling values:
- Converts total care earnings to a daily value by dividing by 365
- Calculates the proportion of GDP represented by care labor earnings
Final cleanup:
- Removes the gdp_daily column since it’s no longer needed
Returning the result:
Returns a dataframe with the total daily value of formal care work and its share of GDP, broken down by demographic group and year.

In plain language:
This function calculates how much income people earn from paid care work across different demographic groups and years, and shows how that income compares to the overall U.S. economy.
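The steps listed above can be sketched in base R as follows. The wage, weight, and GDP figures are made up, the grouping is reduced to gender_parent, and the name formal_value_proportion is a hypothetical label for the GDP share.

```r
# Toy daily GDP table (made-up annual GDP of 20 trillion dollars).
us_gdp_toy <- data.frame(date = as.Date("2020-01-01"), gdp_daily = 20e12 / 365)

# Toy ASEC-style rows (made-up values).
asec_toy <- data.frame(
  date          = as.Date("2020-01-01"),
  gender_parent = c("mother", "mother", "father", "father"),
  occ_type      = c("care", "non-care", "care", "care"),
  asecwt        = c(1000, 1000, 1000, 1000),
  uhrsworkt     = c(40, 40, 40, 997),
  incwage       = c(40000, 60000, 50000, 30000)
)

# Drop varying-hours records and records with zero wage income.
df <- asec_toy[asec_toy$uhrsworkt != 997 & asec_toy$incwage != 0, ]

# Weighted annual wage income, counted only for care occupations.
df$formal_value <- ifelse(df$occ_type == "care", df$incwage * df$asecwt, 0)
value <- aggregate(formal_value ~ date + gender_parent, data = df, FUN = sum)

# Attach daily GDP, convert earnings to a daily value, and take the share of GDP.
value <- merge(value, us_gdp_toy, by = "date")
value$formal_value            <- value$formal_value / 365
value$formal_value_proportion <- value$formal_value / value$gdp_daily
value$gdp_daily <- NULL
value
```

Because both the earnings and GDP are divided by 365, the GDP share is the same as it would be on an annual basis; the daily scaling only matters for comparing the dollar value with other per-day measures.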

The table below provides the data related to the valuation of the formal care force.

## `summarise()` has grouped output by 'date', 'category_id'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'date', 'category_id'. You can override
## using the `.groups` argument.

Informal Care Economy Valuation.

We now extend our valuation approach to the informal care economy. Using ATUS data, we estimate the amount of time individuals spend on informal caregiving activities. A key challenge in this process is determining how to assign a monetary value to this unpaid labor. One approach, explored in a related working paper, involves mapping informal care tasks to comparable occupations in the formal care economy—for instance, assigning the average wage of professional chefs to time spent on cooking. While this method offers a nuanced valuation, it raises concerns, as formal sector workers typically possess more training and experience, making their wages an imperfect proxy for informal labor.

To address this limitation and provide a conservative estimate, we adopt the federal minimum wage of $7.25 per hour as the baseline value for informal care labor. This wage reflects the minimum amount an individual is generally expected to earn in the U.S., allowing us to assess the value of informal care at a socially recognized floor. The code below implements this calculation by converting time spent on informal care (from minutes to hours) and multiplying it by $7.25. We then compare the estimated value of informal caregiving to U.S. GDP on a daily basis, presenting it as a proportion of overall economic activity.
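As a worked illustration of this arithmetic, the sketch below values a made-up daily total of informal care minutes at $7.25 per hour and compares it with a made-up daily GDP figure. Neither input is an estimate from the Care Board data.

```r
min_wage <- 7.25  # federal minimum wage, dollars per hour

# Made-up weighted total of informal care minutes per day across the population.
informal_minutes_per_day <- 6e10

# Convert minutes to hours and value each hour at the minimum wage.
informal_value_daily <- informal_minutes_per_day / 60 * min_wage

# Made-up annual GDP of 25 trillion dollars, expressed per day.
gdp_daily <- 25e12 / 365

# Informal care value as a proportion of daily GDP.
informal_value_proportion <- informal_value_daily / gdp_daily
c(value = informal_value_daily, proportion = informal_value_proportion)
```

Here 60 billion minutes is 1 billion hours, which is worth $7.25 billion per day at the minimum wage.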

Conclusion

This methodology document has outlined the comprehensive framework behind The Care Board, detailing the data sources, processing pipelines, statistical methods, and visualization strategies employed in its development. By bringing together multiple datasets, coding procedures, and analytical models, it provides a transparent account of how the indicators on The Care Board are generated, interpreted, and updated. Whether the data pertains to formal care infrastructure, informal caregiving time, or population-level needs, each component has been carefully constructed to provide a robust, scalable, and replicable platform for monitoring and analyzing care-related dynamics.

The aim of The Care Board is not just to present statistics, but to contextualize them—illuminating the patterns, disparities, and evolving trends that define the care economy across time and place. By linking rigorous data analysis with user-friendly visualization, the board supports decision-making for researchers, policymakers, and advocates who are invested in strengthening the care infrastructure. This document ensures that the metrics presented are not black boxes, but rather the product of thoughtful methodological choices grounded in social science and data science principles.

Looking forward, this methodology will remain a living framework—capable of evolving as new data becomes available, as care dynamics shift, and as users provide feedback. The ongoing development of The Care Board will continue to prioritize transparency, adaptability, and impact, ensuring it remains a vital tool for understanding and improving care systems in diverse communities.

Care Board Methodology

Joseph Bommarito

2025-04-01
