The Care Board is an online dashboard designed to present comprehensive statistics and insights on the care economy, a critical yet overlooked sector encompassing both paid and unpaid care activities. The care economy includes all tasks related to caring for oneself and others. It covers jobs like nursing, teaching, childcare, and assisting elderly relatives, among others. It also covers the informal, unpaid care work we do in our homes, like making dinner, caring for children, or washing clothes. These responsibilities form a significant portion of individuals’ daily lives, whether through professional roles or unpaid domestic labor.
Despite its essential role in sustaining individuals, families, and society, care work remains largely invisible within formal economic statistics. For instance, while the Bureau of Labor Statistics (BLS) tracks childcare provided by paid professionals, identical activities performed by parents and relatives remain unaccounted for. This discrepancy highlights the broader issue of how care work is valued and measured within traditional economic frameworks.
The Care Board aims to bridge this gap by providing a centralized platform for measuring, analyzing, and studying caregiving activities in both the formal and informal sectors. By developing novel statistics from publicly available data, the Care Board seeks to foster meaningful discussions among researchers, policymakers, and the general public, bringing greater visibility to the challenges faced by caregivers nationwide.
This document serves as the primary detailed methodology and repository for all statistics presented on The Care Board. The statistics developed for The Care Board offer a new perspective on the economy through the lens of care. Where applicable, links to working papers or peer-reviewed articles will be provided.
All data presented and available for download on The Care Board, along with code and necessary information for replication, are discussed in this document. Each section guides users through the formation of a given statistic, from raw data to its final presentation. Any methodological choices, hurdles, and assumptions are documented for transparency.
To use this document, navigate to the section for the statistic in which you are most interested. In each section, you will find the raw data input requirements, code, and relevant explanations. If using any data or code on The Care Board, please ensure proper attribution. Publications and reports should cite the appropriate version of the following:
Misty L. Heggeness, Joseph Bommarito, and Lucie Prewitt. The Care Board: Version 1.0 [dataset]. Lawrence, KS: Kansas Population Center, 2025. https://thecareboard.org
Before running any code, the following preliminary tasks need to be completed. The code provided at the beginning of this document must be run before the code in any other section. This code installs the relevant packages and sets the working directory used by all other sections.
Failure to run this code may result in errors.
Ensure that the working directory is updated to fit your data file location. Changing the working directory is needed to successfully run the code in this document.
The first step is to install the required packages. While some statistics require some specific packages to run, other packages are needed for more general data handling. These packages are loaded and described below.
pacman: a package used to load other packages. It checks whether each requested package is installed on the user’s computer. If a package is not installed, pacman installs it before loading it from the library; if it is already installed, pacman skips installation and loads it directly.
tidyverse: a commonly used collection of data handling packages in the R environment. The tidyverse is used to provide more streamlined and readable code, with the goal of making the replication files easier to follow. Whenever possible, code in this document is written in the tidyverse style as opposed to base R.
haven: a package used for reading and writing certain data formats. In this documentation, it is mostly used to write data files as Stata .dta files.
data.table: a package used to efficiently load and write csv files. Large csv files can be resource intensive to load as a dataset. This package allows them to be loaded as a table and then worked with directly in the R environment.
Hmisc: a package used to handle survey data. In the code below, it is primarily used to apply survey weights to statistics, producing population-valid estimates.
janitor: a package that provides simple functions for cleaning and formatting data. It is especially useful for cleaning column names, detecting and removing empty rows or columns, and summarizing tabular data.
DT: a package used exclusively for this RMD file to provide more readable tables of the data within the HTML output.
skimr: a package used specifically for this RMD file to provide descriptions of the appropriate datasets.
DescTools: a package that provides a variety of functions for describing datasets; most notably, we use its Gini function.
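As a minimal sketch of the loading step described above, the packages can be collected in a character vector and handed to pacman. The helper name `load_care_board_packages` is hypothetical and is not taken from the original code; only the package names come from the list above.

```r
# Packages named in the list above.
required_packages <- c(
  "tidyverse", "haven", "data.table", "Hmisc",
  "janitor", "DT", "skimr", "DescTools"
)

# Hypothetical helper: installs pacman if it is missing, then lets pacman
# install (when needed) and load every package in `pkgs`.
load_care_board_packages <- function(pkgs = required_packages) {
  if (!requireNamespace("pacman", quietly = TRUE)) {
    install.packages("pacman")
  }
  pacman::p_load(char = pkgs)
}

# Call once at the top of a session (this may install packages):
# load_care_board_packages()
```

Keeping the names in one vector means a new dependency only needs to be added in one place.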
To load data, you must set the working directory to the file location where your data is stored. This code utilizes multiple folders based out of a single CareBoard directory. Set your working directory to a general folder where the folders you download will be stored. The code in this document will switch between directories as needed given the assumption they are all in the correct repository. This step is required to execute the code without errors.
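One way to organize this step is to define the root folder once and build every other path from it. The sketch below assumes a root folder at `~/CareBoard`, which is a placeholder to be replaced with your own location; the helper `care_board_path` is likewise hypothetical.

```r
# Hypothetical root folder; replace with the location of your CareBoard copy.
care_board_dir <- file.path("~", "CareBoard")

# Hypothetical helper: build paths relative to the root folder.
care_board_path <- function(...) file.path(care_board_dir, ...)

# Folder that holds the preliminary code and data.
prelim_dir <- care_board_path("01_preliminary-code-and-data")

# Only switch directories when the root folder actually exists.
if (dir.exists(care_board_dir)) {
  setwd(care_board_dir)
}
```

Building paths with `file.path()` avoids hard-coding separators and keeps the code portable across operating systems.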
Before developing the specific statistics needed for The Care Board, the raw microdata files need to be compiled and converted into a proper format. The code in this section provides a methodology to pull in all required data, clean it as necessary, and export it to the required locations. This project draws on a wide variety of data to compile its statistics, but the core data are microdata from surveys conducted by the U.S. Census Bureau and the Bureau of Labor Statistics. The monthly Current Population Survey (CPS), its yearly Annual Social and Economic Supplement (ASEC), and the annual American Time Use Survey (ATUS) are the largest data sources for this project. We pull these data from the Minnesota Population Center (MPC) Integrated Public Use Microdata Series (IPUMS) project. The first step is introducing how these data are loaded and transformed.
The code below produces a major source of raw data used in the production of The Care Board statistics: CPS monthly and CPS ASEC yearly microdata. The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) is a special supplement to the monthly Current Population Survey (CPS), which is conducted by the U.S. Census Bureau and the Bureau of Labor Statistics (BLS). The monthly CPS is primarily focused on labor force characteristics, such as employment, unemployment, and workforce participation. The CPS ASEC goes beyond this by collecting detailed information on income, poverty, health insurance coverage, and demographic characteristics, making it the primary source of data for measuring income inequality and economic well-being in the U.S.
The monthly CPS is a regular survey of around 60,000 households conducted every month. The CPS ASEC is conducted once a year, typically in March, and includes both regular CPS respondents and additional oversampled households to improve estimates for specific population groups. The CPS ASEC thus expands the sample size compared to the monthly CPS, which improves data accuracy, especially for poverty and income statistics. The CPS Monthly data is used for labor force statistics like the unemployment rate, while the CPS ASEC data is used for official poverty estimates, income distribution studies, and health insurance coverage statistics. We utilize the CPS ASEC data to compile data on income and earnings for those working in the formal care economy.
The code below uses an IPUMS API key to download the IPUMS microdata files, which include the relevant information needed to develop statistics for The Care Board. IPUMS (Integrated Public Use Microdata Series) is a project that provides harmonized microdata from various national and international surveys and censuses. It is maintained by the Minnesota Population Center at the University of Minnesota. IPUMS makes large-scale individual- and household-level databases more accessible and comparable over time and across geographic regions.
The data available from IPUMS can be accessed through an API key. In order to replicate the code chunk below you will need to insert your personal key into the slot for set_ipums_api_key. For information on how to get a personal key visit https://www.ipums.org/. An IPUMS API key is free to the public and researchers.
IPUMS CPS data, including the ASEC supplements, can be cited as follows.
Sarah Flood, Miriam King, Renae Rodgers, Steven Ruggles, J. Robert Warren, Daniel Backman, Annie Chen, Grace Cooper, Stephanie Richards, Megan Schouweiler, and Michael Westberry. IPUMS CPS: Version 12.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D030.V12.0
When downloading from this repository, we first need to identify the sample IDs that represent the required samples. A set of files in the 01_preliminary-code-and-data folder stores the names of the required samples for both the CPS ASEC and the CPS Monthly data pulls. The key difference between these sample lists is that the ASEC list contains only the yearly CPS ASEC samples, while the CPS list contains all monthly samples.
Following the creation of the sample lists, we also need to list the variables that we want to pull from the IPUMS repository. We create three sets of variables. var_common refers to variables that are present in both the CPS ASEC and the CPS Monthly data. var_asec refers to variables that are present only in the CPS ASEC data. These variables refer mostly to income and earnings data. var_cps refers to variables that are present in the CPS Monthly data alone. These variables refer mostly to workforce classification variables. The chunk below populates these lists.
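To illustrate the structure of these lists, the sketch below fills each one with a small subset of IPUMS CPS variable names. The subset is an assumption chosen for illustration and is not the full set used by The Care Board; `asec_request` and `cps_request` are hypothetical names.

```r
# Illustrative subset only; the full lists used by the project are longer.
var_common <- c("AGE", "SEX", "RACE", "MARST", "HISPAN", "EMPSTAT", "EDUC")

# Present only in the CPS ASEC: weights, income, and poverty.
var_asec <- c("ASECWT", "INCWAGE", "POVERTY")

# Present only in the CPS Monthly: weights and workforce classification.
var_cps <- c("WTFINL", "LABFORCE", "CLASSWKR")

# Hypothetical names for the variable sets requested for each extract.
asec_request <- c(var_common, var_asec)
cps_request  <- c(var_common, var_cps)
```

Separating the shared variables from the survey-specific ones means a change to `var_common` propagates to both requests automatically.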
The code below uses the IPUMS API to pull data from the yearly CPS ASEC and the CPS Monthly. This R chunk is currently set NOT to run when this markdown file is run, so that all of the following code uses the same iteration of the API data. The final results of this data extract can be found in the GitHub repository. If you wish to modify the data downloaded from IPUMS, change any of the samples or variables as desired and then run the chunk below in an R script. If you simply wish to replicate the data used by The Care Board, you can skip this step and load the ddi files already in the GitHub repository.
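For readers who want to build their own extract, the workflow can be sketched with the ipumsr package as follows. This is a sketch under stated assumptions, not the project's own chunk: the wrapper name `pull_cps_extract`, the description text, and the sample ID in the example call are illustrative, and `define_extract_micro()` assumes a recent ipumsr release.

```r
# Hypothetical wrapper around the ipumsr extract workflow.
pull_cps_extract <- function(samples, variables,
                             description = "Care Board CPS extract",
                             download_dir = getwd()) {
  # Describe the extract: which collection, samples, and variables.
  extract <- ipumsr::define_extract_micro(
    collection  = "cps",
    description = description,
    samples     = samples,
    variables   = variables
  )
  # Submit the request, wait for IPUMS to finish building it, then download.
  submitted <- ipumsr::submit_extract(extract)
  ready     <- ipumsr::wait_for_extract(submitted)
  # Returns the path to the downloaded .xml (DDI) file.
  ipumsr::download_extract(ready, download_dir = download_dir)
}

# Example call (not run; requires a registered key and network access):
# ipumsr::set_ipums_api_key("your-key-here")
# ddi_path <- pull_cps_extract(
#   samples   = "cps2023_03s",          # illustrative ASEC sample ID
#   variables = c("AGE", "SEX", "INCWAGE")
# )
```

Wrapping the steps in a function keeps the download from running by accident when the file is sourced.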
As a note, the IPUMS data API does not currently fully support the download of ATUS data. We provide the .xml and .dat.gz files associated with the data in the GitHub repository for The Care Board. To modify the ATUS data download by changing samples or variables, users will need to conduct a manual extract from IPUMS. The interface for this manual extract, along with instructions, can be found at https://timeuse.ipums.org/. IPUMS kindly requests that usage of this data be cited as follows.
Sarah M. Flood, Liana C. Sayer, Daniel Backman, and Annie Chen. American Time Use Survey Data Extract Builder: Version 3.2 [dataset]. College Park, MD: University of Maryland and Minneapolis, MN: IPUMS, 2023. https://doi.org/10.18128/D060.V3.2
After running the above code OR downloading the data from the GitHub repository, we should have three sets of .xml and .dat.gz files. These files contain the metadata and the zipped microdata downloaded from IPUMS. The code below is used to load this data into the R environment.
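A minimal sketch of this loading step with ipumsr is shown below. The helper name `load_ipums_extract` and the file name in the example call are hypothetical; the sketch assumes each .xml file sits in the same folder as its .dat.gz file.

```r
# Hypothetical helper: read one IPUMS extract from its DDI (.xml) file.
load_ipums_extract <- function(ddi_file) {
  # The DDI holds the metadata: variable names, labels, and column positions.
  ddi <- ipumsr::read_ipums_ddi(ddi_file)
  # read_ipums_micro() locates the .dat.gz file named inside the DDI.
  data <- ipumsr::read_ipums_micro(ddi)
  list(
    data       = data,
    var_labels = ipumsr::ipums_var_info(ddi)$var_label
  )
}

# Example call (not run; the file name is a placeholder):
# asec <- load_ipums_extract("cps_asec_extract.xml")
# asec$var_labels
```

Returning the variable labels alongside the data makes it easy to check what each extract contains before any cleaning begins.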
After loading the data into the R environment, we get the variable labels for each of the files. Before creating the statistics, we need to clean the data and ensure consistency between different samples. The CPS Monthly and CPS ASEC data often have variables that measure the same thing as the ATUS data but are coded slightly differently. Thus, we need to ensure that all variables are coded correctly. This section does that while also providing information on the variety of variables throughout the samples.
The list below shows the variables gathered from the CPS ASEC data. These variables have not yet been modified and represent the exact values received when downloaded directly from the IPUMS repository.
## [1] "Survey year"
## [2] "Household serial number"
## [3] "Month"
## [4] "CPSID, household record"
## [5] "Flag for ASEC"
## [6] "Flag for the 3/8 file 2014"
## [7] "Annual Social and Economic Supplement Household weight"
## [8] "Region and division"
## [9] "State (FIPS code)"
## [10] "Person number in sample unit"
## [11] "CPSID, person record"
## [12] "Validated Longitudinal Identifier"
## [13] "Annual Social and Economic Supplement Weight"
## [14] "Age"
## [15] "Sex"
## [16] "Race"
## [17] "Marital status"
## [18] "Person number of first mother (from programming)"
## [19] "Person number of first father (from programming)"
## [20] "Number of own family members in hh"
## [21] "Number of own children in household"
## [22] "Age of youngest own child in household"
## [23] "Hispanic origin"
## [24] "Employment status"
## [25] "Occupation, 2010 basis"
## [26] "Industry, 1990 basis"
## [27] "Hours usually worked per week at all jobs"
## [28] "Hours worked last week"
## [29] "Absent from work last week"
## [30] "Reason for absence from work"
## [31] "Full or part time status"
## [32] "Educational attainment recode"
## [33] "Earnings weight"
## [34] "Wage and salary income"
## [35] "Original poverty status (PUMS original)"
The list below shows the variables gathered from the CPS Monthly data. These variables have not yet been modified and represent the exact values received when downloaded directly from the IPUMS repository.
## [1] "Survey year"
## [2] "Household serial number"
## [3] "Month"
## [4] "Household weight, Basic Monthly"
## [5] "CPSID, household record"
## [6] "Flag for ASEC"
## [7] "Region and division"
## [8] "State (FIPS code)"
## [9] "Person number in sample unit"
## [10] "Final Basic Weight"
## [11] "CPSID, person record"
## [12] "Validated Longitudinal Identifier"
## [13] "Age"
## [14] "Sex"
## [15] "Race"
## [16] "Marital status"
## [17] "Person number of first mother (from programming)"
## [18] "Person number of first father (from programming)"
## [19] "Person number of spouse (from programming)"
## [20] "Number of own family members in hh"
## [21] "Number of own children in household"
## [22] "Age of youngest own child in household"
## [23] "Hispanic origin"
## [24] "Employment status"
## [25] "Labor force status"
## [26] "Occupation, 2010 basis"
## [27] "Industry, 1990 basis"
## [28] "Class of worker "
## [29] "Hours worked last week"
## [30] "Absent from work last week"
## [31] "Reason for absence from work"
## [32] "Full or part time status"
## [33] "Major activity (NILF)"
## [34] "Educational attainment recode"
## [35] "Personal care limitation"
## [36] "Composite Weight for replicating BLS labor force estimates"
## [37] "In the last week, telework or work at home for pay"
The list below shows the variables gathered from the ATUS data. These variables have not yet been modified and represent the exact values received when downloaded directly from the IPUMS repository.
## [1] "Survey year"
## [2] "ATUS Case ID"
## [3] "Household serial number"
## [4] "Scrambled pseudo primary sampling unit (PSU) collapsed stratum "
## [5] "FIPS State Code"
## [6] "Number of people in household"
## [7] "Family income"
## [8] "Number of adults in household"
## [9] "Time first household child woke up"
## [10] "Time last household child went to bed"
## [11] "Household income greater or less than 185% of poverty level (EHM)"
## [12] "Person number (general)"
## [13] "Person line number"
## [14] "ATUS interview day of the week"
## [15] "Person weight, 2006 methodology"
## [16] "Person weight, 2020 methodology"
## [17] "Age"
## [18] "Sex"
## [19] "Race"
## [20] "Hispanic origin"
## [21] "Marital status"
## [22] "Highest level of school completed"
## [23] "Labor force status"
## [24] "General occupation category, main job"
## [25] "Detailed occupation category, main job (CPS)"
## [26] "Weekly earnings"
## [27] "Hours worked last week (CPS)"
## [28] "Employment status (spouse or partner)"
## [29] "Unique Longitudinal CPS Identifier"
## [30] "Eldercare provided in last 3 months"
## [31] "Age of youngest own child (from programming)"
## [32] "Number of own children (from programming)"
## [33] "Activity line number"
## [34] "Activity"
## [35] "Duration of activity (extended version)"
## [36] "Duration of activity"
## [37] "Time spent during activity on secondary child care of all children"
## [38] "Time spent during activity on secondary child care of own children"
## [39] "Time spent during activity on secondary eldercare for household and non-household members"
## [40] "Activity start time"
## [41] "Activity stop time"
The function below presents a methodology for comparing variables across data sets. This function takes the variable name in each data set and compares the values and labels side by side. For example, each data set has a variable for Hispanic origin, but the data sets code it slightly differently. When given the Hispanic origin variable, the lookup_compare function lists the different values so they can be checked for consistency. In the case that different samples have different values, we will need to recode them before moving forward.
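The comparison described above amounts to a full join of each survey's value-label lookup on the numeric code. The base R sketch below shows one possible way to do this; the helper name `compare_value_labels` and the toy lookups are hypothetical and are not the project's own function.

```r
# Hypothetical helper: each argument is a data frame with columns
# `val` (numeric code) and `lbl` (value label) for one survey.
compare_value_labels <- function(asec, cps, atus) {
  rename_lbl <- function(lookup, suffix) {
    names(lookup)[names(lookup) == "lbl"] <- paste0("lbl_", suffix)
    lookup
  }
  lookups <- list(
    rename_lbl(asec, "asec"),
    rename_lbl(cps, "cps"),
    rename_lbl(atus, "atus")
  )
  # Full outer join on the code, so codes missing from a survey show NA.
  merged <- Reduce(function(x, y) merge(x, y, by = "val", all = TRUE), lookups)
  merged[order(merged$val), ]
}

# Toy lookups: code 100 means different things in the two surveys.
cps_like  <- data.frame(val = c(0, 100), lbl = c("Not Hispanic", "Mexican"))
atus_like <- data.frame(val = 100, lbl = "Not Hispanic")
hispan_check <- compare_value_labels(cps_like, cps_like, atus_like)
print(hispan_check)
```

A row where the label columns disagree, or where one of them is NA, marks a code that must be recoded before the surveys can be combined.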
In addition to the variables generated directly from IPUMS, we create a few other variables of interest. These variables represent recoding numeric variables into categorical variables or combining multiple variables into a single variable for analysis. Each of these is coded specifically for The Care Board project.
This variable groups individual respondents by age and acts as a categorical classifier. Respondents under the age of 18 are assigned the category “Under 18”, and respondents over the age of sixty-five are assigned the category “Over65”. Apart from the 18-24 category, the remaining categories represent ten-year increments.
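The grouping described above can be sketched with base R's `cut()`. The function name `age_group` and the intermediate labels are hypothetical, and the sketch assumes that age 65 itself falls in the “Over65” category, which the text does not state explicitly.

```r
# Hypothetical helper: bin integer ages into the categories described above.
# Assumption: age 65 is placed in "Over65".
age_group <- function(age) {
  cut(
    age,
    breaks = c(-Inf, 17, 24, 34, 44, 54, 64, Inf),
    labels = c("Under 18", "18-24", "25-34", "35-44",
               "45-54", "55-64", "Over65")
  )
}

example_ages <- c(3, 17, 18, 24, 25, 54, 64, 65, 80)
print(data.frame(age = example_ages, age_group = age_group(example_ages)))
```

Because `cut()` uses right-closed intervals by default, each break is the highest age included in the bin to its left.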
This variable identifies individuals who meet the labor economics definition of being in a “prime age” bin. Labor economists define prime age as those individuals aged 25 to 54. This age category represents people who tend to be most productive within the workforce. The ages are typically after higher education and before retirement.
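As a small sketch of this definition, the indicator can be written as a single comparison. The function name `is_prime_age` is hypothetical, and treating missing ages as not prime age is an assumption.

```r
# Hypothetical helper: TRUE for ages 25 through 54 inclusive.
# Assumption: a missing age is treated as not prime age.
is_prime_age <- function(age) {
  !is.na(age) & age >= 25 & age <= 54
}

print(is_prime_age(c(24, 25, 54, 55, NA)))
```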
This variable looks within a household and identifies the age of the youngest child, placing that value into age bins. The bins are under 5, 5-11, and 12-17, representing different stages of a child’s growth. An additional category of eighteen plus represents adult children living with their parents, while the category NIU represents households without any children. This variable can be used as a categorical variable in place of the numeric child age variable when desired.
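The binning can be sketched as below. The function name `youngest_child_group`, the exact category labels, and the use of code 99 to mark households without children are assumptions for illustration.

```r
# Hypothetical helper: bin the age of the youngest own child.
# Assumption: `niu_code` (99 here) marks "no own children in household".
youngest_child_group <- function(yngch, niu_code = 99) {
  out <- as.character(cut(
    yngch,
    breaks = c(-Inf, 4, 11, 17, Inf),
    labels = c("Under 5", "5-11", "12-17", "18+")
  ))
  # Overwrite the NIU code (and missing values) after binning.
  out[is.na(yngch) | yngch == niu_code] <- "NIU"
  out
}

print(youngest_child_group(c(0, 4, 5, 11, 12, 17, 18, 40, 99, NA)))
```

The NIU code has to be handled after the call to `cut()`, otherwise 99 would be counted as an adult child.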
This variable represents an interaction between the sex and parenthood status of an individual. This can be one of four unique values representing both the case where a respondent is male or female and the case where the respondent is a parent living with their children in the home or childless (including parents whose children live elsewhere). Parenthood includes step-parents and parents of both biological and adopted children.
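One possible sketch of this interaction is shown below. It assumes that sex has already been recoded to "Male" and "Female" and that parenthood is identified by a positive count of own children in the household; the function name `sex_parent` and the output labels are hypothetical.

```r
# Hypothetical helper: combine recoded sex with parenthood status.
# Assumption: a respondent is a parent when nchild (own children in the
# household) is greater than zero.
sex_parent <- function(sex, nchild) {
  parent <- ifelse(nchild > 0, "Parent", "Non-parent")
  paste(sex, parent)
}

print(sex_parent(c("Male", "Female", "Female", "Male"), c(0, 2, 0, 1)))
```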
This variable coalesces the race variable and the Hispanic variable to create a single value of race_ethnicity. It is common practice to merge these variables by treating Hispanic origin as its own category alongside the race categories. In the case where the respondent is not Hispanic, this variable represents their reported survey race.
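The merge described above can be sketched as a single conditional. The sketch assumes that both inputs have already been recoded to harmonized text labels; the function name `make_race_ethnicity` is hypothetical.

```r
# Hypothetical helper: Hispanic origin takes precedence over reported race.
# Assumption: `hispan` holds "Hispanic" or "Not Hispanic" and `race` holds
# harmonized race labels.
make_race_ethnicity <- function(race, hispan) {
  ifelse(hispan == "Hispanic", "Hispanic", race)
}

print(make_race_ethnicity(
  race   = c("White", "Black", "White"),
  hispan = c("Not Hispanic", "Not Hispanic", "Hispanic")
))
```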
This variable represents a combination of the variables wkstat and empstat. The variable empstat identifies a respondent’s labor force participation status as employed, unemployed, or not in the labor force (NILF). The variable wkstat identifies a worker as full- or part-time on the condition that they are in the labor force. Labor status has four unique categories: full-time, part-time, unemployed, and NILF.
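A sketch of how the two variables might be combined is given below. The input labels ("Employed", "Full-time", and so on) and the function name `labor_status` are assumptions for illustration, since the exact recoded labels are not shown in full.

```r
# Hypothetical helper: combine recoded empstat and wkstat into four categories.
# Assumption: empstat uses "Employed", "Unemployed", "NILF" and wkstat uses
# "Full-time" and "Part-time" for employed respondents.
labor_status <- function(empstat, wkstat) {
  out <- rep(NA_character_, length(empstat))
  out[empstat %in% "NILF"] <- "NILF"
  out[empstat %in% "Unemployed"] <- "Unemployed"
  employed <- empstat %in% "Employed"
  out[employed & wkstat %in% "Full-time"] <- "Full-time"
  out[employed & wkstat %in% "Part-time"] <- "Part-time"
  out
}

print(labor_status(
  empstat = c("Employed", "Employed", "Unemployed", "NILF"),
  wkstat  = c("Full-time", "Part-time", "Unemployed", "NIU")
))
```

Using `%in%` rather than `==` keeps missing values from propagating into the index, so respondents with missing inputs simply stay NA.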
This variable provides the name of the month as opposed to a numerical representation of the month for easier readability. This variable is most important for the CPS Monthly data that has monthly iterations of the data as opposed to the yearly CPS ASEC and ATUS.
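Base R ships with a `month.name` constant, so the conversion can be sketched as a simple lookup. The function name `month_label` is hypothetical.

```r
# Hypothetical helper: convert a month number (1-12) to its English name.
month_label <- function(month) {
  month.name[month]
}

print(month_label(c(1, 3, 12)))
```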
Now that we have coded our major categorical variables, we also need to ensure they are coded the same way across the different surveys. To do this, we use the lookup_compare function that we created previously. Using this function, we can see the values of each variable in the CPS ASEC, CPS Monthly, and ATUS data.
For each variable, the lookup_compare function provides the value as coded in CPS ASEC, CPS Monthly, and ATUS. The CPS ASEC and CPS Monthly codings are generally the same, but the ATUS coding often differs. For example, the first variable we look at represents Hispanic origin.
## # A tibble: 30 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <chr>
## 1 0 Not Hispanic Not Hispanic <NA>
## 2 100 Mexican Mexican Not Hispanic
## 3 102 Mexican American Mexican American <NA>
## 4 103 Mexicano/Mexicana Mexicano/Mexicana <NA>
## 5 104 Chicano/Chicana Chicano/Chicana <NA>
## 6 108 Mexican (Mexicano) Mexican (Mexicano) <NA>
## 7 109 Mexicano/Chicano Mexicano/Chicano <NA>
## 8 200 Puerto Rican Puerto Rican <NA>
## 9 300 Cuban Cuban <NA>
## 10 400 Dominican Dominican <NA>
## # ℹ 20 more rows
As can be seen, the CPS ASEC and CPS Monthly data share the same detailed coding, while ATUS assigns different meanings to the same codes: code 100, for example, is labeled Mexican in the CPS data but Not Hispanic in ATUS. We thus need to ensure all data are coded in the same format. The functions below provide the methodology for converting the ATUS and CPS Monthly data into the final values for the Hispanic origin variable.
## # A tibble: 30 × 6
## val lbl_asec lbl_cps hispan_cps lbl_atus hispan_atus
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 0 Not Hispanic Not Hispanic Not Hispanic <NA> Hispanic
## 2 100 Mexican Mexican Hispanic Not His… Not Hispan…
## 3 102 Mexican American Mexican American Hispanic <NA> Hispanic
## 4 103 Mexicano/Mexicana Mexicano/Mexicana Hispanic <NA> Hispanic
## 5 104 Chicano/Chicana Chicano/Chicana Hispanic <NA> Hispanic
## 6 108 Mexican (Mexicano) Mexican (Mexicano) Hispanic <NA> Hispanic
## 7 109 Mexicano/Chicano Mexicano/Chicano Hispanic <NA> Hispanic
## 8 200 Puerto Rican Puerto Rican Hispanic <NA> Hispanic
## 9 300 Cuban Cuban Hispanic <NA> Hispanic
## 10 400 Dominican Dominican Hispanic <NA> Hispanic
## # ℹ 20 more rows
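The binary coding shown in the table above can be sketched as two small recode functions. This is a sketch based only on the ten rows displayed: it assumes that code 0 is the only non-Hispanic code in the CPS data and that code 100 is the only non-Hispanic code in ATUS, and the function names are hypothetical. Rows not shown may include special codes that need separate handling.

```r
# Hypothetical helpers based on the rows displayed above.
# Assumption: CPS code 0 is the only "Not Hispanic" code.
recode_hispan_cps <- function(val) {
  ifelse(val == 0, "Not Hispanic", "Hispanic")
}

# Assumption: ATUS code 100 is the only "Not Hispanic" code.
recode_hispan_atus <- function(val) {
  ifelse(val == 100, "Not Hispanic", "Hispanic")
}

print(recode_hispan_cps(c(0, 100, 200)))
print(recode_hispan_atus(c(100, 102, 200)))
```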
The code below provides the same methodology for recoding the race variables to be identical.
## # A tibble: 62 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <chr>
## 1 100 White White White on…
## 2 200 Black Black White-Bl…
## 3 300 American Indian/Aleut/Eskimo American Indian/Aleut/Eskimo White-Bl…
## 4 650 Asian or Pacific Islander Asian or Pacific Islander <NA>
## 5 651 Asian only Asian only <NA>
## 6 652 Hawaiian/Pacific Islander only Hawaiian/Pacific Islander only <NA>
## 7 700 Other (single) race, n.e.c. Other (single) race, n.e.c. <NA>
## 8 801 White-Black White-Black <NA>
## 9 802 White-American Indian White-American Indian <NA>
## 10 803 White-Asian White-Asian <NA>
## # ℹ 52 more rows
## # A tibble: 62 × 6
## val lbl_asec lbl_cps race_cps lbl_atus race_atus
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 100 White White White White o… White
## 2 200 Black Black Black White-B… Two or M…
## 3 300 American Indian/Aleut/Eskimo American In… America… White-B… Two or M…
## 4 650 Asian or Pacific Islander Asian or Pa… Asian/P… <NA> Two or M…
## 5 651 Asian only Asian only Asian/P… <NA> Two or M…
## 6 652 Hawaiian/Pacific Islander only Hawaiian/Pa… Asian/P… <NA> Two or M…
## 7 700 Other (single) race, n.e.c. Other (sing… Two or … <NA> Two or M…
## 8 801 White-Black White-Black Two or … <NA> Two or M…
## 9 802 White-American Indian White-Ameri… Two or … <NA> Two or M…
## 10 803 White-Asian White-Asian Two or … <NA> Two or M…
## # ℹ 52 more rows
The code below provides the same methodology for recoding the sex variable.
## # A tibble: 4 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <chr>
## 1 1 Male Male Male
## 2 2 Female Female Female
## 3 9 NIU NIU <NA>
## 4 99 <NA> <NA> NIU (Not in universe)
## # A tibble: 4 × 5
## val lbl_asec lbl_cps lbl_atus sex
## <dbl> <chr> <chr> <chr> <chr>
## 1 1 Male Male Male Male
## 2 2 Female Female Female Female
## 3 9 NIU NIU <NA> NIU
## 4 99 <NA> <NA> NIU (Not in universe) NIU
The code below provides the same methodology for recoding the marital status variable.
## # A tibble: 9 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <chr>
## 1 1 Married, spouse present Married, spouse present Married - spouse present
## 2 2 Married, spouse absent Married, spouse absent Married - spouse absent
## 3 3 Separated Separated Widowed
## 4 4 Divorced Divorced Divorced
## 5 5 Widowed Widowed Separated
## 6 6 Never married/single Never married/single Never married
## 7 7 Widowed or Divorced Widowed or Divorced <NA>
## 8 9 NIU NIU <NA>
## 9 99 <NA> <NA> NIU (Not in universe)
## # A tibble: 9 × 5
## val lbl_asec lbl_cps lbl_atus marst
## <dbl> <chr> <chr> <chr> <chr>
## 1 1 Married, spouse present Married, spouse present Married - spouse … Marr…
## 2 2 Married, spouse absent Married, spouse absent Married - spouse … Marr…
## 3 3 Separated Separated Widowed Sepa…
## 4 4 Divorced Divorced Divorced Sepa…
## 5 5 Widowed Widowed Separated Sepa…
## 6 6 Never married/single Never married/single Never married Sing…
## 7 7 Widowed or Divorced Widowed or Divorced <NA> Sepa…
## 8 9 NIU NIU <NA> NIU
## 9 99 <NA> <NA> NIU (Not in unive… NIU
The code below provides the same methodology for recoding the education variable.
## # A tibble: 42 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <chr>
## 1 0 NIU or no schooling NIU or no schooling <NA>
## 2 1 NIU or blank NIU or blank <NA>
## 3 2 None or preschool None or preschool <NA>
## 4 10 Grades 1, 2, 3, or 4 Grades 1, 2, 3, or 4 Less than 1st grade
## 5 11 Grade 1 Grade 1 1st, 2nd, 3rd, or 4th grade
## 6 12 Grade 2 Grade 2 5th or 6th grade
## 7 13 Grade 3 Grade 3 7th or 8th grade
## 8 14 Grade 4 Grade 4 9th grade
## 9 20 Grades 5 or 6 Grades 5 or 6 High school graduate - GED
## 10 21 Grade 5 Grade 5 High school graduate - diplo…
## # ℹ 32 more rows
## # A tibble: 42 × 6
## val lbl_asec lbl_cps educ_cps lbl_atus educ_atus
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 0 NIU or no schooling NIU or no schooling NIU <NA> 0
## 2 1 NIU or blank NIU or blank NIU <NA> 1
## 3 2 None or preschool None or preschool No HS Dip… <NA> 2
## 4 10 Grades 1, 2, 3, or 4 Grades 1, 2, 3, or 4 No HS Dip… Less th… No HS Di…
## 5 11 Grade 1 Grade 1 No HS Dip… 1st, 2n… No HS Di…
## 6 12 Grade 2 Grade 2 No HS Dip… 5th or … No HS Di…
## 7 13 Grade 3 Grade 3 No HS Dip… 7th or … No HS Di…
## 8 14 Grade 4 Grade 4 No HS Dip… 9th gra… No HS Di…
## 9 20 Grades 5 or 6 Grades 5 or 6 No HS Dip… High sc… High Sch…
## 10 21 Grade 5 Grade 5 No HS Dip… High sc… High Sch…
## # ℹ 32 more rows
The code below provides the same methodology for the poverty variable.
## # A tibble: 13 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <lgl> <chr>
## 1 0 NIU NA <NA>
## 2 10 Below poverty NA HH income less than…
## 3 20 Above poverty NA HH income greater t…
## 4 21 100-124 percent of the low-income level NA <NA>
## 5 22 125-149 percent of the low-income level NA <NA>
## 6 23 150 percent and above the low-income level NA <NA>
## 7 NA <NA> NA <NA>
## 8 11 <NA> NA HH income less than…
## 9 12 <NA> NA HH income equal to …
## 10 96 <NA> NA Refused
## 11 97 <NA> NA Don't know
## 12 98 <NA> NA Blank
## 13 99 <NA> NA NIU (Not in univers…
## # A tibble: 13 × 6
## val lbl_asec pov_asec lbl_cps lbl_atus pov_atus
## <dbl> <chr> <chr> <lgl> <chr> <chr>
## 1 0 NIU NIU NA <NA> NIU
## 2 10 Below poverty Below P… NA HH inco… Below P…
## 3 20 Above poverty Above P… NA HH inco… Above P…
## 4 21 100-124 percent of the low-income l… 100-124… NA <NA> NIU
## 5 22 125-149 percent of the low-income l… 125-149… NA <NA> NIU
## 6 23 150 percent and above the low-incom… 150+ Pe… NA <NA> NIU
## 7 NA <NA> <NA> NA <NA> <NA>
## 8 11 <NA> 11 NA HH inco… Below P…
## 9 12 <NA> 12 NA HH inco… Below P…
## 10 96 <NA> 96 NA Refused NIU
## 11 97 <NA> 97 NA Don't k… NIU
## 12 98 <NA> 98 NA Blank NIU
## 13 99 <NA> 99 NA NIU (No… NIU
The code below provides the same methodology for the Labor Force Status variable.
## val lbl_asec lbl_cps lbl_atus
## 1 NA NA <NA> <NA>
## 2 0 NA NIU <NA>
## 3 1 NA No, not in the labor force Employed - at work
## 4 2 NA Yes, in the labor force Employed - absent
## 5 3 NA <NA> Unemployed - on layoff
## 6 4 NA <NA> Unemployed - looking
## 7 5 NA <NA> Not in labor force
## 8 99 NA <NA> NIU (Not in universe)
## val lbl_asec lbl_cps labforce_cps
## 1 NA NA <NA> <NA>
## 2 0 NA NIU NIU
## 3 1 NA No, not in the labor force Not in the Labor Force
## 4 2 NA Yes, in the labor force In the Labor Force
## 5 3 NA <NA> <NA>
## 6 4 NA <NA> <NA>
## 7 5 NA <NA> <NA>
## 8 99 NA <NA> <NA>
## lbl_atus labforce_atus
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 Employed - at work In the Labor Force
## 4 Employed - absent In the Labor Force
## 5 Unemployed - on layoff In the Labor Force
## 6 Unemployed - looking In the Labor Force
## 7 Not in labor force Not in the Labor Force
## 8 NIU (Not in universe) NIU
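The labor force recode shown in the output above can be expressed as a pair of named lookup vectors. The sketch below mirrors the displayed values; the function names are hypothetical, and the original code may be organized differently.

```r
# Hypothetical helpers mirroring the recoded values displayed above.
recode_labforce_cps <- function(val) {
  lookup <- c(
    "0" = "NIU",
    "1" = "Not in the Labor Force",
    "2" = "In the Labor Force"
  )
  unname(lookup[as.character(val)])
}

recode_labforce_atus <- function(val) {
  lookup <- c(
    "1"  = "In the Labor Force",      # Employed - at work
    "2"  = "In the Labor Force",      # Employed - absent
    "3"  = "In the Labor Force",      # Unemployed - on layoff
    "4"  = "In the Labor Force",      # Unemployed - looking
    "5"  = "Not in the Labor Force",  # Not in labor force
    "99" = "NIU"
  )
  unname(lookup[as.character(val)])
}

print(recode_labforce_cps(c(0, 1, 2)))
print(recode_labforce_atus(c(1, 4, 5, 99)))
```

Codes that do not appear in a lookup return NA, which matches the missing entries in the output above.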
The code below provides the same methodology for the Employment Status variable.
## # A tibble: 23 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <chr>
## 1 0 NIU NIU <NA>
## 2 1 Armed Forces Armed Forces Employed…
## 3 10 At work At work <NA>
## 4 12 Has job, not at work last week Has job, not at work last week <NA>
## 5 20 Unemployed Unemployed <NA>
## 6 21 Unemployed, experienced worker Unemployed, experienced worker <NA>
## 7 22 Unemployed, new worker Unemployed, new worker <NA>
## 8 30 Not in labor force Not in labor force <NA>
## 9 31 NILF, housework NILF, housework <NA>
## 10 32 NILF, unable to work NILF, unable to work <NA>
## # ℹ 13 more rows
## # A tibble: 23 × 6
## val lbl_asec lbl_cps empstat_cps lbl_atus empstat_atus
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 0 NIU NIU NIU <NA> 0
## 2 1 Armed Forces Armed … Armed Forc… Employe… Employed
## 3 10 At work At work Employed <NA> 10
## 4 12 Has job, not at work last we… Has jo… Employed <NA> 12
## 5 20 Unemployed Unempl… Unemployed <NA> 20
## 6 21 Unemployed, experienced work… Unempl… Unemployed <NA> 21
## 7 22 Unemployed, new worker Unempl… Unemployed <NA> 22
## 8 30 Not in labor force Not in… NILF <NA> 30
## 9 31 NILF, housework NILF, … NILF <NA> 31
## 10 32 NILF, unable to work NILF, … NILF <NA> 32
## # ℹ 13 more rows
The code below provides the same methodology for the variables identifying full- or part-time status.
## # A tibble: 16 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <lgl>
## 1 10 Full-time schedules Full-t… NA
## 2 11 Full-time hours (35+), usually full-time Full-t… NA
## 3 12 Part-time for non-economic reasons, usually full-time Part-t… NA
## 4 13 Not at work, usually full-time Not at… NA
## 5 14 Full-time hours, usually part-time for economic reaso… Full-t… NA
## 6 15 Full-time hours, usually part-time for non-economic r… Full-t… NA
## 7 20 Part-time for economic reasons Part-t… NA
## 8 21 Part-time for economic reasons, usually full-time Part-t… NA
## 9 22 Part-time hours, usually part-time for economic reaso… Part-t… NA
## 10 40 Part-time for non-economic reasons, usually part-time Part-t… NA
## 11 41 Part-time hours, usually part-time for non-economic r… Part-t… NA
## 12 42 Not at work, usually part-time Not at… NA
## 13 50 Unemployed, seeking full-time work Unempl… NA
## 14 60 Unemployed, seeking part-time work Unempl… NA
## 15 99 NIU, blank, or not in labor force NIU, b… NA
## 16 NA <NA> <NA> NA
## # A tibble: 16 × 5
## val lbl_asec lbl_cps wkstat lbl_atus
## <dbl> <chr> <chr> <chr> <lgl>
## 1 10 Full-time schedules Full-t… Full … NA
## 2 11 Full-time hours (35+), usually full-time Full-t… Full … NA
## 3 12 Part-time for non-economic reasons, usually fu… Part-t… Full … NA
## 4 13 Not at work, usually full-time Not at… Full … NA
## 5 14 Full-time hours, usually part-time for economi… Full-t… Full … NA
## 6 15 Full-time hours, usually part-time for non-eco… Full-t… Full … NA
## 7 20 Part-time for economic reasons Part-t… Part … NA
## 8 21 Part-time for economic reasons, usually full-t… Part-t… Part … NA
## 9 22 Part-time hours, usually part-time for economi… Part-t… Part … NA
## 10 40 Part-time for non-economic reasons, usually pa… Part-t… Part … NA
## 11 41 Part-time hours, usually part-time for non-eco… Part-t… Part … NA
## 12 42 Not at work, usually part-time Not at… Part … NA
## 13 50 Unemployed, seeking full-time work Unempl… Unemp… NA
## 14 60 Unemployed, seeking part-time work Unempl… Unemp… NA
## 15 99 NIU, blank, or not in labor force NIU, b… NIU NA
## 16 NA <NA> <NA> <NA> NA
The code below provides the same methodology for analyzing worker classes.
## val lbl_asec lbl_cps lbl_atus
## 1 NA NA <NA> NA
## 2 0 NA NIU NA
## 3 10 NA Self-employed NA
## 4 13 NA Self-employed, not incorporated NA
## 5 14 NA Self-employed, incorporated NA
## 6 20 NA Works for wages or salary NA
## 7 21 NA Wage/salary, private NA
## 8 22 NA Private, for profit NA
## 9 23 NA Private, nonprofit NA
## 10 24 NA Wage/salary, government NA
## 11 25 NA Federal government employee NA
## 12 26 NA Armed forces NA
## 13 27 NA State government employee NA
## 14 28 NA Local government employee NA
## 15 29 NA Unpaid family worker NA
## 16 99 NA Missing/Unknown NA
## val lbl_asec lbl_cps classwkr lbl_atus
## 1 NA NA <NA> <NA> NA
## 2 0 NA NIU NIU NA
## 3 10 NA Self-employed Self_Employed NA
## 4 13 NA Self-employed, not incorporated Self_Employed NA
## 5 14 NA Self-employed, incorporated Self_Employed NA
## 6 20 NA Works for wages or salary Wage/Salary NA
## 7 21 NA Wage/salary, private Wage/Salary NA
## 8 22 NA Private, for profit Wage/Salary NA
## 9 23 NA Private, nonprofit Wage/Salary NA
## 10 24 NA Wage/salary, government Government NA
## 11 25 NA Federal government employee Government NA
## 12 26 NA Armed forces Government NA
## 13 27 NA State government employee Government NA
## 14 28 NA Local government employee Government NA
## 15 29 NA Unpaid family worker Unpaid NA
## 16 99 NA Missing/Unknown Missing/Unknown NA
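An alternative to range-based recoding is a small lookup table, which keeps the mapping visible as data. The sketch below reproduces the `classwkr` groupings printed above using base R `match()`. The names `classwkr_lookup` and `recode_classwkr` are hypothetical, and we do not claim this is how the project code is written.

```r
# Lookup table mirroring the printed mapping from CLASSWKR codes to groups.
classwkr_lookup <- data.frame(
  val = c(0, 10, 13, 14, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 99),
  classwkr = c(
    "NIU",
    rep("Self_Employed", 3),   # 10, 13, 14
    rep("Wage/Salary", 4),     # 20, 21, 22, 23
    rep("Government", 5),      # 24 through 28 (includes armed forces)
    "Unpaid",                  # 29
    "Missing/Unknown"          # 99
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look each code up in the table; unmatched codes give NA.
recode_classwkr <- function(code, lookup = classwkr_lookup) {
  lookup$classwkr[match(code, lookup$val)]
}

# Usage
recode_classwkr(c(13, 23, 26, 29, 99))
```

Keeping the mapping in a data frame makes it straightforward to export the crosswalk for review alongside the other crosswalks in this document.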
The code below applies the same methodology to the variable recording why someone is not in the labor force.
## val lbl_asec lbl_cps lbl_atus
## 1 NA NA <NA> NA
## 2 1 NA Disabled NA
## 3 2 NA Ill NA
## 4 3 NA In school NA
## 5 4 NA Taking care of house or family NA
## 6 6 NA Something else/ Other NA
## 7 99 NA Blank NA
## val lbl_asec lbl_cps nilf_activity lbl_atus
## 1 NA NA <NA> <NA> NA
## 2 1 NA Disabled Disabled NA
## 3 2 NA Ill Ill NA
## 4 3 NA In school School NA
## 5 4 NA Taking care of house or family Homemaker NA
## 6 6 NA Something else/ Other Other NA
## 7 99 NA Blank NIU NA
The code below applies the same methodology to the variable asking if someone works via telework.
## val lbl_asec lbl_cps lbl_atus
## 1 NA NA <NA> NA
## 2 0 NA NIU NA
## 3 1 NA Yes NA
## 4 2 NA No NA
## # A tibble: 5 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <lgl>
## 1 0 NIU NIU NA
## 2 1 No No NA
## 3 2 Yes, laid off Yes, laid… NA
## 4 3 Yes, other reason (vacation, illness, labor dispute) Yes, othe… NA
## 5 NA <NA> <NA> NA
## # A tibble: 5 × 5
## val lbl_asec lbl_cps absent lbl_atus
## <dbl> <chr> <chr> <chr> <lgl>
## 1 0 NIU NIU NIU NA
## 2 1 No No No NA
## 3 2 Yes, laid off Yes, l… Yes, … NA
## 4 3 Yes, other reason (vacation, illness, labor dis… Yes, o… Yes, … NA
## 5 NA <NA> <NA> <NA> NA
The code below applies the same methodology to the variable representing the reason why someone was not at work.
## # A tibble: 17 × 4
## val lbl_asec lbl_cps lbl_atus
## <dbl> <chr> <chr> <lgl>
## 1 0 NIU NIU NA
## 2 1 On temporary layoff (under 30 days) On temporary layoff (unde… NA
## 3 2 On indefinite layoff (30+ days) On indefinite layoff (30+… NA
## 4 3 Slack work/business conditions Slack work/business condi… NA
## 5 4 Waiting for a new job to begin Waiting for a new job to … NA
## 6 5 Vacation/personal days Vacation/personal days NA
## 7 6 Own illness/injury/medical problems Own illness/injury/medica… NA
## 8 7 Child care problems Child care problems NA
## 9 8 Other family/personal obligation Other family/personal obl… NA
## 10 9 Maternity/paternity leave Maternity/paternity leave NA
## 11 10 Labor dispute Labor dispute NA
## 12 11 Weather affected job Weather affected job NA
## 13 12 School/training School/training NA
## 14 13 Civic/military duty Civic/military duty NA
## 15 14 Does not work in the business Does not work in the busi… NA
## 16 15 Other Other NA
## 17 NA <NA> <NA> NA
## # A tibble: 17 × 5
## val lbl_asec lbl_cps whyabsnt lbl_atus
## <dbl> <chr> <chr> <chr> <lgl>
## 1 0 NIU NIU NIU NA
## 2 1 On temporary layoff (under 30 days) On temporary lay… 1 NA
## 3 2 On indefinite layoff (30+ days) On indefinite la… 2 NA
## 4 3 Slack work/business conditions Slack work/busin… 3 NA
## 5 4 Waiting for a new job to begin Waiting for a ne… 4 NA
## 6 5 Vacation/personal days Vacation/persona… Vacatio… NA
## 7 6 Own illness/injury/medical problems Own illness/inju… Own ill… NA
## 8 7 Child care problems Child care probl… Care Re… NA
## 9 8 Other family/personal obligation Other family/per… Care Re… NA
## 10 9 Maternity/paternity leave Maternity/patern… Care Re… NA
## 11 10 Labor dispute Labor dispute Non-Car… NA
## 12 11 Weather affected job Weather affected… Non-Car… NA
## 13 12 School/training School/training Non-Car… NA
## 14 13 Civic/military duty Civic/military d… Non-Car… NA
## 15 14 Does not work in the business Does not work in… 14 NA
## 16 15 Other Other Other NA
## 17 NA <NA> <NA> <NA> NA
The functions created above provide the methodology to recode all needed variables. The code chunk below creates a general function that calls these functions so that the CPS Monthly, CPS ASEC, and ATUS samples all have identical variable values. The code also ensures that numeric variables are correctly coded and mutates the data variables so that they all share the same format.
The functions in the chunk below are split into those for variables in all samples, variables in both the CPS ASEC and CPS Monthly, and variables unique to a single sample. For a reminder of which variables appear in which sample, see the variable classifications discussed previously. Finally, this code chunk defines the final column order, which is used to ensure that all datasets have their variables in the same order.
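The general pattern described above can be sketched as a single wrapper that applies a set of recode functions, coerces numeric variables, and enforces a shared column order. Everything named here (`standardize_sample`, `recode_sex`, the argument names, and the rule that a lowercase recoded column is built from its uppercase raw counterpart) is an illustrative assumption, as is building `date` from the first day of `YEAR` and `MONTH`.

```r
# Illustrative stand-in for one of the recode functions defined earlier.
recode_sex <- function(x) {
  ifelse(x %in% 1, "Male", ifelse(x %in% 2, "Female", NA_character_))
}

# Hypothetical wrapper: recode, coerce, add a date, and order columns.
standardize_sample <- function(df, recoders, numeric_vars, col_order) {
  # Apply each recoder to the raw (uppercase) column when the sample has it
  for (var in names(recoders)) {
    src <- toupper(var)
    if (src %in% names(df)) df[[var]] <- recoders[[var]](df[[src]])
  }
  # Coerce variables that should be numeric
  for (var in intersect(numeric_vars, names(df))) {
    df[[var]] <- as.numeric(df[[var]])
  }
  # Build a first-of-month Date when YEAR and MONTH are both available
  if (all(c("YEAR", "MONTH") %in% names(df))) {
    df$date <- as.Date(sprintf("%d-%02d-01",
                               as.integer(df$YEAR), as.integer(df$MONTH)))
  }
  # Keep only known columns, in the shared order
  df[, intersect(col_order, names(df)), drop = FALSE]
}

# Usage on a toy sample
raw <- data.frame(YEAR = c(2024, 2024), MONTH = c(3, 9),
                  SEX = c(1, 2), AGE = c("34", "51"),
                  stringsAsFactors = FALSE)
standardize_sample(
  raw,
  recoders = list(sex = recode_sex),
  numeric_vars = "AGE",
  col_order = c("YEAR", "MONTH", "date", "AGE", "SEX", "sex", "wkstat")
)
```

Because the column order is intersected with the columns actually present, the same `col_order` vector can be reused for samples that lack some variables.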
In The Care Board methodology, we specifically develop methods to compare both formal paid and informal unpaid activities and time use. We create crosswalks from the data to code every activity as either a care activity or not, and we assign a specific care focus to each care-related activity. The classification of jobs and activities as part of the care economy or not represents a major source of assumptions and decision points. We acknowledge that there are many ways to classify some of these detailed activities and that others might have differing opinions about how best to classify them. We thus provide the crosswalks for full transparency, analysis, and review.
The first crosswalk presents the classification of formal occupations as parts of the care economy or not. This crosswalk uses federal standard occupational classification codes (SOC) and for each SOC, labels it as developmental care, daily living care, health care, or none.
## code occ_category
## <int> <char>
## 1: 10 MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
## 2: 20 MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
## 3: 30 MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
## 4: 100 MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
## 5: 110 MANAGEMENT, BUSINESS, SCIENCE, AND ARTS
## ---
## 454: 9800 MILITARY SPECIFIC
## 455: 9810 MILITARY SPECIFIC
## 456: 9820 MILITARY SPECIFIC
## 457: 9830 MILITARY SPECIFIC
## 458: 9999 NOT IN UNIVERSE (UNEMPLOYED OR NEVER WORKED)
## occ_name
## <char>
## 1: Chief executives and legislators/public administration
## 2: General and Operations Managers
## 3: Managers in Marketing, Advertising, and Public Relations
## 4: Administrative Services Managers
## 5: Computer and Information Systems Managers
## ---
## 454: Military Officer Special and Tactical Operations Leaders
## 455: First-Line Enlisted Military Supervisors
## 456: Military Enlisted Tactical Operations and Air/Weapons Specialists and Crew Members
## 457: Military, Rank Not Specified
## 458: NIU
## occ_label occ_care_focus
## <char> <char>
## 1: Chief executives and legislators/public administration none
## 2: Business Managers none
## 3: Business Managers none
## 4: Business Managers none
## 5: Business Managers none
## ---
## 454: Military none
## 455: Military none
## 456: Military none
## 457: Military none
## 458: NULL none
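To illustrate how a crosswalk of this shape can be attached to person-level records, the sketch below joins a few rows copied from the printed output onto a toy data frame by occupation code. The function name `attach_occ_care` and the choice of `OCC2010` as the join variable are assumptions. The three crosswalk rows shown all carry `none` only because the printed head and tail of the table contain no care occupations.

```r
# A few rows copied from the printed crosswalk above.
occ_crosswalk <- data.frame(
  code = c(10, 20, 9830),
  occ_label = c("Chief executives and legislators/public administration",
                "Business Managers", "Military"),
  occ_care_focus = c("none", "none", "none"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach crosswalk columns while preserving row order.
attach_occ_care <- function(df, crosswalk, occ_var = "OCC2010") {
  idx <- match(df[[occ_var]], crosswalk$code)
  if (anyNA(idx)) {
    warning(sum(is.na(idx)), " record(s) have an occupation code not in the crosswalk")
  }
  df$occ_label <- crosswalk$occ_label[idx]
  df$occ_care_focus <- crosswalk$occ_care_focus[idx]
  df
}

# Usage on a toy sample
people <- data.frame(OCC2010 = c(20, 9830, 10))
attach_occ_care(people, occ_crosswalk)
```

Using `match()` instead of `merge()` keeps the records in their original order, and the warning flags any codes the crosswalk does not cover.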
The second crosswalk presents the classification of informal time use activities as part of the care economy or not. This crosswalk uses the ATUS activity codes and for each activity, labels it as developmental care, daily living care, health care, or none.
## code activity developmental health daily_living paid_work
## <int> <char> <int> <int> <int> <int>
## 1: 10101 Sleeping NA NA NA NA
## 2: 10102 Sleeping NA NA NA NA
## 3: 10199 Sleeping NA NA NA NA
## 4: 10201 Self Grooming NA NA 1 NA
## 5: 10299 Self Grooming NA NA 1 NA
## ---
## 457: 181801 Traveling NA NA NA NA
## 458: 181899 Traveling NA NA NA NA
## 459: 189999 Traveling NA NA NA NA
## 460: NA Secondary Childcare 1 NA NA NA
## 461: NA Secondary Eldercare NA 1 NA NA
## formal_work child_care elder_care householdcare selfcare leisure sleeping
## <int> <int> <int> <int> <int> <int> <int>
## 1: NA NA NA NA NA NA 1
## 2: NA NA NA NA NA NA 1
## 3: NA NA NA NA NA NA 1
## 4: NA NA NA NA 1 NA NA
## 5: NA NA NA NA 1 NA NA
## ---
## 457: NA NA NA NA NA NA NA
## 458: NA NA NA NA NA NA NA
## 459: NA NA NA NA NA NA NA
## 460: NA NA NA NA NA NA NA
## 461: NA NA NA NA NA NA NA
## volunteering education
## <int> <int>
## 1: NA NA
## 2: NA NA
## 3: NA NA
## 4: NA NA
## 5: NA NA
## ---
## 457: NA NA
## 458: NA NA
## 459: NA NA
## 460: NA NA
## 461: NA NA
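The crosswalk stores each classification as an indicator column that is either 1 or missing. A minimal sketch of turning those indicators into a single care-focus label follows. The rows are copied from the printed output, while the function name `derive_act_care_focus`, the label strings, and the precedence among flags (when an activity has more than one) are assumptions on our part.

```r
# Rows copied from the printed crosswalk (indicator columns only).
act_crosswalk <- data.frame(
  code = c(10101, 10201, NA, NA),
  activity = c("Sleeping", "Self Grooming",
               "Secondary Childcare", "Secondary Eldercare"),
  developmental = c(NA, NA, 1L, NA),
  health = c(NA, NA, NA, 1L),
  daily_living = c(NA, 1L, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill missing flags with 0 and derive one label per row.
derive_act_care_focus <- function(xwalk,
                                  flags = c("developmental", "health", "daily_living")) {
  for (col in flags) {
    xwalk[[col]][is.na(xwalk[[col]])] <- 0L
  }
  xwalk$act_care_focus <- "none"
  # Later assignments overwrite earlier ones; this precedence is an assumption
  xwalk$act_care_focus[xwalk$daily_living == 1] <- "daily_living"
  xwalk$act_care_focus[xwalk$health == 1] <- "health"
  xwalk$act_care_focus[xwalk$developmental == 1] <- "developmental"
  xwalk
}

# Usage
derive_act_care_focus(act_crosswalk)[, c("activity", "act_care_focus")]
```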
Now that we have investigated the variables across our different samples, we need to apply the functions above to each of our datasets to recode them into the proper format. We start with the CPS Monthly data. The code below uses the DDI file to load all CPS Monthly data and then applies the functions to recode the variables, assemble them in the correct order, and merge them with the activity coding data. It then saves the result as an .rds file for future use.
## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
| | |
|---|---|
| Name | micro_cps |
| Number of rows | 53950525 |
| Number of columns | 66 |
| _______________________ | |
| Column type frequency: | |
| character | 23 |
| Date | 1 |
| factor | 2 |
| numeric | 40 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| month | 0 | 1.00 | 3 | 9 | 0 | 12 | 0 |
| age_category | 0 | 1.00 | 8 | 23 | 0 | 7 | 0 |
| prime_age | 0 | 1.00 | 9 | 17 | 0 | 3 | 0 |
| sex | 0 | 1.00 | 4 | 6 | 0 | 2 | 0 |
| hispan | 0 | 1.00 | 3 | 12 | 0 | 3 | 0 |
| race | 0 | 1.00 | 5 | 20 | 0 | 5 | 0 |
| race_ethnicity | 0 | 1.00 | 5 | 20 | 0 | 6 | 0 |
| marst | 0 | 1.00 | 3 | 31 | 0 | 4 | 0 |
| gender_parent | 0 | 1.00 | 5 | 11 | 0 | 5 | 0 |
| child_age | 0 | 1.00 | 3 | 15 | 0 | 5 | 0 |
| educ | 0 | 1.00 | 3 | 17 | 0 | 6 | 0 |
| empstat | 0 | 1.00 | 3 | 12 | 0 | 5 | 0 |
| laborstatus | 0 | 1.00 | 3 | 12 | 0 | 6 | 0 |
| absent | 0 | 1.00 | 2 | 13 | 0 | 4 | 0 |
| whyabsnt | 0 | 1.00 | 1 | 27 | 0 | 9 | 0 |
| wkstat | 0 | 1.00 | 3 | 10 | 0 | 4 | 0 |
| labforce | 0 | 1.00 | 3 | 22 | 0 | 3 | 0 |
| nilf_activity | 7061598 | 0.87 | 3 | 9 | 0 | 6 | 0 |
| telwrkpay | 51565932 | 0.04 | 3 | 11 | 0 | 3 | 0 |
| occ_category | 0 | 1.00 | 5 | 46 | 0 | 27 | 0 |
| occ_name | 0 | 1.00 | 3 | 156 | 0 | 454 | 0 |
| occ_label | 0 | 1.00 | 4 | 63 | 0 | 82 | 0 |
| occ_care_focus | 0 | 1.00 | 4 | 13 | 0 | 4 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 1990-01-01 | 2024-09-01 | 2006-07-01 | 417 |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| region | 0 | 1 | FALSE | 9 | Sou: 9277084, Pac: 7471882, Eas: 6873523, Mid: 6011210 |
| statefip | 0 | 1 | FALSE | 51 | Cal: 4449160, Tex: 2756512, New: 2730754, Flo: 2308199 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 26975263.00 | 15574175.21 | 1 | 13487632.00 | 26975263.00 | 40462894.00 | 53950525.00 | ▇▇▇▇▇ |
| YEAR | 0 | 1.00 | 2006.16 | 9.86 | 1990 | 1998.00 | 2006.00 | 2014.00 | 2024.00 | ▇▇▇▇▆ |
| SERIAL | 0 | 1.00 | 34682.60 | 20192.19 | 1 | 17396.00 | 34562.00 | 51935.00 | 74625.00 | ▇▇▇▇▅ |
| MONTH | 0 | 1.00 | 6.48 | 3.45 | 1 | 3.00 | 6.00 | 9.00 | 12.00 | ▇▆▅▆▇ |
| CPSID | 0 | 1.00 | 20056711201795.79 | 98466309399.00 | 19881000000200 | 19970702770200.00 | 20051206270800.00 | 20140603148800.00 | 20240906935800.00 | ▇▇▇▇▆ |
| ASECFLAG | 49482961 | 0.08 | 2.00 | 0.00 | 2 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▇▁▁ |
| COMPWT | 13329726 | 0.75 | 1891.60 | 1567.45 | 0 | 358.41 | 1903.34 | 3102.71 | 34708.83 | ▇▁▁▁▁ |
| HWTFINL | 0 | 1.00 | 2234.37 | 1274.08 | 0 | 1200.53 | 2181.91 | 3116.30 | 34716.21 | ▇▁▁▁▁ |
| WTFINL | 0 | 1.00 | 2268.72 | 1336.03 | 0 | 1203.12 | 2201.96 | 3149.89 | 44747.54 | ▇▁▁▁▁ |
| REGION | 0 | 1.00 | 27.78 | 10.72 | 11 | 21.00 | 31.00 | 33.00 | 42.00 | ▅▆▁▇▆ |
| STATEFIP | 0 | 1.00 | 28.26 | 15.74 | 1 | 13.00 | 29.00 | 41.00 | 56.00 | ▇▆▆▇▆ |
| PERNUM | 0 | 1.00 | 2.16 | 1.33 | 1 | 1.00 | 2.00 | 3.00 | 25.00 | ▇▁▁▁▁ |
| CPSIDP | 0 | 1.00 | 20056711201798.03 | 98466309398.97 | 19881000000201 | 19970702770201.00 | 20051206270801.00 | 20140603148801.00 | 20240906935804.00 | ▇▇▇▇▆ |
| CPSIDV | 0 | 1.00 | 200567112017981.47 | 984663093989.67 | 198810000002011 | 199707027702011.00 | 200512062708011.00 | 201406031488011.00 | 202409069358041.00 | ▇▇▇▇▆ |
| AGE | 0 | 1.00 | 37.41 | 22.76 | 0 | 18.00 | 37.00 | 55.00 | 90.00 | ▇▇▇▆▂ |
| SEX | 0 | 1.00 | 1.52 | 0.50 | 1 | 1.00 | 2.00 | 2.00 | 2.00 | ▇▁▁▁▇ |
| HISPAN | 0 | 1.00 | 31.64 | 118.97 | 0 | 0.00 | 0.00 | 0.00 | 902.00 | ▇▁▁▁▁ |
| RACE | 0 | 1.00 | 148.56 | 142.48 | 100 | 100.00 | 100.00 | 100.00 | 830.00 | ▇▁▁▁▁ |
| MARST | 0 | 1.00 | 4.23 | 3.14 | 1 | 1.00 | 4.00 | 6.00 | 9.00 | ▇▂▁▅▃ |
| MOMLOC | 0 | 1.00 | 0.52 | 0.90 | 0 | 0.00 | 0.00 | 1.00 | 16.00 | ▇▁▁▁▁ |
| POPLOC | 0 | 1.00 | 0.36 | 0.75 | 0 | 0.00 | 0.00 | 0.00 | 16.00 | ▇▁▁▁▁ |
| SPLOC | 0 | 1.00 | 0.71 | 0.91 | 0 | 0.00 | 0.00 | 1.00 | 21.00 | ▇▁▁▁▁ |
| FAMSIZE | 0 | 1.00 | 3.21 | 1.68 | 1 | 2.00 | 3.00 | 4.00 | 25.00 | ▇▁▁▁▁ |
| famsize | 0 | 1.00 | 3.21 | 1.68 | 1 | 2.00 | 3.00 | 4.00 | 25.00 | ▇▁▁▁▁ |
| NCHILD | 0 | 1.00 | 0.56 | 1.01 | 0 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▂▁▁▁ |
| nchild | 0 | 1.00 | 0.56 | 1.01 | 0 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▂▁▁▁ |
| YNGCH | 0 | 1.00 | 73.13 | 40.08 | 0 | 21.00 | 99.00 | 99.00 | 99.00 | ▃▁▁▁▇ |
| EDUC | 0 | 1.00 | 64.78 | 39.58 | 1 | 32.00 | 73.00 | 91.00 | 125.00 | ▆▂▇▆▆ |
| EMPSTAT | 0 | 1.00 | 15.31 | 13.05 | 0 | 10.00 | 10.00 | 34.00 | 36.00 | ▃▇▁▁▅ |
| OCC2010 | 0 | 1.00 | 7096.88 | 3409.72 | 10 | 4250.00 | 9620.00 | 9999.00 | 9999.00 | ▂▂▂▁▇ |
| IND1990 | 0 | 1.00 | 307.07 | 358.86 | 0 | 0.00 | 20.00 | 701.00 | 952.00 | ▇▁▁▂▃ |
| AHRSWORKT | 0 | 1.00 | 560.62 | 478.34 | 1 | 40.00 | 999.00 | 999.00 | 999.00 | ▇▁▁▁▇ |
| ABSENT | 0 | 1.00 | 0.37 | 0.59 | 0 | 0.00 | 0.00 | 1.00 | 3.00 | ▇▃▁▁▁ |
| WHYABSNT | 0 | 1.00 | 0.15 | 1.14 | 0 | 0.00 | 0.00 | 0.00 | 15.00 | ▇▁▁▁▁ |
| WKSTAT | 0 | 1.00 | 58.16 | 41.60 | 10 | 11.00 | 60.00 | 99.00 | 99.00 | ▆▁▁▁▇ |
| LABFORCE | 0 | 1.00 | 1.30 | 0.79 | 0 | 1.00 | 2.00 | 2.00 | 2.00 | ▃▁▅▁▇ |
| CLASSWKR | 0 | 1.00 | 17.37 | 23.70 | 0 | 0.00 | 21.00 | 22.00 | 99.00 | ▇▇▁▁▁ |
| NILFACT | 7061598 | 0.87 | 88.46 | 29.91 | 1 | 99.00 | 99.00 | 99.00 | 99.00 | ▁▁▁▁▇ |
| DIFFCARE | 29975790 | 0.44 | 0.83 | 0.42 | 0 | 1.00 | 1.00 | 1.00 | 2.00 | ▂▁▇▁▁ |
| TELWRKPAY | 51565932 | 0.04 | 0.82 | 0.94 | 0 | 0.00 | 0.00 | 2.00 | 2.00 | ▇▁▂▁▆ |
The code below does the same for the CPS ASEC data, applying the functions created above to recode the variables. It then assembles the data in the proper order and merges them with the activity data.
## Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
| | |
|---|---|
| Name | micro_asec |
| Number of rows | 6207057 |
| Number of columns | 64 |
| _______________________ | |
| Column type frequency: | |
| character | 21 |
| Date | 1 |
| factor | 2 |
| numeric | 40 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| month | 0 | 1 | 5 | 5 | 0 | 1 | 0 |
| age_category | 0 | 1 | 8 | 23 | 0 | 7 | 0 |
| prime_age | 0 | 1 | 9 | 17 | 0 | 3 | 0 |
| sex | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| hispan | 0 | 1 | 3 | 12 | 0 | 3 | 0 |
| race | 0 | 1 | 5 | 20 | 0 | 5 | 0 |
| race_ethnicity | 0 | 1 | 5 | 20 | 0 | 6 | 0 |
| marst | 0 | 1 | 7 | 31 | 0 | 3 | 0 |
| gender_parent | 0 | 1 | 5 | 11 | 0 | 5 | 0 |
| child_age | 0 | 1 | 3 | 15 | 0 | 5 | 0 |
| educ | 0 | 1 | 3 | 17 | 0 | 6 | 0 |
| empstat | 0 | 1 | 3 | 12 | 0 | 5 | 0 |
| laborstatus | 0 | 1 | 3 | 12 | 0 | 6 | 0 |
| absent | 0 | 1 | 2 | 13 | 0 | 4 | 0 |
| whyabsnt | 0 | 1 | 1 | 27 | 0 | 9 | 0 |
| wkstat | 0 | 1 | 3 | 10 | 0 | 4 | 0 |
| poverty | 0 | 1 | 3 | 26 | 0 | 5 | 0 |
| occ_category | 0 | 1 | 5 | 46 | 0 | 27 | 0 |
| occ_name | 0 | 1 | 3 | 156 | 0 | 454 | 0 |
| occ_label | 0 | 1 | 4 | 63 | 0 | 82 | 0 |
| occ_care_focus | 0 | 1 | 4 | 13 | 0 | 4 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 1990-03-01 | 2024-03-01 | 2008-03-01 | 35 |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| region | 0 | 1 | FALSE | 9 | Sou: 1064521, Pac: 940960, Eas: 751093, Mou: 708900 |
| statefip | 0 | 1 | FALSE | 51 | Cal: 583792, Tex: 356065, New: 309136, Flo: 273960 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 3103529.00 | 1791823.16 | 1 | 1551765.00 | 3103529.00 | 4655293.00 | 6207057.00 | ▇▇▇▇▇ |
| YEAR | 0 | 1.00 | 2007.39 | 9.56 | 1990 | 2000.00 | 2008.00 | 2015.00 | 2024.00 | ▆▆▇▇▆ |
| SERIAL | 0 | 1.00 | 45647.39 | 26974.60 | 1 | 22449.00 | 44685.00 | 67543.00 | 99986.00 | ▇▇▇▇▅ |
| MONTH | 0 | 1.00 | 3.00 | 0.00 | 3 | 3.00 | 3.00 | 3.00 | 3.00 | ▁▁▇▁▁ |
| CPSID | 0 | 1.00 | 14348026771078.58 | 9049746509586.89 | 0 | 0.00 | 19990105297800.00 | 20101204144600.00 | 20240306932200.00 | ▃▁▁▁▇ |
| ASECFLAG | 0 | 1.00 | 1.00 | 0.00 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| HFLAG | 6007501 | 0.03 | 0.30 | 0.46 | 0 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| ASECWTH | 0 | 1.00 | 1672.94 | 1127.23 | 0 | 890.23 | 1534.18 | 2187.67 | 28654.31 | ▇▁▁▁▁ |
| pernum | 0 | 1.00 | 2.26 | 1.38 | 1 | 1.00 | 2.00 | 3.00 | 26.00 | ▇▁▁▁▁ |
| REGION | 0 | 1.00 | 28.22 | 10.74 | 11 | 21.00 | 31.00 | 41.00 | 42.00 | ▅▆▁▇▆ |
| STATEFIP | 0 | 1.00 | 27.88 | 15.91 | 1 | 13.00 | 28.00 | 41.00 | 56.00 | ▇▅▆▇▆ |
| PERNUM | 0 | 1.00 | 2.26 | 1.38 | 1 | 1.00 | 2.00 | 3.00 | 26.00 | ▇▁▁▁▁ |
| CPSIDP | 0 | 1.00 | 14347651772919.27 | 9049921692380.36 | 0 | 0.00 | 19990105295102.00 | 20101204119002.00 | 20240306932201.00 | ▃▁▁▁▇ |
| CPSIDV | 0 | 1.00 | 143476517729193.41 | 90499216923804.03 | 0 | 0.00 | 199901052951021.00 | 201012041190021.00 | 202403069322011.00 | ▃▁▁▁▇ |
| ASECWT | 0 | 1.00 | 1706.90 | 1174.71 | 0 | 892.81 | 1547.88 | 2242.25 | 44423.83 | ▇▁▁▁▁ |
| AGE | 0 | 1.00 | 35.21 | 22.33 | 0 | 16.00 | 34.00 | 52.00 | 90.00 | ▇▆▇▅▂ |
| SEX | 0 | 1.00 | 1.52 | 0.50 | 1 | 1.00 | 2.00 | 2.00 | 2.00 | ▇▁▁▁▇ |
| HISPAN | 0 | 1.00 | 42.44 | 131.56 | 0 | 0.00 | 0.00 | 0.00 | 902.00 | ▇▁▁▁▁ |
| RACE | 0 | 1.00 | 155.77 | 153.22 | 100 | 100.00 | 100.00 | 100.00 | 830.00 | ▇▁▁▁▁ |
| MARST | 0 | 1.00 | 3.70 | 2.34 | 1 | 1.00 | 4.00 | 6.00 | 6.00 | ▇▁▁▁▇ |
| MOMLOC | 0 | 1.00 | 0.59 | 0.97 | 0 | 0.00 | 0.00 | 1.00 | 17.00 | ▇▁▁▁▁ |
| momloc | 0 | 1.00 | 0.59 | 0.97 | 0 | 0.00 | 0.00 | 1.00 | 17.00 | ▇▁▁▁▁ |
| POPLOC | 0 | 1.00 | 0.42 | 0.84 | 0 | 0.00 | 0.00 | 1.00 | 18.00 | ▇▁▁▁▁ |
| FAMSIZE | 0 | 1.00 | 3.41 | 1.71 | 1 | 2.00 | 3.00 | 4.00 | 25.00 | ▇▁▁▁▁ |
| famsize | 0 | 1.00 | 3.41 | 1.71 | 1 | 2.00 | 3.00 | 4.00 | 25.00 | ▇▁▁▁▁ |
| NCHILD | 0 | 1.00 | 0.62 | 1.06 | 0 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▂▁▁▁ |
| nchild | 0 | 1.00 | 0.62 | 1.06 | 0 | 0.00 | 0.00 | 1.00 | 9.00 | ▇▂▁▁▁ |
| YNGCH | 0 | 1.00 | 70.64 | 41.29 | 0 | 17.00 | 99.00 | 99.00 | 99.00 | ▃▁▁▁▇ |
| EDUC | 0 | 1.00 | 61.79 | 40.58 | 1 | 20.00 | 73.00 | 91.00 | 125.00 | ▇▂▇▆▆ |
| EMPSTAT | 0 | 1.00 | 14.61 | 12.97 | 0 | 10.00 | 10.00 | 32.00 | 36.00 | ▅▇▁▁▅ |
| OCC2010 | 0 | 1.00 | 7179.08 | 3386.32 | 10 | 4510.00 | 9999.00 | 9999.00 | 9999.00 | ▂▂▂▁▇ |
| IND1990 | 0 | 1.00 | 300.44 | 358.13 | 0 | 0.00 | 10.00 | 700.00 | 952.00 | ▇▁▁▂▂ |
| UHRSWORKT | 627549 | 0.90 | 516.62 | 482.77 | 0 | 40.00 | 997.00 | 999.00 | 999.00 | ▇▁▁▁▇ |
| AHRSWORKT | 0 | 1.00 | 569.96 | 477.54 | 1 | 40.00 | 999.00 | 999.00 | 999.00 | ▆▁▁▁▇ |
| ABSENT | 0 | 1.00 | 0.35 | 0.57 | 0 | 0.00 | 0.00 | 1.00 | 3.00 | ▇▃▁▁▁ |
| WHYABSNT | 0 | 1.00 | 0.13 | 1.08 | 0 | 0.00 | 0.00 | 0.00 | 15.00 | ▇▁▁▁▁ |
| WKSTAT | 0 | 1.00 | 59.31 | 41.45 | 10 | 11.00 | 99.00 | 99.00 | 99.00 | ▆▁▁▁▇ |
| EARNWT | 0 | 1.00 | 1365.70 | 3818.99 | 0 | 0.00 | 0.00 | 0.00 | 85013.19 | ▇▁▁▁▁ |
| INCWAGE | 0 | 1.00 | 23346324.69 | 42280815.29 | 0 | 0.00 | 25000.00 | 127000.00 | 99999999.00 | ▇▁▁▁▂ |
| POVERTY | 0 | 1.00 | 21.14 | 4.39 | 0 | 23.00 | 23.00 | 23.00 | 23.00 | ▁▁▁▁▇ |
The code below uses the functions created above to recode the ATUS variables. It then assembles the data in the proper order and merges them with the activity data. Following this, the code merges the ATUS data with occupation data from the CPS Monthly data. To understand formal care economy work, we rely on responses in the CPS Monthly data and use the CPSIDP variable to merge the ATUS and CPS Monthly datasets. The ATUS is conducted among a subset of individuals in the month when they leave the CPS Monthly rotation. We use the data from the last month in which an individual is present in the CPS Monthly to identify their formal occupation status for the ATUS data.
## Use of data from IPUMS ATUS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
| | |
|---|---|
| Name | micro_atus |
| Number of rows | 4740486 |
| Number of columns | 73 |
| _______________________ | |
| Column type frequency: | |
| character | 19 |
| Date | 2 |
| numeric | 52 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| age_category | 0 | 1.00 | 8 | 23 | 0 | 7 | 0 |
| prime_age | 0 | 1.00 | 9 | 17 | 0 | 3 | 0 |
| sex | 0 | 1.00 | 4 | 6 | 0 | 2 | 0 |
| hispan | 0 | 1.00 | 8 | 12 | 0 | 2 | 0 |
| race | 0 | 1.00 | 5 | 20 | 0 | 5 | 0 |
| race_ethnicity | 0 | 1.00 | 5 | 20 | 0 | 6 | 0 |
| marst | 0 | 1.00 | 7 | 31 | 0 | 3 | 0 |
| gender_parent | 0 | 1.00 | 5 | 11 | 0 | 5 | 0 |
| child_age | 0 | 1.00 | 3 | 15 | 0 | 5 | 0 |
| educ | 0 | 1.00 | 11 | 17 | 0 | 5 | 0 |
| empstat | 0 | 1.00 | 4 | 10 | 0 | 3 | 0 |
| poverty | 3040773 | 0.36 | 3 | 13 | 0 | 3 | 0 |
| KIDWAKETIME | 0 | 1.00 | 8 | 8 | 0 | 263 | 0 |
| KIDBEDTIME | 0 | 1.00 | 8 | 8 | 0 | 583 | 0 |
| START | 0 | 1.00 | 8 | 8 | 0 | 1440 | 0 |
| STOP | 0 | 1.00 | 8 | 8 | 0 | 1440 | 0 |
| activity | 146100 | 0.97 | 7 | 49 | 0 | 109 | 0 |
| act_care_focus | 0 | 1.00 | 6 | 13 | 0 | 4 | 0 |
| occ_care_focus | 0 | 1.00 | 4 | 13 | 0 | 4 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 2003-01-01 | 2023-01-01 | 2011-01-01 | 21 |
| cps_date | 0 | 1 | 2002-08-01 | 2023-10-01 | 2011-04-01 | 255 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 2370243.50 | 1368460.58 | 1.0 | 1185122 | 2370243.50 | 3555364.75 | 4740486.00 | ▇▇▇▇▇ |
| YEAR | 0 | 1.00 | 2011.64 | 5.99 | 2003.0 | 2006 | 2011.00 | 2017.00 | 2023.00 | ▇▆▅▅▃ |
| SERIAL | 0 | 1.00 | 6164.44 | 3964.58 | 1.0 | 2945 | 5848.00 | 8841.00 | 20720.00 | ▇▇▆▁▁ |
| DAY | 0 | 1.00 | 3.97 | 2.31 | 1.0 | 2 | 4.00 | 6.00 | 7.00 | ▇▂▂▂▇ |
| WT06 | 155109 | 0.97 | 7577532.28 | 7235883.79 | 419471.7 | 3104725 | 5515035.66 | 9330912.03 | 209010030.47 | ▇▁▁▁▁ |
| WT20 | 4402397 | 0.07 | 8913317.08 | 8762416.19 | 0.0 | 3617274 | 6635796.94 | 11389433.39 | 137151707.75 | ▇▁▁▁▁ |
| CASEID | 0 | 1.00 | 20117019506409.80 | 59891260707.51 | 20030100013280.0 | 20061110060893 | 20110707112061.00 | 20170101170920.75 | 20231212232280.00 | ▇▆▆▅▅ |
| STRATA | 1607822 | 0.66 | 2725.88 | 1632.89 | -1.0 | 1200 | 2700.00 | 4100.00 | 5604.00 | ▇▇▆▇▆ |
| STATEFIP | 0 | 1.00 | 28.26 | 15.82 | 1.0 | 13 | 28.00 | 42.00 | 56.00 | ▇▆▆▇▆ |
| PERNUM | 0 | 1.00 | 1.00 | 0.00 | 1.0 | 1 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| CPSIDP | 0 | 1.00 | 20102795918058.71 | 60071242310.74 | 20010500956102.0 | 20050604643604 | 20100206669002.00 | 20150706792401.00 | 20230806450301.00 | ▇▆▆▅▃ |
| AGE | 0 | 1.00 | 47.94 | 17.69 | 15.0 | 34 | 46.00 | 62.00 | 85.00 | ▅▇▇▆▃ |
| SEX | 0 | 1.00 | 1.60 | 0.49 | 1.0 | 1 | 2.00 | 2.00 | 2.00 | ▆▁▁▁▇ |
| HISPAN | 0 | 1.00 | 115.22 | 40.37 | 100.0 | 100 | 100.00 | 100.00 | 250.00 | ▇▁▁▁▁ |
| RACE | 0 | 1.00 | 104.01 | 14.38 | 100.0 | 100 | 100.00 | 100.00 | 599.00 | ▇▁▁▁▁ |
| MARST | 0 | 1.00 | 2.81 | 2.07 | 1.0 | 1 | 1.00 | 4.00 | 6.00 | ▇▁▂▁▃ |
| HH_SIZE | 0 | 1.00 | 2.80 | 1.53 | 1.0 | 2 | 2.00 | 4.00 | 16.00 | ▇▁▁▁▁ |
| FAMINCOME | 0 | 1.00 | 65.86 | 225.94 | 1.0 | 8 | 12.00 | 15.00 | 998.00 | ▇▁▁▁▁ |
| HH_NUMADULTS | 0 | 1.00 | 1.90 | 0.79 | 0.0 | 1 | 2.00 | 2.00 | 12.00 | ▇▁▁▁▁ |
| NCHILD | 0 | 1.00 | 0.86 | 1.14 | 0.0 | 0 | 0.00 | 2.00 | 9.00 | ▇▃▁▁▁ |
| nchild | 0 | 1.00 | 0.86 | 1.14 | 0.0 | 0 | 0.00 | 2.00 | 9.00 | ▇▃▁▁▁ |
| YNGCH | 0 | 1.00 | 58.76 | 44.86 | 0.0 | 9 | 99.00 | 99.00 | 99.00 | ▆▁▁▁▇ |
| EDUC | 0 | 1.00 | 29.84 | 9.57 | 10.0 | 21 | 30.00 | 40.00 | 43.00 | ▂▆▁▆▇ |
| EMPSTAT | 0 | 1.00 | 2.50 | 1.88 | 1.0 | 1 | 1.00 | 5.00 | 5.00 | ▇▁▁▁▅ |
| POVERTY185 | 3040773 | 0.36 | 20.96 | 17.52 | 10.0 | 11 | 20.00 | 20.00 | 99.00 | ▇▁▁▁▁ |
| LINENO | 0 | 1.00 | 1.00 | 0.00 | 1.0 | 1 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| OCC2 | 0 | 1.00 | 3893.18 | 4788.38 | 110.0 | 127 | 150.00 | 9999.00 | 9999.00 | ▇▁▁▁▅ |
| OCC_CPS8 | 0 | 1.00 | 36496.01 | 45575.41 | 10.0 | 2830 | 5400.00 | 99999.00 | 99999.00 | ▇▁▁▁▅ |
| EARNWEEK | 0 | 1.00 | 45587.07 | 49297.07 | 0.0 | 680 | 2019.23 | 99999.99 | 99999.99 | ▇▁▁▁▆ |
| HRSWORKT_CPS8 | 0 | 1.00 | 4243.91 | 4919.92 | 1.0 | 40 | 50.00 | 9999.00 | 9999.00 | ▇▁▁▁▆ |
| SPEMPSTAT | 0 | 1.00 | 45.10 | 48.35 | 1.0 | 1 | 3.00 | 99.00 | 99.00 | ▇▁▁▁▆ |
| ECPRIOR | 2228561 | 0.53 | 1.46 | 10.86 | 0.0 | 0 | 0.00 | 0.00 | 99.00 | ▇▁▁▁▁ |
| ACTLINE | 0 | 1.00 | 11.84 | 8.41 | 1.0 | 5 | 10.00 | 16.00 | 91.00 | ▇▂▁▁▁ |
| ACTIVITY | 0 | 1.00 | 89041.53 | 76474.29 | 10101.0 | 20201 | 110101.00 | 120312.00 | 509999.00 | ▇▇▁▁▁ |
| DURATION_EXT | 0 | 1.00 | 83.22 | 126.56 | 1.0 | 15 | 30.00 | 90.00 | 1472.00 | ▇▁▁▁▁ |
| DURATION | 0 | 1.00 | 74.46 | 100.87 | 1.0 | 15 | 30.00 | 90.00 | 1350.00 | ▇▁▁▁▁ |
| SCC_ALL_LN | 0 | 1.00 | 6.69 | 27.27 | 0.0 | 0 | 0.00 | 0.00 | 1195.00 | ▇▁▁▁▁ |
| SCC_OWN_LN | 412611 | 0.91 | 5.71 | 24.66 | 0.0 | 0 | 0.00 | 0.00 | 1195.00 | ▇▁▁▁▁ |
| SEC_ALL_LN | 2228561 | 0.53 | 0.47 | 8.27 | 0.0 | 0 | 0.00 | 0.00 | 1097.00 | ▇▁▁▁▁ |
| developmental | 0 | 1.00 | 0.05 | 0.22 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| health | 0 | 1.00 | 0.01 | 0.08 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| daily_living | 0 | 1.00 | 0.27 | 0.44 | 0.0 | 0 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| paid_work | 0 | 1.00 | 0.07 | 0.26 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| formal_work | 0 | 1.00 | 0.04 | 0.20 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| child_care | 0 | 1.00 | 0.07 | 0.26 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| elder_care | 0 | 1.00 | 0.01 | 0.11 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| householdcare | 0 | 1.00 | 0.19 | 0.39 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| selfcare | 0 | 1.00 | 0.24 | 0.43 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| leisure | 0 | 1.00 | 0.21 | 0.41 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| sleeping | 0 | 1.00 | 0.11 | 0.32 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| volunteering | 0 | 1.00 | 0.01 | 0.08 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| education | 0 | 1.00 | 0.01 | 0.08 | 0.0 | 0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
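The ATUS to CPS Monthly link described before this summary can be sketched in base R as follows: keep each person's last CPS Monthly record, then attach its date and occupation code to the ATUS records by `CPSIDP`. The function names `last_cps_record` and `link_atus_to_cps` and the toy values are hypothetical. The sketch also assumes `CPSIDP` uniquely identifies a person across months and does not check that the CPS record precedes the ATUS interview.

```r
# Hypothetical helper: keep the latest CPS Monthly record per person.
last_cps_record <- function(cps) {
  cps <- cps[order(cps$CPSIDP, cps$date), ]
  cps[!duplicated(cps$CPSIDP, fromLast = TRUE), c("CPSIDP", "date", "OCC2010")]
}

# Hypothetical helper: attach the last CPS date and occupation to ATUS rows.
link_atus_to_cps <- function(atus, cps) {
  last <- last_cps_record(cps)
  idx <- match(atus$CPSIDP, last$CPSIDP)
  atus$cps_date <- last$date[idx]      # NA when the person is not found
  atus$OCC2010 <- last$OCC2010[idx]
  atus
}

# Usage with toy data: person 1 appears in two CPS months, person 3 in none
cps <- data.frame(
  CPSIDP = c(1, 1, 2),
  date = as.Date(c("2010-01-01", "2010-04-01", "2011-02-01")),
  OCC2010 = c(10, 20, 9830)
)
atus <- data.frame(CPSIDP = c(2, 1, 3), ACTIVITY = c(10101, 10201, 10101))
link_atus_to_cps(atus, cps)
```

Matching on the deduplicated CPS table guarantees that the number of ATUS activity rows is unchanged by the link, which is worth verifying after any merge of this kind.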
The code in this section has provided the methodology for downloading, cleaning, summarizing, and saving the data used to develop the statistics in The Care Board project. This section is essential for understanding how to replicate the path from completely raw data to the data used to compile our statistics. The code above saves three separate datasets: the CPS Monthly files, the yearly CPS ASEC files, and the yearly ATUS files. Once the code has run to completion, all three datasets should be written to the working directory.
Understanding how much time in a given day a person requires care is essential for accurately assessing the scale and structure of the care economy. While caregiving can be measured using a variety of methods, time-based measurements provide a more granular and human-centered view of care than simple headcounts or categorical designations of dependency. They help differentiate between levels of care intensity and the allocation of resources across health and social service systems. Furthermore, time-use data allows researchers and policymakers to model scenarios of unmet care needs and evaluate how demographic shifts, such as population aging or rising rates of disability, will affect demand for care services in both formal and informal sectors.
This information is also foundational for estimating the economic value of caregiving. Many individuals who require care do not receive it through formal markets but rely instead on family members or community networks. Without quantifying the time demands associated with caregiving needs, it is difficult to assess the hidden costs borne by households or to design equitable social support programs. Accurately capturing time needs can reveal care deficits and stress points in existing systems, thus informing policies aimed at improving accessibility, equity, and wellbeing outcomes for care recipients.
Equally important is understanding how much time the average person spends providing care on a daily basis. Capturing this data highlights the often-invisible labor that sustains households and communities, particularly the unpaid and gendered work frequently carried out by women. By quantifying caregiving as a time commitment, researchers can estimate its opportunity costs—such as foregone earnings, education, or leisure—and more comprehensively assess its impact on individual wellbeing and economic productivity. This information is crucial for designing interventions, from tax credits to caregiver respite programs, that acknowledge and support the vital contributions of care providers.
Moreover, time-use data on caregiving offers a powerful tool for comparative policy analysis. It enables cross-population comparisons, tracking how caregiving varies by age, gender, socioeconomic status, and family structure. It also facilitates longitudinal studies of how caregiving responsibilities evolve over the life course or in response to social policy changes. Embedding time-based care metrics into national surveys and economic accounts can help integrate care work into the broader understanding of labor markets, social reproduction, and economic development, thereby strengthening the case for investing in the care economy as both a moral and strategic priority.
The first section of The Care Board, What Is the Care Economy, uses a variety of methods to measure the average time we need care and how much time we have available to provide care. The code in the next few sections comes with many assumptions and simplifications. It is vital that more data be collected in the future to provide better estimates of the following outcomes, but for now, this code represents our most complete work on estimating care need and provision by individuals across society.
The first piece of data we need is the number of people in the U.S. at each age. Care need and the provision of care differ dramatically based on life stage. We use the code below and the 2024 CPS ASEC data to create population estimates for ages 0 through 85. This R code chunk is doing the following steps:

Creating an age reference list:
- It creates a dataframe called age_list with one column (age) containing every age from 0 to 85, one row per age.

Loading and filtering age data:
- Reads a CSV file (ASECdata.csv) into a dataframe called age_data.
- Filters to the most recent survey year (max(YEAR)).
- Keeps only the age (AGE) and person-level weight (ASECWT) columns.

Calculating weighted population by age:
- Sums the person-level weights within each age to estimate the population at that age.

Joining with full age list:
- Performs a full join between age_list (ages 0 to 85) and the summarized population data.

Handling missing population values:
- Fills in any missing population values with 0 using coalesce().

In plain language:
This code creates a complete list of ages from 0 to 85 and attaches population estimates from the latest year of survey data.
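The steps above can be sketched in base R as follows. This is a minimal illustration rather than the project's own chunk: the toy values are invented, and base R's aggregate() and merge() stand in for the dplyr verbs and coalesce() named in the text.

```r
# Toy stand-in for the ASEC extract. Column names follow the text;
# the values are invented for illustration only.
age_data <- data.frame(
  YEAR   = c(2023, 2024, 2024, 2024, 2024),
  AGE    = c(0, 0, 0, 1, 85),
  ASECWT = c(900, 1000, 1500, 2000, 500)
)

# Reference list with one row per age from 0 to 85.
age_list <- data.frame(age = 0:85)

# Keep only the most recent survey year and the two columns we need.
latest <- age_data[age_data$YEAR == max(age_data$YEAR), c("AGE", "ASECWT")]

# Weighted population by age: sum the person weights within each age.
pop <- aggregate(ASECWT ~ AGE, data = latest, FUN = sum)
names(pop) <- c("age", "population")

# Full join against the reference list, then fill gaps with 0.
age_pop <- merge(age_list, pop, by = "age", all = TRUE)
age_pop$population[is.na(age_pop$population)] <- 0

head(age_pop, 3)
```

Joining against the full age list guarantees that every age from 0 to 85 has a row, even when the survey has no respondents at that age.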
As a check on these estimates, we look at the resulting age distribution using the plot below.
Now that we have the population count for each age group, our next step is to pair this data with information on care need and care provision at each age. We use a combination of assumptions and data-informed analysis to create these values. We call this the market datum table. We start by creating a blank table in which each age group is paired with the three possible care focuses.
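A blank table of this shape could be built as in the sketch below. The care focus labels follow those used elsewhere in this document; the two empty value columns and their names are assumptions made for illustration.

```r
# Three care focuses, using labels that appear elsewhere in this document.
care_focuses <- c("developmental", "daily_living", "health")

# One row per age and care focus combination (86 ages x 3 focuses).
market_datum <- expand.grid(
  age = 0:85,
  care_focus = care_focuses,
  stringsAsFactors = FALSE
)

# Placeholder columns to be filled in later (column names are assumed).
market_datum$need_interval <- NA_real_
market_datum$provision_interval <- NA_real_

nrow(market_datum)
```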
Once we have this information, we need to load in the ATUS data that we organized. When studying time use, ATUS data will act as our primary source of analysis. We use a variety of methods to convert this data into our desired format. This R code chunk is doing the following steps:
Loading time use data:
- It reads a CSV file (ATUSdata.csv) into a dataframe called atus, keeping only selected columns related to time use, demographic info, and care activities.

Cleaning column names:
- Standardizes column names to lowercase with underscores.

Identifying recent years (excluding 2020):
- Stores the five most recent survey years, skipping 2020, in years_include.

Filtering to recent years and creating new variables:
- Keeps only records from the years in years_include.
- Renames the act_care_focus column to care_focus for simplicity.
- care_job: a binary indicator for whether the activity involved care (1) or not (0), based on the focus column.
- weight: adjusts the person-level weight to reflect a daily average over 5 years.
- work_time: calculates time spent in paid care work by multiplying duration with the paid_work and care_job indicators.

In plain language:
This code loads and filters time-use survey data to include only the last 5 valid years, calculates daily weights, flags care-related paid work, and computes how much time individuals spent on that work.
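The two derived columns can be illustrated with a short base R sketch. The divisor 365 * 5 is an assumption about how the daily five-year weight is built, although it is consistent with the summary table in this section (a mean wt06 of 10,750,091.20 divided by 1,825 gives the reported mean weight of 5,890.46). The toy rows are invented.

```r
# Toy activity records; column names follow the text, values are invented.
atus_toy <- data.frame(
  wt06      = c(3650000, 7300000, 1825000),
  duration  = c(480, 60, 30),   # minutes spent on the activity
  paid_work = c(1, 1, 0),       # 1 if the activity was paid work
  care_job  = c(1, 0, 1)        # 1 if the activity had a care focus
)

# Assumed scaling: annual person-day weights spread over 5 pooled years.
n_years <- 5
atus_toy$weight <- atus_toy$wt06 / (365 * n_years)

# Paid care work time is nonzero only when both indicators equal 1.
atus_toy$work_time <- atus_toy$duration * atus_toy$paid_work * atus_toy$care_job

atus_toy
```

Multiplying by the two 0/1 indicators zeroes out any activity that is either unpaid or not care related, so work_time needs no separate filter.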
| Name | atus |
| Number of rows | 831170 |
| Number of columns | 21 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 17 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| marst | 0 | 1.00 | 7 | 31 | 0 | 3 | 0 |
| activity | 27630 | 0.97 | 7 | 49 | 0 | 109 | 0 |
| care_focus | 0 | 1.00 | 6 | 13 | 0 | 4 | 0 |
| occ_care_focus | 0 | 1.00 | 4 | 13 | 0 | 4 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2020.44 | 1.86 | 2018.00 | 2019.00 | 2021.00 | 2022.00 | 2023.0 | ▇▁▃▃▃ |
| caseid | 0 | 1 | 20205036905740.19 | 18635631251.34 | 20180101180006.00 | 20190201191853.00 | 20210403210257.00 | 20220807220724.00 | 20231212232280.0 | ▇▅▃▆▆ |
| wt06 | 0 | 1 | 10750091.20 | 9687672.98 | 719246.62 | 4787913.42 | 7996184.57 | 13275764.22 | 194366929.6 | ▇▁▁▁▁ |
| actline | 0 | 1 | 11.35 | 7.99 | 1.00 | 5.00 | 10.00 | 16.00 | 72.0 | ▇▂▁▁▁ |
| hh_size | 0 | 1 | 2.60 | 1.45 | 1.00 | 2.00 | 2.00 | 4.00 | 14.0 | ▇▃▁▁▁ |
| age | 0 | 1 | 51.69 | 18.19 | 15.00 | 37.00 | 52.00 | 67.00 | 85.0 | ▃▇▆▇▅ |
| nchild | 0 | 1 | 0.71 | 1.08 | 0.00 | 0.00 | 0.00 | 1.00 | 9.0 | ▇▂▁▁▁ |
| paid_work | 0 | 1 | 0.07 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
| child_care | 0 | 1 | 0.06 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
| elder_care | 0 | 1 | 0.01 | 0.11 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
| sleeping | 0 | 1 | 0.12 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
| duration | 0 | 1 | 77.61 | 103.75 | 1.00 | 15.00 | 30.00 | 90.00 | 1310.0 | ▇▁▁▁▁ |
| scc_all_ln | 0 | 1 | 5.48 | 24.81 | 0.00 | 0.00 | 0.00 | 0.00 | 900.0 | ▇▁▁▁▁ |
| sec_all_ln | 0 | 1 | 0.53 | 8.74 | 0.00 | 0.00 | 0.00 | 0.00 | 922.0 | ▇▁▁▁▁ |
| care_job | 0 | 1 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▂ |
| weight | 0 | 1 | 5890.46 | 5308.31 | 394.11 | 2623.51 | 4381.47 | 7274.39 | 106502.4 | ▇▁▁▁▁ |
| work_time | 0 | 1 | 2.37 | 25.73 | 0.00 | 0.00 | 0.00 | 0.00 | 1195.0 | ▇▁▁▁▁ |
We then use the ATUS hierarchical data to assign more variables to the individuals and activities outlined above. Specifically, we want to link activities using the RELATEW variable which is used to identify with whom an activity was done. This R code chunk is doing the following steps:
Reading IPUMS metadata:
- Loads the IPUMS DDI (data description) XML file using read_ipums_ddi(). This file describes the structure and variables of the household microdata.

Reading IPUMS household microdata:
- Loads the actual household-level data using read_ipums_micro() based on the structure defined in the DDI file. The resulting dataframe is called atus_hh.

Cleaning column names:
- Standardizes the column names in the household data (atus_hh) for consistency and easier use.

Merging household data with individual-level ATUS data:
- Joins selected columns (caseid, actlinew, relatew) from atus_hh into the existing atus dataframe.
- Matches records on caseid and on actline (from the individual-level data) aligning with actlinew (from the household data).

In plain language:
This code reads in additional household-level data from IPUMS and merges it with individual time-use data, allowing each activity record to be linked with household relationship information.
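The key detail in this merge is that the activity line number has a different name in each table. A minimal base R sketch of that join is shown below with invented toy data; relatew code 100 is treated as "alone", matching the filter described later in this section, while code 200 is only a placeholder.

```r
# Activity-level records (toy values).
atus_act <- data.frame(
  caseid   = c(1, 1, 2),
  actline  = c(1, 2, 1),
  duration = c(30, 45, 60)
)

# "With whom" records from the hierarchical extract (toy values).
atus_who <- data.frame(
  caseid   = c(1, 1, 2),
  actlinew = c(1, 2, 1),
  relatew  = c(100, 200, 100)  # 100 = alone; 200 is a placeholder code
)

# Left join on caseid, with actline matched to actlinew.
linked <- merge(
  atus_act, atus_who,
  by.x = c("caseid", "actline"),
  by.y = c("caseid", "actlinew"),
  all.x = TRUE
)

linked
```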
## Use of data from IPUMS ATUS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
Following this, we create a function that overrides estimates where we choose to. The function takes a care interval, an age range mapping, and a column name, and produces a table of established values to use in place of the estimates. We need this function because we are not confident that the ATUS data alone can provide the true outcomes without significant assumptions. For example, ATUS does not survey respondents under the age of 15. Additionally, our methodology is likely somewhat biased for individuals at especially high ages, such as those in their 80s. In these cases, we override the data with informed assumptions to improve the distribution. This R function is doing the following steps:

- Defines a function prepare_overrides with three inputs:
  - care_interval: a named list where each element corresponds to an age range and contains care-related interval data (e.g., time spent on care by type).
  - age_ranges: a dataframe that maps age_range labels to specific ages.
  - col_name: the name to give to the new output column created from the interval values.
- Uses lapply() to loop over each named age_range in care_interval, building rows with:
  - age_range: the name of the group (e.g., “0–4”, “65+”)
  - care_focus: the specific type of care activity
  - interval: the numeric value (e.g., minutes per day)
- Combines the rows into a single dataframe (care_override) using rbind.
- Joins the care_override dataframe with age_ranges to expand each age range into its individual ages (e.g., mapping “0–4” to 0, 1, 2, 3, 4).
- Uses relationship = "many-to-many" to allow for multiple age mappings per group.
- Keeps only the age, care_focus, and interval columns.
- Renames the interval column to whatever is passed in via col_name.
- Returns a table that assigns a col_name value to each age and care focus type.

In plain language:
This function transforms age-group-based care interval data into a long-form table that maps each individual age to a specific type of care activity and its associated value, labeling that value with a customizable column name.
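A function with this behavior could look like the base R sketch below. It is a reconstruction from the description above, not the project's own code: it is named prepare_overrides_sketch to make that clear, it uses merge() in place of the dplyr join, and the minute values in the example are placeholders rather than the assumptions used on The Care Board.

```r
prepare_overrides_sketch <- function(care_interval, age_ranges, col_name) {
  # One small data frame per age range, one row per care focus.
  rows <- lapply(names(care_interval), function(rng) {
    vals <- care_interval[[rng]]
    data.frame(
      age_range  = rng,
      care_focus = names(vals),
      interval   = unlist(vals, use.names = FALSE),
      stringsAsFactors = FALSE
    )
  })
  care_override <- do.call(rbind, rows)

  # Expand each age range into its individual ages.
  out <- merge(care_override, age_ranges, by = "age_range")
  out <- out[order(out$age, out$care_focus), c("age", "care_focus", "interval")]

  # Rename the value column to the caller-supplied name.
  names(out)[names(out) == "interval"] <- col_name
  rownames(out) <- NULL
  out
}

# Example with placeholder values (not the project's assumptions).
age_ranges <- data.frame(
  age = 0:12,
  age_range = c(rep("age_0to5", 6), rep("age_6to12", 7)),
  stringsAsFactors = FALSE
)
need_interval <- list(
  age_0to5  = list(health = 10, developmental = 20, daily_living = 30),
  age_6to12 = list(health = 5,  developmental = 15, daily_living = 25)
)
overrides <- prepare_overrides_sketch(need_interval, age_ranges, "need_override")
head(overrides)
```

Because the value column name is a parameter, the same helper can produce both need overrides and provision overrides.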
Now that we have all of the prep work done, we use the code below to assign care to different age groups. Our methodology for computing this is not simple. At its core, the ATUS provides information on care provision, not on the amount of care an individual needs. We need to be able to use this data to understand how much care a person needs. To do this, we look at cases where the amount of care supplied equals the amount of care demanded. We filter the data for each age group to look only at individuals who live alone and who spend 0 total minutes in their entire day providing care to children or elderly adults. We further limit the data to only those activities done alone.
This subset group of individuals are most likely to have a situation where care supply and care demand are equal within their home in the sense that all care for the individual is being done by the individual. These individuals are providing care only to themselves and receiving care only from themselves. As such, by measuring the amount of care provided by this subset we are also measuring the amount of care they might demand. Thus, we can use this methodology to create a measurement of estimated care for each age group. This R code chunk is doing the following steps:
Setting up age loop:
- It creates a list of ages from the age_modified dataframe and initializes an empty list (needs_atus_calc) to store results.

Looping through each age:
For each age a, it filters the atus dataset to identify individuals with the following characteristics:
- No secondary care provided (scc_all_ln and sec_all_ln are all 0)
- No primary child or elder care provided (child_care and elder_care are all 0)
- Care-related activities only (care_focus != "non-care")
- Living alone (hh_size == 1)
- Activity done alone (relatew == 100)
- Within a five-year band centered on a (ages a-2 to a+2)

Summarizing individual-level data:
- For each caseid and care_focus group, totals the time spent.

Calculating need interval estimates:
- For each care_focus group, calculates the weighted mean duration across individuals to estimate how much care time is associated with that activity for that age group.
- Adds age to the summary row.

Storing results:
- Saves the summary in the needs_atus_calc list for that age.

Combining all results:
- Binds the list elements into a single dataframe (needs_atus_calc).

In plain language:
This code estimates how much time people living alone at each age (in 5-year bands) typically receive in care-related activities, under the assumption they don’t provide care to others. It uses this as a proxy for care needs across different age groups.
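The filter-and-average logic described above can be sketched in base R. This is an illustrative reconstruction with invented toy data, not the project's chunk; in particular, summing the four care-provision columns per person is one assumed way to check that someone gave no care to others during the day.

```r
# Toy activity records; column names follow the text, values are invented.
atus_toy <- data.frame(
  caseid     = c(1, 1, 2, 2, 3, 4),
  age        = c(30, 30, 31, 31, 45, 29),
  hh_size    = c(1, 1, 1, 1, 1, 2),
  relatew    = 100,
  child_care = 0, elder_care = 0, scc_all_ln = 0, sec_all_ln = 0,
  care_focus = c("health", "daily_living", "health", "daily_living", "health", "health"),
  duration   = c(30, 120, 60, 90, 45, 15),
  weight     = c(2, 2, 1, 1, 3, 5),
  stringsAsFactors = FALSE
)

estimate_need <- function(d, a) {
  # Care given to others across each person's whole day (0 means none).
  given <- ave(d$child_care + d$elder_care + d$scc_all_ln + d$sec_all_ln,
               d$caseid, FUN = sum)
  keep <- given == 0 & d$hh_size == 1 & d$relatew == 100 &
    d$care_focus != "non-care" & d$age >= a - 2 & d$age <= a + 2
  band <- d[keep, ]
  if (nrow(band) == 0) return(NULL)

  # Total minutes per person and care focus, carrying the weight along.
  per_person <- aggregate(duration ~ caseid + care_focus + weight,
                          data = band, FUN = sum)

  # Weighted mean across people within each care focus.
  by_focus <- split(per_person, per_person$care_focus)
  data.frame(
    age = a,
    care_focus = names(by_focus),
    need_interval = vapply(by_focus,
                           function(g) weighted.mean(g$duration, g$weight),
                           numeric(1)),
    row.names = NULL
  )
}

results <- lapply(c(30, 45, 60), function(a) estimate_need(atus_toy, a))
needs <- do.call(rbind, Filter(Negate(is.null), results))
needs
```

Ages with no qualifying respondents return nothing here, which is one reason the override assumptions described next are needed at the ends of the age range.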
Finally, we use the override function that we defined above to input our assumptions. The code below inserts minutes for health, developmental, and daily care for the age groups of 0-5, 6-12, 13-17, 75-84, and over 85. These assumed values are based on informed judgment rather than observed data. For instance, many state laws require that those under the age of 8 are supervised 24 hours a day. This R code chunk is doing the following steps:

- Creates a dataframe age_ranges that maps specific ages to age range categories (e.g., age 0–5 is “age_0to5”, age 6–12 is “age_6to12”, etc.).
- Defines a list need_interval where each entry corresponds to an age range and specifies the time (in minutes) needed per day for:
  - health care
  - developmental care
  - daily living care
- Calls prepare_overrides() to transform the structured need_interval list into a tidy dataframe (needs_ku_override) where:
  - each row pairs an age with a care focus
  - each row carries an override value (need_override) for that type of care

In plain language:
This code defines how much care time people in specific age groups typically need each day for health, development, and daily living. It converts that information into a usable table that links each age to specific care needs, ready for analysis or visualization.
Now that we have our data on the average amount of care required within age groups, we replicate the above methodology to measure the typical amount of time a group provides care in a day. The main difference between this code and the above is that we do not limit ourselves to individuals living alone. Instead, we look at all individuals in an age group and measure the time they spend in all care-related activities. These activities could include self-care, primary care to others, care in a job, and secondary care to a child or elder. This R code chunk is doing the following steps:

Setting up for age-based analysis:
- Extracts the list of ages (age) from age_modified.
- Initializes an empty list (provision_atus_calc) to store results.

Looping through each age:
For each age a, it creates a combined dataset (data) representing all types of care provision, pulling from the atus dataset.

Combining different types of care time:
- Primary care: keeps activities where focus is not "none".
- Secondary child care: keeps records with scc_all_ln (secondary child care), labels them "developmental", and uses scc_all_ln as the care duration.
- Secondary elder care: keeps records with sec_all_ln (secondary elder care) and labels them "health" care.

Filtering to 5-year age bands:
- Keeps individuals within two years of a (i.e., ages a-2 to a+2).

Summarizing care time per individual:
- Groups by individual (caseid) and care type (care_focus), summing care durations and keeping the weight.

Calculating care provision estimates:
- Computes the weighted median (wtd.quantile()) of care time per care_focus type, across all individuals in that age band.
- Adds age to the result.

Storing results:
- Saves the result in the provision_atus_calc list.

Combining all results:
After looping through all ages, it merges the results into a single dataframe.

In plain language:
This code estimates how much care people of each age typically provide, including formal work, informal care, and secondary care (like watching kids or elders while doing something else). It uses weighted medians to summarize how care provision varies by age and type.
Just as with assumptions related to care needs for children and the elderly, we make assumptions for care provision. The code chunk below outlines the assumptions and utilizes the override function to create the values.
We now have our final data on care and care provision by group. We want to use a few more steps to finalize this data so it is ready for presentation. We start by combining the various data frames together into a single market datum file.
We then provide this same information as a plot to help visualise it.
One of the weaknesses of our methodology is that it is prone to outliers. We can see that in the hump above around the age of 54. This method is also highly reliant on our assumptions at the low and high ends of the age range. To help account for this, we apply a smoothing function to create a smoother density curve. We generally keep the smoothing function relatively weak to preserve as much of the underlying pattern as possible.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `smoothed_need = predict(...)`.
## ℹ In group 1: `care_focus = "daily_living"`.
## Caused by warning:
## ! `cur_data()` was deprecated in dplyr 1.1.0.
## ℹ Please use `pick()` instead.
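The source shows only that the smoothed values come from a predict(...) call inside mutate(), so the exact smoother is not recoverable here. As a minimal sketch, assuming a LOESS fit with a fairly narrow span, the effect on an isolated spike looks like this. The data are invented, with an artificial bump at age 54.

```r
# Invented need curve with an artificial one-year spike at age 54.
need <- data.frame(age = 18:74)
need$need_interval <- 200 + 0.5 * (need$age - 18) +
  ifelse(need$age == 54, 80, 0)

# Assumed smoother: LOESS with a narrow span so broad patterns survive.
fit <- loess(need_interval ~ age, data = need, span = 0.3)
need$smoothed_need <- predict(fit, newdata = need)

# Compare the raw and smoothed values around the spike.
need[need$age %in% 52:56, ]
```

A smaller span keeps the smoothed curve closer to the raw estimates, while a larger span flattens more of the outlier along with more of the underlying pattern.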
Our final step is to merge this market datum information with the population data by age. We merge the two datasets and export the result for final analysis.
The next section of statistics we calculate is our “care providers” section. This section aims to identify who within society provides care. Its main outcome is to identify care providers across both the informal and formal sectors of care and among the different care focuses (developmental, daily living, and health).
Understanding who provides care within society is a cornerstone for analyzing and strengthening the care economy. Care takes place across a continuum of formal and informal settings, including professional healthcare and social service environments, as well as unpaid support provided by family, friends, and community members. Quantifying the individuals engaged in caregiving—particularly by gender and parenthood status—offers critical insight into how care responsibilities are distributed and how this distribution reflects broader social, economic, and policy structures. Without this demographic detail, policymakers risk overlooking the inequities embedded in caregiving systems and the hidden labor that sustains households and communities.
Gender and parenthood status are especially salient categories for measuring who provides care, as these factors strongly influence both the likelihood of providing care and the intensity of caregiving responsibilities both in the home and within formal sector care jobs. Women, for example, disproportionately shoulder unpaid caregiving work, whether as mothers, daughters, or other kin, contributing to persistent gender disparities in income, career advancement, and retirement security. Similarly, parents—especially single parents—may juggle dual care responsibilities for children and aging relatives, compounding their time and emotional burdens. By tracking the number of caregivers across these dimensions, researchers and policymakers can better understand the structural pressures facing different populations and design targeted interventions that support caregivers, such as paid leave policies, subsidized care services, or caregiver tax credits. Accurate demographic data on caregiving is essential not only for recognizing and valuing care work but also for fostering a more equitable and resilient care economy.
The first section of code below looks at the formal care economy. The code chunk below loads in the CPS ASEC data which is our primary data source for understanding the formal economy. We load in this data and select the correct columns as needed below.
Following this, we calculate the amount of time spent and the number of people working in each specific care focus by each of our gender and parenthood combinations. This R code chunk is doing the following steps:

- Filters the asec dataset to include only employed individuals.
- Creates the following variables:
  - gender: lowercase version of the sex column.
  - provider_status: groups people into those with children, without children, or other, based on the gender_parent label.
  - time_use: labels work as “care” if it has a focus, and “non_care” otherwise.
  - care_focus: directly copies the focus variable.
  - care_type: assigns "formal" to indicate this is paid labor.
  - provider_attention: sets "active" to describe the type of care (vs. passive or secondary).
- Excludes cases where usual hours worked (uhrsworkt) are coded as 997 (time use fluctuates).
- Replaces not-in-universe codes in uhrsworkt with 0 to avoid inflating results from those not in the labor force (NILF).
- Summarizes each group with:
  - population: total weighted number of people in each group (using asecwt).
  - provision_interval: total daily minutes of formal care work, calculated as uhrsworkt / 7 * 60 * asecwt (i.e., average minutes worked per day, weighted).

In plain language:
This code prepares and summarizes data on people who provide formal (paid) care work. It calculates the total number of providers and how much care time they give per day, grouped by gender, parent status, and type of care provided.
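The conversion from usual weekly hours to weighted daily minutes can be checked with a small base R sketch. The toy rows are invented, and dropping the 997 rows before computing both totals is an assumption; the text does not make clear whether those rows still count toward population.

```r
# Toy employed respondents; values are invented for illustration.
asec_toy <- data.frame(
  care_focus = c("health", "health", "developmental", "health"),
  uhrsworkt  = c(35, 997, 999, 42),   # 997 = hours vary, 999 = not in universe
  asecwt     = c(1000, 800, 500, 2000),
  stringsAsFactors = FALSE
)

# Assumed handling: drop "hours vary" rows, set not-in-universe hours to 0.
d <- asec_toy[asec_toy$uhrsworkt != 997, ]
d$uhrsworkt[d$uhrsworkt == 999] <- 0

# Weekly hours -> daily minutes, scaled up by the survey weight.
d$minutes <- d$uhrsworkt / 7 * 60 * d$asecwt

formal <- aggregate(
  cbind(population = asecwt, provision_interval = minutes) ~ care_focus,
  data = d, FUN = sum
)
formal
```

For example, a respondent who usually works 35 hours a week and carries a weight of 1,000 contributes 35 / 7 * 60 * 1,000 = 300,000 daily minutes to the total.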
The next step is to replicate the above table but to look at informal activities as opposed to only formal activities. To do this we use the ATUS data for informal care activities.
We now use the code below to calculate the number of people and the amount of time spent by these providers for each informal care category. We count secondary child care as developmental care and secondary elder care as health care. This R code chunk is doing the following steps:

    yr_range <- atus_yr_range(atus) |>
      filter(year == max(year))

- Uses atus_yr_range() to get a range of valid years.

    cp_informal <- atus |>
      pivot_longer(...) |>
      filter(!is.na(duration))

- Reshapes the atus dataset so that columns duration, scc_all_ln (secondary child care), and sec_all_ln (secondary elder care) are stacked into a long format.

    cp_informal <- cp_informal |>
      mutate(...)

- Creates the following variables:
  - gender: lowercase version of respondent’s sex.
  - provider_status: identifies if a person has children or not, based on gender_parent.
  - care_type: all entries here are labeled as "informal" (unpaid or household-based care).
  - provider_attention: labels care as "active" or passive ("passive_child" or "passive_elder") depending on the source of the care time.
  - care_focus: sets the type of care (e.g., "developmental", "health", or "none").
  - time_use: categorizes each activity as "care" or "non_care" based on care_focus.
  - weight: adjusts the person-level weight (wt06) to a daily scale.

    cp_informal <- cp_informal |>
      summarise(...) |>
      summarise(...)

- First groups by individual (caseid) and activity category to calculate their total time in each care type.
- Then summarizes each group with:
  - provision_interval: total care time in minutes, scaled to a 5-year average.
  - population: the weighted count of individuals in each group.

In plain language:
This code organizes and summarizes all types of informal care (including active and passive care) across recent years. It groups people by gender and parenting status and calculates how much time, on average, they spend providing different types of care.
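The reshaping step, in which three time columns become one long column tagged by provider attention, can be sketched in base R as follows. This is an illustration with invented data. It uses rbind() on three slices instead of pivot_longer(), and the helper name stack_source is hypothetical.

```r
# Toy person-activity rows; column names follow the text, values are invented.
atus_toy <- data.frame(
  caseid     = c(1, 2),
  care_focus = c("daily_living", "none"),
  duration   = c(60, 30),    # primary activity minutes
  scc_all_ln = c(120, 0),    # secondary child care minutes
  sec_all_ln = c(0, 45),     # secondary elder care minutes
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull one time column and tag it with its source.
stack_source <- function(d, col, attention, focus = d$care_focus) {
  data.frame(
    caseid = d$caseid,
    provider_attention = attention,
    care_focus = focus,
    minutes = d[[col]],
    stringsAsFactors = FALSE
  )
}

cp_long <- rbind(
  stack_source(atus_toy, "duration",   "active"),
  stack_source(atus_toy, "scc_all_ln", "passive_child", "developmental"),
  stack_source(atus_toy, "sec_all_ln", "passive_elder", "health")
)

# Care vs non-care flag derived from the care focus.
cp_long$time_use <- ifelse(cp_long$care_focus == "none", "non_care", "care")

totals <- aggregate(minutes ~ provider_attention + care_focus,
                    data = cp_long, FUN = sum)
totals
```

Stacking the three sources keeps secondary care as its own rows, so passive care time can be reported separately from active care time.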
The next step is to create a demographics table for our care providers. For The Care Board specifically, we are interested in the association between gender and parenthood status. We use the code below to create a table showing the distribution of this demographic across society. This R code chunk is doing the following steps:

- Takes the asec dataset and creates two new classification variables:
  - gender: converts the sex variable to lowercase for consistency.
  - provider_status: categorizes individuals as:
    - "with_children" if they’re labeled as mothers or fathers,
    - "without_children" if labeled as non-mothers or non-fathers,
    - "other" for anyone else (e.g., ambiguous or missing category).
- Groups the data by gender and provider_status.
- Sums the survey weights (asecwt) to estimate the total population size of each group.
- Sorts by gender and provider_status for easier reading or display.

In plain language:
This code estimates the size of different demographic groups (by gender and parental status) using survey weights, providing a breakdown of how many people fall into each category.
Collecting data on specific care-related activities across both the formal and informal sectors is vital for a comprehensive understanding of the care economy. By disaggregating care work into its constituent activities—such as nursing, home health assistance, cleaning, teaching, feeding, or emotional support—we gain a clearer picture of the full range and diversity of labor that underpins human well-being and social reproduction. Calculating metrics like the average time spent on each care activity and the corresponding wages earned allows us to evaluate not only the volume of care being provided, but also the economic value placed on different forms of care. This granular approach is particularly important in revealing the undervaluation of essential work, much of which is disproportionately performed by women, people of color, and immigrants.
Detailed activity-level data is also critical for crafting targeted and effective policy interventions. It allows researchers and decision-makers to identify gaps in compensation, exposure to physical and emotional strain, or mismatches between time demands and available support systems. For example, data showing long hours in unpaid eldercare alongside low wages for professional caregivers may point to the need for investment in long-term care infrastructure or wage floors in care occupations. Similarly, tracking time spent on caregiving tasks in educational or domestic contexts can highlight the blurred lines between formal employment and informal labor. Without this specificity, the care economy remains abstract, obscuring the real conditions of care workers and the needs of care recipients. Collecting and analyzing this level of detail helps make visible the full scope of care labor and strengthens the foundation for equitable and sustainable care systems.
The first step is to look at the formal care economy and gather the data needed for it. Like most analyses of the formal care economy, this section uses CPS ASEC data. We therefore begin by loading the CPS ASEC data and putting it into the proper format. This R code chunk is doing the following steps:

- Reads a .csv file (ASECdata.csv) into the asec dataframe.
- Filters the data to keep only respondents who:
  - work in a care occupation (occ_care_focus != "none"),
  - are employed (empstat == "Employed").
- Replaces values of 999 in uhrsworkt (usual hours worked) with 0.
- Creates activity_id:
  - Starting from occ_label (occupation label), it replaces non-alphanumeric characters with hyphens and converts the result to lowercase.
- Renames columns:
  - label → name (likely for cleaner display or consistency),
  - focus → care_focus.

In plain language:
This code loads and filters survey data to keep only employed adults working in care occupations, cleans up the occupation labels for use as unique identifiers, and standardizes some column names for further analysis or visualization.
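The label clean-up can be illustrated with a small helper. This is a sketch under assumptions: the helper name make_activity_id and the example labels are hypothetical, and collapsing runs of non-alphanumeric characters into a single hyphen (and trimming hyphens at the ends) is one reasonable reading of the description.

```r
# Hypothetical helper: turn an occupation label into a slug-style identifier.
make_activity_id <- function(label) {
  id <- gsub("[^A-Za-z0-9]+", "-", label)  # runs of other characters -> one hyphen
  id <- gsub("^-+|-+$", "", id)            # trim leading and trailing hyphens
  tolower(id)
}

# Example labels are invented for illustration.
ids <- make_activity_id(c("Registered Nurses", "Childcare Workers (Home)"))
ids
```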
After prepping the data, we calculate three main statistics for each formal care economy activity. First, we calculate the number of people who engage in this activity during a day. Second, we calculate the total amount of time spent on this activity throughout society. Third, we calculate the median wage for people employed in this activity. This R code chunk is doing the following steps:

    act_formal_population <- asec |>
      summarise(
        population = sum(asecwt),
        .by = c(activity_id, name, care_focus)
      )

- Sums the survey weights (asecwt) to estimate the total number of employed individuals working in each care-related occupation.
- Groups by:
  - activity_id (a cleaned-up identifier for the occupation),
  - name (the occupation label),
  - care_focus (type of care: developmental, health, etc.).

    act_formal_time <- asec |>
      filter(uhrsworkt != 997) |>
      summarise(...)

- Excludes respondents whose hours vary (uhrsworkt == 997).
- Calculates total time as asecwt * uhrsworkt * 60 / 7 (to convert weekly hours to daily minutes).

    act_formal_med_wage <- asec |>
      filter(incwage != 0 & incwage != 99999999) |>
      summarise(...)

- Drops records with zero wages or the code 99999999.
- Calculates the weighted median wage (wtd.quantile) for each occupation group.
- Groups by activity_id, name, and care_focus.

    act_formal_stats <- full_join(...) |>
      full_join(...) |>
      arrange(activity_id)

- Combines the three summaries (population, time, and median_wage) into a single table (act_formal_stats) using full joins.
- Sorts the result by activity_id.

In plain language:
This code summarizes key statistics for each formal care occupation, including how many people work in it, how much time they spend on care work per day, and what the median wage is. The result is a comprehensive dataset ready for reporting or visualization.
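The final combining step can be sketched in base R with Reduce() and merge(all = TRUE), which together behave like chained full joins. The three input tables below are invented placeholders for the population, time, and wage summaries.

```r
keys <- c("activity_id", "name", "care_focus")

# Invented placeholder summaries, one per statistic.
pop <- data.frame(activity_id = c("a", "b"), name = c("A", "B"),
                  care_focus = "health", population = c(100, 200),
                  stringsAsFactors = FALSE)
time <- data.frame(activity_id = "a", name = "A",
                   care_focus = "health", time = 5000,
                   stringsAsFactors = FALSE)
wage <- data.frame(activity_id = c("a", "b"), name = c("A", "B"),
                   care_focus = "health", median_wage = c(30000, 42000),
                   stringsAsFactors = FALSE)

# Chain full joins across the three tables, then sort by activity_id.
act_stats <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE),
                    list(pop, time, wage))
act_stats <- act_stats[order(act_stats$activity_id), ]
act_stats
```

A full join keeps an occupation even when one statistic is missing for it, for example when every respondent in that occupation reports varying hours. The missing value then appears as NA rather than the row disappearing.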
We provide a few descriptive statistics below to show the total numbers of the final results.
## [1] "Number of formal care workers: 47,601,123"
## [1] "Total daily hours per day: 239,572,357"
## [1] "Average care hours per day per worker: 5"
We also provide below a plot showing specific statistics related to specific occupations.
After compiling the data on the formal care economy, we calculate the data desired for the informal care economy. Like other times when analyzing the informal care economy, we use the ATUS data. The first step is thus to load in and correctly format the data. This R code chunk is doing the following steps:
    atus <- read.csv(...) |>
      filter(activity != "Formal Work") |>
      filter(YEAR >= 2018 & YEAR != 2020) |>
      filter(AGE >= 18) |>
      select(...) |>
      clean_names()

- Reads the ATUS file (ATUSdata.csv) into a dataframe called atus.
- Removes activities labeled "Formal Work" (keeping only non-paid activities).
- Keeps survey years from 2018 onward, excluding 2020.
- Keeps respondents aged 18 and older.
- Selects the needed columns and standardizes their names.

    yr_range <- atus_yr_range(atus) |>
      filter(year == max(year))

- Uses atus_yr_range() to get a 5-year rolling window.

    atus <- atus |>
      filter(year >= yr_range$yr_start & year <= yr_range$year) |>
      rename(...) |>
      mutate(...)

- Keeps only the years inside the window.
- Renames columns:
  - act_care_focus → care_focus
  - activity_2 (presumably a derived activity name column) → activity_name
- Creates or updates the following variables:
  - activity_name: Sets this to "non-care" if care_focus is "non-care", otherwise keeps the original activity name.
  - activity_id: Cleans up activity_name to be lowercase and hyphen-separated (removing special characters), and removes trailing hyphens. This creates a consistent identifier for each activity.
  - weight: Converts annual person weights (wt06) to a daily average over a 5-year period.

In plain language:
This code prepares recent time-use data for analysis by cleaning, filtering, and standardizing activity names. It focuses on unpaid or informal care activities and creates identifiers and daily weights for later use in summaries or visualizations.
We then calculate selected statistics for the informal care economy activities. To start with we calculate the number of people who engage in each activity. Then we calculate the time spent across the population in each category. This R code chunk is doing the following steps:
1. Create case_stats: time-use totals per person and care activity

    case_stats <- bind_rows(...)

It builds a combined dataset of individual-level care time from three sources:

- Primary activities: sums duration for each individual (caseid) by:
  - activity_id
  - activity_name
  - care_focus
- Secondary child care:
  - Sets activity_id to "secondary-childcare"
  - Sets activity_name to "Secondary Childcare"
  - Sets care_focus as "developmental"
  - Sums scc_all_ln (secondary child care minutes) by individual
- Secondary elder care:
  - Sums sec_all_ln
  - Sets activity_id to "secondary-eldercare"
  - Sets care_focus to "health"

These three blocks are stacked together using bind_rows() to create a single long-form dataset of care activity time by individual.

2. Create activity_stats: aggregate care provision stats

    activity_stats <- case_stats |> summarise(...) |> filter(...) |> arrange(...)

- Aggregates case_stats to the activity level, calculating:
  - provision_interval: total weighted time spent on that activity across all people
  - population: total weighted number of people represented
- Groups by activity_id, activity_name, and care_focus
- Removes "non-care" activities to focus on actual caregiving
- Sorts by activity_id

In plain language:
This code calculates how much time people spend on different types of unpaid or informal caregiving activities, including both direct and secondary care, and summarizes it by activity type. It produces a table that shows total care minutes and population size for each category.
Just as with the formal care economy, we want to add a column this this section for the median wages. However, people working in the informal care economy do not earn a wage. To solve for this, we create a shadow wage or “expected income” for each activity. A shadow wage represents the roughly equivalent wage that this activity earns when done in the formal care economy as a full-time, year-round job. The creation of this shadow wage or “expected income” can be useful for us comparing tasks between the formal and informal care economy.
In order to create this shadow wage or “expected income,” we need to pair activities in the formal and informal economy. We use a crosswalk that pairs activity codes with occupation codes. This crosswalk can be found below.
We then load in a new version of the CPS ASEC data, which we will use to find wages associated with the crosswalked activities.
Finally, we use the crosswalk along with the CPS ASEC data to assign a wage for each of the informal care activities. This R code chunk is doing the following steps:
1. Initialize storage and loop over activity types
df <- list()
activity <- act_cross$activity
- Creates an empty list df to hold results.
- Pulls the list of activities from act_cross, a lookup table that maps each care activity to an occupational code range.
2. Loop through each activity and calculate median wage
for(sel_activity in activity) {
...
}
For each care activity (sel_activity), it:
- Looks up the crosswalk entry for the activity:
codes <- act_cross |> filter(activity == sel_activity)
This retrieves the range of OCC2010 codes associated with the current activity.
- Filters the asec data to those occupations:
filter(occ2010 >= codes$occ_code_start & occ2010 <= codes$occ_code_end)
- Summarizes wages:
summarise(
median_wage = wtd.quantile(incwage, weights = asecwt, probs = 0.5)
)
This calculates the weighted median wage (incwage) for those workers, using survey weights (asecwt).
- Labels the result with the activity name and moves that column to the front:
mutate(activity_name = sel_activity) |>
relocate(activity_name)
- Stores the result in the list:
df[[sel_activity]] <- ...
- After the loop, combines all results into a single table:
median_wage <- bind_rows(df)
In plain language:
This code calculates the median wage for each care activity type
by mapping activities to occupation codes, filtering for those jobs in
the dataset, and summarizing income data. The result is a table showing
what workers in each type of care activity typically earn.
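A self-contained sketch of the same loop is given below. It makes several assumptions: the crosswalk rows, occupation codes, and wages are invented, and weighted_median() is a hypothetical base R stand-in for wtd.quantile() that may interpolate differently from the function used in the original code.

```r
# Hypothetical helper: smallest value whose cumulative weight share reaches 50%.
weighted_median <- function(x, w) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= 0.5)[1]]
}

# Toy crosswalk: each activity maps to a range of (invented) occupation codes.
act_cross <- data.frame(
  activity       = c("Activity A", "Activity B"),
  occ_code_start = c(100, 300),
  occ_code_end   = c(199, 399),
  stringsAsFactors = FALSE
)

# Toy worker records with wage income and survey weights.
asec <- data.frame(
  occ2010 = c(100, 150, 150, 300, 300, 300),
  incwage = c(20000, 30000, 40000, 10000, 25000, 50000),
  asecwt  = c(1, 1, 2, 1, 1, 1)
)

df <- list()
for (sel_activity in act_cross$activity) {
  codes <- act_cross[act_cross$activity == sel_activity, ]
  # Keep workers whose occupation code falls inside the activity's range.
  workers <- asec[asec$occ2010 >= codes$occ_code_start &
                    asec$occ2010 <= codes$occ_code_end, ]
  df[[sel_activity]] <- data.frame(
    activity_name = sel_activity,
    median_wage   = weighted_median(workers$incwage, workers$asecwt),
    stringsAsFactors = FALSE
  )
}
median_wage <- do.call(rbind, df)
print(median_wage)
```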
Finally, we combine the different data together, write it to the app, and prepare it for download.
The Gini coefficient is a widely used statistical measure of inequality within a distribution, commonly applied to income or wealth. It ranges from 0 to 1, where 0 represents perfect equality (everyone has the same amount) and 1 indicates perfect inequality (one person has everything, and everyone else has nothing). The Gini coefficient is often visualized through a Lorenz curve, which plots the cumulative share of a resource (like income or jobs) against the cumulative share of the population. The further the Lorenz curve deviates from the line of perfect equality, the higher the Gini coefficient, and the more unequal the distribution.
In the context of the care economy, the Gini coefficient offers a powerful lens for understanding the geographic or demographic distribution of care-related jobs (e.g., childcare workers, home health aides, elder care providers) relative to the population in need of care (such as young children, elderly individuals, or people with disabilities). A low Gini coefficient in this setting would suggest that care jobs are relatively evenly distributed among communities based on their level of need, indicating a more equitable alignment of service availability. Conversely, a high Gini coefficient implies that care jobs are concentrated in certain areas or populations, leaving other high-need areas underserved. This kind of analysis is especially important for identifying care deserts—areas with a high demand for care services but few available workers—so policymakers and planners can better target resources and interventions to reduce inequality and improve access to essential care. The Gini Coefficient of Formal Care measures how formal care jobs are geographically distributed across the U.S. among those individuals at risk of needing care. A Gini Coefficient of 0 would indicate perfect equality, meaning formal care jobs are evenly located in areas where at risk individuals reside. A higher Gini Coefficient signals a greater mismatch in the availability of care by location. A Gini Coefficient usually measures income inequality, but here we use it to measure spatial inequality of care services to the population most at-risk of needing those services.
To calculate the Gini coefficient, we use county-level data on employment in the care economy coupled with the distribution of population. We start by loading in data tables obtained from the U.S. Census Bureau and data from the Quarterly Census of Employment and Wages (QCEW). Both data sources include county-level data.
After creating these statistics, we calculate the Gini coefficient from these data using the gini function loaded previously.
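To make the idea concrete, the sketch below computes a Gini coefficient from a Lorenz curve of care jobs against the at-risk population across counties. The function gini_sketch() is hypothetical and is not the gini function used in the Care Board code, whose exact definition may differ; the county figures are invented.

```r
# Hypothetical sketch: Gini of care jobs relative to at-risk population.
gini_sketch <- function(jobs, need) {
  # Order counties from fewest to most jobs per at-risk person.
  ord <- order(jobs / need)
  # Lorenz curve: cumulative population share (x) vs cumulative job share (y).
  pop_share <- c(0, cumsum(need[ord]) / sum(need))
  job_share <- c(0, cumsum(jobs[ord]) / sum(jobs))
  # Area under the Lorenz curve by the trapezoid rule.
  area <- sum(diff(pop_share) * (head(job_share, -1) + tail(job_share, -1)) / 2)
  1 - 2 * area
}

# Jobs proportional to need in every county: no spatial inequality.
gini_equal <- gini_sketch(jobs = c(10, 20, 30), need = c(100, 200, 300))

# All jobs in one of three equally sized counties: strong inequality.
gini_unequal <- gini_sketch(jobs = c(0, 0, 30), need = c(100, 100, 100))

print(c(equal = gini_equal, unequal = gini_unequal))
```

With only three counties the most unequal case gives 2/3 rather than 1, because the Lorenz curve is built from a small number of discrete units.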
The Care Ratio is a novel demographic and economic metric designed to quantify balance between care providers and care recipients within a given population. Building on the logic of a traditional dependency ratio used in demographic studies, the Care Ratio advances the concept by incorporating both the diversity of caregiving roles and the differentiated needs of care recipients. The numerator represents the population of potential caregivers, stratified and weighted according to their caregiving contributions. This includes formal sector care workers, unpaid caregivers such as homemakers, and a residual category of individuals not formally engaged in care but who may nonetheless contribute to informal care networks. The denominator consists of the at-risk care-dependent population, including children, the elderly, and individuals with disabilities—each subgroup weighted by the intensity or frequency of care they typically require. This framework provides a more nuanced picture of the care landscape than traditional economic or demographic indicators.
The Care Ratio is critical for the economics of care and related demographic analyses because it brings into focus the structural balance—or imbalance—between those who provide care and those who depend on it. In an era of aging populations, declining fertility rates, and shifting labor market dynamics, the burden of care is increasingly a central challenge for societies. The Care Ratio offers a standardized, comparative tool that can be used to assess the sustainability of care systems, identify regions or groups at risk of care deficits, and inform social policy aimed at redistributing care labor more equitably. By moving beyond simple counts and integrating the complexity of care work and need, this measure helps bridge gaps between demographic modeling, social policy, and lived experience, providing a foundation for developing more responsive and equitable care infrastructures.
We start by loading in a variety of datasets related to age, disability, and employment by county.
We then load in data from previous steps to understand the distribution of need and provision of these groups. This comes in part from the CPS Monthly data which has a specific variable to code homemakers and in part from the market datum, which was calculated above and discusses care need and provision by age.
The first thing we do is calculate the denominator, which represents the weighted population of care recipients. The code below provides the methods used to create this denominator. This R code chunk is doing the following steps:
1. Initial setup:
Denominators = {}
Years = {}
- Denominators: calculated need-based population values for each year
- Years: the corresponding years
2. Looping over each year in YEARS: For each year yr, the script performs the following operations:
3. Filter relevant data for that year:
ages_temp <- ages |> filter(year == yr)
disability_temp <- disability |> filter(Year == yr)
4. Get population counts and average weights for each care-needing group:
- For each age group, the population count (e.g., under5) and average care weight (e.g., under5_W)
- One group is not included in Denom (might be unused or omitted intentionally).
- Disabled children (DisabUnd18), disabled adults (DisabAdult), and disabled elders (DisabElder)
- Care weights taken from market_datum
5. Calculate and store the weighted care need:
Denom = under5*under5_W +
five_thirteen*five_thirteen_W +
sixtyfive_sixtynine*sixtyfive_sixtynine_W +
seventy_seventyfour*seventy_seventyfour_W +
seventyfive_plus*seventyfive_plus_W +
child_disabled*child_disabled_W +
adult_disabled*adult_disabled_W +
elder_disabled*elder_disabled_W
Denominators = append(Denominators, Denom)
Years = append(Years, yr)
In plain language:
This code calculates a year-by-year estimate of total care need
in the population by summing weighted population counts across key age
and disability groups. It assigns higher weight to those expected to
need more care (like young children or disabled individuals) and stores
the results for use in later analyses or visualizations.
## [1] 277389077 276814969 278836176 280975623
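The weighted sum behind each denominator can be illustrated with a small example. The group names, counts, and care weights below are invented, and the long-format table is an assumption made for brevity; it is not the layout of the ages or disability data.

```r
# Toy population counts and care weights by year and group (invented values).
pop <- data.frame(
  year        = c(2020, 2020, 2021, 2021),
  group       = c("under5", "seventyfive_plus", "under5", "seventyfive_plus"),
  count       = c(100, 50, 110, 60),
  care_weight = c(1.0, 0.5, 1.0, 0.5)
)

Denominators <- c()
Years <- c()
for (yr in unique(pop$year)) {
  pop_temp <- pop[pop$year == yr, ]
  # Weighted care need: each group's count times its average care weight.
  Denom <- sum(pop_temp$count * pop_temp$care_weight)
  Denominators <- append(Denominators, Denom)
  Years <- append(Years, yr)
}
print(data.frame(Years, Denominators))
```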
Now that we have the denominators we need to calculate the numerators. The code below is utilized to do this. This R code chunk is doing the following steps:
1. Setup:
Numerators = {}
Years = {}
W = c(1.5, 0.5, 1)
- Initializes Numerators to store the total care provision capacity for each year.
- Initializes Years to track which year each value corresponds to.
- Defines the weights W:
  - 1.5 for formal care workers
  - 0.5 for non-care industry workers
  - 1 for homemakers
2. Loop through each year in YEARS: For each year yr, it performs:
3. Filter data to that year:
ages_temp <- ages %>% filter(year == yr)
disability_temp <- disability %>% filter(Year == yr)
formal_temp <- formalsector %>% filter(year == yr)
cps_temp <- data %>% filter(YEAR == yr)
4. Estimate total population:
population <- sum(ages_temp$POPESTIMATE)
5. Categorize and count care-relevant workforce:
- Formal care workers:
careworkers <- formal_temp %>%
filter(industry_code != 10)
careworkers <- sum(careworkers$IndustryEmployment)
- Non-care workers:
workingnoncare <- formal_temp %>%
filter(industry_code == 10)
workingnoncare <- sum(workingnoncare$IndustryEmployment) - careworkers
This subtracts careworkers to avoid double-counting.
- Homemakers:
Homemakers <- sum(cps_temp$WTFINL / length(unique(cps_temp$month)))
This sums the CPS weights, averaged over the months observed (using a cps_temp dataset that only includes them).
6. Calculate and store care supply:
Numer <- careworkers*W[1] + workingnoncare*W[2] + Homemakers*W[3]
Numerators <- append(Numerators, Numer)
Years <- append(Years, yr)
This appends the result to Numerators and the year to Years.
In plain language:
This code estimates the supply of potential caregivers in each
year by counting formal care workers, other workers, and homemakers—then
weighting them based on how much care they are assumed to provide. It
builds a year-by-year summary of care capacity across the workforce and
households.
## [1] 112386298 107788401 113401382 115482838
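A single-year version of this calculation, using invented employment and CPS figures, might look like the sketch below. The weights W come from the text; every other number is an assumption for illustration, and the industry codes other than 10 are arbitrary placeholders.

```r
W <- c(1.5, 0.5, 1)  # formal care workers, non-care workers, homemakers

# Toy employment by industry. Code 10 is treated as the all-industry total,
# as implied by the subtraction step in the code described above.
formal_temp <- data.frame(
  industry_code      = c(10, 61, 62),
  IndustryEmployment = c(1000, 50, 150)
)

# Toy homemaker records pooled over two survey months.
cps_temp <- data.frame(
  month  = c("jan", "jan", "feb", "feb"),
  WTFINL = c(40, 60, 50, 50)
)

careworkers    <- sum(formal_temp$IndustryEmployment[formal_temp$industry_code != 10])
workingnoncare <- sum(formal_temp$IndustryEmployment[formal_temp$industry_code == 10]) -
  careworkers
# Dividing by the number of months gives an average monthly count.
Homemakers <- sum(cps_temp$WTFINL / length(unique(cps_temp$month)))

Numer <- careworkers * W[1] + workingnoncare * W[2] + Homemakers * W[3]
print(Numer)
```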
We finalize the Care Ratio by dividing the numerators by the denominators. This gives us the final Care Ratio for export.
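As a quick check of the arithmetic, the printed numerators and denominators above can be divided element by element. This sketch assumes both vectors follow the same year order, which holds if both loops iterate over YEARS in the same way; the years themselves are not shown here, so none are attached.

```r
# Values copied from the printed outputs above.
Numerators   <- c(112386298, 107788401, 113401382, 115482838)
Denominators <- c(277389077, 276814969, 278836176, 280975623)

# Care Ratio: weighted care providers per weighted care recipient.
care_ratio <- Numerators / Denominators
print(round(care_ratio, 3))
```

Each ratio falls at roughly 0.4, meaning about four weighted units of care supply for every ten weighted units of care need.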
The Sandwich Generation refers to a group of adults who are simultaneously caring for their own children while also providing care or support to aging parents. This dual responsibility places unique emotional, financial, and time burdens on caregivers, often leading to stress, work-life conflict, and economic strain. In the economics of care, the sandwich generation exemplifies how unpaid and invisible care labor supports the functioning of both the family and broader society. Understanding this is crucial as demographic shifts, such as increased life expectancy and delayed childbearing, intensify these care demands.
Measuring the size and characteristics of the sandwich generation is essential for informing public policy, labor protections, and social support systems. Capturing accurate numbers and understanding the demographic profile of this group, such as gender, income, employment status, and race/ethnicity, can help reveal the hidden costs of informal care and shape interventions that better support multigenerational caregivers. Recognizing their role is vital, not just for their wellbeing, but also for sustaining the broader care economy.
This section provides the code and outputs used to produce the statistics related to the sandwich generation. To understand the sandwich generation, we use ATUS data. We only use years from 2011 onward because 2011 is the first year in which the survey asks about secondary elder care activities. The chunk below loads in the data and selects the required variables.
After loading in the data, we need to identify sandwiched individuals. To do this we calculate the total amount of time that each individual with a child under age 10 spends in elder care. We consider someone to be sandwiched in the case where the following conditions are met.
These assumptions provide a conservative estimate by limiting the count of people in the sandwich generation to not include individuals with older or adult children. Inclusion of these groups would lead to larger estimates of the size of the sandwich generation.
Following this, we calculate the count and proportion of individuals who are labeled as “sandwiched.” The code below uses a 5-year rolling average to move through the ATUS data. ATUS sample sizes are small enough that the data become difficult to subset by specific demographic groups within a single year. To get around this, it is standard practice to use 5-year rolling averages.
For each five-year group, this code calculates the population of individuals, the total time spent providing care across the population, and the weighted median of care provision across the sandwiched individuals.
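The rolling-window idea can be sketched as follows. The function rolling_share() and the toy data are hypothetical; the sketch assumes the window ends in the selected year and pools all respondents within it, which is one common way to implement a 5-year rolling estimate.

```r
# Toy respondent records (one per year here for brevity; values invented).
atus_toy <- data.frame(
  year       = 2011:2016,
  sandwiched = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  weight     = c(100, 100, 200, 100, 100, 100)
)

# Hypothetical helper: weighted share of sandwiched respondents in the
# window of `window` years ending in `sel_year`.
rolling_share <- function(df, sel_year, window = 5) {
  year_min  <- sel_year - window + 1
  in_window <- df$year >= year_min & df$year <= sel_year
  sum(df$weight[in_window & df$sandwiched]) / sum(df$weight[in_window])
}

shares <- sapply(2015:2016, function(y) rolling_share(atus_toy, y))
print(data.frame(year = 2015:2016, share = shares))
```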
The Bureau of Labor Statistics (BLS) releases monthly labor force statistics through the Current Population Survey (CPS), a nationally representative household survey conducted in collaboration with the U.S. Census Bureau. These statistics offer a timely and comprehensive overview of employment, unemployment, labor force participation, and other key indicators that shape our understanding of the U.S. economy. Policymakers, researchers, and the public rely on these data to assess economic conditions, identify trends, and inform decisions ranging from interest rate adjustments to workforce development initiatives. Monthly updates facilitate the close monitoring of labor market dynamics, enabling early detection of economic downturns or recoveries.
In its monthly reports, the BLS provides detailed labor force participation tables disaggregated by variables such as sex and race. Building on this framework, our analysis extends the focus to include parenthood status and employment in care-related industries. We further examine labor force participation not only within the formal care economy but also across informal care roles, capturing a more comprehensive view of caregiving and its impact on labor dynamics.
In the broad indicators section, we conclude by examining the value of the care economy in relation to the U.S. gross domestic product (GDP). To do so, we draw on GDP data from the Federal Reserve Bank of St. Louis’s online dashboard (FRED: https://fred.stlouisfed.org/). The dataset used reports annual GDP figures for the entire U.S. economy, expressed in billions of dollars. This R code chunk is doing the following steps:
Setting a minimum year:
It defines a variable min_year and sets it to 1994. This
will be used later to filter the data.
Loading GDP data:
It reads the CSV file called FYGDP.csv into a dataframe
called us_gdp.
Cleaning column names:
It standardizes the column names to a consistent, tidy format (e.g.,
lowercase with underscores).
Creating and modifying columns:
- date: Extracts the year from the observation_date column and converts it into a Date format representing January 1st of that year.
- fygdp: Converts GDP figures to actual dollar amounts by multiplying by 1 billion.
- gdp_daily: Calculates an estimated daily GDP by dividing the annual GDP by 365.
Filtering by year:
It keeps only the rows where the year of the date is greater than or equal to 1994.
Selecting relevant columns:
It keeps only the date and gdp_daily columns for display.
Displaying the data as a table:
It uses datatable() to show an interactive table of us_gdp.
In plain language:
This code reads in U.S. GDP data, adjusts it to reflect daily
values, filters for years from 1994 onward, and displays it in an
interactive table format for easy viewing.
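A base R sketch of these transformations is shown below. The GDP figures are invented round numbers chosen to make the arithmetic easy to follow, not values from FYGDP.csv, and the interactive table step is replaced with a plain print.

```r
min_year <- 1994

# Toy annual GDP in billions of dollars (invented values, not FRED data).
us_gdp <- data.frame(
  observation_date = c("1993-01-01", "1994-01-01", "1995-01-01"),
  fygdp            = c(3650, 7300, 10950),
  stringsAsFactors = FALSE
)

# Extract the year and rebuild the date as January 1st of that year.
yr <- as.integer(format(as.Date(us_gdp$observation_date), "%Y"))
us_gdp$date <- as.Date(paste0(yr, "-01-01"))

# Convert billions to dollars, then to an estimated daily figure.
us_gdp$fygdp     <- us_gdp$fygdp * 1e9
us_gdp$gdp_daily <- us_gdp$fygdp / 365

# Keep years from min_year onward and only the columns needed for display.
us_gdp <- us_gdp[yr >= min_year, c("date", "gdp_daily")]
print(us_gdp)
```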
In this section we prepare CPS ASEC data. We will use this set of CPS ASEC data for the remainder of the analysis, covering all statistics in the broad indicator data. The code below is used to load in the data and structure it as needed for analysis. This R code chunk is doing the following steps:
Loading data: It reads a CSV file called
ASECdata.csv into a dataframe named
asec.
Filtering for valid responses: It keeps only the
rows where HFLAG equals 1 or is missing. This accounts for
a survey redesign starting in 2014, ensuring consistent data.
Filtering by year: It keeps only the data from min_year onward (min_year is defined as 1994 above).
Filtering by age: It keeps only people aged 18 and over.
Selecting variables: It keeps only the listed columns, which are relevant to the analysis (like year, wages, employment status, education, etc.).
Cleaning column names: It standardizes column names to a consistent format (usually lowercase with underscores).
Creating new variables:
- date: Adds a new column that sets the date to January 1st of the given year.
- uhrsworkt: Replaces a special code (999, meaning NILF) with 0 in the work hours column.
- occ_type: Classifies each person as working in “care” or “non-care” based on the focus variable.
- overall: Adds a column with the same value “overall” for every row, to simplify grouping and plotting later.
In plain language:
This code prepares and cleans survey data, keeping working-age
adults, dealing with some inconsistencies, and creating new variables to
distinguish between care and non-care occupations.
| Name | asec |
| Number of rows | 3904352 |
| Number of columns | 10 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| Date | 1 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| empstat | 0 | 1 | 4 | 12 | 0 | 4 | 0 |
| occ_care_focus | 0 | 1 | 4 | 13 | 0 | 4 | 0 |
| gender_parent | 0 | 1 | 5 | 11 | 0 | 5 | 0 |
| occ_type | 0 | 1 | 4 | 8 | 0 | 2 | 0 |
| overall | 0 | 1 | 7 | 7 | 0 | 1 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| date | 0 | 1 | 1994-01-01 | 2024-01-01 | 2009-01-01 | 31 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2009.20 | 8.49 | 1994 | 2002.00 | 2009.00 | 2016.00 | 2024.00 | ▆▇▇▆▆ |
| asecwt | 0 | 1 | 1795.76 | 1234.61 | 0 | 930.65 | 1625.94 | 2394.35 | 44423.83 | ▇▁▁▁▁ |
| uhrsworkt | 0 | 1 | 63.96 | 193.60 | 0 | 0.00 | 37.00 | 40.00 | 997.00 | ▇▁▁▁▁ |
| incwage | 0 | 1 | 28027.18 | 50059.09 | 0 | 0.00 | 14560.00 | 40000.00 | 2099999.00 | ▇▁▁▁▁ |
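The ASEC preparation steps described above can be sketched on a handful of toy records. The column values are invented, and the rule that a focus of "none" marks a non-care occupation is an assumption made for this example; the original code may encode the focus variable differently.

```r
min_year <- 1994

# Toy ASEC-style records (all values invented).
asec <- data.frame(
  YEAR      = c(1993, 2014, 2014, 2020, 2020),
  HFLAG     = c(NA, 1, 0, NA, NA),
  AGE       = c(30, 40, 50, 17, 25),
  UHRSWORKT = c(40, 999, 35, 20, 38),
  focus     = c("none", "health", "none", "none", "none"),
  stringsAsFactors = FALSE
)

# Keep HFLAG == 1 or missing, years from min_year onward, and adults.
keep <- (is.na(asec$HFLAG) | asec$HFLAG == 1) &
  asec$YEAR >= min_year & asec$AGE >= 18
asec <- asec[keep, ]

# Standardize column names to lowercase.
names(asec) <- tolower(names(asec))

asec$date      <- as.Date(paste0(asec$year, "-01-01"))
asec$uhrsworkt <- ifelse(asec$uhrsworkt == 999, 0, asec$uhrsworkt)
# Assumption: "none" means the occupation has no care focus.
asec$occ_type  <- ifelse(asec$focus == "none", "non-care", "care")
asec$overall   <- "overall"
print(asec)
```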
In this section we prepare the ATUS data. We will use this set of ATUS data for the remainder of the analysis, covering all statistics in the broad indicator data. The code below is used to load in the data and structure it as needed for analysis. This R code chunk is doing the following steps:
Loading data:
It reads the CSV file called ATUSdata.csv into a dataframe
called atus.
Filtering data:
It keeps only the records needed for the analysis, restricting the sample to adult respondents.
Selecting variables:
It keeps only a specific set of columns related to time use,
demographics, and care-related activities.
Cleaning column names:
It standardizes column names to a consistent format (e.g., lowercase
with underscores).
Reshaping the data:
It transforms three columns (duration,
scc_all_ln, and sec_all_ln) into a long format
so that they all go into a single duration column with an
associated metric label.
Handling missing data and categorizing activities:
- Replaces missing values in duration with 0.
- Creates a care_flag column that labels each row as either “care” or “non-care” based on the type of time-use metric or whether the activity was marked as care-related.
Summarizing individual-level data:
It creates a new dataframe case_year that totals the time spent on care versus non-care activities for each person and year.
Defining the time range:
It uses a custom function atus_yr_range(), defined above,
to calculate the range of years represented in the dataset.
In plain language:
This code prepares and summarizes time-use data by filtering
adult respondents, reshaping and cleaning time records, flagging
care-related activities, and calculating total time spent on care versus
non-care activities for each person and year.
| Name | case_year |
| Number of rows | 454094 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| gender_parent | 0 | 1 | 5 | 11 | 0 | 5 | 0 |
| care_flag | 0 | 1 | 4 | 8 | 0 | 2 | 0 |
| overall | 0 | 1 | 7 | 7 | 0 | 1 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2011.56 | 5.93 | 2003.00 | 2006.00 | 2011.00 | 2016.00 | 2023.0 | ▇▆▅▅▃ |
| caseid | 0 | 1 | 20116272616793.08 | 59325034611.31 | 20030100013280.00 | 20061111062480.75 | 20110706112012.00 | 20160808162075.00 | 20231212232280.0 | ▇▇▆▆▃ |
| weight | 0 | 1 | 20688.95 | 19980.40 | 1149.24 | 8367.67 | 14926.71 | 25419.15 | 572630.2 | ▇▁▁▁▁ |
| total_time | 0 | 1 | 694.77 | 433.82 | 0.00 | 270.00 | 750.00 | 1060.00 | 2827.0 | ▇▇▃▁▁ |
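The reshaping and flagging steps for the ATUS data can be illustrated with the base R sketch below. The records are invented, and is_care_activity is a hypothetical flag standing in for however the original code marks an activity as care-related.

```r
# Toy activity-level records (all values invented).
atus <- data.frame(
  caseid           = c(1, 1, 2),
  year             = 2015,
  is_care_activity = c(TRUE, FALSE, FALSE),  # hypothetical flag
  duration         = c(60, 400, NA),
  scc_all_ln       = c(0, 120, 30),
  sec_all_ln       = c(0, 0, 15)
)

# Stack the three time columns into one long duration column with a metric label.
metrics <- c("duration", "scc_all_ln", "sec_all_ln")
long <- do.call(rbind, lapply(metrics, function(m) {
  data.frame(
    caseid           = atus$caseid,
    year             = atus$year,
    is_care_activity = atus$is_care_activity,
    metric           = m,
    duration         = atus[[m]],
    stringsAsFactors = FALSE
  )
}))

# Missing durations become 0; secondary-care metrics always count as care.
long$duration[is.na(long$duration)] <- 0
long$care_flag <- ifelse(long$metric != "duration" | long$is_care_activity,
                         "care", "non-care")

# Total time per person, year, and care flag.
case_year <- aggregate(duration ~ caseid + year + care_flag, data = long, FUN = sum)
names(case_year)[names(case_year) == "duration"] <- "total_time"
print(case_year)
```

Because secondary care overlaps with the primary activity, a person's care and non-care totals can sum to more than the minutes in a day.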
While demographic factors such as sex are essential to labor force analysis, further disaggregating data by parenthood status offers deeper insight into how caregiving responsibilities intersect with employment. Parenthood—particularly in the context of childrearing—can significantly influence labor force participation, working hours, and occupational choices. For example, mothers may reduce their working hours or leave the labor force entirely due to caregiving demands, while fathers may face societal expectations to maintain or increase their labor force involvement. Conversely, some mothers, particularly those from lower socio-economic backgrounds, may be compelled to enter or remain in the workforce after childbirth to help meet caregiving needs. Disaggregating labor force data by both sex and parenthood status reveals these nuanced dynamics, enabling policymakers and researchers to design more targeted interventions that support families, promote gender equity, and illuminate the structural forces shaping labor market outcomes.
| Name | cps |
| Number of rows | 42083505 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| month | 0 | 1 | 3 | 9 | 0 | 12 | 0 |
| gender_parent | 0 | 1 | 5 | 11 | 0 | 5 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2006.41 | 9.88 | 1990 | 1998.00 | 2006.00 | 2015.00 | 2024.00 | ▇▇▇▇▆ |
| age | 0 | 1 | 45.79 | 18.40 | 16 | 31.00 | 44.00 | 60.00 | 90.00 | ▇▇▇▅▂ |
| wtfinl | 0 | 1 | 2272.64 | 1315.15 | 0 | 1207.62 | 2225.02 | 3169.79 | 34716.21 | ▇▁▁▁▁ |
| empstat | 0 | 1 | 19.06 | 11.69 | 0 | 10.00 | 10.00 | 34.00 | 36.00 | ▁▇▁▁▅ |
The formal care force is defined as individuals working in a paid job that has been identified as part of the care economy. Jobs such as nurses, teachers, and janitors provide services that would be considered caregiving if done at home, so we code them as care jobs in the formal economy. The formal care force statistics on the dashboard provide the count of individuals in the formal care economy, overall and by gender/parenthood.
The code below uses the yearly CPS ASEC data from the CPS March supplement. It creates a function that takes a variable representing a demographic group and returns the count and percent of that group working in the formal care force. Proportions are calculated within each group: the overall proportion is the share of all individuals, while a within-group proportion is, for example, the percent of fathers who work in the care force.
The code below creates a function that will be used to input a demographic and find the needed statistics. This R function is doing the following steps:
Defining a function:
It creates a function called get_formal_lfp that takes two inputs:
- df: a dataframe (asec for this purpose)
- demo_group: a variable representing the demographic category to group by (e.g., race, gender, education)
Preparing the data:
- Starts from the asec dataset (or the passed df).
- category_id: stores the name of the demographic group variable passed in.
- subcategory_id: stores the actual value of that demographic variable for each person.
Grouping the data:
It groups the data by date, the demographic group name (category_id), and each group’s value (subcategory_id).
Summarizing labor force participation:
- formal_care_labor_force: Calculates the weighted number of people employed in care occupations using the person-level weight (asecwt).
- formal_care_force_proportion: Divides the care labor force total by the total weighted population in the group to get a proportion.
Returning the results:
It returns the summarized dataframe, showing the number and proportion of employed care workers for each demographic group over time.
In plain language:
This function calculates the size and share of the formal care
labor force across different demographic groups using employment and
occupation data. It outputs totals and proportions by group and
year.
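A simplified version of such a function is sketched below in base R. The name formal_lfp_sketch() is hypothetical, the records are invented, and the empstat value "employed" is an assumed coding; the actual get_formal_lfp may differ in its details.

```r
# Hypothetical sketch of a grouped count and share of formal care workers.
formal_lfp_sketch <- function(df, demo_group) {
  df$category_id    <- demo_group
  df$subcategory_id <- df[[demo_group]]
  # A weight counts toward the care force only if employed in a care occupation.
  df$care_wt <- ifelse(df$empstat == "employed" & df$occ_type == "care",
                       df$asecwt, 0)
  out <- aggregate(
    df[, c("care_wt", "asecwt")],
    by  = df[, c("date", "category_id", "subcategory_id")],
    FUN = sum
  )
  out$formal_care_labor_force      <- out$care_wt
  out$formal_care_force_proportion <- out$care_wt / out$asecwt
  out[, c("date", "category_id", "subcategory_id",
          "formal_care_labor_force", "formal_care_force_proportion")]
}

# Toy records (all values invented).
asec <- data.frame(
  date          = as.Date("2020-01-01"),
  gender_parent = c("mother", "mother", "father", "father"),
  empstat       = c("employed", "employed", "employed", "not employed"),
  occ_type      = c("care", "non-care", "care", "care"),
  asecwt        = c(100, 300, 200, 200),
  stringsAsFactors = FALSE
)

result <- formal_lfp_sketch(asec, "gender_parent")
print(result)
```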
The table below presents the output as it is used to feed the Care Board.
To estimate the number of individuals participating in the informal care workforce, we use data from the ATUS. The function below takes a specified demographic group and calculates the number of individuals within that group engaged in informal labor. For the purposes of this analysis, we define informal labor force participation as providing three or more hours of unpaid care per day. This threshold aligns with the Bureau of Labor Statistics’ criteria for classifying individuals as unpaid family workers on family farms. By adopting this definition, we establish a consistent framework for identifying and analyzing unpaid family care workers within the broader labor force. This R function is doing the following steps:
- Defines a function called get_informal_lfp that takes one input:
  - demo_group: the name of a demographic variable (e.g., race, gender, etc.) to group by.
- Creates an empty list informal_lfp to store results for each year.
- Initializes a counter i to track the index during the loop.
- Loops over each year (sel_year), using year ranges from yr_range:
  - Computes year_min, which sets the lower bound for the moving 5-year window (used in the analysis).
  - Filters atus to include data from the 5-year window ending in sel_year.
  - Filters case_year for the same 5-year window.
  - Adds a date column for that year (January 1st).
- Combines the yearly results into a single dataframe (df) and returns it.
In plain language:
This function calculates the share of the population doing
significant amounts of informal care work (3+ hours/day) within each
demographic group over time, using a rolling 5-year window. It returns a
dataset showing how informal caregiving participation changes across
groups and years.
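The three-hour threshold can be illustrated on a small toy sample. The column names care_minutes and weight and the output names below are assumptions for this example, and the rolling window and demographic grouping are omitted to keep the focus on the threshold itself.

```r
# Toy person-level daily care minutes with survey weights (invented values).
case_year <- data.frame(
  caseid       = 1:4,
  care_minutes = c(200, 100, 180, 0),
  weight       = c(1000, 1000, 2000, 1000)
)

# Informal care force: three or more hours of unpaid care per day.
threshold <- 3 * 60
in_informal_force <- case_year$care_minutes >= threshold

informal_care_labor_force      <- sum(case_year$weight[in_informal_force])
informal_care_force_proportion <- informal_care_labor_force / sum(case_year$weight)

print(c(count = informal_care_labor_force, share = informal_care_force_proportion))
```

Here the respondents with 200 and 180 minutes qualify, giving a weighted count of 3,000 out of 5,000.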
Just as with the formal care force, we start by presenting these data as an overall estimate and by gender/parenthood.
The second major broad indicator data created for the Care Board represents the time spent by the above workers in the Care Economy. We utilize both the CPS ASEC and ATUS data to calculate the total number of minutes in a day that workers spend providing care. These statistics are useful to show the size of the care economy work relative to other aspects of the formal economy and informal time use. It is possible that some people are doing care activities during work hours (e.g. secondary care or washing clothes while teleworking). To the extent this happens, and since we are using multiple sources of data to estimate daily care activities, it is possible our estimates are inflated as we may be double counting time spent multitasking. This may be more prevalent in times when flexible work patterns are more common.
The formal care economy constitutes a significant portion of the labor force, as demonstrated in the preceding section. However, measuring its size by participation alone does not fully capture its scope. Time spent working offers a complementary perspective, particularly in care-related roles where intensity and duration of labor vary widely. To address this, we calculate the total minutes worked in the formal care economy as a proportion of all minutes worked across the broader labor force. We implement this through a function, get_formal_time, which takes a specified demographic group and computes the total number of minutes this group spends working in the formal care economy. This R function is doing the following steps:
Defining a function:
It defines a function called get_formal_time that takes two inputs:
- df: a dataframe (asec for this purpose)
- demo_group: a demographic variable to group by (e.g., race, gender, etc.)
Filtering out invalid data:
It removes rows where reported work hours (uhrsworkt) equals 997, which indicates that hours vary.
Creating grouping variables:
- Adds a category_id column that stores the name of the demographic variable.
- Adds a subcategory_id column that stores the value of that variable for each individual.
Grouping data:
It groups the dataset by year (date), the demographic group name (category_id), and the demographic subgroup (subcategory_id).
Calculating time spent in formal care work:
- formal_care_time: Totals the number of minutes per day spent in care-related paid work, using reported hours and survey weights. The formula:
asecwt * uhrsworkt * 60 / 7
converts weekly work hours into minutes per day.
- formal_care_time_proportion: Divides that total by the total number of weighted work minutes for the group to get a proportion.
Returning the result:
It returns a dataframe with the total and proportion of time spent in formal care work, broken down by demographic group and year.
In plain language:
This function calculates how much time people spend in paid care
jobs across different demographic groups, using survey data on work
hours. It returns both total care time and the share of all work time
spent in care occupations.
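The conversion from weekly hours to weighted minutes per day can be checked with a small example. The records are invented and the demographic grouping is left out; only the formula asecwt * uhrsworkt * 60 / 7 and the 997 filter come from the description above.

```r
# Toy worker records (all values invented).
asec <- data.frame(
  occ_type  = c("care", "non-care", "care", "non-care"),
  uhrsworkt = c(35, 40, 997, 0),
  asecwt    = c(100, 200, 50, 300),
  stringsAsFactors = FALSE
)

# Drop the special code 997 before using hours as a number.
asec <- asec[asec$uhrsworkt != 997, ]

# Weekly hours -> minutes per week -> minutes per day, scaled by the survey weight.
asec$minutes_per_day <- asec$asecwt * asec$uhrsworkt * 60 / 7

formal_care_time            <- sum(asec$minutes_per_day[asec$occ_type == "care"])
formal_care_time_proportion <- formal_care_time / sum(asec$minutes_per_day)

print(c(total = formal_care_time, share = formal_care_time_proportion))
```

The single care worker kept here contributes 100 * 35 * 60 / 7 = 30,000 weighted minutes per day.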
For the Care Board data itself, we present an overall estimate and an estimate broken down by gender and parenthood.
To measure time spent in the informal care economy, we draw on data from the ATUS. Using the activity crosswalks described above, we categorize time use entries into specific types of care activities. We then calculate the total time spent on informal caregiving across the population. The function below is designed to compute the average amount of time dedicated to informal caregiving, providing a clearer picture of the scope and intensity of unpaid care work. This R function is doing the following steps:
- Defines a function called get_informal_time that takes one input:
  - demo_group: the name of a demographic variable (e.g., gender, education) used to group the data.
- Creates an empty list informal_time to store yearly results.
- Initializes a counter i for tracking the index during the loop.
- Loops over each year (sel_year) in yr_range$year, using a 5-year window:
  - Computes year_min as the lower limit of that 5-year range.
  - Filters case_year to include records in the 5-year window ending with sel_year.
  - Adds category_id: the name of the demographic variable.
  - Adds subcategory_id: the value of that variable for each person.
  - Groups by care type (care_flag) and demographic category/subgroup.
  - Adds a date column representing January 1st of the sel_year.
  - Stores the result in the informal_time list.
- The yearly results are combined into a single dataframe (df) and returned.
In plain language:
This function calculates how much time people in each
demographic group spend on informal care work, over rolling 5-year
periods. It returns both the total and proportion of informal care time
for each group and year.
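The rolling-window logic can be sketched as follows. This is an illustration under stated assumptions, not the function used for The Care Board: it is written in base R, survey weights are omitted for brevity, the toy records are invented, and the names get_informal_time_sketch, duration, informal_time_total, and informal_time_proportion are hypothetical.

```r
# Toy ATUS-style activity records; every value here is invented for illustration.
atus <- data.frame(
  case_year = c(2016, 2017, 2018, 2019, 2020, 2020, 2021, 2021),
  gender    = c("Women", "Men", "Women", "Men", "Women", "Men", "Women", "Men"),
  care_flag = c(1, 1, 0, 1, 1, 0, 1, 1),           # 1 = informal care activity
  duration  = c(60, 30, 45, 20, 90, 40, 120, 50)   # minutes (hypothetical column)
)
yr_range <- data.frame(year = c(2020, 2021))

get_informal_time_sketch <- function(demo_group, window = 5) {
  informal_time <- list()
  i <- 1
  for (sel_year in yr_range$year) {
    # Lower limit of the window that ends with sel_year.
    year_min <- sel_year - (window - 1)
    win <- atus[atus$case_year >= year_min & atus$case_year <= sel_year, ]

    # Record which demographic variable is used and each person's value of it.
    win$category_id    <- demo_group
    win$subcategory_id <- win[[demo_group]]

    # Total minutes by care flag and demographic subgroup.
    out <- aggregate(
      win["duration"],
      by  = win[c("care_flag", "category_id", "subcategory_id")],
      FUN = sum
    )
    names(out)[names(out) == "duration"] <- "informal_time_total"

    # Share of each subgroup's recorded minutes that falls under each care flag.
    out$informal_time_proportion <- out$informal_time_total /
      ave(out$informal_time_total, out$subcategory_id, FUN = sum)

    out$date <- as.Date(paste0(sel_year, "-01-01"))
    informal_time[[i]] <- out
    i <- i + 1
  }
  df <- do.call(rbind, informal_time)
  rownames(df) <- NULL
  df
}

informal <- get_informal_time_sketch("gender")
print(informal)
```

Because each window ends with sel_year and reaches back four additional years, the 2020 estimate pools records from 2016 through 2020 and the 2021 estimate pools 2017 through 2021.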
Just as with the formal sector, for The Care Board statistics we present an overall estimate and a breakdown by gender and parenthood.
Thus far, we have examined both the number of individuals employed in the formal care economy and the time they spend in care-related work. A final dimension of analysis involves assigning a monetary value to this labor. The code chunk below performs this valuation based on several assumptions. Specifically, we identify individuals working within the formal care economy and associate their roles with corresponding wage data. We then estimate the total value of formal care labor by multiplying the median wage for care-related occupations by the number of individuals employed in these roles. The resulting valuation is compared to U.S. GDP figures to contextualize the economic weight of the formal care sector. For The Care Board, we present both an overall valuation and disaggregated estimates by gender and parenthood status. The R function performs the following steps:
Defining the function:
Defines a function called get_formal_value that takes two inputs:
df: a dataframe (usually asec data)
demo_group: a demographic variable to group by (e.g., race, gender, education)
Filtering the data:
Removes records with the special usual-hours code (uhrsworkt == 997)
Removes records where wage income (incwage) is zero (to focus on active earners)
Adding grouping columns:
category_id: the name of the demographic variable
subcategory_id: the actual value of that demographic variable for each person
Calculating the value of formal care work:
Groups the data by year (date) and demographic group
Computes formal_value as the total income earned from formal care work (wage * weight), only for people employed in care occupations
Joins the us_gdp dataset to add daily GDP information for the corresponding year
Drops the gdp_daily column since it’s no longer needed
In plain language:
This function calculates how much income people earn from paid
care work across different demographic groups and years, and shows how
that income compares to the overall U.S. economy.
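The valuation steps can be sketched as follows. This is an illustration only, written in base R with invented values. The care_occ flag, the formal_value_gdp_share column, and the function name get_formal_value_sketch are hypothetical, and dividing annual wage income by 365 before comparing it with daily GDP is an assumption of this sketch, since the exact form of the comparison is not spelled out above.

```r
# Toy ASEC-style extract; every value here is invented for illustration.
asec <- data.frame(
  date      = as.Date("2022-01-01"),
  gender    = c("Women", "Women", "Men", "Men", "Men", "Women"),
  care_occ  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE), # hypothetical flag
  asecwt    = c(1000, 1200, 900, 500, 1100, 700),      # survey weight
  uhrsworkt = c(40, 40, 38, 30, 45, 997),              # usual weekly hours
  incwage   = c(30000, 50000, 40000, 0, 60000, 25000)  # annual wage income
)
# Invented daily GDP figure, for illustration only.
us_gdp <- data.frame(date = as.Date("2022-01-01"), gdp_daily = 7e10)

get_formal_value_sketch <- function(df, demo_group) {
  # Drop the special usual-hours code and people without wage income.
  df <- df[df$uhrsworkt != 997 & df$incwage > 0, ]

  df$category_id    <- demo_group
  df$subcategory_id <- df[[demo_group]]

  # Weighted wage income, counted only for people in care occupations.
  df$formal_value <- ifelse(df$care_occ, df$incwage * df$asecwt, 0)

  out <- aggregate(
    df["formal_value"],
    by  = df[c("date", "category_id", "subcategory_id")],
    FUN = sum
  )

  # Attach daily GDP, form the comparison, then drop the helper column.
  out <- merge(out, us_gdp, by = "date")
  # Assumption: annual income is put on a daily basis by dividing by 365.
  out$formal_value_gdp_share <- (out$formal_value / 365) / out$gdp_daily
  out$gdp_daily <- NULL
  out
}

formal_value <- get_formal_value_sketch(asec, "gender")
print(formal_value)
```

In the toy data, the record with uhrsworkt equal to 997 and the record with zero wage income are both excluded before any totals are computed, so neither contributes to formal_value.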
The table below provides the data related to the valuation of the formal careforce.
We now extend our valuation approach to the informal care economy. Using ATUS data, we estimate the amount of time individuals spend on informal caregiving activities. A key challenge in this process is determining how to assign a monetary value to this unpaid labor. One approach, explored in a related working paper, involves mapping informal care tasks to comparable occupations in the formal care economy—for instance, assigning the average wage of professional chefs to time spent on cooking. While this method offers a nuanced valuation, it raises concerns, as formal sector workers typically possess more training and experience, making their wages an imperfect proxy for informal labor.
To address this limitation and provide a conservative estimate, we adopt the federal minimum wage of $7.25 per hour as the baseline value for informal care labor. This wage reflects the minimum amount an individual is generally expected to earn in the U.S., allowing us to assess the value of informal care at a socially recognized floor. The code below implements this calculation by converting time spent on informal care (from minutes to hours) and multiplying it by $7.25. We then compare the estimated value of informal caregiving to U.S. GDP on a daily basis, presenting it as a proportion of overall economic activity.
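As a worked illustration of this conversion, the sketch below applies the $7.25 wage to invented daily totals of informal care minutes and expresses the result as a share of an invented daily GDP figure. The column names informal_care_minutes, informal_value, and informal_value_gdp_share are hypothetical and chosen only for this example.

```r
min_wage <- 7.25  # federal minimum wage in dollars per hour

# Invented weighted totals of informal care minutes per day, for illustration.
informal_time <- data.frame(
  date                  = as.Date(c("2021-01-01", "2022-01-01")),
  informal_care_minutes = c(6.0e10, 6.6e10)
)
# Invented daily GDP figures, for illustration only.
us_gdp <- data.frame(
  date      = as.Date(c("2021-01-01", "2022-01-01")),
  gdp_daily = c(6.0e10, 7.0e10)
)

informal_value <- merge(informal_time, us_gdp, by = "date")

# Minutes -> hours, then hours * minimum wage gives a daily dollar value.
informal_value$informal_value <-
  informal_value$informal_care_minutes / 60 * min_wage

# Daily value of informal care as a proportion of daily GDP.
informal_value$informal_value_gdp_share <-
  informal_value$informal_value / informal_value$gdp_daily

print(informal_value)
```

With these toy inputs, 6.0e10 minutes per day corresponds to 1.0e9 hours, which is valued at 7.25e9 dollars per day, or about 12 percent of the invented daily GDP figure.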
This methodology document has outlined the comprehensive framework behind The Care Board, detailing the data sources, processing pipelines, statistical methods, and visualization strategies employed in its development. By bringing together multiple datasets, coding procedures, and analytical models, it provides a transparent account of how the indicators on The Care Board are generated, interpreted, and updated. Whether the data pertains to formal care infrastructure, informal caregiving time, or population-level needs, each component has been carefully constructed to provide a robust, scalable, and replicable platform for monitoring and analyzing care-related dynamics.
The aim of The Care Board is not just to present statistics, but to contextualize them—illuminating the patterns, disparities, and evolving trends that define the care economy across time and place. By linking rigorous data analysis with user-friendly visualization, the board supports decision-making for researchers, policymakers, and advocates who are invested in strengthening the care infrastructure. This document ensures that the metrics presented are not black boxes, but rather the product of thoughtful methodological choices grounded in social science and data science principles.
Looking forward, this methodology will remain a living framework—capable of evolving as new data becomes available, as care dynamics shift, and as users provide feedback. The ongoing development of The Care Board will continue to prioritize transparency, adaptability, and impact, ensuring it remains a vital tool for understanding and improving care systems in diverse communities.