Introduction

The Careboard is a central location for viewing a wide variety of statistics related to the Care Economy. The Care Economy comprises all paid and unpaid activities involved in providing care to other people. Activities such as nursing, teaching, watching children at home, and helping an elderly relative all fall within its scope. Despite the widespread importance of these activities, many of them go unmeasured. The Careboard provides a place to measure, analyze, and study these activities while providing information to the wider community.

This document serves as a methodology review for all statistics presented on the Careboard. Many of these statistics are novel in that they were developed specifically for the Careboard. Most of the novel statistics will have accompanying peer-reviewed research papers that present the methodology and scope of the analysis. This document, however, offers a single location to review all statistics and methodologies for the Careboard. Where needed, links to working papers or peer-reviewed articles will be included.

This R Markdown file acts as the final methods repository for all analysis on the Careboard website. All data presented and available for download on the Careboard are discussed here, along with the information and code needed to replicate them. For each statistic, this file walks users through its construction, starting with the raw data and ending with the published statistic. Any choices, hurdles, and assumptions made along the way are explicitly laid out for critical analysis.

To use this document, go to the section for the statistic you are most interested in. Within that section, start by looking over the raw data input requirements and then review the code and explanations. Certain areas of the code refer to appendix sections. The appendix covers preliminary coding decisions, data, and methods applied before the statistics are formed. These preliminary steps often feed into multiple statistics and are therefore collected in an appendix at the end of this document.

When using any data or code from this page, we kindly ask that the data be cited appropriately. Publications and reports should include the appropriate version of the following citation:

Misty Heggeness, Joseph Bommarito, Lucie Prewett, Pilar Mcdonald. CareBoard: Version 1.0 [dataset]. Lawrence, KS: Careboard, 2025.

Preliminary tasks

Before running any code, the following preliminary tasks must be completed. The code provided at the beginning of this document must be run before the code in any other section. It installs the relevant packages and sets the working directory used by all other sections. Running later sections without it is likely to produce errors.

The first step is to install the required packages. Some statistics require specific packages, while other packages are needed for more general data handling. These packages are loaded and described here.

if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(
  ipumsr,
  tidyverse,
  haven,
  data.table,
  Hmisc,
  DT
)
  • pacman: is a package used to load other packages. It checks whether each requested package is installed on the user’s computer. If a package is not installed, pacman installs it before loading it from the library. If it is installed, pacman skips installation and loads the package directly.

  • ipumsr: is a package used to read IPUMS data extracts. In this document it reads the .xml metadata files and the associated microdata through the read_ipums_ddi and read_ipums_micro commands.

  • tidyverse: is a commonly used data handling package in the R environment. Tidyverse provides more streamlined and readable code, with the goal of making the replication files easier to follow. Whenever possible, code in this document is written in the tidyverse style as opposed to base R.

  • haven: is a package used for reading and writing certain data formats. In this documentation it is mostly used to write data files as STATA .dta files.

  • data.table: is a package used to efficiently read and write csv files. Large csv files can be resource intensive to load as a dataset. This package allows them to be loaded as a table and then worked with directly in the R environment.

  • Hmisc: is a package used to handle survey research. In the code below it is primarily used to apply survey weights to statistics, creating population-valid estimates.

  • DT: is a package used exclusively for this RMD file. It provides more readable tables of the data within the HTML output.

setwd("K:/Care Board/Data/Care_Board_Directory/")

To load the data, set the working directory to the location where your data are stored. Modify the code chunk above with the correct file path for your machine. This is the only code in this document that you need to personalize for it to run properly.
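As an optional safeguard, the working directory can be checked for the input files before any later section is run. The sketch below is an assumption rather than part of the Careboard code: check_inputs is a hypothetical helper, and the file names are the three that are read later in this section (the extract numbers will differ for your own IPUMS pulls).

```r
# Hypothetical helper (not part of the Careboard code): report which of the
# input files read later in this document are missing from a directory.
check_inputs <- function(dir = getwd()) {
  needed <- c("usa_00042.xml", "03_ATUSdata.csv", "atus_00026.xml")
  missing <- needed[!file.exists(file.path(dir, needed))]
  if (length(missing) > 0) {
    warning("Missing input files: ", paste(missing, collapse = ", "))
  }
  invisible(missing)  # return the missing names without printing them
}

# Typical use, after setwd() has been run:
# check_inputs()
```

The helper only warns; it does not stop execution, so it can be left in place while the file path is being adjusted.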

The Care Economy

Age Data

The first section of the Careboard measures the need for and the provision of care throughout the US population. Need and provision are split among three categories: developmental, health, and daily living. Developmental care is care related to supporting the growth of children. Health care is care related to supporting the physical and mental health of another individual; eldercare falls under this category. Finally, daily living care is care related to general daily activities such as cooking and cleaning. This section of the methodology provides all data and code used to create the tables that feed into the Care Economy page. The final results will be available for download as CSV, DTA, and Excel files via the data tab on the Careboard.

The Care Economy section essentially outlines the market of caregiving in the US by age. To display this data we need to start off with the following two variables.

  • age - to represent each individual age group in the United States.
  • population - to represent the population of the US at each age.
ddi_file <- read_ipums_ddi("usa_00042.xml")
age_data <- read_ipums_micro(ddi_file)
## Use of data from IPUMS USA is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
ipums_conditions()
## Users of IPUMS USA data must agree to abide by the conditions of use. A user's
## license is valid for one year and may be renewed.  Users must agree to the
## following conditions:
## 
## (1) No fees may be charged for use or distribution of the data.
## 
## (2) Cite IPUMS appropriately.  For information on proper citation, refer to the
## citation requirement section of this DDI document.
## 
## (3) Tell us about any work you do using the IPUMS.  Publications, research
## reports, or presentations making use of IPUMS USA should be added to our
## Bibliography. Continued funding for the IPUMS depends on our ability to show
## our sponsor agencies that researchers are using the data for productive
## purposes.
## 
## (4) The IPUMS cannot be used for genealogical research
## 
## (5) It is difficult to use the IPUMS to study small geographic areas.  In the
## IPUMS census samples for years 1940-present, no places having a population of
## fewer than 100,000 persons can be identified.
## 
## (6) Use it for GOOD -- never for EVIL.
## 
## (7) Please notify ipums@umn.edu regarding errors in the data or documentation.
## 
## Publications and research reports based on the IPUMS USA database must cite it
## appropriately. The citation should include the following:
## 
## Steven Ruggles, Sarah Flood, Matthew Sobek, Daniel Backman, Annie Chen, Grace
## Cooper, Stephanie Richards, Renae Rodgers, and Megan Schouweiler. IPUMS USA:
## Version 15.0 [dataset]. Minneapolis, MN: IPUMS, 2024.
## https://doi.org/10.18128/D010.V15.0
## 
## The licensing agreement for use of IPUMS USA data requires that users supply us
## with the title and full citation for any publications, research reports, or
## educational materials making use of the data or documentation. Please add your
## citation to the IPUMS bibliography at http://bibliography.ipums.org/.

The data for these variables come from the most recent American Community Survey (ACS) sample, accessed through IPUMS USA.

Steven Ruggles, Sarah Flood, Matthew Sobek, Daniel Backman, Annie Chen, Grace Cooper, Stephanie Richards, Renae Rodgers, and Megan Schouweiler. IPUMS USA: Version 15.0 [dataset]. Minneapolis, MN: IPUMS, 2024. https://doi.org/10.18128/D010.V15.0

The .xml file is the metadata file for the extract, and the read_ipums_micro command uses it to read the associated data file into a table usable in R. The resulting data have the following format.

summary(age_data)
##       YEAR          SAMPLE           SERIAL           CBSERIAL        
##  Min.   :2023   Min.   :202301   Min.   :      1   Min.   :2.023e+12  
##  1st Qu.:2023   1st Qu.:202301   1st Qu.: 372386   1st Qu.:2.023e+12  
##  Median :2023   Median :202301   Median : 756830   Median :2.023e+12  
##  Mean   :2023   Mean   :202301   Mean   : 758992   Mean   :2.023e+12  
##  3rd Qu.:2023   3rd Qu.:202301   3rd Qu.:1147002   3rd Qu.:2.023e+12  
##  Max.   :2023   Max.   :202301   Max.   :1519010   Max.   :2.023e+12  
##       HHWT            CLUSTER              STRATA              GQ       
##  Min.   :   1.00   Min.   :2.023e+12   Min.   :  10001   Min.   :1.000  
##  1st Qu.:  48.00   1st Qu.:2.023e+12   1st Qu.: 100005   1st Qu.:1.000  
##  Median :  71.00   Median :2.023e+12   Median : 231248   Median :1.000  
##  Mean   :  97.24   Mean   :2.023e+12   Mean   : 488810   Mean   :1.134  
##  3rd Qu.: 115.00   3rd Qu.:2.023e+12   3rd Qu.: 480148   3rd Qu.:1.000  
##  Max.   :2225.00   Max.   :2.023e+12   Max.   :8100351   Max.   :5.000  
##      PERNUM           PERWT              AGE       
##  Min.   : 1.000   Min.   :   1.00   Min.   : 0.00  
##  1st Qu.: 1.000   1st Qu.:  47.00   1st Qu.:22.00  
##  Median : 2.000   Median :  72.00   Median :44.00  
##  Mean   : 2.058   Mean   :  98.34   Mean   :43.11  
##  3rd Qu.: 3.000   3rd Qu.: 117.00   3rd Qu.:63.00  
##  Max.   :20.000   Max.   :2225.00   Max.   :96.00

Using this data, we apply small mutations to create a framework that fits our visualization. We first rename the column AGE to age to follow our variable naming conventions. We then collapse all ages of 86 and above into the single category 86, so the value for 86 effectively represents 86+. While the maximum age in this data is 96, which would allow us to go higher, we top code so that the data match other datasets used in this project, many of which top code age at 85. It is also likely that individuals at these oldest ages are similar enough to one another for the collapsed group to be reasonable. Matching this convention gives smoother transitions between visuals.

After these mutations, we summarise the data by calculating the weighted sum of the population at each age. The weighted sum uses the PERWT variable from the original data, which is the respondent’s person-level survey weight.

age_data <- age_data %>%
  rename(age = AGE) %>% # Rename AGE to follow our naming conventions
  mutate(age = ifelse(age >= 86, 86, age)) %>% # Collapse ages 86 and over into a single 86+ category
  group_by(age) %>%
  summarise(
    population = sum(PERWT, na.rm = TRUE) #Sum the weighted population within each group.
  )

The results of this transformation look like this. We’ve restricted this data to having only two columns, one for age and one for the associated population.

datatable(age_data, options = list(pageLength = 1000))

We then double check this data to ensure that it is correct. We know roughly what the US population is, so we can take the sum of the population column and compare. If this number is far from the expected US population, it is a sign of an issue in the code that was run.

message("Total US population estimate: ", sum(age_data$population))
## Total US population estimate: 334914896
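The collapse and the population check can be illustrated on a handful of made-up records. The sketch below is written in base R so that it runs without the ACS extract; the toy values are hypothetical, and only the column names AGE and PERWT come from the data above.

```r
# Toy person-level records (hypothetical values, not ACS data)
toy <- data.frame(
  AGE   = c(0, 30, 30, 85, 86, 90, 96),
  PERWT = c(100, 50, 70, 40, 10, 20, 5)
)

# Top code: every age of 86 or more is recoded to 86
toy$age <- pmin(toy$AGE, 86)

# Weighted population at each age is the sum of person weights
toy_pop <- aggregate(PERWT ~ age, data = toy, FUN = sum)
names(toy_pop)[2] <- "population"
print(toy_pop)

# Sanity check: collapsing ages must not change the total
stopifnot(sum(toy_pop$population) == sum(toy$PERWT))
```

In this toy data the three oldest records (weights 10, 20, and 5) end up in a single row for age 86 with a population of 35, while the overall total of 295 is unchanged.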

Market Datum

If the above number is significantly different from your understanding of the US population, please double check the code that you ran before proceeding. Please note as well that we use IPUMS data pulls to gather our numbers, and these tend to lag behind official census reports. Thus, the number presented above could be up to one year older than the current official reports, depending on the timing of the release.

Now that we have the above population data, we want to add a few more variables representing the need and provision of care throughout the population.

  • care_focus - valued at developmental, daily_living, or health. This variable determines the care focuses analyzed in the dashboard, each of which receives its own need and provision valuation.

  • need_interval - the estimated number of minutes in a day that a person of the associated age needs care for each specific focus.

  • provision_interval - the estimated number of minutes in a day that a person of the associated age provides care for each specific focus.

To start, let’s create a blank data frame of the ages and care focuses that we will populate with the need and provision information. This code uses the expand.grid command to expand each age into one observation per care focus. It then creates two new interval columns, each valued at NA. We will populate these NA columns in the upcoming code chunks.

age <- age_data$age
care_focus <- c("developmental", "daily_living", "health")

market_datum <- expand.grid(
  age = age,
  care_focus = care_focus
) %>%
  mutate(
    need_interval = NA, # how much care does this group need?
    provision_interval = NA # how much care does this group provide?
  )

summary(market_datum)
##       age             care_focus need_interval  provision_interval
##  Min.   : 0   developmental:87   Mode:logical   Mode:logical      
##  1st Qu.:21   daily_living :87   NA's:261       NA's:261          
##  Median :43   health       :87                                    
##  Mean   :43                                                       
##  3rd Qu.:65                                                       
##  Max.   :86
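The row count in the summary follows directly from the grid: 87 ages (0 through 86) times 3 care focuses gives 261 rows. The base R sketch below reproduces that arithmetic under two assumptions that differ from the chunk above: stringsAsFactors = FALSE keeps care_focus as character rather than a factor, and NA_real_ makes the interval columns numeric from the start. Neither is required for the Careboard code to run.

```r
# Rebuild the blank grid from scratch (no input data needed)
ages <- 0:86
focuses <- c("developmental", "daily_living", "health")

grid <- expand.grid(
  age = ages,
  care_focus = focuses,
  stringsAsFactors = FALSE  # assumption: keep care_focus as character
)

# Assumption: typed missing values so the columns are numeric, not logical
grid$need_interval <- NA_real_
grid$provision_interval <- NA_real_

# One row per age and care focus combination
stopifnot(nrow(grid) == length(ages) * length(focuses))
str(grid)
```

Keeping care_focus as character can make later joins on that column easier to reason about, since the ATUS FOCUS variable is also stored as character.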

After creating a blank dataset, we load the data extract from the American Time Use Survey (ATUS). For more information on the creation of this data extract and coding of variables see Appendix A: ATUS Methods. When loading the ATUS data, we restrict the data to the years 2018, 2019, 2021, 2022, and 2023. Using a five-year rolling average protects the estimates from outliers, which arise in large part from the small sample sizes of the yearly ATUS survey. However, a five-year rolling average also makes the estimates less responsive to sudden shifts in community behavior. The year 2020 is excluded from analysis because the COVID-19 pandemic severely disrupted ATUS survey implementation.

After loading the ATUS data, we need to separate primary (active) caregiving from secondary caregiving. The variable SCC_ALL_LN gives the time during an activity spent on secondary caregiving for a child. For instance, the primary activity might be cooking while the respondent was also providing care and supervision for their child. Secondary caregiving accounts for a large amount of the caregiving in the data. The variable SEC_ALL_LN provides the same information for secondary caregiving of elderly adults in the household.

In the data, these secondary time uses are stored in separate columns from the main activities. We pull out these observations and make them their own activity observations. The chunk below does this and then binds all data together so that primary and secondary activities are coded as distinct observations. The code also recodes NAs in the ElderCare and ChildCare columns as 0s. NAs in these columns represent 0 values (see the ATUS Methods appendix) and must be updated so that observations are filtered correctly.

atus <- fread("03_ATUSdata.csv") %>%
  select(CASEID, ACTLINE, YEAR, HH_SIZE, formal_care_focus, marst, nchild, ChildCare, ElderCare, SCC_ALL_LN, SEC_ALL_LN, FOCUS, DURATION, AGE, PaidWork, WT06) %>%
  filter(YEAR != 2020) %>%
  filter(YEAR >= 2018) %>%
  mutate(care_job = ifelse(formal_care_focus == "none", 0, 1))%>%
  mutate(ChildCare = ifelse(is.na(ChildCare), 0, ChildCare))%>%
  mutate(ElderCare = ifelse(is.na(ElderCare), 0, ElderCare))

atus$time_use <- "primary"

atus_secondary <- atus %>%
  filter(SCC_ALL_LN > 0) %>%
  mutate(
    FOCUS = "developmental",
    DURATION = SCC_ALL_LN,
    time_use = "secondary"
  )

atus <- bind_rows(atus, atus_secondary)

atus_secondary <- atus %>%
  filter(time_use == "primary", SEC_ALL_LN > 0) %>% # Use primary rows only so child care copies are not duplicated
  mutate(
    FOCUS = "health",
    DURATION = SEC_ALL_LN,
    time_use = "secondary"
  )

# Append modified rows to the original dataset
atus <- bind_rows(atus, atus_secondary)

summary(atus)
##      CASEID                  ACTLINE           YEAR         HH_SIZE      
##  Min.   :20180101180006   Min.   : 1.00   Min.   :2018   Min.   : 1.000  
##  1st Qu.:20190201191515   1st Qu.: 5.00   1st Qu.:2019   1st Qu.: 2.000  
##  Median :20210303212088   Median :10.00   Median :2021   Median : 2.000  
##  Mean   :20204886382012   Mean   :11.53   Mean   :2020   Mean   : 2.706  
##  3rd Qu.:20220806221223   3rd Qu.:16.00   3rd Qu.:2022   3rd Qu.: 4.000  
##  Max.   :20231212232280   Max.   :72.00   Max.   :2023   Max.   :14.000  
##                                                                          
##  formal_care_focus     marst               nchild         ChildCare      
##  Length:918510      Length:918510      Min.   :0.0000   Min.   :0.00000  
##  Class :character   Class :character   1st Qu.:0.0000   1st Qu.:0.00000  
##  Mode  :character   Mode  :character   Median :0.0000   Median :0.00000  
##                                        Mean   :0.8531   Mean   :0.05508  
##                                        3rd Qu.:2.0000   3rd Qu.:0.00000  
##                                        Max.   :9.0000   Max.   :1.00000  
##                                                                          
##    ElderCare        SCC_ALL_LN        SEC_ALL_LN         FOCUS          
##  Min.   :0.0000   Min.   :  0.000   Min.   :  0.000   Length:918510     
##  1st Qu.:0.0000   1st Qu.:  0.000   1st Qu.:  0.000   Class :character  
##  Median :0.0000   Median :  0.000   Median :  0.000   Mode  :character  
##  Mean   :0.0137   Mean   :  9.908   Mean   :  1.001                     
##  3rd Qu.:0.0000   3rd Qu.:  0.000   3rd Qu.:  0.000                     
##  Max.   :1.0000   Max.   :900.000   Max.   :922.000                     
##                                                                         
##     DURATION            AGE          PaidWork           WT06          
##  Min.   :   1.00   Min.   :16.0   Min.   :1        Min.   :   719247  
##  1st Qu.:  15.00   1st Qu.:37.0   1st Qu.:1        1st Qu.:  4650918  
##  Median :  30.00   Median :50.0   Median :1        Median :  7735671  
##  Mean   :  73.97   Mean   :51.3   Mean   :1        Mean   : 10277023  
##  3rd Qu.:  90.00   3rd Qu.:66.0   3rd Qu.:1        3rd Qu.: 12779696  
##  Max.   :1310.00   Max.   :85.0   Max.   :1        Max.   :194366930  
##                                   NA's   :859632                      
##     care_job        time_use        
##  Min.   :0.0000   Length:918510     
##  1st Qu.:0.0000   Class :character  
##  Median :0.0000   Mode  :character  
##  Mean   :0.1908                     
##  3rd Qu.:0.0000                     
##  Max.   :1.0000                     
## 
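The reshaping of secondary care into its own rows can be traced on a few made-up activities. This is a base R sketch under stated assumptions: the values are hypothetical, and only the column names and the FOCUS labels come from the extract described above.

```r
# Toy activity records (hypothetical values); column names follow the extract
toy <- data.frame(
  CASEID     = c(1, 1, 2),
  ACTLINE    = c(1, 2, 1),
  FOCUS      = c("daily_living", "none", "daily_living"),
  DURATION   = c(60, 30, 45),
  SCC_ALL_LN = c(20, 0, 0),  # minutes of secondary child care
  SEC_ALL_LN = c(0, 0, 15),  # minutes of secondary elder care
  stringsAsFactors = FALSE
)
toy$time_use <- "primary"

# Copy rows with secondary child care and recode them as developmental care
child <- toy[toy$SCC_ALL_LN > 0, ]
child$FOCUS <- "developmental"
child$DURATION <- child$SCC_ALL_LN
child$time_use <- "secondary"

# Same for secondary elder care, starting again from the primary rows
elder <- toy[toy$SEC_ALL_LN > 0, ]
elder$FOCUS <- "health"
elder$DURATION <- elder$SEC_ALL_LN
elder$time_use <- "secondary"

toy_all <- rbind(toy, child, elder)
print(toy_all[, c("CASEID", "ACTLINE", "FOCUS", "DURATION", "time_use")])
```

The three original activities become five rows: the 60-minute activity gains a 20-minute developmental row and the 45-minute activity gains a 15-minute health row. Building both secondary sets from the primary rows means an activity with both kinds of secondary care is copied once per kind.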

Now that we have loaded the time use data, we need to add and mutate a few variables to get them into the correct format. First, to get a daily weight we divide the weight variable by 365 and then by 5. The WT06 variable gives a population value suitable for a calendar year, and dividing by 365 (366 in a leap year) accounts for this. We then divide by 5 because we have five years of data.

We then create a variable called “worktime” equal to DURATION × PaidWork × care_job. DURATION is a numeric count of minutes spent in an activity, PaidWork is a binary variable valued at 1 if the activity consists of paid work and 0 otherwise, and care_job is a binary variable valued at 1 if the formal work is coded as a care economy job and 0 otherwise. The new variable thus captures the time spent in an activity only when that time is both paid work and work in a care sector job. This will be important later for calculating how much care different groups provide.

For more information about how care economy jobs are coded, see Appendix: Coding Care Economy Occupations.
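The weight and worktime definitions above can be checked with a little arithmetic. The sketch below is base R on hypothetical activity values; the WT06 figure is the median reported in the summary output later in this section, and treating a missing PaidWork as 0 is an assumption made here for illustration.

```r
# Daily weight: annual weight divided by days in the year and by pooled years
wt06_median <- 7539175
daily_weight <- wt06_median / 365 / 5
round(daily_weight, 1)  # about 4131.1

# Toy activities (hypothetical values)
toy <- data.frame(
  DURATION = c(480, 60, 30),
  PaidWork = c(1, NA, 1),   # NA marks an activity that is not paid work
  care_job = c(1, 1, 0)     # 1 if the respondent's job is a care economy job
)

# Assumption: recode missing PaidWork to 0 so the product is defined
paid <- ifelse(is.na(toy$PaidWork), 0, toy$PaidWork)
toy$worktime <- toy$DURATION * paid * toy$care_job
print(toy)
```

Only the first activity counts toward worktime (480 minutes): the second is not paid work and the third is paid work outside the care sector, so both evaluate to 0.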

Following this, we load and merge a hierarchical version of the ATUS data. ATUS data come in two formats. The first dataset loaded above is rectangular activity data, in which each activity record has the requested person information attached. Hierarchical data contain a distinct household record followed by separate person records for the individuals within the household. We use the hierarchical data to identify the people with whom activities are conducted. The variable RELATEW codes whom each activity was conducted with, for instance alone, with a child, or with a spouse. We load the hierarchical data and merge RELATEW into the activity dataset by matching on the CASEID person identifier and the ACTLINE activity line number. For instance, person id 1 activity 1 (the first activity recorded by the first person in the data) is paired between the two datasets and RELATEW is added to the atus data. Because an activity can have more than one RELATEW record, and the secondary rows added earlier repeat CASEID and ACTLINE pairs, the join is many-to-many and dplyr reports this in a warning.

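atus$weight <- atus$WT06 / 365 / 5 # Daily weight averaged over the five years of data

# PaidWork is 1 for paid work and NA otherwise in this extract, so NA is treated as 0
atus$worktime <- atus$DURATION * ifelse(is.na(atus$PaidWork), 0, atus$PaidWork) * atus$care_job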

ddi_file <- read_ipums_ddi("./atus_00026.xml")
atus_H <- read_ipums_micro(ddi_file)
## Use of data from IPUMS ATUS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
atus$CASEID <- as.numeric(atus$CASEID)

atus <- atus %>%
  left_join(atus_H %>% select(CASEID, ACTLINEW, RELATEW),
            by = c("CASEID" = "CASEID", "ACTLINE" = "ACTLINEW"))
## Warning in left_join(., atus_H %>% select(CASEID, ACTLINEW, RELATEW), by = c(CASEID = "CASEID", : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 47 of `x` matches multiple rows in `y`.
## ℹ Row 381 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
summary(atus)
##      CASEID             ACTLINE           YEAR         HH_SIZE      
##  Min.   :2.018e+13   Min.   : 1.00   Min.   :2018   Min.   : 1.000  
##  1st Qu.:2.019e+13   1st Qu.: 6.00   1st Qu.:2019   1st Qu.: 2.000  
##  Median :2.021e+13   Median :10.00   Median :2021   Median : 3.000  
##  Mean   :2.020e+13   Mean   :11.79   Mean   :2020   Mean   : 3.059  
##  3rd Qu.:2.022e+13   3rd Qu.:16.00   3rd Qu.:2022   3rd Qu.: 4.000  
##  Max.   :2.023e+13   Max.   :72.00   Max.   :2023   Max.   :14.000  
##                                                                     
##  formal_care_focus     marst               nchild        ChildCare      
##  Length:1198564     Length:1198564     Min.   :0.000   Min.   :0.00000  
##  Class :character   Class :character   1st Qu.:0.000   1st Qu.:0.00000  
##  Mode  :character   Mode  :character   Median :1.000   Median :0.00000  
##                                        Mean   :1.143   Mean   :0.07324  
##                                        3rd Qu.:2.000   3rd Qu.:0.00000  
##                                        Max.   :9.000   Max.   :1.00000  
##                                                                         
##    ElderCare         SCC_ALL_LN       SEC_ALL_LN         FOCUS          
##  Min.   :0.00000   Min.   :  0.00   Min.   :  0.000   Length:1198564    
##  1st Qu.:0.00000   1st Qu.:  0.00   1st Qu.:  0.000   Class :character  
##  Median :0.00000   Median :  0.00   Median :  0.000   Mode  :character  
##  Mean   :0.01369   Mean   : 15.98   Mean   :  1.005                     
##  3rd Qu.:0.00000   3rd Qu.: 15.00   3rd Qu.:  0.000                     
##  Max.   :1.00000   Max.   :900.00   Max.   :922.000                     
##                                                                         
##     DURATION            AGE           PaidWork            WT06          
##  Min.   :   1.00   Min.   :16.00   Min.   :1         Min.   :   719247  
##  1st Qu.:  15.00   1st Qu.:35.00   1st Qu.:1         1st Qu.:  4567399  
##  Median :  30.00   Median :45.00   Median :1         Median :  7539175  
##  Mean   :  70.55   Mean   :48.86   Mean   :1         Mean   : 10070500  
##  3rd Qu.:  85.00   3rd Qu.:63.00   3rd Qu.:1         3rd Qu.: 12449980  
##  Max.   :1310.00   Max.   :85.00   Max.   :1         Max.   :194366930  
##                                    NA's   :1128987                      
##     care_job        time_use             weight            worktime      
##  Min.   :0.0000   Length:1198564     Min.   :   394.1   Min.   : NA      
##  1st Qu.:0.0000   Class :character   1st Qu.:  2502.7   1st Qu.: NA      
##  Median :0.0000   Mode  :character   Median :  4131.1   Median : NA      
##  Mean   :0.2019                      Mean   :  5518.1   Mean   :NaN      
##  3rd Qu.:0.0000                      3rd Qu.:  6821.9   3rd Qu.: NA      
##  Max.   :1.0000                      Max.   :106502.4   Max.   : NA      
##                                                         NA's   :1198564  
##     RELATEW    
##  Min.   : 100  
##  1st Qu.: 100  
##  Median : 202  
##  Mean   :1545  
##  3rd Qu.: 401  
##  Max.   :9998  
## 
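The growth in row count after the merge can be reproduced on toy data. The sketch below uses base R merge as a stand-in for left_join and hypothetical records; the only code taken from the text is RELATEW 100 for activities done alone, and the other RELATEW values are placeholders for any other companion code.

```r
# Toy activity data: one row per activity
acts <- data.frame(
  CASEID   = c(1, 1, 2),
  ACTLINE  = c(1, 2, 1),
  DURATION = c(60, 30, 45)
)

# Toy with-whom data: activity 2 of person 1 has two companion records
who <- data.frame(
  CASEID   = c(1, 1, 1, 2),
  ACTLINEW = c(1, 2, 2, 1),
  RELATEW  = c(100, 200, 202, 100)  # 100 = alone; others are placeholders
)

# Left join on person and activity line
joined <- merge(acts, who,
                by.x = c("CASEID", "ACTLINE"),
                by.y = c("CASEID", "ACTLINEW"),
                all.x = TRUE)
print(joined)

# Activities done alone appear exactly once after the join
alone <- joined[joined$RELATEW == 100, ]
sum(alone$DURATION)
```

Three activities become four rows because one activity matches two companion records. Filtering to RELATEW == 100 afterwards keeps one row per solo activity, so durations summed over that subset are not inflated by the join.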

The next step is determining the need for caregiving throughout the population. For each age, we want to know how much care is needed in each of the different focuses. While we could make a variety of assumptions about how much care is needed, we want to be as data driven as possible. Thus, to measure care needs we look at the amount of care individuals give to themselves when they have no one else to care for and no one to help them. Within a household of multiple people, allotting caregiving can be complicated and is often an important economic arrangement. For instance, economies of scale hold for most households, so many activities are carried out by one person rather than by every member. As an example, each person needs to spend time preparing food during the day (assuming they don’t eat out). When two people live together, the time spent cooking might increase but is unlikely to double, and it might be done by only one of the two individuals, freeing the other to spend less time on care activities or to focus on other care activities (such as doing the dishes). These household arrangements make it difficult to determine exactly how much care is needed at an individual level.

We need to avoid measuring these households. To do so, we filter the data to include only individuals living alone, without a spouse or child, who spend no time in the day on child or elder caregiving. We further filter these individuals’ records to include only tasks that they did alone, so the data will not include tasks conducted with anyone from outside the household. These are the individuals who have no one to share workloads with and so cannot benefit from economies of scale. Thus we can relatively safely assume that these individuals are doing 100% of the caregiving that they need in their household. We can also safely assume that these individuals are the sole recipients of caregiving within their household. They are not providing care to other people or receiving it from other people, and their daily activities therefore provide accurate measures of care need. While these assumptions can be justified, it is important to note that they are nonetheless assumptions and that some activities are likely missed. For example, a person who spends time making food to deliver on a later date to elderly parents who don’t live with them would likely not be coded as providing eldercare. At the same time, eating food that a mother dropped off the previous day, and thus spending less time cooking, would not be coded as receiving care. While a concern, these instances are likely uncommon in the data and thus it is fair to proceed.

The code below loops through each age. In each iteration the data is first filtered using the above criteria. We then calculate the average time spent in each care_focus using a weighted mean, which allows us to produce population level estimates. This estimate is then used as our measure of how much time is needed by people within this age group. It is important to note that this measure represents the average for the population, including people of all health statuses. It is likely that some individuals with health issues will require significantly more care than this estimate suggests. However, this code seeks to provide only population level estimates without looking at specific characteristics.

for(a in age){
  data <- atus %>%
    filter(HH_SIZE == 1) %>% # Look at people living by themselves
    group_by(CASEID) %>%
      filter(all(SCC_ALL_LN == 0)) %>%
      filter(all(SEC_ALL_LN == 0)) %>%
      filter(all(ChildCare == 0)) %>%
      filter(all(ElderCare == 0)) %>%
    ungroup() %>%
    filter(RELATEW == 100) %>%
    filter(AGE >= a - 2, AGE <= a + 2) %>% # 5-year age window centered on age a
    group_by(FOCUS, CASEID) %>% # Estimates for each individual
    summarise(
      Duration = sum(DURATION, na.rm = TRUE),
      weight = first(weight),
      .groups = "drop"
    ) %>%
    ungroup() %>%
    filter(FOCUS != "None") %>% # Aggregate by care focus
    group_by(FOCUS) %>%
    summarise(
      need_interval = weighted.mean(Duration, w = weight, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    rename(care_focus = FOCUS) %>%
    filter(care_focus != "none")
  
  data$age <- a
  
  market_datum <- market_datum %>%
    left_join(data %>% select(age, care_focus, need_interval), 
              by = c("age", "care_focus")) %>%
    mutate(need_interval = coalesce(need_interval.x, need_interval.y)) %>%
    select(age, care_focus, need_interval, provision_interval)
}

market_datum$need_interval[is.na(market_datum$need_interval)] <- 0
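The core of each loop iteration is a per-person total followed by a weighted mean across people in the age window. The sketch below walks through that calculation for one age and one care focus in base R; all values are hypothetical, and weights are kept small so the arithmetic can be followed by hand.

```r
# Toy solo activities for one care focus (hypothetical values)
acts <- data.frame(
  CASEID   = c(1, 1, 2, 3, 3, 4),
  AGE      = c(38, 38, 40, 42, 42, 50),
  weight   = c(1, 1, 2, 1, 1, 5),
  DURATION = c(60, 40, 120, 50, 30, 300)
)

a <- 40  # target age

# Keep respondents within two years of the target age (ages 38 to 42)
in_window <- acts[abs(acts$AGE - a) <= 2, ]

# Total minutes per person; weight is constant within a person
per_person <- aggregate(DURATION ~ CASEID + weight, data = in_window, FUN = sum)

# Weighted mean across people gives the need estimate for age a
need_interval <- weighted.mean(per_person$DURATION, w = per_person$weight)
need_interval
```

Here the respondent aged 50 is excluded by the window. The remaining per-person totals are 100, 120, and 80 minutes with weights 1, 2, and 1, so the estimate is (100 + 240 + 80) / 4 = 105 minutes.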

While the above code gives us information for adults, it isn’t useful for those at the youngest and oldest ages in the data. For instance, we cannot use this method to determine the average care needs of groups who on average rely on others for caregiving. We define these groups as those under the age of 18 and over the age of 75. The methodology above would give us the average needs of these groups only for those who do not receive outside help. For children, such cases do not exist in the data. For the elderly, it is reasonable to believe that after a certain age, not needing assistance of any kind becomes less common, meaning any examples of this in the data might not be reflective of the general population.

We therefore use a variety of assumptions to determine the time needs for these groups. For those aged 12 and under, we assume that 24 hours of total care time is necessary. Many states have laws forbidding those under the age of 12 from being left alone, which is why we make this assumption. For those over the age of 12 but still under the age of 18, we assume that 20 hours of total care time is needed. This number is a middle ground between what an 18 year old seems to need and what a child who is not yet fully independent might need.

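The code below assigns values, in minutes per day, to three age ranges for each of the three care focuses. The variables are named for the ranges under 5, 5-12, and 12-18; as written, the conditions cover ages 0 to 5, 6 to 12, and 13 to 18. Daily living care is computed as the remainder of the assumed total once health and developmental care are subtracted. These assumptions come from a review of data, talking to caregivers, and a review of the literature. However, it is important to understand that these assumptions are not fully data driven and thus present potential bias. Future research should seek to better understand the exact care demands faced by children in the US.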

under5_health <- 300
five_twelve_health <- 150
twelve_eighteen_health <- 90

under5_develop <- 420
five_twelve_develop <- 480
twelve_eighteen_develop <- 360

under5_daily <- 1440 - under5_health - under5_develop
five_twelve_daily <- 1440 - five_twelve_health - five_twelve_develop
twelve_eighteen_daily <- 1200 - twelve_eighteen_health - twelve_eighteen_develop

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- under5_health
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "health"] <- five_twelve_health
market_datum$need_interval[market_datum$age > 12 & market_datum$age <= 18 & market_datum$care_focus == "health"] <- twelve_eighteen_health

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- under5_develop
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- five_twelve_develop
market_datum$need_interval[market_datum$age > 12 & market_datum$age <= 18 & market_datum$care_focus == "developmental"] <- twelve_eighteen_develop

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- under5_daily
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- five_twelve_daily
market_datum$need_interval[market_datum$age > 12 & market_datum$age <= 18 & market_datum$care_focus == "daily_living"] <- twelve_eighteen_daily
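
As a cross-check on the arithmetic behind these assumptions, the sketch below lays the same constants out as a small table and confirms that the three care focuses add back up to the assumed daily totals. The `child_need` table and its `band` labels are illustrative assumptions, not objects used by the Careboard code.

```r
# Hypothetical lookup table of assumed daily care needs, in minutes per day.
# The values mirror the constants assigned above; the table layout itself
# is an illustration and is not part of the Careboard pipeline.
child_need <- data.frame(
  band          = c("0-5", "6-12", "13-18"),
  total         = c(1440, 1440, 1200),  # 24 h, 24 h, and 20 h in minutes
  health        = c(300, 150, 90),
  developmental = c(420, 480, 360)
)

# Daily living is the residual once health and developmental care are removed.
child_need$daily_living <-
  child_need$total - child_need$health - child_need$developmental

# Sanity check: the three focuses add back up to the assumed total.
stopifnot(all(
  child_need$health + child_need$developmental + child_need$daily_living ==
    child_need$total
))

print(child_need)
```

Keeping the assumptions in one table makes it easier to see that daily living care absorbs whatever time the other two focuses do not use, so changing a health or developmental assumption changes daily living by the same amount in the opposite direction.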

We do the same thing for individuals aged 75 and older. It is difficult to know exactly what the care needs for this group are, but through a review of a variety of sources we can come up with rough estimates. We assign these rough estimates using the code below for the three care focuses. Once again, it is important to note that these estimates represent assumptions as opposed to being fully data driven. Further research should be done to understand the care needs for this group.

over_85_health <- 300
over_75_health <- 200

over_85_develop <- 0
over_75_develop <- 0

over_85_daily <- 1200 - over_85_health - over_85_develop
over_75_daily <- 780 - over_75_health - over_75_develop

market_datum$need_interval[market_datum$age > 74 & market_datum$age < 85  & market_datum$care_focus == "health"] <- over_75_health
market_datum$need_interval[market_datum$age > 84 & market_datum$care_focus == "health"] <- over_85_health

market_datum$need_interval[market_datum$age > 74 & market_datum$age < 85  & market_datum$care_focus == "developmental"] <- over_75_develop
market_datum$need_interval[market_datum$age > 84 & market_datum$care_focus == "developmental"] <- over_85_develop

market_datum$need_interval[market_datum$age > 74 & market_datum$age < 85  & market_datum$care_focus == "daily_living"] <- over_75_daily
market_datum$need_interval[market_datum$age > 84 & market_datum$care_focus == "daily_living"] <- over_85_daily

Running all the above code will fill in the columns related to need for care.

summary(market_datum)
##       age      care_focus        need_interval   provision_interval
##  Min.   : 0   Length:261         Min.   :  0.0   Mode:logical      
##  1st Qu.:21   Class :character   1st Qu.:  0.0   NA's:261          
##  Median :43   Mode  :character   Median :101.2                     
##  Mean   :43                      Mean   :185.1                     
##  3rd Qu.:65                      3rd Qu.:200.0                     
##  Max.   :86                      Max.   :900.0

However, we still have to fill in the columns related to the provision of care. Measuring the care provision can be done in a similar way to above where we look at the average ability of a person by age to provide care. However, unlike above we do not subset our data to only include people living alone. We instead look at the total amount of time within a market where an individual is able to provide care.

To do this we add up a person’s time spent in the following categories: 1) paid work in the formal care economy, 2) unpaid work in the informal care economy, and 3) secondary child care and elder care.

This gives us a distribution of the total time spent on care provision. For each of these categories we take the 75th percentile of individuals as our measure of care provision, that is, the amount that only 25% of individuals exceed. We believe this is a conservative estimate of how much care someone is able to provide, since it leaves some time for leisure, self-care, and paid work. Some people provide more care than this, so more is technically possible, but research on the mental health burden of caregiving shows that many people spend more than a healthy amount of time providing care. We therefore use a value below the maximum to determine what is possible. It is nonetheless important to have better discussions about how much time a caregiver should be expected to spend providing care each day.
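
To make the idea of a weighted 75th percentile concrete, here is a minimal base R sketch on made-up numbers. The helper `weighted_quantile()` is a hypothetical, simplified definition (the smallest value whose cumulative weight share reaches the target); `wtd.quantile()` as used in this document interpolates between values and can return slightly different numbers.

```r
# Minimal weighted quantile: the smallest value whose cumulative weight share
# reaches `prob`. This is an illustrative definition, not the Hmisc algorithm.
weighted_quantile <- function(x, w, prob = 0.75) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= prob)[1]]
}

# Toy data (made up): minutes of care provided and survey weights.
minutes <- c(0, 30, 60, 120, 240)
weights <- c(4, 3, 1, 1, 1)

p75_weighted   <- weighted_quantile(minutes, weights, prob = 0.75)
p75_unweighted <- unname(quantile(minutes, probs = 0.75))

print(c(weighted = p75_weighted, unweighted = p75_unweighted))
```

In this toy example the respondents who provide little care carry most of the weight, so the weighted percentile sits below the unweighted one. This is why the survey weights matter when the percentile is meant to describe the population rather than the sample.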

The code below loops through ages in the same way as the previous chunk. Within each iteration, the code computes time spent in the caregiving categories using a weighted 75th percentile (wtd.quantile with probs = 0.75). The paid work figure is divided by 3 before being added to the unpaid figure for each care focus. The code then adds this data to the dataset.

for(a in age){
  # Calculate work hours
  work_hours <- atus %>%
    filter(time_use == "primary") %>%
    filter(AGE %in% c(a-2, a-1, a, a+1, a+2)) %>%
    group_by(CASEID, formal_care_focus) %>%
    summarise(
      Duration = sum(worktime, na.rm = TRUE),
      weight = first(weight),
      .groups = "drop"
    ) %>%
    reframe( # Use reframe instead of summarise to allow multiple rows
      Duration = wtd.quantile(Duration, weights = weight, probs = 0.75, normwt = FALSE)
    )
  
  work_hours <- as.numeric(work_hours / 3)
  
  # Calculate care provision intervals
  data <- atus %>%
    filter(time_use == "primary") %>%
    filter(AGE %in% c(a-2, a-1, a, a+1, a+2)) %>%
    group_by(FOCUS, CASEID) %>%
    summarise(
      Duration = sum(DURATION, na.rm = TRUE),
      weight = first(weight),
      .groups = "drop"
    ) %>%
    filter(FOCUS != "None") %>%
    group_by(FOCUS) %>%
    reframe( # Use reframe to allow multiple rows
      provision_interval = wtd.quantile(Duration, weights = weight, probs = 0.75, normwt = FALSE)
    ) %>%
    rename(care_focus = FOCUS)
  
  # Adjust provision_interval with work_hours
  data$provision_interval <- data$provision_interval + as.numeric(work_hours)  
  data$age <- a
  
  # Merge with market_datum
  market_datum <- market_datum %>%
    left_join(data %>% select(age, care_focus, provision_interval), 
              by = c("age", "care_focus")) %>%
    mutate(provision_interval = coalesce(provision_interval.x, provision_interval.y)) %>%
    select(age, care_focus, need_interval, provision_interval)
}
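
The merge at the end of each loop iteration can be hard to follow, so the sketch below reproduces the pattern in base R on toy data: join the new per-age result onto the running table, then keep the existing value where there is one and fall back to the new value otherwise. The objects `market_toy` and `age_result` are hypothetical, and `merge()` with `ifelse()` stands in for the `left_join()` and `coalesce()` calls used above.

```r
# Running table with provision still unfilled (toy data, made up).
market_toy <- data.frame(
  age                = c(30, 31),
  care_focus         = c("health", "health"),
  provision_interval = c(NA_real_, NA_real_)
)

# Result of one loop iteration, here for age 30 only.
age_result <- data.frame(
  age                = 30,
  care_focus         = "health",
  provision_interval = 45
)

# A left join on the shared keys yields provision_interval.x (old values)
# and provision_interval.y (new values).
merged <- merge(market_toy, age_result,
                by = c("age", "care_focus"), all.x = TRUE)

# Keep the old value when present, otherwise take the new one.
merged$provision_interval <- ifelse(
  is.na(merged$provision_interval.x),
  merged$provision_interval.y,
  merged$provision_interval.x
)

merged <- merged[, c("age", "care_focus", "provision_interval")]
print(merged)
```

Because existing values take priority, ages filled in an earlier iteration are never overwritten by a later one; only rows that are still missing pick up the new estimate.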

Just as for care need, this provides information only for the middle age groups, so we need to modify it to cover the low and high ages. These assumptions are more straightforward. We assume that people aged 85 and over, and those under the age of 18, provide no care. We know this is likely not exactly true, but we believe it is a safe assumption: it is best not to rely on these age groups, and many people in them do not provide care. We nonetheless relax these assumptions in the next step, specifically for the age group of 13-17.

over_85_health <- 0
over_85_develop <- 0
over_85_daily <- 0

market_datum$provision_interval[market_datum$age > 84 & market_datum$care_focus == "health"] <- over_85_health
market_datum$provision_interval[market_datum$age > 84 & market_datum$care_focus == "developmental"] <- over_85_develop
market_datum$provision_interval[market_datum$age > 84 & market_datum$care_focus == "daily_living"] <- over_85_daily

## Now we need to do the same thing for children.
## For now we will do the same as with elderly and assume they on average provide 0 care.

under5_health <- 0
five_twelve_health <- 0
twelve_eighteen_health <- 0

under5_develop <- 0
five_twelve_develop <- 0
twelve_eighteen_develop <- 0

under5_daily <- 0
five_twelve_daily <- 0
twelve_eighteen_daily <- 0

market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- under5_health
market_datum$provision_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "health"] <- five_twelve_health
market_datum$provision_interval[market_datum$age > 12 & market_datum$age < 18 & market_datum$care_focus == "health"] <- twelve_eighteen_health

market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- under5_develop
market_datum$provision_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- five_twelve_develop
market_datum$provision_interval[market_datum$age > 12 & market_datum$age < 18 & market_datum$care_focus == "developmental"] <- twelve_eighteen_develop

market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- under5_daily
market_datum$provision_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- five_twelve_daily
market_datum$provision_interval[market_datum$age > 12 & market_datum$age < 18 & market_datum$care_focus == "daily_living"] <- twelve_eighteen_daily

We now have fully populated data.

summary(market_datum)
##       age      care_focus        need_interval   provision_interval
##  Min.   : 0   Length:261         Min.   :  0.0   Min.   :  0.0     
##  1st Qu.:21   Class :character   1st Qu.:  0.0   1st Qu.: 70.0     
##  Median :43   Mode  :character   Median :101.2   Median :180.0     
##  Mean   :43                      Mean   :185.1   Mean   :182.4     
##  3rd Qu.:65                      3rd Qu.:200.0   3rd Qu.:310.0     
##  Max.   :86                      Max.   :900.0   Max.   :422.0

We conduct one more step to finalize our estimates of care need and provision, which is to apply a smoothing function. The estimates above have some sharp peaks and valleys caused by outliers within specific age groups. One issue with bringing the data down to single-age bins, even when using five-year averages, is that some ages have very few individuals. Smoothing functions reduce the impact of these outliers by letting each estimate learn from the data around it.

These smoothing functions also allow us to fill in the areas around our assumption borders. For instance, we know that those aged 12-18 provide some caregiving and likely need less care than a 12 year old. We also know that this group likely needs more care and provides less than those aged 18 and above. Smoothing thus provides a gradual transition across ages 12-18, blending this age group with the existing data for a more accurate estimate. It also keeps us from being overly reliant on our assumptions. We specifically use a LOESS methodology in the following code chunk.

The smoothing function is below. Of note, we bound the smoothed values at the minimum and maximum of the unsmoothed values within each care focus, meaning that no age group is moved above or below the current bounds of the data.

smooth_data <- function(df) {
  df %>%
    group_by(care_focus) %>%
    arrange(age) %>%  # Ensure data is ordered before smoothing
    mutate(
      # pick(everything()) replaces cur_data(), which was deprecated in dplyr 1.1.0
      smoothed_need = predict(loess(need_interval ~ age, data = pick(everything()), span = 0.2), newdata = data.frame(age = age)),
      min_val_need = min(need_interval, na.rm = TRUE),
      max_val_need = max(need_interval, na.rm = TRUE),
      smoothed_need = pmax(pmin(smoothed_need, max_val_need), min_val_need), # Ensure within bounds
      smoothed_prov = predict(loess(provision_interval ~ age, data = pick(everything()), span = 0.3), newdata = data.frame(age = age)),
      min_val_prov = min(provision_interval, na.rm = TRUE),
      max_val_prov = max(provision_interval, na.rm = TRUE),
      smoothed_prov = pmax(pmin(smoothed_prov, max_val_prov), min_val_prov) # Ensure within bounds
    ) %>%
    ungroup()
}

# Apply the smoothing function and keep the smoothed columns under the original names
market_datum <- smooth_data(market_datum) %>%
  select(age, care_focus, smoothed_need, smoothed_prov) %>%
  rename("need_interval" = smoothed_need) %>%
  rename("provision_interval" = smoothed_prov)
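
To show what the smoothing and bounding steps do to a single outlier, here is a small self-contained sketch on synthetic data. The data frame `toy`, its artificial spike at age 40, and the noise settings are all made up for illustration; only the LOESS span of 0.2 and the clamping with pmin and pmax follow the code above.

```r
set.seed(1)

# Synthetic need curve for ages 18 to 74 with an artificial spike at age 40.
toy <- data.frame(age = 18:74)
toy$need <- 100 + 0.5 * (toy$age - 18) + rnorm(nrow(toy), sd = 15)
toy$need[toy$age == 40] <- 400

# Fit LOESS and predict back onto the observed ages.
fit <- loess(need ~ age, data = toy, span = 0.2)
smoothed <- predict(fit, newdata = data.frame(age = toy$age))

# Clamp the smoothed values to the range of the unsmoothed data.
bounded <- pmax(pmin(smoothed, max(toy$need)), min(toy$need))

# Compare the raw and smoothed values around the spike.
print(data.frame(age = toy$age, raw = toy$need, smoothed = bounded)[
  toy$age %in% 38:42, ])
```

The spike is pulled toward its neighbours while the neighbours are pulled slightly toward the spike, which is the trade-off a smaller or larger span controls. The clamp only matters when the local fit overshoots the observed range.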

While the smoothing function helps smooth over our estimates and assumptions, we nonetheless want to make sure that some of the assumptions are firmly held. We believe it is essential to reiterate that individuals under the age of 12 require 24 hours of care and are providing no care. We also reiterate that those aged 85 and over are receiving and providing care according to the above assumptions. By repopulating these values, we overwrite the smoothed estimates for them while keeping the smoothed estimates for all other ages. As written, the code below resets provision to zero only for ages 0 to 5 and for ages 85 and over; provision for ages 6 to 12 keeps its smoothed value.

under5_health <- 300
five_twelve_health <- 150

under5_develop <- 420
five_twelve_develop <- 480

under5_daily <- 1440 - under5_health - under5_develop
five_twelve_daily <- 1440 - five_twelve_health - five_twelve_develop

no_provision <- 0

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- under5_health
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "health"] <- five_twelve_health

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- under5_develop
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "developmental"] <- five_twelve_develop

market_datum$need_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- under5_daily
market_datum$need_interval[market_datum$age > 5 & market_datum$age < 13 & market_datum$care_focus == "daily_living"] <- five_twelve_daily


market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "health"] <- no_provision
market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "developmental"] <- no_provision
market_datum$provision_interval[market_datum$age >= 0 & market_datum$age < 6 & market_datum$care_focus == "daily_living"] <- no_provision

market_datum$provision_interval[market_datum$age > 84 & market_datum$care_focus == "health"] <- no_provision
market_datum$provision_interval[market_datum$age > 84 & market_datum$care_focus == "developmental"] <- no_provision
market_datum$provision_interval[market_datum$age > 84 & market_datum$care_focus == "daily_living"] <- no_provision

over_85_health <- 300
over_85_develop <- 0
over_85_daily <- 1200 - over_85_health - over_85_develop

market_datum$need_interval[market_datum$age > 84 & market_datum$care_focus == "health"] <- over_85_health
market_datum$need_interval[market_datum$age > 84 & market_datum$care_focus == "developmental"] <- over_85_develop
market_datum$need_interval[market_datum$age > 84 & market_datum$care_focus == "daily_living"] <- over_85_daily

We can now look through the data for this table.

summary(market_datum)
##       age      care_focus        need_interval    provision_interval
##  Min.   : 0   Length:261         Min.   :  0.00   Min.   :  0.0     
##  1st Qu.:21   Class :character   1st Qu.: 21.98   1st Qu.: 77.7     
##  Median :43   Mode  :character   Median :106.48   Median :159.5     
##  Mean   :43                      Mean   :187.49   Mean   :179.6     
##  3rd Qu.:65                      3rd Qu.:222.34   3rd Qu.:303.8     
##  Max.   :86                      Max.   :900.00   Max.   :411.8

Finalize Care Economy Data

Finally, we merge this data with age_data, which was created previously, to add the population value to this dataframe. This gives us the following final dataframe for analysis.

Care_Economy <- market_datum %>%
  left_join(age_data, by = "age")

datatable(Care_Economy, options = list(pageLength = 100))

We then write this final dataframe as a csv file and a dta file into our repository for upload to the careboard. Following this we are finished with the care economy section. All data in this section is used to populate the care economy page and is available for download on the data tab.

write.csv(Care_Economy, "market_datum/Care_Economy_methodology.csv", row.names = FALSE)
write_dta(Care_Economy, "market_datum/Care_Economy_methodololgy.dta")

Lives of Care (Provider Demographics)

One of the key questions we had when entering this project was: who is providing care within society? A wide variety of previous literature has shown that caregiving within society is often conducted by specific groups, such as women. In this project, we specifically seek to measure the time spent by different groups within society and understand how the growth of caregiving has shifted across groups and over time. This section provides all code and replication files needed to understand how we acquired these estimates.

Provider Groups

The first step of this process is to identify the groups we think are of most interest. We call these groups provider groups. We reviewed the scholarly literature and the discussions throughout society to understand which groups might be expected to see imbalances and shifts in caregiving load and expectations. While this list of potential providers can be updated whenever desired, we start with what we have identified as the key groups of interest:

  • Care Status, which reflects the intersection of gender and parenthood. We expect results here to reflect gendered patterns in caregiving, as well as parents likely needing to provide more care than non-parents.

  • Race, which represents the respondent’s identified race/ethnicity. Past literature has shown differences in caregiving and in access to formal care infrastructure across races.

  • Marital Status, which represents the respondent’s relationship status. Depending on their relationship status, people might have access to more care help within their household.

  • Poverty Status, which represents the respondent’s level of poverty. Past literature has shown that individuals in poverty are likely to have decreased access to formal care infrastructure, which might affect informal caregiving times.

The code chunk below creates the initial parts of a dataframe that will be populated with data on these providers. It has a column called name, which holds the name of each potential provider group, and a column called id, which holds the name in kebab case and will be useful for anyone seeking to work programmatically with this data.

name <- c("Care Status", "Race", "Marital Status", "Poverty Status")
id <- c("care-status", "race", "marital-status", "poverty")

provider_group <- data.frame(id = id, name = name, stringsAsFactors = FALSE)

datatable(provider_group, options = list(pageLength = 100))
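
For readers who want to derive kebab-case ids rather than type them by hand, the sketch below shows one way to do it in base R. The helper name `to_kebab()` is hypothetical; the regular expression is the same one applied to provider names later in this document, with `gsub()` and `tolower()` standing in for the stringr functions.

```r
# Hypothetical helper: replace each run of non-alphanumeric characters with a
# single hyphen, then lower-case the result.
to_kebab <- function(x) {
  tolower(gsub("[^a-zA-Z0-9]+", "-", x))
}

kebab_ids <- to_kebab(c("Care Status", "Race", "Marital Status", "Childless Women"))
print(kebab_ids)
```

Note that the ids in the chunk above are typed by hand, and the Poverty Status group is given the id poverty, so a purely mechanical conversion of the group names would not reproduce that table exactly.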

Finally, we write this dataframe to a csv file.

write.csv(provider_group, "provider_group/provider_group.csv", row.names = FALSE)

Providers

For each of these groups we next need to determine the specific categories within the group and the characteristics of each category that we want to calculate. For now, the only characteristic we are interested in is the population. This section thus looks at each provider group and calculates the population of each specific category within that group.

To accomplish this task, we use the yearly ATUS survey to calculate populations. The use of this survey is a significant decision, as the Census Bureau prefers the use of the monthly CPS survey to analyze demographic shifts in the USA. We use the ATUS survey because, in the next code section, we will impute time use statistics relative to these providers. In future updates to the Careboard, we might seek to develop ways to fuse the monthly updates of the CPS survey with the yearly updates of the ATUS survey, but for now we focus on using the ATUS survey to create these estimates.

The first step is to load the ATUS data, which we do in the code chunk below, and then filter it. The methodology calls for the most recent five years of data, not including the year 2020, which is excluded because the COVID pandemic significantly affected the implementation of the ATUS survey; as written, the chunk below keeps only the 2023 survey year. We then create a variable called parenthood, which takes the value With Child or Without Child.

atus <- read.csv("03_ATUSdata.csv") %>%
  filter(YEAR == 2023) %>%
  mutate(parenthood = ifelse(care_status %in% c("Mothers", "Fathers"), 
                             "With Child", 
                             "Without Child"))

Following this, we create a dataset for each of our provider groups that includes the individual categories and the population within each. We do this using the survey weight variable WT06 to get population-level estimates. We divide this by 365 in order to get daily-level estimates. For each of the provider groups identified above, we provide the code to create the population estimates and the results below.

The first provider group we analyze is the care status group, which represents Mothers, Fathers, Childless Men, and Childless Women. The code below compiles the data for this group.

care_pop <- atus %>%
  group_by(care_status, CASEID) %>%
  summarise(
    care = first(care_status),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(care) %>%
  summarise(
    population = sum(weight)/365
  ) %>%
  rename(name = care)
## `summarise()` has grouped output by 'care_status'. You can override using the
## `.groups` argument.
care_pop$id <- str_to_lower(str_replace_all(care_pop$name, "[^a-zA-Z0-9]+", "-"))
care_pop$provider_group_id <- "care-status"

datatable(care_pop, options = list(pageLength = 100))
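
The same two steps (keep one weight per respondent, then sum the weights by group and divide by 365) are repeated for every provider group below. As a rough illustration of that logic, here is a base R sketch on made-up data. The helper `group_population()` and the toy weights are assumptions for illustration; the real chunks use dplyr and the full ATUS file.

```r
# Toy activity-level data (made up): one row per activity, so each respondent
# (CASEID) appears several times with the same annual weight WT06.
toy_atus <- data.frame(
  CASEID      = c(1, 1, 2, 2, 2, 3),
  care_status = c("Mothers", "Mothers", "Fathers", "Fathers", "Fathers", "Mothers"),
  WT06        = c(730, 730, 365, 365, 365, 1095)
)

# Hypothetical helper: keep one row per respondent, sum weights by group,
# and divide by 365 as the document does to obtain a daily-level population.
group_population <- function(df, group_col) {
  respondents <- df[!duplicated(df$CASEID), ]
  totals <- tapply(respondents$WT06, respondents[[group_col]], sum)
  data.frame(name = names(totals), population = as.numeric(totals) / 365)
}

toy_pop <- group_population(toy_atus, "care_status")
print(toy_pop)
```

Dropping duplicate respondents first matters because the input has one row per activity: summing the weight column directly would count a respondent once for every activity they reported.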

The next group that we include is race and ethnicity. We provide population estimates for these groups in the below code.

race_pop <- atus %>%
  group_by(race_ethnicity, CASEID) %>%
  summarise(
    race = first(race_ethnicity),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(race) %>%
  summarise(
    population = sum(weight)/365
  ) %>%
  rename(name = race)
## `summarise()` has grouped output by 'race_ethnicity'. You can override using
## the `.groups` argument.
race_pop$id <- str_to_lower(str_replace_all(race_pop$name, "[^a-zA-Z0-9]+", "-"))
race_pop$provider_group_id <- "race"

datatable(race_pop, options = list(pageLength = 100))

The next provider group that we want to look at is poverty status. The code below provides the information related to the poverty status group.

poverty_pop <- atus %>%
  group_by(poverty, CASEID) %>%
  summarise(
    poverty = first(poverty),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(poverty) %>%
  summarise(
    population = sum(weight)/365
  ) %>%
  rename(name = poverty) %>%
  filter(name != "NIU")
## `summarise()` has grouped output by 'poverty'. You can override using the
## `.groups` argument.
poverty_pop$id <- str_to_lower(str_replace_all(poverty_pop$name, "[^a-zA-Z0-9]+", "-"))
poverty_pop$provider_group_id <- "poverty"

datatable(poverty_pop, options = list(pageLength = 100))

The final group that we develop statistics for is marital status. The code below provides information related to this provider group.

marital_pop <- atus %>%
  group_by(marst, CASEID) %>%
  summarise(
    marital = first(marst),
    weight = first(WT06)
  ) %>%
  ungroup() %>%
  group_by(marital) %>%
  summarise(
    population = sum(weight)/365
  ) %>%
  rename(name = marital)
## `summarise()` has grouped output by 'marst'. You can override using the
## `.groups` argument.
marital_pop$id <- str_to_lower(str_replace_all(marital_pop$name, "[^a-zA-Z0-9]+", "-"))
marital_pop$provider_group_id <- "marital-status"

datatable(marital_pop, options = list(pageLength = 100))

Finally we will take these datasets and bind them together into a single provider table. This final table can be viewed below. After binding the rows together we write the table into a csv file.

provider <- rbind(care_pop, race_pop, poverty_pop, marital_pop)

write.csv(provider, "provider/provider.csv", row.names = FALSE)

datatable(provider, options = list(pageLength = 100))

Provider Informal

The next set of code looks at each individual provider and calculates the time spent providing informal caregiving. The code below focuses first on the care status provider group and looks at the time spent in each care focus by each provider in the group. The provision_interval column below represents the total amount of time spent by this group providing care, in minutes. As can be seen, this is a very large number. In many figures we might divide this by 60 in order to present the estimates in hours. However, it will remain a very large number because it is a population-level estimate.

care_formal <- atus %>%
  group_by(care_status, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(care_status, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = care_status)
## `summarise()` has grouped output by 'care_status', 'FOCUS'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'care_status'. You can override using the
## `.groups` argument.
care_formal$provider_id <- str_to_lower(str_replace_all(care_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(care_formal, options = list(pageLength = 100))
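
To see why the provision interval is so large, the sketch below works through the same calculation on a toy sample: sum each respondent's minutes, scale by the daily weight, and add up across respondents. All numbers and the object names `toy_care`, `per_person`, `total_minutes`, and `total_hours` are made up for illustration.

```r
# Toy activity-level data (made up): minutes of health care by respondent.
toy_care <- data.frame(
  CASEID   = c(1, 1, 2, 3),
  FOCUS    = c("health", "health", "health", "health"),
  DURATION = c(30, 60, 120, 0),
  WT06     = c(365000, 365000, 730000, 365000)
)

# Step 1: total minutes per respondent (the weight is constant within CASEID).
per_person <- aggregate(DURATION ~ CASEID + FOCUS + WT06,
                        data = toy_care, FUN = sum)

# Step 2: convert the annual weight to a daily weight.
per_person$weight <- per_person$WT06 / 365

# Step 3: population-level minutes, hours, and the population represented.
total_minutes <- sum(per_person$DURATION * per_person$weight)
total_hours   <- total_minutes / 60
population    <- sum(per_person$weight)

print(c(minutes = total_minutes, hours = total_hours,
        minutes_per_person = total_minutes / population))
```

Three respondents standing in for 4,000 people already produce 330,000 minutes of care per day, which is why dividing by the population column is usually needed before comparing groups of different sizes.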

The next code provides the same information for the race and ethnicity provider groups.

race_formal <- atus %>%
  group_by(race_ethnicity, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(race_ethnicity, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = race_ethnicity)
## `summarise()` has grouped output by 'race_ethnicity', 'FOCUS'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'race_ethnicity'. You can override using
## the `.groups` argument.
race_formal$provider_id <- str_to_lower(str_replace_all(race_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(race_formal, options = list(pageLength = 100))

The next table provides provision intervals for the poverty status provider groups.

poverty_formal <- atus %>%
  group_by(poverty, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(poverty, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = poverty)
## `summarise()` has grouped output by 'poverty', 'FOCUS'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'poverty'. You can override using the
## `.groups` argument.
poverty_formal$provider_id <- str_to_lower(str_replace_all(poverty_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(poverty_formal, options = list(pageLength = 100))

The final table provides provision interval estimates for marital status provider groups.

marital_formal <- atus %>%
  group_by(marst, FOCUS, CASEID) %>%
  summarise(
    workduration = sum(DURATION),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(marst, FOCUS) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = FOCUS) %>%
  rename(provider_id = marst)
## `summarise()` has grouped output by 'marst', 'FOCUS'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'marst'. You can override using the
## `.groups` argument.
marital_formal$provider_id <- str_to_lower(str_replace_all(marital_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(marital_formal, options = list(pageLength = 100))

To finalize the provider_informal_datum table we bind all of these tables together using rbind. We also filter out cases that we don’t need to include on the Careboard, such as when the provider id is valued at niu or care_focus is valued at none. The code below then creates total population estimates by looking at provision across all groups within society and binding them to the other data. We then write this information to a csv file.

provider_informal_datum <- rbind(care_formal, race_formal, poverty_formal, marital_formal) %>%
  filter(provider_id != "niu") %>%
  filter(tolower(care_focus) != "none") # care_focus is not lower-cased above, so match "None" as well


write.csv(provider_informal_datum, "Provider Informal Datum/provider_informal_datum.csv", row.names = FALSE)

datatable(provider_informal_datum, options = list(pageLength = 100))

Provider formal data

The final step of creating the provider data involves looking at the time spent by each provider group in formal caregiving. Formal caregiving represents the time spent engaging in paid care work. Jobs such as nurses, teachers, chefs, barbers, and many others represent paid caregiving work. The code below looks at these different groups and determines the amount of time spent in the formal care sector by the different providers.

The first group that we look at is care status.

care_formal <- atus %>%
  group_by(care_status, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(care_status, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = care_status)
## `summarise()` has grouped output by 'care_status', 'formal_care_focus'. You can
## override using the `.groups` argument.
## `summarise()` has grouped output by 'care_status'. You can override using the
## `.groups` argument.
care_formal$provider_id <- str_to_lower(str_replace_all(care_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(care_formal, options = list(pageLength = 100))

The next provider we look at is time spent in the formal economy based on race and ethnicity.

race_formal <- atus %>%
  group_by(race_ethnicity, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(race_ethnicity, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = race_ethnicity)
## `summarise()` has grouped output by 'race_ethnicity', 'formal_care_focus'. You
## can override using the `.groups` argument.
## `summarise()` has grouped output by 'race_ethnicity'. You can override using
## the `.groups` argument.
race_formal$provider_id <- str_to_lower(str_replace_all(race_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(race_formal, options = list(pageLength = 100))

The next group that we want to look at is poverty status.

poverty_formal <- atus %>%
  group_by(poverty, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(poverty, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = poverty)
## `summarise()` has grouped output by 'poverty', 'formal_care_focus'. You can
## override using the `.groups` argument.
## `summarise()` has grouped output by 'poverty'. You can override using the
## `.groups` argument.
poverty_formal$provider_id <- str_to_lower(str_replace_all(poverty_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(poverty_formal, options = list(pageLength = 100))

The last group we look at is marital status, again measuring the time spent on formal care activities.

marital_formal <- atus %>%
  group_by(marst, formal_care_focus, CASEID) %>%
  summarise(
    workduration = sum(DURATION * PaidWork, na.rm = TRUE),
    weight = first(WT06)/365
  ) %>%
  ungroup() %>%
  group_by(marst, formal_care_focus) %>%
  summarise(
    population = sum(weight, na.rm = TRUE),
    provision_interval = sum(workduration*weight)
  ) %>%
  rename(care_focus = formal_care_focus) %>%
  rename(provider_id = marst)
## `summarise()` has grouped output by 'marst', 'formal_care_focus'. You can
## override using the `.groups` argument.
## `summarise()` has grouped output by 'marst'. You can override using the
## `.groups` argument.
marital_formal$provider_id <- str_to_lower(str_replace_all(marital_formal$provider_id, "[^a-zA-Z0-9]+", "-"))

datatable(marital_formal, options = list(pageLength = 100))
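The four blocks above differ only in the grouping variable, so the same logic could be wrapped in a single helper. The sketch below is one possible way to do that and is not the code used to produce the Careboard data: `summarise_formal()` and `toy_atus` are hypothetical names, the toy values are made up, and the helper returns an ungrouped table (via `.groups = "drop"`), whereas the blocks above leave the result grouped by the provider variable.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: repeats the two-step aggregation for any provider variable.
summarise_formal <- function(data, provider_var) {
  data %>%
    group_by({{ provider_var }}, formal_care_focus, CASEID) %>%
    summarise(
      workduration = sum(DURATION * PaidWork, na.rm = TRUE),
      weight = first(WT06) / 365,
      .groups = "drop"
    ) %>%
    group_by({{ provider_var }}, formal_care_focus) %>%
    summarise(
      population = sum(weight, na.rm = TRUE),
      provision_interval = sum(workduration * weight),
      .groups = "drop"
    ) %>%
    rename(care_focus = formal_care_focus, provider_id = {{ provider_var }}) %>%
    # Same slug rule as above: lower case, runs of non-alphanumerics become "-".
    mutate(provider_id = str_to_lower(
      str_replace_all(provider_id, "[^a-zA-Z0-9]+", "-")
    ))
}

# Made-up data to exercise the helper.
toy_atus <- tibble(
  CASEID            = c(1, 1, 2),
  marst             = c("Never Married", "Never Married", "Married, Spouse Present"),
  formal_care_focus = c("child", "child", "elder"),
  DURATION          = c(60, 30, 120),
  PaidWork          = c(1, 1, 0),
  WT06              = c(365, 365, 730)
)

toy_marital <- summarise_formal(toy_atus, marst)
print(toy_marital)
```

With a helper like this, each provider table would come from a call such as `summarise_formal(atus, poverty)`, which keeps the weighting and slug rules in one place.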

Finally, we bind these observations together and filter out all cases where the provider_id is niu or the care_focus is none. These observations are not needed for the Careboard and so are not included in the final data. We then write the data to a CSV file that can be exported and downloaded.

provider_formal_datum <- rbind(care_formal, race_formal, poverty_formal, marital_formal) %>%
  filter(provider_id != "niu") %>%
  filter(care_focus != "none")

write.csv(provider_formal_datum, "Provider Formal Datum/provider_formal_datum.csv", row.names = FALSE)

datatable(provider_formal_datum, options = list(pageLength = 100))
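The effect of the bind-and-filter step can be checked on a small example. The tables below are hypothetical stand-ins for the provider tables, with made-up values, and the sketch only illustrates which rows survive the two filters.

```r
library(dplyr)

# Hypothetical stand-ins for two of the provider tables.
toy_a <- tibble(
  provider_id        = c("carer", "niu"),
  care_focus         = c("child", "child"),
  population         = c(10, 5),
  provision_interval = c(100, 50)
)
toy_b <- tibble(
  provider_id        = c("married", "married"),
  care_focus         = c("none", "elder"),
  population         = c(8, 4),
  provision_interval = c(0, 40)
)

# Stack the tables, then drop the rows the Careboard does not use.
toy_datum <- rbind(toy_a, toy_b) %>%
  filter(provider_id != "niu") %>%
  filter(care_focus != "none")

print(toy_datum)
```

Only the carer/child and married/elder rows remain. Note that `filter()` also drops rows where the condition evaluates to `NA`, so any provider table with a missing `provider_id` or `care_focus` would lose those rows at this step as well.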

The code above produced information about the different care providers studied and analyzed on the Careboard. For each care provider group we calculated its population along with the time it spends in both formal and informal caregiving. This information is vital for understanding the demographics of caregivers throughout society and for answering the question of who is providing care.

The Circle of Care (Activity Timeuse)

Activity Formal/Informal

= Count/Wage/hours worked

Care Gini

State Care_Focus

Care Ratio

State Childcare/Eldercare/Total

Labor Force Participation

State Care_Status

The Sandwich Generation

Care_Status Formal/Informal Age_catagory Sex Race_Ethnicity Poverty Status *Employment Status

= Hours spent in care

Valuing the Care Economy

State Formal/Informal *Care_Focus

The share of the Formal/Informal Care Economy

Formal/Informal Care_Focus

APPENDIX: RAW ATUS

APPENDIX: RAW CPS

APPENDIX: RAW ASEC

APPENDIX: RAW NHIS

APPENDIX: LABELING CARE ACTIVITIES

Formal/Informal