Data

The American Time Use Survey (ATUS) is a time-use survey of Americans, sponsored by the Bureau of Labor Statistics (BLS) and conducted by the U.S. Census Bureau. Respondents are asked to keep a diary for one day, carefully recording how much time they spend on various activities, including work, leisure, childcare, and household activities. The survey has been conducted every year since 2003.

The data include the main demographic variables, such as respondents’ age, sex, race, marital status, and education, along with detailed income and employment information for each respondent. The survey changes slightly from year to year, but the main questions stay the same. You can find the data dictionaries for each year at https://www.bls.gov/tus/dictionaries.htm

Accessing the Data

There are multiple ways to access the ATUS data; for this project, you’ll get the raw data directly from the source. The data for each year can be found at https://www.bls.gov/tus/#data. That page offers a multi-year file covering every year the survey has been conducted, but for the purposes of this project, let’s just look at the data for 2016. Under Data Files, click on American Time Use Survey--2016 Microdata files.

You will be brought to a new screen. Scroll down to the section 2016 Basic ATUS Data Files. Under this section, you’ll want to click to download the following two files: ATUS 2016 Activity summary file (zip) and ATUS-CPS 2016 file (zip).

Once they’ve been downloaded, you’ll need to unzip the files. Once unzipped, you will see the dataset in a number of different file formats including .sas, .sps, and .dat files. We’ll be working with the .dat files.

Loading the Data into R

Use the approach described above to download the ATUS data for 2016. Unzip the CPS and Activity Summary downloads and, from each unzipped folder, upload the file ending in .dat to the data/raw_data folder on RStudio.cloud. To load the data, run the code in the atus-data code chunk, which creates an object called atus.all.

Importing data

library(tidyverse)  # provides %>%, left_join(), filter(), gather(), and ggplot()

atus.cps <- read.delim('data/raw_data/atuscps_2016.dat', sep=",")
atus.sum <- read.delim('data/raw_data/atussum_2016.dat', sep=",")
## join the two files by respondent ID, keeping only the CPS row with TULINENO == 1
atus.all <- atus.sum %>%
  left_join(atus.cps %>% filter(TULINENO==1), by = c("TUCASEID"))
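
The filter inside the join matters because the CPS file appears to hold one row per household member, so an unfiltered join could repeat each respondent. The sketch below reproduces the join on two tiny made-up data frames. toy.sum and toy.cps are hypothetical stand-ins with invented values, and the sketch assumes, as the filter above implies, that TULINENO == 1 marks the respondent's own row.

```r
library(dplyr)

# Toy stand-ins for the two files; all values are made up for illustration.
toy.sum <- data.frame(TUCASEID = c(101, 102),
                      t030101  = c(30, 0))
toy.cps <- data.frame(TUCASEID = c(101, 101, 102),
                      TULINENO = c(1, 2, 1),
                      HEFAMINC = c(8, 8, 13))

# Keep only the line-1 row of each household, then join by respondent ID.
toy.all <- toy.sum %>%
  left_join(toy.cps %>% filter(TULINENO == 1), by = "TUCASEID")

nrow(toy.all)  # same number of rows as toy.sum
```

Comparing nrow() before and after the join is a quick way to confirm that no respondent was duplicated.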

Exploratory Analysis of Child Care Data

### Add Code Here
str(atus.all)
## 'data.frame':    10493 obs. of  798 variables:
##  $ TUCASEID  : num  2.02e+13 2.02e+13 2.02e+13 2.02e+13 2.02e+13 ...
##  $ TUFINLWGT : num  24588650 5445941 8782622 3035910 6978586 ...
##  $ TRYHHCHILD: int  -1 -1 0 8 -1 4 5 0 -1 7 ...
##  $ TEAGE     : int  62 69 24 31 59 16 43 34 63 39 ...
##  $ TESEX     : int  2 1 2 2 2 2 2 2 1 2 ...
##  $ PEEDUCA.x : int  39 37 39 40 39 36 43 39 46 40 ...
##  $ PTDTRACE.x: int  1 2 2 1 1 3 1 1 1 1 ...
##  $ PEHSPNON.x: int  2 2 2 2 2 1 2 2 2 2 ...
##  $ GTMETSTA.x: int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TELFS     : int  5 5 5 1 1 5 5 5 5 1 ...
##  $ TEMJOT    : int  -1 -1 -1 2 1 -1 -1 -1 -1 2 ...
##  $ TRDPFTPT  : int  -1 -1 -1 2 2 -1 -1 -1 -1 1 ...
##  $ TESCHENR  : int  -1 -1 2 2 -1 1 2 2 -1 2 ...
##  $ TESCHLVL  : int  -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
##  $ TRSPPRES  : int  1 1 3 3 1 3 1 3 1 1 ...
##  $ TESPEMPNOT: int  2 2 -1 -1 2 -1 1 -1 1 1 ...
##  $ TRERNWA   : int  -1 -1 -1 46944 30250 -1 -1 -1 -1 -1 ...
##  $ TRCHILDNUM: int  0 0 2 3 0 4 3 3 0 2 ...
##  $ TRSPFTPT  : int  -1 -1 -1 -1 -1 -1 1 -1 1 1 ...
##  $ TEHRUSLT  : int  -1 -1 -1 32 12 -1 -1 -1 -1 46 ...
##  $ TUDIARYDAY: int  6 1 1 1 1 1 3 1 1 1 ...
##  $ TRHOLIDAY : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ TRTEC     : int  -1 30 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ TRTHH     : int  0 0 380 705 0 0 120 615 0 520 ...
##  $ t010101   : int  690 600 940 635 500 565 435 645 510 670 ...
##  $ t010102   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t010201   : int  25 20 120 20 80 55 10 20 0 0 ...
##  $ t010299   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t010301   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t010399   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t010401   : int  0 0 0 0 0 0 0 0 0 90 ...
##  $ t010499   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020101   : int  75 60 0 20 30 0 0 180 0 120 ...
##  $ t020102   : int  6 0 0 50 25 0 0 90 0 0 ...
##  $ t020103   : int  0 0 0 65 0 0 0 0 0 0 ...
##  $ t020104   : int  0 0 30 60 0 0 0 0 0 0 ...
##  $ t020199   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020201   : int  50 150 75 90 0 90 80 5 60 40 ...
##  $ t020202   : int  0 0 0 0 0 10 0 0 0 0 ...
##  $ t020203   : int  45 0 0 50 0 0 0 30 0 0 ...
##  $ t020299   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020301   : int  0 0 0 60 0 0 0 0 0 0 ...
##  $ t020302   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020303   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020399   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020401   : int  0 20 0 0 0 0 0 0 0 0 ...
##  $ t020402   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020499   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020501   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020502   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020601   : int  6 0 0 0 145 0 0 0 30 0 ...
##  $ t020602   : int  8 0 0 0 0 0 0 0 0 0 ...
##  $ t020699   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020701   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020799   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020801   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020899   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020901   : int  0 0 0 0 10 0 0 0 0 0 ...
##  $ t020902   : int  0 0 0 0 25 0 135 0 0 0 ...
##  $ t020903   : int  0 0 0 0 15 0 0 0 15 0 ...
##  $ t020904   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020905   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t020999   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t029999   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030101   : int  0 0 0 10 0 0 160 115 0 20 ...
##  $ t030102   : int  0 0 0 20 0 0 0 0 0 0 ...
##  $ t030103   : int  0 0 0 0 0 0 0 60 0 0 ...
##  $ t030104   : int  0 0 0 0 0 0 0 0 0 30 ...
##  $ t030105   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030106   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030108   : int  0 0 0 30 0 0 0 0 0 0 ...
##  $ t030109   : int  0 0 0 0 0 0 0 5 0 0 ...
##  $ t030110   : int  0 0 0 0 0 0 90 0 0 0 ...
##  $ t030111   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030112   : int  0 0 0 0 0 0 60 0 0 0 ...
##  $ t030199   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030201   : int  0 0 0 0 0 0 20 0 0 0 ...
##  $ t030202   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030203   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030204   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030299   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030301   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030302   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030303   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030399   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030401   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030402   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030403   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030404   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030405   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030499   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030501   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030502   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030503   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030504   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t030599   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t039999   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t040101   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ t040102   : int  0 0 0 0 0 0 0 0 0 0 ...
##   [list output truncated]
mean(atus.all$t120101)
## [1] 38.06481
## total minutes across all t0301xx columns
atus.all <- atus.all %>% 
    mutate(CHILDCARE = t030101 + t030102 + t030103 + t030104 + t030105 + t030106 + t030108 + t030109 + t030110 + t030111 + t030112 + t030199)
glimpse(atus.all$CHILDCARE)
##  int [1:10493] 0 0 0 60 0 0 310 180 0 50 ...
ggplot(atus.all, aes(x = CHILDCARE)) +
  geom_density() +
  theme_classic()
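
Typing out every t0301xx column is easy to get wrong. One possible alternative, sketched here on a made-up data frame (toy and its values are hypothetical), sums every column whose name starts with "t0301". In the 2016 columns listed by str() above, that prefix matches the same twelve columns that are added by hand.

```r
library(dplyr)

# Made-up minutes; t030201 is included to show that it is NOT picked up.
toy <- data.frame(t030101 = c(10, 0), t030102 = c(20, 0),
                  t030199 = c(0, 5),  t030201 = c(99, 99))

toy <- toy %>%
  mutate(CHILDCARE = rowSums(across(starts_with("t0301"))))

toy$CHILDCARE  # 30 for the first row, 5 for the second
```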

Observations:

atus.all %>% 
    group_by(TESEX) %>%  # respondent's sex (1 = male, 2 = female)
    summarise(avg_parent_childcare=mean(CHILDCARE))
## # A tibble: 2 x 2
##   TESEX avg_parent_childcare
##   <int>                <dbl>
## 1     1                 19.0
## 2     2                 33.2

Observations:

## replace -1 in the variable TRDPFTPT with NA.
atus.all$TRDPFTPT[atus.all$TRDPFTPT == -1] <- NA
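
The same recoding can also be written with dplyr's na_if(), which replaces one specific value with NA. This is a minimal sketch on a made-up column (toy is hypothetical), not the code the analysis was run with.

```r
library(dplyr)

toy <- data.frame(TRDPFTPT = c(-1L, 2L, 1L, -1L))

# na_if() turns every -1 into NA and leaves other values untouched.
toy <- toy %>%
  mutate(TRDPFTPT = na_if(TRDPFTPT, -1L))

sum(is.na(toy$TRDPFTPT))  # 2
```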

grep("TRHHCHILD", names(atus.all))
## integer(0)
## count the missing values in the column
sum(is.na(atus.all$TRDPFTPT))
## [1] 4119
class(atus.all$TRYHHCHILD)
## [1] "integer"
## add your exploratory analysis code here
adults_atLeast_one_child <- atus.all %>%
  select(CHILDCARE, TEAGE, TRYHHCHILD, HEFAMINC, TRCHILDNUM, PEMARITL, TRDPFTPT, TESEX) %>%
  filter(TRCHILDNUM > 0)

ggplot(adults_atLeast_one_child, aes(x = TEAGE, y = CHILDCARE)) +
      geom_point(aes(color = factor(TEAGE)), size = 1) +
      labs( x = "RESPONDENT'S AGE \n years", y = "CHILDCARE \n minutes per day", title = "Do younger people spend more time with \n their children than older people?") +
  theme_classic() +
  theme(legend.position = "none")  # after theme_classic(), which would otherwise restore the legend

Observations:

**Note that this data only includes respondents with at least one child in the household that they take care of. This filter stays in place for the next three graphs!**

Regression Analysis

## add your regression analysis code here
reg_model <- lm(CHILDCARE ~ TEAGE + HEFAMINC + PEMARITL + TRDPFTPT + TESEX, data = adults_atLeast_one_child)
summary(reg_model)
## 
## Call:
## lm(formula = CHILDCARE ~ TEAGE + HEFAMINC + PEMARITL + TRDPFTPT + 
##     TESEX, data = adults_atLeast_one_child)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -109.43  -54.78  -31.32   24.38  721.07 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 109.5200    12.3222   8.888  < 2e-16 ***
## TEAGE        -1.6832     0.1720  -9.785  < 2e-16 ***
## HEFAMINC      0.6336     0.4770   1.328    0.184    
## PEMARITL     -7.9348     0.9462  -8.386  < 2e-16 ***
## TRDPFTPT     -4.0802     4.1355  -0.987    0.324    
## TESEX        21.4070     3.3851   6.324 2.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 89.73 on 3116 degrees of freedom
##   (1191 observations deleted due to missingness)
## Multiple R-squared:  0.05027,    Adjusted R-squared:  0.04874 
## F-statistic: 32.98 on 5 and 3116 DF,  p-value: < 2.2e-16
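
One thing to keep in mind when reading these coefficients: variables such as TESEX and PEMARITL are stored as integer codes, so lm() treats them as continuous numbers unless told otherwise. The sketch below uses simulated data (toy and every value in it are made up, so the estimates mean nothing) to show how wrapping a coded variable in factor() gives each category its own coefficient. It illustrates the technique only and is not a re-run of the model above.

```r
set.seed(1)

# Simulated stand-in data; the numbers carry no real-world meaning.
toy <- data.frame(
  CHILDCARE = rpois(60, lambda = 40),
  TEAGE     = sample(20:60, 60, replace = TRUE),
  TESEX     = rep(c(1, 2), each = 30),
  PEMARITL  = rep(1:6, times = 10)
)

# factor() makes lm() estimate one coefficient per category
# (relative to the first level) instead of a single slope.
toy_model <- lm(CHILDCARE ~ TEAGE + factor(TESEX) + factor(PEMARITL), data = toy)
names(coef(toy_model))
```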

Exploratory Analysis of Age and Activities

atus.wide <- atus.all %>%
    mutate(act01 = rowSums(atus.all[,grep("t01", names(atus.all))]),
           act02 = rowSums(atus.all[,grep("t02", names(atus.all))]),
           act03 = rowSums(atus.all[,grep("t03", names(atus.all))]),
           act04 = rowSums(atus.all[,grep("t04", names(atus.all))]),
           act05 = rowSums(atus.all[,grep("t05", names(atus.all))]),
           act06 = rowSums(atus.all[,grep("t06", names(atus.all))]),
           act07 = rowSums(atus.all[,grep("t07", names(atus.all))]),
           act08 = rowSums(atus.all[,grep("t08", names(atus.all))]),
           act09 = rowSums(atus.all[,grep("t09", names(atus.all))]),
           act10 = rowSums(atus.all[,grep("t10", names(atus.all))]),
           act11 = rowSums(atus.all[,grep("t11", names(atus.all))]),
           act12 = rowSums(atus.all[,grep("t12", names(atus.all))]),
           act13 = rowSums(atus.all[,grep("t13", names(atus.all))]),
           act14 = rowSums(atus.all[,grep("t14", names(atus.all))]),
           act15 = rowSums(atus.all[,grep("t15", names(atus.all))]),
           act16 = rowSums(atus.all[,grep("t16", names(atus.all))]),
           # act17 = , there is no category 17 in the data
           act18 = rowSums(atus.all[,grep("t18", names(atus.all))])) %>% 
    select(TUCASEID, TEAGE, HEFAMINC, starts_with("act"))
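
The seventeen nearly identical rowSums() lines can also be generated in a loop. The sketch below is one possible way to do that on a made-up data frame (toy and its values are hypothetical). It anchors the pattern with "^" so that only names that begin with the category prefix are matched.

```r
# Made-up minutes for three activity categories.
toy <- data.frame(t010101 = c(600, 500), t010201 = c(20, 30),
                  t020101 = c(60, 0),    t180101 = c(10, 15))

codes <- c("01", "02", "18")  # the full data would use 01 to 16 and 18
for (code in codes) {
  # Columns whose names start with "t" followed by the category code.
  cols <- grep(paste0("^t", code), names(toy))
  toy[[paste0("act", code)]] <- rowSums(toy[, cols, drop = FALSE])
}

toy[, c("act01", "act02", "act18")]
```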

    head(atus.wide)
##      TUCASEID TEAGE HEFAMINC act01 act02 act03 act04 act05 act06 act07
## 1 2.01601e+13    62        3   715   190     0     0     0     0     0
## 2 2.01601e+13    69        6   620   230     0     0     0     0     0
## 3 2.01601e+13    24        4  1060   105     0     0     0     0    60
## 4 2.01601e+13    31        8   655   395    60     0     0     0     0
## 5 2.01601e+13    59       13   580   250     0     0     0     0    18
## 6 2.01601e+13    16        5   620   100     0     0     0     0     0
##   act08 act09 act10 act11 act12 act13 act14 act15 act16 act18
## 1     0     0     0    40   465     0     0     0     0    30
## 2     0     0     0    30   560     0     0     0     0     0
## 3     0     0     0    75    20     0     0     0     0    60
## 4     0     0     0   165   120     0     0     0    45     0
## 5     0     0     0    30   177     0    60   130   120    75
## 6     0     0     0   120   355    50     0     0     0    35
atus.long <- atus.wide %>% 
  # convert from wide to long: one row per respondent and activity category
  gather(ACTIVITY, MINS, act01:act18)
head(atus.long)
##      TUCASEID TEAGE HEFAMINC ACTIVITY MINS
## 1 2.01601e+13    62        3    act01  715
## 2 2.01601e+13    69        6    act01  620
## 3 2.01601e+13    24        4    act01 1060
## 4 2.01601e+13    31        8    act01  655
## 5 2.01601e+13    59       13    act01  580
## 6 2.01601e+13    16        5    act01  620
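
In current versions of tidyr, gather() is superseded by pivot_longer(). The sketch below shows the equivalent reshape on a made-up two-row data frame (toy.wide and its values are hypothetical). Note that pivot_longer() keeps each respondent's rows together, whereas gather() stacks the output one activity at a time, so row order differs even though the contents match.

```r
library(tidyr)

# Made-up wide data: one row per respondent, one column per activity.
toy.wide <- data.frame(TUCASEID = c(1, 2), TEAGE = c(62, 24),
                       act01 = c(715, 1060), act02 = c(190, 105))

# One row per respondent and activity, as in atus.long.
toy.long <- pivot_longer(toy.wide, cols = starts_with("act"),
                         names_to = "ACTIVITY", values_to = "MINS")
toy.long
```
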
atus.long %>% 
    group_by(ACTIVITY, TEAGE) %>% 
    summarise(AVGMINS = mean(MINS)) %>% 
    ggplot(aes(TEAGE, AVGMINS)) +
  geom_bar(stat = "identity", aes(color=factor(TEAGE))) +
  facet_grid(rows = vars(ACTIVITY)) +
  coord_flip() +
  labs(title = "Average amount of time spent \n per person's age") +
  theme(text = element_text(size = 10),
  axis.text.x = element_text(angle = 90, hjust = 1))

Exploratory Analysis of Income and Activities

atus.long %>% 
  group_by(ACTIVITY, HEFAMINC) %>% 
  ## average minutes per activity and income group, then each group's share within the activity
  summarise(AVGMINS_WRK = mean(MINS)) %>%
  mutate(SumMins = sum(AVGMINS_WRK)) %>%
  mutate(AvgSumMins = AVGMINS_WRK/SumMins)%>%
  #plot the graph
  ggplot(aes(x = ACTIVITY,y = AvgSumMins)) +
    geom_bar(stat = "identity", aes(fill = factor(HEFAMINC))) +
        scale_fill_hue(h = c(180, 450)) +
        coord_flip()+
         labs(title = "Amount of time spent on activities \n by income") +
  theme_classic()
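
The share calculation relies on a dplyr detail: after summarise(), the result is still grouped by ACTIVITY, so sum() inside mutate() adds up the income groups within each activity rather than across the whole table. The sketch below makes that explicit on made-up data (toy.long and its values are hypothetical). It assumes dplyr 1.0 or later, where the .groups argument is available.

```r
library(dplyr)

# Made-up minutes for one activity and two income groups.
toy.long <- data.frame(
  ACTIVITY = c("act01", "act01", "act01", "act01"),
  HEFAMINC = c(1, 1, 2, 2),
  MINS     = c(600, 400, 700, 800)
)

shares <- toy.long %>%
  group_by(ACTIVITY, HEFAMINC) %>%
  # drop_last removes HEFAMINC from the grouping and keeps ACTIVITY
  summarise(AVGMINS = mean(MINS), .groups = "drop_last") %>%
  mutate(SHARE = AVGMINS / sum(AVGMINS))

shares$SHARE  # 0.4 and 0.6
```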

Observations:

## save the plot above
ggsave(filename = "activity_by_income.png", plot = last_plot(), path = "/cloud/project/atus_survey_analysis/figures/explanatory_figures" )
## Saving 7 x 5 in image