Determinants of Sleep Duration in the American Time Use Survey
Author
Dennis Baidoo
Published
June 21, 2025
Abstract
This study investigates the factors influencing sleep duration using data from the American Time Use Survey (ATUS) between 2003 and 2021. Employing linear regression with stepwise selection based on Akaike Information Criterion (AIC), we analyzed demographic, socioeconomic, and behavioral predictors of sleep duration. The initial model incorporated main effects, with subsequent refinement through stepwise selection. Key findings reveal significant associations between sleep duration and age, sex, education level, work hours, household composition, and an interaction between time spent alone and time spent with family members. Diagnostic procedures included outlier removal and variable transformations to ensure model validity. The results demonstrate how analytical choices—including sample selection, starting model specification, and selection criteria—affect model outcomes, highlighting the importance of methodological transparency in observational studies of human behavior.
1. Introduction
Sleep constitutes a fundamental biological process with profound implications for physical health, cognitive function, and overall quality of life. Understanding the determinants of sleep duration remains an important area of research in public health and social science. The American Time Use Survey (ATUS), conducted by the U.S. Bureau of Labor Statistics, provides comprehensive data on how Americans allocate their time across various activities, including sleep. This rich dataset enables researchers to examine how demographic characteristics, socioeconomic factors, and daily behaviors relate to sleep patterns.
The current analysis examines sleep duration using a subset of ATUS data from 2003 to 2021, focusing specifically on employed individuals who reported sleep durations within a biologically plausible range. Beyond identifying significant predictors of sleep duration, this study emphasizes how methodological decisions throughout the analytical process—including sample selection criteria, choice of starting model, and model selection approach—can influence research findings. This focus on analytical transparency contributes to ongoing discussions about reproducibility and best practices in statistical modeling of behavioral data.
2. Methods
2.1 Data Source and Sample Selection
The analysis utilized data from the ATUS 2003–2021 Multi-Year Microdata Files, accessed through the erikdata R package. The initial dataset included information on time use, demographic characteristics, and employment metrics for a nationally representative sample of U.S. residents. The analytical sample started from 500 randomly selected observations of employed individuals with positive hourly earnings and positive sleep time. It was then restricted to respondents who reported between 5 and 12 hours of sleep per day, a biologically plausible range for adults, so the final sample can be smaller than 500. Additional exclusions targeted influential observations (case IDs: 87, 250, 371, 436) identified through diagnostic procedures.
2.2 Variables and Transformations
The response variable was daily sleep duration in hours (t0101), derived from original minute values divided by 60. Predictor variables included demographic characteristics (sex, age, metropolitan residence status), socioeconomic factors (educational attainment, hourly earnings), and behavioral measures (weekly work hours, household composition, time allocation).
Notable data transformations included converting the hourly earnings variable (TRERNHLY) into a binary indicator to address right-skewness in the distribution. TRERNHLY is stored with two implied decimals, so the cutoff of 5000 used in the code corresponds to $50.00 per hour (≤ $50.00 vs. > $50.00). Educational attainment (PEEDUCA) was converted from categorical to numerical values representing approximate years of schooling. All continuous predictors were examined for outliers and extreme values that might violate model assumptions.
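The earnings recode can be illustrated with a few made-up values. This is a minimal sketch, not the analysis code: the vector `trernhly` and its values are hypothetical, and the only facts taken from the report are the two implied decimals in TRERNHLY and the cutoff of 5000.

```r
# Hypothetical TRERNHLY values, stored with 2 implied decimals (i.e., in cents)
trernhly <- c(1250, 2500, 5000, 7525)

# Convert to dollars per hour to make the scale explicit
hourly_dollars <- trernhly / 100

# Binary indicator as in the report: TRUE when earnings are at most $50.00/hour
earn_le_50 <- trernhly <= 5000

# Side-by-side view of the raw value, the dollar value, and the indicator
recode_table <- data.frame(trernhly, hourly_dollars, earn_le_50)
print(recode_table)
```

Dichotomizing a skewed predictor is simple to interpret, but it discards the variation within each group; a log transformation is a common alternative when that variation matters.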
2.3 Analytical Approach
The modeling strategy followed a structured process informed by personalized analysis conditions derived from researcher characteristics. The analysis began with a main effects model (specified by birth date conditions), which included all candidate predictors without interaction terms. Model selection proceeded via stepwise regression using AIC (Akaike Information Criterion) as the selection metric, as determined by birth month conditions.
The stepwise procedure evaluated both forward and backward steps, allowing terms to enter or leave the model based on their contribution to model fit as measured by AIC. This approach balanced model complexity with explanatory power, following the principle of parsimony. Final model selection considered both statistical criteria and substantive interpretability of the predictors.
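The stepwise search described above can be sketched with base R's `step()` function. This is an illustration on simulated data, not the report's code: the data frame `sim`, its columns, and the coefficients are hypothetical, and the upper scope of all two-way interactions is an assumption (the report does not state the scope, though the final model contains one interaction).

```r
# Simulated data standing in for the ATUS sample (all names are hypothetical)
set.seed(1)
n <- 200
sim <- data.frame(
  age    = runif(n, 18, 70),
  hours  = rnorm(n, 40, 10),
  alone  = runif(n, 0, 600),
  family = runif(n, 0, 600),
  noise  = rnorm(n)          # predictor unrelated to the response
)
sim$sleep <- 9 - 0.02 * sim$age - 0.015 * sim$hours + rnorm(n, sd = 1)

# Starting model: main effects only
fit_init <- lm(sleep ~ age + hours + alone + family + noise, data = sim)

# Stepwise search in both directions; k = 2 gives AIC (k = log(n) would give BIC)
fit_step <- step(
  fit_init,
  scope = list(
    lower = ~ 1,
    upper = ~ (age + hours + alone + family + noise)^2  # assumed scope
  ),
  direction = "both",
  k         = 2,
  trace     = 0
)

print(formula(fit_step))
print(c(initial = AIC(fit_init), selected = AIC(fit_step)))
```

Because `step()` only accepts moves that lower the criterion, the selected model never has a higher AIC than the starting model. A different starting model or criterion can lead to a different final model.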
Diagnostic procedures included examination of residual plots, tests for heteroscedasticity, variance inflation factors for multicollinearity assessment, and leverage statistics to identify influential observations. The Box-Cox transformation procedure confirmed that no power transformation of the response variable was necessary (λ ≈ 1), supporting the use of untransformed sleep hours in the linear model.
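The Box-Cox check can be sketched as follows, assuming the `MASS` package (distributed with R) is available. The data are simulated and every name (`sim`, `x`, `sleep`) is hypothetical. The approximate 95% interval uses the usual profile-likelihood cutoff of `qchisq(0.95, 1) / 2` below the maximum.

```r
# Simulated positive response with roughly normal errors (hypothetical data)
set.seed(2)
n <- 300
sim <- data.frame(x = runif(n, 0, 10))
sim$sleep <- 8 + 0.1 * sim$x + rnorm(n, sd = 1)

# y = TRUE stores the response in the fit so boxcox() does not need to refit
fit <- lm(sleep ~ x, data = sim, y = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- MASS::boxcox(fit, lambda = seq(-2, 3, by = 0.05), plotit = FALSE)

# Lambda that maximizes the profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
lambda_ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])

print(lambda_hat)
print(lambda_ci)
```

If the interval contains 1, the data give no strong reason to transform the response, which is the situation the report describes.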
3. Results
3.1 Final Model Specification
The selected model explained approximately 9.5% of the variance in sleep duration (R² = 0.095, adjusted R² = 0.078). This low explanatory power indicates that, although the identified predictors have statistically significant associations with sleep duration, most of the variability remains unexplained. This is common in behavioral research, where numerous unmeasured factors may influence outcomes.
remotes::install_github("erikerhardt/erikdata")
# example: EE and 14th becomes 050514, where each E = 05th letter of the alphabet
condition_1_seed <- 040217

# example: December is 12th month, giving "BIC", that's the 2nd index
condition_3_criterion <- c("AIC", "BIC")[1]

n_analysis <- 500

library(erikmisc)
library(tidyverse)
ggplot2::theme_set(ggplot2::theme_bw()) # set theme_bw for all plots
library(erikdata) # ATUS data, install with devtools::install_github("erikerhardt/erikdata")
library(labelled) # for variable labels, use: var_label(dat_atus$TUCASEID)

set.seed(condition_1_seed) # must run prior to dplyr::slice_sample() to draw the same sample

dat_atus <-
  erikdata::dat_atus |>
  dplyr::select(
    TUCASEID
  , t0101
  , TESEX
  , TEAGE
  , GTMETSTA
  , PEEDUCA
  , TRERNHLY
  , TEHRUSL1
  , TEHRUSL2
  , TRHHCHILD
  , TRTALONE
  , TRTHHFAMILY
  )

# list of variables with their labels
labels_dat_atus |>
  dplyr::filter(
    Var %in% names(dat_atus)
  )
# A tibble: 12 × 2
Var Label
<chr> <chr>
1 TUCASEID ATUS Case ID (14-digit identifier)
2 TESEX Edited: sex
3 TEAGE Edited: age
4 GTMETSTA Metropolitan status (2000 or 2010 definitions, see note)
5 PEEDUCA Edited: what is the highest level of school you have completed o…
6 TRERNHLY Hourly earnings (2 implied decimals)
7 TRHHCHILD Presence of household children < 18
8 TRTALONE Total time respondent spent alone (in minutes)
9 TRTHHFAMILY Total time respondent spent with household family members (in mi…
10 TEHRUSL1 Edited: how many hours per week do you usually work at your main…
11 TEHRUSL2 Edited: how many hours per week do you usually work at your othe…
12 t0101 Sleeping
dat_atus <-
  dat_atus |>
  dplyr::filter(
    TRERNHLY > 0 # only people who work and earn an hourly wage
  , t0101 > 0    # only people who went to sleep
  ) |>
  dplyr::mutate(
    t0101 = t0101 / 60 # convert minutes to hours
  , PEEDUCA_num =
      case_when(
        PEEDUCA == "Less than 1st grade"                                ~ 0   # 1
      , PEEDUCA == "1st, 2nd, 3rd, or 4th grade"                        ~ 2.5 # 2
      , PEEDUCA == "5th or 6th grade"                                   ~ 5.5 # 3
      , PEEDUCA == "7th or 8th grade"                                   ~ 7.5 # 4
      , PEEDUCA == "9th grade"                                          ~ 9   # 5
      , PEEDUCA == "10th grade"                                         ~ 10  # 6
      , PEEDUCA == "11th grade"                                         ~ 11  # 7
      , PEEDUCA == "12th grade - no diploma"                            ~ 12  # 8
      , PEEDUCA == "High school graduate - diploma or equivalent (GED)" ~ 12  # 9
      , PEEDUCA == "Some college but no degree"                         ~ 13  # 10
      , PEEDUCA == "Associate degree - occupational/vocational"         ~ 14  # 11
      , PEEDUCA == "Associate degree - academic program"                ~ 14  # 12
      , PEEDUCA == "Bachelor's degree (BA, AB, BS, etc.)"               ~ 16  # 13
      , PEEDUCA == "Master's degree (MA, MS, MEng, MEd, MSW, etc.)"     ~ 18  # 14
      , PEEDUCA == "Professional school degree (MD, DDS, DVM, etc.)"    ~ 21  # 15
      , PEEDUCA == "Doctoral degree (PhD, EdD, etc.)"                   ~ 21  # 16
      , TRUE ~ NA |> as.numeric()
      )
  # set the "Not identified" Metropolitan areas to NA
  , GTMETSTA =
      GTMETSTA |>
      factor(
        levels =
          # keep the levels that are not "Not identified"
          stringr::str_subset(
            string  = levels(dat_atus$GTMETSTA)
          , pattern = "Not identified"
          , negate  = TRUE
          )
      )
  # hours worked at all jobs
  , TEHRUSL_all = TEHRUSL1 + TEHRUSL2
  ) |>
  dplyr::select(
    -PEEDUCA
  , -TEHRUSL1
  , -TEHRUSL2
  ) |>
  # drop rows with any missing values
  tidyr::drop_na() |>
  # select your sample of rows for analysis
  dplyr::slice_sample(
    n = n_analysis
  )

# label new variables
# (PEEDUCA was dropped above, so its label is looked up in labels_dat_atus)
labelled::var_label(dat_atus[[ "PEEDUCA_num" ]]) <-
  labels_dat_atus |> filter(Var == "PEEDUCA") |> pull(Label)

# relabel variables that were modified in a way that removes the label attribute
labelled::var_label(dat_atus[[ "GTMETSTA" ]]) <-
  labels_dat_atus |> filter(Var == "GTMETSTA") |> pull(Label)
labelled::var_label(dat_atus[[ "TEHRUSL_all" ]]) <-
  labels_dat_atus |> filter(Var == "TEHRUSL1") |> pull(Label)

# wrap all labels for plots
for (i_var in seq_len(ncol(dat_atus))) {
  labelled::var_label(dat_atus[, i_var]) <-
    labelled::var_label(dat_atus[, i_var]) |>
    str_wrap(width = 30)
}
Warning in stri_split_lines(str): argument is not an atomic vector; coercing
## filter and mutate data here to satisfy model assumptions
dat_atus <-
  dat_atus |>
  dplyr::filter(
    t0101 >= 5
  , t0101 <= 12
  , !(TUCASEID %in% c(87, 250, 371, 436)) # Can use this to exclude observations by ID number
  ) |>
  dplyr::mutate(
    # TRERNHLY has 2 implied decimals, so 5000 means $50.00 per hour
    TRERNHLY = TRERNHLY <= 5000
  )

str(dat_atus)
library(GGally) # provides ggpairs() and wrap()

p <-
  ggpairs(
    dat_atus |> dplyr::select(-TUCASEID)
  , title   = "ATUS Sleeping"
  , mapping = ggplot2::aes(colour = TESEX, alpha = 0.5)
  , diag    =
      list(
        continuous =
          wrap(
            c("densityDiag", "barDiag", "blankDiag")[1]
          , alpha = 1/2
          )
      , discrete = c("barDiag", "blankDiag")[1]
      )
  # scatterplots on top so response as first variable has y on vertical axis
  , upper   =
      list(
        continuous =
          wrap(
            c("points", "smooth", "smooth_loess", "density", "cor", "blank")[2]
          , se    = FALSE
          , alpha = 1/2
          , size  = 1
          )
      , discrete = c("ratio", "facetbar", "blank")[2]
      , combo    =
          wrap(
            c("box", "box_no_facet", "dot", "dot_no_facet", "facethist", "facetdensity", "denstrip", "blank")[2]
          #, bins = 10 # for facethist
          )
      )
  , lower   =
      list(
        continuous =
          wrap(
            c("points", "smooth", "smooth_loess", "density", "cor", "blank")[5]
          #, se = FALSE
          #, alpha = 1/2
          #, size = 1
          )
      , discrete = c("ratio", "facetbar", "blank")[2]
      , combo    =
          wrap(
            c("box", "box_no_facet", "dot", "dot_no_facet", "facethist", "facetdensity", "denstrip", "blank")[5]
          , bins = 10 # for facethist
          )
      )
  , progress = FALSE
  , legend   = 1 # create legend
  )
p <- p + theme_bw()
p <- p + theme(legend.position = "bottom")
print(p)
# condition_2_init_model is not defined in the code shown here;
# the text reports a main-effects starting model, i.e., the value "Main effects".

# Mean model
if (condition_2_init_model == "Mean") {
  lm_fit_init <-
    lm(
      t0101 ~ 1
    , data = dat_atus
    )
}

# Main-effects model
if (condition_2_init_model == "Main effects") {
  lm_fit_init <-
    lm(
      t0101 ~ TESEX + TEAGE + GTMETSTA + PEEDUCA_num + TRERNHLY + TEHRUSL_all + TRHHCHILD + TRTALONE + TRTHHFAMILY
    , data = dat_atus
    )
}

# Two-way interaction model
if (condition_2_init_model == "Two-way interaction") {
  lm_fit_init <-
    lm(
      t0101 ~ (TESEX + TEAGE + GTMETSTA + PEEDUCA_num + TRERNHLY + TEHRUSL_all + TRHHCHILD + TRTALONE + TRTHHFAMILY)^2
    , data = dat_atus
    )
  # If the two-way interaction model has NA coefficients,
  # then there were probably pairs of categories that had no observations so could not be estimated.
  # In this case, set the argument "singular.ok = TRUE" in the car::Anova() function below.
}

lm_fit_init
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
e_plot_model_contrasts: Skipping "TRTALONE" since involved in interactions.
e_plot_model_contrasts: Skipping "TRTHHFAMILY" since involved in interactions.
# Since plot interactions have sublists of plots, we want to pull those out
# into a one-level plot list.
# The code here works for sw_TWI_plots_keep = "singles"
# which will make each plot the same size in the plot_grid() below.
# For a publication, you'll want to manually choose which plots to show.

# index for plot list,
# needed since interactions add 2 plots to the list, so the number of terms
# is not necessarily the same as the number of plots.
i_list <- 0

# initialize a list of plots
p_list <- list()

for (i_term in 1:length(p_cont$plots)) {
  ## i_term = 1

  if (length(p_cont$plots) == 0) {
    print("Skip printing contrasts if intercept-only model")
    next
  }

  # extract the name of the plot
  n_list <- names(p_cont$plots)[i_term]

  # test whether the name has a colon ":"; if so, it's an interaction
  if (stringr::str_detect(string = n_list, pattern = stringr::fixed(":"))) {
    # a two-way interaction has two plots

    # first plot
    i_list <- i_list + 1
    p_list[[ i_list ]] <- p_cont$plots[[ i_term ]][[ 1 ]]

    # second plot
    i_list <- i_list + 1
    p_list[[ i_list ]] <- p_cont$plots[[ i_term ]][[ 2 ]]
  } else {
    # not an interaction, only one plot
    i_list <- i_list + 1
    p_list[[ i_list ]] <- p_cont$plots[[ i_term ]]
  } # if

  # Every 4 plots, print them
  if (i_list >= 4) {
    p_arranged <-
      cowplot::plot_grid(
        plotlist = p_list
      , nrow     = NULL
      , ncol     = 2
      , labels   = "AUTO"
      )
    p_arranged |> print()
    # reset the counter and the list so a later batch does not reprint earlier plots
    i_list <- 0
    p_list <- list()
    next
  }

  # if last term, print the plots
  if (i_term == length(p_cont$plots)) {
    p_arranged <-
      cowplot::plot_grid(
        plotlist = p_list
      , nrow     = NULL
      , ncol     = 2
      , labels   = "AUTO"
      )
    p_arranged |> print()
  }
} # for
The final model included several main effects and one two-way interaction. Age showed a significant negative association with sleep duration (β = -0.0187, p = 0.0019), indicating that older individuals tended to report slightly shorter sleep durations after accounting for other factors. Sex differences emerged as statistically significant, with females sleeping approximately 0.32 hours more than males on average (p = 0.043). The presence of household children under 18 related to shorter sleep durations, with adults in childless households sleeping about 0.54 hours longer (p = 0.0045).
Educational attainment, measured in approximate years of schooling, showed a modest negative association with sleep duration (β = -0.068, p = 0.016). Weekly work hours similarly demonstrated a small but significant negative relationship (β = -0.0149, p = 0.0085), suggesting that individuals working longer hours tended to sleep slightly less.
The model identified one significant interaction between time spent alone (TRTALONE) and time spent with household family members (TRTHHFAMILY). This interaction (β = -4.01 × 10⁻⁶, p = 0.032) suggests that the relationship between solitude and sleep duration depends on the broader context of family time, though the small effect size warrants cautious interpretation.
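To give the interaction coefficient a sense of scale, the short calculation below evaluates only the interaction term at a few hypothetical time allocations. It is a sketch: the minute values are made up, and because the main-effect coefficients for TRTALONE and TRTHHFAMILY are not reported in the text, the numbers show the interaction's contribution alone, not the total predicted change in sleep.

```r
# Reported interaction coefficient: hours of sleep per (minute alone x minute with family)
b_int <- -4.01e-6

# Change in the TRTALONE slope for each additional 60 minutes of family time
slope_shift_per_family_hour <- b_int * 60

# Hypothetical time allocations, in minutes
alone_min  <- c(60, 300)
family_min <- c(60, 300)

# Contribution of the interaction term alone, in hours of sleep
contrib_hours <- outer(alone_min, family_min) * b_int
dimnames(contrib_hours) <- list(
  alone  = paste0(alone_min, " min"),
  family = paste0(family_min, " min")
)

# Same contribution expressed in minutes of sleep
contrib_minutes <- contrib_hours * 60
print(round(contrib_minutes, 1))
```

At 60 minutes of each activity the interaction term accounts for under one minute of sleep, while at 300 minutes of each it accounts for roughly 22 minutes, which is why the coefficient looks negligible per unit but is not negligible across the observed range of time use.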
3.2 Model Diagnostics and Assumption Checking
Comprehensive diagnostic procedures supported the validity of the final model. Residual plots showed no systematic patterns that would indicate violations of linearity assumptions. The distribution of residuals approximated normality, with no severe skewness or kurtosis evident in the quantile-quantile plot. Tests for heteroscedasticity (Breusch-Pagan test, p = 0.28) indicated no significant concerns about non-constant variance.
Examination of variance inflation factors (VIFs) revealed no problematic multicollinearity among predictors, with all VIF values below 2. Leverage statistics and Cook’s distance measures identified no remaining influential observations after the exclusion of specified cases. The Box-Cox procedure confirmed that no transformation of the response variable would substantially improve model fit beyond the linear specification.
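The checks in this section can be reproduced in base R without extra packages. The sketch below uses simulated data with hypothetical names (`sim`, `age`, `hours`, `educ`). It computes VIFs from their definition, the studentized (Koenker) form of the Breusch-Pagan statistic as n times the R² of an auxiliary regression, and Cook's distances. The 4/n cutoff is a common rule of thumb and an assumption here, since the report does not state which threshold was used.

```r
# Simulated data standing in for the analysis sample (all names are hypothetical)
set.seed(3)
n <- 250
sim <- data.frame(
  age   = runif(n, 18, 70),
  hours = rnorm(n, 40, 10),
  educ  = sample(9:21, n, replace = TRUE)
)
sim$sleep <- 9.5 - 0.02 * sim$age - 0.015 * sim$hours - 0.05 * sim$educ + rnorm(n)

predictors <- c("age", "hours", "educ")
fit <- lm(sleep ~ age + hours + educ, data = sim)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the others
vif <- sapply(predictors, function(v) {
  aux_formula <- reformulate(setdiff(predictors, v), response = v)
  1 / (1 - summary(lm(aux_formula, data = sim))$r.squared)
})

# Studentized Breusch-Pagan test: regress squared residuals on the predictors;
# the statistic n * R^2 is compared with a chi-square on (number of predictors) df
sim$res2 <- resid(fit)^2
aux_bp   <- lm(res2 ~ age + hours + educ, data = sim)
bp_stat  <- n * summary(aux_bp)$r.squared
bp_p     <- pchisq(bp_stat, df = length(predictors), lower.tail = FALSE)

# Cook's distance, flagging observations above the 4/n rule of thumb (assumed cutoff)
cooks   <- cooks.distance(fit)
flagged <- which(cooks > 4 / n)

print(round(vif, 2))
print(c(statistic = bp_stat, p_value = bp_p))
print(length(flagged))
```

Flagged observations are candidates for inspection rather than automatic removal; refitting with and without them shows how much the conclusions depend on those cases.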
4. Discussion
4.1 Interpretation of Key Findings
The negative association between age and sleep duration aligns with extensive literature documenting changes in sleep patterns across the lifespan. The observed sex difference, with women sleeping slightly longer than men, is consistent with epidemiological studies that frequently report similar gender disparities in sleep duration. The relationship between household composition and sleep duration may reflect the time demands of parenting, though the cross-sectional nature of the data limits causal interpretation.
The interaction between time spent alone and time with family members presents an intriguing finding that merits further investigation. While the effect size is small, this result suggests that the impact of solitude on sleep may depend on an individual’s broader social context—a hypothesis that could be explored in future research with more detailed measures of social interaction quality and timing.
4.2 Methodological Considerations
This analysis highlights several important methodological considerations for time use research. The low R² value (0.095) underscores that sleep duration is influenced by numerous factors beyond those measured in the ATUS, including health status, stress levels, and environmental conditions not captured in the survey. The sensitivity of results to analytical decisions, particularly the choice of starting model and selection criterion, emphasizes the value of transparent reporting practices and consideration of alternative modeling approaches.
The transformation of the hourly earnings variable illustrates how recoding continuous predictors can sometimes improve model interpretability without sacrificing predictive power. Similarly, the exclusion of influential observations demonstrates how targeted case removal can address violations of model assumptions while preserving the majority of the dataset.
4.3 Limitations and Future Directions
Several limitations should be considered when interpreting these results. The cross-sectional design precludes causal inference, and self-reported time use data may be subject to recall biases. The ATUS does not include detailed health information that might mediate observed relationships between demographic factors and sleep duration. Future research could incorporate longitudinal designs, objective sleep measures, and more comprehensive covariate data to address these limitations.
5. Conclusion
This analysis of ATUS data identified several significant predictors of sleep duration among U.S. adults, including demographic characteristics, socioeconomic factors, and time use patterns. The results both confirm established relationships from the sleep literature and suggest new avenues for investigation, particularly regarding the interplay between different types of social time. Beyond substantive findings, the study demonstrates how analytical choices at each stage of the modeling process can influence results, reinforcing the importance of methodological transparency in social science research. Future work could build on these findings by incorporating additional predictors, exploring nonlinear relationships, and applying alternative modeling techniques to better understand the complex determinants of sleep behavior.
References
Erhardt, E. B., Bedrick, E. J., & Schrader, R. M. (2020). Lecture notes for Advanced Data Analysis 2 (ADA2) (Stat 428/528). University of New Mexico.