Adolescent health is shaped by interlocking patterns of sleep, physical activity, and cardiovascular regulation, but whether these passively recorded behaviors and physiological signals reflect distinct latent dimensions or collapse onto fewer underlying factors is an open question. We apply EFA to wearable-derived metrics from the Adolescent Brain Cognitive Development (ABCD) Study (Release 5.1, Year 2 follow-up Fitbit data) to ask whether sleep, activity, and resting heart rate organize into separable health-behavior dimensions in early adolescence.
We load the required libraries and read in the data before working through the five steps of EFA.
# install.packages(c("psych", "GPArotation", "corrplot", "tidyverse", "data.table"))
library(psych) # fa(), fa.parallel(), fa.plot(), describe(), KMO(), cortest.bartlett()
library(GPArotation) # oblimin / promax rotations
library(corrplot) # correlation heatmap
library(data.table) # fast read of ~50K-row weekly files
library(tidyverse) # data wrangling + ggplot2
# ABCD Release 5.1 weekly Fitbit files (novel-technologies)
base <- "/Users/eu/Library/CloudStorage/Box-Box/mooddata_nophi/ABCD/abcd-data-release-5.1/core/novel-technologies"
# NDA exports use literal "NULL" / "" for missing — must be coerced to NA
act <- fread(file.path(base, "nt_y_fitb_act_w.csv"),
na.strings = c("", "NA", "NULL", "null"))
slp <- fread(file.path(base, "nt_y_fitb_slp_w.csv"),
na.strings = c("", "NA", "NULL", "null"))
# Force numeric on the analytic columns (otherwise fread leaves them as character
# whenever it sees a "NULL" literal in the column).
num_act <- c("fit_ss_fitbit_rest_hr",
"fit_ss_wk_avg_sedentary_min",
"fit_ss_wk_avg_farily_at_min", # NDA truncates "_active" to "_at"
"fit_ss_wk_avg_very_active_min",
"fit_ss_wk_avg_sleep_min",
"fit_ss_meet_abcd_rule")
num_slp <- c("fit_ss_sleep_avg_deep_minutes",
"fit_ss_sleep_avg_wake_minutes")
act[, (num_act) := lapply(.SD, as.numeric), .SDcols = num_act]
slp[, (num_slp) := lapply(.SD, as.numeric), .SDcols = num_slp]
# Year-2 follow-up only; activity QC flag = 1; physiological RHR window 40–120 bpm.
# Aggregate within person across valid weeks (mean and SD); require ≥ 3 weeks.
act_q <- act[eventname == "2_year_follow_up_y_arm_1" & fit_ss_meet_abcd_rule == 1]
act_q[, rhr_clean := ifelse(fit_ss_fitbit_rest_hr >= 40 &
fit_ss_fitbit_rest_hr <= 120,
fit_ss_fitbit_rest_hr, NA_real_)]
act_q[, mvpa_min := fit_ss_wk_avg_farily_at_min + fit_ss_wk_avg_very_active_min]
person_act <- act_q[, .(
n_weeks_act = .N,
rhr_mean = mean(rhr_clean, na.rm = TRUE),
rhr_sd = sd(rhr_clean, na.rm = TRUE),
mvpa_mean = mean(mvpa_min, na.rm = TRUE),
mvpa_sd = sd(mvpa_min, na.rm = TRUE),
sed_mean = mean(fit_ss_wk_avg_sedentary_min, na.rm = TRUE),
sleep_mean = mean(fit_ss_wk_avg_sleep_min, na.rm = TRUE),
sleep_sd = sd(fit_ss_wk_avg_sleep_min, na.rm = TRUE)
), by = src_subject_id]
slp_q <- slp[eventname == "2_year_follow_up_y_arm_1"]
person_slp <- slp_q[, .(
n_weeks_slp = .N,
deep_mean = mean(fit_ss_sleep_avg_deep_minutes, na.rm = TRUE),
wake_mean = mean(fit_ss_sleep_avg_wake_minutes, na.rm = TRUE)
), by = src_subject_id]
person <- merge(person_act, person_slp, by = "src_subject_id")
person <- person[n_weeks_act >= 3 & n_weeks_slp >= 3]
fa_vars <- c("rhr_mean", "rhr_sd",
"mvpa_mean", "mvpa_sd", "sed_mean",
"sleep_mean", "sleep_sd", "deep_mean", "wake_mean")
dat <- person[complete.cases(person[, ..fa_vars]), ..fa_vars] |> as.data.frame()
dim(dat)
## [1] 4506 9
We construct nine person-level summaries from the Year-2 weekly Fitbit data, organized around three a priori health-behavior domains: cardiac (resting heart rate mean and within-person SD), physical activity (MVPA mean and SD, sedentary minutes), and sleep (total sleep mean and SD, deep sleep minutes, wake-during-sleep minutes). MVPA is the sum of fairly-active and very-active minutes per day. We deliberately excluded the average-METs index because it correlates |r| ≈ .78 with both MVPA and sedentary time in this sample — its inclusion would create near-collinearity among activity indicators and risk distorting the rotated solution. The resulting nine variables span behaviorally and physiologically distinct constructs while preserving enough redundancy within each hypothesized domain to support factor recovery.
psych::describe(dat)
## vars n mean sd median trimmed mad min max range
## rhr_mean 1 4506 72.30 7.86 72.21 72.20 7.98 48.27 109.43 61.17
## rhr_sd 2 4506 1.78 1.11 1.52 1.64 0.86 0.00 12.17 12.17
## mvpa_mean 3 4506 34.69 29.66 26.87 30.32 24.12 0.00 245.01 245.01
## mvpa_sd 4 4506 19.35 15.43 15.64 17.09 11.90 0.00 160.98 160.98
## sed_mean 5 4506 556.22 90.59 548.73 551.47 81.04 246.37 1020.80 774.43
## sleep_mean 6 4506 486.88 62.45 495.85 492.17 52.16 73.33 715.42 642.08
## sleep_sd 7 4506 52.36 39.21 41.59 45.93 25.36 0.00 334.78 334.78
## deep_mean 8 4506 89.39 14.97 89.53 89.59 14.78 24.50 136.14 111.64
## wake_mean 9 4506 57.25 11.11 56.94 57.07 10.79 17.00 102.22 85.22
## skew kurtosis se
## rhr_mean 0.16 -0.02 0.12
## rhr_sd 1.91 7.49 0.02
## mvpa_mean 1.54 3.06 0.44
## mvpa_sd 2.03 7.45 0.23
## sed_mean 0.71 1.53 1.35
## sleep_mean -1.21 3.48 0.93
## sleep_sd 2.25 7.26 0.58
## deep_mean -0.17 0.24 0.22
## wake_mean 0.18 0.38 0.17
The descriptive statistics show that all nine variables have substantial variation across adolescents. Resting heart rate averages ~72 bpm with reasonable spread; daily MVPA averages ~35 minutes with high between-person variability; nightly sleep averages ~487 minutes (~8.1 hours), in line with population norms for early adolescents. The variables span very different scales (minutes, bpm), so the correlation matrix — not the covariance matrix — drives the factor extraction.
The correlation heatmap gives a first visual impression of how the nine indicators relate to one another. We expect block structure consistent with the three a priori domains: tightly intercorrelated activity indicators, a sleep cluster, and a comparatively isolated cardiac block.
R <- cor(dat, use = "pairwise.complete.obs")
corrplot(R, order = "hclust", tl.cex = 0.8, tl.col = "black")
The hierarchical clustering recovers three visible blocks. Sleep variables (sleep mean, deep, wake; sleep SD with reversed sign) cluster together; activity variables (MVPA mean/SD, sedentary with reversed sign) form a tight second block; and resting heart rate sits between, with a moderate fitness link to activity (r ≈ −.34 with MVPA mean) but otherwise weak associations. This pattern previews a three-factor structure.
Before extracting factors, we confirm factorability with the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity. KMO values above .60 are considered adequate; a significant Bartlett test rejects the null that the correlation matrix is an identity, indicating that the variables share enough common variance to warrant FA.
KMO(dat)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = dat)
## Overall MSA = 0.58
## MSA for each item =
## rhr_mean rhr_sd mvpa_mean mvpa_sd sed_mean sleep_mean sleep_sd
## 0.68 0.56 0.56 0.57 0.69 0.56 0.50
## deep_mean wake_mean
## 0.70 0.58
cortest.bartlett(R, n = nrow(dat))
## $chisq
## [1] 8483.07
##
## $p.value
## [1] 0
##
## $df
## [1] 36
We extract factors using principal axis factoring
(fm = "pa"), pulling five factors to obtain a full
eigenvalue sequence for use in Step 3. With nine variables, five factors
is near the practical upper bound for stable PAF extraction.
efa_all <- fa(r = dat, nfactors = 5, rotate = "none", fm = "pa")
# Eigenvalues from the extraction
round(efa_all$values, 3)
## [1] 1.924 1.495 0.507 0.340 0.153 0.025 0.010 -0.007 -0.030
The first two or three eigenvalues are substantially larger than the rest, and the values then drop sharply. This pattern is consistent with a small number of common factors organizing the nine indicators, in line with the three-domain hypothesis.
We use the Kaiser rule (eigenvalues > 1), the scree test, and parallel analysis as the primary criteria for deciding how many factors to retain, guided by the substantive expectation of three behavior domains.
# Kaiser rule: number of eigenvalues > 1
sum(efa_all$values > 1)
## [1] 2
# Scree plot
scree(dat, factors = TRUE, pc = FALSE,
main = "Scree Plot — ABCD Fitbit Indicators")
# Parallel analysis (compares observed eigenvalues to those from random data)
fa.parallel(dat, fa = "fa", fm = "pa", n.iter = 50,
main = "Parallel Analysis — ABCD Fitbit Indicators")
## Parallel analysis suggests that the number of factors = 3 and the number of components = NA
The scree elbow and parallel analysis converge on a small number of factors. To choose among the candidates, we fit and compare 1-, 2-, and 3-factor solutions on the standard EFA fit indices.
efa_1f <- fa(dat, nfactors = 1, rotate = "oblimin", fm = "pa")
efa_2f <- fa(dat, nfactors = 2, rotate = "oblimin", fm = "pa")
efa_3f <- fa(dat, nfactors = 3, rotate = "oblimin", fm = "pa")
data.frame(
nfactors = 1:3,
TLI = c(efa_1f$TLI, efa_2f$TLI, efa_3f$TLI),
RMSEA = c(efa_1f$RMSEA[1], efa_2f$RMSEA[1], efa_3f$RMSEA[1]),
BIC = c(efa_1f$BIC, efa_2f$BIC, efa_3f$BIC)
)
## nfactors TLI RMSEA BIC
## 1 1 0.4990232 0.16150364 2973.2121
## 2 2 0.8460116 0.08953324 545.4531
## 3 3 0.8684958 0.08273278 281.1520
The Kaiser rule (eigenvalues > 1) returns two factors, but the fit indices favor three: TLI rises from .85 to .87, RMSEA falls from .090 to .083, and BIC drops by more than 250 points moving from two to three factors. The three-factor solution also aligns with the a priori cardiac / activity / sleep structure motivated by the substantive question, which is the kind of theory-driven consideration EFA explicitly invites. We proceed with three factors, noting that a more parsimonious two-factor reading is also defensible if interpretability rather than fit is the priority.
We first inspect the unrotated three-factor solution. The factor space plot below shows each indicator as a labeled point positioned according to its loadings on the first two factors. In the unrotated solution, the first factor absorbs a general “active and well-rested” signal that mixes activity and sleep contributions, making it difficult to read off domain-specific loadings. Rotation is needed to spread the indicators toward simple structure.
efa_unrotated <- fa(dat, nfactors = 3, rotate = "none", fm = "pa")
print(efa_unrotated, cut = 0.30, sort = TRUE, digits = 2)
## Factor Analysis using method = pa
## Call: fa(r = dat, nfactors = 3, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item PA1 PA2 PA3 h2 u2 com
## mvpa_mean 3 1.00 1.052 -0.052 1.1
## mvpa_sd 4 0.71 0.541 0.459 1.1
## sed_mean 5 -0.33 -0.31 0.204 0.796 2.0
## sleep_mean 6 0.91 0.908 0.092 1.2
## wake_mean 9 0.49 0.254 0.746 1.1
## deep_mean 8 0.36 0.162 0.838 1.5
## sleep_sd 7 -0.36 0.136 0.864 1.2
## rhr_mean 1 -0.47 0.63 0.623 0.377 1.8
## rhr_sd 2 0.047 0.953 1.6
##
## PA1 PA2 PA3
## SS loadings 1.94 1.44 0.54
## Proportion Var 0.22 0.16 0.06
## Cumulative Var 0.22 0.38 0.44
## Proportion Explained 0.49 0.37 0.14
## Cumulative Proportion 0.49 0.86 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 1.88 with Chi Square = 8483.07
## df of the model are 12 and the objective function was 0.08
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic n.obs is 4506 with the empirical chi square 307.42 with prob < 1.3e-58
## The total n.obs was 4506 with Likelihood Chi Square = 382.11 with prob < 2.3e-74
##
## Tucker Lewis Index of factoring reliability = 0.868
## RMSEA index = 0.083 and the 90 % confidence intervals are 0.076 0.09
## BIC = 281.15
## Fit based upon off diagonal values = 0.98
# Factor space plot — unrotated solution (F1 vs F2)
fa.plot(efa_unrotated,
labels = rownames(efa_unrotated$loadings),
main = "Factor space — unrotated (F1 vs F2)",
cex = 0.8)
Because health-behavior domains are unlikely to be perfectly
independent — for instance, more active adolescents tend to have lower
resting heart rates — we use an oblique rotation (oblimin)
that allows the factors to correlate. For comparison, we also examine
the orthogonal (varimax) solution, which forces the factors to be
uncorrelated. The factor space plots below show how each rotation
repositions the axes relative to the indicator cloud — the oblique axes
are free to angle toward the natural clusters in the data, while the
orthogonal axes are constrained to remain perpendicular.
efa_final <- fa(dat, nfactors = 3, rotate = "oblimin", fm = "pa")
print(efa_final, cut = 0.30, sort = TRUE, digits = 2)
## Factor Analysis using method = pa
## Call: fa(r = dat, nfactors = 3, rotate = "oblimin", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item PA1 PA2 PA3 h2 u2 com
## mvpa_mean 3 1.02 1.052 -0.052 1.0
## mvpa_sd 4 0.74 0.541 0.459 1.0
## sed_mean 5 -0.37 0.204 0.796 1.9
## sleep_mean 6 0.95 0.908 0.092 1.0
## wake_mean 9 0.48 0.254 0.746 1.3
## deep_mean 8 0.37 0.162 0.838 1.3
## sleep_sd 7 -0.37 0.136 0.864 1.1
## rhr_mean 1 0.79 0.623 0.377 1.0
## rhr_sd 2 0.047 0.953 1.1
##
## PA1 PA2 PA3
## SS loadings 1.74 1.46 0.73
## Proportion Var 0.19 0.16 0.08
## Cumulative Var 0.19 0.36 0.44
## Proportion Explained 0.44 0.37 0.19
## Cumulative Proportion 0.44 0.81 1.00
##
## With factor correlations of
## PA1 PA2 PA3
## PA1 1.00 -0.12 -0.41
## PA2 -0.12 1.00 0.10
## PA3 -0.41 0.10 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 1.88 with Chi Square = 8483.07
## df of the model are 12 and the objective function was 0.08
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic n.obs is 4506 with the empirical chi square 307.42 with prob < 1.3e-58
## The total n.obs was 4506 with Likelihood Chi Square = 382.11 with prob < 2.3e-74
##
## Tucker Lewis Index of factoring reliability = 0.868
## RMSEA index = 0.083 and the 90 % confidence intervals are 0.076 0.09
## BIC = 281.15
## Fit based upon off diagonal values = 0.98
# Factor correlation matrix (Phi) — non-trivial correlations support oblique rotation
round(efa_final$Phi, 2)
## PA1 PA2 PA3
## PA1 1.00 -0.12 -0.41
## PA2 -0.12 1.00 0.10
## PA3 -0.41 0.10 1.00
efa_varimax <- fa(dat, nfactors = 3, rotate = "varimax", fm = "pa")
print(efa_varimax, cut = 0.30, sort = TRUE, digits = 2)
## Factor Analysis using method = pa
## Call: fa(r = dat, nfactors = 3, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item PA1 PA2 PA3 h2 u2 com
## mvpa_mean 3 1.01 1.052 -0.052 1.1
## mvpa_sd 4 0.72 0.541 0.459 1.1
## sed_mean 5 -0.37 0.204 0.796 1.9
## sleep_mean 6 0.95 0.908 0.092 1.0
## wake_mean 9 0.47 0.254 0.746 1.4
## deep_mean 8 0.38 0.162 0.838 1.3
## sleep_sd 7 -0.36 0.136 0.864 1.1
## rhr_mean 1 0.76 0.623 0.377 1.2
## rhr_sd 2 0.047 0.953 1.1
##
## PA1 PA2 PA3
## SS loadings 1.74 1.46 0.72
## Proportion Var 0.19 0.16 0.08
## Cumulative Var 0.19 0.36 0.44
## Proportion Explained 0.44 0.37 0.18
## Cumulative Proportion 0.44 0.82 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 36 with the objective function = 1.88 with Chi Square = 8483.07
## df of the model are 12 and the objective function was 0.08
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.05
##
## The harmonic n.obs is 4506 with the empirical chi square 307.42 with prob < 1.3e-58
## The total n.obs was 4506 with Likelihood Chi Square = 382.11 with prob < 2.3e-74
##
## Tucker Lewis Index of factoring reliability = 0.868
## RMSEA index = 0.083 and the 90 % confidence intervals are 0.076 0.09
## BIC = 281.15
## Fit based upon off diagonal values = 0.98
# Factor space plot — oblimin (oblique) rotation (F1 vs F2)
fa.plot(efa_final,
labels = rownames(efa_final$loadings),
main = "Factor space — oblimin (oblique)",
cex = 0.8)
# Factor space plot — varimax (orthogonal) rotation (F1 vs F2)
fa.plot(efa_varimax,
labels = rownames(efa_varimax$loadings),
main = "Factor space — varimax (orthogonal)",
cex = 0.8)
To determine which rotation better suits these data, we look for simple structure in the factor space plots — indicators should cluster tightly near the factor axes rather than floating at intermediate angles. In the oblimin plot, the axes are free to angle toward the natural clusters, so activity indicators (MVPA mean/SD, sedentary) align with one axis while sleep indicators (sleep mean/SD, deep, wake) align with another. In the varimax plot, the axes are constrained to remain perpendicular, which can pull the factors away from the natural clusters when those clusters are not orthogonal in factor space. The factor correlation matrix above provides a complementary diagnostic: non-trivial correlations among factors confirm that the health-behavior dimensions covary in everyday life, supporting the oblique rotation. The oblimin solution is therefore preferred. Its rotated loadings reveal three interpretable groupings: an activity factor (positive loadings on MVPA mean and SD, negative loading on sedentary minutes), a sleep factor (positive loadings on sleep mean, deep, and wake minutes; negative on sleep SD, indicating that adolescents who sleep more also sleep more consistently), and a cardiac factor anchored by mean resting heart rate. The cardiorespiratory-fitness link between cardiac and activity is captured not as a cross-loading but as a substantial inter-factor correlation in the Phi matrix (Phi ≈ −0.41), which is exactly what an oblique rotation lets us see: lower resting heart rate co-varies with higher activity at the level of the latent dimensions.
Two diagnostics deserve a note. First, mvpa_mean has a
rotated loading slightly above 1 — an “ultra-Heywood” warning from
fa(). This reflects the within-domain redundancy flagged in
Step 1: MVPA mean and SD correlate r ≈ .76, so most of MVPA mean’s
variance is shared with its SD on the same factor, pushing the rotated
loading past the boundary. The substantive interpretation is unchanged
but the loading magnitude on mvpa_mean should not be over-interpreted.
Second, RHR within-person SD does not load above 0.30 on any factor —
its communality is very low, meaning it carries little common variance
with the other indicators. RHR variability appears to be a largely
independent signal in this Fitbit indicator set.
A tile plot of the final oblimin loading pattern summarizes the simple structure across all nine indicators.
loadings_df <- as.data.frame(unclass(efa_final$loadings)) %>%
rownames_to_column("item") %>%
pivot_longer(-item, names_to = "factor", values_to = "loading")
ggplot(loadings_df, aes(x = factor, y = reorder(item, loading), fill = loading)) +
geom_tile(color = "white") +
geom_text(aes(label = sprintf("%.2f", loading)), size = 3) +
scale_fill_gradient2(low = "#2166AC", mid = "white", high = "#B2182B",
midpoint = 0, limits = c(-1, 1)) +
labs(x = NULL, y = NULL, fill = "Loading",
title = "EFA rotated loadings — ABCD Fitbit (oblimin, 3 factors)") +
theme_minimal(base_size = 11)
Factor scores are estimated for each adolescent, weighting the nine Fitbit indicators by their factor loadings. These scores capture each participant’s standing on the three health-behavior dimensions and can be used as predictors or outcomes in downstream analyses — for example, testing whether the cardiac, activity, and sleep dimensions differentially predict internalizing symptoms or depression risk.
efa_final <- fa(dat, nfactors = 3, rotate = "oblimin", fm = "pa",
scores = "tenBerge")
scores_df <- as.data.frame(efa_final$scores)
head(scores_df, 4)
## PA1 PA2 PA3
## 1 -0.20280149 -0.4702322 0.5894258
## 2 0.00717749 0.7509720 2.4572991
## 3 -0.21331724 -0.3629898 -1.0271197
## 4 -0.64764369 -3.3497752 0.9010783
summary(scores_df)
## PA1 PA2 PA3
## Min. :-1.3098 Min. :-6.3452 Min. :-3.52970
## 1st Qu.:-0.7362 1st Qu.:-0.5131 1st Qu.:-0.68204
## Median :-0.2757 Median : 0.1245 Median :-0.01195
## Mean : 0.0000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.4452 3rd Qu.: 0.6612 3rd Qu.: 0.67142
## Max. : 7.3387 Max. : 3.6930 Max. : 4.70427
The EFA results suggest that the nine Fitbit indicators are organized around three interpretable dimensions that align with the a priori health-behavior domains: a physical-activity dimension (MVPA mean and SD, sedentary minutes), a sleep dimension capturing both quantity and architecture (sleep mean, sleep SD, deep, wake), and a cardiac dimension anchored by mean resting heart rate. The moderate negative correlation between activity and cardiac factors (Phi ≈ −0.41) captures the cardiorespiratory-fitness link at the latent-variable level, and the small positive correlation between sleep and cardiac factors is consistent with the recovery role of sleep. RHR within-person SD remained largely outside the common-factor space, suggesting that day-to-day cardiac variability indexes a different process than the average level. These factor scores provide a compact, theory-grounded summary of passively recorded health behavior in the ABCD cohort that can support subsequent analyses of mental-health outcomes — for example, testing whether the activity, sleep, and cardiac dimensions differentially predict adolescent internalizing symptoms.
Reference
Garavan, H., Bartsch, H., Conway, K., Decastro, A., Goldstein, R. Z., Heeringa, S., … & Zahs, D. (2018). Recruiting the ABCD sample: Design considerations and procedures. Developmental Cognitive Neuroscience, 32, 16–22.