Introduction to CoDA in Time-Use Studies

Alex Crisp

What are compositional data?


Compositional data represent parts of a whole, where only the ratios between parts matter, not their absolute values.

  • List of positive numbers (e.g., percentages, proportions, counts).

  • Always add up to a fixed total (e.g., 100%, 1, or 24 hours in a day).

Imagine a cocktail composed of gin, tonic water, and lime juice. The total volume is fixed: if you add more gin, you must reduce tonic or lime to keep the glass at the same volume. What matters is the proportion between ingredients, not their absolute amounts.

Key Characteristics


1. Parts of a Whole

  • Example: A cocktail made of gin, tonic water, and lime juice.

  • The total is always fixed = one drink (e.g., 250 mL).

Key Characteristics


2. Ratios, not absolute values:

  • Suppose the drink is composed of 40% gin, 50% tonic water, and 10% lime juice.

  • Relative: The proportion of each ingredient determines the drink’s flavor and strength.

  • This is why we focus on proportions or percentages in CoDA, rather than raw amounts.

  • Not absolute: Milliliters (e.g., 100 mL of gin) depend on the size of the glass.

Key Characteristics


3. Fixed total:

  • Original: 40% gin, 50% tonic water, 10% lime juice.

  • If you remove lime juice (10%), the new proportions adjust:

  • Gin becomes ~44.4%

  • Tonic water becomes ~55.6% (total still = 100%).
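The renormalization in this example is the closure operation. A minimal base-R sketch follows; the helper name `closure` is illustrative and not taken from any package.

```r
# Closure: rescale the parts so they sum to a fixed total (illustrative helper)
closure <- function(x, total = 100) x / sum(x) * total

drink <- c(gin = 40, tonic = 50, lime = 10)

# Drop lime juice and re-close the remaining parts to 100%
no_lime <- closure(drink[c("gin", "tonic")])
round(no_lime, 1)   # gin 44.4, tonic 55.6

# The gin/tonic ratio is unchanged by removing lime juice
drink[["gin"]] / drink[["tonic"]]       # 0.8
no_lime[["gin"]] / no_lime[["tonic"]]   # 0.8
```

The ratio between the remaining parts is the same before and after closure, which is why CoDA works with ratios rather than raw percentages.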

Examples of Compositional Structures

Diet Composition

Body Composition

24-h Activities

Why is 24-hour time-use data compositional?

When daily time is split into mutually exclusive categories (e.g., sleep, sedentary behavior [SED], light physical activity [LPA], and moderate-to-vigorous physical activity [MVPA]), the resulting data form a composition.

  • Total time is fixed (24 hours = 1,440 minutes)

  • Increasing time in one domain requires decreasing time in another.

  • The parts are not independent.

Why is 24-hour time-use data compositional?


Example: If a person sleeps 8 h (33.3 % of the day), is sedentary for 12 h 30 min (52.1 % of the day) and performs light activity for 3 h (12.5 % of the day), only 30 min (2.1 % of the day) remain for MVPA.

You can’t increase MVPA without reducing something else (e.g., sleep or sedentary time).
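The arithmetic of this example can be checked in a few lines of base R. The vector names are illustrative.

```r
# Minutes per day for the example person
day <- c(sleep = 8 * 60, sed = 12 * 60 + 30, lpa = 3 * 60, mvpa = 30)

sum(day)                         # 1440
round(day / sum(day) * 100, 1)   # 33.3 52.1 12.5 2.1

# Adding 30 min of MVPA is only possible by taking 30 min from another part
day2 <- day
day2["mvpa"] <- day2["mvpa"] + 30
day2["sed"]  <- day2["sed"] - 30
sum(day2)                        # still 1440
```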

Statistical implications


  • Standard regression models assume that predictors can vary independently of one another (no perfect collinearity).

  • However, in compositional data, this assumption is violated due to the closure constraint (all parts sum to a constant), which induces dependencies among components.

  • Changing one necessarily affects the others.

Why not just include time variables in a standard linear regression?


  1. Multicollinearity
    • The closure constraint introduces strong negative correlations among components.
    • This inflates standard errors and reduces model interpretability.

Example: If you know someone’s sleep, SED, and LPA time, you can calculate MVPA exactly (MVPA = 24h - sleep - sed - lpa).

MVPA isn’t a “free variable” → the model can’t estimate a separate coefficient for it.
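A small simulation makes the problem visible. This is a sketch on made-up data (the ranges and the outcome are arbitrary assumptions), not the case-study dataset: when all four parts enter `lm()` together with an intercept, one coefficient cannot be estimated.

```r
set.seed(1)
n <- 50

# Toy time-use data, closed so that every row sums to exactly 1440 minutes
raw <- cbind(
  sleep = runif(n, 420, 560),
  sed   = runif(n, 500, 700),
  lpa   = runif(n, 150, 350),
  mvpa  = runif(n, 5, 60)
)
parts <- as.data.frame(raw / rowSums(raw) * 1440)

# Arbitrary outcome, for illustration only
parts$y <- rnorm(n, mean = 120, sd = 20)

# All four parts plus the intercept are linearly dependent
fit <- lm(y ~ sleep + sed + lpa + mvpa, data = parts)
coef(fit)   # the coefficient for mvpa is NA (aliased)
```

Dropping one part makes the model estimable, but the remaining coefficients then implicitly describe trade-offs against the omitted part.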

Problems with traditional approaches


  2. Misleading interpretation
    • Coefficients reflect changes assuming other variables are held constant, which is impossible in a fixed-sum context.

Suppose the model says: “More MVPA improves health” and “Less Sedentary time improves health”.

The model can’t tell if the benefit comes from adding MVPA or removing sedentary time.

In fixed-sum data, “increasing” one behavior automatically “decreases” another.

Problems with traditional approaches


  3. Risk of spurious inference
    • Ignoring the compositional structure can produce biased or invalid results.

Hypothetical example: “More social media use correlates with higher anxiety.”

  • If someone spends more time on social media, they might reduce sleep or exercise time (which protect mental health).

  • The “harm” of social media could be due to the displacement of healthier activities, not social media itself.

What Should You Do Instead?


- Apply CoDA transformations (e.g. isometric log-ratio, ilr) to remove the constant-sum constraint and generate variables suitable for regression.

  • Log-ratio transformations emphasize relative comparisons (e.g. sleep vs. sedentary) rather than absolute amounts, respecting the geometry of the compositional space and ensuring valid statistical inference.

Naïve model: Health ~ Sleep + Sed + PA

Compositional model: Health ~ log(Sleep/Sed) + log(PA/Sed)

How to Compute ILR Coordinates


Example: Fat = 0.70 | Lean = 0.25 | Bone = 0.05

  1. Compute geometric mean of the “other two” \[ \sqrt{\mathrm{Lean}\times \mathrm{Bone}} = \sqrt{0.25 \times 0.05} = 0.112 \]

  2. Ratio: \[ R_1 = \frac{\mathrm{Fat}}{\sqrt{\mathrm{Lean}\times \mathrm{Bone}}} = \frac{0.70}{0.112} \approx 6.26 \]

  3. Take the log and scale by the normalizing constant \(\sqrt{2/3}\): \[ \mathrm{ilr}_1 = \sqrt{\tfrac{2}{3}}\,\ln\bigl(R_1\bigr) \approx 0.8165 \times 1.834 \approx 1.50 \]

  4. Simple ratio for \(\mathrm{ilr}_2\)
    \[ R_2 = \frac{\mathrm{Lean}}{\mathrm{Bone}} = \frac{0.25}{0.05} = 5 \]

  5. Take the log and scale by the normalizing constant \(\sqrt{1/2}\)
    \[ \mathrm{ilr}_2 = \sqrt{\tfrac{1}{2}}\,\ln\bigl(R_2\bigr) \approx 0.7071 \times 1.609 \approx 1.14 \]

  6. Result
    \[ (\mathrm{ilr}_1,\,\mathrm{ilr}_2)\approx (1.50,\,1.14) \]
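The same example can be reproduced in base R. Note that an isometric log-ratio coordinate is a log-ratio multiplied by a normalizing constant (\(\sqrt{2/3}\) for one part against two, \(\sqrt{1/2}\) for one part against one), so this sketch prints both the raw log-ratios and the scaled coordinates. The variable names are illustrative. The default basis of `compositions::ilr()` appears to order its contrasts differently, so its output would not be expected to match these numbers.

```r
x <- c(Fat = 0.70, Lean = 0.25, Bone = 0.05)

# Raw log-ratios: Fat vs. the geometric mean of (Lean, Bone), and Lean vs. Bone
raw1 <- log(x[["Fat"]] / sqrt(x[["Lean"]] * x[["Bone"]]))
raw2 <- log(x[["Lean"]] / x[["Bone"]])

# ILR coordinates: the same log-ratios, scaled by their normalizing constants
ilr1 <- sqrt(2 / 3) * raw1
ilr2 <- sqrt(1 / 2) * raw2

round(c(raw1 = raw1, raw2 = raw2), 2)   # 1.83 1.61
round(c(ilr1 = ilr1, ilr2 = ilr2), 2)   # 1.50 1.14
```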

Isometric Log-ratio transform

Simulated Study Case:


This case is based on a simulated dataset representing pregnant women in the second trimester of gestation.


Our aim is to explore how daily time-use (Sleep, SED, LPA and MVPA) composition is associated with the 1-hour 50-gram oral glucose screening test.

Simulated Study Case:

Sample:
- n = 411 simulated observations

Variables:
- Sleep (min/day)
- Sedentary behavior (min/day)
- Light physical activity – LPA (min/day)
- Moderate-to-vigorous physical activity – MVPA (min/day)
- Glucose screening test (Medical record)
- Age, Pre-pregnancy BMI, Study Site

Data Simulation


The dataset was simulated using normal distributions based on realistic parameters (mean and standard deviation) drawn from a real cohort study involving pregnant women in the second trimester. Negative values were set to zero to maintain plausible durations.
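A minimal sketch of this kind of simulation is shown here. The means and standard deviations are placeholders copied from the summary table in this deck, not the (unreported) cohort parameters, and the seed and object name `sim` are assumptions.

```r
set.seed(123)
n <- 411

# Draw each behavior independently from a normal distribution
# (placeholder parameters; the original simulation settings are not given)
sim <- data.frame(
  sleep = rnorm(n, mean = 516.5, sd = 52.9),
  sed   = rnorm(n, mean = 614.3, sd = 97.6),
  lpa   = rnorm(n, mean = 267.1, sd = 93.7),
  mvpa  = rnorm(n, mean = 27.3,  sd = 16.3)
)

# Truncate negative durations at zero
sim[] <- lapply(sim, function(v) pmax(v, 0))

# Row totals are not exactly 1440, which is why normalization is needed later
summary(rowSums(sim))
```

Simulating each part independently is also what produces exact zeros in MVPA, which have to be handled before any log-ratio is taken.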



Download the dataset (Google Drive)

Preview of the simulated dataset

Code
knitr::kable(head(data, n = 3), digits = 1)
sleep    sed    lpa  mvpa  age   bmi  race   site  glucose
485.1  639.4  257.0     0   36  29.6  Black  A       154.1
503.3  676.9  336.7    22   27  27.7  Black  A       131.7
601.5  601.3  240.2     0   21  30.4  Black  A       161.8

Data Simulation

Code
data |> 
  pivot_longer(cols = c(sleep, sed, lpa, mvpa), names_to = "behavior", values_to = "minutes") |> 
  ggplot(aes(x = behavior, y = minutes)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "24-h Time-use behaviors", y = "Minutes per day", x = "") +
  theme_minimal()

Code
data |>
  summarise(
    Sleep_Mean = mean(sleep), Sleep_SD = sd(sleep),
    Sed_Mean   = mean(sed),   Sed_SD   = sd(sed),
    LPA_Mean   = mean(lpa),   LPA_SD   = sd(lpa),
    MVPA_Mean  = mean(mvpa),  MVPA_SD  = sd(mvpa)
  ) |>
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") |>
  separate(Metric, into = c("Behavior", "Statistic"), sep = "_") |>
  pivot_wider(names_from = Statistic, values_from = Value) |>
  knitr::kable(digits = 1, caption = "Mean and SD by behavior")
Mean and SD by behavior
Behavior   Mean     SD
Sleep     516.5   52.9
Sed       614.3   97.6
LPA       267.1   93.7
MVPA       27.3   16.3

Glucose Response

Code
ggplot(data, aes(x = glucose)) +
  geom_histogram(fill = "skyblue", color = "white", bins = 30) +
  geom_vline(aes(xintercept = mean(glucose)), color = "red", linetype = "solid", linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4
  geom_vline(aes(xintercept = quantile(glucose, 0.25)), color = "darkgreen", linetype = "dashed") +
  geom_vline(aes(xintercept = quantile(glucose, 0.75)), color = "darkgreen", linetype = "dashed") +
  labs(
       x = "Glucose (mg/dL)", y = "Count") +
  theme_minimal()

Code
data |>
  summarise(
    Glucose_Mean = mean(glucose), Glucose_SD = sd(glucose),
    BMI_Mean = mean(bmi), BMI_SD = sd(bmi),
    Age_Mean = mean(age), Age_SD = sd(age)
  ) |>
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") |>
  separate(Metric, into = c("Variable", "Statistic"), sep = "_") |>
  pivot_wider(names_from = Statistic, values_from = Value) |>
  knitr::kable(digits = 1, caption = "")
Variable   Mean     SD
Glucose   119.6   28.7
BMI        27.6    3.7
Age        30.2    5.1

R Packages


In this case study, we use a set of R packages designed for compositional data analysis and model interpretation.

Code
library(compositions)   # For compositional data analysis
library(zCompositions)  # For handling zero-replacement in compositional data
library(car)            # For Type II ANOVA and model diagnostics
library(performance)    # For model performance checks
library(tidyverse)      # For data manipulation and visualization
library(ggtern)         # For ternary plots (exploratory)
library(parameters)     # Extract model parameters

📚 Key references on compositions package:

CRAN vignette

Package overview

Compositions manual

First Steps with Compositional Data

Normalization

  • Ensure that, for each participant, the sum of all time-use components equals exactly 24 hours (1,440 minutes), preserving the compositional structure of the data.

Code
# Select the compositional variables
CODA.data <- data |> dplyr::select(sleep, sed, lpa, mvpa)

row_sums <- rowSums(CODA.data)
summary(row_sums)  # Ensure row sums are close to 1440
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1005    1334    1428    1425    1528    1916 
Code
# Normalize the data to ensure each row sums to 1440
CODA.data <- CODA.data / row_sums * 1440

summary(rowSums(CODA.data))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1440    1440    1440    1440    1440    1440 

Checking for zero values

  • Log transformations are undefined for zero

  • Check for zeros before applying log-ratio methods

Code
summary(CODA.data) # Summarize the compositional data
     sleep            sed             lpa              mvpa      
 Min.   :351.0   Min.   :396.7   Min.   : 40.14   Min.   : 0.00  
 1st Qu.:485.0   1st Qu.:575.2   1st Qu.:213.07   1st Qu.:15.21  
 Median :521.7   Median :622.9   Median :269.81   Median :27.81  
 Mean   :525.7   Mean   :620.3   Mean   :266.24   Mean   :27.83  
 3rd Qu.:568.8   3rd Qu.:669.1   3rd Qu.:318.95   3rd Qu.:39.15  
 Max.   :781.0   Max.   :799.0   Max.   :458.21   Max.   :85.83  
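The summary only reveals zeros through the minimum. A quick way to count them directly is sketched below on a small made-up data frame (`toy` is illustrative); the same calls could be applied to `CODA.data`.

```r
# Toy time-use data with two zero MVPA values
toy <- data.frame(
  sleep = c(480, 510, 600),
  sed   = c(640, 630, 600),
  lpa   = c(320, 280, 240),
  mvpa  = c(0, 20, 0)
)

colSums(toy == 0)             # number of zeros in each part
mean(rowSums(toy == 0) > 0)   # share of rows with at least one zero
which(toy$mvpa == 0)          # rows that will need zero replacement
```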

Imputing zeros

  • Replaces zeros using a Bayesian-multiplicative method for compositional data.

Code
comp_data <- zCompositions::cmultRepl(CODA.data, output = "p-counts") # Replace zeros with a Bayesian-multiplicative approach
No. adjusted imputations:  29 
Code
summary(comp_data)  # Summarize the modified data
     sleep            sed             lpa              mvpa         
 Min.   :351.0   Min.   :396.7   Min.   : 40.14   Min.   : 0.02048  
 1st Qu.:485.0   1st Qu.:575.2   1st Qu.:213.07   1st Qu.:15.20846  
 Median :521.7   Median :622.9   Median :269.81   Median :27.81215  
 Mean   :525.7   Mean   :620.3   Mean   :266.24   Mean   :27.82837  
 3rd Qu.:568.8   3rd Qu.:669.1   3rd Qu.:318.95   3rd Qu.:39.14651  
 Max.   :781.0   Max.   :799.0   Max.   :458.21   Max.   :85.83144  

Create a compositional object

  • Required for applying log-ratio transformations
  • Changes the class to enable CoDA-specific methods
Code
head(comp_data, 3)
     sleep      sed      lpa        mvpa
1 505.6735 666.4289 267.8976  0.02047755
2 470.9306 633.4337 315.0426 20.59319744
3 600.2396 600.0918 239.6686  0.02047755
Code
comp <- compositions::acomp(comp_data)
round(head(comp, 3),5)
     sleep     sed       lpa       mvpa     
[1,] "0.35116" "0.46279" "0.18604" "0.00001"
[2,] "0.32704" "0.43988" "0.21878" "0.01430"
[3,] "0.41683" "0.41672" "0.16643" "0.00001"
attr(,"class")
[1] "acomp"

Compositional mean

  • Calculates the compositional mean in proportions
  • Converted to minutes per day (1440 total) for interpretability
Code
comp_mean <- mean(comp)
comp_mean # (in proportions)
       sleep          sed          lpa         mvpa 
"0.37152067" "0.43824898" "0.17983075" "0.01039961" 
attr(,"class")
[1] "acomp"
Code
# Adjust the mean to show minutes in a 24-hour day (1440 minutes)
round(compositions::clo(comp_mean, total = 1440), digits = 1)
sleep   sed   lpa  mvpa 
535.0 631.1 259.0  15.0 

Compositional Mean

In compositional data analysis, the mean composition is calculated using the geometric mean across observations, followed by closure, which preserves the relative structure of the data.

First, calculate the geometric mean for each part:

\[ g_i = \left( \prod_{j=1}^n p_{ij} \right)^{1/n} \]

where:
- \(g_i\) = geometric mean of part \(i\),
- \(p_{ij}\) = observed proportion for part \(i\) in observation \(j\),
- \(n\) = number of observations.

Next, close the geometric means so that they sum to 1:

\[ \overline{p}_i = \frac{g_i}{\sum_{k=1}^{D} g_k} \]

where \(\overline{p}_i\) is the compositional mean proportion for part \(i\) and \(D\) is the number of parts.

Then, rescale to the total time:

\[ \overline{m}_i = \overline{p}_i \times T \]

where \(T\) is the total time (e.g., 1,440 minutes).

Example with 2 observations:

  • Observation 1:
    8h sleep (33.33%), 10h sedentary (41.67%), 4h LPA (16.67%), 2h MVPA (8.33%).

  • Observation 2:
    7h sleep (29.17%), 12h sedentary (50%), 3h LPA (12.5%), 2h MVPA (8.33%).

Step 1: Calculate the geometric mean for each part

  • Sleep:

\[ \sqrt{33.33\% \times 29.17\%} = \sqrt{0.3333 \times 0.2917} \approx 0.3118 \]

  • Sedentary:

\[ \sqrt{41.67\% \times 50\%} = \sqrt{0.4167 \times 0.5} \approx 0.4564 \]

  • LPA:

\[ \sqrt{16.67\% \times 12.5\%} = \sqrt{0.1667 \times 0.125} \approx 0.1443 \]

  • MVPA:

\[ \sqrt{8.33\% \times 8.33\%} = \sqrt{0.0833 \times 0.0833} \approx 0.0833 \]

Step 2: Close the geometric means

The four geometric means sum to about 0.9959, not 1, so each one is divided by that sum:

  • Sleep: 31.31%
  • Sedentary: 45.83%
  • LPA: 14.49%
  • MVPA: 8.37%

Step 3: Rescale to 1,440 minutes (values computed from unrounded proportions)

  • Sleep: 450.8 minutes
  • Sedentary: 660.0 minutes
  • LPA: 208.7 minutes
  • MVPA: 120.5 minutes
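The two-observation example can be computed in base R as a sketch of the compositional mean: geometric mean per part, then closure so that the result sums to the full day. Object names are illustrative; in the case study, `mean()` on an `acomp` object plays this role.

```r
# Two observations, in hours
hours <- rbind(
  obs1 = c(sleep = 8, sed = 10, lpa = 4, mvpa = 2),
  obs2 = c(sleep = 7, sed = 12, lpa = 3, mvpa = 2)
)

# Proportions of the day (each row sums to 1)
props <- hours / rowSums(hours)

# Geometric mean of each part, computed on the log scale
gm <- exp(colMeans(log(props)))
sum(gm)                            # slightly below 1, so closure is needed

# Closure, then rescale to minutes per day
comp_mean_toy <- gm / sum(gm)
mean_min <- comp_mean_toy * 1440
round(mean_min, 1)                 # 450.8 660.0 208.7 120.5
sum(mean_min)                      # 1440
```

Without the closure step the geometric means would add up to less than a full day.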

Why Use the Geometric Mean?

  • A simple arithmetic mean (e.g., averaging minutes directly) ignores the relative nature of compositions. It still sums to 1,440 minutes, but it treats a 10-minute difference the same whether it occurs in MVPA (where it can mean a doubling) or in sedentary time (where it is a small relative change).

  • The geometric mean, computed in log-space, preserves ratios (e.g., Sleep/Sed), which is key for CoDA; closure then ensures the mean composition is valid (sums to 1 or 1,440).

  • This respect for relative structure is what distinguishes CoDA from traditional descriptive statistics.

Analogy: Budgeting for Two Friends

Two friends split the same monthly budget (total = $3,000) differently.

Alice’s budget:
    Rent: $1,500 (50%)
    Groceries: $900 (30%)
    Fun: $600 (20%)

Bob’s budget:
    Rent: $900 (30%)
    Groceries: $900 (30%)
    Fun: $1,200 (40%)


Problem with Arithmetic Mean

Step 1: Calculate the arithmetic mean of absolute dollars:

  • Mean Rent: ($1,500 + $900)/2 = $1,200

  • Mean Groceries: ($900 + $900)/2 = $900

  • Mean Fun: ($600 + $1,200)/2 = $900

Step 2: Sum the means:
$1,200 (Rent) + $900 (Groceries) + $900 (Fun) = $3,000

Looks okay?

But the ratios are distorted!

  • Alice’s Rent/Groceries ratio: 50%/30% = 1.67:1

  • Bob’s Rent/Groceries ratio: 30%/30% = 1:1

  • Mean’s Rent/Groceries ratio: 1,200/900 = 1.33:1

The arithmetic mean doesn’t preserve the relative trade-offs between rent and groceries. It averages the absolute values but ignores how categories depend on each other.

The geometric mean finds a central ratio that respects the compositional structure: for Rent/Groceries it gives √(1.67 × 1) ≈ 1.29:1, the geometric mean of the two friends’ ratios.
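The budget analogy can be checked numerically. This base-R sketch compares the Rent/Groceries ratio implied by the arithmetic mean with the one implied by the closed geometric mean; object names are illustrative.

```r
budgets <- rbind(
  alice = c(rent = 1500, groceries = 900, fun = 600),
  bob   = c(rent = 900,  groceries = 900, fun = 1200)
)

# Arithmetic mean of the dollar amounts
arith <- colMeans(budgets)

# Compositional mean: geometric mean per category, closed to the $3,000 total
gm  <- exp(colMeans(log(budgets)))
geo <- gm / sum(gm) * 3000

arith[["rent"]] / arith[["groceries"]]   # 1.33
geo[["rent"]] / geo[["groceries"]]       # 1.29 = sqrt(1.67 * 1)
sum(geo)                                 # 3000
```

The ratio from the compositional mean equals the geometric mean of the individual ratios, which is the sense in which it "respects" the ratios.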

Variation matrix

  • A log-ratio variance matrix shows how pairs of components vary together in a composition
  • Lower values = stronger proportional association
  • Helps assess internal structure of the composition

Code
round(compositions::variation(comp), digits = 3)
      sleep   sed   lpa  mvpa
sleep 0.000 0.041 0.172 3.982
sed   0.041 0.000 0.185 3.987
lpa   0.172 0.185 0.000 4.256
mvpa  3.982 3.987 4.256 0.000


Each entry of the matrix is the variance of the log-ratio between two parts \(C_i\) and \(C_j\):

\[ \mathrm{Var}\bigl(\ln\tfrac{C_j}{C_i}\bigr) \]

Example: If “sleep” and “sed” have low variance, people’s sleep/sedentary ratios don’t change much.
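To make the definition concrete, the sketch below builds a variation matrix by hand on a small made-up dataset. The data and object names are illustrative, and the sample variance from `var()` is an assumption about the denominator; `compositions::variation()` is the function used in the case study.

```r
# Toy compositions in minutes per day (each row sums to 1440)
toy <- rbind(
  c(480, 640, 300, 20),
  c(510, 600, 290, 40),
  c(450, 700, 260, 30),
  c(540, 620, 250, 30)
)
colnames(toy) <- c("sleep", "sed", "lpa", "mvpa")

D <- ncol(toy)
vmat <- matrix(0, D, D, dimnames = list(colnames(toy), colnames(toy)))

# Entry (i, j) is the variance of log(part_i / part_j) across observations
for (i in seq_len(D)) {
  for (j in seq_len(D)) {
    vmat[i, j] <- var(log(toy[, i] / toy[, j]))
  }
}

round(vmat, 3)   # symmetric, with zeros on the diagonal
```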

Ternary plot of the composition

Code
# Prepare compositional data with four components
comp_data <- data.frame(
  MVPA = comp[,"mvpa"],
  LPA = comp[,"lpa"],
  SED = comp[,"sed"],
  Sleep = comp[,"sleep"]  
)

# Calculate the arithmetic mean of the proportions (used only to place the red point; it differs from the compositional mean)
mean_comp <- colMeans(comp_data)

subtitle_text <- sprintf(
  "Red point = mean composition (Sleep: %.2f, SED: %.2f, LPA: %.2f)",
  mean_comp["Sleep"],
  mean_comp["SED"],
  mean_comp["LPA"]
)

# Plot with ggtern
ggtern::ggtern(data = comp_data, aes(x = Sleep, y = SED, z = LPA, color = MVPA)) + 
  geom_point(size = 2, alpha = 0.8) +  
  annotate("point",
           x = mean_comp["Sleep"],
           y = mean_comp["SED"],
           z = mean_comp["LPA"],
           color = "red", size = 3, shape = 18) +  
  theme_rgbw() +
  theme(
    text = element_text(size = 11),
    tern.axis.title = element_text(size = 13, face = "bold"),
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14)
  ) +
  labs(
    x = "Sleep",
    y = "SED",
    z = "LPA",
    color = "MVPA (4th Component)",
    subtitle = subtitle_text
  ) +
  scale_color_gradient(low = "lightblue", high = "darkblue") +
  guides(color = guide_colorbar(barwidth = 0.8, barheight = 8))

Covariates

Code
glucose <- data$glucose

covariates <- data.frame(
  age = data$age,
  bmi = data$bmi,
  race = factor(data$race),
  local = factor(data$site))

summarytools::dfSummary(covariates)
Data Frame Summary  
covariates  
Dimensions: 411 x 4  
Duplicates: 0  

--------------------------------------------------------------------------------------------------------
No   Variable    Stats / Values           Freqs (% of Valid)    Graph               Valid      Missing  
---- ----------- ------------------------ --------------------- ------------------- ---------- ---------
1    age         Mean (sd) : 30.2 (5.1)   27 distinct values            :           411        0        
     [numeric]   min < med < max:                                     : :           (100.0%)   (0.0%)   
                 18 < 30 < 44                                         : :   :                           
                 IQR (CV) : 7 (0.2)                                 : : : : : .                         
                                                                . : : : : : : : .                       

2    bmi         Mean (sd) : 27.6 (3.7)   409 distinct values         . : .         411        0        
     [numeric]   min < med < max:                                   . : : :         (100.0%)   (0.0%)   
                 18.5 < 27.7 < 40.1                                 : : : :                             
                 IQR (CV) : 4.7 (0.1)                               : : : : :                           
                                                                : : : : : : : :                         

3    race        1. White                 247 (60.1%)           IIIIIIIIIIII        411        0        
     [factor]    2. Black                  80 (19.5%)           III                 (100.0%)   (0.0%)   
                 3. Hispanic               52 (12.7%)           II                                      
                 4. Asian                  19 ( 4.6%)                                                   
                 5. Other                  13 ( 3.2%)                                                   

4    local       1. A                     164 (39.9%)           IIIIIII             411        0        
     [factor]    2. B                     145 (35.3%)           IIIIIII             (100.0%)   (0.0%)   
                 3. C                     102 (24.8%)           IIII                                    
--------------------------------------------------------------------------------------------------------

ILR coordinates

Code
# Calculate ILR coordinates for all parts simultaneously
ilr_comp <- compositions::ilr(comp)

head(ilr_comp)
              [,1]       [,2]      [,3]
[1,]  0.1951913153 -0.6314031 -8.655551
[2,]  0.2096179915 -0.4492569 -2.679971
[3,] -0.0001741719 -0.7495021 -8.642630
[4,]  0.0688752324 -0.7775520 -2.584114
[5,] -0.2726525586 -0.5099092 -2.565553
[6,] -0.0067887848 -1.5102301 -1.869668
attr(,"class")
[1] "rmult"


Our data show 411 people splitting their 24-hour day into Sleep, SED, LPA, and MVPA. These proportions sum to 100%, so they are interdependent: more Sleep means less time for at least one other behavior.

ILR coordinates transform these proportions into numbers that:

  • Keep the ratios (e.g., Sleep/SED).

  • Remove the 100% constraint.

  • Work in standard models.
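To show what the three coordinates contain, here is a hand-rolled sketch of one ILR basis in base R. It assumes that coordinate \(i\) contrasts part \(i+1\) with the geometric mean of parts \(1, \dots, i\), scaled by \(\sqrt{i/(i+1)}\). This is consistent with the coordinates printed in this deck for the mean composition, but it is an assumption about the default basis of `compositions::ilr()`, and the function name `ilr_by_hand` is hypothetical.

```r
ilr_by_hand <- function(x) {
  D <- length(x)
  vapply(seq_len(D - 1), function(i) {
    gm_first <- exp(mean(log(x[seq_len(i)])))        # geometric mean of parts 1..i
    sqrt(i / (i + 1)) * log(x[[i + 1]] / gm_first)   # scaled log-ratio
  }, numeric(1))
}

# Mean composition in minutes, rounded as reported in this deck
mean_day <- c(sleep = 535.0, sed = 631.1, lpa = 259.0, mvpa = 15.0)

coords <- ilr_by_hand(mean_day)
round(coords, 2)   # 0.12 -0.66 -2.93

# Only ratios matter: rescaling the whole composition leaves the coordinates unchanged
all.equal(ilr_by_hand(mean_day / sum(mean_day)), coords)   # TRUE
```

Under this basis the first coordinate compares SED with Sleep, the second compares LPA with both, and the third compares MVPA with the other three behaviors.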

Fit the regression model with ILR coordinates


Code
# Fit the model: glucose and ilr_comp are taken from the workspace; the other terms come from the covariates data frame

model_full <- lm(glucose ~ ilr_comp + age + bmi + race + local, data = covariates)

# Analyze the model
car::Anova(model_full, type = "II")
Anova Table (Type II tests)

Response: glucose
          Sum Sq  Df  F value Pr(>F)    
ilr_comp  193616   3 184.0695 <2e-16 ***
age          826   1   2.3551 0.1257    
bmi          366   1   1.0442 0.3075    
race         345   4   0.2463 0.9119    
local        760   2   1.0838 0.3393    
Residuals 139898 399                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Extract model parameters
parameters::model_parameters(model_full)
Parameter       | Coefficient |   SE |           95% CI | t(399) |      p
-------------------------------------------------------------------------
(Intercept)     |       67.56 | 9.38 | [ 49.11,  86.00] |   7.20 | < .001
ilr comp1       |       91.01 | 6.50 | [ 78.23, 103.79] |  14.00 | < .001
ilr comp2       |      -25.93 | 2.84 | [-31.51, -20.35] |  -9.14 | < .001
ilr comp3       |       -9.09 | 0.54 | [-10.16,  -8.03] | -16.85 | < .001
age             |       -0.28 | 0.18 | [ -0.64,   0.08] |  -1.53 | 0.126 
bmi             |        0.26 | 0.26 | [ -0.24,   0.77] |   1.02 | 0.307 
race [Black]    |        0.75 | 2.44 | [ -4.05,   5.56] |   0.31 | 0.758 
race [Hispanic] |       -1.49 | 2.88 | [ -7.16,   4.18] |  -0.52 | 0.605 
race [Asian]    |        1.51 | 4.53 | [ -7.39,  10.41] |   0.33 | 0.738 
race [Other]    |       -3.39 | 5.37 | [-13.95,   7.17] |  -0.63 | 0.528 
local [B]       |       -2.91 | 2.16 | [ -7.17,   1.34] |  -1.35 | 0.179 
local [C]       |       -0.04 | 2.38 | [ -4.73,   4.65] |  -0.02 | 0.987 


Results: In this simulated dataset, the way people split their day (ILR coordinates) strongly predicts glucose levels, whereas age, BMI, race, and study site are not significant. Time-use patterns are key!

Limitation: We can’t isolate each activity’s effect (e.g., Sleep alone). For that, we’ll explore methods like isotemporal substitution.

Equation:

Code
equatiomatic::extract_eq(model_full, use_coefs = TRUE)

\[ \operatorname{\widehat{glucose}} = 67.56 + 91.01(\operatorname{ilr\_comp}_{\operatorname{1}}) - 25.93(\operatorname{ilr\_comp}_{\operatorname{2}}) - 9.09(\operatorname{ilr\_comp}_{\operatorname{3}}) - 0.28(\operatorname{age}) + 0.26(\operatorname{bmi}) + 0.75(\operatorname{race}_{\operatorname{Black}}) - 1.49(\operatorname{race}_{\operatorname{Hispanic}}) + 1.51(\operatorname{race}_{\operatorname{Asian}}) - 3.39(\operatorname{race}_{\operatorname{Other}}) - 2.91(\operatorname{local}_{\operatorname{B}}) - 0.04(\operatorname{local}_{\operatorname{C}}) \]

Checking model assumptions

Code
performance::check_model(model_full)

CoDA isotemporal substitution


As an example, let’s theoretically exchange 30 minutes of SED for MVPA, keeping the total time (1440 minutes) fixed, to predict changes in glucose.
Steps:

  • Compute ILR for the mean sample composition (baseline).
  • Create a new composition (-30 min SED, +30 min MVPA).
  • Compute ILR for the new composition.

CoDA isotemporal substitution


Prepare the data and run a new model to make future prediction steps easier

  • ILR coordinates: Replace raw time-use data to remove the “total constraint.”
  • Covariates: Adjust for BMI, race, and site (but focus on composition effects). Age is not included in this model.
Code
# Load outcome variables and covariates from the dataset
glucose <- data$glucose   # Outcome variable: glucose from screening test
bmi <- data$bmi           # Covariate: body mass index
race <- factor(data$race) # Covariate: race category
local <- factor(data$site) # Covariate: study site

# Extract isometric log-ratio (ILR) coordinates from the compositional data
ilr_data <- as.data.frame(ilr_comp)
colnames(ilr_data) <- c("ilr1", "ilr2", "ilr3")

# Create dataframe for the regression model
# Combining outcome, ILR coordinates, and covariates
model_df <- data.frame(
  glucose = glucose,
  ilr1 = ilr_data[,1],
  ilr2 = ilr_data[,2],
  ilr3 = ilr_data[,3],
  bmi = bmi,
  race = race,
  local = local
)

# Fit the regression model with all covariates
model <- lm(glucose ~ ilr1 + ilr2 + ilr3 + bmi + race + local, data = model_df)

CoDA isotemporal substitution


Obtain the compositional mean of the sample and transform it into ILR coordinates

Code
# Get the compositional mean (reference composition)
comp_mean <- mean(comp)
comp_mean_min <- clo(comp_mean, total = 1440) # Scale to minutes per day (24h = 1440 min)

# Convert to proper format (named numeric vector in minutes)
comp_ref <- as.numeric(comp_mean_min)
names(comp_ref) <- c("sleep", "sed", "lpa", "mvpa")

# Display mean composition in minutes
print(round(comp_ref, 1))
sleep   sed   lpa  mvpa 
535.0 631.1 259.0  15.0 
Code
# 1. Calculate ILR coordinates for mean composition
ref_comp_acomp <- acomp(matrix(comp_ref, nrow=1, byrow=TRUE))
ref_ilr <- ilr(ref_comp_acomp)
ref_ilr
[1]  0.1168018 -0.6598760 -2.9349900
attr(,"class")
[1] "rmult"

CoDA isotemporal substitution


Prepare the theoretical substitution data and transform them into ILR coordinates

  • Isotemporal: Total time remains constant (no extra time added).

  • ILR coordinates: Necessary to input into the regression model.


Code
# 2. Create new composition (SED -30 min, MVPA +30 min)
new_comp <- comp_ref
new_comp["sed"] <- new_comp["sed"] - 30
new_comp["mvpa"] <- new_comp["mvpa"] + 30

# 3. Verify that the sum is still 1440 minutes
sum(new_comp) # Should equal 1440
[1] 1440
Code
# 4. Calculate ILR coordinates of the new composition
new_comp_acomp <- acomp(matrix(new_comp, nrow=1, byrow=TRUE))
new_ilr <- ilr(new_comp_acomp)
new_ilr
[1]  0.08236237 -0.63999237 -1.96855747
attr(,"class")
[1] "rmult"
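The reallocation step can be wrapped in a small helper so that other substitutions (for example, SED to LPA, or different durations) reuse the same logic. This is a sketch: `reallocate` is a hypothetical function, not part of any package, and the reference values are the rounded mean composition reported in this deck.

```r
# Move `minutes` from one part to another while keeping the total fixed
reallocate <- function(comp, from, to, minutes) {
  stopifnot(from %in% names(comp), to %in% names(comp), minutes >= 0)
  if (comp[[from]] - minutes <= 0) {
    stop("reallocation would leave a non-positive part; log-ratios would be undefined")
  }
  comp[from] <- comp[from] - minutes
  comp[to]   <- comp[to] + minutes
  comp
}

ref <- c(sleep = 535.0, sed = 631.1, lpa = 259.0, mvpa = 15.0)
new <- reallocate(ref, from = "sed", to = "mvpa", minutes = 30)

new                    # sed drops to 601.1, mvpa rises to 45
sum(new) - sum(ref)    # 0 up to floating-point error
```

The guard against non-positive parts matters because a part at or below zero cannot be log-transformed.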

CoDA isotemporal substitution


Prepare the data for prediction

Code
# Automatically detect the most common categories for categorical variables
race <- names(which.max(table(race)))
local <- names(which.max(table(local)))

# 5. Create dataframes for prediction
# For reference composition
data_ref <- data.frame(
  ilr1 = ref_ilr[1],
  ilr2 = ref_ilr[2],
  ilr3 = ref_ilr[3],
  bmi = mean(bmi),
  race = factor(race, levels = levels(data$race)),  # Most common race category
  local = factor(local, levels = levels(data$site))  # Most common site
)

head(data_ref)
       ilr1      ilr2     ilr3      bmi  race local
1 0.1168018 -0.659876 -2.93499 27.60357 White     A
Code
# For new composition
data_new <- data.frame(
  ilr1 = new_ilr[1],
  ilr2 = new_ilr[2],
  ilr3 = new_ilr[3],
  bmi = mean(bmi),
  race = factor(race, levels = levels(data$race)),
  local = factor(local, levels = levels(data$site))
)

head(data_new)
        ilr1       ilr2      ilr3      bmi  race local
1 0.08236237 -0.6399924 -1.968557 27.60357 White     A

CoDA isotemporal substitution


Make predictions with the model

Code
# 6. Make predictions with confidence intervals
pred_ref <- predict(model, newdata = data_ref, interval = "confidence", level = 0.95)
pred_new <- predict(model, newdata = data_new, interval = "confidence", level = 0.95)

pred_ref
       fit      lwr      upr
1 120.6163 117.3079 123.9247
Code
pred_new
       fit      lwr      upr
1 108.1861 104.6688 111.7035
Code
# 7. Calculate the effect (difference in predicted values)
effect <- pred_new[1, "fit"] - pred_ref[1, "fit"]

effect
[1] -12.43015
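Because the model is linear and the covariates are held at the same values in both scenarios, the intercept and covariate terms cancel, so the predicted difference depends only on the change in the ILR coordinates:

\[ \Delta\widehat{\mathrm{glucose}} = \sum_{k=1}^{3} \hat{\beta}_k \bigl(\mathrm{ilr}_k^{\mathrm{new}} - \mathrm{ilr}_k^{\mathrm{ref}}\bigr) \]

As a rough check, assuming the ILR coefficients of this model are close to the rounded values printed earlier for the full model (which additionally adjusts for age, so the agreement is only approximate):

\[ \Delta\widehat{\mathrm{glucose}} \approx 91.01(-0.0344) - 25.93(0.0199) - 9.09(0.9664) \approx -3.13 - 0.52 - 8.78 \approx -12.4 \]

Most of the predicted change comes from the third coordinate, which moves the most when MVPA rises from about 15 to 45 minutes.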

CoDA isotemporal substitution (SED -30 min, MVPA +30 min)