Methods

Study Design

We performed a longitudinal meta-analysis to synthesize repeated measurements reported across studies with heterogeneous follow-up durations. The primary aim was to (i) estimate pooled trajectories over time and (ii) quantify change-from-baseline patterns, including post-discontinuation dynamics (weight regain / rebound), while accounting for non-aligned follow-up schedules.

Data extraction

For each study and timepoint we extracted: sample size (n), mean and standard deviation (SD), follow-up timing (weeks from baseline), treatment status at each timepoint (on-treatment vs post-discontinuation), when applicable, study-level descriptors (design, population characteristics), If medians (IQR) were reported, they were converted to mean (SD) using validated approaches.

Handling heterogeneous timepoints

Because follow-up timing varied across studies, we did not force exact alignment into a single “T1/T2” grid when timepoints were non-comparable. Instead, follow-up time (in weeks) was modeled as a continuous moderator in meta-regression, allowing estimation of pooled trends and a more homogeneous comparison across heterogeneous schedules.

Effect Size Definition

Two complementary longitudinal syntheses were performed.

Pooled Means Over Time

At each observed follow-up time, study-specific means were synthesized using random-effects models.

Change From Baseline (Overtime)

For each study \(i\) at follow-up time \(t\), the change score was defined as: \[ \Delta_{it} = \bar{Y}_{it} - \bar{Y}_{i0} \]

where \(\bar{Y}_{it}\) is the mean outcome at time \(t\) and \(\bar{Y}_{i0}\) is the baseline mean. Because most studies did not report the standard deviation of within-subject change, conservative variance handling based on aggregate information was pre-specified.

Sampling Variance

The sampling variance of the mean at each study-timepoint was computed as:

\[ v_{it} = \frac{SD_{it}^2}{n_{it}} \]

Meta-Analytic Model

Random-effects models (DerSimonian–Laird) were used to estimate pooled means at each timepoint. Between-study heterogeneity was quantified through: I² statistic, Tau² estimator

Meta-regression to Harmonize Follow-up

To address heterogeneous follow-up durations, mixed-effects meta-regression models were fitted with follow-up time as a continuous moderator:

\[ y_{it} = \beta_0 + \beta_1 \cdot time_{it} + u_i + \varepsilon_{it} \]

Non-linear temporal patterns were explored using restricted cubic splines:

\[ y_{it} = \beta_0 + f(time_{it}) + u_i + \varepsilon_{it} \]

Discontinuation and Rebound Modeling

When post-discontinuation data were available, a binary indicator was defined:

\[ discont_{it} = \begin{cases} 0, & \text{during active treatment} \\ 1, & \text{after treatment discontinuation} \end{cases} \]

The following interaction model was evaluated: \[ y_{it} = \beta_0 + \beta_1 \cdot time_{it} + \beta_2 \cdot discont_{it} + \beta_3 (time_{it} \times discont_{it}) + u_i + \varepsilon_{it} \]

Dependence Structure

Because multiple timepoints per study violate independence assumptions, analyses accounted for within-study dependence through random-effects structures at the study level.

Software

All analyses were conducted in R (version 4.3) using the packages metafor, ggplot2, splines, and dplyr.

Statistical Analysis

dados <- tribble(
~study, ~group, ~outcome, ~time_weeks, ~n, ~mean, ~sd, ~post_discontinuation,
"Andersen_2021", "Overall", "sperm_conc", 0, 47, 78.5, 74.9, 0,
"Andersen_2021", "Overall", "sperm_conc", 8, 47, 91.7, 70.8, 0,
"Gregoric_2024", "Overall", "sperm_conc", 0, 30, 79.2, 55.6, 0,
"Gregoric_2024", "Overall", "sperm_conc", 12, 30, 96.7, 62.3, 0
)

dados <- dados %>%
mutate(vi = (sd^2) / n)

kable(dados, caption = "Extracted longitudinal data")
Extracted longitudinal data
study group outcome time_weeks n mean sd post_discontinuation vi
Andersen_2021 Overall sperm_conc 0 47 78.5 74.9 0 119.3619
Andersen_2021 Overall sperm_conc 8 47 91.7 70.8 0 106.6519
Gregoric_2024 Overall sperm_conc 0 30 79.2 55.6 0 103.0453
Gregoric_2024 Overall sperm_conc 12 30 96.7 62.3 0 129.3763
dados_delta <- dados %>%
group_by(study, group, outcome) %>%
mutate(
baseline_mean = mean[time_weeks == 0][1],
delta = mean - baseline_mean
) %>%
ungroup()

res_meta_reg <- rma.uni(
yi = mean,
vi = vi,
mods = ~ time_weeks,
method = "REML",
data = dados
)

summary(res_meta_reg)
## 
## Mixed-Effects Model (k = 4; tau^2 estimator: REML)
## 
##   logLik  deviance       AIC       BIC      AICc   
##  -6.5633   13.1266   19.1266   15.2060   43.1266   
## 
## tau^2 (estimated amount of residual heterogeneity):     0 (SE = 112.3273)
## tau (square root of estimated tau^2 value):             0
## I^2 (residual heterogeneity / unaccounted variability): 0.00%
## H^2 (unaccounted variability / sampling variability):   1.00
## R^2 (amount of heterogeneity accounted for):            0.00%
## 
## Test for Residual Heterogeneity:
## QE(df = 2) = 0.0074, p-val = 0.9963
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 2.1006, p-val = 0.1472
## 
## Model Results:
## 
##             estimate      se     zval    pval    ci.lb    ci.ub      
## intrcpt      78.9776  7.3012  10.8171  <.0001  64.6676  93.2876  *** 
## time_weeks    1.5166  1.0464   1.4494  0.1472  -0.5343   3.5675      
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Software

All analyses and visualizations were performed in R (version 4.3) using the packages metafor, ggplot2, splines, and dplyr.