Reproducibility Report for Poli et al. (2024, Developmental Science)
Introduction
My current line of research focuses on individual differences in early information seeking. To study infant learning strategies, Poli and colleagues (2020; 2024) have developed an analytic pipeline that leverages an infant-friendly visual learning (VL) task from which information-theoretic measures (e.g., predictability and information gain) can be derived for each trial in sequences of visual events.
These measures are then incorporated into a hierarchical Bayesian model fitted to real infants’ looking behavior (look-aways, looking time, and saccadic latency) in the VL task to infer the values of latent parameters that represent specific cognitive functions: processing speed, learning performance, curiosity, and sustained attention (see Figure 1).
For the proposed project, I have four primary aims:
(1) Reproduce the findings using the original data with automatic differentiation variational inference (ADVI).
(2) Simulate infant looking behavior in the visual learning task from the posterior distribution of the original data (e.g., use individual latent parameters from the original data to generate a simulated dataset of infant looking times). Note: this is a posterior predictive simulation.
(3) Fit the original HBM to the simulated dataset to estimate the latent parameter of primary interest: curiosity (information gain x looking time).
(4) Compare model fit and analyses from simulated data to the original data (Poli et al., 2024) by indexing the number of infants from the original study and the simulated dataset who display a significant coefficient for curiosity.
Working through the described pipeline will help me develop the analytic toolkit and computational understanding needed for my own research program.
Key analysis of interest: Posterior predictive check (PPC) to validate a hierarchical Bayesian model of infant attention during a probabilistic learning task. Specifically, testing whether the model can (1) generate realistic data matching observed distributions of looking time and saccadic latency, and (2) recover known parameters when fitted to simulated data, thereby confirming the model adequately captures infant learning behavior.
Anticipated challenges
Simulating these data and the analytic pipeline involves three high-level stages: input generation, hierarchical modeling, and outcome replication, each of which will pose its own computational challenges.
The code to generate the sequences for the visual learning task is publicly available. I will follow Poli et al. (2020 and 2024) to attempt to derive the information-theoretic measures for each trial in the visual learning task. This may be complex and time-consuming to undertake from scratch given the scope of the current project. In the event that this step hinders the completion of the subsequent step, I will reference the original analysis script, found here, and/or simulate just a portion of the looking behavior (e.g., looking time) to simplify the process.
Links
Methods
Reproduction:
Data Preparation (included in original code)
- Load the raw data for the visual learning task.
- Combine the two datasets, Roris_nostd.csv and Roris_smiley.csv.
- Convert to model-friendly format (pytensor variables).
- Z-score the dependent variables (looking time and saccadic latency)
- Z-score the independent variables (information gain, predictability, and surprise)
- Handle missing values.
Verify the following columns are present in the combined dataset:
- subj: subject ID
- nseq: sequence number
- ntrialseq: trial number within sequence
- dwell: looking time to sequence
- slat: saccadic latency
- event: look-away (binary, either 0 or 1)
- D: KL divergence or information gain (pre-computed)
- H: entropy or predictability (pre-computed)
- I: surprise (pre-computed)
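The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the original preprocessing script: the helper names (`nan_zscore`, `prepare`, `toy_frame`) and the `_z` column suffix are hypothetical, the toy frames stand in for the two CSV files, and z-scoring over the pooled dataset (rather than within subject) is an assumption. The NaN-aware z-score uses `np.nanmean` and `np.nanstd`, which should behave like z-scoring with `nan_policy="omit"`.

```python
import numpy as np
import pandas as pd

# Columns the combined dataset is expected to contain (from the list above).
REQUIRED = ["subj", "nseq", "ntrialseq", "dwell", "slat", "event", "D", "H", "I"]
# Dependent variables (dwell, slat) and predictors (D, H, I) to standardize.
TO_STANDARDIZE = ["dwell", "slat", "D", "H", "I"]


def nan_zscore(values):
    """Z-score an array while ignoring NaNs; NaNs stay NaN in the output."""
    values = np.asarray(values, dtype=float)
    return (values - np.nanmean(values)) / np.nanstd(values)


def prepare(frames):
    """Combine data frames, verify expected columns, and add z-scored copies."""
    data = pd.concat(frames, ignore_index=True)
    missing = [c for c in REQUIRED if c not in data.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    for col in TO_STANDARDIZE:
        # Assumption: standardize over the pooled dataset, not per subject.
        data[col + "_z"] = nan_zscore(data[col])
    return data


rng = np.random.default_rng(0)


def toy_frame(subj, n=6):
    """Hypothetical stand-in for one of the raw CSV files."""
    frame = pd.DataFrame({
        "subj": subj,
        "nseq": 1,
        "ntrialseq": np.arange(1, n + 1),
        "dwell": rng.gamma(2.0, 1.0, n),
        "slat": rng.normal(0.3, 0.1, n),
        "event": rng.integers(0, 2, n),
        "D": rng.random(n),
        "H": rng.random(n),
        "I": rng.random(n),
    })
    frame.loc[0, "dwell"] = np.nan  # one missing looking time per toy subject
    return frame


combined = prepare([toy_frame(1), toy_frame(2)])
```

With the real data, the list passed to `prepare` would instead hold the frames returned by `pd.read_csv` for the two files named above.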
Analysis Pipeline
Fit the hierarchical Bayesian model (HBM) using variational inference
- Specify relationships of interest (i.e., between looking behavior and latent parameters).
- Fit the model using ADVI.
- Derive individual estimates for:
- Learning Performance: β₁^SL (individual-level slope relating saccadic latency to predictability)
- Curiosity: β₁^LT (individual-level slope relating looking time to information gain)
Model validation
Check convergence.
R̂ should be below 1.004 if using MCMC. With ADVI, convergence is checked qualitatively.
(If MCMC) Compare model fit between hierarchical vs. simple (group-only) model
Extract individual differences
Once the model has run, compute posterior distributions for each parameter.
The HBM produces a probability distribution for each parameter for each infant (e.g., infant X’s “learning performance” isn’t a single number, but a distribution of plausible values, summarized here by an 89% credible interval). The model represents this as 20,000 samples from the distribution.
    import numpy as np
    import pandas as pd

    # markasgood and trace are defined earlier in the analysis script.
    posterior = pd.DataFrame()
    posterior["subjnum"] = markasgood.values.reshape(-1, )
    posterior["LT0"] = np.median(trace["LT0"], axis=0)  # Processing speed (looking time)
    posterior["LT1"] = np.median(trace["LT1"], axis=0)  # Curiosity
    posterior["SL0"] = np.median(trace["SL0"], axis=0)  # Processing speed (saccadic)
    posterior["SL1"] = np.median(trace["SL1"], axis=0)  # Learning performance
    posterior["lambda0"] = np.median(trace["lambda0"], axis=0)  # Sustained attention
    posterior["beta_LA"] = np.median(trace["beta_LA"], axis=0)  # (not used in main analysis)
    posterior.to_csv('posterior_median.csv')

- Save to an output file in which there is one row per infant; columns = latent parameters.
gen_summary_rep.csv contains posterior summary statistics for all model parameters, including the individual-level parameters of interest: learning performance (SL1) and curiosity (LT1).
Identify infants with significant individual effects (89% credible interval excludes zero).
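As an illustration of the significance criterion, the sketch below counts infants whose 89% interval excludes zero, given a matrix of posterior draws. The function name `count_significant` and the array layout (draws in rows, infants in columns) are assumptions, and the interval here is an equal-tailed percentile interval; if the original analysis used highest-density intervals, the counts could differ slightly.

```python
import numpy as np


def count_significant(samples, ci=0.89):
    """Flag infants whose equal-tailed credible interval excludes zero.

    samples: array of shape (n_draws, n_infants) with posterior draws of one
    individual-level coefficient (for example LT1 or SL1).
    Returns a boolean mask over infants and the number flagged.
    """
    samples = np.asarray(samples, dtype=float)
    tail = (1.0 - ci) / 2.0
    lower = np.quantile(samples, tail, axis=0)
    upper = np.quantile(samples, 1.0 - tail, axis=0)
    excludes_zero = (lower > 0) | (upper < 0)
    return excludes_zero, int(excludes_zero.sum())


# Toy draws: one clearly positive, one centered on zero, one clearly negative.
rng = np.random.default_rng(1)
toy_draws = np.column_stack([
    rng.normal(5.0, 1.0, 2000),
    rng.normal(0.0, 1.0, 2000),
    rng.normal(-5.0, 1.0, 2000),
])
mask, n_significant = count_significant(toy_draws)
```

Dividing `n_significant` by the number of infants gives the proportion reported alongside each count.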
Compare to original findings.
Simulation and Posterior Predictive Check:
Load posterior means from gen_summary_rep.csv as “true” generating parameters for simulation.
Generate a simulated dataset using the true parameters with the original predictor structure: simulate looking time and saccadic latency from a Student’s t distribution (nu=15) and look-away events from a Poisson distribution. Preserve the original missing data structure (2,253 NaN values) and filter invalid negative values.
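A rough sketch of the continuous part of this step is given below. It is not the original simulation code: the function name `simulate_outcome`, the simple intercept-plus-slope linear predictor, and the noise scale `sigma` are assumptions made for illustration, and the Poisson look-away component and the negative-value filter are left out. It draws Student's t noise with nu=15 and copies the observed NaN pattern onto the simulated values.

```python
import numpy as np


def simulate_outcome(intercept, slope, predictor, subj_idx, observed,
                     sigma=1.0, nu=15, seed=0):
    """Simulate one outcome (e.g., looking time) with Student's t noise.

    intercept, slope: per-infant generating parameters (e.g., LT0 and LT1).
    predictor: per-trial predictor (e.g., z-scored information gain).
    subj_idx: per-trial integer index into the per-infant parameter arrays.
    observed: per-trial observed outcome, used only for its NaN pattern.
    sigma: hypothetical noise scale, assumed here for illustration.
    """
    rng = np.random.default_rng(seed)
    predictor = np.asarray(predictor, dtype=float)
    mu = intercept[subj_idx] + slope[subj_idx] * predictor
    simulated = mu + sigma * rng.standard_t(nu, size=mu.shape)
    # Preserve the original missingness structure.
    simulated[np.isnan(observed)] = np.nan
    return simulated


# Toy example: two infants, two trials each, two missing observations.
toy_intercept = np.array([0.0, 1.0])
toy_slope = np.array([0.5, -0.5])
toy_subj_idx = np.array([0, 0, 1, 1])
toy_predictor = np.array([0.1, 0.2, 0.3, 0.4])
toy_observed = np.array([1.0, np.nan, 2.0, np.nan])
toy_sim = simulate_outcome(toy_intercept, toy_slope, toy_predictor,
                           toy_subj_idx, toy_observed)
```

Masking after the draw keeps the random stream independent of where the gaps fall, so the same seed yields the same values for the non-missing trials regardless of the missingness pattern.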
Standardize simulated predictors (entropy, surprise, KL-divergence) using z-scoring with nan_policy="omit", matching original preprocessing.
Re-fit hierarchical Bayesian model to simulated data using identical ADVI procedure.
Compare the recovered parameters to true values using distributional fit via histograms, Q-Q plots, and individual-level significance patterns.
Specifically, I aim to validate the model recovers both population-level effects and individual differences by achieving a comparable distributional fit and parameter estimates with simulated data.
If the model adequately captures infant probabilistic learning mechanisms, it should generate data distributions matching observed patterns and recover known parameters from synthetic data.
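One way to put numbers on the recovery comparison is sketched below. The helper `recovery_summary` and the choice of summaries (Pearson correlation, mean bias, and paired quantiles for a Q-Q plot) are illustrative assumptions rather than the analysis actually run for this report.

```python
import numpy as np


def recovery_summary(true_vals, recovered_vals, n_quantiles=19):
    """Summarize how well recovered parameters match their generating values.

    Returns the Pearson correlation, the mean bias (recovered minus true),
    and an (n_quantiles, 2) array of matched quantiles for a Q-Q plot.
    """
    true_vals = np.asarray(true_vals, dtype=float)
    recovered_vals = np.asarray(recovered_vals, dtype=float)
    r = float(np.corrcoef(true_vals, recovered_vals)[0, 1])
    bias = float(np.mean(recovered_vals - true_vals))
    probs = np.linspace(0.05, 0.95, n_quantiles)
    qq = np.column_stack([
        np.quantile(true_vals, probs),
        np.quantile(recovered_vals, probs),
    ])
    return {"r": r, "bias": bias, "qq": qq}


# Toy check: recovered values equal the true values shifted by a constant.
rng = np.random.default_rng(2)
toy_true = rng.normal(0.0, 1.0, 145)
toy_summary = recovery_summary(toy_true, toy_true + 0.1)
```

Points in `qq` lying close to the identity line would indicate that the recovered and generating distributions have a similar shape, while `r` speaks to whether individual infants keep their rank order.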
Differences from original study
The computing environments are the same. Visualizations may be carried out in R, but the model and any pre-processing steps will be in Python. ADVI will be used instead of MCMC as outlined above.
Project Progress Check 3
Measure of success
The outcome measure will be a successful reproduction of their findings based on the latent parameters from the Bayesian model. I aim to reproduce the following as indexed by 89% credible intervals different from zero (where zero indicates lack of an effect) with both the original and the simulated data:
Learning Performance (saccadic latency x predictability): approximately 57 infants (40%) with a significant coefficient (Poli et al., 2024).
Curiosity (looking time x information gain): approximately 31 infants (22%) with a significant coefficient (Poli et al., 2024).
Pipeline progress
- Modernized the original modeling script:
- Updated pymc3 to pymc, theano to pytensor
- Removed masked arrays since PyMC handles missing values automatically
- Changed from true sampling to optimization for feasibility:
Switched to ADVI
- Original paper and findings use MCMC with 500K tuning + 10K sampling.
- I use ADVI with 30,000 optimization iterations and 50,000 posterior draws.
- This is primarily for computational feasibility.
Added convergence tracking, which records the mean and SD of ADVI’s Gaussian approximation to the posterior at every iteration. The tracking lets me see whether the optimization is converging and helps me assess model fit.
Commented out WAIC/LOO.
- ADVI produces approximate posterior samples, not exact MCMC samples. The trace from ADVI is generated from a variational approximation (i.e., a fitted normal distribution), not from the actual posterior distribution. ADVI doesn’t automatically compute and store pointwise log-likelihoods needed for WAIC/LOO.
Assessed ADVI convergence. The lines flattened out by iteration 10k, indicating that the model successfully reached an “optimal” approximation of the posteriors:
Figure 2. Convergence plot after fitting model over full dataset.
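The visual judgment that the tracked lines flatten out could be complemented by a simple numeric check, sketched below. The helper `first_stable_iteration`, the window size, and the 1% tolerance are hypothetical choices, not part of the original script: the function compares the means of consecutive, non-overlapping windows of a tracked quantity and reports the first iteration at which the relative change drops below the tolerance.

```python
import numpy as np


def first_stable_iteration(history, window=1000, rel_tol=0.01):
    """Return the first iteration at which a tracked quantity has flattened.

    history: 1-D sequence with one tracked value per iteration (for example
    the mean of one variational parameter, or the loss).
    Returns the starting iteration of the first window whose mean differs
    from the previous window's mean by less than rel_tol (relative change),
    or None if no such window exists.
    """
    history = np.asarray(history, dtype=float)
    n_blocks = len(history) // window
    block_means = history[: n_blocks * window].reshape(n_blocks, window).mean(axis=1)
    for b in range(1, n_blocks):
        change = abs(block_means[b] - block_means[b - 1])
        scale = max(abs(block_means[b - 1]), 1e-12)  # guard against division by zero
        if change / scale < rel_tol:
            return b * window
    return None


# Toy trajectory that decays toward a plateau, loosely like a convergence trace.
iterations = np.arange(20000)
toy_history = 10.0 * np.exp(-iterations / 1000.0) + 1.0
stable_at = first_stable_iteration(toy_history)
```

A check like this only says that the approximation has stopped changing; it does not say that the Gaussian approximation is close to the true posterior.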
- Fit the model w/ ADVI sampling:
- I did this several times with data from just 3 subjects for testing purposes, and then with all subjects in the combined dataset (shown above). Fitting using ADVI took approximately 30-40 minutes for each full run.
Ran a basic analysis script (advi_analysis.py) to evaluate the significance of individual subject coefficients from the model, based on 89% credible intervals that exclude zero.
Preliminary Results w/ ADVI:
Learning Performance (saccadic latency x predictability): approximately 63 infants (43.45%) with a significant coefficient (SL1).
Curiosity (looking time x information gain): approximately 40 infants (27.59%) with a significant coefficient (LT1).
Simulated a new dataset using the “true” parameters and the original predictor structure, then evaluated (1) distributional fit (Figures 3 and 4) and (2) individual-level subject significance patterns (just like in Step 4).
Preliminary Results w/ ADVI (simulated data):
Learning Performance: approximately 65 infants (44.83%) with a significant coefficient (SL1).
Curiosity: approximately 51 infants (35.17%) with a significant coefficient (LT1).
Results
Data preparation
Data preparation following the analysis plan.
Key analysis
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Reproduction Attempt
Open the discussion section with a paragraph summarizing the primary result from the key analysis and assess whether you successfully reproduced it, partially reproduced it, or failed to reproduce it.
Identify one insight your simulation gave you about the strengths or limitations of the original experimental design.
Across more than 9K observations (i.e., all individual trials across infants), there were over 2K missing or NaN values (roughly 24.2%). This initially presented as a limitation, but the simulation revealed that it accurately reflects lapses in infant attention or disengagement that are informative to the cognitive processes of interest. I preserved the missingness structure in the simulation and successfully recovered parameters despite it, demonstrating that the HBM framework can appropriately handle the inevitable discrepancies in infants’ looking data.
How would this simulation help you design a follow-up experiment with a similar paradigm?
This simulation was highly informative about the hierarchical Bayesian modeling process, which I intend to use to extract individual differences with the same paradigm in a new set of infants. Validating the model with simulated data serves as both a proof of concept and a feasibility check for my in-person replication.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis of the dataset, (b) assessment of the meaning of the successful or unsuccessful reproducibility attempt - e.g., for a failure to reproduce the original findings, are the differences between original and present analyses ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the reproducibility attempt (if you contacted them). None of these need to be long.