Mental Rotation Analysis

Sample size by site

Mental Rotation: Counts by Site
site	n_trials	n_users	n_runs
pilot_mpieva_de	17635	245	374
pilot_uniandes_co	7931	248	248
pilot_western_ca	6453	129	133

Raw data

(1) Ability - proportion correct

(2) Angle curves

IRT estimates

(3) Ability - thetas

(4) Difficulty by rotation angle (2PL scalar)

(5) Discrimination by rotation angle (2PL scalar)

Reaction time (RT)

(6) RT by age

(7) Accuracy by RT (raw scores) broken down by age group

(8) Accuracy by RT (IRT scores) broken down by age group

(9) Rotation angle by RT

(10) Rotation angle by RT and age

(11) Histogram - Reaction Time

PHASE 1: Diagnostic Checks (Understanding the Data)

Step 1: Document the 3D Selection Problem

Why: Need to establish that excluding 3D is justified, not arbitrary

A. Correlation: 3D exposure × ability

   r = 0.644 (95% CI: [0.599, 0.685])

   p < .001

   → INTERPRETATION: LARGE correlation - 3D exposure is strongly associated with ability
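
As a check on the interval in A, this base-R sketch recomputes a 95% confidence interval for r = 0.644 with the Fisher z transformation. It assumes n = 707 (534 + 173, the two group sizes reported in B) and that the interval was computed this way; the output states neither.

```r
# Fisher z confidence interval for a correlation.
# Assumptions (not stated in the output): n = 534 + 173, Fisher z method.
r <- 0.644
n <- 534 + 173

z  <- atanh(r)             # Fisher z transform of r
se <- 1 / sqrt(n - 3)      # standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)  # back-transform both bounds

round(ci, 3)
```

Under these assumptions the bounds agree with the reported [0.599, 0.685] to three decimals.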

B. Mean ability by 3D exposure

   Saw 3D:    M = -0.07 (SD = 0.83, n = 534)

   No 3D:     M = -1.75 (SD = 0.57, n = 173)

   Difference: 1.68 ability units (Cohen's d = 2.36)

   → INTERPRETATION: HUGE effect - 3D exposure is not random
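
The effect size in B can be recomputed from the reported means, SDs and group sizes. The sketch below tries two common standardizers, because the output does not say which was used: the square root of the unweighted mean of the two variances, and the n-weighted pooled SD. All variable names are illustrative.

```r
# Cohen's d from summary statistics, with two candidate standardizers.
m1 <- -0.07; s1 <- 0.83; n1 <- 534   # saw 3D
m2 <- -1.75; s2 <- 0.57; n2 <- 173   # no 3D

mean_diff <- m1 - m2                  # 1.68 ability units

# (a) unweighted: root of the average of the two variances
sd_unweighted <- sqrt((s1^2 + s2^2) / 2)
# (b) pooled: variances weighted by degrees of freedom
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

d_unweighted <- mean_diff / sd_unweighted
d_pooled     <- mean_diff / sd_pooled

round(c(unweighted = d_unweighted, pooled = d_pooled), 2)
```

The unweighted version gives 2.36, matching the reported value. The pooled version gives about 2.17, because the larger group also has the larger SD. The effect is very large either way.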

C. Selection mechanism (logistic regression)

   Cumulative accuracy: OR = 3.32 per 100% accuracy

   Trial position:      OR = 2.03 per 10 trials

   → INTERPRETATION: 3D exposure is ADAPTIVE - depends on performance
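
The odds ratios in C are quoted per 100% accuracy and per 10 trials, which makes them hard to compare. A logistic model is linear on the log-odds scale, so an odds ratio can be rescaled by raising it to the ratio of the two units. A small sketch (the variable names are illustrative):

```r
# Rescale odds ratios to smaller units of the predictor.
or_accuracy_full <- 3.32   # per 100 percentage points of cumulative accuracy
or_trials_10     <- 2.03   # per 10 trials

or_per_10_points <- or_accuracy_full^(10 / 100)  # per 10 percentage points
or_per_trial     <- or_trials_10^(1 / 10)        # per single trial

round(c(per_10_points = or_per_10_points, per_trial = or_per_trial), 2)
```

This comes to roughly 1.13 per 10 percentage points of accuracy and roughly 1.07 per additional trial.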

VERDICT: 3D items CANNOT be used as a predictor

  - Correlation r = 0.644 is too large to ignore

  - Cohen's d = 2.36 is a huge effect size

  - Adaptive selection (OR = 3.32) creates endogeneity

  → SOLUTION: Exclude 3D items from RT residualization

Step 2: Check Response Acceleration

Why: Need to know if we should control for trial_number in RT model

RESEARCH QUESTION: Do children speed up during the test?

Hypothesis: May be confounded by item difficulty changes

A. Simple linear model: log(RT) ~ trial_number

   β = 0.00286 (p < .001)

B. Controlled model: log(RT) ~ trial_number + angle + 2D/3D

   β_trial = -0.00352

✓ Include trial_number in RT residualization
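
Since RT is modeled on the log scale, each slope is easier to read as a percentage change in RT per trial. The sketch below converts the two Step 2 estimates; `pct_change` is a helper introduced here for illustration.

```r
# Convert log(RT) slopes into percent change in RT per additional trial.
pct_change <- function(beta) 100 * (exp(beta) - 1)

slopes <- c(simple = 0.00286, controlled = -0.00352)
pct <- pct_change(slopes)

round(pct, 2)
```

The sign flips once angle and 2D/3D are controlled: about +0.29% per trial in the simple model against about -0.35% per trial in the controlled model. This pattern would arise if later trials tend to be harder items, which is why trial_number is kept in the residualization model together with the item covariates.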

Step 3: Check Item Position Effects on Accuracy

Why: Items might be harder later independent of their difficulty

Model: accuracy ~ trial_number + angle + 2D/3D + person + item

  β_position = -0.00096 (per trial)

  → INTERPRETATION: No strong position effects on accuracy

PHASE 2: Core Analysis

Step 4: Refit RT Model with Corrections (Option B: Simple Residuals)

Why: Incorporate all the diagnostics into one clean model using simple residualization.

======================================================================

STEP 4: CORRECTED RT RESIDUALIZATION (2D Items Only)

======================================================================

Sample for RT model:

  - Trials: 19051 (2D items only)

  - People: 621

Fitting: log(RT) ~ angle + trial_number + (1|item)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: log_rt ~ angle + trial_number + (1 | item)
   Data: mrot_rt_2d

REML criterion at convergence: 35497.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2969 -0.6318 -0.0192  0.6590  2.8639 

Random effects:
 Groups   Name        Variance Std.Dev.
 item     (Intercept) 0.007461 0.08638 
 Residual             0.376188 0.61334 
Number of obs: 19051, groups:  item, 8

Fixed effects:
               Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)   8.167e-01  7.639e-02  5.973e+00  10.691 4.07e-05 ***
angle         2.609e-03  6.904e-04  5.742e+00   3.778  0.00998 ** 
trial_number -9.569e-03  4.930e-04  6.342e+03 -19.408  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) angle 
angle       -0.903       
trial_numbr -0.130 -0.018
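
The random-effects table above can be summarized as an intraclass correlation: the share of log(RT) variance, after the fixed effects, that lies between items. A minimal sketch from the reported variance components:

```r
# Item-level intraclass correlation from the reported variance components.
var_item  <- 0.007461   # item (Intercept) variance
var_resid <- 0.376188   # residual variance

icc_item <- var_item / (var_item + var_resid)

# The Std.Dev. column is the square root of the Variance column.
sd_check <- sqrt(c(item = var_item, residual = var_resid))

round(icc_item, 3)
round(sd_check, 5)
```

The item intercepts account for about 2% of the remaining variance, so nearly all of it is at the trial and person level.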


Person-level RT Residuals (Option B):

  Mean = 0.008 (should be approx 0)

  SD   = 0.394 (This is the between-person standard deviation; variance = 0.155)

✓ RT residualization (Simple Method) complete
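
The "simple residuals" idea can be sketched as follows: fit the trial-level RT model, take trial-level residuals, then average them within person. To stay self-contained, this sketch runs on simulated data and uses `lm()` without the item random intercept, so it is a stand-in for the `lmer` fit above and not the pipeline that produced these results. All data and names are hypothetical.

```r
# Simple residualization on simulated data (lm stand-in for the mixed model).
set.seed(1)
n_person <- 50
n_trial  <- 20

d <- data.frame(
  person       = rep(seq_len(n_person), each = n_trial),
  trial_number = rep(seq_len(n_trial), times = n_person),
  angle        = sample(c(0, 45, 90, 135, 180), n_person * n_trial, replace = TRUE)
)

# Simulated person-level speed plus trial-level noise on the log(RT) scale.
person_speed <- rnorm(n_person, mean = 0, sd = 0.4)
d$log_rt <- 0.8 + 0.0026 * d$angle - 0.0096 * d$trial_number +
  person_speed[d$person] + rnorm(nrow(d), mean = 0, sd = 0.6)

# Step 1: remove angle and trial-position effects at the trial level.
fit <- lm(log_rt ~ angle + trial_number, data = d)
d$resid <- resid(fit)

# Step 2: average trial-level residuals within person.
rt_resid <- aggregate(resid ~ person, data = d, FUN = mean)

c(mean = mean(rt_resid$resid), sd = sd(rt_resid$resid))
```

A positive person-level residual means the person was slower than expected given the angles and trial positions they saw. In the simulation the recovered residuals track the simulated `person_speed` closely.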

Step 5: Compare Original vs Corrected RT Residuals

======================================================================

STEP 5: COMPARING ORIGINAL VS CORRECTED RESIDUALS

======================================================================

Correlation: Original (2D+3D) vs Corrected (2D Only): r = 0.412

PHASE 3: Final Bivariate Model

Step 6: Final Bivariate Model with Corrected Residuals

======================================================================

STEP 6: FINAL BIVARIATE MODEL

======================================================================

Final sample: 706 person-runs

Fitting bivariate model...

  Formula: mvbind(ability, rt_resid) ~ site + age

 Family: MV(gaussian, gaussian) 
  Links: mu = identity
         mu = identity 
Formula: ability ~ site + age 
         rt_resid ~ site + age 
   Data: model_data_corrected (Number of observations: 706) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                              Estimate Est.Error l-95% CI u-95% CI Rhat
ability_Intercept                -1.86      0.13    -2.10    -1.61 1.00
rtresid_Intercept                 0.47      0.06     0.35     0.59 1.00
ability_sitepilot_uniandes_co    -1.32      0.07    -1.45    -1.19 1.00
ability_sitepilot_western_ca     -0.43      0.08    -0.59    -0.28 1.00
ability_age                       0.20      0.01     0.17     0.22 1.00
rtresid_sitepilot_uniandes_co    -0.14      0.03    -0.20    -0.08 1.00
rtresid_sitepilot_western_ca     -0.03      0.04    -0.10     0.05 1.00
rtresid_age                      -0.05      0.01    -0.06    -0.03 1.00
                              Bulk_ESS Tail_ESS
ability_Intercept                 8218     3030
rtresid_Intercept                 7491     3230
ability_sitepilot_uniandes_co     7105     3293
ability_sitepilot_western_ca      7212     3418
ability_age                       8980     3240
rtresid_sitepilot_uniandes_co     7518     3134
rtresid_sitepilot_western_ca      6786     2821
rtresid_age                       7579     3304

Further Distributional Parameters:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_ability     0.77      0.02     0.73     0.81 1.00    10435     3131
sigma_rtresid     0.38      0.01     0.36     0.40 1.00     7655     3018

Residual Correlations: 
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
rescor(ability,rtresid)     0.10      0.04     0.03     0.17 1.00     8539
                        Tail_ESS
rescor(ability,rtresid)     3357

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Joint Analysis of Spatial Ability and Processing Speed
Posterior estimates from Bivariate Bayesian Regression
Predictor	Beta	Lower 95% CI	Upper 95% CI	Finding
Ability (SD)
Site: Colombia (vs. DE)	−1.32	−1.45	−1.19	Significantly Lower Accuracy
Site: Canada (vs. DE)	−0.43	−0.59	−0.28	Moderately Lower Accuracy
Age (per year)	0.20	0.17	0.22	Developmental Growth
Speed (log s)
Site: Colombia (vs. DE)	−0.14	−0.20	−0.08	Significantly Faster (13%)
Site: Canada (vs. DE)	−0.03	−0.10	0.05	No Difference in Speed
Age (per year)	−0.05	−0.06	−0.03	Developmental Speedup
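
The speed coefficients in the table are on the log-seconds scale; exponentiating them gives percentage differences in RT. The sketch below does this for the rounded estimates and interval bounds in the table, so the results can differ in the last digit from values computed on unrounded posterior draws. `to_pct` is a helper introduced here for illustration.

```r
# Convert log-second coefficients (and their 95% bounds) to percent change in RT.
coefs <- data.frame(
  term  = c("Colombia vs. DE", "Canada vs. DE", "Age (per year)"),
  beta  = c(-0.14, -0.03, -0.05),
  lower = c(-0.20, -0.10, -0.06),
  upper = c(-0.08,  0.05, -0.03)
)

to_pct <- function(x) round(100 * (exp(x) - 1), 1)

pct <- data.frame(
  term  = coefs$term,
  est   = to_pct(coefs$beta),
  lower = to_pct(coefs$lower),
  upper = to_pct(coefs$upper)
)

pct
```

For Colombia this gives roughly 13% faster responses, with an interval of roughly 8% to 18% faster. Each additional year of age corresponds to about 5% faster responses.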

Results: Joint Analysis of Ability and Processing Speed

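To investigate the relationship between spatial ability and processing speed, we fit a bivariate Bayesian regression that jointly models latent spatial ability (\(\theta\)) and person-level residualized reaction time (RT) as functions of site and age, and estimates the residual correlation between the two outcomes.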

Cross-Cultural Differences. Results contradicted a “cultural caution” hypothesis. Relative to the German baseline, Colombian children showed significantly lower estimated ability (\(\beta =\) -1.32 SD, 95% CI [-1.45, -1.19]) yet responded significantly faster (\(\beta =\) -0.14 log-seconds, 95% CI [-0.20, -0.08]; about 13% faster). Canadian children displayed lower estimated ability (\(\beta =\) -0.43 SD, 95% CI [-0.59, -0.28]) with no credible difference in processing speed from the German group (\(\beta =\) -0.03, 95% CI [-0.10, 0.05]); because the interval spans zero, this is an absence of evidence for a difference, not a demonstration of equivalence. Thus, the lowest-performing group exhibited the fastest response style.

Developmental Trajectories. Age robustly predicted simultaneous gains in accuracy and efficiency. Each additional year was associated with increased spatial ability (\(\beta =\) 0.20 SD, 95% CI [0.17, 0.22]) and faster response times (\(\beta =\) -0.05 log-seconds, 95% CI [-0.06, -0.03]).
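
To make the age slopes concrete, the sketch below turns the posterior means into model-implied expected values at two illustrative ages. It assumes age enters the model in uncentered years, which the output does not confirm. The ages 6 and 10 and the helper `predict_mean` are chosen here for illustration only.

```r
# Model-implied means from the posterior mean coefficients.
# Assumption: age is in years and uncentered.
b_ability <- c(intercept = -1.86, age = 0.20, co = -1.32, ca = -0.43)
b_rt      <- c(intercept =  0.47, age = -0.05, co = -0.14, ca = -0.03)

predict_mean <- function(b, age, site = c("de", "co", "ca")) {
  site <- match.arg(site)
  # German site is the reference level, so it has no site offset.
  offset <- if (site == "de") 0 else b[[site]]
  b[["intercept"]] + b[["age"]] * age + offset
}

ages <- c(6, 10)
pred <- data.frame(
  age        = ages,
  ability_de = predict_mean(b_ability, ages, "de"),
  ability_co = predict_mean(b_ability, ages, "co"),
  rt_de      = predict_mean(b_rt, ages, "de")
)

pred
```

Under these assumptions, four years of age correspond to a gain of 0.80 SD in ability and a drop of 0.20 log-seconds in residualized RT at any site, since the model has no age-by-site interaction.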

The Speed-Accuracy Trade-off. We found a small but credible positive residual correlation between ability and response time (\(r =\) 0.10, 95% CI [0.03, 0.17]). Since higher RT values indicate slower responses, this is consistent with a speed-accuracy trade-off: controlling for age and site, children who allocated more time to the task tended to achieve higher ability estimates.

Developmental Trajectories and Strategic Trade-offs

Our analysis reveals that cognitive maturation drives simultaneous improvements in both the precision and efficiency of spatial reasoning. Age was a strong, positive predictor of spatial ability (\(\beta =\) 0.20 SD, 95% CI [0.17, 0.22]) and a significant negative predictor of response time (\(\beta =\) -0.05 log-seconds, 95% CI [-0.06, -0.03]). This indicates that as children grow older, they not only solve mental rotation tasks more accurately but also do so significantly faster.

Furthermore, we identified a small but credible positive residual correlation between ability and response time (\(r =\) 0.10, 95% CI [0.03, 0.17]). Since higher RT values indicate slower responses, this finding is consistent with a functional speed-accuracy trade-off: independent of age and site, children who allocated more time to the task tended to achieve higher ability estimates. This suggests that successful performance in mental rotation may rely partly on a deliberative strategy where resisting the impulse to respond quickly allows for more successful spatial transformation.

Analysis 3: Strategy Effectiveness (Slope Differences)

To test whether the effectiveness of the “slow-down” strategy varies across cultures, we fit a Bayesian regression model predicting spatial ability from the interaction between site and processing speed (ability ~ age + site * rt_resid).

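# Load brms, which provides brm() and fixef()
library(brms)

# Fit the Interaction Model
# We ask: Does the slope of RT -> Ability change depending on the Site?
m_strategy <- brm(
  ability ~ age + site * rt_resid,  # site-specific slopes for rt_resid
  data = model_data_corrected,      # same person-run data as Step 6
  chains = 4, 
  cores = 4, 
  iter = 2000, 
  seed = 123                        # fixed seed for reproducibility
)

# Full summary, including convergence diagnostics (Rhat, ESS)
summary(m_strategy)

# Extract Fixed Effects to check the site:rt_resid Interaction terms
fixef(m_strategy)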
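
In a model of the form ability ~ age + site * rt_resid, the rt_resid coefficient is the slope at the reference site, and each other site's slope is that coefficient plus its interaction term. The sketch below shows this on simulated data, with `lm()` as a quick frequentist stand-in for `brm()`. The data, the site labels and the true slopes are all hypothetical and are not the study's estimates.

```r
# Recover site-specific RT -> ability slopes from an interaction model.
# Simulated data; lm() stands in for brm() to keep the sketch self-contained.
set.seed(42)
n <- 300
site <- factor(rep(c("de", "co", "ca"), each = n / 3),
               levels = c("de", "co", "ca"))   # "de" is the reference level
age      <- runif(n, 4, 12)
rt_resid <- rnorm(n, 0, 0.4)

true_slope <- c(de = 0.0, co = 0.6, ca = 0.5)  # hypothetical values
ability <- -1.8 + 0.2 * age +
  unname(true_slope[as.character(site)]) * rt_resid +
  rnorm(n, 0, 0.5)

d   <- data.frame(ability, age, site, rt_resid)
fit <- lm(ability ~ age + site * rt_resid, data = d)
b   <- coef(fit)

# Reference slope, then reference slope + interaction for the other sites.
site_slopes <- c(
  de = b[["rt_resid"]],
  co = b[["rt_resid"]] + b[["siteco:rt_resid"]],
  ca = b[["rt_resid"]] + b[["siteca:rt_resid"]]
)

round(site_slopes, 2)
```

With a `brm()` fit the same sums should be formed per posterior draw, not from the posterior means, so that each site-specific slope gets its own credible interval.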

Developmental Trajectories and Strategic Trade-offs

Developmental Gains. Our analysis indicates that cognitive maturation is accompanied by simultaneous improvements in both precision and efficiency. Age was a robust predictor of spatial ability (\(\beta =\) 0.20 SD) and response time (\(\beta =\) -0.05 log-seconds). As children grow older, they not only solve mental rotation tasks more accurately but also do so significantly faster.

Strategy Effectiveness. We identified a functional speed-accuracy trade-off that varies by context. While German children maintained high accuracy regardless of their speed, children in Colombia and Canada showed a significant “return on investment” for slowing down (Interaction \(\beta > 0.50\)). For these groups, resisting the impulse to respond quickly was strongly predictive of higher performance, which is consistent with their lower average accuracy being partly attributable to a faster, less deliberative response style.