Introduction

Choice of experiment

Hart et al. (2018) used a simple planar geometric task to examine why people's intuitive geometric reasoning deviates from deterministic, rule-based Euclidean representations, and provided a computational account of this deviation. This is particularly relevant to my own work, in which I seek to model the acquisition of geometric intuitions and mathematical reasoning.

Description of paper

Across 5 variations of the Triangle Completion Task, Hart et al. (2018) found that the distribution of participants' errors in estimating the missing top corner of a fragmented triangle was scale-dependent, contrary to a static Euclidean representation. The study then showed that a correlated random walk model captured these statistical characteristics, suggesting that simulation-based strategies might have guided the geometric reasoning process in the Triangle Completion Task.

I aim to replicate the main behavioral findings of Experiment 2 of Hart et al. (2018) (Figure S4 A, B), which showed:
1. the mean error of the y-coordinate estimate of the missing vertex was biased toward the base and increased linearly with side length;
2. the standard deviation of the y-coordinate estimates scaled sub-linearly with side length.

Stimuli and procedures

Participants were recruited via Amazon's Mechanical Turk and randomly assigned to two equal-sized groups. Each participant was shown 15 different incomplete isosceles triangles, each 10 times, in a random order. The triangles combined 3 different base angles (Group 1: 30, 45, and 60 degrees; Group 2: 36, 51, and 66 degrees) with 5 different base lengths (0.1, 0.25, 0.5, 0.75, and 1 of the maximum base length). For each triangle, participants were asked to position a dot at the estimated location of the missing vertex. A practice trial preceded the experiment.
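
For concreteness, here is a minimal base-R sketch of how one participant's trial list can be assembled under this design (illustrative only; the actual experiment runs in the browser):

set.seed(1)  # fixed seed, so the shuffle in this sketch is reproducible
trials = expand.grid(
  baseAngle = c(30, 45, 60),              # Group 2 instead saw 36, 51, 66
  baseLen   = c(0.1, 0.25, 0.5, 0.75, 1)  # fractions of the maximum base length
)
trials = trials[rep(seq_len(nrow(trials)), each=10), ]  # 10 repetitions of each triangle
trials = trials[sample(nrow(trials)), ]                 # random presentation order
nrow(trials)  # 150 trials per participant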

Anticipated challenges

  1. Programming the experiment and ensuring browser compatibility.
  2. Resolving the pixel scaling between the two groups as described in the original paper.

Methods

Materials

Materials as quoted from the original paper:

All participants were shown 15 (fragmented isosceles) triangles, at base lengths of 0.1, 0.25, 0.5, 0.75, 1 of the maximum base length (which was 900 pixels for group 1 and 1300 pixels for group 2). Each base length presented 3 different angle sizes (30, 45 and 60 degrees for group 1 and 36, 51 and 66 degrees for group 2)…group 1 saw a y-coordinate length scale of 900 pixels and group 2 saw a y-coordinate length scale of 1300 pixels.

The stimuli used in this replication project will follow the materials quoted above, except that the images will be 2175px by 1425px and both groups will see the same y-coordinate length scale. In addition, the HTML image container is sized 435px by 285px (a 5× downscaling of the generated images).
The stimulus images are generated using code in this Jupyter notebook: https://github.com/psych251/hart2018/tree/master/code/generate_image.ipynb.
The images used in this replication project can also be found in the git repo: https://github.com/psych251/hart2018/tree/master/images.
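
For reference, the triangle geometry implied by these materials can be written down directly. The following is a hedged R sketch, not the original generation code: for a base of normalized length b and base angle theta, the missing apex sits at height (b/2)*tan(theta) above the base midpoint, and the side length is L = (b/2)/cos(theta).

deg2rad = function(deg) deg * pi / 180
# height of the missing apex above the midpoint of the base
apex_height = function(b, theta) (b / 2) * tan(deg2rad(theta))
# side length L of the fragmented isosceles triangle (the predictor in the analyses below)
side_length = function(b, theta) (b / 2) / cos(deg2rad(theta))
apex_height(1, 45)   # 0.5
side_length(1, 60)   # 1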

Procedure

The task procedure used in this replication project will precisely follow the description in the original paper:

… we showed participants 15 different incomplete isosceles triangles 10 times in a random order (for a total of 150 triangles for each participant)…For each triangle, we asked participants to position a dot in the estimated location of the missing vertex. Before the experiment began, participants had one practice trial, in which the location of the missing vertex was indicated by a dot of a different color, and they were asked to position their dot on the indicated position.

Analysis Plan

I plan to use Python for data cleaning and pre-processing, and the nlsLM() function from the R package minpack.lm to perform the nonlinear regressions.

Main variables of interest
δ, the mean deviation of the estimated y-coordinate from the true y-coordinate of the missing vertex
σ, the standard deviation of a participant's distribution of y-coordinate estimates
L, the triangle side length

Key analyses of interest
1. δ is biased toward the base (no statistical test).
2. δ ~ L^x1: δ increases linearly with L (x1 close to 1).
3. σ ~ L^x2: σ scales sub-linearly with L (x2 < 1).

In addition, I will use a one-sample t-test as the statistical test for the sublinearity of the scaling exponent (i.e., testing whether x2 differs from 1).

Additional data analysis steps from the paper that will be followed: “The mean deviation from the true location of the missing vertex… and the standard deviation were calculated for each participant and then averaged across participants.”
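
This two-step aggregation maps onto dplyr directly. A minimal sketch, assuming a tidy data frame d with columns subject, sideLen, and deltaY (the column names used in the replication code below):

library(dplyr)
d %>%
  group_by(subject, sideLen) %>%
  summarize(meanDeltaY = mean(deltaY)) %>%    # per-participant means
  group_by(sideLen) %>%
  summarize(avgDeltaY = mean(meanDeltaY),     # then averaged across participants
            sdDeltaY = sd(meanDeltaY))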

Power Analysis

I ran the planned analysis pipeline on the open-access data from the original paper to estimate the standard deviation of the scaling exponent (needed for the power calculation) and to perform the planned t-test. Although the mean scaling exponent from this reproduction attempt did not match the original (reproduced: 0.80, original: 0.64), the effect was in the expected direction and significant.

library(tidyverse)
library(minpack.lm)

# data pre-processed using Python
data = read.csv('../data/original_exp2data.csv')

data = data %>%
  group_by(SubjectID, L) %>%
  summarize(EstYstd = sd(EstYAdjusted))

subjects = unique(data$SubjectID)

# fit the power law EstYstd ~ a * L^b separately for each subject
# and collect the per-subject scaling exponents b
bs = numeric()
for (s in subjects) {
  x = data[data$SubjectID==s,]
  fit = nlsLM(EstYstd ~ a*L^b, data=x, start=list(a=1, b=1))
  bs <- c(bs, coef(fit)['b'])
}

mean(bs)
## [1] 0.8048793
sd(bs)
## [1] 0.3070846
(1-mean(bs))/sd(bs) # Cohen's d: deviation of the mean exponent from 1, in SD units
## [1] 0.6353973
t.test(bs, mu=1)
## 
##  One Sample t-test
## 
## data:  bs
## t = -6.354, df = 99, p-value = 6.445e-09
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  0.7439470 0.8658115
## sample estimates:
## mean of x 
## 0.8048793

This analysis revealed that the effect size of the sublinearity of the scaling exponent is 0.64 (Cohen's d). A power analysis (using G*Power) indicated that we need the following sample sizes to detect this effect:
to achieve 80% power, we need N = 22;
to achieve 90% power, we need N = 29;
to achieve 95% power, we need N = 35.
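
As a sanity check on these G*Power numbers, the same calculation can be approximated with R's built-in power.t.test, using a one-sample, two-sided test of the exponent against 1 and the Cohen's d computed above; the resulting n values should round up to approximately the sample sizes listed above.

d = 0.6353973  # Cohen's d from the reproduction above
power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80, type = "one.sample")$n
power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.90, type = "one.sample")$n
power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.95, type = "one.sample")$n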

Planned Sample

The full task takes about 10 minutes to complete (7 minutes when I tried it, plus buffer time for the consent process and/or slower responses). To be consistent with the federal minimum wage ($7.25/hour), each participant will therefore be paid $1.21 (10/60 hour × $7.25/hour ≈ $1.21). To allow a buffer for additional exclusions, we will recruit N = 40 participants.

Differences from Original Study

Sample: Both this replication project and the original study recruit participants from MTurk, but the specific sample will differ. This should not affect the results.

Materials and procedure: Some details of the experimental design might need to be extrapolated, although I am actively contacting the original authors for support/clarification. Specifically,

  1. The criteria for the targeted participant pool (i.e. educated US adults), for example, having finished high school or college.
  2. The displayed length of the line segments that form the base angles.
  3. The size of the dot to be moved, relative to the triangles.
  4. The task parameters (i.e., base angle, base length) used in the practice trial.

In this replication project, I implemented the following for each of the above:
1. MTurk participants are filtered by a HIT approval rate of at least 92%, more than 5000 completed HITs, and US location.
2. The displayed length of the line segments that form the base angles is 0.04. This was chosen so that even the smallest triangle appears incomplete.
3. The dot to be moved is 10px by 10px.
4. The practice trial triangle has a base length of 0.66 and a base angle of 56 degrees.

These should not affect the results.

Exclusion: I will implement the following exclusion criteria:
1. Participants who failed to complete all 150 trials will be excluded.
2. Participants whose y-axis estimate fell below the triangle base (y=0) for more than 20% of the trials will be excluded.
3. Participants whose x-axis estimate fell to the left of the x-coordinate of the left vertex or to the right of the x-coordinate of the right vertex for more than 20% of the trials will be excluded.

Analysis:
1. A particular data analysis step from the paper will not be applicable: “To match the scales for the two groups, we divided the estimates of the second group by a ratio of 13/9.”
2. The original paper performed the regression with a nonlinear-fitting function in Mathematica 11.0 (the specific function is not named), while I plan to use nlsLM() in R. This should not affect the results.

Methods Addendum (Post Data Collection)

Actual Sample

Our actual sample size is N=40, with 20 participants in Group 1 and 20 in Group 2. After applying the exclusion criteria, 3 subjects were excluded from all subsequent analyses (2 from Group 1 and 1 from Group 2), leaving N=37.

Differences from pre-data collection methods plan

I fixed some bugs in the previous (pilot-b stage) implementation of the analysis steps, but no differences from the pre-data-collection methods plan were introduced.

Results

Data preparation

Data preparation is performed using Python in a Jupyter notebook.
Code can be found in the project git repo: https://github.com/psych251/hart2018/tree/master/code/preprocess.ipynb

Main confirmatory analysis

Note that, following the original paper, the error bars in the plots indicate the standard deviation of the measures across subjects (drawn as ±SD/2 around the mean, so the full bar spans one standard deviation).

Load data and exclude subjects:

library(tidyverse)
library(minpack.lm)

# raw MTurk data is preprocessed using a Jupyter notebook
# preprocessing steps include: 
#   extracting relevant columns
#   organizing into tidy form
#   adding triangle parameters for each trial
# Link: https://github.com/psych251/hart2018/tree/master/code/preprocess.ipynb
data = read.csv('../data/final_postprocess_corrected.csv')
data  = data %>%
  mutate(deltaY=scaledY-topY) %>% # create a column for y-axis estimate deviation
  filter(trial!='practice') # exclude practice trials

exclude_subject= function(s) {
  x = data[data$subject==s, ]
  # exclusion 1. Participants who failed to complete all 150 trials will be excluded.
  if (nrow(x)!= 150) { return(TRUE) }
  if (sum(is.na(x$estY))>0) { return(TRUE) }
  # exclusion 2. Participants whose y-axis estimate fell below the triangle base (y=0) for more than 20% of the trials will be excluded.
  if ((sum(x$estY<0)/length(x$estY)) > 0.2) { return(TRUE) }
  # exclusion 3. Participants whose x-axis estimate fell to the left of the x-coordinate of the left vertex or to the right of the x-coordinate of the right vertex for more than 20% of the trials will be excluded.
  left = sum(x$scaledX < x$leftX)
  right = sum(x$scaledX > x$rightX)
  if ((left + right)/length(x$scaledX) > 0.2) { return(TRUE) }
  return(FALSE)
}

subjects = unique(data$subject)
exclude = numeric()
for (s in subjects) {
  if (exclude_subject(s)) {exclude=c(exclude, s)}
}

data = data %>% filter(! subject %in% exclude)

Non-linear regression: δ ~ a1 * L^b1

delta = data %>%
  group_by(subject, sideLen) %>%
  summarize(meanDeltaY = mean(deltaY))

subjects = unique(delta$subject)

# fit meanDeltaY ~ a * sideLen^b separately for each retained subject and collect the coefficients
as = numeric()
b1 = numeric()
for (s in subjects) {
  x = delta[delta$subject==s,]
  fit = nlsLM(meanDeltaY ~ a*(sideLen)^b, data=x, start=list(a=1, b=1))
  as <- c(as, coef(fit)['a'])
  b1 <- c(b1, coef(fit)['b'])
}

mean(as)
## [1] -0.2557866
mean(b1)
## [1] 2.789955
t.test(b1, mu=1)
## 
##  One Sample t-test
## 
## data:  b1
## t = 6.4629, df = 36, p-value = 1.68e-07
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  2.228254 3.351656
## sample estimates:
## mean of x 
##  2.789955

Non-linear regression: σ ~ a2 * L^b2

sigma = data %>%
  group_by(subject, sideLen) %>%
  summarize(estYstd = sd(scaledY))

subjects = unique(sigma$subject)

# fit estYstd ~ a * sideLen^b separately for each retained subject and collect the coefficients
as = numeric()
b2 = numeric()
for (s in subjects) {
  x = sigma[sigma$subject==s,]
  fit = nlsLM(estYstd ~ a*sideLen^b, data=x, start=list(a=1, b=1))
  as<- c(as, coef(fit)['a'])
  b2 <- c(b2, coef(fit)['b'])
}

mean(as)
## [1] 0.1524901
mean(b2)
## [1] 0.5710148
t.test(b2, mu=1)
## 
##  One Sample t-test
## 
## data:  b2
## t = -6.4501, df = 36, p-value = 1.747e-07
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  0.4361293 0.7059003
## sample estimates:
## mean of x 
## 0.5710148

Plot 1: Mean error of y-coordinate estimation (δ) as a function of triangle side length

plotdata1 = data %>%
  group_by(subject, baseAngle, sideLen) %>%
  summarize(meanDeltaY=mean(deltaY)) %>% # within subject mean y-axis deviation
  group_by(baseAngle, sideLen) %>%
  summarize(stdDeltaY=sd(meanDeltaY), meanDeltaY=mean(meanDeltaY)) # across subject mean and std y-axis deviation

plotdata1$baseAngle = as.factor(plotdata1$baseAngle) # for coloring
ggplot(plotdata1, aes(x=sideLen, y=meanDeltaY, color=baseAngle)) + 
  geom_point() +
  geom_errorbar(aes(ymin=meanDeltaY-stdDeltaY/2, ymax=meanDeltaY+stdDeltaY/2)) + 
  labs(x='L, triangle side length', y='δ, bias', color='Base Angle')

Plot 2: Mean standard deviation of y-coordinate estimation (σ) as a function of triangle side length

plotdata2 = data %>%
  group_by(subject, baseLen, sideLen) %>%
  summarize(sigmaY=sd(scaledY)) %>% # within subject std y-axis estimates
  group_by(baseLen, sideLen) %>%
  summarize(stdSigmaY=sd(sigmaY), meanSigmaY=mean(sigmaY)) # across subject std and mean of y-axis estimates

plotdata2$baseLen = as.factor(plotdata2$baseLen) # for coloring
ggplot(plotdata2, aes(x=sideLen, y=meanSigmaY, color=baseLen)) + 
  geom_point() +
  geom_errorbar(aes(ymin=meanSigmaY-stdSigmaY/2, ymax=meanSigmaY+stdSigmaY/2)) + 
  labs(x='L, triangle side length', y='σ, std', color='Base Length')

Original paper (Figure S4, MTurk study, N=100):
Note that the delta and sigma (y-axis values) in the original paper's analyses were calculated from pixel-space y-estimates, whereas this replication calculated these values in the scaled space.
[Figure S4 from the original paper]

Discussion

Summary of Replication Attempt

This replication was a partial success. The first key analysis did not replicate: in the replication data, δ scaled superlinearly with L (M=2.79, SD=1.68, t(36)=6.46, p<0.001) rather than linearly. The second key analysis replicated, supporting the sublinear scaling of σ with L (M=0.57, SD=0.40, t(36)=-6.45, p<0.001).

Commentary

It was a short time frame in which to generate the stimuli, code up the paradigm and the analysis, run pilot experiments, and perform the final data collection, but this opportunity to replicate a full experiment was truly rewarding. A challenge I had not anticipated was the intricacy of generating the task images so that the triangle bases and corners sit exactly at the border pixel rows of the image; this ensures that the click responses computed with respect to the HTML image container match the coordinate system used to generate the triangle images. It also took some thought to design the experiment so that the task would not be a pain for participants to perform. For example, rather than having participants manually click on the small dot and drag it, I implemented it so that the dot moves with the cursor and a click drops it. Although part of the original results did not replicate, the takeaway is the same: people are pretty bad at standard geometric form reasoning, and they do not appear to stick to Euclidean rules when deploying geometric intuitions.