Replication of Experiment 2 in Hart et al. (2018, Scientific Reports)

Introduction

Choice of experiment

Hart et al. (2018) used a simple planar geometric task to examine why people’s intuitive geometric reasoning deviated from deterministic, rule-based Euclidean representations and provided a computational account for such deviation. This is particularly relevant to my own work in which I seek to model the acquisition of geometric intuitions.

Description of paper

Across 5 variations of the Triangle Completion Task, Hart et al. (2018) found that the distribution of participants’ error in estimating the missing corner of a fragmented triangle was scale-dependent, contrary to a static Euclidean representation. The study then showed that the statistical characteristics of a correlated random walk model captured the simulation-based strategies that might have guided the geometric reasoning process in the Triangle Completion Task.

I aim to replicate the main behavioral findings of Experiment 2 of Hart et al. (2018) (Figure S4 A, B), where the study showed that (1) the mean vertical estimate of the missing vertex was biased toward the base, with the bias increasing linearly with side length, and (2) the standard deviation of the y-coordinate estimates scaled sub-linearly with side length.

Stimuli and procedures

Participants were recruited via Amazon’s Mechanical Turk and randomly assigned to two equal-sized groups. Each participant was shown 15 different incomplete isosceles triangles, 10 times each, in random order. The triangles combined 3 base angles (Group 1: 30, 45, and 60 degrees; Group 2: 36, 51, and 66 degrees) with 5 base lengths (0.1, 0.25, 0.5, 0.75, and 1 of the maximum base length). For each triangle, participants were asked to position a dot at the estimated location of the missing vertex. A practice trial preceded the experiment.
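The 15 stimuli per group are the crossing of 3 base angles with 5 base lengths. The following is a minimal sketch for Group 1, assuming the base sits on y = 0, base lengths are expressed as fractions of the maximum base length, and the side length L is the length of each equal side. The names `stimuli`, `apexY`, and `sideLen` are illustrative; whether the original materials or the preprocessing compute these quantities in exactly this way is an assumption.

```r
# Group 1 design: 3 base angles x 5 base lengths = 15 triangles
base_angles  <- c(30, 45, 60)                # degrees
base_lengths <- c(0.1, 0.25, 0.5, 0.75, 1)   # fraction of the maximum base length

stimuli <- expand.grid(baseAngle = base_angles, baseLen = base_lengths)
theta <- stimuli$baseAngle * pi / 180        # convert degrees to radians

# For an isosceles triangle with its base on y = 0 (assumed geometry):
stimuli$apexY   <- (stimuli$baseLen / 2) * tan(theta)  # height of the missing vertex
stimuli$sideLen <- (stimuli$baseLen / 2) / cos(theta)  # length of each equal side

print(stimuli)
```

Because side length depends on both base length and base angle, the 15 triangles have 15 distinct values of L under these assumptions.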

Anticipated challenges

Programming the experiment and ensuring browser compatibility.
Resolving the pixel scaling between the two groups as described in the original paper.

Links

Project Repo: https://github.com/psych251/hart2018
Original Paper: https://github.com/psych251/hart2018/tree/master/original_paper
Experiment Implementation: http://web.stanford.edu/~liyuxuan/mturk/experiment.html

Methods

Materials

Materials as quoted from the original paper:

“All participants were shown 15 (fragmented isosceles) triangles, at base lengths of 0.1, 0.25, 0.5, 0.75, 1 of the maximum base length (which was 900 pixels for group 1 and 1300 pixels for group 2). Each base length presented 3 different angle sizes (30, 45 and 60 degrees for group 1 and 36, 51 and 66 degrees for group 2)…group 1 saw a y-coordinate length scale of 900 pixels and group 2 saw a y-coordinate length scale of 1300 pixels.”

The stimuli used in this replication project will follow the materials quoted above, except the images will be 2175px by 1425px, and both groups will see the same y-coordinate length.

Procedure

The task procedure used in this replication project will precisely follow the description in the original paper:

“… we showed participants 15 different incomplete isosceles triangles 10 times in a random order (for a total of 150 triangles for each participant)…For each triangle, we asked participants to position a dot in the estimated location of the missing vertex. Before the experiment began, participants had one practice trial, in which the location of the missing vertex was indicated by a dot of a different color, and they were asked to position their dot on the indicated position.”

Analysis Plan

I plan to use Python for data cleaning, pre-processing, and plotting, and the nls() function in R for nonlinear regression.

Main variables of interest
δ, the mean estimated y-coordinate deviation from the true y-coordinate location of the missing vertex
σ, the standard deviation of participants’ distribution of responses
L, triangle side length

Key analyses of interest
1. δ biases towards the base (no statistical test)
2. δ ~ L^x1, δ increases linearly with L (x1 close to 1)
3. σ ~ L^x2, σ scales sub-linearly with L (x2 < 1)

In addition, I will use a one-sample t-test (comparing the fitted exponents against 1) as the statistical test for the sublinearity of the scaling exponent.
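To make the planned test concrete, here is a hedged sketch on simulated data: each hypothetical participant's σ is generated as a power function of L with exponent 0.65, the exponent is recovered per participant with nls(), and the exponents are then tested against 1. The simulated values, the helper `fit_exponent`, and the use of a log-log linear fit for starting values are assumptions made for illustration only, not part of the original analysis.

```r
set.seed(1)

# Hypothetical data: 20 simulated participants, 15 side lengths each,
# with sigma generated as 2 * L^0.65 times multiplicative noise.
L <- seq(50, 900, length.out = 15)
n_subj <- 20

fit_exponent <- function(L, sigma) {
  # A log-log linear fit supplies starting values for the nonlinear fit
  init <- lm(log(sigma) ~ log(L))
  start <- list(a = exp(coef(init)[[1]]), b = coef(init)[[2]])
  fit <- nls(sigma ~ a * L^b, start = start)
  coef(fit)[["b"]]
}

exponents <- sapply(seq_len(n_subj), function(i) {
  sigma <- 2 * L^0.65 * exp(rnorm(length(L), sd = 0.1))
  fit_exponent(L, sigma)
})

# One-sample t-test of the fitted exponents against 1 (linear scaling)
result <- t.test(exponents, mu = 1)
print(mean(exponents))
print(result$p.value)
```

A mean exponent reliably below 1 in this test is what sub-linear scaling would look like; an exponent indistinguishable from 1 would indicate linear scaling.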

Additional data analysis steps from the paper that will be followed: “The mean deviation from the true location of the missing vertex… and the standard deviation were calculated for each participant and then averaged across participants.”
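The two-stage aggregation quoted above can be sketched in base R as follows. The trial-level data here are simulated, and the column names (`subject`, `sideLen`, `deltaY`) are chosen for illustration; this is a sketch of the averaging order, not the original authors' code.

```r
set.seed(2)

# Hypothetical trial-level data: 3 participants x 2 side lengths x 10 repeats.
# deltaY = estimated y minus true y of the missing vertex (negative = toward the base).
trials <- expand.grid(rep = 1:10, sideLen = c(100, 200),
                      subject = c("s1", "s2", "s3"))
trials$deltaY <- rnorm(nrow(trials), mean = -0.1 * trials$sideLen, sd = 5)

# Step 1: per-participant mean deviation and standard deviation at each side length
delta_subj <- aggregate(deltaY ~ subject + sideLen, data = trials, FUN = mean)
sigma_subj <- aggregate(deltaY ~ subject + sideLen, data = trials, FUN = sd)

# Step 2: average the per-participant values across participants
delta_group <- aggregate(deltaY ~ sideLen, data = delta_subj, FUN = mean)
sigma_group <- aggregate(deltaY ~ sideLen, data = sigma_subj, FUN = mean)

print(delta_group)
print(sigma_group)
```

Computing the standard deviation within each participant first keeps between-participant differences in bias from inflating σ.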

Power Analysis

I performed the analysis pipeline on the open-access data from the original paper to find out the standard deviation of the key analysis variable, σ, as well as to perform the planned t-test in the replication analysis pipeline. Although the mean scaling exponent from this reproduction attempt did not align with the original one (reproduced: 0.79, original: 0.65), the effect was in the correct direction and significant.

library(tidyverse)

library(minpack.lm)

# data pre-processed using Python
data = read.csv('../data/original_exp2data.csv')

data = data %>%
  group_by(SubjectID, L) %>%
  summarize(EstYstd = sd(EstYAdjusted))

subjects = unique(data$SubjectID)

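# Fit a power function per subject and collect the exponent b.
# Note: the model regresses L on EstYstd, so b is the exponent of L as a
# function of σ rather than of σ as a function of L.
bs = numeric()
for (s in subjects) {
  x = data[data$SubjectID==s,]
  fit = nlsLM(L ~ a*EstYstd^b, data=x, start=list(a=1, b=1))
  bs <- c(bs, coef(fit)['b'])
}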

mean(bs)

## [1] 0.7943758

sd(bs)

## [1] 0.260697

(1-mean(bs))/sd(bs)

## [1] 0.7887477

t.test(bs, mu=1)

## 
##  One Sample t-test
## 
## data:  bs
## t = -7.8875, df = 99, p-value = 4.178e-12
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  0.7426479 0.8461038
## sample estimates:
## mean of x 
## 0.7943758

This analysis yields an effect size of Cohen’s d = 0.79 for the sublinearity of the scaling exponent, computed as (1 − mean exponent) / SD of the exponents. A power analysis based on this effect size gives the following required sample sizes: N = 15 for 80% power, N = 19 for 90% power, and N = 23 for 95% power.
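One way to run such a power analysis is with power.t.test() from base R's stats package. The sketch below assumes a two-sided one-sample t-test at α = 0.05; the report does not state which tool or settings were used, so the exact function and arguments are assumptions, and the output can be compared against the sample sizes reported above.

```r
d <- 0.79                        # effect size (Cohen's d) from the reproduction
targets <- c(0.80, 0.90, 0.95)   # desired power levels

n_needed <- sapply(targets, function(p) {
  res <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = p,
                      type = "one.sample", alternative = "two.sided")
  ceiling(res$n)   # round up to a whole participant
})

print(data.frame(power = targets, n = n_needed))
```

Setting `sd = 1` makes `delta` equal to the standardized effect size, so Cohen's d can be passed in directly.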

Planned Sample

The full task takes about 10 minutes to complete (about 7 minutes in my own test run, plus buffer time for the consent process and slower responses). Therefore, to be consistent with the US federal minimum wage ($7.25/hour), each participant will be paid $1.21. To allow a buffer for exclusions, we will recruit N = 40 participants.
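The payment figure follows from simple arithmetic, sketched below. The total is only the sum of participant payments for the planned sample; any platform fees are not covered in this report and are left out.

```r
task_minutes <- 10     # estimated task duration, including buffer
hourly_wage  <- 7.25   # USD per hour, as stated above
n_planned    <- 40     # planned sample size

pay_per_participant <- round(task_minutes / 60 * hourly_wage, 2)
total_pay <- pay_per_participant * n_planned   # excludes any platform fees

print(pay_per_participant)
print(total_pay)
```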

Differences from Original Study

Sample: Both this replication project and the original study recruit participants from MTurk, but the specific sample will differ. This should not affect the results.

Materials and procedure: Some details of the experimental design might need to be extrapolated, although I am actively contacting the original author for support/clarification. Specifically,

The criteria for the targeted participant pool (i.e. educated US adults), for example, having finished high school or college.
The displayed length of the line segments that form the base angles.
The size of the dot to be moved, relative to the triangles.
The task parameters (i.e., base angle, base length) used in the practice trial.

These should not affect the results.

Exclusion: I will implement the following exclusion criteria:
1. Participants who failed to complete all 150 trials will be excluded.
2. Participants whose y-axis estimate fell below the triangle base (y=0) for more than 20% of the trials will be excluded.
3. Participants whose x-axis estimate fell to the left of the x-coordinate of the left vertex or to the right of the x-coordinate of the right vertex for more than 20% of the trials will be excluded.

Analysis:
1. A particular data analysis step from the paper will not be applicable: “To match the scales for the two groups, we divided the estimates of the second group by a ratio of 13/9.”
2. The original paper performed the regression using an unspecified function in Mathematica 11.0, while I plan to use nls() in R. This should not affect the results.

Methods Addendum (Post Data Collection)


Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation is performed using Python in a Jupyter notebook.
Code can be found in the project git repo: https://github.com/psych251/hart2018/tree/master/code/preprocess.ipynb

Confirmatory analysis

In a sandbox mock run with 30 trials, I intentionally clicked on the left vertex for each trial. The plot below shows that the “estimated” left vertex locations are closely aligned with the true left vertex locations (x-axis jittered).
[Figure: SanityCheck1]

I completed a second sandbox mock run with the full 150 trials. The plots below compare the two main analyses between this test data and the original paper’s results.

Sandbox test data:
[Figure: SanityCheck2]

Original paper (Figure S4, mturk study, N=100):
[Figure: PaperFigureS4]

Main analysis

library(tidyverse)
library(minpack.lm)

# raw mturk data is preprocessed using iPython Jupyter Notebook
# preprocessing steps include: 
#   extracting relevant columns
#   organizing into tidy form
#   add triangle parameters for each trial
# Link: https://github.com/psych251/hart2018/tree/master/code/preprocess.ipynb
data = read.csv('../data/pilotb_postprocess.csv')
data  = data %>%
  mutate(deltaY=scaledY-topY) %>% # create a column for y-axis estimate deviation
  filter(trial!='practice') # exclude practice trials

exclude_subject= function(s) {
  x = data[data$subject==s, ]
  # exclusion 1. Participants who failed to complete all 150 trials will be excluded.
  if (nrow(x)!= 150) { return(TRUE) }
  if (sum(is.na(x$estY))>0) { return(TRUE) }
  # exclusion 2. Participants whose y-axis estimate fell below the triangle base (y=0) for more than 20% of the trials will be excluded.
  if ((sum(x$estY<0)/length(x$estY)) > 0.2) { return(TRUE) }
  # exclusion 3. Participants whose x-axis estimate fell to the left of the x-coordinate of the left vertex or to the right of the x-coordinate of the right vertex for more than 20% of the trials will be excluded.
  left = sum(x$scaledX < x$leftX)
  right = sum(x$scaledX > x$rightX)
  if ((left + right)/length(x$scaledX) > 0.2) { return(TRUE) }
  return(FALSE)
}

subjects = unique(data$subject)
exclude = numeric()
for (s in subjects) {
  if (exclude_subject(s)) {exclude=c(exclude, s)}
}
data = data %>% filter(! subject %in% exclude)

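Non-linear regression: y-axis deviation ~ side length

delta = data %>%
  group_by(subject, sideLen) %>%
  summarize(meanDeltaY = mean(deltaY))

subjects = unique(delta$subject)

# Fit δ as a power function of L, as in the analysis plan (δ ~ L^x1).
# Regressing sideLen on meanDeltaY^b instead can fail when meanDeltaY is
# negative (estimates biased toward the base), because a negative base
# raised to a non-integer power is NaN in R.
bs = numeric()
for (s in subjects) {
  x = delta[delta$subject==s,]
  fit = nlsLM(meanDeltaY ~ a*sideLen^b, data=x, start=list(a=1, b=1))
  bs <- c(bs, coef(fit)['b'])
}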

mean(bs)
sd(bs)
(1-mean(bs))/sd(bs)
t.test(bs, mu=1)

Non-linear regression: side length ~ std y-axis estimates

sigma = data %>%
  group_by(subject, sideLen) %>%
  summarize(estYstd = sd(estY))

subjects = unique(sigma$subject)

bs = numeric()
for (s in subjects) {
  x = sigma[sigma$subject==s,]
  fit = nlsLM(sideLen ~ a*estYstd^b, data=x, start=list(a=1, b=1))
  bs <- c(bs, coef(fit)['b'])
}

mean(bs)

## [1] 0.05866644

sd(bs)

## [1] 0.8671245

(1-mean(bs))/sd(bs)

## [1] 1.085581

t.test(bs, mu=1)

## 
##  One Sample t-test
## 
## data:  bs
## t = -2.1712, df = 3, p-value = 0.1183
## alternative hypothesis: true mean is not equal to 1
## 95 percent confidence interval:
##  -1.321122  1.438455
## sample estimates:
##  mean of x 
## 0.05866644

Plot 1

plotdata1 = data %>%
  group_by(subject, baseAngle, sideLen) %>%
  summarize(meanDeltaY=mean(deltaY)) %>% # within subject mean y-axis deviation
  group_by(baseAngle, sideLen) %>%
  summarize(stdDeltaY=sd(meanDeltaY), meanDeltaY=mean(meanDeltaY)) # across subject mean and std y-axis deviation

plotdata1$baseAngle = as.factor(plotdata1$baseAngle) # for coloring
ggplot(plotdata1, aes(x=sideLen, y=meanDeltaY, color=baseAngle)) + 
  geom_point() +
  geom_errorbar(aes(ymin=meanDeltaY-stdDeltaY/2, ymax=meanDeltaY+stdDeltaY/2)) + 
  labs(x='L, triangle side length', y='δ, bias', color='Base Angle')

Plot 2

plotdata2 = data %>%
  group_by(subject, baseLen, sideLen) %>%
  summarize(sigmaY=sd(scaledY)) %>% # within subject std y-axis estimates
  group_by(baseLen, sideLen) %>%
  summarize(stdSigmaY=sd(sigmaY), meanSigmaY=mean(sigmaY)) # across subject std y-axis estimates

plotdata2$baseLen = as.factor(plotdata2$baseLen) # for coloring
ggplot(plotdata2, aes(x=sideLen, y=meanSigmaY, color=baseLen)) + 
  geom_point() +
  geom_errorbar(aes(ymin=meanSigmaY-stdSigmaY/2, ymax=meanSigmaY+stdSigmaY/2)) + 
  labs(x='L, triangle side length', y='σ, std', color='Base Length')

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.