Replication of the initial experiment from Mark Ho et al. (2022), "People construct simplified mental representations to plan", Nature.

Author

Justin Yang (justin.yang@stanford.edu)


Introduction

I will replicate the results shown in Figure 3 of the paper People construct simplified mental representations to plan by Mark Ho et al. (2022).

In this paper, the authors propose task construal, a computational framework in which planners first construct simplified representations of a task and then plan over them, and argue that this capacity to control our mental representations better mirrors human behavior. Specifically, the authors derive a model of value-guided construal from the resource-rational premise that an ideal but cognitively limited decision-maker should form construals that balance the complexity of a representation against its usefulness for planning and acting. I am interested in understanding how we might construct and use flexible representations for mental simulation, so this work in the planning domain is highly relevant. In particular, their use of memory probes to assess whether an object was represented during planning is methodologically relevant, as I look for ways to explore the contents of mental representations in my own research.
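Concretely, the model scores a candidate construal \(c\) by a trade-off roughly of the form \(\mathrm{VOR}(c) = U(c) - C(c)\), where \(U(c)\) is the utility of planning and acting on the simplified representation and \(C(c)\) is a cost that grows with the representation's complexity (e.g., the number of obstacles attended to); the probability of an obstacle being included in the construal is then derived from construals that score well under this trade-off.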

For this replication, I will recreate the initial experiment, in which participants are first asked to navigate a maze composed of tetromino-shaped obstacles and then probed on their awareness of each obstacle after completing the navigation. Specifically, I will reproduce the finding in Figure 3a: a histogram of participants’ mean awareness of each obstacle, split on whether the value-guided construal model assigned a probability of at most 0.5 or greater than 0.5 of including the obstacle in the construal.

Rather than using psiTurk and jsPsych (v.6.0.1), I will be using Prolific on its own and migrating the code to the most recent version of jsPsych (v.7.3.3) for ease of future use. I will use the same stimuli as the original paper, and rather than regenerating model predictions from scratch, I will compare the behavioral results directly against the model outputs released with the original paper. The primary challenge will lie in recreating the behavioral experiment.

  • Link to repository
  • Link to paper
  • Link to behavioral experiment (doesn’t exist yet)

Methods

Power Analysis

For the primary analysis, the authors report \(\chi^2(1, N=84)=23.03\), \(p = 1.6 \times 10^{-6}\), effect size \(w = 0.52\). A full power analysis (the sample sizes needed to achieve 80%, 90%, and 95% power to detect this effect size, along with feasibility considerations for selecting the planned sample size) will be covered next week.
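As a sketch of what that analysis will look like, the required \(N\) for a \(\chi^2\) test with one degree of freedom can be solved from the reported effect size using statsmodels; note that \(N\) here counts the observations entering the test, which in the original analysis are maze–obstacle pairs rather than participants.

from statsmodels.stats.power import GofChisquarePower

# Power analysis for a chi-square test with df = n_bins - 1 = 1,
# using the effect size reported in the original paper (w = 0.52)
w, alpha = 0.52, 0.05
solver = GofChisquarePower()
for power in (0.80, 0.90, 0.95):
    n = solver.solve_power(effect_size=w, alpha=alpha, power=power, n_bins=2)
    print(f"{power:.0%} power: N = {n:.1f}")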

Planned Sample

In the original paper, the authors requested 200 participants on Prolific and excluded a trial if any of the following occurred during navigation:

  • >5,000 ms was spent at the initial state
  • >2,000 ms was spent at any non-initial state
  • >20,000 ms was spent on the entire trial
  • >1,500 ms was spent in the last three steps

Participants with <80% of trials remaining after exclusions, or who failed two of three comprehension questions, were excluded, resulting in n = 161 participants’ data being analysed. The authors reported the following demographics: median age of 28; 81 male, 75 female, 5 neither.
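These criteria translate directly into a filtering step. Below is a minimal sketch, assuming a hypothetical per-step dataframe nav with columns sessionId, round, trialnum (step index within a trial), and rt (milliseconds spent at each step), mirroring the navtrials format used in the data preparation below; the comprehension-question exclusion is omitted.

import pandas as pd

def trial_ok(t):
    # Apply the four trial-level exclusion criteria; the last-three-steps
    # criterion is interpreted here as the total time over those steps
    t = t.sort_values("trialnum")
    return (
        t["rt"].iloc[0] <= 5000                # initial state
        and t["rt"].iloc[1:].le(2000).all()    # any non-initial state
        and t["rt"].sum() <= 20000             # entire trial
        and t["rt"].iloc[-3:].sum() <= 1500    # last three steps
    )

# `nav` is the hypothetical per-step dataframe described above
trial_keep = nav.groupby(["sessionId", "round"]).apply(trial_ok)
prop_kept = trial_keep.groupby("sessionId").mean()
included = prop_kept[prop_kept >= 0.8].index   # keep participants with >=80% of trials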

Materials

Materials relevant to the study can be found here. The preregistration is here. The primary material needed is the set of 12 mazes, which is found here.

Procedure

The initial experiment used a maze-navigation task in which participants moved a circle from a starting location on a grid to a goal location using the arrow keys. The initial maze set consisted of twelve 11×11 mazes, each with seven blue tetromino-shaped obstacles and centre walls arranged in a cross that blocked movement. On each trial, participants were first shown a screen displaying only the centre walls. When they pressed the spacebar, the circle they controlled, the goal, and the obstacles appeared, and they could begin moving immediately. To ensure that participants remained focused on moving, a green square was placed on the goal that shrank and would disappear after 1,000 ms but reset whenever an arrow key was pressed; at the beginning of the trial, the green square took longer to shrink (5,000 ms). Participants received US$0.10 for reaching the goal before the green square disappeared (in addition to the base pay of US$0.98). The mazes were pseudorandomly rotated or flipped, so the start and goal locations varied across trials, and the order of mazes was pseudorandomized. After completing each trial, participants received awareness probes, which showed a static image of the maze they had just navigated, with one of the obstacles shown in light blue. Participants were asked “How aware of the highlighted obstacle were you at any point?” and responded on an eight-point scale (rescaled to 0–1 for analyses). Probes were presented for all seven obstacles in a maze, and none were associated with a bonus.
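The pseudorandom rotations and flips can be applied directly to the maze tile arrays. Below is a minimal sketch, assuming mazes are stored as tuples of equal-length strings (the format loaded in the data preparation below); the original experiment’s exact transformation and counterbalancing scheme may differ.

import random

def transform_maze(tile_array, rng=random):
    # Randomly flip horizontally, then rotate by 0-3 quarter turns;
    # start (S), goal (G), walls, and obstacles move with the grid
    rows = [list(r) for r in tile_array]
    if rng.random() < 0.5:
        rows = [r[::-1] for r in rows]
    for _ in range(rng.randrange(4)):
        rows = [list(r) for r in zip(*rows[::-1])]  # 90 degrees clockwise
    return tuple("".join(r) for r in rows)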

Analysis Plan

The analysis plan is as follows:

The key analysis of interest is the \(\chi^2\) test for independence. The authors first split the obstacles across mazes on the basis of whether the value-guided construal model assigned a probability of less than or equal to 0.5 or greater than 0.5. Then after also splitting on the basis of whether mean awareness responses were less than or equal to 0.5 or greater than 0.5, they used a \(\chi^2\) test for independence and found that this awareness split was predicted by value-guided construal (\(\chi^2(1, N=84)=23.03\), \(p = 1.6 \times 10^{-6}\), effect size \(w = 0.52\)).
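As a consistency check, the reported effect size follows from the test statistic and sample size: \(w = \sqrt{\chi^2/N} = \sqrt{23.03/84} \approx 0.52\), where \(N = 84\) is the number of maze–obstacle pairs (12 mazes × 7 obstacles) rather than the number of participants.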

I also aim to reconstruct the plot in Figure 3a, a histogram of participants’ mean awareness responses, with the data split on whether value-guided construal assigned a probability of less than or equal to 0.5 or greater than 0.5.
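A minimal sketch of that plot, assuming the per-obstacle exp1_means table constructed in the confirmatory analysis below; the bin count and styling are guesses at the original figure.

import numpy as np
import matplotlib.pyplot as plt

def plot_fig3a(exp1_means):
    # Split per-obstacle mean awareness by the model's construal probability
    low = exp1_means.loc[exp1_means["static_vgc_weight"] <= .5, "attention_N"]
    high = exp1_means.loc[exp1_means["static_vgc_weight"] > .5, "attention_N"]
    bins = np.linspace(0, 1, 9)
    plt.hist(low, bins=bins, alpha=.6, label="model probability <= 0.5")
    plt.hist(high, bins=bins, alpha=.6, label="model probability > 0.5")
    plt.xlabel("Mean awareness rating")
    plt.ylabel("Number of obstacles")
    plt.legend()
    plt.show()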

Differences from Original Study

I will aim to recreate the experiment precisely using jsPsych version 7.3.3; however, differences between jsPsych versions may result in small changes in the visual display (e.g., font). Aside from that, a separate random sample of participants will be recruited on Prolific.

Results

Data preparation

Data preparation is done in Python, adapting the original data preparation code, which is available here.

import json
import logging
import os
import sys
import warnings
from functools import lru_cache
from itertools import combinations, product

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tqdm import tqdm

from msdm.domains import GridWorld
from vgc_project import utils, gridutils
from vgc_project.modelinterface import create_modeling_interface

# Standard error of the mean over non-NaN entries
sem = lambda c: np.nanstd(c) / np.sqrt(np.sum(~np.isnan(c)))

logger = logging.getLogger()
logging.basicConfig(stream=sys.stdout)
logger.setLevel(logging.WARNING)

DATA_DIRECTORY = "../experiments"

# ========= #
# Utilities #
# ========= #
# Build a (cached) GridWorld from a tile array; obstacle cells are the digits 0-9
@lru_cache()
def make_gridworld(tile_array):
    gw = GridWorld(
        tile_array,
        initial_features="S",
        absorbing_features="G",
        wall_features="#0123456789",
        step_cost=-1
    )
    def plot(*args, **kwargs):
        kwargs = {
            "featurecolors": {
                "#": 'k',
                "G": "green",
                **{k: 'mediumblue' for k in "0123456789"}
            },
            "plot_walls": False,
            **kwargs
        }
        return GridWorld.plot(gw, *args, **kwargs)
    gw.plot = plot
    return gw

# Minimum distance between the navigated trajectory and each obstacle in the maze
def calc_nav_mindist(navtrial):
    flocs = make_gridworld(mazes[navtrial['grid']]).feature_locations
    dists = {}
    for obs, locs in flocs.items():
        if obs not in "0123456789":
            continue
        locs = [(l['x'], l['y']) for l in locs]
        dist = gridutils.min_dist(navtrial['state_traj'], locs)
        dists[f"obs-{obs}"] = dist['mindist']
    return pd.Series(dists)

# Time step at which the navigated trajectory came closest to each obstacle
def calc_nav_mindist_timestep(navtrial):
    flocs = make_gridworld(mazes[navtrial['grid']]).feature_locations
    steps = {}
    for obs, locs in flocs.items():
        if obs not in "0123456789":
            continue
        locs = [(l['x'], l['y']) for l in locs]
        dist = gridutils.min_dist(navtrial['state_traj'], locs, sourcename='traj')
        mindist_loc = (dist['mintrajloc.x'], dist['mintrajloc.y'])
        mindist_step = max(i for i, loc in enumerate(navtrial['state_traj']) if tuple(loc) == mindist_loc)
        steps[f"obs-{obs}"] = mindist_step
    return pd.Series(steps)

# ================= #
# Model Predictions #
# ================= #
mazes_0_11 = json.load(open(os.path.join(DATA_DIRECTORY, "mazes/mazes_0-11.json"), 'r'))
mazes_12_15 = json.load(open(os.path.join(DATA_DIRECTORY, "mazes/mazes_12-15.json"), 'r'))
mazes = {
    **{"-".join(k.split('-')[:-1]): tuple(v) for k,v in mazes_0_11.items()},
    **{k: tuple(v) for k,v in mazes_12_15.items()}
}

mods = create_modeling_interface(joblib_cache_location="./_analysiscache")
model_preds = []
for grid, tile_array in mazes.items():
    for obs in sorted(set("0123456789") & set.union(*[set(r) for r in tile_array])):
        preds = {
            "grid": grid,
            "obstacle": f"obs-{obs}",
            **mods.predictions(tile_array, obs, seed=72193880),
        }
        model_preds.append(preds)
model_preds = pd.DataFrame(model_preds)

to_zscore = [
    # 'vgc_weight',
    'static_vgc_weight',
    'dynamic_vgc_weight',
    'log_traj_based_hitcount',
    'graph_based_hitcount',
    'goal_dist',
    'start_dist',
    'optpolicy_dist',
    'walls_dist',
    'center_dist',
    'bottleneck_dist',
    'sr_occ'
]
for col in to_zscore:
    model_preds[col+"_Z"] = utils.zscore(model_preds[col])

# ================= #
# Experiment 1 Data #
# ================= #
@lru_cache()
def get_exp1_nt():
    def parse_exp1_navtrial(t):
        t = t.sort_values('trialnum')
        return pd.Series({
            'state_traj': list(t['state']),
            'initial_rt': t['rt'].iloc[0],
            'total_rt': t['rt'].sum()
        })
    exp1_nt = pd.DataFrame(json.load(open(os.path.join(DATA_DIRECTORY, "exp1/data/navtrials.json"), 'r')))
    exp1_nt = exp1_nt.groupby(['sessionId', 'round', 'grid', 'trans'])\
        .apply(parse_exp1_navtrial).reset_index()
    exp1_nt['grid'] = exp1_nt['grid'].apply(lambda gi: f"grid-{gi}")
    return exp1_nt

@lru_cache()
def get_exp1_navdist():
    exp1_nt = get_exp1_nt()
    exp1_navdist = pd.concat([
        exp1_nt[['sessionId', 'round', 'grid']],
        exp1_nt.apply(calc_nav_mindist, axis=1)
    ], axis=1).melt(id_vars=['sessionId', 'round', 'grid'], var_name="obstacle", value_name="nav_mindist")
    return exp1_navdist

@lru_cache()
def get_exp1_navdist_timestep():
    exp1_nt = get_exp1_nt()
    exp1_navdist_timestep = pd.concat([
        exp1_nt[['sessionId', 'round', 'grid']],
        exp1_nt.apply(calc_nav_mindist_timestep, axis=1)
    ], axis=1).melt(id_vars=['sessionId', 'round', 'grid'], var_name="obstacle", value_name="nav_mindist_timestep")
    return exp1_navdist_timestep

@lru_cache()
def get_exp1_at():
    print("Loading Experiment 1 Attention Trials")
    exp1_at = pd.DataFrame(json.load(open(os.path.join(DATA_DIRECTORY, "exp1/data/attentiontrials.json"), 'r')))
    exp1_at = exp1_at[['sessionId', 'round', 'grid', 'trans', 'probeobs', 'queryround', 'attention', 'rt']]\
        .rename(columns={"probeobs": "obstacle", "queryround": "proberound"})
    exp1_at['grid'] = exp1_at['grid'].apply(lambda gi: f"grid-{gi}")
    exp1_at['obstacle'] = exp1_at['obstacle'].apply(lambda oi: f"obs-{oi}")
    exp1_at['attention_N'] = utils.normalize(exp1_at['attention'], minval=-4, maxval=4)
    exp1_navdist = get_exp1_navdist()
    exp1_navdist_timestep = get_exp1_navdist_timestep()
    exp1_at = exp1_at.\
        merge(exp1_navdist, on=['sessionId', 'round', 'grid', 'obstacle']).\
        merge(exp1_navdist_timestep, on=['sessionId', 'round', 'grid', 'obstacle'])
    exp1_at['nav_mindist_Z'] = utils.zscore(exp1_at['nav_mindist'])
    exp1_at['nav_mindist_timestep_Z'] = utils.zscore(exp1_at['nav_mindist_timestep'])
    exp1_at = exp1_at.merge(model_preds)

    # Check that no two columns are identical (guards against a degenerate merge)
    for c1, c2 in product(exp1_at.columns, repeat=2):
        if c1 != c2:
            assert not all(exp1_at[c1] == exp1_at[c2])
    return exp1_at

Confirmatory analysis

The analysis code can be found here.

A side-by-side comparison of the replication histogram and the original Figure 3a will be presented here.

\(\chi^2\) test

from scipy import stats
import analysisutils  # helper module from the original analysis code (provides pval_to_string)

# Per-obstacle means of awareness and model construal probability
exp1_at = get_exp1_at()
exp1_means = exp1_at.groupby(['grid', 'obstacle'])[["attention_N", "static_vgc_weight"]].mean().reset_index()

# 2x2 contingency table: awareness split x model split
exp1_contab = pd.crosstab(exp1_means["attention_N"] >= .5, exp1_means["static_vgc_weight"] >= .5)
chi2, pval, dof, exp = stats.chi2_contingency(exp1_contab, correction=False)
chi2_effectsize_w = np.sqrt(chi2/len(exp1_means))
pval = analysisutils.pval_to_string(pval)
exp1_chi2_res = rf"$\chi^2({dof}, N={len(exp1_means)})={chi2:.2f}$, $p {pval}$, effect size $w = {chi2_effectsize_w:.2f}$"

with open("./inputs/exp1_chi2_res_svgc.tex", "w") as f:
    f.write(exp1_chi2_res)
print(exp1_chi2_res)

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.