I will replicate the results shown in Figures 5 and 6 of the paper Too Many Cooks: Bayesian Inference for Coordinating Multi-Agent Collaboration by Sarah Wu et al. (2021). In this paper, the authors propose Bayesian Delegation, a multi-agent learning mechanism that uses inverse planning to infer the hidden intentions of others, as a computational model of coordination, and show that it aligns with human behavior better than competing models. I am broadly interested in computational modeling and social cognition, so the paper's use of Bayesian Theory of Mind (BToM) fits my interests well. I am also interested in how we represent knowledge abstractly and, more recently, in task decomposition, which makes the paper's hierarchical planning at the level of actions and sub-goals a clear fit. Finally, the model ablations in Figure 6, which interrogate the importance of specific features of Bayesian Delegation, are methodologically relevant to my work, as I intend to run benchmarks that probe cognitive function using computational modeling.
For this replication, I will recreate the scene judgment task, in which participants are shown two agents working together in the Overcooked domain. I will use the same stimuli as the original paper rather than regenerating the scenes from the computational model. Instead of base HTML and JavaScript, however, I will re-code the task as a jsPsych plugin for ease of future use. To generate the figures and perform the model comparison, I will compare the human judgments I collect directly against the model outputs from the original paper (i.e., I will not recreate the model from scratch). The primary challenge will be recreating the behavioral experiment; the plots comparing model and human performance are easier to reproduce because they rely on existing model outputs.
Another student in the class may perform a computational replication of the models presented here. If so, I will also evaluate the outcomes against both the paper's outputs and the replicated model outputs.
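The planned comparison between human judgments and existing model outputs can be sketched as follows. This is a minimal illustration, assuming the comparison is a Pearson correlation between per-item mean human ratings and model predictions. The function names, the model labels `full_model` and `ablated_model`, and all numeric values are hypothetical placeholders, not data or identifiers from the original paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def compare_models(human_means, model_outputs):
    """Correlate per-item human means with each model's per-item predictions."""
    return {name: pearson_r(human_means, preds)
            for name, preds in model_outputs.items()}


# Placeholder values for illustration only (NOT real data from the study).
human_means = [0.9, 0.2, 0.6, 0.4]
model_outputs = {
    "full_model": [0.8, 0.1, 0.7, 0.5],      # hypothetical label
    "ablated_model": [0.5, 0.5, 0.4, 0.6],   # hypothetical label
}

if __name__ == "__main__":
    for name, r in compare_models(human_means, model_outputs).items():
        print(f"{name}: r = {r:.3f}")
```

In practice, the items would be the scene-by-judgment pairs from the stimuli, and the same function could be run once with the paper's model outputs and once with any replicated model outputs.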
Link to repository: https://github.com/psych251/wu2021
Link to paper: https://sarahawu.github.io/assets/papers/wu2021cooks.pdf
Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
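A calculation of this kind could be sketched as below. It assumes, purely for illustration, that the effect of interest is a Pearson correlation tested two-sided against zero, and it uses the standard Fisher z approximation for the required sample size. The value `r = 0.5` is a placeholder, not the effect size reported in the original paper, and `n_for_correlation` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, power, alpha=0.05):
    """Approximate n needed to detect correlation r (two-sided test).

    Uses the Fisher z approximation:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(|r|))^2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    if not 0 < power < 1:
        raise ValueError("power must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    placeholder_r = 0.5  # assumption for illustration, NOT the original effect size
    for target_power in (0.80, 0.90, 0.95):
        n = n_for_correlation(placeholder_r, target_power)
        print(f"power {target_power:.2f}: n >= {n}")
```

The approximation applies to independent observations; if the unit of analysis is a stimulus item averaged over participants, the number of participants per item would need to be considered separately.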
Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.
All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.
Can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.
Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.
Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.
Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.
You can comment this section out prior to final report with data collection.
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Any differences from what was described as the original plan, or “none”.
Data preparation following the analysis plan.
The analyses as specified in the analysis plan.
Side-by-side graph with original graph is ideal here
Any follow-up analyses desired (not required).
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.