Replication of Homo Silicus Study by John Horton (Work in progress)

Author

Joon Sung Park (joonspk@stanford.edu)

Published

October 8, 2023

Introduction

Many important theories in social science and policy design, such as those concerning the evolution of norms and the effects of policy interventions on a community, cannot be tested directly because large-scale longitudinal studies are impractical to conduct [1, 2, 3]. One promising response is to use large language models to create proxies of human participants, which may allow us to simulate the outcomes of studies that would otherwise be impossible to run. In my research program at the intersection of human-computer interaction and natural language processing, I have introduced methods for simulating general computational agents, known as generative agents [3, 4]. These agents embed a large language model within a novel agent cognitive architecture to produce human-like behaviors at both the individual and group levels (e.g., user behaviors in online social media, NPC behaviors in Sims-inspired games). My current research focuses on establishing these agents as a scientific tool for addressing questions in the social sciences that are best answered through simulations of human behavior.

In this replication study, I examine John Horton’s paper, “Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?”, which replicates existing social science experiments using large language models as proxies for human participants [5]. Horton’s paper is among the notable early works [3, 4, 5, 9, 10] that use language models to simulate human participants in behavioral experiments. In his study, he uses a large language model to replicate the findings of three experiments drawn from Charness and Rabin (2002) [6], Kahneman, Knetsch, and Thaler (1986) [7], and Samuelson and Zeckhauser (1988) [8]. The simulated participants are produced by prompting the language model with a description of the study and then querying how a hypothetical participant might behave in such an experiment. He finds that their behavior roughly matches that of the human subjects. My goal is to replicate Horton’s findings for all three experiments.

However, in treating large language models as a method for simulating social science experiments, I have noticed three important challenges that remain unaddressed in this emerging field: 1) ensuring the robustness of the simulated outcomes across different models and minor changes in the prompt, 2) understanding the population that the simulated outcomes represent, and 3) benchmarking language model-simulated outcomes against published experiments that may already be known to the model. In my current research, I am developing a solution concept that may alleviate these concerns and help establish language model-based simulations as a reliable tool for social scientists to explore their theories. In this replication study in particular, I aim to extend Horton’s work to better understand the first of these three challenges in two ways: (a) by replicating his results using variations of the prompts that carry the same semantic meaning in describing the experiments but are worded differently, and (b) by benchmarking different versions of large language models. Robustness to changes in the model and the prompt is particularly important for ensuring that findings generated with a large language model are replicable.
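A minimal sketch of how a robustness check over prompt wordings and models might be organized is shown below. Everything in it is an assumption made for illustration: the two prompt paraphrases and their payoffs are invented (they are not Horton's prompts), the model labels are placeholders, and `make_stub_query` stands in for a real language model call by answering at random. The idea is simply to tabulate the share of simulated participants choosing one option for each (model, prompt) cell, and then to look at how far those shares spread.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical paraphrases of one two-option allocation question.
# Illustrative only; these are not the prompts used in Horton's study.
PROMPT_VARIANTS: List[str] = [
    "Choose A or B. A: you receive 5 and the other person receives 5. "
    "B: you receive 8 and the other person receives 2.",
    "Two splits are available. Under A, you and the other person each get 5. "
    "Under B, you get 8 and the other person gets 2. Answer A or B.",
]

# Placeholder labels, not real model identifiers.
MODELS: List[str] = ["model-small", "model-large"]

# A query function takes (model, prompt) and returns the choice "A" or "B".
QueryFn = Callable[[str, str], str]


def make_stub_query(seed: int = 0) -> QueryFn:
    """Return a stand-in for a real LLM call that answers 'A' or 'B' at random."""
    rng = random.Random(seed)

    def query(model: str, prompt: str) -> str:
        # Assumption: a real implementation would send `prompt` to `model`
        # and parse the reply into "A" or "B". The probabilities are arbitrary.
        p_a = 0.7 if model == "model-large" else 0.4
        return "A" if rng.random() < p_a else "B"

    return query


def share_choosing_a(model: str, prompt: str, query: QueryFn, n_samples: int) -> float:
    """Fraction of n_samples simulated participants that choose option A."""
    answers = [query(model, prompt) for _ in range(n_samples)]
    return answers.count("A") / n_samples


def robustness_grid(
    models: List[str], prompts: List[str], query: QueryFn, n_samples: int = 100
) -> Dict[Tuple[str, int], float]:
    """Share choosing A for every (model, prompt index) combination."""
    return {
        (model, i): share_choosing_a(model, prompt, query, n_samples)
        for model in models
        for i, prompt in enumerate(prompts)
    }


def spread(grid: Dict[Tuple[str, int], float]) -> float:
    """Largest minus smallest share across all cells of the grid."""
    shares = list(grid.values())
    return max(shares) - min(shares)


if __name__ == "__main__":
    grid = robustness_grid(MODELS, PROMPT_VARIANTS, make_stub_query(seed=0), n_samples=200)
    for (model, i), share in sorted(grid.items()):
        print(f"{model} / prompt {i}: share choosing A = {share:.2f}")
    print(f"spread across cells = {spread(grid):.2f}")
```

In this framing, a small spread across cells would suggest that a simulated result does not hinge on one particular wording or model version, while a large spread would flag a result that may not replicate. Replacing the stub with an actual model call only requires supplying a different function of the same `(model, prompt) -> "A" or "B"` shape.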

Works Cited

[1] Thomas Schelling. Micromotives and Macrobehavior (1978).
[2] Eric Bonabeau. Agent-based modeling: Methods and techniques for simulating human systems. PNAS (2002).
[3] Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative agents: Interactive simulacra of human behavior.
[4] Joon Sung Park, Lindsay Popowski, Carrie Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2022. Social simulacra: Creating populated prototypes for social computing systems. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pages 1–18.
[5] John Horton. 2023. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?
[6] Charness, Gary and Matthew Rabin, “Understanding social preferences with simple tests,” The quarterly journal of economics, 2002, 117 (3), 817–869.
[7] Kahneman, Daniel, Jack L Knetsch, and Richard Thaler, “Fairness as a constraint on profit seeking: Entitlements in the market,” The American economic review, 1986, pp. 728–741.
[8] Samuelson, William and Richard Zeckhauser, “Status quo bias in decision making,” Journal of risk and uncertainty, 1988, 1 (1), 7–59.
[9] Lisa P Argyle, Ethan C Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, and David Wingate. Out of one, many: Using language models to simulate human samples. Political Analysis (2023).
[10] Marcel Binz and Eric Schulz. Using cognitive psychology to understand GPT-3. PNAS (2023).