# Prepare convergence data
conv_summary <- convergence_data %>%
mutate(
agent_type = case_when(
experiment == "mechanical_baseline" ~ "Mechanical Baseline",
experiment == "standard_llm" ~ "Standard LLM",
experiment == "memory_llm" ~ "Memory LLM"
)
)
# A. Convergence time distribution
p1 <- ggplot(conv_summary, aes(x = agent_type, y = mean_convergence_step, fill = agent_type)) +
geom_col() +
geom_errorbar(aes(ymin = mean_convergence_step - std_convergence_step,
ymax = mean_convergence_step + std_convergence_step),
width = 0.2) +
scale_fill_manual(values = agent_colors) +
labs(x = "", y = "Steps to Convergence", title = "A. Convergence Time") +
theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))
# B. Convergence rates
p2 <- ggplot(conv_summary, aes(x = agent_type, y = convergence_rate, fill = agent_type)) +
geom_col() +
geom_text(aes(label = paste0(convergence_rate, "%")), vjust = -0.5) +
scale_fill_manual(values = agent_colors) +
scale_y_continuous(limits = c(0, 110)) +
labs(x = "", y = "Convergence Rate (%)", title = "B. Convergence Success") +
theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))
# C. Relative speed
baseline_steps <- conv_summary$mean_convergence_step[conv_summary$experiment == "mechanical_baseline"]
conv_summary <- conv_summary %>%
mutate(relative_speed = baseline_steps / mean_convergence_step)
p3 <- ggplot(conv_summary, aes(x = agent_type, y = relative_speed, fill = agent_type)) +
geom_col() +
geom_hline(yintercept = 1, linetype = "dashed", color = "red", alpha = 0.5) +
geom_text(aes(label = sprintf("%.1fx", relative_speed)), vjust = -0.5) +
scale_fill_manual(values = agent_colors) +
labs(x = "", y = "Relative Speed", title = "C. Speed vs Baseline") +
theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))
# Combine plots
p1 + p2 + p3Human-like Decision Making in Agent-Based Models: A Comparative Study of Large Language Model Agents versus Traditional Utility Maximization in the Schelling Segregation Model
We present a novel approach to agent-based modeling by replacing traditional utility-maximizing agents with Large Language Model (LLM) agents that make human-like residential decisions. Using the classic Schelling segregation model as our testbed, we compare three agent types: (1) traditional mechanical agents using best-response dynamics, (2) LLM agents making decisions based on current neighborhood context, and (3) LLM agents with persistent memory of past interactions and relationships. Our results reveal that LLM agents converge to stable residential patterns 2.2× faster than mechanical agents while achieving similar final segregation levels (~55% vs 58% like-neighbors). Notably, memory-enhanced LLM agents demonstrate the fastest convergence (84 steps vs 187 for mechanical agents) and a 53.8% reduction in extreme segregation (“ghetto” formation). These findings suggest that incorporating human-like decision-making through LLMs can produce more realistic dynamics in agent-based models of social phenomena, with important implications for urban planning and policy analysis.
agent-based modeling, large language models, segregation, Schelling model, artificial intelligence, complex systems
Introduction
The Schelling segregation model (Schelling 1971) has been a cornerstone of agent-based modeling (ABM) for over five decades, demonstrating how mild individual preferences for similar neighbors can lead to stark residential segregation. Traditional implementations use utility-maximizing agents that relocate when the proportion of like neighbors falls below a threshold. While mathematically elegant, this approach may not capture the complexity of human residential decision-making, which involves social relationships, personal history, and contextual factors beyond simple utility calculations.
Systematic study of Schelling model variants was advanced significantly by Pancs and Vriend (2007), who developed standardized metrics specifically designed for grid-based segregation simulations. Their framework addressed the critical problem that traditional urban segregation indices perform poorly on small-scale agent-based models, providing the quantitative foundation necessary for rigorous comparison across different agent implementations.
Recent advances in Large Language Models (LLMs) offer an unprecedented opportunity to incorporate more realistic human-like decision-making into agent-based models. LLMs trained on vast corpora of human text can simulate nuanced responses to complex social situations, potentially bridging the gap between simplified mathematical models and real-world behavior (Park et al. 2023; Argyle et al. 2023).
In this paper, we present a comparative study of three agent types within the Schelling framework:
- Mechanical agents: Traditional utility-maximizing agents using best-response dynamics
- Standard LLM agents: Agents whose decisions are generated by LLMs based on current neighborhood context
- Memory LLM agents: LLM agents with persistent memory of past interactions and relationships
Our key research questions are: - How do convergence dynamics differ between mechanical and LLM-based agents? - Do LLM agents produce different segregation patterns than traditional agents? - What is the impact of memory on residential stability and segregation outcomes?
Methods
Experimental Design
We implemented a comparative framework using identical environmental conditions across all agent types. The simulation environment consists of a 15×15 grid (225 cells) populated with 50 agents equally divided between two types (25 Type A “red” and 25 Type B “blue”), yielding a density of 22.2%.
Agent Implementations
Mechanical Baseline Agents
Traditional Schelling agents operate as pure utility maximizers using a deterministic threshold function. Each agent continuously evaluates their current position based on neighborhood composition:
\[U_i = \begin{cases} 1 & \text{if } p_i \geq \tau \\ 0 & \text{otherwise} \end{cases}\]
where \(p_i\) is the proportion of like neighbors within Moore neighborhood (8 adjacent cells) and \(\tau = 0.5\) is the satisfaction threshold. Agents with \(U_i = 0\) immediately relocate to the nearest available cell that satisfies their threshold, following a best-response dynamic that guarantees utility improvement with each move.
This approach represents classical rational choice theory: agents have perfect information, consistent preferences, and make optimal decisions to maximize their utility function. While computationally efficient and theoretically elegant, it reduces complex human residential decisions to simple mathematical optimization.
Standard LLM Agents
LLM agents replace mathematical utility functions with natural language reasoning. Each agent receives contextual prompts describing their current situation and must make residential decisions through linguistic reasoning. For baseline (red/blue) scenarios, the prompt structure is:
You are a [red/blue] resident in a neighborhood simulation.
Current situation:
- Your neighborhood has [X] red neighbors and [Y] blue neighbors
- There are [Z] empty houses within moving distance
- You have been living here for [N] time steps
Based on your preferences as a [red/blue] resident, would you:
1. Stay in your current location
2. Move to a different available house
If moving, consider factors like neighborhood composition,
proximity to similar residents, and overall comfort level.
The LLM generates a natural language response that is parsed to extract the agent’s decision. This approach captures nuanced reasoning that may include: - Gradual comfort with diversity vs. strong segregation preferences
- Consideration of neighborhood trends and stability - Social factors beyond pure numerical thresholds - Context-dependent preferences that may vary over time
Memory-Enhanced LLM Agents
Memory-enhanced agents extend standard LLM agents with persistent episodic memory, more closely approximating human decision-making where past experiences shape current choices. Each agent maintains a detailed history including:
Residential History: Complete record of past locations, duration at each address, and reasons for moving Social Interactions: Memory of positive/negative encounters with neighbors of different types Neighborhood Evolution: Observations of how local composition changed over time Personal Relationships: Development of attachments to specific neighbors or locations
The prompt structure includes this historical context:
You are a [identity] resident with the following history:
RESIDENTIAL HISTORY:
- Previously lived at [locations] for [durations]
- Moved because: [recorded reasons]
SOCIAL MEMORY:
- Positive interactions: [specific neighbor relationships]
- Concerns about: [negative experiences or observations]
CURRENT SITUATION:
- Living at current location for [duration]
- Neighborhood has [composition and trends]
- Available moving options: [locations with contexts]
Given your personal history and relationships, what would you do?
Theoretical Expectations for Memory Effects:
Reduced Volatility: Agents with established relationships should move less frequently, reducing overall system dynamics and leading to faster convergence.
Path Dependence: Early positive experiences with diversity should make agents more tolerant of mixed neighborhoods, while negative experiences should increase segregation preferences.
Stabilization Effects: As agents develop local social ties, they become less likely to abandon neighborhoods even when composition changes slightly.
Realistic Inertia: Memory should introduce the residential inertia observed in real populations, where moving decisions involve substantial social and emotional costs beyond simple preference satisfaction.
Reduced Extreme Segregation: Strong social memories should prevent the formation of completely homogeneous neighborhoods (“ghettos”) by maintaining some agents who value established relationships over perfect homophily.
These expectations are based on urban sociology research showing that residential decisions involve complex tradeoffs between preferences for similar neighbors and attachment to place, social networks, and personal history (Sampson 1988; Massey and Fischer 2001).
Segregation Metrics: The Pancs and Vriend Framework
A critical challenge in Schelling model research has been the lack of standardized metrics for comparing segregation outcomes across different implementations and parameters. While Schelling’s original work provided intuitive insights about segregation emergence, it lacked quantitative measures that could enable systematic comparison of results across studies, agent types, or experimental conditions.
Pancs and Vriend (2007) addressed this limitation by developing a comprehensive statistical framework specifically designed for the Schelling model. Their contribution was crucial because traditional segregation indices used in urban sociology (such as the Dissimilarity Index or Isolation Index) were designed for large-scale census data and perform poorly on small-grid simulations with stochastic dynamics.
The Need for Schelling-Specific Metrics
Pancs and Vriend identified several problems with applying standard segregation measures to agent-based models:
- Scale Sensitivity: Traditional indices assume large populations and break down with small grids (our 15×15 grid with 50 agents)
- Boundary Effects: Grid-based simulations have edge effects that distort standard distance-based measures
- Dynamic Context: ABM requires metrics that capture segregation patterns during transient states, not just final equilibria
- Comparative Framework: No existing metrics enabled direct comparison between different agent implementations
Pancs-Vriend Metric Suite
We adopt Pancs and Vriend’s five complementary metrics, each capturing different aspects of spatial segregation:
Share (\(S\)): Average proportion of like neighbors around each agent \[S = \frac{1}{N} \sum_{i=1}^{N} \frac{L_i}{L_i + D_i}\] where \(L_i\) is like neighbors and \(D_i\) is different-type neighbors for agent \(i\). This metric ranges from 0.5 (perfect integration) to 1.0 (complete segregation).
Clusters (\(C\)): Number of spatially contiguous same-type regions using 8-connectivity \[C = \text{count of connected components by type}\] Lower values indicate more segregated (fewer, larger clusters) while higher values suggest fragmented settlement patterns.
Distance (\(D\)): Average Euclidean distance between different-type agents \[D = \frac{1}{N_A \cdot N_B} \sum_{i \in A} \sum_{j \in B} ||pos_i - pos_j||\] Higher values indicate greater spatial separation between groups.
Ghetto Rate (\(G\)): Proportion of agents living in completely homogeneous neighborhoods \[G = \frac{\text{agents with only same-type neighbors}}{N}\] This captures extreme segregation where agents have zero contact with the other group.
Mix Deviation (\(M\)): Deviation from perfect checkerboard integration pattern \[M = \frac{1}{N} \sum_{i=1}^{N} |actual\_neighbors_i - expected\_neighbors_i|\] Measures how far the current pattern deviates from perfect spatial integration.
Why This Framework Enables Our Comparison
The Pancs-Vriend metrics are particularly valuable for our study because they:
- Enable Cross-Agent Comparison: Provide standardized measures that work equally well for mechanical, standard LLM, and memory LLM agents
- Capture Multiple Segregation Aspects: No single metric fully captures segregation complexity; the five-metric suite provides complementary perspectives
- Handle Small-Scale Dynamics: Designed specifically for grid-based ABM with realistic population sizes
- Track Convergence: Enable detection of stable states across different agent types with different convergence patterns
- Quantify Qualitative Differences: Convert complex spatial patterns into comparable numerical values
This standardized framework allows us to make the quantitative claims about LLM agents converging 2.2× faster while achieving similar final segregation levels (~55% vs 58% share metric) - comparisons that would be impossible without robust, validated metrics designed for Schelling-type models.
Statistical Analysis
All experiments were run with 2 replicates for each condition. We use Mann-Whitney U tests for pairwise comparisons and report effect sizes using Cohen’s d.
Results
Convergence Dynamics
Our results reveal striking differences in convergence behavior across agent types. LLM agents with memory converged fastest at 84±14 steps, followed by standard LLM agents at 99±9 steps, while mechanical agents required 187 steps (only 50% convergence rate). This represents a 2.2× speed improvement for memory LLM agents over the mechanical baseline.
Segregation Patterns
# Prepare pairwise data for visualization
metrics_summary <- pairwise_data %>%
filter(group1 == "mechanical_baseline") %>%
select(metric, group1, group2, mean1, std1, mean2, std2) %>%
pivot_longer(cols = c(mean1, mean2, std1, std2),
names_to = c(".value", "group"),
names_pattern = "(mean|std)(.)") %>%
mutate(
agent_type = case_when(
group == "1" ~ "Mechanical Baseline",
group == "2" & str_detect(group2, "standard") ~ "Standard LLM",
group == "2" & str_detect(group2, "memory") ~ "Memory LLM"
)
) %>%
bind_rows(
# Add mechanical baseline self-comparison
pairwise_data %>%
filter(group1 == "mechanical_baseline", group2 == "standard_llm") %>%
select(metric, mean = mean1, std = std1) %>%
mutate(agent_type = "Mechanical Baseline")
)
# Create faceted plot for all metrics
metrics_plot <- metrics_summary %>%
mutate(
metric_label = case_when(
metric == "share" ~ "Share (% Like Neighbors)",
metric == "clusters" ~ "Number of Clusters",
metric == "distance" ~ "Inter-type Distance",
metric == "ghetto_rate" ~ "Ghetto Formation",
metric == "mix_deviation" ~ "Mix Deviation"
)
) %>%
ggplot(aes(x = agent_type, y = mean, fill = agent_type)) +
geom_col() +
geom_errorbar(aes(ymin = mean - std, ymax = mean + std), width = 0.2) +
facet_wrap(~ metric_label, scales = "free_y", ncol = 2) +
scale_fill_manual(values = agent_colors) +
labs(x = "", y = "Metric Value",
title = "Segregation Patterns Across Agent Types") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none")
metrics_plotStatistical Comparisons
# Create summary table of key comparisons
comparison_table <- pairwise_data %>%
filter(metric %in% c("share", "ghetto_rate", "distance")) %>%
mutate(
comparison = paste(group1, "vs", group2),
metric = str_to_title(str_replace(metric, "_", " ")),
effect_size_cat = case_when(
abs(effect_size) < 0.2 ~ "Negligible",
abs(effect_size) < 0.5 ~ "Small",
abs(effect_size) < 0.8 ~ "Medium",
TRUE ~ "Large"
),
significance = ifelse(p_value < 0.05, "*", "")
) %>%
select(Metric = metric,
Comparison = comparison,
`Mean Diff (%)` = percent_change,
`Effect Size` = effect_size,
`Category` = effect_size_cat,
`p-value` = p_value,
Sig = significance) %>%
mutate(
`Mean Diff (%)` = round(`Mean Diff (%)`, 1),
`Effect Size` = round(`Effect Size`, 2),
`p-value` = round(`p-value`, 3)
)
kable(comparison_table, booktabs = TRUE, align = "lcccccc") %>%
kable_styling(latex_options = c("striped", "hold_position")) %>%
column_spec(1, width = "2cm") %>%
column_spec(2, width = "5cm") %>%
footnote(general = "* indicates p < 0.05",
general_title = "Note:",
footnote_as_chunk = TRUE)| Metric | Comparison | Mean Diff (%) | Effect Size | Category | p-value | Sig |
|---|---|---|---|---|---|---|
| Distance | mechanical_baseline vs standard_llm | -5.6 | 0.57 | Medium | 1.000 | |
| Share | mechanical_baseline vs standard_llm | -5.3 | 0.30 | Small | 1.000 | |
| Ghetto Rate | mechanical_baseline vs standard_llm | 0.0 | 0.00 | Negligible | 1.000 | |
| Distance | mechanical_baseline vs memory_llm | -16.2 | 1.64 | Large | 0.333 | |
| Share | mechanical_baseline vs memory_llm | -5.0 | 0.30 | Small | 1.000 | |
| Ghetto Rate | mechanical_baseline vs memory_llm | -53.8 | 1.00 | Large | 0.617 | |
| Distance | standard_llm vs memory_llm | -11.2 | 6.71 | Large | 0.333 | |
| Share | standard_llm vs memory_llm | 0.3 | -0.03 | Negligible | 1.000 | |
| Ghetto Rate | standard_llm vs memory_llm | -53.8 | 7.00 | Large | 0.221 | |
| Note: * indicates p < 0.05 |
Time Series Evolution
# Create time series plot
time_evolution <- time_series_data %>%
group_by(step, agent_type) %>%
summarise(
mean_share = mean(share),
se_share = sd(share) / sqrt(n()),
.groups = "drop"
) %>%
ggplot(aes(x = step, y = mean_share, color = agent_type)) +
geom_line(linewidth = 1.2) +
geom_ribbon(aes(ymin = mean_share - se_share,
ymax = mean_share + se_share,
fill = agent_type),
alpha = 0.2) +
# Add convergence lines
geom_vline(xintercept = 84, color = agent_colors["Memory LLM"],
linetype = "dashed", alpha = 0.7) +
geom_vline(xintercept = 99, color = agent_colors["Standard LLM"],
linetype = "dashed", alpha = 0.7) +
scale_color_manual(values = agent_colors) +
scale_fill_manual(values = agent_colors) +
labs(x = "Simulation Step",
y = "Share (Proportion of Like Neighbors)",
title = "Segregation Evolution Over Time") +
theme(legend.title = element_blank()) +
coord_cartesian(xlim = c(0, 200))
time_evolutionDiscussion
Key Findings
Our study reveals three major insights about incorporating LLM-based decision-making into agent-based models:
Convergence Efficiency: LLM agents achieve stable residential patterns significantly faster than mechanical agents. The 2.2× speed improvement for memory-enhanced LLMs suggests that human-like decision-making may actually be more efficient at reaching equilibrium states in social systems.
Segregation Outcomes: Despite different decision mechanisms, all agent types converged to similar segregation levels (~55-58% like neighbors). This supports Schelling’s original insight that segregation emerges from mild preferences, regardless of the specific decision process.
Memory Effects: Persistent memory reduced extreme segregation (“ghetto” formation) by 53.8% and accelerated convergence by 15% compared to memoryless LLM agents. This suggests that relationship history and social ties play a stabilizing role in residential dynamics.
Implications for Agent-Based Modeling
The successful integration of LLMs into the Schelling model opens new possibilities for ABM:
- Behavioral Realism: LLMs can capture nuanced decision-making that reflects cultural context, personal history, and social relationships
- Emergent Behaviors: Human-like agents may produce unexpected emergent patterns not captured by utility maximization
- Policy Testing: More realistic agents enable better prediction of policy interventions’ effects
Computational Considerations
comp_data <- data.frame(
`Agent Type` = c("Mechanical", "Standard LLM", "Memory LLM"),
`Avg Time/Step (s)` = c(0.02, 19.3, 19.3),
`API Calls/Step` = c(0, 50, 50),
`Memory Requirements` = c("Minimal", "Moderate", "High"),
`Scalability` = c("Excellent", "Limited", "Limited")
)
kable(comp_data, booktabs = TRUE) %>%
kable_styling(latex_options = "striped")| Agent.Type | Avg.Time.Step..s. | API.Calls.Step | Memory.Requirements | Scalability |
|---|---|---|---|---|
| Mechanical | 0.02 | 0 | Minimal | Excellent |
| Standard LLM | 19.30 | 50 | Moderate | Limited |
| Memory LLM | 19.30 | 50 | High | Limited |
While LLM agents provide behavioral realism, they come with computational costs. Each step requires ~50 LLM API calls (one per agent), resulting in ~1000× slower execution than mechanical agents. Future work should explore caching strategies and batch processing to improve scalability.
Limitations and Future Work
Several limitations warrant consideration:
- Sample Size: With only 2 runs per condition, statistical power is limited
- Single Context: While our framework supports multiple social contexts (race, income, political affiliation), this paper focuses on the baseline (red/blue) scenario to establish proof-of-concept
- Grid Size: Results may differ for larger neighborhoods or different densities (our 15×15 grid with 22.2% density)
- LLM Variability: Results may depend on the specific LLM used (Mixtral:8x22b) and could vary with different model architectures or training approaches
Future research directions include: - Social Context Analysis: Systematic comparison across racial, economic, and political scenarios to understand how cultural contexts affect segregation dynamics - Scale Effects: Testing with realistic city sizes and varying population densities - Multi-factor Models: Incorporating multiple social identities simultaneously (e.g., race + income) - LLM Architecture Studies: Comparing different language models and prompting strategies - Hybrid Approaches: Developing computationally efficient models that balance LLM realism with mechanical agent scalability - Longitudinal Validation: Comparing model predictions with real-world residential mobility data
Conclusion
This study demonstrates that Large Language Models can successfully replace traditional utility-maximizing agents in agent-based models, providing more realistic behavioral dynamics while maintaining the essential insights of classical models. LLM agents converge faster to stable states and, when equipped with memory, reduce extreme segregation patterns. These findings suggest that the integration of AI language models into agent-based modeling represents a promising direction for studying complex social systems.
The ability to simulate human-like decision-making at scale opens new avenues for policy analysis, urban planning, and social science research. As LLM technology continues to advance and computational costs decrease, we anticipate that hybrid human-AI agent models will become standard tools for understanding and predicting social phenomena.
Code and Data Availability
All code, data, and analysis scripts are available at: [repository URL]
References
Appendix: Detailed Statistical Results
# Full statistical results table
full_stats <- pairwise_data %>%
mutate(
comparison = paste(group1, "vs", group2),
metric = str_to_title(str_replace(metric, "_", " ")),
mean_diff = mean2 - mean1,
ci_lower = mean_diff - 1.96 * sqrt(std1^2 + std2^2),
ci_upper = mean_diff + 1.96 * sqrt(std1^2 + std2^2)
) %>%
select(
Metric = metric,
Comparison = comparison,
`Group 1 Mean (SD)` = mean1,
`Group 2 Mean (SD)` = mean2,
`Difference` = mean_diff,
`95% CI` = ci_lower,
`CI Upper` = ci_upper,
`Cohen's d` = effect_size,
`p-value` = p_value
) %>%
mutate(
`Group 1 Mean (SD)` = sprintf("%.3f (%.3f)", `Group 1 Mean (SD)`,
pairwise_data$std1),
`Group 2 Mean (SD)` = sprintf("%.3f (%.3f)", `Group 2 Mean (SD)`,
pairwise_data$std2),
`95% CI` = sprintf("[%.3f, %.3f]", `95% CI`, `CI Upper`),
`Cohen's d` = round(`Cohen's d`, 3),
`p-value` = round(`p-value`, 3)
) %>%
select(-`CI Upper`)
kable(full_stats, booktabs = TRUE) %>%
kable_styling(latex_options = c("striped", "scale_down")) %>%
landscape()| Metric | Comparison | Group 1 Mean (SD) | Group 2 Mean (SD) | Difference | 95% CI | Cohen's d | p-value |
|---|---|---|---|---|---|---|---|
| Clusters | mechanical_baseline vs standard_llm | 15.500 (17.678) | 13.000 (2.828) | -2.5000000 | [-37.589, 32.589] | 0.197 | 1.000 |
| Distance | mechanical_baseline vs standard_llm | 1.420 (0.198) | 1.340 (0.028) | -0.0800000 | [-0.472, 0.312] | 0.566 | 1.000 |
| Mix Deviation | mechanical_baseline vs standard_llm | 0.164 (0.069) | 0.222 (0.016) | 0.0571667 | [-0.081, 0.195] | -1.146 | 0.667 |
| Share | mechanical_baseline vs standard_llm | 0.583 (0.131) | 0.553 (0.059) | -0.0307233 | [-0.313, 0.252] | 0.301 | 1.000 |
| Ghetto Rate | mechanical_baseline vs standard_llm | 6.500 (4.950) | 6.500 (0.707) | 0.0000000 | [-9.800, 9.800] | 0.000 | 1.000 |
| Clusters | mechanical_baseline vs memory_llm | 15.500 (17.678) | 12.500 (2.121) | -3.0000000 | [-37.897, 31.897] | 0.238 | 1.000 |
| Distance | mechanical_baseline vs memory_llm | 1.420 (0.198) | 1.190 (0.014) | -0.2300000 | [-0.619, 0.159] | 1.639 | 0.333 |
| Mix Deviation | mechanical_baseline vs memory_llm | 0.164 (0.069) | 0.204 (0.005) | 0.0398690 | [-0.095, 0.175] | -0.818 | 1.000 |
| Share | mechanical_baseline vs memory_llm | 0.583 (0.131) | 0.554 (0.040) | -0.0291812 | [-0.299, 0.240] | 0.300 | 1.000 |
| Ghetto Rate | mechanical_baseline vs memory_llm | 6.500 (4.950) | 3.000 (0.000) | -3.5000000 | [-13.202, 6.202] | 1.000 | 0.617 |
| Clusters | standard_llm vs memory_llm | 13.000 (2.828) | 12.500 (2.121) | -0.5000000 | [-7.430, 6.430] | 0.200 | 1.000 |
| Distance | standard_llm vs memory_llm | 1.340 (0.028) | 1.190 (0.014) | -0.1500000 | [-0.212, -0.088] | 6.708 | 0.333 |
| Mix Deviation | standard_llm vs memory_llm | 0.222 (0.016) | 0.204 (0.005) | -0.0172976 | [-0.050, 0.015] | 1.465 | 0.333 |
| Share | standard_llm vs memory_llm | 0.553 (0.059) | 0.554 (0.040) | 0.0015421 | [-0.139, 0.142] | -0.030 | 1.000 |
| Ghetto Rate | standard_llm vs memory_llm | 6.500 (0.707) | 3.000 (0.000) | -3.5000000 | [-4.886, -2.114] | 7.000 | 0.221 |
Social Context vs. Nominal Measures
Unlike traditional Schelling models that use abstract “red/blue” or “Type A/Type B” labels, our LLM implementation enables testing with realistic social contexts that carry cultural meaning and implicit associations. We implemented several social scenarios:
Baseline Control: Generic “red vs blue” teams without social connotations, serving as a neutral control condition.
Racial Context: “White middle-class families” vs “Black families” - capturing historical patterns of residential segregation with embedded cultural associations about neighborhood preferences, school quality concerns, and social comfort.
Economic Context: “High-income professionals” vs “working-class families” - exploring how economic segregation emerges from preferences about property values, amenities, and social status.
Political Context: “Liberal households” vs “Conservative households” - investigating ideological clustering and how political identity affects residential choices.
These contexts enable the LLM to draw upon cultural knowledge embedded in training data, producing more realistic responses than arbitrary labels. For example, when prompted as a “White middle-class family,” the LLM may express concerns about school quality or property values that wouldn’t emerge from “Type A” framing.