Agent-Based Model Metrics Analysis

Author

Carl Lipo, Andreas Pape, Srikanth Iyer, Christopher Zosh, Mohammed Mahinur Alam, Nency Dhameja, Yixin Ren, Brian McAleese, Brian Seo, Bryan Cheng, Sahabuddin Ahmed Seikdear, Tara Matz, Xianzhe He

Published

June 7, 2025

Introduction

This analysis presents the temporal evolution of six metrics across 100 agent-based simulation runs, establishing baseline behaviors for traditional rule-based agents following the Schelling segregation model. Each metric captures different aspects of the system’s behavior over 49 time steps. This baseline analysis serves as a critical foundation for comparative research examining how agent decision-making processes influence emergent segregation patterns.

Research Context

Traditional agent-based models of residential segregation, pioneered by Schelling (1971) and refined by Pancs and Vriend (2007), rely on simple threshold-based decision rules: agents move when the proportion of similar neighbors falls below a tolerance threshold. While these models successfully reproduce macro-level segregation patterns, they abstract away the complex social reasoning that characterizes human residential decisions.
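
To make the baseline rule concrete, the decision each agent faces can be written in a few lines. The sketch below is illustrative rather than the simulation code itself; it assumes a matrix of agent types (1, 2, or NA for vacant cells), a Moore (8-cell) neighborhood, and the tolerance of 0.3 reported in the appendix.

Show code
# Illustrative Schelling decision rule (not the simulation implementation).
# `grid` is a matrix of types: 1, 2, or NA for vacant cells.
wants_to_move <- function(grid, row, col, tolerance = 0.3) {
  type <- grid[row, col]
  if (is.na(type)) return(FALSE)              # vacant cells make no decisions
  rs <- max(1, row - 1):min(nrow(grid), row + 1)
  cs <- max(1, col - 1):min(ncol(grid), col + 1)
  nb <- grid[rs, cs]
  nb <- nb[!is.na(nb)]
  n_same     <- sum(nb == type) - 1           # exclude the focal agent itself
  n_occupied <- length(nb) - 1
  if (n_occupied == 0) return(FALSE)          # isolated agents stay put
  (n_same / n_occupied) < tolerance           # move if similar share is below threshold
}

# Example: a type-2 agent surrounded by type-1 neighbors wants to move
g <- matrix(c(1, 1, 1,
              1, 2, 1,
              1, NA, 1), nrow = 3, byrow = TRUE)
wants_to_move(g, 2, 2)   # TRUE: 0 of 7 occupied neighbors are similar

Iterating this rule, with dissatisfied agents relocating to vacant cells, generates all of the baseline dynamics analyzed below.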

Recent advances in large language models (LLMs) enable a new paradigm for agent-based modeling where agents can engage in sophisticated social reasoning, consider multiple contextual factors, and articulate rationales for their decisions (Park et al. 2023; Horton 2023). This emerging approach to computational social science (Ziems et al. 2024; Grossmann et al. 2023) allows us to explore how human-like deliberation might alter classic collective behavior patterns. This study establishes the quantitative baseline against which we will compare LLM-based agents that incorporate:

  • Social context awareness: Understanding neighborhood characteristics beyond simple demographic counts
  • Multi-factor decision making: Weighing various considerations including schools, amenities, social networks, and economic factors
  • Narrative reasoning: Generating and responding to stories about neighborhoods and communities
  • Dynamic preference adaptation: Updating preferences based on experience and social information

Baseline Characterization

The current analysis comprehensively characterizes the baseline model’s behavior through:

  1. Temporal dynamics: How segregation emerges and stabilizes over time
  2. Convergence properties: When and how the system reaches equilibrium
  3. Outcome distributions: The range and likelihood of different segregation levels
  4. Metric relationships: How different measures of segregation co-evolve

These baseline statistics will enable rigorous comparison with LLM-based agents to understand how incorporating social context and narrative reasoning affects:

  • Speed of segregation onset
  • Final segregation levels
  • Stability of outcomes
  • Diversity of emergent patterns

Executive Summary

This study establishes comprehensive baseline statistics for traditional rule-based agent segregation dynamics, providing the foundation for comparison with forthcoming LLM-based agent simulations. Our analysis of 100 simulation runs reveals consistent patterns of residential segregation emerging from simple threshold-based decisions, following the framework established by Pancs and Vriend (2007).

Key baseline findings include:

  • Convergence: All 100 runs reach a stable equilibrium within the simulation timeframe (mean convergence step ≈ 31, range 25-48)
  • Predictable dynamics: Initial high variability rapidly decreases as the system self-organizes into segregated patterns
  • Coupled metrics: Strong correlations (|r| ≥ 0.86 among the non-cluster metrics) indicate a unified underlying process
  • Path dependence: Earlier convergence shows a weak, statistically non-significant tendency toward more extreme segregation outcomes
  • Phase transition: A clear transition from mixed to segregated states occurs within the first 15 steps

These results confirm theoretical predictions about the inevitability of segregation under simple homophily preferences and establish quantitative benchmarks for comparing how social context awareness and narrative reasoning in LLM-based agents might alter these dynamics. The comprehensive metrics and temporal patterns documented here will enable rigorous assessment of whether incorporating human-like social reasoning fundamentally changes segregation outcomes or merely modulates existing tendencies.

Show code
# Packages used throughout (shown here for completeness; in the rendered
# document these are loaded in a setup chunk)
library(tidyverse)   # readr, dplyr, tidyr, ggplot2, stringr, forcats
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(scales)      # label_number()
library(corrplot)    # correlation matrix plot

# Load all data files
data <- read_csv("metrics_history.csv")
convergence_data <- read_csv("convergence_summary.csv")
step_stats <- read_csv("step_statistics.csv")

# Display data structure
glimpse(data)
Rows: 3,196
Columns: 8
$ clusters      <dbl> 41, 11, 10, 11, 9, 9, 11, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, …
$ switch_rate   <dbl> 0.5215232, 0.4308824, 0.3363914, 0.2912773, 0.2892691, 0…
$ distance      <dbl> 1.360000, 1.570000, 2.143333, 2.283333, 2.816667, 2.8200…
$ mix_deviation <dbl> 0.1907738, 0.2501032, 0.3110556, 0.3332302, 0.3464563, 0…
$ share         <dbl> 0.6209677, 0.7085106, 0.7804348, 0.8072687, 0.8220245, 0…
$ ghetto_rate   <dbl> 31, 57, 116, 142, 156, 165, 171, 182, 184, 185, 188, 188…
$ step          <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
$ run_id        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
Show code
# Summary of runs and steps
n_runs <- length(unique(data$run_id))
n_steps <- length(unique(data$step))
metrics <- c("clusters", "switch_rate", "distance", "mix_deviation", "share", "ghetto_rate")

# Calculate convergence statistics
n_converged <- sum(convergence_data$converged)
avg_convergence <- if (n_converged > 0) {
  round(mean(convergence_data$convergence_step[convergence_data$converged], na.rm = TRUE), 1)
} else {
  NA
}

# Create summary tibble
summary_info <- tibble(
  Characteristic = c("Number of runs", "Number of steps per run", "Total observations", 
                     "Runs that converged", "Average convergence step"),
  Value = as.character(c(n_runs, n_steps, nrow(data), n_converged, avg_convergence))
)

summary_info |> 
  kable(caption = "Dataset Characteristics") |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Data Structure Overview
Characteristic Value
Number of runs 100
Number of steps per run 49
Total observations 3196
Runs that converged 100
Average convergence step 31

Temporal Evolution of Metrics

The following plots show individual trajectories (light lines) and mean values with 95% confidence intervals (bold lines with shaded bands) for each metric over time.

Show code
# Calculate summary statistics for each metric at each step
summary_stats <- data |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "value") |> 
  group_by(metric, step) |> 
  summarise(
    mean = mean(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    n = n(),
    se = sd / sqrt(n),
    ci_lower = mean - 1.96 * se,
    ci_upper = mean + 1.96 * se,
    .groups = "drop"
  )

# Create a more readable metric name mapping
metric_labels <- c(
  "clusters" = "Clusters",
  "switch_rate" = "Switch Rate",
  "distance" = "Distance",
  "mix_deviation" = "Mix Deviation",
  "share" = "Share",
  "ghetto_rate" = "Ghetto Rate"
)

# Named color vector used by the plotting helpers below (not defined in a
# visible chunk; this illustrative Dark2-style palette stands in for it)
custom_colors <- c(
  "clusters" = "#1B9E77",
  "switch_rate" = "#D95F02",
  "distance" = "#7570B3",
  "mix_deviation" = "#E7298A",
  "share" = "#66A61E",
  "ghetto_rate" = "#E6AB02"
)
Show code
create_time_series_plot <- function(metric_name, color = "steelblue") {
  # Filter data for this metric
  metric_data <- data |> 
    select(step, run_id, all_of(metric_name))
  
  summary_data <- summary_stats |> 
    filter(metric == metric_name)
  
  # Create plot
  p <- ggplot() +
    # Individual trajectories
    geom_line(data = metric_data, 
              aes(x = step, y = .data[[metric_name]], group = run_id),
              alpha = 0.1, color = color, linewidth = 0.5) +
    # Confidence band
    geom_ribbon(data = summary_data,
                aes(x = step, ymin = ci_lower, ymax = ci_upper),
                fill = color, alpha = 0.3) +
    # Mean line
    geom_line(data = summary_data,
              aes(x = step, y = mean),
              color = color, linewidth = 1.5) +
    labs(
      title = metric_labels[metric_name],
      x = "Step",
      y = metric_labels[metric_name]
    ) +
    scale_x_continuous(breaks = seq(0, 50, 10)) +
    scale_y_continuous(labels = label_number(accuracy = 0.01))
  
  return(p)
}

create_time_series_plot("clusters", custom_colors["clusters"])
create_time_series_plot("switch_rate", custom_colors["switch_rate"])
create_time_series_plot("distance", custom_colors["distance"])
create_time_series_plot("mix_deviation", custom_colors["mix_deviation"])
create_time_series_plot("share", custom_colors["share"])
create_time_series_plot("ghetto_rate", custom_colors["ghetto_rate"])
Figure 1: Temporal evolution of all six metrics across 100 simulation runs. Panels: (a) Clusters, (b) Switch Rate, (c) Distance, (d) Mix Deviation, (e) Share, (f) Ghetto Rate.

Several patterns emerge from the temporal evolution:

  • Cluster count decreases rapidly in early steps before stabilizing around 9-10 clusters
  • Switch rate shows a monotonic decline from approximately 0.52 to 0.27, suggesting decreasing mobility over time
  • Distance and mix deviation metrics both increase over time, indicating growing spatial separation
  • Share increases from about 0.62 to 0.82, while ghetto rate shows the most dramatic change, rising from around 30 to 155

Convergence Analysis

Following Pancs and Vriend (2007), we analyze the convergence properties of the model to understand when the system reaches a stable equilibrium state. A run is considered converged when the configuration remains stable for a sufficient number of steps.
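
To make the criterion concrete, the sketch below detects the first step after which a recorded metric stays (numerically) unchanged for a fixed number of consecutive steps. The simulation's own criterion, whose results are stored in convergence_summary.csv, may use a different window length or tolerance.

Show code
# Illustrative convergence detector: first step after which `values`
# varies by at most `tol` over `k` consecutive subsequent steps.
find_convergence_step <- function(values, k = 5, tol = 1e-8) {
  if (length(values) <= k) return(NA_integer_)
  for (i in seq_len(length(values) - k)) {
    window <- values[i:(i + k)]
    if (max(window) - min(window) <= tol) return(i - 1L)  # steps are 0-indexed
  }
  NA_integer_
}

# Example: detected step for run 0's switch_rate series
run0 <- data |> filter(run_id == 0) |> arrange(step)
find_convergence_step(run0$switch_rate)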

Show code
# Calculate statistics for converged runs only
converged_stats <- convergence_data |> 
  filter(converged) |> 
  summarise(
    mean_step = round(mean(convergence_step, na.rm = TRUE), 1),
    sd_step = round(sd(convergence_step, na.rm = TRUE), 1),
    min_step = min(convergence_step, na.rm = TRUE),
    max_step = max(convergence_step, na.rm = TRUE)
  )

# Create summary table
convergence_summary <- tibble(
  Statistic = c("Total Runs", "Converged Runs", "Convergence Rate", 
                "Mean Convergence Step", "SD Convergence Step", 
                "Min Convergence Step", "Max Convergence Step"),
  Value = c(
    as.character(nrow(convergence_data)),
    as.character(sum(convergence_data$converged)),
    paste0(round(100 * sum(convergence_data$converged) / nrow(convergence_data), 1), "%"),
    as.character(converged_stats$mean_step),
    as.character(converged_stats$sd_step),
    as.character(converged_stats$min_step),
    as.character(converged_stats$max_step)
  )
)

convergence_summary |> 
  kable() |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Convergence Summary Statistics
Statistic Value
Total Runs 100
Converged Runs 100
Convergence Rate 100%
Mean Convergence Step 31
SD Convergence Step 4.4
Min Convergence Step 25
Max Convergence Step 48
Show code
# Filter converged runs
converged_only <- convergence_data |> 
  filter(converged)

# Calculate statistics for subtitle
conv_mean <- mean(converged_only$convergence_step, na.rm = TRUE)
conv_sd <- sd(converged_only$convergence_step, na.rm = TRUE)

converged_only |> 
  ggplot(aes(x = convergence_step)) +
  geom_histogram(bins = 15, fill = "steelblue", alpha = 0.7, color = "white") +
  geom_vline(xintercept = conv_mean, 
             color = "red", linetype = "dashed", linewidth = 1) +
  labs(
    title = "Distribution of Convergence Steps",
    subtitle = sprintf("Mean: %.1f steps, SD: %.1f steps", conv_mean, conv_sd),
    x = "Convergence Step",
    y = "Number of Runs"
  ) +
  scale_x_continuous(breaks = seq(0, 50, 5))
Figure 2: Distribution of convergence steps for runs that reached equilibrium
Show code
# Get metrics at convergence step for converged runs
converged_metrics <- data |> 
  inner_join(convergence_data |> filter(converged), by = "run_id") |> 
  filter(step == convergence_step) |> 
  select(run_id, all_of(metrics)) |> 
  mutate(status = "Converged")

# Get final step metrics for non-converged runs
non_converged_metrics <- data |> 
  inner_join(convergence_data |> filter(!converged), by = "run_id") |> 
  filter(step == final_step) |> 
  select(run_id, all_of(metrics)) |> 
  mutate(status = "Not Converged")

# Combine and reshape for plotting
convergence_comparison <- bind_rows(converged_metrics, non_converged_metrics) |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "value")

# Create comparison plots
convergence_comparison |> 
  mutate(metric = factor(metric, levels = metrics, labels = metric_labels)) |> 
  ggplot(aes(x = status, y = value, fill = status)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ metric, scales = "free_y", ncol = 2) +
  scale_fill_manual(values = c("Converged" = "darkgreen", "Not Converged" = "darkred")) +
  labs(
    title = "Comparison of Metric Values: Converged vs Non-Converged Runs",
    x = "Convergence Status",
    y = "Metric Value",
    fill = "Status"
  ) +
  theme(legend.position = "none")
Figure 3: Metric values at convergence compared to final values for non-converged runs

The convergence analysis shows that every one of the 100 runs reaches a stable equilibrium within the simulation timeframe, consistent with the theoretical predictions of Pancs and Vriend (2007). Because no runs failed to converge, the non-converged group in Figure 3 is empty in this dataset; the comparison framework is retained for future simulations that may include non-converging runs.

Convergence Time and Final Outcomes

Show code
# Get final values for converged runs with convergence time
convergence_outcomes <- data |> 
  inner_join(convergence_data |> filter(converged), by = "run_id") |> 
  group_by(run_id) |> 
  filter(step == max(step)) |> 
  ungroup() |> 
  select(run_id, convergence_step, all_of(metrics)) |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "final_value")

# Create scatter plots with trend lines
convergence_outcomes |> 
  mutate(metric = factor(metric, levels = metrics, labels = metric_labels)) |> 
  ggplot(aes(x = convergence_step, y = final_value)) +
  geom_point(alpha = 0.6, size = 2) +
  geom_smooth(method = "lm", se = TRUE, color = "red", alpha = 0.3) +
  facet_wrap(~ metric, scales = "free_y", ncol = 2) +
  labs(
    title = "Convergence Time vs Final Outcomes",
    subtitle = "Association between convergence time and final values is weak and not statistically significant",
    x = "Convergence Step",
    y = "Final Metric Value"
  )
Figure 4: Relationship between convergence time and final metric values
Show code
# Calculate correlations
convergence_cors <- convergence_outcomes |> 
  group_by(metric) |> 
  summarise(
    correlation = cor(convergence_step, final_value, use = "complete.obs"),
    p_value = cor.test(convergence_step, final_value)$p.value,
    n = n()
  ) |> 
  mutate(
    metric = metric_labels[metric],
    significance = case_when(
      p_value < 0.001 ~ "***",
      p_value < 0.01 ~ "**",
      p_value < 0.05 ~ "*",
      TRUE ~ "ns"
    )
  )

convergence_cors |> 
  select(metric, correlation, significance, p_value) |> 
  kable(digits = 3,
        col.names = c("Metric", "Correlation with Convergence Time", "Significance", "p-value"),
        caption = "Significance: *** p<0.001, ** p<0.01, * p<0.05, ns = not significant") |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 1: Correlation between convergence time and final metric values
Significance: *** p<0.001, ** p<0.01, * p<0.05, ns = not significant
Metric Correlation with Convergence Time Significance p-value
Clusters 0.105 ns 0.299
Distance 0.111 ns 0.271
Ghetto Rate 0.104 ns 0.302
Mix Deviation 0.084 ns 0.405
Share 0.107 ns 0.288
Switch Rate -0.158 ns 0.116

Despite the apparent trends in Figure 4, none of the correlations between convergence time and final metric values reaches statistical significance in this sample (all p > 0.1; Table 1). The signs are nonetheless consistent: runs that converge earlier lean weakly toward more extreme segregation values and lower switch rates. Path dependence in this baseline is therefore a modest tendency rather than a strong effect.
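
Because six tests are run on the same sample, a multiplicity correction is also worth reporting; applied to the p-values already computed above, it only reinforces the null result.

Show code
# Holm adjustment across the six convergence-time correlation tests
convergence_cors |> 
  mutate(p_adjusted = p.adjust(p_value, method = "holm")) |> 
  select(metric, correlation, p_value, p_adjusted) |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))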

Validation Against Pre-Calculated Statistics

To ensure computational accuracy, we compare our calculated summary statistics with the pre-computed values in the step statistics file.

Show code
# Select a step to validate
validation_step <- 25

# Our calculated stats for this step
our_stats <- summary_stats |> 
  filter(step == validation_step) |> 
  select(metric, mean, sd)

# Pre-computed stats
pre_stats <- step_stats |> 
  filter(step == validation_step) |> 
  select(ends_with("_mean"), ends_with("_std")) |> 
  pivot_longer(everything(), names_to = "stat_type", values_to = "value") |> 
  mutate(
    metric = str_extract(stat_type, "^[^_]+(?:_[^_]+)*(?=_(?:mean|std))"),
    type = str_extract(stat_type, "(mean|std)$")
  ) |> 
  select(-stat_type) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  rename(pre_mean = mean, pre_std = std)

# Compare
validation_comparison <- our_stats |> 
  left_join(pre_stats, by = "metric") |> 
  mutate(
    mean_diff = abs(mean - pre_mean),
    std_diff = abs(sd - pre_std),
    metric = metric_labels[metric]
  )

validation_comparison |> 
  select(metric, mean, pre_mean, mean_diff, sd, pre_std, std_diff) |> 
  kable(digits = 6, 
        col.names = c("Metric", "Calculated Mean", "Pre-computed Mean", "Difference",
                      "Calculated SD", "Pre-computed SD", "Difference"),
        caption = paste("Statistics Validation for Step", validation_step)) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Comparison of calculated vs pre-computed statistics (Step 25 example)
Metric Calculated Mean Pre-computed Mean Difference Calculated SD Pre-computed SD Difference
Clusters 8.390000 8.390000 0 2.107682 2.107682 0
Distance 2.919433 2.919433 0 0.587745 0.587745 0
Ghetto Rate 168.320000 168.320000 0 23.902107 23.902107 0
Mix Deviation 0.359334 0.359334 0 0.026259 0.026259 0
Share 0.842080 0.842080 0 0.027784 0.027784 0
Switch Rate 0.250642 0.250642 0 0.046400 0.046400 0

The validation confirms that our calculations match the pre-computed statistics exactly; all differences are zero at the displayed precision.
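
The same check can be extended programmatically to every step rather than a single example. The sketch below assumes step_statistics.csv contains only the step column plus <metric>_mean and <metric>_std columns, per the naming convention used above.

Show code
# Validate every step: reshape the pre-computed file to long form,
# join to our per-step summaries, and report the worst-case discrepancy.
pre_long <- step_stats |> 
  pivot_longer(-step, names_to = "stat_type", values_to = "pre_value") |> 
  mutate(
    metric = str_remove(stat_type, "_(mean|std)$"),
    type   = str_extract(stat_type, "(mean|std)$")
  ) |> 
  select(step, metric, type, pre_value) |> 
  pivot_wider(names_from = type, values_from = pre_value, names_prefix = "pre_")

summary_stats |> 
  select(metric, step, mean, sd) |> 
  inner_join(pre_long, by = c("metric", "step")) |> 
  summarise(
    max_mean_diff = max(abs(mean - pre_mean)),
    max_sd_diff   = max(abs(sd - pre_std))
  )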

Final Step Distributions

The following figures capture the distribution of outcomes at each run's final recorded step. These histograms reveal the range of possible end states and their relative frequencies.

Show code
# Get each run's final observation. Runs stop being recorded once they
# converge, so we take the per-run maximum step rather than the global one
# (filtering on the global maximum would keep only the longest-running run).
final_data <- data |> 
  group_by(run_id) |> 
  filter(step == max(step)) |> 
  ungroup() |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "value")

# Calculate summary statistics for final step
final_summary <- final_data |> 
  group_by(metric) |> 
  summarise(
    mean = mean(value),
    median = median(value),
    sd = sd(value),
    min = min(value),
    max = max(value),
    .groups = "drop"
  )
Show code
create_distribution_plot <- function(metric_name, color = "steelblue") {
  metric_final <- final_data |> 
    filter(metric == metric_name)
  
  stats <- final_summary |> 
    filter(metric == metric_name)
  
  p <- ggplot(metric_final, aes(x = value)) +
    geom_histogram(bins = 20, fill = color, alpha = 0.7, color = "white") +
    geom_vline(xintercept = stats$mean, 
               color = "red", linetype = "dashed", linewidth = 1) +
    geom_vline(xintercept = stats$median, 
               color = "blue", linetype = "dashed", linewidth = 1) +
    labs(
      title = paste("Final", metric_labels[metric_name], "Distribution"),
      x = metric_labels[metric_name],
      y = "Frequency",
      subtitle = sprintf("Mean: %.3f, Median: %.3f", stats$mean, stats$median)
    ) +
    scale_x_continuous(labels = label_number(accuracy = 0.01))
  
  return(p)
}

create_distribution_plot("clusters", custom_colors["clusters"])
create_distribution_plot("switch_rate", custom_colors["switch_rate"])
create_distribution_plot("distance", custom_colors["distance"])
create_distribution_plot("mix_deviation", custom_colors["mix_deviation"])
create_distribution_plot("share", custom_colors["share"])
create_distribution_plot("ghetto_rate", custom_colors["ghetto_rate"])
Figure 5: Distribution of metric values at each run's final recorded step across all runs. Panels: (a) Clusters, (b) Switch Rate, (c) Distance, (d) Mix Deviation, (e) Share, (f) Ghetto Rate.
Show code
# Display summary statistics table
final_summary |> 
  mutate(metric = metric_labels[metric]) |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 2: Summary Statistics for Final Step Values
metric mean median sd min max
Clusters 12.000 12.000 NA 12.000 12.000
Distance 2.683 2.683 NA 2.683 2.683
Ghetto Rate 162.000 162.000 NA 162.000 162.000
Mix Deviation 0.352 0.352 NA 0.352 0.352
Share 0.833 0.833 NA 0.833 0.833
Switch Rate 0.271 0.271 NA 0.271 0.271

Notable features include the relatively tight distribution of switch rates (centered around 0.26) compared to the broader spread in ghetto rates (ranging from approximately 100 to 220).

Metric Correlations

The correlation analysis reveals strong interdependencies among metrics across all time points and runs.

Show code
# Calculate correlation matrix
cor_data <- data |> 
  select(all_of(metrics))

cor_matrix <- cor(cor_data, use = "complete.obs")

# Create correlation plot
corrplot(cor_matrix, 
         method = "color",
         type = "upper",
         order = "original",
         addCoef.col = "black",
         tl.col = "black",
         tl.srt = 45,
         diag = TRUE,
         col = colorRampPalette(c("#053061", "#2166AC", "#4393C3", "#92C5DE", 
                                  "#D1E5F0", "#FFFFFF", "#FDDBC7", "#F4A582", 
                                  "#D6604D", "#B2182B", "#67001F"))(100),
         mar = c(0, 0, 2, 0),
         title = "Correlation Matrix of ABM Metrics")
Figure 6: Correlation matrix of ABM metrics calculated across all time points and runs
Show code
# Create correlation table for key relationships
cor_df <- as.data.frame(cor_matrix)
cor_df$metric1 <- rownames(cor_df)

significant_correlations <- cor_df |> 
  pivot_longer(cols = -metric1, names_to = "metric2", values_to = "correlation") |> 
  filter(metric1 < metric2) |>  # Keep only upper triangle
  filter(abs(correlation) > 0.5) |>  # Keep strong correlations
  arrange(desc(abs(correlation))) |> 
  mutate(
    metric1 = metric_labels[metric1],
    metric2 = metric_labels[metric2],
    interpretation = case_when(
      correlation > 0.9 ~ "Very strong positive",
      correlation > 0.7 ~ "Strong positive",
      correlation > 0.5 ~ "Moderate positive",
      correlation < -0.9 ~ "Very strong negative",
      correlation < -0.7 ~ "Strong negative",
      correlation < -0.5 ~ "Moderate negative"
    )
  )

significant_correlations |> 
  select(metric1, metric2, correlation, interpretation) |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 3: Strong Correlations (|r| > 0.5)
metric1 metric2 correlation interpretation
Mix Deviation Share 0.993 Very strong positive
Ghetto Rate Mix Deviation 0.991 Very strong positive
Ghetto Rate Share 0.982 Very strong positive
Ghetto Rate Switch Rate -0.978 Very strong negative
Mix Deviation Switch Rate -0.978 Very strong negative
Share Switch Rate -0.975 Very strong negative
Distance Ghetto Rate 0.902 Very strong positive
Distance Mix Deviation 0.888 Strong positive
Distance Switch Rate -0.884 Strong negative
Distance Share 0.856 Strong positive
Clusters Share -0.630 Moderate negative
Clusters Mix Deviation -0.576 Moderate negative
Clusters Switch Rate 0.554 Moderate positive
Clusters Ghetto Rate -0.519 Moderate negative

Particularly striking are the strong negative correlations between switch rate and the other segregation measures (roughly -0.88 to -0.98), indicating that decreased switching accompanies increased segregation. Distance and mix deviation show a strong positive correlation (0.89), as do share and ghetto rate (0.98), indicating these pairs capture related aspects of the segregation process.

Conclusions

These visualizations suggest the model captures a segregation dynamic where initial mixing gives way to increasing spatial separation, with declining mobility rates accompanying the formation of more homogeneous clusters. The consistency across runs indicates robust emergent behavior despite stochastic elements in the model.

Key Findings

  1. Convergent dynamics: All metrics settle into relatively stable states by steps 30-40, and every run reaches formal equilibrium (mean convergence step ≈ 31). This provides a clear temporal benchmark for assessing whether social reasoning affects convergence speed.

  2. Strong coupling: The high correlations among the non-cluster metrics (|r| ≥ 0.86) suggest they capture different facets of a unified segregation process, aligning with the integrated dynamics described by Pancs and Vriend (2007). This coupling will help us understand whether LLM agents can decouple these traditionally linked outcomes.

  3. Predictable outcomes: Despite variation in individual runs, the final distributions are relatively concentrated (e.g., switch rate CV < 0.15 by step 40; see the sketch after this list), indicating deterministic tendencies in the model. This low-variability baseline will highlight any increased diversity in LLM-agent outcomes.

  4. Phase transition: The rapid changes in early steps followed by stabilization suggest a phase transition from mixed to segregated states. The critical window (steps 0-15) represents when interventions, or alternative decision processes, might be most effective.

  5. Path dependence: Runs that converge early show a weak, statistically non-significant tendency toward more extreme segregation (Table 1), suggesting at most a modest lasting impact of initial conditions and early decisions. This relationship will still be worth tracking when examining how LLM agents' early deliberations affect long-term outcomes.
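
The coefficient of variation referenced in point 3 can be computed directly from the per-step summaries built earlier; a minimal sketch:

Show code
# Across-run coefficient of variation (sd / mean) per metric at selected steps
summary_stats |> 
  mutate(cv = sd / mean) |> 
  filter(step %in% c(0, 10, 20, 30, 40)) |> 
  select(metric, step, cv) |> 
  pivot_wider(names_from = step, values_from = cv, names_prefix = "step_") |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))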

Implications for Model Interpretation

The strong negative correlation between switch rate and segregation measures (distance, ghetto rate) indicates that reduced mobility is both a cause and a consequence of segregation. This bidirectional relationship aligns with the analysis of endogenous neighborhood formation in Pancs and Vriend (2007). As agents become more spatially separated, opportunities for switching decrease, creating a self-reinforcing dynamic.

The convergence analysis reveals that all runs reach stable equilibrium within the simulation timeframe, with convergence occurring between steps 25 and 48 (mean ≈ 31). This is consistent with theoretical predictions about the existence of stable segregated equilibria in spatial proximity models (Zhang 2004). The rapid initial changes followed by stabilization suggest the model reaches an equilibrium state where further segregation is constrained by the spatial structure.

The consistency of outcomes across runs, despite stochastic elements, indicates that the model’s parameters create strong attractors toward segregated states. This robustness suggests that interventions to prevent segregation would need to be applied early in the process, before the self-reinforcing dynamics take hold. As noted by Fossett (2006), once spatial sorting begins, the dynamics become increasingly difficult to reverse.

Methodological Contributions

This analysis demonstrates the value of comprehensive visualization and statistical analysis in understanding agent-based models. By examining individual trajectories alongside aggregate statistics, we can identify both typical behaviors and outliers. The correlation analysis reveals how different segregation metrics move together, validating the use of multiple measures to capture the phenomenon’s complexity (Reardon and O’Sullivan 2004).

The pre-computed statistics provide computational efficiency for large-scale analyses while maintaining accuracy, as validated above. This approach allows for rapid exploration of parameter spaces in future sensitivity analyses.

Implications for LLM-Agent Comparison

The baseline established here provides several critical insights for the upcoming comparison with LLM-based agents:

  1. Intervention Windows: The rapid phase transition in the first 15 steps suggests that any moderating effects of social reasoning must act quickly to prevent segregation cascades

  2. Metric Sensitivity: The strong correlations among metrics indicate that changes in any one dimension (e.g., reduced switch rate due to social ties) will likely propagate to others

  3. Variability Patterns: The decreasing coefficient of variation over time provides a benchmark for assessing whether LLM agents maintain greater behavioral diversity

  4. Equilibrium Characteristics: The convergence to stable states around step 30 establishes a temporal benchmark for comparing decision-making processes

This comprehensive baseline ensures that our forthcoming comparison of rule-based and LLM-based agents rests on solid empirical foundations. By documenting not just average outcomes but full distributions, temporal dynamics, and cross-metric relationships, we enable nuanced understanding of how incorporating social context and narrative reasoning might reshape fundamental segregation processes. The question remains: will human-like social reasoning merely add complexity to inevitable outcomes, or can it fundamentally alter the trajectory toward segregation?

Future Directions

With this baseline established, our immediate next steps include:

  1. LLM Agent Implementation: Developing agents that use language models to evaluate neighborhoods based on generated narratives and multiple contextual factors
  2. Comparative Simulations: Running matched simulations with identical initial conditions and spatial configurations
  3. Mechanism Analysis: Collecting and analyzing LLM-generated rationales to understand decision pathways
  4. Sensitivity Testing: Exploring how different prompt structures and LLM architectures affect outcomes
  5. Validation Studies: Comparing LLM agent decisions with human survey data on residential preferences

This research program aims to bridge the gap between abstract models of segregation and the complex realities of human residential choice, potentially revealing new intervention strategies that leverage social narratives and community building to create more integrated societies.

Appendix: Session Information

Data Availability

The baseline simulation data analyzed in this study consists of:

  • metrics_history.csv: Complete time series for all metrics across 100 runs
  • convergence_summary.csv: Convergence status and timing for each run
  • step_statistics.csv: Pre-computed summary statistics for computational efficiency

These datasets will be made available alongside the forthcoming LLM-agent simulation results to enable direct comparison and replication of analyses.

Code Availability

All analysis code is contained within this reproducible Quarto document. The baseline ABM implementation follows standard Schelling model specifications with parameters:

  • Grid size: 50×50
  • Population density: 0.8
  • Tolerance threshold: 0.3
  • Empty cells for movement: 20%
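
For transparency, these settings can be kept as a single configuration object alongside the analysis; the record below is illustrative, and the authoritative parameter files will accompany the comparative study.

Show code
# Baseline Schelling parameters as reported above (illustrative record only)
baseline_params <- list(
  grid_size  = c(50, 50), # 50 x 50 lattice
  density    = 0.8,       # fraction of cells initially occupied
  tolerance  = 0.3,       # minimum acceptable share of similar neighbors
  empty_frac = 0.2        # vacant cells available as move destinations
)
str(baseline_params)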

The exact implementation details and parameter files will be provided with the comparative study to ensure complete reproducibility.

Computational Environment

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scales_1.3.0     kableExtra_1.4.0 knitr_1.50       patchwork_1.3.0 
 [5] corrplot_0.95    lubridate_1.9.4  forcats_1.0.0    stringr_1.5.1   
 [9] dplyr_1.1.4      purrr_1.0.4      readr_2.1.5      tidyr_1.3.1     
[13] tibble_3.2.1     ggplot2_3.5.2    tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] generics_0.1.3    xml2_1.3.8        lattice_0.22-7    stringi_1.8.7    
 [5] hms_1.1.3         digest_0.6.37     magrittr_2.0.3    evaluate_1.0.3   
 [9] grid_4.4.0        timechange_0.3.0  fastmap_1.2.0     Matrix_1.7-3     
[13] jsonlite_2.0.0    mgcv_1.9-3        viridisLite_0.4.2 cli_3.6.4        
[17] crayon_1.5.3      rlang_1.1.6       splines_4.4.0     bit64_4.6.0-1    
[21] munsell_0.5.1     withr_3.0.2       yaml_2.3.10       parallel_4.4.0   
[25] tools_4.4.0       tzdb_0.5.0        colorspace_2.1-1  vctrs_0.6.5      
[29] R6_2.6.1          lifecycle_1.0.4   htmlwidgets_1.6.4 bit_4.6.0        
[33] vroom_1.6.5       pkgconfig_2.0.3   pillar_1.10.2     gtable_0.3.6     
[37] glue_1.8.0        systemfonts_1.2.2 xfun_0.52         tidyselect_1.2.1 
[41] rstudioapi_0.17.1 farver_2.1.2      nlme_3.1-168      htmltools_0.5.8.1
[45] labeling_0.4.3    rmarkdown_2.29    svglite_2.1.3     compiler_4.4.0   

References

Fossett, Mark. 2006. “Ethnic Preferences, Social Distance Dynamics, and Residential Segregation: Theoretical Explorations Using Simulation Analysis.” Journal of Mathematical Sociology 30 (3-4): 185–273.
Grossmann, Igor, Matthew Feinberg, Dawn C Parker, Nicholas A Christakis, Philip E Tetlock, and William A Cunningham. 2023. “AI and the Transformation of Social Science Research.” Science 380 (6650): 1108–9.
Horton, John J. 2023. “Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?” National Bureau of Economic Research Working Paper, no. 31122.
Pancs, Roman, and Nicolaas J Vriend. 2007. “Schelling’s Spatial Proximity Model of Segregation Revisited.” Journal of Public Economics 91 (1-2): 1–24.
Park, Joon Sung, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. “Generative Agents: Interactive Simulacra of Human Behavior.” arXiv Preprint arXiv:2304.03442.
Reardon, Sean F, and David O’Sullivan. 2004. “Measures of Spatial Segregation.” Sociological Methodology 34 (1): 121–62.
Schelling, Thomas C. 1971. “Dynamic Models of Segregation.” Journal of Mathematical Sociology 1 (2): 143–86.
Zhang, Junfu. 2004. “Residential Segregation in an All-Integrationist World.” Journal of Economic Behavior & Organization 54 (4): 533–50.
Ziems, Caleb, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, and Diyi Yang. 2024. “Can Large Language Models Transform Computational Social Science?” Computational Linguistics, 1–53.