Agent-Based Model Metrics Analysis

Author

Carl Lipo, Andreas Pape, Srikanth Iyer, Christopher Zosh, Mohammed Mahinur Alam, Nency Dhameja, Yixin Ren, Brian McAleese, Brian Seo, Bryan Cheng, Sahabuddin Ahmed Seikdear, Tara Matz, Xianzhe He

Published

June 7, 2025

Introduction

This analysis presents the temporal evolution of six metrics across 100 agent-based simulation runs, establishing baseline behaviors for traditional rule-based agents following the Schelling segregation model. Each metric captures different aspects of the system’s behavior over 49 time steps. This baseline analysis serves as a critical foundation for comparative research examining how agent decision-making processes influence emergent segregation patterns.

Research Context

Traditional agent-based models of residential segregation, pioneered by Schelling (1971) and refined by Pancs and Vriend (2007), rely on simple threshold-based decision rules: agents move when the proportion of similar neighbors falls below a tolerance threshold. While these models successfully reproduce macro-level segregation patterns, they abstract away the complex social reasoning that characterizes human residential decisions.
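
To make the baseline rule concrete, the decision each agent faces can be written in a few lines. The sketch below is illustrative rather than the simulation code itself; it assumes a matrix of agent types (1, 2, or NA for vacant cells), a Moore (8-cell) neighborhood, and the tolerance of 0.3 reported in the appendix.

Show code
# Illustrative Schelling decision rule (not the simulation implementation).
# `grid` is a matrix of types: 1, 2, or NA for vacant cells.
wants_to_move <- function(grid, row, col, tolerance = 0.3) {
  type <- grid[row, col]
  if (is.na(type)) return(FALSE)              # vacant cells make no decisions
  rs <- max(1, row - 1):min(nrow(grid), row + 1)
  cs <- max(1, col - 1):min(ncol(grid), col + 1)
  nb <- grid[rs, cs]
  nb <- nb[!is.na(nb)]
  n_same     <- sum(nb == type) - 1           # exclude the focal agent itself
  n_occupied <- length(nb) - 1
  if (n_occupied == 0) return(FALSE)          # isolated agents stay put
  (n_same / n_occupied) < tolerance           # move if similar share is below threshold
}

# Example: a type-2 agent surrounded by type-1 neighbors wants to move
g <- matrix(c(1, 1, 1,
              1, 2, 1,
              1, NA, 1), nrow = 3, byrow = TRUE)
wants_to_move(g, 2, 2)   # TRUE: 0 of 7 occupied neighbors are similar

Iterating this rule, with dissatisfied agents relocating to vacant cells, generates all of the baseline dynamics analyzed below.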

Recent advances in large language models (LLMs) enable a new paradigm for agent-based modeling where agents can engage in sophisticated social reasoning, consider multiple contextual factors, and articulate rationales for their decisions (Park et al. 2023; Horton 2023). This emerging approach to computational social science (Ziems et al. 2024; Grossmann et al. 2023) allows us to explore how human-like deliberation might alter classic collective behavior patterns. This study establishes the quantitative baseline against which we will compare LLM-based agents that incorporate:

  • Social context awareness: Understanding neighborhood characteristics beyond simple demographic counts
  • Multi-factor decision making: Weighing various considerations including schools, amenities, social networks, and economic factors
  • Narrative reasoning: Generating and responding to stories about neighborhoods and communities
  • Dynamic preference adaptation: Updating preferences based on experience and social information

Baseline Characterization

The current analysis comprehensively characterizes the baseline model’s behavior through:

  1. Temporal dynamics: How segregation emerges and stabilizes over time
  2. Convergence properties: When and how the system reaches equilibrium
  3. Outcome distributions: The range and likelihood of different segregation levels
  4. Metric relationships: How different measures of segregation co-evolve

These baseline statistics will enable rigorous comparison with LLM-based agents to understand how incorporating social context and narrative reasoning affects:

  • Speed of segregation onset
  • Final segregation levels
  • Stability of outcomes
  • Diversity of emergent patterns

Executive Summary

This study establishes comprehensive baseline statistics for traditional rule-based agent segregation dynamics, providing the foundation for comparison with forthcoming LLM-based agent simulations. Our analysis of 100 simulation runs reveals consistent patterns of residential segregation emerging from simple threshold-based decisions, following the framework established by Pancs and Vriend (2007).

Key baseline findings include:

  • Convergence: All 100 runs reach a stable equilibrium within the simulation timeframe (mean convergence step ≈ 31, range 25-48)
  • Predictable dynamics: Initial high variability rapidly decreases as the system self-organizes into segregated patterns
  • Coupled metrics: Strong correlations (|r| ≥ 0.86 among the non-cluster metrics) indicate a unified underlying process
  • Path dependence: Earlier convergence shows a weak, statistically non-significant tendency toward more extreme segregation outcomes
  • Phase transition: A clear transition from mixed to segregated states occurs within the first 15 steps

These results confirm theoretical predictions about the inevitability of segregation under simple homophily preferences and establish quantitative benchmarks for comparing how social context awareness and narrative reasoning in LLM-based agents might alter these dynamics. The comprehensive metrics and temporal patterns documented here will enable rigorous assessment of whether incorporating human-like social reasoning fundamentally changes segregation outcomes or merely modulates existing tendencies.

Show code
# Packages used throughout (shown here for completeness; in the rendered
# document these are loaded in a setup chunk)
library(tidyverse)   # readr, dplyr, tidyr, ggplot2, stringr, forcats
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(scales)      # label_number()
library(corrplot)    # correlation matrix plot

# Load all data files
data <- read_csv("metrics_history.csv")
convergence_data <- read_csv("convergence_summary.csv")
step_stats <- read_csv("step_statistics.csv")

# Display data structure
glimpse(data)
Rows: 3,196
Columns: 8
$ clusters      <dbl> 41, 11, 10, 11, 9, 9, 11, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, …
$ switch_rate   <dbl> 0.5215232, 0.4308824, 0.3363914, 0.2912773, 0.2892691, 0…
$ distance      <dbl> 1.360000, 1.570000, 2.143333, 2.283333, 2.816667, 2.8200…
$ mix_deviation <dbl> 0.1907738, 0.2501032, 0.3110556, 0.3332302, 0.3464563, 0…
$ share         <dbl> 0.6209677, 0.7085106, 0.7804348, 0.8072687, 0.8220245, 0…
$ ghetto_rate   <dbl> 31, 57, 116, 142, 156, 165, 171, 182, 184, 185, 188, 188…
$ step          <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
$ run_id        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
Show code
# Summary of runs and steps
n_runs <- length(unique(data$run_id))
n_steps <- length(unique(data$step))
metrics <- c("clusters", "switch_rate", "distance", "mix_deviation", "share", "ghetto_rate")

# Calculate convergence statistics
n_converged <- sum(convergence_data$converged)
avg_convergence <- if (n_converged > 0) {
  round(mean(convergence_data$convergence_step[convergence_data$converged], na.rm = TRUE), 1)
} else {
  NA
}

# Create summary tibble
summary_info <- tibble(
  Characteristic = c("Number of runs", "Number of steps per run", "Total observations", 
                     "Runs that converged", "Average convergence step"),
  Value = as.character(c(n_runs, n_steps, nrow(data), n_converged, avg_convergence))
)

summary_info |> 
  kable(caption = "Dataset Characteristics") |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Data Structure Overview
Characteristic Value
Number of runs 100
Number of steps per run 49
Total observations 3196
Runs that converged 100
Average convergence step 31

Temporal Evolution of Metrics

The following plots show individual trajectories (light lines) and mean values with 95% confidence intervals (bold lines with shaded bands) for each metric over time.

Show code
# Calculate summary statistics for each metric at each step
summary_stats <- data |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "value") |> 
  group_by(metric, step) |> 
  summarise(
    mean = mean(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    n = n(),
    se = sd / sqrt(n),
    ci_lower = mean - 1.96 * se,
    ci_upper = mean + 1.96 * se,
    .groups = "drop"
  )

# Create a more readable metric name mapping
metric_labels <- c(
  "clusters" = "Clusters",
  "switch_rate" = "Switch Rate",
  "distance" = "Distance",
  "mix_deviation" = "Mix Deviation",
  "share" = "Share",
  "ghetto_rate" = "Ghetto Rate"
)

# Named color vector used by the plotting helpers below (not defined in a
# visible chunk; this illustrative Dark2-style palette stands in for it)
custom_colors <- c(
  "clusters" = "#1B9E77",
  "switch_rate" = "#D95F02",
  "distance" = "#7570B3",
  "mix_deviation" = "#E7298A",
  "share" = "#66A61E",
  "ghetto_rate" = "#E6AB02"
)
Show code
create_time_series_plot <- function(metric_name, color = "steelblue") {
  # Filter data for this metric
  metric_data <- data |> 
    select(step, run_id, all_of(metric_name))
  
  summary_data <- summary_stats |> 
    filter(metric == metric_name)
  
  # Create plot
  p <- ggplot() +
    # Individual trajectories
    geom_line(data = metric_data, 
              aes(x = step, y = .data[[metric_name]], group = run_id),
              alpha = 0.1, color = color, linewidth = 0.5) +
    # Confidence band
    geom_ribbon(data = summary_data,
                aes(x = step, ymin = ci_lower, ymax = ci_upper),
                fill = color, alpha = 0.3) +
    # Mean line
    geom_line(data = summary_data,
              aes(x = step, y = mean),
              color = color, linewidth = 1.5) +
    labs(
      title = metric_labels[metric_name],
      x = "Step",
      y = metric_labels[metric_name]
    ) +
    scale_x_continuous(breaks = seq(0, 50, 10)) +
    scale_y_continuous(labels = label_number(accuracy = 0.01))
  
  return(p)
}

create_time_series_plot("clusters", custom_colors["clusters"])
create_time_series_plot("switch_rate", custom_colors["switch_rate"])
create_time_series_plot("distance", custom_colors["distance"])
create_time_series_plot("mix_deviation", custom_colors["mix_deviation"])
create_time_series_plot("share", custom_colors["share"])
create_time_series_plot("ghetto_rate", custom_colors["ghetto_rate"])
Figure 1: Temporal evolution of all six metrics across 100 simulation runs. Panels: (a) Clusters, (b) Switch Rate, (c) Distance, (d) Mix Deviation, (e) Share, (f) Ghetto Rate.

Several patterns emerge from the temporal evolution:

  • Cluster count decreases rapidly in early steps before stabilizing around 9-10 clusters
  • Switch rate shows a monotonic decline from approximately 0.52 to 0.27, suggesting decreasing mobility over time
  • Distance and mix deviation metrics both increase over time, indicating growing spatial separation
  • Share increases from about 0.62 to 0.82, while ghetto rate shows the most dramatic change, rising from around 30 to 155

Convergence Analysis

Following Pancs and Vriend (2007), we analyze the convergence properties of the model to understand when the system reaches a stable equilibrium state. A run is considered converged when the configuration remains stable for a sufficient number of steps.
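
To make the criterion concrete, the sketch below detects the first step after which a recorded metric stays (numerically) unchanged for a fixed number of consecutive steps. The simulation's own criterion, whose results are stored in convergence_summary.csv, may use a different window length or tolerance.

Show code
# Illustrative convergence detector: first step after which `values`
# varies by at most `tol` over `k` consecutive subsequent steps.
find_convergence_step <- function(values, k = 5, tol = 1e-8) {
  if (length(values) <= k) return(NA_integer_)
  for (i in seq_len(length(values) - k)) {
    window <- values[i:(i + k)]
    if (max(window) - min(window) <= tol) return(i - 1L)  # steps are 0-indexed
  }
  NA_integer_
}

# Example: detected step for run 0's switch_rate series
run0 <- data |> filter(run_id == 0) |> arrange(step)
find_convergence_step(run0$switch_rate)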

Show code
# Calculate statistics for converged runs only
converged_stats <- convergence_data |> 
  filter(converged) |> 
  summarise(
    mean_step = round(mean(convergence_step, na.rm = TRUE), 1),
    sd_step = round(sd(convergence_step, na.rm = TRUE), 1),
    min_step = min(convergence_step, na.rm = TRUE),
    max_step = max(convergence_step, na.rm = TRUE)
  )

# Create summary table
convergence_summary <- tibble(
  Statistic = c("Total Runs", "Converged Runs", "Convergence Rate", 
                "Mean Convergence Step", "SD Convergence Step", 
                "Min Convergence Step", "Max Convergence Step"),
  Value = c(
    as.character(nrow(convergence_data)),
    as.character(sum(convergence_data$converged)),
    paste0(round(100 * sum(convergence_data$converged) / nrow(convergence_data), 1), "%"),
    as.character(converged_stats$mean_step),
    as.character(converged_stats$sd_step),
    as.character(converged_stats$min_step),
    as.character(converged_stats$max_step)
  )
)

convergence_summary |> 
  kable() |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Convergence Summary Statistics
Statistic Value
Total Runs 100
Converged Runs 100
Convergence Rate 100%
Mean Convergence Step 31
SD Convergence Step 4.4
Min Convergence Step 25
Max Convergence Step 48
Show code
# Filter converged runs
converged_only <- convergence_data |> 
  filter(converged)

# Calculate statistics for subtitle
conv_mean <- mean(converged_only$convergence_step, na.rm = TRUE)
conv_sd <- sd(converged_only$convergence_step, na.rm = TRUE)

converged_only |> 
  ggplot(aes(x = convergence_step)) +
  geom_histogram(bins = 15, fill = "steelblue", alpha = 0.7, color = "white") +
  geom_vline(xintercept = conv_mean, 
             color = "red", linetype = "dashed", linewidth = 1) +
  labs(
    title = "Distribution of Convergence Steps",
    subtitle = sprintf("Mean: %.1f steps, SD: %.1f steps", conv_mean, conv_sd),
    x = "Convergence Step",
    y = "Number of Runs"
  ) +
  scale_x_continuous(breaks = seq(0, 50, 5))
Figure 2: Distribution of convergence steps for runs that reached equilibrium
Show code
# Get metrics at convergence step for converged runs
converged_metrics <- data |> 
  inner_join(convergence_data |> filter(converged), by = "run_id") |> 
  filter(step == convergence_step) |> 
  select(run_id, all_of(metrics)) |> 
  mutate(status = "Converged")

# Get final step metrics for non-converged runs
non_converged_metrics <- data |> 
  inner_join(convergence_data |> filter(!converged), by = "run_id") |> 
  filter(step == final_step) |> 
  select(run_id, all_of(metrics)) |> 
  mutate(status = "Not Converged")

# Combine and reshape for plotting
convergence_comparison <- bind_rows(converged_metrics, non_converged_metrics) |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "value")

# Create comparison plots
convergence_comparison |> 
  mutate(metric = factor(metric, levels = metrics, labels = metric_labels)) |> 
  ggplot(aes(x = status, y = value, fill = status)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ metric, scales = "free_y", ncol = 2) +
  scale_fill_manual(values = c("Converged" = "darkgreen", "Not Converged" = "darkred")) +
  labs(
    title = "Comparison of Metric Values: Converged vs Non-Converged Runs",
    x = "Convergence Status",
    y = "Metric Value",
    fill = "Status"
  ) +
  theme(legend.position = "none")
Figure 3: Metric values at convergence compared to final values for non-converged runs

The convergence analysis shows that every one of the 100 runs reaches a stable equilibrium within the simulation timeframe, consistent with the theoretical predictions of Pancs and Vriend (2007). Because no runs failed to converge, the non-converged group in Figure 3 is empty in this dataset; the comparison framework is retained for future simulations that may include non-converging runs.

Convergence Time and Final Outcomes

Show code
# Get final values for converged runs with convergence time
convergence_outcomes <- data |> 
  inner_join(convergence_data |> filter(converged), by = "run_id") |> 
  group_by(run_id) |> 
  filter(step == max(step)) |> 
  ungroup() |> 
  select(run_id, convergence_step, all_of(metrics)) |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "final_value")

# Create scatter plots with trend lines
convergence_outcomes |> 
  mutate(metric = factor(metric, levels = metrics, labels = metric_labels)) |> 
  ggplot(aes(x = convergence_step, y = final_value)) +
  geom_point(alpha = 0.6, size = 2) +
  geom_smooth(method = "lm", se = TRUE, color = "red", alpha = 0.3) +
  facet_wrap(~ metric, scales = "free_y", ncol = 2) +
  labs(
    title = "Convergence Time vs Final Outcomes",
    subtitle = "Association between convergence time and final values is weak and not statistically significant",
    x = "Convergence Step",
    y = "Final Metric Value"
  )
Figure 4: Relationship between convergence time and final metric values
Show code
# Calculate correlations
convergence_cors <- convergence_outcomes |> 
  group_by(metric) |> 
  summarise(
    correlation = cor(convergence_step, final_value, use = "complete.obs"),
    p_value = cor.test(convergence_step, final_value)$p.value,
    n = n()
  ) |> 
  mutate(
    metric = metric_labels[metric],
    significance = case_when(
      p_value < 0.001 ~ "***",
      p_value < 0.01 ~ "**",
      p_value < 0.05 ~ "*",
      TRUE ~ "ns"
    )
  )

convergence_cors |> 
  select(metric, correlation, significance, p_value) |> 
  kable(digits = 3,
        col.names = c("Metric", "Correlation with Convergence Time", "Significance", "p-value"),
        caption = "Significance: *** p<0.001, ** p<0.01, * p<0.05, ns = not significant") |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 1: Correlation between convergence time and final metric values
Significance: *** p<0.001, ** p<0.01, * p<0.05, ns = not significant
Metric Correlation with Convergence Time Significance p-value
Clusters 0.105 ns 0.299
Distance 0.111 ns 0.271
Ghetto Rate 0.104 ns 0.302
Mix Deviation 0.084 ns 0.405
Share 0.107 ns 0.288
Switch Rate -0.158 ns 0.116

Despite the apparent trends in Figure 4, none of the correlations between convergence time and final metric values reaches statistical significance in this sample (all p > 0.1; Table 1). The signs are nonetheless consistent: runs that converge earlier lean weakly toward more extreme segregation values and lower switch rates. Path dependence in this baseline is therefore a modest tendency rather than a strong effect.
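
Because six tests are run on the same sample, a multiplicity correction is also worth reporting; applied to the p-values already computed above, it only reinforces the null result.

Show code
# Holm adjustment across the six convergence-time correlation tests
convergence_cors |> 
  mutate(p_adjusted = p.adjust(p_value, method = "holm")) |> 
  select(metric, correlation, p_value, p_adjusted) |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))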

Validation Against Pre-Calculated Statistics

To ensure computational accuracy, we compare our calculated summary statistics with the pre-computed values in the step statistics file.

Show code
# Select a step to validate
validation_step <- 25

# Our calculated stats for this step
our_stats <- summary_stats |> 
  filter(step == validation_step) |> 
  select(metric, mean, sd)

# Pre-computed stats
pre_stats <- step_stats |> 
  filter(step == validation_step) |> 
  select(ends_with("_mean"), ends_with("_std")) |> 
  pivot_longer(everything(), names_to = "stat_type", values_to = "value") |> 
  mutate(
    metric = str_extract(stat_type, "^[^_]+(?:_[^_]+)*(?=_(?:mean|std))"),
    type = str_extract(stat_type, "(mean|std)$")
  ) |> 
  select(-stat_type) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  rename(pre_mean = mean, pre_std = std)

# Compare
validation_comparison <- our_stats |> 
  left_join(pre_stats, by = "metric") |> 
  mutate(
    mean_diff = abs(mean - pre_mean),
    std_diff = abs(sd - pre_std),
    metric = metric_labels[metric]
  )

validation_comparison |> 
  select(metric, mean, pre_mean, mean_diff, sd, pre_std, std_diff) |> 
  kable(digits = 6, 
        col.names = c("Metric", "Calculated Mean", "Pre-computed Mean", "Difference",
                      "Calculated SD", "Pre-computed SD", "Difference"),
        caption = paste("Statistics Validation for Step", validation_step)) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Comparison of calculated vs pre-computed statistics (Step 25 example)
Metric Calculated Mean Pre-computed Mean Difference Calculated SD Pre-computed SD Difference
Clusters 8.390000 8.390000 0 2.107682 2.107682 0
Distance 2.919433 2.919433 0 0.587745 0.587745 0
Ghetto Rate 168.320000 168.320000 0 23.902107 23.902107 0
Mix Deviation 0.359334 0.359334 0 0.026259 0.026259 0
Share 0.842080 0.842080 0 0.027784 0.027784 0
Switch Rate 0.250642 0.250642 0 0.046400 0.046400 0

The validation confirms that our calculations match the pre-computed statistics exactly; all differences are zero at the displayed precision.
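
The same check can be extended programmatically to every step rather than a single example. The sketch below assumes step_statistics.csv contains only the step column plus <metric>_mean and <metric>_std columns, per the naming convention used above.

Show code
# Validate every step: reshape the pre-computed file to long form,
# join to our per-step summaries, and report the worst-case discrepancy.
pre_long <- step_stats |> 
  pivot_longer(-step, names_to = "stat_type", values_to = "pre_value") |> 
  mutate(
    metric = str_remove(stat_type, "_(mean|std)$"),
    type   = str_extract(stat_type, "(mean|std)$")
  ) |> 
  select(step, metric, type, pre_value) |> 
  pivot_wider(names_from = type, values_from = pre_value, names_prefix = "pre_")

summary_stats |> 
  select(metric, step, mean, sd) |> 
  inner_join(pre_long, by = c("metric", "step")) |> 
  summarise(
    max_mean_diff = max(abs(mean - pre_mean)),
    max_sd_diff   = max(abs(sd - pre_std))
  )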

Final Step Distributions

The following figures capture the distribution of outcomes at each run's final recorded step. These histograms reveal the range of possible end states and their relative frequencies.

Show code
# Get each run's final observation. Runs stop being recorded once they
# converge, so we take the per-run maximum step rather than the global one
# (filtering on the global maximum would keep only the longest-running run).
final_data <- data |> 
  group_by(run_id) |> 
  filter(step == max(step)) |> 
  ungroup() |> 
  pivot_longer(cols = all_of(metrics), names_to = "metric", values_to = "value")

# Calculate summary statistics for final step
final_summary <- final_data |> 
  group_by(metric) |> 
  summarise(
    mean = mean(value),
    median = median(value),
    sd = sd(value),
    min = min(value),
    max = max(value),
    .groups = "drop"
  )
Show code
create_distribution_plot <- function(metric_name, color = "steelblue") {
  metric_final <- final_data |> 
    filter(metric == metric_name)
  
  stats <- final_summary |> 
    filter(metric == metric_name)
  
  p <- ggplot(metric_final, aes(x = value)) +
    geom_histogram(bins = 20, fill = color, alpha = 0.7, color = "white") +
    geom_vline(xintercept = stats$mean, 
               color = "red", linetype = "dashed", linewidth = 1) +
    geom_vline(xintercept = stats$median, 
               color = "blue", linetype = "dashed", linewidth = 1) +
    labs(
      title = paste("Final", metric_labels[metric_name], "Distribution"),
      x = metric_labels[metric_name],
      y = "Frequency",
      subtitle = sprintf("Mean: %.3f, Median: %.3f", stats$mean, stats$median)
    ) +
    scale_x_continuous(labels = label_number(accuracy = 0.01))
  
  return(p)
}

create_distribution_plot("clusters", custom_colors["clusters"])
create_distribution_plot("switch_rate", custom_colors["switch_rate"])
create_distribution_plot("distance", custom_colors["distance"])
create_distribution_plot("mix_deviation", custom_colors["mix_deviation"])
create_distribution_plot("share", custom_colors["share"])
create_distribution_plot("ghetto_rate", custom_colors["ghetto_rate"])
Figure 5: Distribution of metric values at each run's final recorded step across all runs. Panels: (a) Clusters, (b) Switch Rate, (c) Distance, (d) Mix Deviation, (e) Share, (f) Ghetto Rate.
Show code
# Display summary statistics table
final_summary |> 
  mutate(metric = metric_labels[metric]) |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 2: Summary Statistics for Final Step Values
metric mean median sd min max
Clusters 12.000 12.000 NA 12.000 12.000
Distance 2.683 2.683 NA 2.683 2.683
Ghetto Rate 162.000 162.000 NA 162.000 162.000
Mix Deviation 0.352 0.352 NA 0.352 0.352
Share 0.833 0.833 NA 0.833 0.833
Switch Rate 0.271 0.271 NA 0.271 0.271

Notable features include the relatively tight distribution of switch rates (centered around 0.26) compared to the broader spread in ghetto rates (ranging from approximately 100 to 220).

Metric Correlations

The correlation analysis reveals strong interdependencies among metrics across all time points and runs.

Show code
# Calculate correlation matrix
cor_data <- data |> 
  select(all_of(metrics))

cor_matrix <- cor(cor_data, use = "complete.obs")

# Create correlation plot
corrplot(cor_matrix, 
         method = "color",
         type = "upper",
         order = "original",
         addCoef.col = "black",
         tl.col = "black",
         tl.srt = 45,
         diag = TRUE,
         col = colorRampPalette(c("#053061", "#2166AC", "#4393C3", "#92C5DE", 
                                  "#D1E5F0", "#FFFFFF", "#FDDBC7", "#F4A582", 
                                  "#D6604D", "#B2182B", "#67001F"))(100),
         mar = c(0, 0, 2, 0),
         title = "Correlation Matrix of ABM Metrics")
Figure 6: Correlation matrix of ABM metrics calculated across all time points and runs
Show code
# Create correlation table for key relationships
cor_df <- as.data.frame(cor_matrix)
cor_df$metric1 <- rownames(cor_df)

significant_correlations <- cor_df |> 
  pivot_longer(cols = -metric1, names_to = "metric2", values_to = "correlation") |> 
  filter(metric1 < metric2) |>  # Keep only upper triangle
  filter(abs(correlation) > 0.5) |>  # Keep strong correlations
  arrange(desc(abs(correlation))) |> 
  mutate(
    metric1 = metric_labels[metric1],
    metric2 = metric_labels[metric2],
    interpretation = case_when(
      correlation > 0.9 ~ "Very strong positive",
      correlation > 0.7 ~ "Strong positive",
      correlation > 0.5 ~ "Moderate positive",
      correlation < -0.9 ~ "Very strong negative",
      correlation < -0.7 ~ "Strong negative",
      correlation < -0.5 ~ "Moderate negative"
    )
  )

significant_correlations |> 
  select(metric1, metric2, correlation, interpretation) |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Table 3: Strong Correlations (|r| > 0.5)
metric1 metric2 correlation interpretation
Mix Deviation Share 0.993 Very strong positive
Ghetto Rate Mix Deviation 0.991 Very strong positive
Ghetto Rate Share 0.982 Very strong positive
Ghetto Rate Switch Rate -0.978 Very strong negative
Mix Deviation Switch Rate -0.978 Very strong negative
Share Switch Rate -0.975 Very strong negative
Distance Ghetto Rate 0.902 Very strong positive
Distance Mix Deviation 0.888 Strong positive
Distance Switch Rate -0.884 Strong negative
Distance Share 0.856 Strong positive
Clusters Share -0.630 Moderate negative
Clusters Mix Deviation -0.576 Moderate negative
Clusters Switch Rate 0.554 Moderate positive
Clusters Ghetto Rate -0.519 Moderate negative

Particularly striking are the strong negative correlations between switch rate and the other segregation measures (roughly -0.88 to -0.98), indicating that decreased switching accompanies increased segregation. Distance and mix deviation show a strong positive correlation (0.89), as do share and ghetto rate (0.98), indicating these pairs capture related aspects of the segregation process.

Conclusions

These visualizations suggest the model captures a segregation dynamic where initial mixing gives way to increasing spatial separation, with declining mobility rates accompanying the formation of more homogeneous clusters. The consistency across runs indicates robust emergent behavior despite stochastic elements in the model.

Key Findings

  1. Convergent dynamics: All metrics settle into relatively stable states by steps 30-40, and every run reaches formal equilibrium (mean convergence step ≈ 31). This provides a clear temporal benchmark for assessing whether social reasoning affects convergence speed.

  2. Strong coupling: The high correlations among the non-cluster metrics (|r| ≥ 0.86) suggest they capture different facets of a unified segregation process, aligning with the integrated dynamics described by Pancs and Vriend (2007). This coupling will help us understand whether LLM agents can decouple these traditionally linked outcomes.

  3. Predictable outcomes: Despite variation in individual runs, the final distributions are relatively concentrated (e.g., switch rate CV < 0.15 by step 40; see the sketch after this list), indicating deterministic tendencies in the model. This low-variability baseline will highlight any increased diversity in LLM-agent outcomes.

  4. Phase transition: The rapid changes in early steps followed by stabilization suggest a phase transition from mixed to segregated states. The critical window (steps 0-15) represents when interventions, or alternative decision processes, might be most effective.

  5. Path dependence: Runs that converge early show a weak, statistically non-significant tendency toward more extreme segregation (Table 1), suggesting at most a modest lasting impact of initial conditions and early decisions. This relationship will still be worth tracking when examining how LLM agents' early deliberations affect long-term outcomes.
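
The coefficient of variation referenced in point 3 can be computed directly from the per-step summaries built earlier; a minimal sketch:

Show code
# Across-run coefficient of variation (sd / mean) per metric at selected steps
summary_stats |> 
  mutate(cv = sd / mean) |> 
  filter(step %in% c(0, 10, 20, 30, 40)) |> 
  select(metric, step, cv) |> 
  pivot_wider(names_from = step, values_from = cv, names_prefix = "step_") |> 
  kable(digits = 3) |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))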

Implications for Model Interpretation

The strong negative correlation between switch rate and segregation measures (distance, ghetto rate) indicates that reduced mobility is both a cause and a consequence of segregation. This bidirectional relationship aligns with the analysis of endogenous neighborhood formation in Pancs and Vriend (2007). As agents become more spatially separated, opportunities for switching decrease, creating a self-reinforcing dynamic.

The convergence analysis reveals that all runs reach stable equilibrium within the simulation timeframe, with convergence occurring between steps 25 and 48 (mean ≈ 31). This is consistent with theoretical predictions about the existence of stable segregated equilibria in spatial proximity models (Zhang 2004). The rapid initial changes followed by stabilization suggest the model reaches an equilibrium state where further segregation is constrained by the spatial structure.

The consistency of outcomes across runs, despite stochastic elements, indicates that the model’s parameters create strong attractors toward segregated states. This robustness suggests that interventions to prevent segregation would need to be applied early in the process, before the self-reinforcing dynamics take hold. As noted by Fossett (2006), once spatial sorting begins, the dynamics become increasingly difficult to reverse.

Methodological Contributions

This analysis demonstrates the value of comprehensive visualization and statistical analysis in understanding agent-based models. By examining individual trajectories alongside aggregate statistics, we can identify both typical behaviors and outliers. The correlation analysis reveals how different segregation metrics move together, validating the use of multiple measures to capture the phenomenon’s complexity (Reardon and O’Sullivan 2004).

The pre-computed statistics provide computational efficiency for large-scale analyses while maintaining accuracy, as validated above. This approach allows for rapid exploration of parameter spaces in future sensitivity analyses.

Implications for LLM-Agent Comparison

The baseline established here provides several critical insights for the upcoming comparison with LLM-based agents:

  1. Intervention Windows: The rapid phase transition in the first 15 steps suggests that any moderating effects of social reasoning must act quickly to prevent segregation cascades

  2. Metric Sensitivity: The strong correlations among metrics indicate that changes in any one dimension (e.g., reduced switch rate due to social ties) will likely propagate to others

  3. Variability Patterns: The decreasing coefficient of variation over time provides a benchmark for assessing whether LLM agents maintain greater behavioral diversity

  4. Equilibrium Characteristics: The convergence to stable states around step 30 establishes a temporal benchmark for comparing decision-making processes

This comprehensive baseline ensures that our forthcoming comparison of rule-based and LLM-based agents rests on solid empirical foundations. By documenting not just average outcomes but full distributions, temporal dynamics, and cross-metric relationships, we enable nuanced understanding of how incorporating social context and narrative reasoning might reshape fundamental segregation processes. The question remains: will human-like social reasoning merely add complexity to inevitable outcomes, or can it fundamentally alter the trajectory toward segregation?

Future Directions

With this baseline established, our immediate next steps include:

  1. LLM Agent Implementation: Developing agents that use language models to evaluate neighborhoods based on generated narratives and multiple contextual factors
  2. Comparative Simulations: Running matched simulations with identical initial conditions and spatial configurations
  3. Mechanism Analysis: Collecting and analyzing LLM-generated rationales to understand decision pathways
  4. Sensitivity Testing: Exploring how different prompt structures and LLM architectures affect outcomes
  5. Validation Studies: Comparing LLM agent decisions with human survey data on residential preferences

This research program aims to bridge the gap between abstract models of segregation and the complex realities of human residential choice, potentially revealing new intervention strategies that leverage social narratives and community building to create more integrated societies.

Appendix: Session Information

Data Availability

The baseline simulation data analyzed in this study consists of:

  • metrics_history.csv: Complete time series for all metrics across 100 runs
  • convergence_summary.csv: Convergence status and timing for each run
  • step_statistics.csv: Pre-computed summary statistics for computational efficiency

These datasets will be made available alongside the forthcoming LLM-agent simulation results to enable direct comparison and replication of analyses.

Code Availability

All analysis code is contained within this reproducible Quarto document. The baseline ABM implementation follows standard Schelling model specifications with parameters:

  • Grid size: 50×50
  • Population density: 0.8
  • Tolerance threshold: 0.3
  • Empty cells for movement: 20%
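
For transparency, these settings can be kept as a single configuration object alongside the analysis; the record below is illustrative, and the authoritative parameter files will accompany the comparative study.

Show code
# Baseline Schelling parameters as reported above (illustrative record only)
baseline_params <- list(
  grid_size  = c(50, 50), # 50 x 50 lattice
  density    = 0.8,       # fraction of cells initially occupied
  tolerance  = 0.3,       # minimum acceptable share of similar neighbors
  empty_frac = 0.2        # vacant cells available as move destinations
)
str(baseline_params)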

The exact implementation details and parameter files will be provided with the comparative study to ensure complete reproducibility.

Computational Environment

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scales_1.3.0     kableExtra_1.4.0 knitr_1.50       patchwork_1.3.0 
 [5] corrplot_0.95    lubridate_1.9.4  forcats_1.0.0    stringr_1.5.1   
 [9] dplyr_1.1.4      purrr_1.0.4      readr_2.1.5      tidyr_1.3.1     
[13] tibble_3.2.1     ggplot2_3.5.2    tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] generics_0.1.3    xml2_1.3.8        lattice_0.22-7    stringi_1.8.7    
 [5] hms_1.1.3         digest_0.6.37     magrittr_2.0.3    evaluate_1.0.3   
 [9] grid_4.4.0        timechange_0.3.0  fastmap_1.2.0     Matrix_1.7-3     
[13] jsonlite_2.0.0    mgcv_1.9-3        viridisLite_0.4.2 cli_3.6.4        
[17] crayon_1.5.3      rlang_1.1.6       splines_4.4.0     bit64_4.6.0-1    
[21] munsell_0.5.1     withr_3.0.2       yaml_2.3.10       parallel_4.4.0   
[25] tools_4.4.0       tzdb_0.5.0        colorspace_2.1-1  vctrs_0.6.5      
[29] R6_2.6.1          lifecycle_1.0.4   htmlwidgets_1.6.4 bit_4.6.0        
[33] vroom_1.6.5       pkgconfig_2.0.3   pillar_1.10.2     gtable_0.3.6     
[37] glue_1.8.0        systemfonts_1.2.2 xfun_0.52         tidyselect_1.2.1 
[41] rstudioapi_0.17.1 farver_2.1.2      nlme_3.1-168      htmltools_0.5.8.1
[45] labeling_0.4.3    rmarkdown_2.29    svglite_2.1.3     compiler_4.4.0   

References

Fossett, Mark. 2006. “Ethnic Preferences, Social Distance Dynamics, and Residential Segregation: Theoretical Explorations Using Simulation Analysis.” Journal of Mathematical Sociology 30 (3-4): 185–273.
Grossmann, Igor, Matthew Feinberg, Dawn C Parker, Nicholas A Christakis, Philip E Tetlock, and William A Cunningham. 2023. “AI and the Transformation of Social Science Research.” Science 380 (6650): 1108–9.
Horton, John J. 2023. “Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?” National Bureau of Economic Research Working Paper, no. 31122.
Pancs, Roman, and Nicolaas J Vriend. 2007. “Schelling’s Spatial Proximity Model of Segregation Revisited.” Journal of Public Economics 91 (1-2): 1–24.
Park, Joon Sung, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. “Generative Agents: Interactive Simulacra of Human Behavior.” arXiv Preprint arXiv:2304.03442.
Reardon, Sean F, and David O’Sullivan. 2004. “Measures of Spatial Segregation.” Sociological Methodology 34 (1): 121–62.
Schelling, Thomas C. 1971. “Dynamic Models of Segregation.” Journal of Mathematical Sociology 1 (2): 143–86.
Zhang, Junfu. 2004. “Residential Segregation in an All-Integrationist World.” Journal of Economic Behavior & Organization 54 (4): 533–50.
Ziems, Caleb, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, and Diyi Yang. 2024. “Can Large Language Models Transform Computational Social Science?” Computational Linguistics, 1–53.