2026-05-27

Warm-up

  • No warm-up today!

Today’s Class

  • Final Presentation: Brigette, Lindsey & Olivia
  • Topic Models
  • LLMs and the future of social science

Office Hours

  • Office Hours: Today, Friday 1:30pm-3:00pm (Tyler)
  • Tuesdays, 10:40am-12:00pm (Yao)

Announcements

  • PSet 8: deadline extended to Thursday 11:59pm
  • Rate limit errors - wait, run, save
  • PSet 9 is optional!
  • If no submission: full credit
  • If submission: full credit + replace an earlier assignment grade
  • A reminder to use GitHub commits
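
The "wait, run, save" advice for rate limit errors can be sketched as a small retry loop. This is a minimal illustration only: `RateLimitError`, `run_with_retries`, and `flaky_task` are hypothetical names, and the real exception type depends on the API client you are using.

```python
import json
import os
import tempfile
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit exception your API client raises."""


def run_with_retries(task, items, out_path, max_tries=5, base_wait=1.0):
    """Call task(item) for each item, waiting after rate limits and saving as we go."""
    results = {}
    for item in items:
        for attempt in range(max_tries):
            try:
                results[item] = task(item)             # run
                break
            except RateLimitError:
                time.sleep(base_wait * 2 ** attempt)   # wait, doubling each time
        else:
            results[item] = None                       # gave up after max_tries
        with open(out_path, "w") as f:                 # save after every item
            json.dump(results, f)
    return results


# Toy task that hits a (fake) rate limit on every other call.
calls = {"n": 0}


def flaky_task(text):
    calls["n"] += 1
    if calls["n"] % 2 == 1:
        raise RateLimitError("slow down")
    return text.upper()


out_path = os.path.join(tempfile.gettempdir(), "pset_results.json")
results = run_with_retries(flaky_task, ["a", "b"], out_path, base_wait=0.01)
print(results)
```

Saving after every item means a crash or a hard rate limit late in the run does not lose the earlier results.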

Presentations!

  • A reminder to engage / ask your peers questions! (part of rubric now)

Remember the “families” of text analysis

  • Term frequencies
  • Document structure
  • Semantic similarity

Document Structure

  • The second family, document structure analysis, assumes one can extract from word co-occurrence statistics what any given document is “about” (i.e., what the appropriate keywords or themes are) and represents text as observations that vary on this feature.

Document Structure

  • Each document has “themes” or “topics”
  • Words combined with each other can comprise topics (e.g. “march” with “january” vs. “soldier”)
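
To make the "march" example concrete, here is a minimal sketch that counts how often pairs of words appear in the same document. The four-sentence corpus is made up for illustration, and `cooccurrence_counts` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (invented): "march" shows up in a calendar context and a military one.
docs = [
    "the meeting moved from january to march",
    "rain in january snow in march",
    "each soldier began to march at dawn",
    "the soldier and the general march north",
]


def cooccurrence_counts(documents):
    """Count the number of documents in which each unordered word pair co-occurs."""
    counts = Counter()
    for text in documents:
        words = sorted(set(text.split()))      # unique words, sorted so pairs are consistent
        counts.update(combinations(words, 2))  # every pair of words in this document
    return counts


counts = cooccurrence_counts(docs)
print(counts[("january", "march")])    # documents containing both words
print(counts[("march", "soldier")])
print(counts[("january", "soldier")])
```

In this toy corpus "march" co-occurs with both "january" and "soldier", while "january" and "soldier" never appear together, which is the kind of pattern that lets co-occurrence statistics separate two themes.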

Topic Models

  • Documents are comprised of topics
  • Topics are comprised of words
  • Researchers must specify the length of the document and the proportion of the document expected to come from each topic
  • Algorithm gives us words related to each topic

Latent Dirichlet Allocation

  1. Each document has a distribution over topics, \(\theta\)
  2. Each topic has a distribution over words, \(\phi\)
  3. Together, words and topics make up the document
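
The three steps above can be sketched as a tiny simulation of the generative story, using only the Python standard library. The vocabulary, the two topics in `phi`, and the value of `alpha` are all invented for illustration; `sample_dirichlet` and `generate_document` are hypothetical helper names.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(phi, vocab, n_words, alpha, rng):
    """Generate one document: draw a topic mix, then a topic and a word per position."""
    n_topics = len(phi)
    theta = sample_dirichlet([alpha] * n_topics, rng)       # this document's topic mix
    words = []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]  # pick a topic for this word
        w = rng.choices(vocab, weights=phi[z])[0]           # pick a word from that topic
        words.append(w)
    return theta, words


# Invented example: one "calendar" topic and one "military" topic sharing "march".
vocab = ["january", "march", "april", "soldier", "army", "battle"]
phi = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # calendar topic
    [0.0, 0.2, 0.0, 0.3, 0.3, 0.2],  # military topic
]

rng = random.Random(0)
theta, words = generate_document(phi, vocab, n_words=8, alpha=0.5, rng=rng)
print(theta)
print(" ".join(words))
```

Fitting a topic model runs this story in reverse: we observe only the words and estimate the \(\theta\) and \(\phi\) that plausibly produced them.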

Topic Models

  • Examples:
  • What topics have newspapers discussed?
  • Academic journals?
  • College application essays?

Topic Models

Large Language Models

  • Large Language Models (LLMs) are neural networks with billions of parameters trained on large amounts of data
  • Model: a statistical tool that uses data to make predictions
  • Language: text data, such as words and topics
  • Large: Lots of data!
  • Increasingly, LLMs refer to generative models

Large Language Models

The Paper Factory

  • A single LLM can’t write a sociology paper
  • But a multi-agent workflow can (?)

The Paper Factory

  • Sociologists Engzell and Wilmers automated the entire process of writing quantitative sociology papers
  • Their model: a multi-agent workflow

The Paper Factory

The Paper Factory

  • In pairs:
  • What are some of the benefits of automating social science research? The risks/drawbacks?

Benefits of LLMs

  • Codifying research processes and “hidden curriculum”
  • Open and reproducible science
  • Freeing up time for non-trivial tasks

Risks of LLMs

  • “Mediocrity at scale”
  • Destruction of peer review

The Future of Surveys?

  • Silicon Sampling refers to “sampling” LLMs rather than humans
  • Argyle and colleagues (2023) argue that we can learn much from surveying LLMs (conditioned on human backstories)
  • “The information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterize human attitudes.”
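
As a rough sketch of what "sampling" an LLM might involve, the code below turns respondent traits into first-person backstories followed by a survey question. The persona fields, the prompt wording, and the function name `build_backstory_prompt` are assumptions made for illustration, not the template used by Argyle and colleagues, and the step of sending each prompt to a model is left out.

```python
def build_backstory_prompt(persona, question):
    """Turn a dict of respondent traits into a first-person backstory plus a question."""
    backstory = (
        f"I am {persona['age']} years old. "
        f"I live in {persona['state']}. "
        f"Politically, I consider myself {persona['ideology']}."
    )
    return f"{backstory}\nQuestion: {question}\nAnswer:"


# Hypothetical respondents; a real study would draw these from survey data.
personas = [
    {"age": 34, "state": "Ohio", "ideology": "moderate"},
    {"age": 67, "state": "Texas", "ideology": "conservative"},
]
question = "How much do you trust the news media?"

prompts = [build_backstory_prompt(p, question) for p in personas]
for prompt in prompts:
    print(prompt)
    print()
```

Each prompt would then be completed by the model, and the completions treated as one "silicon" respondent's answer.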

Illusions of AI?

  • Messeri and Crockett (2024): “The proliferation of AI tools in science risks introducing a phase of scientific enquiry in which we produce more but understand less.”

Illusion 1: Breadth

Illusion 2: Depth

Illusion 3: Objectivity

Benefits for Society

  • Reduced administrative burdens?
  • Differences in valued skills?

Risks for Society

  • Existential risks

“If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.” - Eliezer Yudkowsky (2023)

Future of Social Science Research

  • We’ve covered a lot of methods for computational social science research
  • Data viz
  • Spatial analysis
  • Networks
  • Prediction and algorithmic modeling
  • Text analysis

So … what is computational social science?

  • Anything that’s cool?

Future of Social Science Research

  • In the era of LLMs, the division between social scientists and data scientists may be sharper than ever

Future of Social Science Research

  • What do you think the future of computational social science will be?
  • What would you like it to be?