Intro to Computational Social Science: Week 9

2026-05-27

Warm-up

No warm-up today!

Today’s Class

Final Presentation: Brigette, Lindsey & Olivia
Topic Models
LLMs and the future of social science

Office Hours

Office Hours: Today, Friday 1:30pm-3:00pm (Tyler)
Tuesdays, 10:40am-12:00pm (Yao)

Announcements

PSet 8: deadline extended to Thursday 11:59pm
Rate limit errors - wait, run, save
PSet 9 is optional!
If no submission: full credit
If submission: full credit + replace an earlier assignment grade
A reminder to use GitHub commits

Presentations!

A reminder to engage / ask your peers questions! (part of rubric now)

Remember the “families” of text analysis

Term frequencies
Document structure
Semantic similarity

Document Structure

The second family, document structure analysis, assumes one can extract from word co-occurrence statistics what any given document is “about” (i.e., what the appropriate keywords or themes are) and represents text as observations that vary on this feature.

Document Structure

Each document has “themes” or “topics”
Words that co-occur signal topics (e.g. “march” alongside “january” suggests a different topic than “march” alongside “soldier”)
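
A minimal sketch of the “march” example, counting how often word pairs appear in the same document. The four toy documents and the helper name cooccurrence_counts are made up for illustration and are not from the course materials.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): two "calendar" documents, two "military" documents.
docs = [
    "the march and january meetings were moved",
    "january and march were cold months",
    "the soldier began to march at dawn",
    "each soldier must march in step",
]

def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for doc in documents:
        # Sort the unique words so every pair is stored in alphabetical order.
        words = sorted(set(doc.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(docs)
print(counts[("january", "march")])    # documents containing both words
print(counts[("march", "soldier")])
print(counts[("january", "soldier")])
```

In this toy corpus “march” shares documents with both “january” and “soldier”, while “january” and “soldier” never appear together, which is the kind of pattern a topic model picks up on.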

Topic Models

Documents are composed of topics
Topics are composed of words
Researchers must specify the length of each document and the proportion of the document expected to come from each topic
Algorithm gives us words related to each topic
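
One way to see what “the algorithm gives us words related to each topic” means is a toy collapsed Gibbs sampler, sketched below. This is an illustrative implementation only, not necessarily the estimation method used in class or in any particular package. The function name fit_lda_gibbs, the toy documents, and the settings (n_topics, alpha, beta, n_iter) are all assumptions chosen for the example.

```python
import random
from collections import Counter

def fit_lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA. `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                       # total tokens assigned to topic k
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Hypothetical toy documents, already lowercased and tokenized.
docs = [
    "january march month calendar".split(),
    "march january calendar month month".split(),
    "soldier march army war".split(),
    "army soldier war march march".split(),
]
doc_topic, topic_word = fit_lda_gibbs(docs, n_topics=2)
for k, words in enumerate(topic_word):
    print(k, [w for w, _ in words.most_common(3)])
```

The printed lists are the most frequent words assigned to each topic. Labeling a topic (say, “calendar” vs. “military”) is still an interpretive step left to the researcher.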

Latent Dirichlet Allocation

Each document has a distribution over topics, \(\theta\)
Each topic has a distribution over words, \(\phi\)
Together, words and topics make up the document
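
The generative story can be sketched in a few lines. In this sketch, theta is one document's mixture over topics and phi holds each topic's distribution over words. The vocabulary, the two topic rows in phi, and the values of alpha and n_words are hypothetical numbers picked for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw a probability vector from a Dirichlet by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(phi, vocab, n_words, alpha, rng):
    """Draw theta for one document, then a topic and a word for each position."""
    theta = sample_dirichlet([alpha] * len(phi), rng)
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(phi)), weights=theta)[0]
        word = rng.choices(vocab, weights=phi[topic])[0]
        words.append(word)
    return theta, words

vocab = ["january", "march", "month", "soldier", "army"]
phi = [
    [0.35, 0.30, 0.35, 0.00, 0.00],  # hypothetical "calendar" topic
    [0.00, 0.30, 0.00, 0.35, 0.35],  # hypothetical "military" topic
]
rng = random.Random(42)
theta, words = generate_document(phi, vocab, n_words=8, alpha=0.5, rng=rng)
print(theta)
print(words)
```

Fitting a topic model runs this story in reverse: given only the words, it estimates the theta and phi that plausibly produced them.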

Topic Models

Examples:
What topics have newspapers discussed?
Academic journals?
College application essays?

Topic Models

Topics in college essays are related to household income (Alvero et al., 2021)

Large Language Models

Large Language Models (LLMs) are neural networks with billions of parameters trained on large amounts of data
Model: a statistical tool that uses data to make predictions
Language: text data, such as words and topics
Large: Lots of data!
Increasingly, “LLM” refers specifically to generative models
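
To make “a statistical tool that uses data to make predictions” concrete, here is a tiny count-based next-word predictor. It is only a toy: real LLMs are neural networks with billions of parameters, not lookup tables of counts. The corpus sentence and the function names train_bigram and predict_next are invented for this example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    tokens = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "dog"))  # never seen in training, so no prediction
```

The same idea, predicting the next token from what came before, underlies generative LLMs, with the counts replaced by a learned neural network and the toy sentence replaced by a very large text corpus.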

The Paper Factory

A single LLM can’t write a sociology paper
But a multi-agent workflow can (?)

The Paper Factory

Sociologists Engzell and Wilmers automated the entire process of writing quantitative sociology papers
Their model: a multi-agent workflow

The Paper Factory

In pairs:
What are some of the benefits of automating social science research? The risks/drawbacks?

Benefits of LLMs

Codifying research processes and “hidden curriculum”
Open and reproducible science
Freeing up time for non-trivial tasks

Risks of LLMs

“Mediocrity at scale”
Destruction of peer review

The Future of Surveys?

Silicon Sampling refers to “sampling” LLMs rather than humans
Argyle and colleagues (2023) argue that we can learn much from surveying LLMs (conditioned on human backstories)
“The information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterize human attitudes.”
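
A rough sketch of what a backstory prompt could look like in code. The respondent fields, the wording of the backstory, and the function name build_backstory_prompt are hypothetical and are not the template used by Argyle and colleagues. No model is called here: the output is just the text one might send to an LLM.

```python
def build_backstory_prompt(respondent, question):
    """Turn a respondent profile into a first-person backstory plus a survey question."""
    backstory = (
        f"I am {respondent['age']} years old. "
        f"I live in {respondent['state']}. "
        f"Politically, I consider myself {respondent['ideology']}."
    )
    return f"{backstory}\nQuestion: {question}\nAnswer:"

# Hypothetical respondent profiles (not real survey data).
respondents = [
    {"age": 34, "state": "Ohio", "ideology": "moderate"},
    {"age": 61, "state": "Oregon", "ideology": "liberal"},
]
question = "How much do you trust the news media?"

prompts = [build_backstory_prompt(r, question) for r in respondents]
for p in prompts:
    print(p)
    print("---")
```

Each prompt stands in for one “silicon” respondent. Whether the resulting answers resemble those of the matching human subgroup is exactly what is under debate.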

Illusions of AI?

Messeri and Crockett (2024): “The proliferation of AI tools in science risks introducing a phase of scientific enquiry in which we produce more but understand less.”

Illusion 1: Breadth

Illusion 2: Depth

Illusion 3: Objectivity

Benefits for Society

Reduced administrative burdens?
Differences in valued skills?

Risks for Society

Existential risks

“If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.” - Eliezer Yudkowsky (2023)

Future of Social Science Research

So … what is computational social science?
