2026-5-11

Warm-up

  • Groups: Final project teams (if team members here)
  • Respond to worksheet questions on final projects

Today’s Class

  • Warm-up: final project prep
  • Data models vs. algorithmic models
  • Activity: categorization by hand
  • Unsupervised algorithms, supervised algorithms
  • Reflection on algorithms, society

Wednesday’s Class

  • Machine learning with tidymodels
  • Guest speaker from Recidiviz: add your questions on the “Collaborations” tab

Office Hours

  • Mondays, 1:30-3:30pm (Tyler)
  • Tuesdays, 10:30am-12:00pm (Yao)

Learning Goals

  • Motivate algorithmic approaches to social science
  • Explore classification strategies
  • Understand data models vs. algorithms
  • Understand supervised vs. unsupervised algorithms
  • Discuss algorithmic bias
  • (Wednesday) Machine Learning with tidymodels

Algorithmic Approaches to Social Science


  • What is an example of a social process where something (\(x\)) leads to something else (\(y\))?
Source: Breiman, 2002


The Data Model Approach

  • In the data modeling approach, we would use a combination of features \(x\) to explain how \(y\) happens
  • Examples of data models?
Source: Breiman, 2002


The Algorithmic Approach

  • In the algorithmic approach, we may or may not care how \(y\) happens
  • We use a machine learning model to take features from \(x\) and predict \(y\)
  • Examples of algorithmic models?
Source: Breiman, 2002


Unsupervised Algorithmic Approaches

What is unsupervised machine learning?

  • Unsupervised machine learning means using an algorithm to classify data where we have no ground truth
  • In other words, we want to categorize our data, often on many characteristics at once, or by something that is difficult to define

Activity: Classify Neighborhoods

  • Classify the following housing data into “neighborhoods”
  • Outline your process: how did you classify these houses?
  • Is there a mathematical formula or set of instructions that could be applied to a different area?

What is unsupervised machine learning?

  • In unsupervised machine learning, we have no “ground truth” answers
  • We look to the algorithm for classifications

K-means

  1. Pick k points (centroids) at random
  2. Assign all data points to closest centroid
  3. Re-calculate centroids, based on points in group
  4. Re-assign points to nearest centroid
  5. Repeat 3-4 until no points switch groups
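
The five steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the helper `squared_distance`, the `seed` and `max_iter` parameters, and the six made-up house coordinates are all assumptions added for the example.

```python
import random

def squared_distance(p, q):
    """Squared straight-line distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster 2-D points into k groups, following the steps listed above."""
    rng = random.Random(seed)
    # Step 1: pick k of the data points at random as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once no point switches groups.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points in its group.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a group ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Hypothetical house locations forming two loose clumps.
houses = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(houses, k=2)
print(labels)
print(centroids)
```

Because the starting centroids are random, different seeds can produce different groupings, and the group numbers themselves (0 or 1) carry no meaning.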


Supervised Algorithmic Approaches

What is supervised machine learning?

  • In supervised machine learning, we have “ground truth” data
  • We want to train an algorithm to make predictions on new data

Splitting Data

  • Recall: We might want to split our data into training and test sets
  • Question: Why do we split?
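
One way to picture the split is the sketch below. It assumes a hypothetical helper `train_test_split` written here from scratch (not the function of the same name in any library), a 25% test share, and a fixed seed; all of these are illustrative choices. The idea is that rows held out in the test set are never shown to the model during training, so they can stand in for new data.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows, then hold out a fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows become the test set; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))          # stand-in for 20 observations
train, test = train_test_split(rows)
print(len(train), len(test))
```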

Cross-Validation

  • We might even want to split up the training sample to create better models
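
A minimal sketch of k-fold cross-validation, one common way to split up the training sample, is shown below. The function name `k_fold_indices` and the choice of 10 rows and 5 folds are assumptions for illustration, and the sketch assumes the rows have already been shuffled.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, validation_idx) pairs, one pair per fold."""
    indices = list(range(n_rows))
    # Deal row indices into n_folds groups, like dealing cards.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        # Train on every fold except the one held out for validation.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

splits = list(k_fold_indices(n_rows=10, n_folds=5))
for train_idx, validation_idx in splits:
    print(validation_idx, train_idx)
```

Each row is used for validation exactly once and for training in every other fold, so a model's performance can be averaged across folds rather than judged on a single split.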

The Modeling Process


Decision Trees

  • A decision tree uses “split points” to make decisions based on certain variables
  • Let’s look at an example
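
To make the idea of a split point concrete, here is a small sketch that searches for the best threshold on a single numeric variable. It uses Gini impurity as the measure of how mixed each side of the split is, which is one common choice and not necessarily the one used in class. The function names, the house prices, and the neighborhood labels "A" and "B" are all made up for illustration.

```python
def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when they are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return the threshold on one numeric feature with the lowest weighted impurity."""
    best_threshold, best_score = None, float("inf")
    ordered = sorted(set(values))
    # Candidate split points sit halfway between neighbouring observed values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical data: house price (in $1000s) and a made-up neighborhood label.
prices = [150, 180, 200, 420, 450, 500]
hoods = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_split(prices, hoods)
print(threshold, impurity)  # 310.0 0.0
```

A full decision tree repeats this search on each side of the chosen split, across all available variables, until the groups are pure enough or a size limit is reached.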

Predicting Neighborhoods

  • Let’s say we have the “true” neighborhood classifications
  • What are some features that might help us classify neighborhoods?
  • Can you think of any “decision points”?

What is a random forest?


Accuracy and Interpretability

  • As our modeling strategies get more complicated, what happens to interpretability?
  • Plot the following on the chart below: decision trees, random forests, linear regression, logistic regression, neural networks, and any other modeling strategies you can think of!

Accuracy and Interpretability

  • Maybe you have something like this? (not exact, will vary heavily by model specifications)
  • The takeaway: in efforts to increase predictive accuracy, models can become opaque/unclear

Accuracy and Interpretability

  • Recall: data models vs algorithmic models
  • What would it mean if important societal decisions were made by algorithms?

Algorithmic Bias

What is Algorithmic Bias?

  • We’ve seen how algorithms can make predictions by finding hidden patterns in data
  • In statistics, “bias” means systematic error, as in the bias vs. variance tradeoff
  • “Bias” can also mean social bias: algorithms might favor certain individuals/groups/places

Credit Scores and Algorithmic Bias

  • Credit scoring is the “paradigmatic example of algorithmic governance” (Kiviat, 2019)
  • Big financial data is used to predict loan repayment based on obscure patterns
  • These scores dictate who can access loans, repayment rates, and more

Credit Scores and Algorithmic Bias

  • Example: Credit scores are biased according to individuals’ zip codes
  • Question: how would you make credit scores more fair?

Credit Scores and Algorithmic Bias

  • Question: how would you make credit scores more fair?
  • Use a data model? (Possibly more fair, but with lower predictive accuracy)
  • Remove zip code? (Other data might be correlated with zip code)
  • Set demographic parity (e.g. each zip code must have similar average scores)? This could make another variable more biased
  • Set equal opportunity (e.g. ensure hypothetical individuals with similar characteristics have similar scores across zip codes, all else being equal)?
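
As a rough illustration of the demographic parity idea above, the sketch below compares average scores across zip codes and reports the largest gap. The applicant records, zip codes, scores, and function names are all invented for the example, and "gap between group averages" is only one simple way to operationalize parity.

```python
from statistics import mean

def group_means(records, group_key, score_key):
    """Average score for each group (here, each zip code)."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[score_key])
    return {g: mean(scores) for g, scores in groups.items()}

def parity_gap(records, group_key="zip", score_key="score"):
    """Difference between the highest and lowest group average."""
    means = group_means(records, group_key, score_key)
    return max(means.values()) - min(means.values())

# Entirely hypothetical applicants.
applicants = [
    {"zip": "11111", "score": 700},
    {"zip": "11111", "score": 720},
    {"zip": "22222", "score": 640},
    {"zip": "22222", "score": 660},
]
gap = parity_gap(applicants)
print(gap)  # 60
```

A gap near zero would satisfy this version of parity, but as the bullets note, forcing the gap to zero on one variable says nothing about whether another variable becomes more biased.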

Credit Scores and Algorithmic Bias

  • Question: how would you make credit scores more fair?
  • This is not only a modeling question, it is also a social question!
  • Requires theory of fairness in society

Recap

  • Unsupervised algorithm: create categorizations when we don’t have any (e.g. k-means)
  • Supervised algorithm: make predictions when we have some ground-truth data (e.g. random forest)
  • Data models: explainable, interpretable, not necessarily the best predictors
  • Algorithmic approaches: accurate predictions, not necessarily explainable/interpretable
  • Algorithmic fairness: requires social theory and algorithmic knowledge

Writing prompts 2

  • Respond to the prompts on the back of the warm-up exercise.