Judge, Jury, and Classifier

The Supreme Court of the United States

  • Consists of nine judges (“justices”), appointed by the President
    • Justices are typically distinguished judges, law professors, or state and federal attorneys
  • The Supreme Court of the United States (SCOTUS) decides the most difficult and controversial cases
    • Often involve interpretation of the Constitution
    • Significant social, political and economic consequences

Notable SCOTUS Decisions

  • Wickard v. Filburn (1942)
    • Allowed Congress to intervene in industrial/economic activity
  • Roe v. Wade (1973)
    • Legalized abortion
  • Bush v. Gore (2000)
    • Decided outcome of presidential election
  • National Federation of Independent Business v. Sebelius (2012)
    • Upheld the Patient Protection and Affordable Care Act’s (“ObamaCare”) requirement that individuals buy health insurance

Predicting Supreme Court Cases

  • Legal academics and political scientists regularly make predictions of SCOTUS decisions from detailed studies of cases and individual justices

  • In 2002, Andrew Martin, a professor of political science at Washington University in St. Louis, decided to instead predict decisions using a statistical model built from data

  • Together with his colleagues, he decided to test this model against a panel of experts

  • Martin used a method called Classification and Regression Trees (CART)

  • Why not logistic regression?
    • Logistic regression models are generally not interpretable
    • Model coefficients indicate importance and relative effect of variables, but do not give a simple explanation of how decision is made

Data

  • Cases from 1994 through 2001
  • In this period, the same nine justices presided over SCOTUS
    • Breyer, Ginsburg, Kennedy, O’Connor, Rehnquist (Chief Justice), Scalia, Souter, Stevens, Thomas
    • Rare data set - longest period of time with the same set of justices in over 180 years
  • We will focus on predicting Justice Stevens’ decisions
    • Started out moderate, but became more liberal
    • Self-proclaimed conservative

Variables

  • Dependent Variable: Did Justice Stevens vote to reverse the lower court decision? 1 = reverse, 0 = affirm

  • Independent Variables: properties of the case
    • Circuit court of origin
    • Issue area of case
    • Type of petitioner, type of respondent
    • Ideological direction of lower court decision
    • Whether petitioner argued that a law/practice was unconstitutional

Logistic Regression for Justice Stevens

  • Some significant variables and their coefficients
    • Case is from 2nd circuit court: +1.66
    • Case is from 4th circuit court: +2.82
    • Lower court decision is liberal: -1.22
  • This is complicated
    • Difficult to understand which factors are important
    • Difficult to quickly evaluate what prediction is for a new case
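
For reference, a model of this kind can be fit in R with glm; a minimal sketch, assuming the stevens.csv data and Train split built in the R section at the end:

# Logistic regression for Justice Stevens' votes (sketch)
StevensLog = glm(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, family = binomial)
summary(StevensLog)  # one coefficient per factor level, e.g. Circuit2nd, Circuit4th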

Classification and Regression Trees

  • Build a tree by splitting on variables
  • To predict the outcome for an observation, follow the splits and at the end, predict the most frequent outcome
  • Does not assume a linear model
  • Interpretable

Splits in CART

Final Tree

When Does CART Stop Splitting?

  • There are different ways to control how many splits are generated
    • One way is by setting a lower bound for the number of points in each subset
  • In R, the parameter that controls this is minbucket
    • The smaller it is, the more splits will be generated
    • If it is too small, overfitting will occur
    • If it is too large, model will be too simple and accuracy will be poor
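
A rough sketch of this effect, assuming the Train set and model formula used in the R section below: smaller minbucket values permit more splits.

# Smaller minbucket allows more splits; larger minbucket forces fewer
library(rpart)
SmallBucket = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "class", minbucket = 5)
LargeBucket = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "class", minbucket = 100)
# Count internal (non-leaf) nodes, i.e. the number of splits
sum(SmallBucket$frame$var != "<leaf>")
sum(LargeBucket$frame$var != "<leaf>")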

Predictions from CART

  • In each subset, we have a bucket of observations, which may contain both outcomes (i.e. affirm and reverse)

  • Compute the percentage of observations of each type in the subset
    • Example: 10 affirm, 2 reverse -> 10/(10+2) ≈ 0.83 for affirm
  • Just like in logistic regression, we can threshold this percentage to obtain a prediction
    • A threshold of 0.5 corresponds to picking the most frequent outcome
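
A minimal sketch of this thresholding, assuming the StevensTree model and Test set built in the R section below:

# Class probabilities from a fitted CART model, then threshold at 0.5
Probs = predict(StevensTree, newdata = Test)   # columns: P(affirm), P(reverse)
PredReverse = as.numeric(Probs[, 2] > 0.5)     # 0.5 picks the most frequent outcome
table(Test$Reverse, PredReverse)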

ROC curve for CART

  • Vary the threshold to obtain an ROC curve

Random Forests

  • Designed to improve prediction accuracy of CART

  • Works by building a large number of CART trees
    • Makes the model less interpretable
  • To make a prediction for a new observation, each tree “votes” on the outcome, and we pick the outcome that receives the majority of the votes

Building Many Trees

  • Each tree can split on only a random subset of the variables

  • Each tree is built from a “bagged”/“bootstrapped” sample of the data
    • Select observations randomly with replacement
    • Example - original data: 1 2 3 4 5
    • New “data”:

    2 4 5 2 1 -> 1st tree

    3 5 1 5 2 -> 2nd tree
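
Bootstrapped samples like these can be generated in R with sample():

# Draw two bootstrapped "data sets" from the original observations 1..5
set.seed(1)
original = 1:5
sample(original, size = 5, replace = TRUE)  # sample for the 1st tree
sample(original, size = 5, replace = TRUE)  # sample for the 2nd tree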

Random Forest Parameters

  • Minimum number of observations in a subset
    • In R, this is controlled by the nodesize parameter
    • Smaller nodesize may take longer in R
  • Number of trees
    • In R, this is the ntree parameter
    • Should not be too small: with too few trees, the bagging procedure may leave some observations out of every tree
    • More trees take longer to build

Parameter Selection

  • In CART, the value of “minbucket” can affect the model’s out-of-sample accuracy

  • How should we set this parameter?

  • We could select the value that gives the best testing set accuracy
    • This is not right! The test set should only be used to estimate final performance; using it to tune parameters leaks information and inflates the accuracy estimate

K-fold Cross-Validation

  • Given a training set, split it into k pieces (“folds”)
  • For each candidate parameter value, build a model on k-1 folds and test it on the remaining fold (the “validation set”)
  • Repeat for each of the k folds
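
A hand-rolled sketch for a single candidate parameter value (the caret package used later automates this; Train and the model formula are assumed from the R section below, and a fold could in principle contain factor levels unseen during training):

# 10-fold cross-validated accuracy for a CART model with minbucket = 25
library(rpart)
k = 10
set.seed(100)
folds = sample(rep(1:k, length.out = nrow(Train)))  # random fold assignment
acc = rep(NA, k)
for (i in 1:k) {
  fit = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train[folds != i, ], method = "class", minbucket = 25)
  pred = predict(fit, newdata = Train[folds == i, ], type = "class")
  acc[i] = mean(pred == Train$Reverse[folds == i])
}
mean(acc)  # estimate of out-of-sample accuracy for this parameter value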

Output of k-fold Cross-Validation

Cross-Validation in R

  • Before, we limited our tree using minbucket

  • When we use cross-validation in R, we’ll use a parameter called cp instead
    • Complexity Parameter
  • Like Adjusted R² and AIC
    • Measures trade-off between model complexity and accuracy on the training set
  • Smaller cp leads to a bigger tree (might overfit)
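
rpart also reports its own cross-validated error over a grid of cp values; a sketch, assuming the Train set from the R section below:

# Grow a deliberately large tree, then inspect cross-validated error by cp
BigTree = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "class", cp = 0.001)
printcp(BigTree)  # table of cp values vs. cross-validated error
plotcp(BigTree)   # choose a cp near the minimum of the curve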

Martin’s Model

  • Used 628 previous SCOTUS cases between 1994 and 2001

  • Made predictions for the 68 cases that would be decided in October 2002, before the term started

  • Two stage approach based on CART:
    • First stage: one tree to predict a unanimous liberal decision, another tree to predict a unanimous conservative decision
      • If the two trees give conflicting predictions, or both predict no, move to the next stage
    • Second stage: predict the decision of each individual justice, and use the majority decision as the prediction
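
The second stage’s majority rule is just a vote count; a toy sketch with made-up votes:

# Nine predicted votes; the majority becomes the case prediction
votes = c("reverse", "affirm", "reverse", "reverse", "affirm", "reverse", "affirm", "reverse", "reverse")
names(which.max(table(votes)))  # "reverse"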

Example Trees of Justices

The Experts

  • Martin and his colleagues recruited 83 legal experts
    • 71 academics and 12 attorneys
    • 38 previously clerked for a Supreme Court justice, 33 were chaired professors and 5 were current or former law school deans
  • Experts were asked to predict only within their area of expertise; more than one expert was assigned to each case

  • Allowed to consider any source of information, but not allowed to communicate with each other regarding predictions

The Results

  • For the 68 cases in October 2002:

  • Overall case predictions:
    • Model accuracy: 75%
    • Experts’ accuracy: 59%
  • Individual justice predictions:
    • Model accuracy: 67%
    • Experts’ accuracy: 68%

The Analytics Edge

  • Predicting Supreme Court decisions is very valuable to firms, politicians and non-governmental organizations

  • A model that predicts these decisions is both more accurate and faster than experts

    • The CART model, based on very high-level details of a case, beats experts who can process much more detailed and complex information

Judge, Jury, and Classifier in R

Read in the data

# Read in the data
stevens = read.csv("stevens.csv")
# Output structure
str(stevens)
## 'data.frame':    566 obs. of  9 variables:
##  $ Docket    : Factor w/ 566 levels "00-1011","00-1045",..: 63 69 70 145 97 181 242 289 334 436 ...
##  $ Term      : int  1994 1994 1994 1994 1995 1995 1996 1997 1997 1999 ...
##  $ Circuit   : Factor w/ 13 levels "10th","11th",..: 4 11 7 3 9 11 13 11 12 2 ...
##  $ Issue     : Factor w/ 11 levels "Attorneys","CivilRights",..: 5 5 5 5 9 5 5 5 5 3 ...
##  $ Petitioner: Factor w/ 12 levels "AMERICAN.INDIAN",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Respondent: Factor w/ 12 levels "AMERICAN.INDIAN",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ LowerCourt: Factor w/ 2 levels "conser","liberal": 2 2 2 1 1 1 1 1 1 1 ...
##  $ Unconst   : int  0 0 0 0 0 1 0 1 0 0 ...
##  $ Reverse   : int  1 1 1 1 1 0 1 1 1 1 ...

Split the data

# Split the data
library(caTools)
set.seed(3000)
spl = sample.split(stevens$Reverse, SplitRatio = 0.7)
Train = subset(stevens, spl==TRUE)
Test = subset(stevens, spl==FALSE)
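
A sanity check worth adding here (not in the original walkthrough): the baseline accuracy from always predicting the test set’s more frequent outcome. The class counts can be read off the confusion matrices below.

# Baseline: always predict the more common outcome
table(Test$Reverse)                    # 77 affirm (0), 93 reverse (1)
max(table(Test$Reverse)) / nrow(Test)  # about 0.547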

Load CART tree packages

# Load CART tree packages
library(rpart)
library(rpart.plot)

Implement CART Model

# CART model
StevensTree = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method="class", minbucket=25)
# Plot CART tree
prp(StevensTree)

Make predictions

# Make predictions
PredictCART = predict(StevensTree, newdata = Test, type = "class")
# Build confusion matrix; kable() from the knitr package formats it as a table
library(knitr)
z = table(Test$Reverse, PredictCART)
kable(z)
|   |  0|  1|
|:--|--:|--:|
|0  | 41| 36|
|1  | 22| 71|
# Compute Accuracy
sum(diag(z))/sum(z)
## [1] 0.6588235

ROC Curve

# ROC curve
library(ROCR)
# Make predictions on test set
PredictROC = predict(StevensTree, newdata = Test)
PredictROC
##             0         1
## 1   0.3035714 0.6964286
## 3   0.3035714 0.6964286
## 4   0.4000000 0.6000000
## 6   0.4000000 0.6000000
## 8   0.4000000 0.6000000
## 21  0.3035714 0.6964286
## 32  0.5517241 0.4482759
## 36  0.5517241 0.4482759
## 40  0.3035714 0.6964286
## 42  0.5517241 0.4482759
## 46  0.5517241 0.4482759
## 47  0.4000000 0.6000000
## 53  0.5517241 0.4482759
## 55  0.3035714 0.6964286
## 59  0.1842105 0.8157895
## 60  0.4000000 0.6000000
## 66  0.4000000 0.6000000
## 67  0.4000000 0.6000000
## 68  0.1842105 0.8157895
## 72  0.3035714 0.6964286
## 79  0.3035714 0.6964286
## 80  0.5517241 0.4482759
## 87  0.7600000 0.2400000
## 88  0.1842105 0.8157895
## 92  0.7910448 0.2089552
## 95  0.7910448 0.2089552
## 102 0.7910448 0.2089552
## 106 0.7910448 0.2089552
## 110 0.7910448 0.2089552
## 112 0.7910448 0.2089552
## 114 0.7910448 0.2089552
## 125 0.7910448 0.2089552
## 130 0.7910448 0.2089552
## 134 0.7910448 0.2089552
## 138 0.7910448 0.2089552
## 145 0.7910448 0.2089552
## 146 0.7910448 0.2089552
## 148 0.3035714 0.6964286
## 149 0.3035714 0.6964286
## 152 0.3035714 0.6964286
## 154 0.5517241 0.4482759
## 161 0.7878788 0.2121212
## 164 0.4000000 0.6000000
## 167 0.7878788 0.2121212
## 169 0.3035714 0.6964286
## 171 0.7600000 0.2400000
## 175 0.5517241 0.4482759
## 176 0.0754717 0.9245283
## 177 0.0754717 0.9245283
## 178 0.0754717 0.9245283
## 180 0.0754717 0.9245283
## 187 0.0754717 0.9245283
## 188 0.7878788 0.2121212
## 190 0.0754717 0.9245283
## 192 0.0754717 0.9245283
## 196 0.0754717 0.9245283
## 197 0.3035714 0.6964286
## 208 0.3035714 0.6964286
## 210 0.0754717 0.9245283
## 216 0.7910448 0.2089552
## 218 0.7910448 0.2089552
## 220 0.0754717 0.9245283
## 224 0.4000000 0.6000000
## 226 0.7600000 0.2400000
## 227 0.4000000 0.6000000
## 228 0.7878788 0.2121212
## 235 0.3035714 0.6964286
## 239 0.7878788 0.2121212
## 242 0.7600000 0.2400000
## 244 0.7600000 0.2400000
## 247 0.4000000 0.6000000
## 255 0.3035714 0.6964286
## 260 0.5517241 0.4482759
## 261 0.7600000 0.2400000
## 264 0.3035714 0.6964286
## 265 0.3035714 0.6964286
## 268 0.3035714 0.6964286
## 272 0.5517241 0.4482759
## 273 0.3035714 0.6964286
## 274 0.5517241 0.4482759
## 275 0.3035714 0.6964286
## 282 0.4000000 0.6000000
## 286 0.7878788 0.2121212
## 291 0.4000000 0.6000000
## 294 0.1842105 0.8157895
## 305 0.4000000 0.6000000
## 306 0.3035714 0.6964286
## 308 0.7878788 0.2121212
## 311 0.7878788 0.2121212
## 313 0.7878788 0.2121212
## 314 0.7878788 0.2121212
## 315 0.7878788 0.2121212
## 317 0.7878788 0.2121212
## 320 0.7878788 0.2121212
## 321 0.7878788 0.2121212
## 323 0.4000000 0.6000000
## 331 0.3035714 0.6964286
## 335 0.3035714 0.6964286
## 338 0.7600000 0.2400000
## 341 0.5517241 0.4482759
## 345 0.5517241 0.4482759
## 346 0.3035714 0.6964286
## 350 0.3035714 0.6964286
## 352 0.3035714 0.6964286
## 353 0.1842105 0.8157895
## 355 0.3035714 0.6964286
## 356 0.1842105 0.8157895
## 358 0.3035714 0.6964286
## 359 0.3035714 0.6964286
## 360 0.4000000 0.6000000
## 361 0.4000000 0.6000000
## 362 0.5517241 0.4482759
## 364 0.3035714 0.6964286
## 368 0.3035714 0.6964286
## 381 0.4000000 0.6000000
## 382 0.1842105 0.8157895
## 384 0.3035714 0.6964286
## 387 0.1842105 0.8157895
## 389 0.3035714 0.6964286
## 390 0.4000000 0.6000000
## 394 0.3035714 0.6964286
## 400 0.7878788 0.2121212
## 402 0.4000000 0.6000000
## 405 0.7878788 0.2121212
## 408 0.3035714 0.6964286
## 410 0.3035714 0.6964286
## 416 0.4000000 0.6000000
## 422 0.7600000 0.2400000
## 432 0.0754717 0.9245283
## 434 0.7910448 0.2089552
## 436 0.0754717 0.9245283
## 441 0.7910448 0.2089552
## 444 0.0754717 0.9245283
## 448 0.0754717 0.9245283
## 450 0.0754717 0.9245283
## 451 0.0754717 0.9245283
## 452 0.7910448 0.2089552
## 454 0.0754717 0.9245283
## 456 0.0754717 0.9245283
## 459 0.0754717 0.9245283
## 462 0.0754717 0.9245283
## 464 0.0754717 0.9245283
## 467 0.0754717 0.9245283
## 468 0.0754717 0.9245283
## 470 0.0754717 0.9245283
## 473 0.0754717 0.9245283
## 476 0.0754717 0.9245283
## 478 0.0754717 0.9245283
## 480 0.0754717 0.9245283
## 482 0.0754717 0.9245283
## 483 0.0754717 0.9245283
## 484 0.0754717 0.9245283
## 494 0.7910448 0.2089552
## 498 0.1842105 0.8157895
## 504 0.4000000 0.6000000
## 509 0.4000000 0.6000000
## 521 0.7600000 0.2400000
## 527 0.4000000 0.6000000
## 531 0.4000000 0.6000000
## 535 0.4000000 0.6000000
## 538 0.7600000 0.2400000
## 539 0.1842105 0.8157895
## 540 0.4000000 0.6000000
## 543 0.7600000 0.2400000
## 545 0.4000000 0.6000000
## 546 0.7910448 0.2089552
## 551 0.7910448 0.2089552
## 552 0.7910448 0.2089552
## 556 0.4000000 0.6000000
## 558 0.1842105 0.8157895
# Plot ROC curve
pred = prediction(PredictROC[,2], Test$Reverse)
perf = performance(pred, "tpr", "fpr")
plot(perf)
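
ROCR can also report the area under this curve with one more line:

# Compute AUC for the CART model
as.numeric(performance(pred, "auc")@y.values)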

Load randomForest package

# Load randomForest package
library(randomForest)

Implement random forest model

# Build random forest model
# Note: Reverse is still numeric here, so randomForest does regression rather
# than classification (and warns about it)
StevensForest = randomForest(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, ntree=200, nodesize=25 )

# Convert outcome to factor so randomForest performs classification
Train$Reverse = as.factor(Train$Reverse)
Test$Reverse = as.factor(Test$Reverse)

Refit random forest model

# Try again
StevensForest = randomForest(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, ntree=200, nodesize=25 )

# Make predictions
PredictForest = predict(StevensForest, newdata = Test)
# Compute Accuracy
z = table(Test$Reverse, PredictForest)
kable(z)
|   |  0|  1|
|:--|--:|--:|
|0  | 42| 35|
|1  | 18| 75|
sum(diag(z))/sum(z)
## [1] 0.6882353
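
As a follow-up sketch, randomForest’s built-in importance measures show which variables the forest leans on (mean decrease in Gini impurity by default):

# Variable importance for the random forest
importance(StevensForest)
varImpPlot(StevensForest)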

Cross-Validation

# Load cross-validation packages
library(caret)
library(e1071)

# Define cross-validation experiment
numFolds = trainControl( method = "cv", number = 10 )
cpGrid = expand.grid( .cp = seq(0.01,0.5,0.01)) 

# Perform the cross validation
train(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "rpart", trControl = numFolds, tuneGrid = cpGrid )
## CART 
## 
## 396 samples
##   6 predictor
##   2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 356, 356, 357, 356, 356, 357, ... 
## Resampling results across tuning parameters:
## 
##   cp    Accuracy   Kappa       
##   0.01  0.6087821   0.189707219
##   0.02  0.6216667   0.223453071
##   0.03  0.6267949   0.239192228
##   0.04  0.6368590   0.266178297
##   0.05  0.6443590   0.283030759
##   0.06  0.6443590   0.283030759
##   0.07  0.6443590   0.283030759
##   0.08  0.6443590   0.283030759
##   0.09  0.6443590   0.283030759
##   0.10  0.6443590   0.283030759
##   0.11  0.6443590   0.283030759
##   0.12  0.6443590   0.283030759
##   0.13  0.6443590   0.283030759
##   0.14  0.6443590   0.283030759
##   0.15  0.6443590   0.283030759
##   0.16  0.6443590   0.283030759
##   0.17  0.6443590   0.283030759
##   0.18  0.6443590   0.283030759
##   0.19  0.6443590   0.283030759
##   0.20  0.6038462   0.185123111
##   0.21  0.5631410   0.078289037
##   0.22  0.5528846   0.051089037
##   0.23  0.5403846   0.004897294
##   0.24  0.5378846  -0.008808290
##   0.25  0.5378846  -0.008808290
##   0.26  0.5453846   0.000000000
##   0.27  0.5453846   0.000000000
##   0.28  0.5453846   0.000000000
##   0.29  0.5453846   0.000000000
##   0.30  0.5453846   0.000000000
##   0.31  0.5453846   0.000000000
##   0.32  0.5453846   0.000000000
##   0.33  0.5453846   0.000000000
##   0.34  0.5453846   0.000000000
##   0.35  0.5453846   0.000000000
##   0.36  0.5453846   0.000000000
##   0.37  0.5453846   0.000000000
##   0.38  0.5453846   0.000000000
##   0.39  0.5453846   0.000000000
##   0.40  0.5453846   0.000000000
##   0.41  0.5453846   0.000000000
##   0.42  0.5453846   0.000000000
##   0.43  0.5453846   0.000000000
##   0.44  0.5453846   0.000000000
##   0.45  0.5453846   0.000000000
##   0.46  0.5453846   0.000000000
##   0.47  0.5453846   0.000000000
##   0.48  0.5453846   0.000000000
##   0.49  0.5453846   0.000000000
##   0.50  0.5453846   0.000000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.19.

# Create a new CART model, using a cp value inside the cross-validated
# accuracy plateau (0.05 to 0.19 in the table above)
StevensTreeCV = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method="class", cp = 0.18)

# Make predictions
PredictCV = predict(StevensTreeCV, newdata = Test, type = "class")
z = table(Test$Reverse, PredictCV)
kable(z)
|   |  0|  1|
|:--|--:|--:|
|0  | 59| 18|
|1  | 29| 64|
# Compute Accuracy
sum(diag(z))/sum(z)
## [1] 0.7235294
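
Finally, it is worth plotting the cross-validated tree (rpart.plot is already loaded); with cp in the accuracy plateau, the tree is very simple:

# Plot the cross-validated CART tree
prp(StevensTreeCV)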