In this problem, we'll see how analytics can be used to predict the outcomes of cases in the United States Supreme Court. This may seem like an unconventional use of analytics, but in 2002 a group of political science and law academics decided to test whether a model could do better than a group of experts at predicting the decisions of the Supreme Court.
In this case, a very interpretable analytics method was used, called classification and regression trees. The legal system of the United States operates at the state level and at the federal, or country-wide, level. The federal level is necessary to deal with cases beyond the scope of state law, like disputes between states and violations of federal laws. The federal court system is divided into three levels: district courts, circuit courts, and the Supreme Court. Cases start at the district courts, where an initial decision is made. The circuit courts hear appeals from the district courts and can change the decision that was made. The Supreme Court is the highest level in the American legal system and makes the final decision on cases.
The Supreme Court of the United States consists of nine judges, or justices, who are appointed by the President and confirmed by the Senate. The same nine justices served together from 1994 through 2005, the longest period of time with an unchanged bench in over 180 years. The people appointed as Supreme Court justices are usually distinguished judges, professors of law, or state or federal attorneys. The Supreme Court of the United States, or SCOTUS, decides the most difficult and controversial cases in the United States. These cases often involve an interpretation of the Constitution and have significant social, political, and economic consequences.
Since non-profits, voters, and anybody interested in long-term planning can benefit from knowing the outcomes of Supreme Court cases before they happen, legal academics and political scientists regularly predict Supreme Court decisions from detailed studies of the cases and of the individual justices. In 2002, Andrew Martin, a professor of political science at Washington University in St. Louis, decided instead to predict decisions using a statistical model built from data. Together with his colleagues, he decided to test the model against a panel of experts. They wanted to see if an analytical model could outperform the expertise and intuition of a large group of experts.
Martin used a method called classification and regression trees, or CART. In this case, the outcome is binary: will the Supreme Court affirm or reverse the lower court's decision? He could have used logistic regression, but logistic regression models are not easily interpretable. The model coefficients indicate the importance and relative effect of each variable, but do not give a simple explanation of how a decision is made. In this lecture, we'll discuss the CART method and a related method called random forests. We will then see if these methods can actually outperform experts in predicting the outcome of Supreme Court cases.
# Set the working directory (adjust the path for your own machine)
setwd("C:/Users/jzchen/Documents/Courses/Analytics Edge/Unit_4_Trees")

# Read in the data and inspect its structure
stevens <- read.csv("stevens.csv")
str(stevens)
## 'data.frame': 566 obs. of 9 variables:
## $ Docket : Factor w/ 566 levels "00-1011","00-1045",..: 63 69 70 145 97 181 242 289 334 436 ...
## $ Term : int 1994 1994 1994 1994 1995 1995 1996 1997 1997 1999 ...
## $ Circuit : Factor w/ 13 levels "10th","11th",..: 4 11 7 3 9 11 13 11 12 2 ...
## $ Issue : Factor w/ 11 levels "Attorneys","CivilRights",..: 5 5 5 5 9 5 5 5 5 3 ...
## $ Petitioner: Factor w/ 12 levels "AMERICAN.INDIAN",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Respondent: Factor w/ 12 levels "AMERICAN.INDIAN",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ LowerCourt: Factor w/ 2 levels "conser","liberal": 2 2 2 1 1 1 1 1 1 1 ...
## $ Unconst : int 0 0 0 0 0 1 0 1 0 0 ...
## $ Reverse : int 1 1 1 1 1 0 1 1 1 1 ...
# Split the data into a training set (70%) and a testing set (30%),
# preserving the proportion of reversals in each
library(caTools)
set.seed(3000)
spl <- sample.split(stevens$Reverse, SplitRatio = 0.7)
Train <- subset(stevens, spl == TRUE)
Test <- subset(stevens, spl == FALSE)
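As a quick sanity check (this step is an addition, not part of the original lecture), sample.split stratifies on the outcome variable, so both subsets should contain roughly the same fraction of reversals:

# Both proportions should be close to the overall rate of Reverse = 1
prop.table(table(Train$Reverse))
prop.table(table(Test$Reverse))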
# Build the CART model
library(rpart)
library(rpart.plot)
stevensTree <- rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "class", minbucket = 25)
The last argument we give is minbucket = 25. This requires every terminal node, or bucket, of the tree to contain at least 25 training observations, which limits how finely the tree can split and keeps it from overfitting to our training set.
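As an aside, minbucket is one of rpart's tree-control parameters, so the same model can be specified more explicitly through rpart.control; the sketch below should produce an identical tree:

# Equivalent, more explicit form of the call above
stevensTreeCtrl <- rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst,
                         data = Train, method = "class",
                         control = rpart.control(minbucket = 25))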
# Plot the tree
prp(stevensTree)
# Make majority-class predictions on the test set
predictCART <- predict(stevensTree, newdata = Test, type = "class")
We added a third argument here, type = "class". We need to give this argument when making predictions from our CART model if we want majority-class predictions rather than class probabilities.
table(Test$Reverse, predictCART)
##    predictCART
##      0  1
##   0 41 36
##   1 22 71
The accuracy is (41 + 71) / (41 + 36 + 22 + 71) = 112/170 ≈ 0.6588.
If you were to build a logistic regression model, you would get an accuracy of 0.665, and a baseline model that always predicts Reverse, the most common outcome, has an accuracy of 0.547. So our CART model significantly beats the baseline and is competitive with logistic regression.
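For reference, both of those numbers can be read off the test set directly; this check is an addition to the original write-up:

# CART accuracy from the confusion matrix above
confMat <- table(Test$Reverse, predictCART)
sum(diag(confMat)) / sum(confMat)   # (41 + 71) / 170 = 0.6588

# Baseline accuracy: always predict Reverse (Reverse = 1)
# The test set contains 22 + 71 = 93 reversals out of 170 cases
mean(Test$Reverse == 1)             # 93 / 170 = 0.547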
# Evaluate the model with an ROC curve
library(ROCR)
## Loading required package: gplots
##
## Attaching package: 'gplots'
##
## The following object is masked from 'package:stats':
##
## lowess
# Predict class probabilities (no type argument this time)
predictROC <- predict(stevensTree, newdata = Test)
Without the type = "class" argument, predict returns a matrix with two columns: for each test observation, the fraction of training observations in its terminal node with outcome 0 (first column) and the fraction with outcome 1 (second column). These fractions serve as class probabilities, and we'll use the second column to generate an ROC curve.
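To see what this matrix looks like, we can inspect the first few rows (output omitted here; this inspection is an addition to the original):

# Each row is one test case; columns are P(Reverse = 0) and P(Reverse = 1)
head(predictROC)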
# Build the ROCR prediction object and plot the ROC curve
pred <- prediction(predictROC[, 2], Test$Reverse)
perf <- performance(pred, "tpr", "fpr")
plot(perf)
# Compute the area under the ROC curve (AUC)
as.numeric(performance(pred, "auc")@y.values)
## [1] 0.6927105
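If you want to see which probability thresholds correspond to which points on the curve, ROCR's plot method can colorize the curve and label cutoffs; the parameter values below are illustrative:

# Color the curve by threshold and print cutoffs from 0 to 1 in steps of 0.1
plot(perf, colorize = TRUE, print.cutoffs.at = seq(0, 1, by = 0.1), text.adj = c(-0.2, 1.7))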
Now let's see how the tree changes if we decrease minbucket to 5:
stevensTree_5 <- rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "class", minbucket = 5)
prp(stevensTree_5)
And if we increase minbucket to 100:
stevensTree_100 <- rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "class", minbucket = 100)
prp(stevensTree_100)
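One way to quantify the effect of minbucket (this comparison is an addition, not part of the original lecture) is to count each tree's terminal nodes. Smaller minbucket values allow more, smaller buckets, so the minbucket = 5 tree should have far more leaves, a sign it is likely overfitting, while the minbucket = 100 tree should have very few and may be too simple:

# In an rpart object, leaves are the rows of $frame whose split variable is "<leaf>"
countLeaves <- function(tree) sum(tree$frame$var == "<leaf>")
countLeaves(stevensTree_5)    # expect many leaves
countLeaves(stevensTree)      # moderate (minbucket = 25)
countLeaves(stevensTree_100)  # expect very few leaves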