library(tidyverse)   # data wrangling and plotting (attaches dplyr and ggplot2)
library(haven)       # reading Stata .dta files
library(dendextend)  # coloring and labeling dendrograms
library(e1071)       # support vector machines
library(pander)      # formatted tables

I. Introduction

Using machine learning, and more specifically Support Vector Machines (SVMs), I want to predict whether Congressional party majorities impact federal circuit court rulings in favor of plaintiffs. The judicial system is supposed to be non-partisan, but with recent Supreme Court Justices Kavanaugh and Coney Barrett having been confirmed in a contentious political climate, I thought it would be interesting to investigate how Congressional partisan majorities affect the federal judicial system on a slightly smaller scale via the circuit courts. More explicitly, my research question is: how do Congressional party majorities impact rulings in federal circuit courts for plaintiffs and defendants?

II. Data and Methodology

I use data from the Federal Judicial Center (FJC), a national database containing datasets of criminal and civil cases from 1988 to the present. I also merged in data on Republican seat percentages from the US Senate and House Congressional websites. The full dataset was too large to work with in R: my original .dta file was significantly larger than 5 MB, so through Stata (which was able to open the file), I drew a roughly 5 MB random sample, a size my R session could handle. This brings the observations down from an initial 10,312,898 to 24,059. I then further clean the dataset by calculating Congressional majorities across the years for which data is available (through 2020), and I filter for cases where judgment is in favor of the plaintiff (judgment == 1) or in favor of the defendant (judgment == 2). The FJC codebook lists additional values for judgment: 3 for a ruling in favor of both, 4 for unknown rulings, and 0/-8 for missing values. For the sake of this assignment, I only look at cases where the judgment is exclusively for either the plaintiff or the defendant.

setwd("C:/Users/Ishaani Sharma/Desktop/Statistical Learning/final")
cases <- read_dta("sample_case.dta")
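
For reference, had the full FJC extract fit in memory, the same subsample could have been drawn directly in R. The filename below is hypothetical, standing in for the full file I actually sampled in Stata:

# Hypothetical: draw the same-size random sample entirely in R
# full_fjc_cases.dta is a placeholder name for the full FJC extract
full_cases <- read_dta("full_fjc_cases.dta")
cases <- slice_sample(full_cases, n = 24059)  # same size as the Stata sample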

clean_cases <- cases |>
  # Republican share of seats in each chamber
  mutate(Senate_Repub_Per = Senate_Rep/(Senate_Dem + Senate_Rep + Senate_Oth),
         House_Repub_Per = House_Rep/(House_Dem + House_Rep + House_Oth),
         # Label the majority party in each chamber (ties count as Republican)
         Senate_Majority = if_else(Senate_Repub_Per >= 0.5, "Republican", "Democrat"),
         House_Majority = if_else(House_Repub_Per >= 0.5, "Republican", "Democrat")) |>
  select(circuit, judgment, year, Senate_Repub_Per, House_Repub_Per,
         Senate_Majority, House_Majority) |>
  # Keep only exclusive rulings: 1 = plaintiff, 2 = defendant
  filter(judgment %in% c(1, 2)) |>
  filter(year <= 2020)

Let’s first learn a bit more about our data through unsupervised learning. This will help us detect patterns in the dataset before we start building our predictive model.

I choose to focus on the Senate majority in these unsupervised models because the Senate is the Congressional body that confirms federal judges. I therefore expect partisanship in the Senate to affect federal judges to a greater degree than partisanship in the House.

# Cluster the first 50 cases on the numeric variables only
dend_cases <- clean_cases |>
  head(50) |>
  arrange(year)

dend <- dend_cases |>
  select(-Senate_Majority, -House_Majority) |>
  scale() |>   # standardize so no single variable dominates the distances
  dist() |>
  hclust() |>
  as.dendrogram() |>
  place_labels(dend_cases$year)  # label each leaf with its case's year

# Color by the Senate majority of the 50 clustered cases
# (not the full dataset, which would misalign colors and leaves)
dend_colors <- ifelse(dend_cases$Senate_Majority == "Republican",
                      "red",
                      "steelblue")

dend |>
  color_branches(col = dend_colors[order.dendrogram(dend)]) |>
  color_labels(col = dend_colors[order.dendrogram(dend)]) |>
  plot(main = "Figure 1. Clusters of judgment rulings by year, colored by party",
       sub = "Republican majority in red, Democratic majority in blue")

With this dendrogram, we can see clusters of observations. Generally, observations from years with Senate Republican majorities cluster together.

We can also see how judgment responses change over time graphically.

clean_cases |>
  count(year, Senate_Majority, judgment) |>
  group_by(year, Senate_Majority) |>
  mutate(prop = n/sum(n)) |>   # share of each judgment type within a year
  ungroup() |>
  ggplot() +
  geom_area(aes(x = year, y = prop, fill = factor(judgment)),
            position = "stack") +
  facet_wrap(~Senate_Majority) +
  scale_fill_manual(
    name = "Judgment",
    values = c("1" = "red", "2" = "steelblue"),
    labels = c("1" = "1 = Plaintiff", "2" = "2 = Defendant")) +
  labs(title = "Figure 2. Proportion of judgments over time split by Senate Majority",
       x = "Year",
       y = "Proportion") +
  theme_minimal()

Over time, there is more fluctuation when the Senate is majority Republican than when it is majority Democratic. Generally, under a Democratic-controlled Senate, circuit courts seem to rule in favor of the defendant. The facets also show when each party controlled the Senate; in this data, Democratic control ends in 2013.

Now, onto the supervised learning for this project. As I mentioned earlier, we will be using SVMs to predict how Congressional partisanship impacts federal circuit court rulings. But how does this model work?

SVMs are effective in classification scenarios such as this one, where we are trying to classify between two judgment types: rulings in favor of the plaintiff versus the defendant. A helpful point of comparison is k-Nearest Neighbors, which classifies a new data point by the majority label among its k closest training points; an SVM instead learns an explicit decision boundary between the classes up front.

SVMs find the optimal hyperplane (in two dimensions, just a line dividing the two classes, plaintiff and defendant). In Figure 3 below, we can envision a line dividing the blue dots representing defendants from the red dots representing plaintiffs. The optimal hyperplane is the one that maximizes the margin: the distance from the boundary to the closest data points (the support vectors) in the plaintiff and defendant classes. Maximizing this margin gives our SVM the decision boundary most likely to generalize to new cases.
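
To make the idea concrete, here is a toy sketch of my own (not part of the analysis): a linear SVM fit to two synthetic, well-separated clusters. e1071's plot() method shades the two decision regions and marks the support vectors with an "x".

# Toy illustration only: a linear SVM on two synthetic 2D clusters
set.seed(1)
toy <- data.frame(
  x1 = c(rnorm(20, mean = -1.5), rnorm(20, mean = 1.5)),
  x2 = c(rnorm(20, mean = -1.5), rnorm(20, mean = 1.5)),
  class = factor(rep(c("plaintiff", "defendant"), each = 20))
)
toy_svm <- svm(class ~ x1 + x2, data = toy, kernel = "linear")
plot(toy_svm, toy)  # shaded decision regions; support vectors marked "x"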

clean_cases |>
  group_by(year, judgment) |>
  summarise(count = n(), .groups = "drop") |>
  ggplot() +
  geom_point(aes(x = year, y = count, color = factor(judgment)),
             size = 3, alpha = 0.8) +
  scale_color_manual(
    name = "Judgment",
    values = c("1" = "red", "2" = "steelblue"),
    labels = c("1" = "1 = Plaintiff", "2" = "2 = Defendant")
  ) +
  labs(
    title = "Figure 3. Number of judgments over time",
    x = "Year",
    y = "Number of Cases"
  ) +
  theme_minimal()

III. Results

Let’s build the SVM model and try to fit the optimal hyperplane. Based on Figure 3, I predict that a curved, roughly quadratic boundary will fit this data.

# Split the data 50/50 into training and test sets
set.seed(123)
training_rows <- sample(1:nrow(clean_cases),
                        size = floor(nrow(clean_cases)/2))
train_data <- clean_cases[training_rows, ]
test_data  <- clean_cases[-training_rows, ]

# Linear
linear_svm <- svm(factor(judgment) ~ ., data = train_data, kernel = "linear", scale = TRUE)
linear_preds <- predict(linear_svm, newdata = test_data)

# Polynomial
polynomial_svm <- svm(factor(judgment) ~ ., data = train_data, kernel = "polynomial", scale = TRUE)
polynomial_preds <- predict(polynomial_svm, newdata = test_data)

# Radial
radial_svm <- svm(factor(judgment) ~ ., data = train_data, kernel = "radial", scale = TRUE)
radial_preds <- predict(radial_svm, newdata = test_data)
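
Since I hypothesized a quadratic boundary, it is worth noting that e1071's polynomial kernel defaults to degree 3. A degree-2 variant could be fit explicitly; the sketch below is my own and was not used for the reported results.

# Optional variant: force a quadratic (degree-2) polynomial kernel
quadratic_svm <- svm(factor(judgment) ~ ., data = train_data,
                     kernel = "polynomial", degree = 2, scale = TRUE)
quadratic_preds <- predict(quadratic_svm, newdata = test_data)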

mean(linear_preds == test_data$judgment)
## [1] 0.6246194
mean(polynomial_preds == test_data$judgment)
## [1] 0.6228795
mean(radial_preds == test_data$judgment)
## [1] 0.6207047

We see that the linear SVM model has the highest accuracy, but the confusion matrices below tell a different story. (In these tables, rows are the predicted judgments and columns are the actual judgments.)

linear_table <- table(Predicted = linear_preds, Actual = test_data$judgment)
poly_table <- table(Predicted = polynomial_preds, Actual = test_data$judgment)
rad_table <- table(Predicted = radial_preds, Actual = test_data$judgment)

pander(linear_table, caption = "Confusion matrix: Linear SVM")

Confusion matrix: Linear SVM

            Actual
Predicted      1      2
        1      0      0
        2    863   1436

pander(poly_table, caption = "Confusion matrix: Polynomial SVM")

Confusion matrix: Polynomial SVM

            Actual
Predicted      1      2
        1    193    197
        2    670   1239

pander(rad_table, caption = "Confusion matrix: Radial SVM")

Confusion matrix: Radial SVM

            Actual
Predicted      1      2
        1     83     92
        2    780   1344

We see that the second table (polynomial SVM) makes predictions for both classes. The linear SVM only ever predicts 2, and I would rather have a model that can make predictions for both categories than one that maximizes overall accuracy by always choosing the majority class.
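
To quantify this, a small helper of my own (not part of the original analysis) computes per-class recall, the share of actual plaintiff (1) and defendant (2) rulings each model recovers:

# Per-class recall: confusion-matrix diagonal over actual-class column totals
class_recall <- function(tab) diag(tab) / colSums(tab)
class_recall(linear_table)  # plaintiff recall is 0: the model never predicts 1
class_recall(poly_table)    # nonzero recall for both classes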

# Re-fit the polynomial SVM with a cost below and above the default of 1
poly1_svm <- svm(factor(judgment) ~ ., data = train_data, kernel = "polynomial", scale = TRUE, cost = 0.8)
poly1_preds <- predict(poly1_svm, newdata = test_data)

poly2_svm <- svm(factor(judgment) ~ ., data = train_data, kernel = "polynomial", scale = TRUE, cost = 1.5)
poly2_preds <- predict(poly2_svm, newdata = test_data)

mean(poly1_preds == test_data$judgment)
## [1] 0.6250544
mean(poly2_preds == test_data$judgment)
## [1] 0.6224445

A cost value below the default of 1 yields slightly higher accuracy here, at least at cost = 0.8. For SVMs, a higher cost imposes a larger penalty on the algorithm for misclassifying training points, so the model fits the training data more tightly and narrows the margin. That risks overfitting: trying to be exactly right on every training point isn't necessarily what we want from our algorithm.
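
Rather than trying costs one at a time, a more systematic search could cross-validate over a grid with e1071's tune(); the candidate values below are my own choice, and this sketch is not part of the reported results.

# Cross-validated grid search over cost (10-fold CV by default)
set.seed(123)
cost_search <- tune(svm, factor(judgment) ~ ., data = train_data,
                    kernel = "polynomial",
                    ranges = list(cost = c(0.5, 0.8, 1, 1.5, 2)))
summary(cost_search)  # cross-validation error for each candidate cost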

Now let’s visualize our predictions.

# Build a prediction grid on the same scales as the training data:
# Senate_Repub_Per and House_Repub_Per are proportions (0 to 1), not percentages
all_judgments <- expand.grid(circuit = seq(from = 0, to = 11, by = 1),
                             year = seq(from = 1988, to = 2020, by = 1),
                             Senate_Repub_Per = seq(from = 0, to = 1, by = 0.05),
                             House_Repub_Per = seq(from = 0, to = 1, by = 0.05))

# Derive the majority labels the same way as in clean_cases
all_judgments$Senate_Majority <- ifelse(
  all_judgments$Senate_Repub_Per >= 0.5,
  "Republican",
  "Democrat")

all_judgments$House_Majority <- ifelse(
  all_judgments$House_Repub_Per >= 0.5,
  "Republican",
  "Democrat")

all_judgments$pred <- predict(poly1_svm, all_judgments)

all_judgments |>
  ggplot() +
  geom_point(aes(x = year, y = Senate_Repub_Per, color = pred)) +
  scale_color_manual(
    name = "Judgment",
    values = c("1" = "red", "2" = "steelblue"),
    labels = c("1" = "1 = Plaintiff", "2" = "2 = Defendant")) +
  scale_y_continuous(labels = scales::percent) +  # show proportions as percentages
  labs(title = "Figure 4. Polynomial SVM decision boundary",
       x = "Year",
       y = "Senate Republican Percentage") +
  theme_minimal()

Figure 4 above demonstrates where the polynomial SVM switches between predicting a ruling for the plaintiff and one for the defendant. The decision boundary is high: the percentage of Republicans in the Senate must exceed roughly 85% for the model to predict a ruling in favor of the plaintiff, which isn't necessarily borne out in the actual data, as demonstrated in Figure 2.

Using Figure 4 and the confusion matrix for the tuned polynomial SVM model, which is provided below:

updated_poly <- table(Predicted = poly1_preds, Actual = test_data$judgment)
pander(updated_poly, caption = "Confusion matrix: Polynomial SVM with cost = 0.8")
Confusion matrix: Polynomial SVM with cost = 0.8

            Actual
Predicted      1      2
        1    193    192
        2    670   1244
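
Reusing the hypothetical class_recall() helper from earlier:

class_recall(updated_poly)  # plaintiff (1) recall is far below defendant (2) recall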

We can determine that our model is much better at predicting judgments for defendants than for plaintiffs. It also appears that Congressional partisanship impacts judgments made by the federal circuit courts.

IV. Conclusion

To answer the research question "How do Congressional parties impact rulings in federal circuit courts for plaintiffs and defendants?," I collected data from the FJC on civil cases and their associated judgments. Through unsupervised learning, we found initial patterns that we could further investigate through supervised learning. Our dendrogram clusters (Figure 1) showed greater commonalities among rulings when the Senate majority was Republican, and Figure 2 highlighted how Democratic-majority Senates coincide with more rulings for the defendant.

We used a Support Vector Machine (SVM) as our supervised learning algorithm. An SVM finds the optimal hyperplane separating the categories the data may fall into: in a 2D space, the line that maximizes the distance to the closest point on either side of the boundary. For this dataset, the SVM finds the optimal hyperplane between the plaintiff and defendant points, which allows it to make better predictions about whether, given Congressional majorities, the federal circuit court rules in favor of the defendant or the plaintiff.

Through SVM, we built a model that accurately predicted judgment rulings 62.5% of the time. Since roughly 62% of the test cases were decided for the defendant, this accuracy is close to what always guessing the majority class would achieve; the value of the tuned polynomial model is that it uses Congressional majorities, along with the other variables in the dataset (circuit and year), to make predictions for both classes rather than only one. We also saw, as outlined in Figure 4, that the decision boundary for predicting a ruling for the plaintiff sits at around an 85% Republican share of the Senate. As a reminder, I chose to focus on the Senate majority and not the House majority because the Senate confirms federal judges. This analysis could also be replicated using Senate Democratic majorities; it would show a mirror image of these figures.

V. Sources