A Deeper Dive on RPV: Using EI Software

Ray Block Jr., Penn State University

Expert Witness Convening, Summer 2021

Overview

  1. Liu provides (among other things) an intellectual history of ecological inference (EI) in RPV.

  2. I talk briefly about RPV analyses in RStudio, and I go through a step-by-step example.

  3. I pass the mic to Liu who will talk specifically about EI software.

Getting Started

Lowering expectations

Confessions of an SPSS and Stata user…

  • I literally just started using R this spring
  • I write stuff in RStudio more than I crunch numbers in it
  • My newness is both a weakness and a strength
  • We will do basic analyses of a straightforward example

Example: Obama v. McCain (2008)

RStudio, at a glance

Getting your information into RStudio

Read in (i.e., import) the data…

library(readxl)
Morgan <- read_excel("C:/Users/rjb6233/Google Drive/Political Science Classes (Taught or Being Prepped)/PSU Courses/RPV Crash Course/Data/Morgan County (Alabama)/Morgan.xlsx")
View(Morgan)
  • Install the readxl package and load it to your “library”

  • Find the “Morgan.xlsx” file

  • Use the “read_excel” command to translate the file into R

  • Use the View command to check that the data imported correctly
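A quick sanity check after importing can catch problems early. The sketch below uses a small made-up data frame (`Morgan_toy`, hypothetical numbers) in place of the real file. It assumes the column names used later in this deck and assumes the vote-share columns are stored as proportions between 0 and 1:

```r
# Stand-in for the imported data: made-up numbers, NOT the real Morgan.xlsx
Morgan_toy <- data.frame(
  vap           = c(1000, 800, 1200, 500, 900, 1100),
  blkvap        = c(850, 640, 120, 25, 450, 55),
  whtvap        = c(100, 120, 1020, 470, 400, 1023),
  black_percent = c(0.88, 0.80, 0.25, 0.12, 0.52, 0.10),
  white_percent = c(0.12, 0.20, 0.75, 0.88, 0.48, 0.90)
)

# 1. Are the columns used later in the analysis all present?
expected_cols <- c("vap", "blkvap", "whtvap", "black_percent", "white_percent")
missing_cols  <- setdiff(expected_cols, names(Morgan_toy))

# 2. Are the vote shares proportions (assumed scale: 0 to 1)?
shares_ok <- all(Morgan_toy$black_percent >= 0 & Morgan_toy$black_percent <= 1)

# 3. Do the racial group counts fit inside total VAP?
counts_ok <- all(Morgan_toy$blkvap + Morgan_toy$whtvap <= Morgan_toy$vap)

str(Morgan_toy)  # column types and first few values
print(c(no_missing_cols = length(missing_cols) == 0,
        shares_ok = shares_ok, counts_ok = counts_ok))
```

If any of the three checks comes back FALSE, it is worth reopening the spreadsheet before going further.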

Doing data stuff

Generally, we are looking for relationships between jurisdiction characteristics and voting patterns

Some ways of checking for these relationships:

  1. homogeneous precinct analyses
  2. correlation analyses (e.g., scatterplots)
  3. bi-variate regression analyses (e.g., Goodman’s)
  4. ecological inference analyses (King’s EI)

Doing data stuff (part 1): homogeneous precinct analyses

1. Homogeneous precinct analyses

What is it?

  • compares mostly White to mostly minority precincts
  • research question: are voting patterns different?

1. Homogeneous precinct analyses

How do you do it?

  • sort precincts by the homogeneity of their jurisdiction characteristics
  • use that variable to explore voting patterns

1. Homogeneous precinct analyses

Jurisdiction characteristics variables:

  • blkvap (Black voting age population)
  • whtvap (White voting age population)

These are theoretically grounded measures of “racial political context” (see, e.g., McClerking 2008)

1. Homogeneous precinct analyses

Jurisdiction characteristics variables:

summary(Morgan$'blkvap')
summary(Morgan$'whtvap')

1. Homogeneous precinct analyses

Voting patterns variables:

  • Obama (black_percent)/McCain (white_percent) vote shares
summary(Morgan$'black_percent')
summary(Morgan$'white_percent')

1. Homogeneous precinct analyses

Manipulating variables

  • Install the dplyr package and load it to your “library”
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

1. Homogeneous precinct analyses

Use dplyr’s “mutate” command (together with base R’s “ifelse”) to create homogeneous precinct variables

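# Build percentage and homogeneity variables in a new, re-coded data frame
Morgan_RC <- Morgan %>%
  mutate(
    pBlackVAP   = (blkvap / vap) * 100,           # percent Black VAP
    pWhiteVAP   = (whtvap / vap) * 100,           # percent White VAP
    Obama_Vote  = black_percent * 100,            # Obama vote share, in percent
    McCain_Vote = white_percent * 100,            # McCain vote share, in percent
    MostlyBlack = ifelse(pBlackVAP >= 70, 1, 0),  # 1 = at least 70% Black VAP
    MostlyWhite = ifelse(pWhiteVAP >= 90, 1, 0)   # 1 = at least 90% White VAP
  )
head(Morgan_RC)  # inspect the first rows (data() only loads packaged datasets)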

I saved these new variables in a data frame called “Morgan_RC” (“RC” stands for “re-coded”)

1. Homogeneous precinct analyses

Use the “filter” command (in dplyr) to sort through the data

  • precincts that are homogeneously Black
black_precincts <- Morgan_RC %>%
  filter(MostlyBlack == 1) %>%            # keep only the mostly Black precincts
  select(black_percent, white_percent)    # Obama and McCain vote shares
print(black_precincts)

1. Homogeneous precinct analyses

Use the “filter” command (in dplyr) to sort through the data

  • precincts that are homogeneously White
white_precincts <- Morgan_RC %>%
  filter(MostlyWhite == 1) %>%            # keep only the mostly White precincts
  select(black_percent, white_percent)    # Obama and McCain vote shares
print(white_precincts)

1. Homogeneous precinct analyses

Interpretation: the evidence suggests racial polarization

  • residents in mostly Black precincts prefer Obama

  • residents in mostly White precincts prefer McCain
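To put numbers on that comparison, one option is to average Obama’s vote share within each precinct type. This is a minimal sketch in base R on made-up data (`toy` and its values are hypothetical, not Morgan County), using the same 70% and 90% cutoffs as the re-coding step:

```r
# Made-up precincts for illustration only
toy <- data.frame(
  vap           = c(1000, 800, 1200, 500, 900, 1100),
  blkvap        = c(850, 640, 120, 25, 450, 55),
  whtvap        = c(100, 120, 1020, 470, 400, 1023),
  black_percent = c(0.88, 0.80, 0.25, 0.12, 0.52, 0.10)  # Obama share (0 to 1)
)

# Same re-coding logic as in the slides
toy$pBlackVAP  <- 100 * toy$blkvap / toy$vap
toy$pWhiteVAP  <- 100 * toy$whtvap / toy$vap
toy$Obama_Vote <- 100 * toy$black_percent

# Label each precinct, then average Obama's share within each label
toy$precinct_type <- ifelse(toy$pBlackVAP >= 70, "Mostly Black",
                     ifelse(toy$pWhiteVAP >= 90, "Mostly White", "Mixed"))
obama_by_type <- tapply(toy$Obama_Vote, toy$precinct_type, mean)
print(round(obama_by_type, 1))
```

A large gap between the “Mostly Black” and “Mostly White” averages is the pattern this method reads as polarization.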

1. Homogeneous precinct analyses

Limitations of Homogeneous Precinct Analysis

  • Strict threshold: In many jurisdictions there are no precincts that can be classified as homogeneous (based on a 90% cutoff point)

  • Limited information: Homogeneous precincts are often only a small, possibly unrepresentative, sample of the population

    • Voters in less-homogeneous areas not always incorporated into the analyses
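Both limitations are easy to check before relying on this method. The sketch below (base R, hypothetical data) counts how many precincts qualify at several cutoffs and what share of the voting age population lives in precincts that are homogeneous at 90%:

```r
# Made-up precincts for illustration only
toy <- data.frame(
  vap    = c(1000, 800, 1200, 500, 900, 1100),
  blkvap = c(850, 640, 120, 25, 450, 55),
  whtvap = c(100, 120, 1020, 470, 400, 1023)
)
toy$pBlackVAP <- 100 * toy$blkvap / toy$vap
toy$pWhiteVAP <- 100 * toy$whtvap / toy$vap

# How many precincts count as homogeneous at each cutoff?
cutoffs <- c(70, 80, 90)
counts <- data.frame(
  cutoff         = cutoffs,
  n_mostly_black = sapply(cutoffs, function(k) sum(toy$pBlackVAP >= k)),
  n_mostly_white = sapply(cutoffs, function(k) sum(toy$pWhiteVAP >= k))
)
print(counts)

# What share of total VAP lives in precincts homogeneous at 90%?
homog_90    <- toy$pBlackVAP >= 90 | toy$pWhiteVAP >= 90
vap_covered <- sum(toy$vap[homog_90]) / sum(toy$vap)
print(round(vap_covered, 2))
```

In this toy example no precinct is 90% Black, and most residents live outside the homogeneous precincts, which is exactly the situation the limitations describe.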

Doing data stuff (part 2): correlation analyses

2. Correlation analyses

Homogeneous precinct analyses test for differences in voting patterns across jurisdiction characteristics (homogeneously Black vs. White)

Analyzing differences is analogous to analyzing associations

Correlations are tests of associations

2. Correlation analyses

Bi-variate correlations determine:

  • if 2 variables are related

  • if so, how they are related

The goal: make predictions from one variable to another

2. Correlation analyses

Theoretically, correlation \(=\) association

  • knowing something about one variable tells you something about another variable

Statistically, correlation \(=\) co-variation

  • variables are co-related if they co-vary

  • changes in one accompany changes in the other

2. Correlation analyses

How correlations work

2. Correlation analyses

It goes without saying, but correlation \(\neq\) causation!

  • correlations only look for a relationship between variables

  • they do not determine which variable causes the other

2. Correlation analyses

Pearson’s correlation coefficient (rXY)

  • most commonly used measure of (linear) association

  • rXY = index representing the strength and direction of relatedness between X and Y

2. Correlation analyses

Pearson’s correlation coefficient (rXY)

  • rXY always ranges from -1 to +1

    • -1 \(=\) perfect inverse correlation between X and Y
    • 0 \(=\) no linear relationship between X and Y
    • +1 \(=\) perfect positive correlation between X and Y
  • rXY can take on any value between these extremes

2. Correlation analyses

Pearson’s correlation coefficient (rXY)

  • rXY does not change if the independent (X) and dependent (Y) variables are interchanged

  • ordering of variables does not matter (this is a measure of correlation, not causation)

2. Correlation analyses

Interpreting rXY:

  • sign represents the direction of relationship

  • absolute value represents strength of relationship
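These properties can be verified directly. The sketch below computes rXY from its definition on two made-up vectors (`x` and `y` are hypothetical), compares it with R’s built-in `cor()`, and checks the symmetry and sign behavior:

```r
# Made-up values for illustration only
x <- c(5, 10, 50, 80, 85)
y <- c(12, 25, 52, 80, 88)

# Pearson's r from its definition: co-variation scaled by each variable's spread
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)    # same quantity from base R
r_swapped <- cor(y, x)    # interchanging X and Y leaves r unchanged
r_flipped <- cor(x, -y)   # reversing Y flips the sign but not the strength

print(round(c(manual = r_manual, builtin = r_builtin,
              swapped = r_swapped, flipped = r_flipped), 3))
```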

2. Correlation analyses

Interpreting rXY:

scatterplots are great for visualizing correlations

2. Correlation analyses

Creating scatterplots (Obama votes by Black vs. White VAP)

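library(ggplot2)  # provides ggplot() and geom_point()

# Obama vote share against percent Black VAP
ggplot(Morgan_RC, aes(x = pBlackVAP, y = Obama_Vote)) +
    geom_point()

# Obama vote share against percent White VAP
ggplot(Morgan_RC, aes(x = pWhiteVAP, y = Obama_Vote)) +
    geom_point()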

2. Correlation analyses

Interpretation: racial difference in candidate preference

  • as the Black VAP increases, so does Obama’s vote share

  • Obama’s vote share decreases with rising White VAP
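The scatterplots can be paired with the correlation coefficients they visualize. This is a minimal sketch using `cor.test()` from base R on made-up precinct values (the `toy` data frame is hypothetical):

```r
# Made-up precincts for illustration only (all values in percent)
toy <- data.frame(
  pBlackVAP  = c(85, 80, 10, 5, 50, 5),
  pWhiteVAP  = c(10, 15, 85, 94, 44, 93),
  Obama_Vote = c(88, 80, 25, 12, 52, 10)
)

# Pearson correlation, with a test of the null hypothesis that r = 0
r_black <- cor.test(toy$pBlackVAP, toy$Obama_Vote)
r_white <- cor.test(toy$pWhiteVAP, toy$Obama_Vote)

print(round(c(black_vap = unname(r_black$estimate),
              white_vap = unname(r_white$estimate)), 2))
```

Opposite signs on the two coefficients correspond to the two patterns described in the interpretation.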

2. Correlation analyses

Limitations:

  • correlations are based on a continuous (rather than dichotomous) measure of jurisdiction characteristics

  • but conclusions are still based on aggregate patterns (can’t really talk about what individuals are doing)

Doing data stuff (part 3): bivariate regression

Bivariate Regression

Applying correlation statistics to prediction problems

  • rXY tells us the strength and direction of the relationship between two variables

    • If rXY is strong, then we can use information about values of X to predict values of Y

    • If rXY has a positive (negative) sign, then the direction of relationship is positive (inverse)

Bivariate Regression

Applying correlation statistics to prediction problems

  • The shape of the relationship modeled in rXY is linear

    • rXY describes how well a straight line describes the values of the Y variable across the range of X values

Bivariate Regression

Applying correlation statistics to prediction problems

  • If the absolute value of rXY is close to 1, then the observed Y values all lie close to the best-fitting line

    • As a result, we can use a line to predict what the values of the Y variable will be for any given value of X

    • To make such a prediction, we need to know how to create the best-fitting (regression) line

Bivariate Regression

Applying correlation statistics to prediction problems

  • X (horizontal axis) is a predictor of Y (vertical axis)

  • The best-fitting line minimizes the differences between the data points and the straight line

  • We call the vertical gaps between the data points and the line residuals

Bivariate Regression

Applying correlation statistics to prediction problems

  • Residual: the gap between the actual value of Y and the value predicted by the model

  • Data points above the line have positive residuals (the model under-predicts Y)

  • Data points below the line have negative residuals (the model over-predicts Y)
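Residuals are easy to inspect once a line has been fitted. The sketch below fits a line with `lm()` on made-up values (`x` and `y` are hypothetical) and labels each point by whether the model under- or over-predicts it:

```r
# Made-up values for illustration only
x <- c(5, 10, 50, 80, 85)
y <- c(12, 25, 52, 80, 88)

fit <- lm(y ~ x)     # best-fitting (least squares) line
res <- resid(fit)    # residual = actual Y minus predicted Y

residual_table <- data.frame(
  x         = x,
  actual    = y,
  predicted = round(fitted(fit), 2),
  residual  = round(res, 2),
  model     = ifelse(res > 0, "under-predicts", "over-predicts")
)
print(residual_table)
```

With an intercept in the model, the positive and negative residuals cancel out, so they sum to zero up to rounding.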

Bivariate Regression

Line that fits best \(\equiv\) line that minimizes the residuals. One way to draw such a line is the method of least squares, which minimizes the sum of the squared residuals:

\[\hat{Y} = b + mX\]

where,

  • \(\hat{Y}\) represents values of Y predicted by the linear model

  • b is the intercept (the value of \(\hat{Y}\) when X \(=\) 0)

  • m is the slope of the regression line (i.e., the change in \(\hat{Y}\) associated with a one-unit shift in X)

  • X is the value of the predictor variable
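For a single predictor, the least squares slope and intercept have closed-form solutions, so they can be computed by hand and checked against `lm()`. This is a sketch on made-up values (`x` and `y` are hypothetical):

```r
# Made-up values for illustration only
x <- c(5, 10, 50, 80, 85)
y <- c(12, 25, 52, 80, 88)

# Closed-form least squares solution for one predictor
m <- cov(x, y) / var(x)        # slope: co-variation of X and Y over variation in X
b <- mean(y) - m * mean(x)     # intercept: the line passes through (mean X, mean Y)

# The slope is also the correlation rescaled by the two standard deviations
m_from_r <- cor(x, y) * sd(y) / sd(x)

fit <- lm(y ~ x)               # R's built-in least squares fit
print(round(c(b = b, m = m), 3))
print(round(coef(fit), 3))

# Predicted value of Y for a new X (here X = 40)
y_hat_40 <- b + m * 40
```

The link between `m` and `cor(x, y)` is why a strong correlation translates into useful predictions.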

Bivariate Regression

Let’s re-write the regression equation:

\[Y = \beta_{0} + \beta_{1}X + \epsilon\]

where,

  • Y represents the observed values of the outcome variable (Obama’s vote share)

  • \(\beta_{0}\) is the intercept (the predicted value of Y when X \(=\) 0)

  • \(\beta_{1}\) is the slope of the regression line (i.e., the change in the predicted value of Y associated with a one-unit shift in X)

  • X is the value of the predictor variable (% Black VAP)

  • \(\epsilon\) is the error (“residual”) term

Bivariate Regression

The Goodman approach makes a lot of assumptions (and has its flaws), but the general idea is that:

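  • intercept (\(\beta_{0}\)) predicts how Whites in a precinct vote (the predicted vote share when the Black share of VAP is 0)

  • slope + intercept ( \(\beta_{0}\) + \(\beta_{1}\)) predicts how Blacks vote (the predicted vote share in an all-Black precinct, with X measured as a proportion from 0 to 1)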
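In R, that idea amounts to an ordinary `lm()` fit followed by two predictions. The sketch below uses made-up precincts (`toy` is hypothetical) and assumes two groups only, with X measured as a proportion; if X were in percent, the second quantity would be \(\beta_{0} + 100\beta_{1}\) instead:

```r
# Made-up precincts for illustration only (both columns are proportions)
toy <- data.frame(
  prop_black  = c(0.85, 0.80, 0.10, 0.05, 0.50, 0.05),
  obama_share = c(0.88, 0.80, 0.25, 0.12, 0.52, 0.10)
)

# Goodman-style ecological regression: vote share on Black share of VAP
goodman <- lm(obama_share ~ prop_black, data = toy)
b0 <- unname(coef(goodman)["(Intercept)"])
b1 <- unname(coef(goodman)["prop_black"])

white_support <- b0        # predicted Obama share where prop_black = 0
black_support <- b0 + b1   # predicted Obama share where prop_black = 1

print(round(c(white = white_support, black = black_support), 3))
```

Nothing constrains these predictions to stay between 0 and 1, which is one of the known flaws of the approach.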

Here’s a brief discussion of the logic behind all this…

The Intuition behind the Problem

Voting rights litigation relies on the Gingles Test, which requires the following conditions to be satisfied for a Voting Rights Act violation:

  1. The group of minority voters is sufficiently large and geographically compact.

  2. Minority voters are politically cohesive in supporting their candidate of choice.

  3. The majority votes in a bloc to usually defeat the minority’s preferred candidate.

The Intuition behind the Problem

The data we typically have gives us information about:

  • who voted (often obtained from the voter file), and

  • which candidates received votes (we learn this from the election results).

The Intuition behind the Problem

Knowing these totals isn’t good enough to meet the requirements of the Gingles test

  • We actually need to know which candidates voters supported

  • Why? So we can determine whether White voters and minority voters were cohesive blocs supporting different candidates (criteria 2 and 3)
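One way to see how little the totals pin down is the method of bounds (often credited to Duncan and Davis): for each precinct, work out the lowest and highest Black support for a candidate that is arithmetically consistent with the totals. The sketch below is an illustration under simplifying assumptions (two groups only, everyone in the VAP votes), and `bounds_black_support` is a hypothetical helper name:

```r
# x = Black share of the precinct (0 to 1), t = candidate's vote share (0 to 1)
bounds_black_support <- function(x, t) {
  # Lowest possible Black support: every non-Black voter backs the candidate
  lower <- pmax(0, (t - (1 - x)) / x)
  # Highest possible Black support: all of the candidate's votes come from Black voters
  upper <- pmin(1, t / x)
  data.frame(x = x, t = t, lower = lower, upper = upper)
}

# Made-up precincts: heavily Black, evenly mixed, heavily White
b <- bounds_black_support(x = c(0.85, 0.50, 0.05), t = c(0.88, 0.52, 0.10))
print(round(b, 3))
```

In the heavily Black precinct the bounds are fairly tight, but in the other two almost any level of Black support is consistent with the totals. That gap is what the regression and EI methods try to fill with statistical assumptions.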

The Intuition behind the Solution

  1. Goodman’s approach: use a modified version of bivariate regression to infer
  • unobserved quantities of interest (\(\beta_{i}^{b}\) and \(\beta_{i}^{w}\))

  • from the aggregate observed variables (turnout per precinct, Black VAP, total VAP, etc.)

The Intuition behind the Solution

The issue with Goodman’s regression

  • Assumes equal number of residents across precincts (\(N_{i}\))

  • If \(N\) differs across \(i\), Goodman’s regression cannot estimate correct quantities of interest
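One commonly used adjustment, shown here as an assumption rather than as the presenters’ method, is to weight each precinct by its size so that larger precincts count for more. This is a sketch on made-up data (`toy` is hypothetical) comparing the unweighted and VAP-weighted fits:

```r
# Made-up precincts of unequal size, for illustration only
toy <- data.frame(
  vap         = c(1000, 800, 1200, 500, 900, 1100),
  prop_black  = c(0.85, 0.80, 0.10, 0.05, 0.50, 0.05),
  obama_share = c(0.88, 0.80, 0.25, 0.12, 0.52, 0.10)
)

fit_unweighted <- lm(obama_share ~ prop_black, data = toy)
fit_weighted   <- lm(obama_share ~ prop_black, data = toy, weights = vap)

# Side-by-side intercepts and slopes
print(round(rbind(unweighted = coef(fit_unweighted),
                  weighted   = coef(fit_weighted)), 3))

# The weighted fit reproduces the jurisdiction-wide vote share on average
overall_actual <- weighted.mean(toy$obama_share, toy$vap)
overall_fitted <- weighted.mean(fitted(fit_weighted), toy$vap)
```

Weighting addresses unequal precinct sizes only; it does not keep the estimates inside the 0 to 1 range.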

The Intuition behind the Solution

  2. Ecological Inference (“EI”): simulate \(\beta^b_{i}\) and \(\beta^w_{i}\)
  • Widely acknowledged to be an improvement over Goodman’s approach

    • In fact, courts routinely accept EI estimates of voting preference by racial group when applying the Gingles test (Thornburg v. Gingles, 478 U.S. 30, 1986)

Passing the mic back to Professor Liu!