RPV Analyses and Ecological Inference

Ray Block Jr., Penn State University

Winter 2021

Overview

Making inferences in empirical research
The ecological inference (EI) puzzle in RPV
EI package in RStudio

1. Making (Statistical) Inferences

Making Inferences

Understanding the process…

Making Inferences

Summarizing data \(\equiv\) functions of facts you have

e.g., % Black in precinct, candidate’s vote share, etc.

Inference \(\equiv\) using facts you have to learn facts you lack

Descriptive inference \(\equiv\) describing the existence of something (e.g., % of Black voters who support Candidate A)
- Ecological inference \(\equiv\) drawing conclusions about individual-level behavior from aggregate-level data

2. The Ecological Inference (EI) Puzzle in RPV

The Intuition behind the Problem

Voting rights litigation relies on the Gingles Test, which requires the following conditions to be satisfied for a Voting Rights Act violation:

The group of minority voters is sufficiently large and geographically compact.
Minority voters are politically cohesive in supporting their candidate of choice.
The majority votes in a bloc to usually defeat the minority’s preferred candidate.

The Intuition behind the Problem

The data we typically have gives us information about:

who voted (often obtained from the voter file), and
which candidates received votes (we learn this from the election results).

The Intuition behind the Problem

For example, based on the Handley data…

We have the row/column totals but need data for the table’s cells
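
To see why the totals alone are not enough, here is a minimal R sketch with made-up numbers (not the Handley data). It assumes a hypothetical precinct with 60 Black and 40 White voters in which Candidates A and B each receive 50 votes. Two very different sets of cell values reproduce exactly the same row and column totals.

```r
# Hypothetical precinct (illustrative numbers only, not from the Handley data)
groups <- c("Black", "White")
cands  <- c("A", "B")

# Scenario 1: most Black voters back A, all White voters back B
scenario_1 <- matrix(c(50, 10,
                        0, 40),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(groups, cands))

# Scenario 2: most Black voters back B, all White voters back A
scenario_2 <- matrix(c(10, 50,
                       40,  0),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(groups, cands))

# Row totals = voters per group; column totals = votes per candidate
rowSums(scenario_1); rowSums(scenario_2)
colSums(scenario_1); colSums(scenario_2)

# The margins match even though the cells tell opposite stories
same_margins <- identical(rowSums(scenario_1), rowSums(scenario_2)) &&
  identical(colSums(scenario_1), colSums(scenario_2))
same_margins
```

Both scenarios are consistent with what the voter file and the election results report, which is why the cells have to be inferred.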

The Intuition behind the Problem

Knowing these totals isn’t good enough to meet the requirements of the Gingles test

We actually need to know which candidates voters supported
Why? So we can determine whether White voters and minority voters were cohesive blocs supporting different candidates (conditions 2 and 3)

The Intuition behind the Solution

For precinct \(i\) (\(i\) \(=\) 1, 2, 3, …\(p\)), and Candidate A

Observed variables

\(T_{i}\) \(=\) voter turnout for precinct \(i\)
\(X_{i}\) \(=\) % Black VAP in precinct \(i\)
\(N_{i}\) \(=\) number of voting-age people per precinct

The Intuition behind the Solution

For precinct \(i\) (\(i\) \(=\) 1, 2, 3, …\(p\)), and Candidate A

Unobserved quantities of interest

\(\beta_{i}^{b}\) \(=\) % of Black voting-age residents who vote in precinct \(i\)
\(\beta_{i}^{w}\) \(=\) % of White voting-age residents who vote in precinct \(i\)

The Intuition behind the Solution

Goodman’s approach: use a modified version of bivariate regression to infer

unobserved quantities of interest (\(\beta_{i}^{b}\) and \(\beta_{i}^{w}\))
from the aggregate observed variables (\(X_{i}\), \(T_{i}\) and \(N_{i}\))

The Intuition behind the Solution

Goodman’s approach: Run a regression of \(T_{i}\) on \(X_{i}\) and (\(1\) \(−\) \(X_{i}\)) (no constant term)

\[\hat{T} = \beta^b_{i}X_{i} + \beta^w_{i}(1-X_{i})\]

This equation can be re-written as

\[\hat{T} = \beta^w_{i} + (\beta^b_{i} - \beta^w_{i})X_{i}\]

Coefficients are intended to be:

\(\beta^b_{i}\) \(\equiv\) Black turnout in each precinct
\(\beta^w_{i}\) \(\equiv\) White turnout per precinct
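
A minimal sketch of Goodman's regression on simulated data may help here. Everything in it is an assumption made for illustration: the number of precincts, the "true" turnout rates of 0.70 and 0.40 (held constant across precincts), the noise level, and the variable names. It fits both forms of the equation with base R's lm() and shows that they give the same estimates.

```r
set.seed(2021)

# Simulated precincts (hypothetical values, not the Handley data)
p <- 200
x <- runif(p, min = 0.05, max = 0.95)   # share of VAP that is Black
true_b <- 0.70                          # assumed Black turnout rate
true_w <- 0.40                          # assumed White turnout rate
turnout <- true_b * x + true_w * (1 - x) + rnorm(p, sd = 0.02)

# Form 1: regress turnout on x and (1 - x) with no constant term
fit_no_const <- lm(turnout ~ 0 + x + I(1 - x))
est_b <- unname(coef(fit_no_const)[1])  # coefficient on x
est_w <- unname(coef(fit_no_const)[2])  # coefficient on (1 - x)

# Form 2: ordinary bivariate regression; intercept = White rate,
# intercept + slope = Black rate
fit_const <- lm(turnout ~ x)
est_w_alt <- unname(coef(fit_const)[1])
est_b_alt <- unname(coef(fit_const)[1] + coef(fit_const)[2])

round(c(black = est_b, white = est_w), 3)
round(c(black = est_b_alt, white = est_w_alt), 3)
```

The recovery works here only because the simulation builds in constant rates across precincts; real data need not satisfy that.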

The Intuition behind the Solution

The issue with Goodman’s regression

Assumes equal number of residents across precincts (\(N_{i}\))
If \(N\) differs across \(i\), Goodman’s regression cannot estimate correct quantities of interest
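
One way to see why precinct size matters is a standard accounting identity, written here in the notation above (the symbol \(B^{b}\) for the district-wide Black turnout rate is introduced only for this illustration, and \(X_{i}\) is treated as a proportion). The district-wide rate is a weighted average of the precinct-level rates, with weights equal to each precinct's Black voting-age population, \(N_{i}X_{i}\):

\[B^{b} = \frac{\sum_{i=1}^{p} N_{i}X_{i}\,\beta^{b}_{i}}{\sum_{i=1}^{p} N_{i}X_{i}}\]

When \(N_{i}\) is the same everywhere, it cancels out of the ratio. When it varies, large precincts should count for more than small ones, which an unweighted regression ignores.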

The Intuition behind the Solution

Ecological Inference (“EI”): simulate \(\beta^b_{i}\) and \(\beta^w_{i}\)

Widely acknowledged to be an improvement over Goodman’s approach
- In fact, the Court directly recommends EI as the main statistical method for estimating voting preference by racial group (e.g. Thornburg v. Gingles 478 U.S. 30, 1986)
But standard EI does not work well for elections involving more than 2 candidates and/or racial/ethnicity groups
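
One ingredient of King's (1997) EI is the deterministic method of bounds: the observed \(X_{i}\) and \(T_{i}\) already restrict each \(\beta^b_{i}\) to an interval before any simulation takes place. The sketch below computes those bounds for two made-up precincts. The function name bounds_black() and the input values are hypothetical, and this covers only the bounds step, not the full EI model.

```r
# Deterministic bounds on the Black rate, given x (share Black) and
# t (observed turnout). Assumes x > 0; illustrative helper, not part of eiCompare.
bounds_black <- function(x, t) {
  lower <- pmax(0, (t - (1 - x)) / x)  # even if every White resident voted
  upper <- pmin(1, t / x)              # even if no White resident voted
  data.frame(x = x, t = t, lower = lower, upper = upper)
}

# Two hypothetical precincts with the same turnout but different composition
b <- bounds_black(x = c(0.60, 0.90), t = c(0.50, 0.50))
b$width <- b$upper - b$lower
b
```

The more homogeneous precinct (90% Black) yields a much narrower interval, which is why such precincts are especially informative.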

The Intuition behind the Solution

Rows & columns (R \(\times\) C) approach: extends EI

beyond 2 candidates (e.g., at-large city council seats are sometimes multi-winner elections)
beyond 2 racial/ethnic groups (e.g., Black, White, Latino)

Let’s explore these different EI frameworks in RStudio!

EI in RStudio

Load the eiCompare package (Collingwood et al. 2016)

install.packages("eiCompare")
library(eiCompare) # Use the latest release (summer 2020 at the time of writing)

(On Windows, you might also need to install the latest version of Rtools, which is a separate toolchain rather than an R package)

EI in RStudio

Load the (re-coded) Handley data; store it in the object “dat”

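library(readr) # provides read_csv()
# Read the re-coded Handley practice data into the data frame "dat"
dat <- read_csv("C:/Users/rjb6233/Google Drive/Political Science Classes (Taught or Being Prepped)/PSU Courses/RPV Crash Course/Data/Handley/PracticeData-ReCoded.csv")
View(dat) # open the data in RStudio's viewer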

EI in RStudio

eiCompare requires that vote variables be on a 0-1 scale (not scaled to range from 0 to 100).

dat <- read_csv("C:/Users/rjb6233/Google Drive/Political Science Classes (Taught or Being Prepped)/PSU Courses/RPV Crash Course/Data/Handley/PracticeData-ReCoded.csv")
dat$pVoteA <- dat$pVoteA/100
dat$pVoteB <- dat$pVoteB/100
dat$pBlackVAP <- dat$pBlackVAP/100
dat$pWhiteVAP <- dat$pWhiteVAP/100
View(dat)

So just divide your current variables by 100.
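
The rescaling step can also be written more compactly, with a check that the result is on the 0-1 scale. This is a sketch on a small made-up data frame (dat_demo is hypothetical; only the column names follow the slides). The guard that skips the division when no value exceeds 1 is a simple heuristic to avoid dividing twice, and it would misfire if every percentage in the data were at or below 1%.

```r
# Hypothetical two-precinct data on a 0-100 scale (not the Handley data)
dat_demo <- data.frame(
  pVoteA    = c(62.5, 40.0),
  pVoteB    = c(37.5, 60.0),
  pBlackVAP = c(70, 20),
  pWhiteVAP = c(30, 80)
)

share_cols <- c("pVoteA", "pVoteB", "pBlackVAP", "pWhiteVAP")

# Divide by 100 only if the columns still look like percentages
if (max(unlist(dat_demo[share_cols])) > 1) {
  dat_demo[share_cols] <- dat_demo[share_cols] / 100
}

# Sanity checks: all shares in [0, 1], and each set of shares sums to 1
in_range <- all(dat_demo[share_cols] >= 0 & dat_demo[share_cols] <= 1)
cand_sum <- rowSums(dat_demo[c("pVoteA", "pVoteB")])
race_sum <- rowSums(dat_demo[c("pBlackVAP", "pWhiteVAP")])
in_range
cand_sum
race_sum
```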

EI in RStudio

The goal: see how candidates A and B did with voters

Use the ei_iter() function (ei_est_gen() in earlier eiCompare releases), a generalized version of King’s (1997) ei() function. It takes:
- the “data.frame” object holding the data,
- the columns holding each candidate’s vote share (A vs. B),
- the columns holding each racial group’s share of the population,
- the column for the total number of voters, and
- a name used to label the results.

EI in RStudio

First, run the generalized version of iterative EI (King 1997)

iter <- ei_iter(
    data = dat,
    cand_cols = c("pVoteA", "pVoteB"),
    race_cols = c("pBlackVAP", "pWhiteVAP"),
    totals_col = "total_votes",
    name = "Iterative EI"
)

This takes time because the model has to “converge”
Place results in an object (“iter”)
Results are stored for now (we will look at them later)

EI in RStudio

Next, run the Rows by Columns (RxC) extension of EI

rxc <- ei_rxc(
    data = dat,  # the data frame loaded earlier
    cand_cols = c("pVoteA", "pVoteB"),
    race_cols = c("pBlackVAP", "pWhiteVAP"),
    totals_col = "total_votes",
    name = "RxC EI"
)

Specify columns for race, candidates, and total votes
Place results in an object (“rxc”)
Again, we store results for later

EI in RStudio

Finally, compare the results side-by-side in a summary table

summary(iter, rxc)

EI in RStudio

Or, visualize them with a dot plot and 95% CIs

plot(iter, rxc)

EI in RStudio

Putting all the results into context

EI in RStudio

Final conclusion: candidate preference is racially polarized

Black residents in the precinct prefer Candidate A
White residents prefer Candidate B
This holds true, no matter what RPV analysis we conduct

References (FYI)

Collingwood, Loren, Kassra Oskooii, Sergio Garcia-Rios, and Matt Barreto. 2016. “eiCompare: Comparing Ecological Inference Estimates across EI and EI:R×C.” The R Journal 8(2): 92–101.

Goodman, Leo A. 1959. “Some Alternatives to Ecological Correlation.” American Journal of Sociology 64(6): 610–625.

King, Gary. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.