Making inferences in empirical research
The ecological inference (EI) puzzle in RPV
EI package in RStudio
Understanding the process…
Summarizing data \(\equiv\) functions of facts you have
Inference \(\equiv\) using facts you have to learn facts you lack
Descriptive inference \(\equiv\) seeks to describe the existence of something (e.g., % of Black voters who support Candidate A)
Voting rights litigation relies on the Gingles Test, which requires the following conditions to be satisfied for a Voting Rights Act violation:
The group of minority voters is sufficiently large and geographically compact.
Minority voters are politically cohesive in supporting their candidate of choice.
The majority votes in a bloc to usually defeat the minority’s preferred candidate.
The data we typically have gives us information about:
who voted (often obtained from the voter file), and
which candidates received votes (we learn this from the election results).
For example, based on the Handley data…
We have the row and column totals but need the values in the table’s cells
Knowing these totals isn’t good enough to meet the requirements of the Gingles Test
We actually need to know which candidates each group of voters supported
Why? So we can determine whether White voters and minority voters were cohesive blocs supporting different candidates (criteria 2 and 3)
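To see why the margins alone are not enough, here is a minimal sketch in R using made-up numbers (not the Handley data). Two hypothetical precinct tables share identical row and column totals, yet they imply opposite conclusions about polarization.

```r
# Hypothetical precinct (illustrative numbers only, not from the Handley data):
# 40 Black and 60 White voters; Candidates A and B each receive 50 votes.
groups <- c("Black", "White")
cands  <- c("A", "B")

# Scenario 1: strongly polarized voting
polarized <- matrix(c(40,  0,
                      10, 50),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(groups, cands))

# Scenario 2: no polarization at all
unpolarized <- matrix(c(20, 20,
                        30, 30),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(groups, cands))

# Both tables reproduce exactly the same observed margins ...
same_rows <- all(rowSums(polarized) == rowSums(unpolarized))
same_cols <- all(colSums(polarized) == colSums(unpolarized))

# ... but imply very different group-level support for Candidate A
support_polarized   <- polarized[, "A"]   / rowSums(polarized)
support_unpolarized <- unpolarized[, "A"] / rowSums(unpolarized)

print(c(same_rows = same_rows, same_cols = same_cols))
print(round(rbind(support_polarized, support_unpolarized), 2))
```

Because both scenarios fit the observed totals equally well, the cell values have to be inferred rather than read off the data.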
For precinct \(i\) (\(i = 1, 2, 3, \ldots, p\)) and Candidate A
Observed variables
For precinct \(i\) (\(i = 1, 2, 3, \ldots, p\)) and Candidate A
Unobserved quantities of interest
The goal: infer the unobserved quantities of interest (\(\beta_{i}^{b}\) and \(\beta_{i}^{w}\))
from the aggregate observed variables (\(X_{i}\), \(T_{i}\), and \(N_{i}\))
\[\hat{T} = \beta^b_{i}X_{i} + \beta^w_{i}(1-X_{i})\]
This equation can be re-written as
\[\hat{T} = \beta^w_{i} + (\beta^b_{i} - \beta^w_{i})X_{i}\]
Coefficients are intended to be:
\(\beta^b_{i}\) \(\equiv\) Black turnout in each precinct
\(\beta^w_{i}\) \(\equiv\) White turnout in each precinct
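A minimal sketch of how the intercept-and-slope form maps onto a regression, using simulated data rather than the Handley file. The precinct count, the "true" rates, and the noise level are all assumptions made for illustration, and the two rates are held constant across precincts to keep the example simple.

```r
set.seed(2024)

p <- 200                                  # number of hypothetical precincts
X <- runif(p, min = 0.05, max = 0.95)     # share of each precinct that is Black

# Assumed "true" rates, held constant across precincts for this sketch
beta_b <- 0.80
beta_w <- 0.30

# Observed precinct-level outcome: intercept-and-slope form plus small noise
T_obs <- beta_w + (beta_b - beta_w) * X + rnorm(p, mean = 0, sd = 0.02)

# Goodman-style regression: ordinary least squares of the outcome on X
fit <- lm(T_obs ~ X)

est_w <- unname(coef(fit)["(Intercept)"])   # intercept estimates beta_w
est_b <- est_w + unname(coef(fit)["X"])     # intercept + slope estimates beta_b

print(round(c(beta_b_hat = est_b, beta_w_hat = est_w), 3))
```

The intercept is the predicted outcome in a precinct with no Black residents (\(X_{i} = 0\)), and the intercept plus the slope is the predicted outcome in an all-Black precinct (\(X_{i} = 1\)).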
The issue with Goodman’s regression
Assumes equal number of residents across precincts (\(N_{i}\))
If \(N\) differs across \(i\), Goodman’s regression cannot estimate correct quantities of interest
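One way to build intuition for why precinct size matters, sketched with made-up numbers: an unweighted average treats a 100-resident precinct the same as a 1,000-resident precinct, so it can land far from the rate among all residents combined. This only illustrates the intuition; it is not a reproduction of Goodman’s estimator.

```r
# Two hypothetical precincts with very different sizes
N    <- c(100, 1000)     # number of residents in each precinct (N_i)
rate <- c(0.90, 0.10)    # share supporting Candidate A in each precinct

unweighted_mean <- mean(rate)                 # treats precincts as equally sized
weighted_mean   <- sum(N * rate) / sum(N)     # rate among all residents combined

print(round(c(unweighted = unweighted_mean, weighted = weighted_mean), 3))
```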
King’s (1997) EI is widely acknowledged to be an improvement over Goodman’s approach
But standard EI does not work well for elections involving more than 2 candidates and/or racial/ethnic groups
more than 2 candidates (e.g., at-large city council seats are sometimes multi-winner elections)
more than 2 racial/ethnic groups (e.g., Black, White, Latino)
Let’s explore these different EI frameworks in RStudio!
Load the eiCompare package (Collingwood et al. 2016)
(On Windows, you might also need to install the latest version of Rtools, which is a separate build toolchain rather than an R package)
Load the (re-coded) Handley data; store it in the object “dat”
eiCompare requires that vote variables be on a 0-1 scale (not scaled to range from 0 to 100).
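library(readr)  # provides read_csv()
# Read the re-coded Handley practice data
dat <- read_csv("C:/Users/rjb6233/Google Drive/Political Science Classes (Taught or Being Prepped)/PSU Courses/RPV Crash Course/Data/Handley/PracticeData-ReCoded.csv")
# Rescale percentages (0-100) to proportions (0-1)
dat$pVoteA <- dat$pVoteA / 100
dat$pVoteB <- dat$pVoteB / 100
dat$pBlackVAP <- dat$pBlackVAP / 100
dat$pWhiteVAP <- dat$pWhiteVAP / 100
View(dat)  # inspect the result in the RStudio data viewer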
So just divide your current variables by 100.
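The same rescaling can be done for all columns at once and then checked. The sketch below uses a made-up toy data frame (not the Handley file) that borrows the column names from the code above; the sanity checks are a suggested habit, not something the source says eiCompare requires.

```r
# Toy data frame standing in for the real file (values are made up)
toy <- data.frame(
  pVoteA    = c(62.5, 30),
  pVoteB    = c(37.5, 70),
  pBlackVAP = c(55, 20),
  pWhiteVAP = c(45, 80)
)

pct_cols <- c("pVoteA", "pVoteB", "pBlackVAP", "pWhiteVAP")

# Divide every percentage column by 100 in one step
toy[pct_cols] <- lapply(toy[pct_cols], function(x) x / 100)

# Sanity check 1: all values now lie in [0, 1]
in_unit_range <- all(sapply(toy[pct_cols], function(x) all(x >= 0 & x <= 1)))

# Sanity check 2: in this two-candidate, two-group toy example,
# the shares in each row sum to 1 (up to floating-point tolerance)
cand_sums <- rowSums(toy[c("pVoteA", "pVoteB")])
race_sums <- rowSums(toy[c("pBlackVAP", "pWhiteVAP")])
sums_ok <- all(abs(cand_sums - 1) < 1e-8, abs(race_sums - 1) < 1e-8)

print(c(in_unit_range = in_unit_range, sums_ok = sums_ok))
```

In real data with additional candidates or racial/ethnic groups, the two-column sums need not equal 1, so the second check would have to be adapted.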
The goal: see how candidates A and B did with voters
Use the ei_est_gen() function, a generalized version of King’s (1997) ei() function. It takes:
a vector of candidate names (A vs. B),
the column for the total number of voters,
the “data.frame” object holding the data, and
the table names used to display the results.
First, run the generalized version of iterative EI (King 1997)
iter <- ei_iter(
  data = dat,
  cand_cols = c("pVoteA", "pVoteB"),
  race_cols = c("pBlackVAP", "pWhiteVAP"),
  totals_col = "total_votes",
  name = "Iterative EI"
)
This takes time because the model has to “converge”
Place results in an object (“iter”)
Results are stored for now (we will look at them later)
Next, I run the Rows by Columns (RxC) extension of EI
rxc <- ei_rxc(
  data = dat,
  cand_cols = c("pVoteA", "pVoteB"),
  race_cols = c("pBlackVAP", "pWhiteVAP"),
  totals_col = "total_votes",
  name = "RxC EI"
)
Specify columns for race, candidates, and total votes
Place results in an object (“rxc”)
Again, we store results for later
Finally, compare the results side-by-side in a summary table
Or, visualize them with a dot plot and 95% CIs
Putting all the results into context
Final conclusion: candidate preference is racially polarized
Black residents in the precinct prefer Candidate A
White residents prefer Candidate B
This holds true, no matter what RPV analysis we conduct
Collingwood, Loren, Kassra Oskooii, Sergio Garcia-Rios, and Matt Barreto. 2016. “eiCompare: Comparing Ecological Inference Estimates across EI and EI:R×C.” The R Journal 8(2): 92–101.
Goodman, Leo A. 1959. “Some Alternatives to Ecological Correlation.” American Journal of Sociology 64(6): 610–625.
King, Gary. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.