Voting Machines Analysis

Introduction

I have seen a couple of analyses of US voting data suggesting that the type of voting machine used had an effect on the vote. I’m sceptical of this, but thought it was worth checking. So, this is an attempt to replicate the analysis by Cornelius Hunter. As is typical, most of the time was spent formatting the data.

Data

The voting data is from GitHub (thanks, Tony McGovern, for your efforts): I downloaded it locally (there is a hidden setwd() in the Markdown doc that makes this page, but don’t tell anyone).

# Read vote data and remove the columns we don't want
Data2016 <- read.csv("2016_US_County_Level_Presidential_Results.csv")
Data2020 <- read.csv("2020_US_County_Level_Presidential_Results.csv")
Keep <- c("votes_dem", "votes_gop", "per_dem","per_gop")
Data2016 <- Data2016[,c(Keep, "combined_fips")]
Data2020 <- Data2020[,c(Keep, "state_name", "county_name", "county_fips")]

Data <- merge(x=Data2016, y=Data2020, by.x="combined_fips", 
              by.y = "county_fips", suffixes = c(".2016", ".2020"))
# this will be the response variable: the change in GOP vote share 
# from 2016 to 2020, in percentage points
Data$diff <- 100*(Data$per_gop.2020 - Data$per_gop.2016)
# make a variable for unique counties
Data$State_County <- paste(Data$state_name, Data$county_name, sep="_")

The voting machine data is from the US Election Assistance Commission (again, thanks faceless civil servants!). I had to copy from the web page, paste into a spreadsheet and save as a .csv.

There is a lot of formatting here, mainly to make the counties line up (changing the names by hand).

VMData <- read.csv("VotingMachineData.csv")
VMData$State_County <- paste(VMData$State, VMData$County, sep="_")

# Correct county names to make consistent
VMData$State_County <- gsub("'", "", VMData$State_County, fixed = TRUE)
Data$State_County <- gsub("'", "", Data$State_County, fixed = TRUE)
VMData$State_County <- gsub("St ", "St. ", VMData$State_County, fixed = TRUE)
Data$State_County <- gsub("City$", "city", Data$State_County)
VMData$State_County <- gsub("City$", "city", VMData$State_County)
VMData$State_County <- gsub("Dekalb", "DeKalb", VMData$State_County, fixed = TRUE)
VMData$State_County <- gsub("Delaware_New Castle", "Delaware_New Castle County", VMData$State_County, fixed = TRUE)
VMData$State_County <- gsub("Illinois_DeWitt County", "Illinois_De Witt County", VMData$State_County, fixed = TRUE)
VMData$State_County <- gsub("North Dakota_Lamoure County", "North Dakota_LaMoure County", VMData$State_County, fixed = TRUE)
VMData$State_County <- gsub("Pennsylvania_Mckean County", "Pennsylvania_McKean County", VMData$State_County, fixed = TRUE)
VMData$State_County <- gsub("Pennsylvania_Philadelphia", "Pennsylvania_Philadelphia County", VMData$State_County, fixed = TRUE)
VirginiaChange <- !VMData$State_County%in%Data$State_County & grepl("^Virginia", VMData$State_County)
VMData$State_County[VirginiaChange] <- paste(VMData$State_County[VirginiaChange], "city")
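
A quick way to see which names still need fixing is to list the ones that appear in one data set but not the other. This is only a sketch on a few made-up names (VoteNames and MachineNames are stand-ins for Data$State_County and VMData$State_County, not the real columns):

```r
# Toy stand-ins for the two State_County columns (illustrative values only)
VoteNames <- c("Illinois_De Witt County", "Virginia_Bristol city", "Texas_Starr County")
MachineNames <- c("Illinois_DeWitt County", "Virginia_Bristol city", "Texas_Starr County")

# Names in the machine data with no partner in the vote data
Unmatched <- setdiff(MachineNames, VoteNames)
Unmatched

# Apply one correction, in the same style as the gsub() calls above, and check again
MachineNames <- gsub("Illinois_DeWitt County", "Illinois_De Witt County",
                     MachineNames, fixed = TRUE)
StillUnmatched <- setdiff(MachineNames, VoteNames)
length(StillUnmatched)
```

On the real data the same setdiff() call, run in both directions, would show what is left to correct by hand.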

Dr. Hunter doesn’t use the machine types listed by the EAC, so I have to convert the machine types. The choice seems inconsistent: for some machines the split is by different levels of version number. I don’t know why this was done, but assume there is a sensible reason.

# Make variable with voting machine type
VMData$Manufacturer[is.na(VMData$Manufacturer)] <- "None"
VMData$Machine <- paste(VMData$Manufacturer, 
                            VMData$Product, VMData$Version, sep="_")

# Function to extract version numbers up to the level required
GetVers <- function(str, ver=1) {
  if(is.na(str)) str <- "0.0"
  st <- strsplit(str, "[.]")[[1]]
  if(length(st)<ver) ver <- 1
  paste(st[1:ver], collapse=".")
}

VMData$VersionMajor <- sapply(VMData$Version, GetVers)
VMData$VersionMinor <- sapply(VMData$Version, GetVers, ver=2)

VMData$VersionUse <- VMData$VersionMinor
VMData$VersionUse[VMData$Product == "ClearVote"] <- 1
VMData$VersionUse <- gsub("-.", "-*", VMData$VersionUse)
VMData$VersionUse[VMData$Manufacturer == "ES&S"] <- VMData$VersionMajor[VMData$Manufacturer == "ES&S"]
VMData$VersionUse[VMData$Manufacturer == "Hart"] <- 
  2+0.3*(as.numeric(VMData$VersionMinor[VMData$Manufacturer == "Hart"])<2.25)
VMData$VersionUse[VMData$Manufacturer == "Unisyn"] <- VMData$VersionMajor[VMData$Manufacturer == "Unisyn"]
VMData$VersionUse <- gsub(" Modification", "", VMData$VersionUse)

VMData$MachineHunter <- paste(VMData$Manufacturer, 
                            VMData$Product, VMData$VersionUse, sep="_")
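
To make the version trimming a bit more concrete, here is what GetVers() and the "-." substitution do to a few version strings. The strings are made up for illustration and are not taken from the EAC table; the function is repeated so the snippet runs on its own.

```r
# Same function as above, repeated so this snippet is self-contained
GetVers <- function(str, ver=1) {
  if(is.na(str)) str <- "0.0"
  st <- strsplit(str, "[.]")[[1]]
  if(length(st)<ver) ver <- 1
  paste(st[1:ver], collapse=".")
}

# Made-up version strings, including a missing one
Versions <- c("5.5-A", "4.14.37", "6", NA)

# Major version only: "5" "4" "6" "0"
Major <- sapply(Versions, GetVers, USE.NAMES = FALSE)
# Major.minor, falling back to major if there is no minor part: "5.5-A" "4.14" "6" "0.0"
Minor <- sapply(Versions, GetVers, ver = 2, USE.NAMES = FALSE)
# Replace the character after a hyphen with "*", so 5.5-A and 5.5-B are pooled: "5.5-*" ...
Wild <- gsub("-.", "-*", Minor)
```

Missing versions end up as "0" or "0.0", so they are grouped together rather than dropped.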

Once we have that, we can merge the data sets. Counties without voting machine data have their machine type set to "Not available", and this is made the reference level of the Machine factor.

VotingData <- merge(Data, VMData, by="State_County", all.x = TRUE)
VotingData$Machine[is.na(VotingData$Machine)] <- "Not available"
VotingData$MachineHunter[is.na(VotingData$MachineHunter)] <- "Not available"

VotingData$MachineF <- factor(VotingData$Machine)
VotingData$MachineF <- relevel(VotingData$MachineF, ref="Not available")

VotingData$MachineHunterF <- factor(VotingData$MachineHunter)
VotingData$MachineHunterF <- relevel(VotingData$MachineHunterF, ref="Not available")
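
As a small illustration of what the merge and relevel() steps do, here is the same pattern on three invented counties (the names and numbers are toy values, not real data):

```r
# Toy data: three counties, one of which has no machine record
Votes <- data.frame(State_County = c("A_One", "A_Two", "B_Three"),
                    diff = c(1.2, -0.5, 3.1), stringsAsFactors = FALSE)
Machines <- data.frame(State_County = c("A_One", "B_Three"),
                       Machine = c("Dominion_D-Suite_5.5", "ES&S_EVS_5"),
                       stringsAsFactors = FALSE)

# all.x = TRUE keeps every county in Votes; a missing machine becomes NA
Toy <- merge(Votes, Machines, by = "State_County", all.x = TRUE)
Toy$Machine[is.na(Toy$Machine)] <- "Not available"

# By default the reference level is the first level in sorted order
FirstBefore <- levels(factor(Toy$Machine))[1]

# relevel() moves "Not available" to the front, so effects are relative to it
ToyF <- relevel(factor(Toy$Machine), ref = "Not available")
levels(ToyF)
```

With "Not available" as the reference, each machine coefficient in the models below is a contrast against counties with no machine record.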

The Analysis

First, a pair of functions. The first takes a model and calculates the effects of the voting machines. The second plots these. Neither function is especially portable, but they save a bit of space.

CalcMachineEffects <- function(mod, extreme = 0.1) {
  CIs <- confint(mod)
  CIs <- CIs[!grepl("^\\.", rownames(CIs)),]
  Coefs <- summary(mod)$coefficients 
  Summ <- cbind(Coefs, CIs[!is.na(CIs[,1]),])
  MachineEffects <- data.frame(Summ[grep("^Machine", rownames(Summ)),])
  MachineEffects$At.y <- seq_len(nrow(MachineEffects))
  MachineEffects$Col <- 1 + (abs(MachineEffects$Estimate)>extreme)
  rownames(MachineEffects) <- gsub("^MachineHunterF", "", rownames(MachineEffects))
  rownames(MachineEffects) <- gsub("^MachineF", "", rownames(MachineEffects))
  MachineEffects
}

PlotEffects <- function(effs, labely =TRUE) {
  if(!exists("Col", effs)) effs$Col <- 1
  plot(effs$Estimate, effs$At.y, ann=FALSE, yaxt="n",
       xlim=range(effs[,c("X2.5..", "X97.5..")]), 
       col=effs$Col)
  segments(effs[,"X2.5.."], effs$At.y, 
           effs[,"X97.5.."], effs$At.y, 
           col=effs$Col)
  # cex.axis (not cex) is the parameter that controls the size of axis labels
  if(labely) axis(2, effs$At.y, labels = gsub("_", " ", rownames(effs)), las=1, 
       col.ticks =effs$Col, cex.axis=0.5)
}

Now on to the analyses. I decided to use State as a random effect. The variation in state effect is much larger than the variation in machine effects. I plot the machine effects below.
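
Written out, the model with state as a random effect looks roughly like this. The notation is just my own labelling for this sketch: i indexes counties, m(i) is the machine type used in county i, and s(i) is its state.

```latex
\text{diff}_{i} = \mu + \beta_{m(i)} + u_{s(i)} + \varepsilon_{i},
\qquad u_{s} \sim N(0, \sigma_{\text{state}}^{2}),
\qquad \varepsilon_{i} \sim N(0, \sigma^{2})
```

Here the coefficient for the reference level ("Not available") is fixed at zero, so each remaining machine coefficient is a difference from that level. The model without state simply drops the state term.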

library(lme4)
modState <- lmer(diff ~ (1|state_name) + MachineHunterF, data=VotingData)
modNoState <- lm(diff ~ MachineHunterF, data=VotingData)

MachineEffectsState <- CalcMachineEffects(modState, extreme = 1.3)
MachineEffectsNoState <- CalcMachineEffects(modNoState, extreme = 1.3)

par(mar=c(2.1,1.1,2,1), oma=c(2,10,0,0), mfrow=c(1,2))
PlotEffects(MachineEffectsNoState)
mtext("No State Effect in Model", 3)
PlotEffects(MachineEffectsState, labely = FALSE)
mtext("State Effect in Model", 3)
mtext("Effect on Difference in Trump vote (%)", 1, outer=TRUE)

We can see that in the model without state, the Dominion machines D-Suite 5.5-A and 5.5-B do have the most negative effect, i.e. they appear to pull the vote more towards Biden. But when we add State as an effect, this disappears (and if anything the D-Suite 4.14 series most favours Trump).

Compared to Dr. Hunter’s analysis, the effect looks a bit smaller, and the confidence intervals are larger. The effects of the other machine types are obscured by a lack of good labelling (grumble, grumble…). The disappearance of the effect when the state is taken into account suggests that the effect is not robust - basically I would want a better model before I could be at all confident about whether there was (or was not) an effect of voting machine. There are probably all sorts of other factors that affected the changes in votes.

Analysis With All Machine Types

I was wondering about the choice to merge different voting machines, so I also ran a model with all of the machine types separately. These are the results.

modStateAll <- lmer(diff ~ (1|state_name) + MachineF, data=VotingData)
modNoStateAll <- lm(diff ~ MachineF, data=VotingData)

AllMachineEffectsState <- CalcMachineEffects(modStateAll, extreme = 10)
AllMachineEffectsNoState <- CalcMachineEffects(modNoStateAll, extreme = 10)

par(mar=c(2.1,1.1,2,1), oma=c(2,10,0,0), mfrow=c(1,2))
PlotEffects(AllMachineEffectsNoState)
mtext("No State Effect in Model", 3)
PlotEffects(AllMachineEffectsState, labely = FALSE)
mtext("State Effect in Model", 3)
mtext("Effect on Difference in Trump vote (%)", 1, outer=TRUE)

Yes, the ES&S machine EVS 5.4.0.0 stands out with over a 25% shift to Trump! It must be fraud!

Errm, no. It was only used in one county, Starr County in Texas. Quite what caused such a huge swing I’ve no idea, but a county about the size of Scunthorpe would have about the same effect as Scunthorpe would have on the US election.
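
The reason a single county can produce such a dramatic estimate is easy to show with invented numbers: when only one county uses a machine type, the estimated effect of that type is just that county's value minus the mean of the reference group. A minimal sketch (toy data, with hypothetical level names "Ref" and "Solo"):

```r
# Invented numbers: four counties on a reference machine, one county on its own machine
Toy <- data.frame(diff = c(0.5, -0.2, 0.1, 0.4, 25),
                  Machine = factor(c("Ref", "Ref", "Ref", "Ref", "Solo")))

# How many counties use each machine type
Counts <- table(Toy$Machine)
Counts

# With one county on "Solo", its coefficient is that county's value
# minus the mean of the reference counties
ToyMod <- lm(diff ~ Machine, data = Toy)
SoloEffect <- unname(coef(ToyMod)["MachineSolo"])
SoloEffect
```

On the real data, table(VotingData$MachineF) would give the equivalent counts, and any machine type with a count of one deserves the same suspicion.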

Conclusions

The evidence for an effect of voting machine isn’t robust. I am not a political scientist, so there may be all sorts of other factors that should be included in the model, and which could be correlated with the machine used. But without a strong reason for saying why one particular type of machine would be a problem, I’m still not convinced.