The 2020 presidential election Milwaukee data was downloaded from here. This notebook is in response to the brief analysis appearing here.
Load library for animation:
library(gifski)
## Warning: package 'gifski' was built under R version 4.0.3
Import the data from csv:
dat <- read.csv("milw.csv")
names(dat)
## [1] "ï..Joseph.R..Biden...Kamala.D..Harris"
## [2] "Donald.J..Trump...Michael.R..Pence"
## [3] "Don.Blankenship...William.Mohr"
## [4] "Jo.Jorgensen...Jeremy.Spike.Cohen"
## [5] "Brian.Carroll...Amar.Patel"
## [6] "WRITE.IN"
candidates <- c("Biden", "Trump", "Don","Jo","Brian","WriteIn")
names(dat) <- candidates
This function returns all the first digits of its input vector:
get_first_digit <- function(x) {
strtoi(substr(paste(x),1,1))
}
The Biden counts per ward indeed do not follow Benford’s law:
for (i in candidates) {
hist(get_first_digit(dat[[i]]),
main = paste0("Benford -- ",i))
}
Get the ward population sizes (number of votes)
ward_pop <- dat$Biden + dat$Trump + dat$Don + dat$Jo + dat$Brian
num_wards <- length(ward_pop)
hist(ward_pop, main = "Ward populations (number of votes)")
Biden’s empirical probability of winning a vote is high
biden_proportion <- dat$Biden / ward_pop
barplot(biden_proportion, main = "Proportion Biden votes by ward")
hist(biden_proportion, main = "Proportion Biden votes")
Use binomial distribution with fixed probability p of success to simulate the number of votes received by Biden in each ward
simulate_election <- function(p) {
votes <- vector(mode = "numeric", length = num_wards)
for (s in 1:num_wards) {
votes[s] <- rbinom(1, prob = p, size = ward_pop[s])
}
return(votes)
}
Below we can see Benford’s law disappearing only as a function of the probability of success p, in a randomised simulation without any election fraud. Benford’s law disappears because the population sizes in Milwaukee wards are too small to accommodate Benford’s law when the candidate has more than around 50% chance of receiving a vote on average.
for (p in seq(0.05,0.9, by = 0.02)) {
hist(get_first_digit(simulate_election(p)), main = paste0("p = ", p))
}
We can simulate the election using Biden’s empirical success probability, taken as the average proportion of votes received by Biden.
biden_success_p <- mean(biden_proportion, na.rm = T)
cat("Average proportion of votes received by Biden = ", biden_success_p)
## Average proportion of votes received by Biden = 0.7309322
biden_votes_sim <- simulate_election(biden_success_p)
hist(biden_votes_sim, main = "Simulated Biden vote counts")
hist(get_first_digit(biden_votes_sim),
main = "Benford's law with empirical Biden success rate")
We can do the same simulation with Trump’s empirical success rate, to see if it does obey Benford’s law:
trump_proportion <- dat$Trump / ward_pop
trump_success_p <- mean(trump_proportion, na.rm = T)
cat("Trump's empirical success rate is ", trump_success_p)
## Trump's empirical success rate is 0.2572573
trump_votes_sim <- simulate_election(trump_success_p)
hist(trump_votes_sim, main = "Simulated Trump vote counts")
hist(get_first_digit(trump_votes_sim),
main = "Benford's law with empirical Trump success rate")