Simulating Country Voting in R

Our goal is to know on average how many votes each candidate will get if we vote completely randomly. We have a few variables that can change:

Number of Candidates
Number of Voting Countries
Number of Simulation Runs

The number of voting countries shouldn't make any difference, but we can simulate it anyway. We could also simulate the number of votes being different from 5, for sake of generality.

simproportions <- function(nCandidates, nCountries, nReps) {
    nVotes <- 5
    cand <- 1:nCandidates
    allmeans <- matrix(nrow = nReps, ncol = 1)
    for (nr in 1:nReps) {
        res <- matrix(nrow = nCountries, ncol = nVotes)
        for (nc in 1:nCountries) {
            res[nc, ] <- sample(cand, size = nVotes, replace = FALSE)
        }
        allmeans[nr] <- mean(tabulate(res, nbins = nCandidates))/nCountries
    }
    return(mean(allmeans))
}

Now, we'd like to simulate this over a range of values for Candidates:

nCandidates <- 5:100
res <- data.frame(Candidates = nCandidates, PropVotes = NA)

for (nc in nCandidates) {
    res[res$Candidates == nc, "PropVotes"] <- simproportions(nc, 180, 50)
}

We can plot these simulated values:

plot(PropVotes ~ Candidates, data = res, type = "l", xlab = "Number of Candidates", 
    ylab = "Proportion of Votes Received")
title("Vote Proportion: Chosen at Random")

plot of chunk unnamed-chunk-3

Having coded this, let's see if we can figure out the math behind it. We have \( k \) votes and \( N \) candidates. What is the probability of picking a certain candidate?

\( P(\mbox{Picked}) = 1 - P(\mbox{Not Picked}) \)

Let's start with the basic case of having \( N=6 \) candidates and \( k=5 \) votes, and let's look at the probability that the first candidate was picked. On the first attempt, there is a \( 5/6 \) chance that the first item is not picked. On the second attempt, there are only 5 items remaining, so there is now a \( 4/5 \) chance that this is not picked. For all five votes, we then have:

\( P(\mbox{Picked}) = 1 - \frac{5}{6} \times \frac{4}{5} \times \frac{3}{4} \times \frac{2}{3} \times \frac{1}{2} \)

Notice that we can combine this fraction as follows:

\( P(\mbox{Picked}) = 1 - \frac{5 \times 4 \times 3 \times 2 \times 1}{6 \times 5 \times 4 \times 3 \times 2} \)

Which then simplifies to:

\( P(\mbox{Picked}) = 1 - \frac{5 \times 4 \times 3 \times 2 \times 1}{5 \times 4 \times 3 \times 2 \times 6} = 1-\frac{1}{6} = \frac{5}{6} \)

In general:

\( P(\mbox{Picked}) = 1 - \prod_1^k \frac{N-i}{N-i+1} \)

We can verify that this formula produces the same values we reached by simulating:

computeproportions <- function(N, k) {
    val = 1
    for (i in 1:k) {
        val <- val * (N - i)/(N - i + 1)
    }
    return(1 - val)
}

res$PropVotesCompute <- NA
for (nc in nCandidates) {
    res[res$Candidates == nc, "PropVotesCompute"] <- computeproportions(nc, 
        5)
}

We can plot these against each other to verify that we have computed the function correctly:

plot(res$PropVotes, res$PropVotesCompute, xlab = "Simulated", ylab = "Numerically Computed")
title("Vote Proportion")
abline(0, 1)

plot of chunk unnamed-chunk-5