Discrete RVs and the CLT

In this lab we’ll use the discreteRV package to observe the Central Limit Theorem and, time permitting, simulate an election that already happened.

First, log in to rstudio.cloud and install the discreteRV package. You can use the Packages tab in the lower-right panel or run install.packages("discreteRV") in the console.

Run each of the lines of code below and try to answer each of the questions.

library(discreteRV)

Using the RV function, you can create a set of outcomes and their associated probabilities. Here’s a random variable that represents the roll of a fair six-sided die:

X <- RV(outcomes = 1:6, probs = 1/6)
plot(X)

You can also find the expected value, variance, and standard deviation of a random variable. The (long) way to get the expected value is to use the formula $E[X] = \sum_x x \, P(X = x)$ as follows:

sum(probs(X)*outcomes(X))

## [1] 3.5

The (short) way is to use the expectation function, E(), from the discreteRV package:

E(X)

## [1] 3.5

Similarly, you can calculate the variance with our formula $V(X) = \sum_x P(X = x)\,(x - E[X])^2$:

sum(probs(X)*(outcomes(X) - E(X))^2)

## [1] 2.916667

Or you can use the variance function, V(), and then take the square root of the variance to get the standard deviation:

V(X)

## [1] 2.916667

sqrt(V(X))

## [1] 1.707825
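For a fair die, the same three numbers can be checked by hand using the formulas above, with $P(X = x) = 1/6$ for each face:

```latex
E[X]  = \sum_{x=1}^{6} x \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5
V(X)  = \sum_{x=1}^{6} \tfrac{1}{6}\,(x - 3.5)^2
      = \tfrac{1}{6}\,(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25)
      = \tfrac{17.5}{6} = \tfrac{35}{12} \approx 2.9167
SD(X) = \sqrt{V(X)} = \sqrt{35/12} \approx 1.7078
```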

What if you roll two dice and add the results? You could treat this as the sum of two independent and identically distributed (iid) random variables.

two_dice <- SofIID(X, n=2)
plot(two_dice)
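To see where the shape of the two-dice plot comes from, it can help to list all 36 equally likely (first die, second die) pairs directly. This is a base-R sketch that does not use discreteRV; the name two_dice_probs is just for illustration.

```r
# Sum for every (first die, second die) pair: a 6 x 6 matrix of 36 sums
sums <- outer(1:6, 1:6, "+")

# Each pair has probability 1/36, so P(sum = s) = (number of pairs giving s) / 36
two_dice_probs <- table(sums) / 36
two_dice_probs
```

A sum of 7 can be made six ways, which is why it is the most likely total and why the distribution is a triangle rather than flat.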

What if you roll 20 dice and add them all up? What will this distribution look like?

twenty_dice <- SofIID(X, n=20)
plot(twenty_dice)

According to the Central Limit Theorem (usually just “the CLT” among friends), the sum or mean of a sufficiently large number of independent draws of ANY random variable with finite variance will be approximately normally distributed. The approximation can be made as good as you want; you’ll just need to increase your definition of “sufficiently large” accordingly.

Our die-roll random variable is no exception: the sum of 20 rolls already appears to be roughly normal.
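If you want an empirical check on that shape, you can also roll the dice by simulation. This base-R sketch (not part of discreteRV) uses sample() to roll 20 fair dice 10,000 times; the seed and the number of simulations are arbitrary choices. The histogram of simulated sums should look roughly like plot(twenty_dice), with some random noise.

```r
set.seed(1)       # arbitrary seed so the simulation is reproducible
n_sims <- 10000   # number of simulated sets of 20 rolls

# Each replicate rolls 20 fair dice (with replacement) and records their sum
sims <- replicate(n_sims, sum(sample(1:6, size = 20, replace = TRUE)))

# One bar per possible sum (20 through 120)
hist(sims, breaks = seq(19.5, 120.5, by = 1),
     main = "10,000 simulated sums of 20 dice", xlab = "Sum of 20 dice")
```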

You can also find the probability that a random variable will exceed a certain value or fall within a certain range.

P(twenty_dice > 80)

## [1] 0.08509149

P((twenty_dice >= 60) %AND% (twenty_dice <= 80))

## [1] 0.829817

P((twenty_dice < 60) %OR% (twenty_dice > 80))

## [1] 0.170183
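discreteRV computes these probabilities for you. As a sketch of what an exact answer involves, and not a description of how SofIID works internally, the base-R code below builds the distribution of the sum one die at a time: it adds every pair of outcomes and totals the probabilities of equal sums, which assumes the rolls are independent. The helper names add_rv and sum_iid are hypothetical, invented for this example. Its three results should agree with the P() outputs above up to rounding.

```r
# Hypothetical helper: exact distribution of a + b for independent discrete RVs,
# each stored as list(outcomes = ..., probs = ...)
add_rv <- function(a, b) {
  sums  <- outer(a$outcomes, b$outcomes, "+")           # every pair of outcomes
  joint <- outer(a$probs, b$probs)                      # pair probabilities (independence)
  p <- tapply(as.vector(joint), as.vector(sums), sum)   # total probability of each sum
  list(outcomes = as.numeric(names(p)), probs = as.vector(p))
}

# Hypothetical helper: sum of n iid copies of rv, adding one copy at a time
sum_iid <- function(rv, n) {
  total <- rv
  for (i in seq_len(n - 1)) total <- add_rv(total, rv)
  total
}

die    <- list(outcomes = 1:6, probs = rep(1/6, 6))
twenty <- sum_iid(die, 20)

p_above_80 <- sum(twenty$probs[twenty$outcomes > 80])
p_60_to_80 <- sum(twenty$probs[twenty$outcomes >= 60 & twenty$outcomes <= 80])
p_outside  <- sum(twenty$probs[twenty$outcomes < 60 | twenty$outcomes > 80])
c(p_above_80, p_60_to_80, p_outside)
```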

Questions:

Find the expected value, variance and standard deviation of the sum of twenty die rolls.

How do these values compare with the expected value, variance and standard deviation of one die roll?

What is the probability that the sum of twenty die rolls is more than one standard deviation above its expected value?

How does this compare to the probability that you would calculate by assuming normality and using the pnorm function?

Create a random variable representing a spin of the roulette wheel as follows:

R <- RV(outcomes = c(-1, 1), probs = c(20/38, 18/38))

In this game, each spin of the wheel loses you $1 with probability 20/38 and wins you $1 with probability 18/38. What is the probability that you end up winning money if you play this game 20 times?

Simulating the 2024 Election (with some stylized facts)

On the night before the election, according to your model, Harris has 15 states, the District of Columbia, and 199 electoral college votes locked up. DeSantis has 19 states and 147 electoral college votes in the bag. Each of the other 16 states can be represented as a random variable whose outcome is the number of electoral college votes Harris wins there: 0 if she loses the state, or all of its votes if she wins it (as shown below):

New_Mexico <- RV(outcomes=c(0,5), probs=c(0.01, 0.99))
Ohio <- RV(outcomes=c(0,18), probs=c(0.01, 0.99))
Michigan <- RV(outcomes=c(0,16), probs=c(0.02, 0.98))
Pennsylvania <- RV(outcomes=c(0,20), probs=c(0.05, 0.95))
Iowa <- RV(outcomes=c(0,6), probs=c(0.05, 0.95))
New_Hampshire <- RV(outcomes=c(0,4), probs=c(0.05, 0.95))
Nevada <- RV(outcomes=c(0,6), probs=c(0.05, 0.95))
Oregon <- RV(outcomes=c(0,7), probs=c(0.07, 0.93))
Colorado <- RV(outcomes=c(0,9), probs=c(0.10, 0.90))
Virginia <- RV(outcomes=c(0,13), probs=c(0.30, 0.70))
Florida <- RV(outcomes=c(0,29), probs=c(0.80, 0.20))
South_Carolina <- RV(outcomes=c(0,9), probs=c(0.94, 0.06))
North_Carolina <- RV(outcomes=c(0,15), probs=c(0.95, 0.05))
Arizona <- RV(outcomes=c(0,11), probs=c(0.99, 0.01))
Georgia <- RV(outcomes=c(0,16), probs=c(0.99, 0.01))
Kentucky <- RV(outcomes=c(0,8), probs=c(0.99, 0.01))

You can then create a random variable that is the sum of Harris’s electoral college votes and plot it:

Harris <- 199+New_Mexico+Ohio+Michigan+Pennsylvania+Iowa+New_Hampshire+
  Nevada+Oregon+Colorado+Virginia+Florida+South_Carolina+
  North_Carolina+Arizona+Georgia+Kentucky

plot(Harris)
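As a rough cross-check on plot(Harris), here is a base-R Monte Carlo sketch of the same model that does not use discreteRV. The vectors ev and p_win are transcribed from the state definitions above (p_win is the probability of the nonzero outcome, i.e. Harris winning the state); the names, the seed, and the number of simulations are arbitrary choices of this sketch. The histogram should resemble the exact distribution, up to simulation noise.

```r
# Electoral votes and P(Harris wins the state), in the same order as above
ev <- c(New_Mexico = 5, Ohio = 18, Michigan = 16, Pennsylvania = 20, Iowa = 6,
        New_Hampshire = 4, Nevada = 6, Oregon = 7, Colorado = 9, Virginia = 13,
        Florida = 29, South_Carolina = 9, North_Carolina = 15, Arizona = 11,
        Georgia = 16, Kentucky = 8)
p_win <- c(0.99, 0.99, 0.98, 0.95, 0.95, 0.95, 0.95, 0.93, 0.90, 0.70,
           0.20, 0.06, 0.05, 0.01, 0.01, 0.01)

set.seed(2024)      # arbitrary seed for reproducibility
n_sims   <- 100000
n_states <- length(ev)

# One row per state, one column per simulated election; each entry is 1 if Harris
# wins that state. Every state is drawn independently; prob = p_win is recycled
# down each column, so row i uses p_win[i].
wins <- matrix(rbinom(n_states * n_sims, size = 1, prob = p_win), nrow = n_states)

# ev is recycled the same way, so row i is multiplied by ev[i]; add the 199 safe votes
totals <- 199 + colSums(wins * ev)

hist(totals, breaks = 50, main = "Simulated Harris electoral votes",
     xlab = "Electoral college votes")
```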

What is Harris’s chance of winning the election (note that she needs 270 electoral college votes in order to win)?
What is Harris’s chance of winning 300 or more electoral college votes?
What are the shortcomings of this model? Assuming that the probabilities for each state are correct, does this model accurately predict Harris’s chances of winning, or does it underestimate or overestimate them?

Jared Cross

January 24, 2022