Polling

Author

DMcCabe

Quarto

Study 3: Political Polling & Sampling Distribution of Voting Estimator for Bernoulli Trials

The 2014 independence referendum saw 2,001,926 No and 1,617,989 Yes votes. Here I’ll simulate polls of 2000 participants each and estimate the outcome. Each poll attempts to estimate the true ‘Yes’ voting intention \(p_0=1617989/3619915\approx 0.447\).

The voting intention estimate is simply the sample proportion, where each \(X_i\) is a single Bernoulli trial (one respondent's answer): \[\boxed{\hat{p} = \frac{1}{n} \sum_{i=1}^n X_i\quad\text{where}\:X_i\sim \mathit{Bin}(1,0.447)}\]
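As a minimal sketch of this estimator for a single poll, assuming each respondent is an independent Bernoulli draw with the true Yes share (the names `p0`, `x` and `p_hat` are illustrative and not taken from the original code):

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n  <- 2000                       # poll size
p0 <- 1617989 / 3619915          # true Yes share from the referendum result

x     <- rbinom(n, size = 1, prob = p0)  # one poll: n Bernoulli(p0) responses
p_hat <- mean(x)                         # estimator: sample proportion of Yes
p_hat
```

The sum of `n` Bernoulli draws is a single Binomial(`n`, `p0`) draw, so many polls can be simulated at once by drawing the Yes counts directly and dividing by `n`.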
I’ll generate a pseudo statistic \(T'\) which normalises the mean estimator using the sample variance \(S^2\) and the real population mean \(\mu\) (it is not a true statistic, since it depends on \(\mu\), which is known here only because we are simulating): \[ \boxed{ \begin{aligned} &T'=\frac{\bar{X}-\mu}{S}\quad\text{where}\dots\\ &\qquad\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\\ &\qquad S^2 =\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \end{aligned} }\]
I’ll generate another pseudo statistic \(Z'\) which normalises the mean estimator using the real population mean \(\mu\) and the real population variance \(\sigma^2\) (think of \(Z'\) as an NHST z-statistic for testing the null hypothesis \(\mu_0=\mu\)): \[\boxed{Z'=\frac{\bar{X}-\mu}{\sigma}\quad\text{where}\quad\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i}\]
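The two pseudo statistics can be sketched for one simulated poll as follows. This is an illustration under the assumption of independent Bernoulli responses, for which \(\mu=p_0\) and \(\sigma^2=p_0(1-p_0)\); the variable names are hypothetical.

```r
set.seed(2)                       # arbitrary seed, for reproducibility only
n     <- 2000                     # poll size
mu    <- 1617989 / 3619915        # population mean of a Bernoulli(p0) response
sigma <- sqrt(mu * (1 - mu))      # population standard deviation

x     <- rbinom(n, size = 1, prob = mu)  # one simulated poll
x_bar <- mean(x)                         # sample mean
s     <- sd(x)                           # sample sd; sd() divides by n - 1

t_prime <- (x_bar - mu) / s       # normalised by the sample sd
z_prime <- (x_bar - mu) / sigma   # normalised by the population sd
c(T = t_prime, Z = z_prime)
```

With a poll this large, `s` is very close to `sigma`, so the two values should nearly coincide.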

Again I’ll get an idea of the continuous sampling distribution of each of the estimators by looking at discrete histograms of the simulation results…

n <- 2000                    # poll size
n_trials <- 1000000         # No. Monte Carlo Replicates

# Population parameters:
nVoters <- 3619915         # No. Bernoulli trials
nNationalist <- 1617989    # No. Nationalists

# Monte Carlo loop
sample_yes <- rbinom(n_trials, size = n, prob=nNationalist/nVoters)/n

True Yes Vote: 0.4470

Average Poll Yes Vote = 0.4470, MSE = 0.0001
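The code that produced this summary is not shown, but it could be reproduced along these lines. This is a sketch only: the replicate count is reduced for speed, and the `sprintf` formatting is an assumption based on the printed output.

```r
set.seed(3)                       # arbitrary seed, for reproducibility only
n        <- 2000                  # poll size
n_trials <- 100000                # fewer replicates than the original run
p0       <- 1617989 / 3619915     # true Yes share

# one Yes proportion per simulated poll
sample_yes <- rbinom(n_trials, size = n, prob = p0) / n

# mean squared error of the poll estimates around the true share
mse <- mean((sample_yes - p0)^2)

cat(sprintf("True Yes Vote: %.4f\n", p0))
cat(sprintf("Average Poll Yes Vote = %.4f, MSE = %.4f\n", mean(sample_yes), mse))

# theoretical variance of the sample proportion, for comparison
p0 * (1 - p0) / n
```

Because \(\hat{p}\) is unbiased, its MSE equals its variance \(p_0(1-p_0)/n \approx 0.00012\), which is consistent with the 0.0001 reported at four decimal places.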

Note

The general form of the mean estimate normalisation is: \[ Y=\frac{\bar{X}-\mu}{\hat{\sigma}} \]

When we normalise using the true variance, the probability distribution of \(Y\) is simply a standard normal distribution:

\[ \begin{aligned} \text{when }\hat{\sigma}=\sigma:&\\ &Y=\frac{\bar{X}-\mu}{\hat{\sigma}}\quad\Rightarrow\quad\boxed{\frac{\bar{X}-\mu}{\sigma}\sim Z} \end{aligned} \]

When we normalise using the sample variance, the probability distribution of \(Y\) takes the form of Student’s t distribution exactly as first presented in @mccabe2024ContProb. Note also that the unbiased variance estimate has a \(\frac{\sigma^2}{n-1}\chi^2_{n-1}\) distribution:

\[ \begin{aligned} \text{when }&\hat{\sigma}^2=S^2:\\ &\quad Y=\frac{\bar{X}-\mu}{\hat{\sigma}}\quad\Rightarrow\quad\frac{\bar{X}-\mu}{S}\\ &\text{the numerator follows a scaled z distribution}\\ &\quad\bar{X}-\mu\sim \sigma Z\\ &\text{the denominator follows a scaled chi distribution, from the scaled chi-squared seen in the first experiment}\\ &\quad S^2\sim \frac{\sigma^2}{n-1}\chi^2_{n-1}\quad\Rightarrow\quad S\sim\sigma\sqrt{\frac{\chi_{n-1}^2}{n-1}}\\\:\\ &\text{this gives Student's t distribution as presented in the reference:}\\ &\qquad \frac{\bar{X}-\mu}{S}\sim\frac{Z}{\sqrt{\frac{\chi_{n-1}^2}{n-1}}}\qquad \text{where}\quad\underbrace{t_{n-1}=\frac{Z}{\sqrt{\frac{\chi_{n-1}^2}{n-1}}}}_{\text{by definition}}\\ &\therefore\qquad\boxed{\frac{\bar{X}-\mu}{S}\sim t_{n-1}} \end{aligned} \]

> TODO: I think the dof are out - the studentised mean is: \(t=\frac{\bar{X}-\mu}{S/\sqrt{n}}\)
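The question raised in the TODO can be checked numerically. The sketch below is an assumption-laden illustration, not the author's method: it compares the spread of the unscaled ratio \((\bar{X}-\mu)/S\) with that of the studentised mean \((\bar{X}-\mu)/(S/\sqrt{n})\). For 0/1 data the sample variance has the closed form \(S^2=\frac{n}{n-1}\bar{X}(1-\bar{X})\), so only the Yes count per poll needs to be simulated. Since the responses are Bernoulli rather than normal, a \(t_{n-1}\) distribution can only hold approximately here.

```r
set.seed(4)                       # arbitrary seed, for reproducibility only
n        <- 2000                  # poll size
n_trials <- 100000                # number of simulated polls
mu       <- 1617989 / 3619915     # population mean of a Bernoulli(p0) response

k     <- rbinom(n_trials, size = n, prob = mu)    # Yes count in each poll
x_bar <- k / n                                    # sample mean of each poll
s     <- sqrt(n * x_bar * (1 - x_bar) / (n - 1))  # sample sd of 0/1 data

t_unscaled <- (x_bar - mu) / s              # ratio without the sqrt(n) factor
t_student  <- (x_bar - mu) / (s / sqrt(n))  # studentised mean from the TODO

# compare empirical spreads with the standard deviation of a t_{n-1} variable
c(sd_unscaled = sd(t_unscaled),
  sd_student  = sd(t_student),
  sd_t_dist   = sqrt((n - 1) / (n - 3)))
```

If the studentised form is the right one, `sd_student` should sit close to `sd_t_dist` (about 1), while `sd_unscaled` should be smaller by a factor of roughly \(\sqrt{n}\).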