n <- 2000 # poll size
n_trials <- 1000000 # No. Monte Carlo Replicates
# Population parameters:
nVoters <- 3619915 # No. Bernoulli trials
nNationalist <- 1617989 # No. Nationalists
# Monte Carlo loop
sample_yes <- rbinom(n_trials, size = n, prob=nNationalist/nVoters)/n
Study 3: Political Polling & Sampling Distribution of Voting Estimator for Bernoulli Trials
The 2014 Scottish independence referendum saw 2,001,926 No and 1,617,989 Yes votes. Here I’ll simulate polls with 2000 participants each and estimate the outcome. Each poll attempts to estimate the ‘Yes’ voting intention \(p_0=0.447\).
- the voting intention estimate is simply the sample proportion of ‘Yes’ responses: \[\boxed{\hat{p} = \frac{1}{n} \sum_{i=1}^n X_i\quad\text{where}\:X_i\sim \mathit{Bernoulli}(0.447)}\]
- I’ll generate a pseudo statistic \(T'\) which normalises the mean estimator using the sample standard deviation \(S\) and the real population mean \(\mu\) (it can’t be a real statistic since it uses the known true \(\mu\)): \[ \boxed{ \begin{aligned} &T'=\frac{\bar{X}-\mu}{S}\quad\text{where}\dots\\ &\qquad\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\\ &\qquad S^2 =\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \end{aligned} }\]
- I’ll generate another pseudo statistic \(Z'\) which normalises the mean estimator using the real population mean \(\mu\) and the real population standard deviation \(\sigma\) (think of \(Z'\) as an NHST z-statistic for testing the null hypothesis \(\mu_0=\mu\)): \[\boxed{Z'=\frac{\bar{X}-\mu}{\sigma}\quad\text{where}\quad\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i}\]
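The estimator and the two pseudo statistics in the list above can be sketched in R as follows. This is a minimal sketch rather than the study's own code: the names `n_rep`, `p0`, `yes_count`, `p_hat`, `S`, `sigma`, `T_prime` and `Z_prime` are illustrative assumptions, and it uses far fewer replicates than the study. For 0/1 responses the sample variance follows from the count alone, \(S^2=\frac{n}{n-1}\hat{p}(1-\hat{p})\), so individual responses need not be stored.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n     <- 2000                    # poll size
n_rep <- 10000                   # replicates (assumed; fewer than the study)
p0    <- 1617989 / 3619915       # true 'Yes' proportion

# Number of 'Yes' answers in each simulated poll
yes_count <- rbinom(n_rep, size = n, prob = p0)
p_hat     <- yes_count / n       # sample mean of the 0/1 responses

# Sample standard deviation of 0/1 data, computed from the count alone
S     <- sqrt(n / (n - 1) * p_hat * (1 - p_hat))
sigma <- sqrt(p0 * (1 - p0))     # true population standard deviation

T_prime <- (p_hat - p0) / S      # normalised by the sample sd
Z_prime <- (p_hat - p0) / sigma  # normalised by the true sd

summary(T_prime)
summary(Z_prime)
```

Histograms of `T_prime` and `Z_prime` (for example with `hist()`) then give a picture of the two sampling distributions.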
Again I’ll get an idea of the continuous sampling distribution of each of the estimators by looking at discrete histograms of the simulation results…
True Yes Vote: 0.4470
Average Poll Yes Vote = 0.4470, MSE = 0.0001
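A summary of this kind can be reproduced with a short sketch. The variable names (`n_rep`, `p0`, `p_hat`, `mse_sim`, `mse_theory`) and the reduced replicate count are assumptions for illustration, not the study's code. Because the estimator is unbiased, its MSE equals its variance \(p_0(1-p_0)/n\approx 0.00012\), which is consistent with the reported value of 0.0001.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n     <- 2000                    # poll size
n_rep <- 10000                   # replicates (assumed; fewer than the study)
p0    <- 1617989 / 3619915       # true 'Yes' proportion

# One estimated 'Yes' proportion per simulated poll
p_hat <- rbinom(n_rep, size = n, prob = p0) / n

mse_sim    <- mean((p_hat - p0)^2)  # Monte Carlo mean squared error
mse_theory <- p0 * (1 - p0) / n     # variance of the unbiased estimator

cat(sprintf("True Yes Vote: %.4f\n", p0))
cat(sprintf("Average Poll Yes Vote = %.4f, MSE = %.4f\n", mean(p_hat), mse_sim))
cat(sprintf("Theoretical MSE = %.6f\n", mse_theory))
```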
The general form of the mean estimate normalisation is: \[ Y=\frac{\bar{X}-\mu}{\hat{\sigma}} \]
When we normalise using the true standard deviation, the probability distribution of \(Y\) is simply a normal distribution scaled by \(1/\sqrt{n}\), since \(\bar{X}\) has variance \(\sigma^2/n\):
\[ \begin{aligned} \text{when }\hat{\sigma}=\sigma:&\\ &Y=\frac{\bar{X}-\mu}{\hat{\sigma}}\quad\Rightarrow\quad\boxed{\frac{\bar{X}-\mu}{\sigma}\sim \frac{Z}{\sqrt{n}}} \end{aligned} \] When we normalise using the sample standard deviation, the probability distribution of \(Y\) gives the expression for Student’s t distribution as we first presented it @mccabe2024ContProb, again scaled by \(1/\sqrt{n}\). The derivation uses the fact that the unbiased variance estimates have a \(\frac{\sigma^2}{n-1}\chi^2_{n-1}\) distribution:
\[ \begin{aligned} \text{when }&\hat{\sigma}^2=S^2:\\ &\quad Y=\frac{\bar{X}-\mu}{\hat{\sigma}}\quad\Rightarrow\quad\frac{\bar{X}-\mu}{S}\\ &\text{the numerator follows a scaled z distribution}\\ &\quad\bar{X}-\mu\sim \frac{\sigma}{\sqrt{n}} Z\\ &\text{the squared denominator follows a scaled chi-squared distribution as seen in the first experiment}\\ &\quad S^2\sim \frac{\sigma^2}{n-1}\chi^2_{n-1}\quad\Rightarrow\quad S\sim\sigma\sqrt{\frac{\chi_{n-1}^2}{n-1}}\\\:\\ &\text{this gives Student's t distribution as presented in the reference, scaled by }1/\sqrt{n}\text{:}\\ &\qquad \frac{\bar{X}-\mu}{S}\sim\frac{1}{\sqrt{n}}\,\frac{Z}{\sqrt{\frac{\chi_{n-1}^2}{n-1}}}\qquad \text{where}\quad\underbrace{t_{n-1}=\frac{Z}{\sqrt{\frac{\chi_{n-1}^2}{n-1}}}}_{\text{by definition}}\\ &\therefore\qquad\boxed{\frac{\bar{X}-\mu}{S}\sim \frac{t_{n-1}}{\sqrt{n}}\quad\Leftrightarrow\quad\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim t_{n-1}} \end{aligned} \] The degrees of freedom are \(n-1\), and the studentised mean is \(t=\frac{\bar{X}-\mu}{S/\sqrt{n}}\). These steps are exact when the \(X_i\) are normal; for the Bernoulli \(X_i\) of this study the final result holds as a large-\(n\) approximation.
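The role of the \(\sqrt{n}\) factor can be checked numerically. The sketch below is an illustration under assumptions, not the study's code: the names `n_rep`, `p0`, `p_hat`, `S`, `T_prime`, `t_stat` and `probs` are hypothetical, and the replicate count is reduced. It compares quantiles of the studentised mean \(\sqrt{n}\,(\bar{X}-\mu)/S\) with those of \(t_{n-1}\). Since the responses are Bernoulli rather than normal, agreement is only expected to be approximate.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n     <- 2000                    # poll size
n_rep <- 10000                   # replicates (assumed; fewer than the study)
p0    <- 1617989 / 3619915       # true 'Yes' proportion

p_hat <- rbinom(n_rep, size = n, prob = p0) / n
S     <- sqrt(n / (n - 1) * p_hat * (1 - p_hat))  # sample sd of 0/1 data

T_prime <- (p_hat - p0) / S      # unscaled ratio
t_stat  <- sqrt(n) * T_prime     # studentised mean

# Empirical quantiles next to the t distribution with n - 1 degrees of freedom
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(
  simulated = quantile(t_stat, probs),
  t_dist    = qt(probs, df = n - 1)
), 3)

sd(T_prime) * sqrt(n)            # should be close to 1
```

Without the \(\sqrt{n}\) rescaling, the spread of `T_prime` is roughly \(1/\sqrt{2000}\approx 0.022\), far narrower than a \(t_{n-1}\) density.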