15.3 Exercises
1. Suppose you poll a population in which a proportion \(p\) of voters are Democrats and \(1-p\) are Republicans. Your sample size is \(N=25\). Consider the random variable \(S\), which is the total number of Democrats in your sample. What is the expected value of this random variable? Hint: it’s a function of \(p\). \(E[S]=Np=25p\)
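A quick Monte Carlo check of this answer, as a sketch. The value \(p = 0.45\) and the number of simulated polls `B` are assumptions chosen for illustration; they are not part of the exercise. Each simulated poll draws 25 voters, codes Democrats as 1 and Republicans as 0, and records their sum \(S\).

```r
# Monte Carlo sketch: check that E[S] = N * p.
# Assumed for illustration: p = 0.45, B = 10000 simulated polls.
set.seed(1)
N <- 25
p <- 0.45
B <- 10000
S <- replicate(B, {
  # 1 = Democrat, 0 = Republican, drawn with probabilities p and 1 - p
  x <- sample(c(1, 0), size = N, replace = TRUE, prob = c(p, 1 - p))
  sum(x)
})
mean(S)   # should be close to N * p
N * p     # 11.25
```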
2. What is the standard error of \(S\)? Hint: it’s a function of \(p\). \(SE[S]=\sqrt{Np(1-p)}=\sqrt{25p(1-p)}=5\sqrt{p(1-p)}\)
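Because \(S\) is a sum of \(N\) independent 0/1 draws, it follows a binomial distribution, so its standard error can also be computed exactly from the probability mass function with base R's `dbinom`. A sketch, again assuming \(p = 0.45\) purely for illustration:

```r
# Exact check of SE[S] = sqrt(N * p * (1 - p)) using the binomial pmf.
# p = 0.45 is an assumed value for illustration.
N <- 25
p <- 0.45
s <- 0:N                                  # every possible value of S
prob <- dbinom(s, size = N, prob = p)     # P(S = s)
mu <- sum(s * prob)                       # E[S]
se_exact <- sqrt(sum((s - mu)^2 * prob))  # sqrt(Var[S])
se_formula <- sqrt(N * p * (1 - p))
c(se_exact, se_formula)                   # both about 2.487
```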
3. Consider the random variable \(S/N\). This is equivalent to the sample average, which we have been denoting as \(\overline{X}\). What is the expected value of \(\overline{X}\)? Hint: it’s a function of \(p\). \(E[\overline{X}]=E[S/N]=E[S]/N=25p/25=p\)
4. What is the standard error of \(\overline{X}\)? Hint: it’s a function of \(p\). \(SE[\overline{X}]=SE[S/N]=SE[S]/N=\sqrt{p(1-p)/25}=\sqrt{p(1-p)}/5\)
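The answers to 3 and 4 can be checked by dividing each simulated total by \(N\). A sketch under illustrative assumptions (\(p = 0.45\), 10,000 simulated polls), this time drawing each total directly with `rbinom`:

```r
# Monte Carlo sketch for the sample average X_bar = S / N.
# Assumed for illustration: p = 0.45, B = 10000 simulated polls.
set.seed(2)
N <- 25
p <- 0.45
B <- 10000
x_bar <- rbinom(B, size = N, prob = p) / N  # one sample average per poll
mean(x_bar)                 # close to p = 0.45
sd(x_bar)                   # close to the formula below
sqrt(p * (1 - p) / N)       # about 0.0995
```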
5. Write a line of code that gives you the standard error se for the problem above for several values of \(p\), specifically for p <- seq(0, 1, length=100). Make a plot of se versus p.
# standard error of X_bar for N = 25 across a grid of p values
p <- seq(0, 1, length = 100)
se <- sqrt(p * (1 - p) / 25)
plot(p, se)
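The plot shows the standard error is zero at \(p = 0\) and \(p = 1\) and largest in the middle. As a sketch, base R's `optimize` can locate that maximum numerically for \(N = 25\); because \(p(1-p)\) peaks at \(p = 1/2\), the worst case should be \(\sqrt{0.25/25} = 0.1\). The helper name `se_fun` is only illustrative.

```r
# Find the p that maximizes SE[X_bar] = sqrt(p * (1 - p) / 25) on [0, 1].
se_fun <- function(p) sqrt(p * (1 - p) / 25)
res <- optimize(se_fun, interval = c(0, 1), maximum = TRUE)
res$maximum    # approximately 0.5
res$objective  # approximately 0.1
```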
6. Copy the code above and put it inside a for-loop to make the plot for \(N=25\), \(N=100\), and \(N=1000\).
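p <- seq(0, 1, length = 100)
par(mfrow = c(1, 3))  # show the three plots side by side instead of overwriting
for (N in c(25, 100, 1000)) {
  se <- sqrt(p * (1 - p) / N)
  # common y-axis so the panels are comparable; 0.1 is the largest SE (N = 25, p = 0.5)
  plot(p, se, ylim = c(0, 0.1), main = paste("N =", N))
}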
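To compare the three panels numerically, one can evaluate the worst-case standard error (at \(p = 0.5\)) for each sample size. A short sketch; it shows the standard error shrinking like \(1/\sqrt{N}\), so quadrupling \(N\) from 25 to 100 only halves it.

```r
# Worst-case SE[X_bar] (at p = 0.5) for each sample size in the exercise.
N <- c(25, 100, 1000)
worst_se <- sqrt(0.5 * (1 - 0.5) / N)
worst_se   # 0.1, 0.05, about 0.0158
```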
7. If we are interested in the difference in proportions, \(p-(1-p)\), our estimate is \(d=\overline{X}-(1-\overline{X})\). Use the rules we learned about sums of random variables and scaled random variables to derive the expected value of \(d\). \(E[d]=E[\overline{X}-(1-\overline{X})]=E[2\overline{X}-1]=2E[\overline{X}]-1=2p-1\)
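This expectation can also be confirmed exactly: \(d = 2S/N - 1\) is a function of the binomial total \(S\), so \(E[d]\) is a finite sum over the binomial probabilities. A sketch assuming \(p = 0.45\) for illustration, for which \(2p - 1 = -0.1\):

```r
# Exact E[d] where d = 2 * X_bar - 1 and X_bar = S / N, S ~ Binomial(N, p).
# p = 0.45 is an assumed value for illustration.
N <- 25
p <- 0.45
s <- 0:N
prob <- dbinom(s, size = N, prob = p)
d <- 2 * s / N - 1         # value of d for each possible S
e_d <- sum(d * prob)       # E[d]
c(e_d, 2 * p - 1)          # both -0.1, up to floating-point rounding
```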
8. What is the standard error of \(d\)? Subtracting the constant 1 does not change the standard error, and multiplying by 2 doubles it, so \(SE[d]=SE[2\overline{X}-1]=2\,SE[\overline{X}]=2\sqrt{p(1-p)/N}\)
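The scaling rule behind this answer can be checked on simulated sample averages: the standard deviation of \(2\overline{X} - 1\) is exactly twice that of \(\overline{X}\). A sketch, with \(p = 0.45\), \(N = 25\), and 10,000 simulated polls assumed for illustration:

```r
# Check the scaling rule sd(2 * x - 1) = 2 * sd(x) on simulated sample averages.
# Assumed for illustration: p = 0.45, N = 25, 10000 simulated polls.
set.seed(3)
N <- 25
p <- 0.45
x_bar <- rbinom(10000, size = N, prob = p) / N
d_hat <- 2 * x_bar - 1
sd(d_hat)        # equals 2 * sd(x_bar), up to floating-point rounding
2 * sd(x_bar)
```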
9. If the actual \(p=0.45\), the Republicans are winning by a relatively large margin, since \(d=-0.1\), a 10-percentage-point margin of victory. In this case, what is the standard error of \(2\overline{X}-1\) if we take a sample of \(N=25\)?
p <- 0.45
se <- 2 * sqrt(p * (1 - p) / 25)  # SE[d] = 2 * SE[X_bar] with N = 25
se
## [1] 0.1989975
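Under the normal approximation (an assumption that is rough at \(N = 25\)), this standard error translates into the probability that the estimate \(2\overline{X} - 1\) comes out positive, pointing to the wrong winner, even though \(d = -0.1\). A sketch using base R's `pnorm`:

```r
# P(2 * X_bar - 1 > 0) when p = 0.45 and N = 25, via the normal approximation.
p <- 0.45
N <- 25
d <- 2 * p - 1                      # true difference, -0.1
se <- 2 * sqrt(p * (1 - p) / N)     # same value as in the answer above
pnorm(0, mean = d, sd = se, lower.tail = FALSE)  # roughly 0.31
```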
10. Given the answer to 9, which of the following best describes your strategy of using a sample size of \(N=25\)?
b. Our standard error is larger than the difference, so the chances of \(2\overline{X}-1\) being positive and throwing us off are not that small. We should pick a larger sample size.
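To act on this recommendation, one could ask how large \(N\) must be before the chance of a wrong-sign estimate drops below some tolerance. As a sketch under the normal approximation, with the 5% threshold and the search grid `25:2000` chosen purely for illustration (neither is part of the exercise):

```r
# Smallest N for which P(2 * X_bar - 1 > 0) < 0.05 when p = 0.45,
# using the normal approximation. The 0.05 threshold and the grid are assumptions.
p <- 0.45
d <- 2 * p - 1
N_grid <- 25:2000
se <- 2 * sqrt(p * (1 - p) / N_grid)                         # SE[d] for each N
prob_pos <- pnorm(0, mean = d, sd = se, lower.tail = FALSE)  # chance of wrong sign
n_needed <- min(N_grid[prob_pos < 0.05])
n_needed   # 268
```

Because the probability decreases steadily as \(N\) grows, taking the smallest grid value that meets the threshold is enough.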