15.3 Exercises

1. Suppose you poll a population in which a proportion \(p\) of voters are Democrats and \(1-p\) are Republicans. Your sample size is \(N=25\). Consider the random variable \(S\), which is the total number of Democrats in your sample. What is the expected value of this random variable? Hint: it’s a function of \(p\). \(E[S]=Np=25p\)

2. What is the standard error of \(S\)? Hint: it’s a function of \(p\). Because \(S\) is the sum of \(N\) independent draws, each with standard deviation \(\sqrt{p(1-p)}\), we get \(SE[S]=\sqrt{N}\sqrt{p(1-p)}=\sqrt{25p(1-p)}=5\sqrt{p(1-p)}\)

3. Consider the random variable \(S/N\). This is equivalent to the sample average, which we have been denoting as \(\overline{X}\). What is the expected value of \(\overline{X}\)? Hint: it’s a function of \(p\). \(E[S/N]=E[\overline{X}]=E[S]/N=25p/25=p\)

4. What is the standard error of \(\overline{X}\)? Hint: it’s a function of \(p\). \(SE[S/N]=SE[\overline{X}]=\sqrt{p(1-p)/N}=\sqrt{p(1-p)/25}\)
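
The expected value and standard error of \(\overline{X}\) can be sanity-checked with a small Monte Carlo simulation. This is only a sketch: the value p = 0.45, the seed, and the number of simulated polls B are assumptions chosen for illustration, not part of the exercise.

```r
set.seed(1)   # arbitrary seed, for reproducibility
p <- 0.45     # assumed value of p, for illustration only
N <- 25       # sample size from the exercise
B <- 10000    # number of simulated polls (assumption)

# Each simulated poll counts the Democrats among N voters, then averages
x_bar <- rbinom(B, N, p) / N

mean(x_bar)             # should be close to p
sd(x_bar)               # should be close to the theoretical value below
sqrt(p * (1 - p) / N)   # theoretical standard error of the sample average
```

The simulated mean and standard deviation should agree with \(p\) and \(\sqrt{p(1-p)/N}\) up to Monte Carlo error, which shrinks as B grows.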

5. Write a line of code that gives you the standard error se for the problem above for several values of \(p\), specifically for p <- seq(0, 1, length=100). Make a plot of se versus p.

p <- seq(0, 1, length = 100)
# Standard error of the sample average for N = 25
se <- sqrt(p * (1 - p) / 25)
plot(p, se)

6. Copy the code above and put it inside a for-loop to make the plot for \(N=25\), \(N=100\), and \(N=1000\).

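p <- seq(0, 1, length = 100)
for (N in c(25, 100, 1000)) {
  se <- sqrt(p * (1 - p) / N)
  # Title each plot with its sample size so the three can be told apart
  plot(p, se, main = paste("N =", N))
}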
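
Separate plots each get their own y-axis, which can hide how much the standard error shrinks as \(N\) grows. As an alternative sketch (not the for-loop the exercise asks for), the three curves can be overlaid on shared axes; the variable name sample_sizes is introduced here only for illustration.

```r
p <- seq(0, 1, length = 100)
sample_sizes <- c(25, 100, 1000)

# One column of standard errors per sample size
se <- sapply(sample_sizes, function(N) sqrt(p * (1 - p) / N))

# Overlay the three curves on shared axes
matplot(p, se, type = "l", lty = 1, col = 1:3, xlab = "p", ylab = "se")
legend("topright", legend = paste("N =", sample_sizes), lty = 1, col = 1:3)

# Largest standard error for each N (attained near p = 0.5)
apply(se, 2, max)
```

For every \(N\) the curve peaks near \(p=0.5\), so a close race is the hardest case to estimate precisely.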

7. If we are interested in the difference in proportions, \(p-(1-p)\), our estimate is \(d=\overline{X}-(1-\overline{X})\). Use the rules we learned about sums of random variables and scaled random variables to derive the expected value of \(d\). \(E[\overline{X}-(1-\overline{X})]=E[2\overline{X}-1]=2E[\overline{X}]-1=2p-1\)

8. What is the standard error of \(d\)? Subtracting the constant 1 does not change the standard error, and scaling by 2 doubles it: \(SE[d]=SE[2\overline{X}-1]=2SE[\overline{X}]=2\sqrt{p(1-p)/N}\)

9. If the actual \(p=.45\), it means the Republicans are winning by a relatively large margin since \(d=-.1\), which is a 10% margin of victory. In this case, what is the standard error of \(2\overline{X}-1\) if we take a sample of \(N=25\)?

p <- .45
N <- 25
se <- 2 * sqrt(p * (1 - p) / N)
se
## [1] 0.1989975

10. Given the answer to 9, which of the following best describes your strategy of using a sample size of \(N=25\)?

b. Our standard error is larger than the difference, so the chances of \(2\overline{X}−1\) being positive and throwing us off were not that small. We should pick a larger sample size.
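
To put a number on "not that small", the probability that the estimate \(2\overline{X}-1\) comes out positive can be computed from the binomial distribution of \(S\). This is a sketch under the assumption that the \(N\) draws are independent with the same \(p\); the name prob_wrong_sign is introduced here for illustration.

```r
p <- 0.45
N <- 25

# 2 * X_bar - 1 > 0 exactly when S > N / 2, i.e. S >= 13 when N = 25
prob_wrong_sign <- 1 - pbinom(floor(N / 2), N, p)
prob_wrong_sign
```

Rerunning the same lines with a larger N shows how quickly this probability falls, which is the reason for picking a larger sample size.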