15.7 Exercises
For these exercises, we will use actual polls from the 2016 election. You can load the data from the dslabs package.
library(dslabs)
library(magrittr)
data("polls_us_election_2016")
Specifically, we will use all the national polls that ended within one week before the election.
library(tidyverse)
## Warning: package 'stringr' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks magrittr::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ purrr::set_names() masks magrittr::set_names()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
polls <- polls_us_election_2016 |>
filter(enddate >= "2016-10-31" & state=="U.S.")
N<-polls$samplesize[1]
x_hat<-polls$rawpoll_clinton[1]/100
Assume there are only two candidates and construct a 95% confidence interval for the election night proportion \(p\).
se_hat<-sqrt(x_hat*(1-x_hat)/N)
x_hat + c(-1, 1)*pnorm(0.975)*se_hat
## [1] 0.4611527 0.4788473
dplyr to add a confidence interval as two
columns, call them lower and upper, to the
object poll. Then use select to show the
pollster, enddate, x_hat,
lower, upper variables. Hint: define temporary
columns x_hat and se_hat.head(polls)
## state startdate enddate
## 1 U.S. 2016-11-03 2016-11-06
## 2 U.S. 2016-11-01 2016-11-07
## 3 U.S. 2016-11-02 2016-11-06
## 4 U.S. 2016-11-04 2016-11-07
## 5 U.S. 2016-11-03 2016-11-06
## 6 U.S. 2016-11-03 2016-11-06
## pollster grade samplesize
## 1 ABC News/Washington Post A+ 2220
## 2 Google Consumer Surveys B 26574
## 3 Ipsos A- 2195
## 4 YouGov B 3677
## 5 Gravis Marketing B- 16639
## 6 Fox News/Anderson Robbins Research/Shaw & Company Research A 1295
## population rawpoll_clinton rawpoll_trump rawpoll_johnson rawpoll_mcmullin
## 1 lv 47.00 43.00 4.00 NA
## 2 lv 38.03 35.69 5.46 NA
## 3 lv 42.00 39.00 6.00 NA
## 4 lv 45.00 41.00 5.00 NA
## 5 rv 47.00 43.00 3.00 NA
## 6 lv 48.00 44.00 3.00 NA
## adjpoll_clinton adjpoll_trump adjpoll_johnson adjpoll_mcmullin
## 1 45.20163 41.72430 4.626221 NA
## 2 43.34557 41.21439 5.175792 NA
## 3 42.02638 38.81620 6.844734 NA
## 4 45.65676 40.92004 6.069454 NA
## 5 46.84089 42.33184 3.726098 NA
## 6 49.02208 43.95631 3.057876 NA
polls %>% mutate(x_hat=polls$rawpoll_clinton/100, se_hat=sqrt(x_hat*(1-x_hat)/samplesize), lower=x_hat-pnorm(0.975)*se_hat, upper=x_hat+pnorm(0.975)*se_hat, hit=lower<=0.482 & upper>=0.482) %>% select(pollster, enddate, x_hat, lower, upper, hit) %>% summarize(mean(hit))
## mean(hit)
## 1 0.1857143
hit, to the previous table
stating if the confidence interval included the true proportion \(p=0.482\) or not.polls %>% mutate(x_hat=polls$rawpoll_clinton/100, se_hat=sqrt(x_hat*(1-x_hat)/samplesize), lower=x_hat-pnorm(0.975)*se_hat, upper=x_hat+pnorm(0.975)*se_hat, hit=lower<=0.482& upper>=0.482) %>% select(pollster, enddate, x_hat, lower, upper, hit)
## pollster enddate x_hat
## 1 ABC News/Washington Post 2016-11-06 0.4700
## 2 Google Consumer Surveys 2016-11-07 0.3803
## 3 Ipsos 2016-11-06 0.4200
## 4 YouGov 2016-11-07 0.4500
## 5 Gravis Marketing 2016-11-06 0.4700
## 6 Fox News/Anderson Robbins Research/Shaw & Company Research 2016-11-06 0.4800
## 7 CBS News/New York Times 2016-11-06 0.4500
## 8 NBC News/Wall Street Journal 2016-11-05 0.4400
## 9 IBD/TIPP 2016-11-07 0.4120
## 10 Selzer & Company 2016-11-06 0.4400
## 11 Angus Reid Global 2016-11-04 0.4800
## 12 Monmouth University 2016-11-06 0.5000
## 13 Marist College 2016-11-03 0.4400
## 14 The Times-Picayune/Lucid 2016-11-07 0.4500
## 15 USC Dornsife/LA Times 2016-11-07 0.4361
## 16 RKM Research and Communications, Inc. 2016-11-05 0.4760
## 17 CVOTER International 2016-11-06 0.4891
## 18 Morning Consult 2016-11-05 0.4500
## 19 SurveyMonkey 2016-11-06 0.4700
## 20 Rasmussen Reports/Pulse Opinion Research 2016-11-06 0.4500
## 21 Insights West 2016-11-07 0.4900
## 22 RAND (American Life Panel) 2016-11-01 0.4370
## 23 Fox News/Anderson Robbins Research/Shaw & Company Research 2016-11-03 0.4550
## 24 CBS News/New York Times 2016-11-01 0.4500
## 25 ABC News/Washington Post 2016-11-05 0.4700
## 26 Ipsos 2016-11-04 0.4300
## 27 ABC News/Washington Post 2016-11-04 0.4800
## 28 YouGov 2016-11-06 0.4290
## 29 IBD/TIPP 2016-11-06 0.4070
## 30 ABC News/Washington Post 2016-11-03 0.4700
## 31 IBD/TIPP 2016-11-03 0.4440
## 32 IBD/TIPP 2016-11-05 0.4300
## 33 ABC News/Washington Post 2016-11-02 0.4700
## 34 ABC News/Washington Post 2016-11-01 0.4700
## 35 ABC News/Washington Post 2016-10-31 0.4600
## 36 Ipsos 2016-11-03 0.4320
## 37 IBD/TIPP 2016-11-04 0.4420
## 38 YouGov 2016-11-01 0.4600
## 39 IBD/TIPP 2016-10-31 0.4460
## 40 Ipsos 2016-11-02 0.4550
## 41 Rasmussen Reports/Pulse Opinion Research 2016-11-03 0.4400
## 42 The Times-Picayune/Lucid 2016-11-06 0.4500
## 43 Ipsos 2016-11-01 0.4470
## 44 IBD/TIPP 2016-11-02 0.4400
## 45 IBD/TIPP 2016-11-01 0.4400
## 46 Rasmussen Reports/Pulse Opinion Research 2016-11-02 0.4200
## 47 Ipsos 2016-10-31 0.4400
## 48 The Times-Picayune/Lucid 2016-11-05 0.4500
## 49 Rasmussen Reports/Pulse Opinion Research 2016-10-31 0.4400
## 50 Google Consumer Surveys 2016-10-31 0.3769
## 51 CVOTER International 2016-11-05 0.4925
## 52 Rasmussen Reports/Pulse Opinion Research 2016-11-01 0.4400
## 53 CVOTER International 2016-11-04 0.4906
## 54 The Times-Picayune/Lucid 2016-11-04 0.4500
## 55 USC Dornsife/LA Times 2016-11-06 0.4323
## 56 CVOTER International 2016-11-03 0.4853
## 57 The Times-Picayune/Lucid 2016-11-03 0.4400
## 58 USC Dornsife/LA Times 2016-11-05 0.4263
## 59 CVOTER International 2016-11-02 0.4878
## 60 USC Dornsife/LA Times 2016-11-04 0.4256
## 61 CVOTER International 2016-11-01 0.4881
## 62 The Times-Picayune/Lucid 2016-11-02 0.4400
## 63 Gravis Marketing 2016-10-31 0.4600
## 64 USC Dornsife/LA Times 2016-11-03 0.4338
## 65 The Times-Picayune/Lucid 2016-11-01 0.4300
## 66 USC Dornsife/LA Times 2016-11-02 0.4247
## 67 Gravis Marketing 2016-11-02 0.4700
## 68 USC Dornsife/LA Times 2016-11-01 0.4236
## 69 The Times-Picayune/Lucid 2016-10-31 0.4200
## 70 USC Dornsife/LA Times 2016-10-31 0.4328
## lower upper hit
## 1 0.4611527 0.4788473 FALSE
## 2 0.3778127 0.3827873 FALSE
## 3 0.4112012 0.4287988 FALSE
## 4 0.4431476 0.4568524 FALSE
## 5 0.4667684 0.4732316 FALSE
## 6 0.4684045 0.4915955 TRUE
## 7 0.4389966 0.4610034 FALSE
## 8 0.4284208 0.4515792 FALSE
## 9 0.3996444 0.4243556 FALSE
## 10 0.4253328 0.4546672 FALSE
## 11 0.4677006 0.4922994 TRUE
## 12 0.4847307 0.5152693 FALSE
## 13 0.4264775 0.4535225 FALSE
## 14 0.4417244 0.4582756 FALSE
## 15 0.4285025 0.4436975 FALSE
## 16 0.4628682 0.4891318 TRUE
## 17 0.4787428 0.4994572 TRUE
## 18 0.4392064 0.4607936 FALSE
## 19 0.4684266 0.4715734 FALSE
## 20 0.4392714 0.4607286 FALSE
## 21 0.4763818 0.5036182 TRUE
## 22 0.4283028 0.4456972 FALSE
## 23 0.4424994 0.4675006 FALSE
## 24 0.4363526 0.4636474 FALSE
## 25 0.4605284 0.4794716 FALSE
## 26 0.4212711 0.4387289 FALSE
## 27 0.4698346 0.4901654 TRUE
## 28 0.4275762 0.4304238 FALSE
## 29 0.3941899 0.4198101 FALSE
## 30 0.4589339 0.4810661 FALSE
## 31 0.4301519 0.4578481 FALSE
## 32 0.4162397 0.4437603 FALSE
## 33 0.4577129 0.4822871 TRUE
## 34 0.4577974 0.4822026 TRUE
## 35 0.4478921 0.4721079 FALSE
## 36 0.4227969 0.4412031 FALSE
## 37 0.4273715 0.4566285 FALSE
## 38 0.4481452 0.4718548 FALSE
## 39 0.4329878 0.4590122 FALSE
## 40 0.4455130 0.4644870 FALSE
## 41 0.4292953 0.4507047 FALSE
## 42 0.4418259 0.4581741 FALSE
## 43 0.4371353 0.4568647 FALSE
## 44 0.4259197 0.4540803 FALSE
## 45 0.4258789 0.4541211 FALSE
## 46 0.4093563 0.4306437 FALSE
## 47 0.4291645 0.4508355 FALSE
## 48 0.4417325 0.4582675 FALSE
## 49 0.4292953 0.4507047 FALSE
## 50 0.3743043 0.3794957 FALSE
## 51 0.4819684 0.5030316 TRUE
## 52 0.4292953 0.4507047 FALSE
## 53 0.4797430 0.5014570 TRUE
## 54 0.4417892 0.4582108 FALSE
## 55 0.4246625 0.4399375 FALSE
## 56 0.4741238 0.4964762 TRUE
## 57 0.4319111 0.4480889 FALSE
## 58 0.4187437 0.4338563 FALSE
## 59 0.4763481 0.4992519 TRUE
## 60 0.4180440 0.4331560 FALSE
## 61 0.4768737 0.4993263 TRUE
## 62 0.4319003 0.4480997 FALSE
## 63 0.4543142 0.4656858 FALSE
## 64 0.4261943 0.4414057 FALSE
## 65 0.4219170 0.4380830 FALSE
## 66 0.4170834 0.4323166 FALSE
## 67 0.4615523 0.4784477 FALSE
## 68 0.4160701 0.4311299 FALSE
## 69 0.4119155 0.4280845 FALSE
## 70 0.4254209 0.4401791 FALSE
polls %>% mutate(x_hat=polls$rawpoll_clinton/100, se_hat =sqrt(x_hat*(1-x_hat)/samplesize), lower=x_hat- pnorm(0.975)*se_hat, upper=x_hat+pnorm(0.975)*se_hat,
hit=lower<=0.482 & upper>=0.482) %>% select(pollster, enddate, x_hat, lower, upper, hit) %>% summarize(mean(hit))
## mean(hit)
## 1 0.1857143
.314 of confidence intervals include the true proportion.
If these confidence intervals are constructed correctly, and the theory holds up, what proportion should include \(p\)? .95 of confidence intervals include the true proportion
A much smaller proportion of the polls than expected produce
confidence intervals containing \(p\) .
If you look closely at the table, you will see that most polls that fail
to include \(p\) are underestimating.
The reason for this is undecided voters, individuals polled that do not
yet know who they will vote for or do not want to say. Because,
historically, undecideds divide evenly between the two main candidates
on election day, it is more informative to estimate the spread or the
difference between the proportion of two candidates \(d\), which in this election was \(0.482−0.461=0.021\). Assume that there are
only two parties and that \(d=2p−1\),
redefine polls as below and re-do exercise 1, but for the
difference.
polls <- polls_us_election_2016 %>% filter(enddate >= "2016-10-31" & state == "U.S.") %>% mutate(d_hat=rawpoll_clinton/100-rawpoll_trump/100)
N<-polls$samplesize[1]
d_hat<-polls$d_hat[1]
x_hat<-(d_hat+1)/2
se_hat<-2*sqrt(x_hat*(1-x_hat)/N)
ci<-c(d_hat-qnorm(0.975)*se_hat, d_hat+qnorm(0.975)*se_hat)
se_hat
## [1] 0.02120683
polls %>% mutate(x_hat=(d_hat+1)/2, se_hat=2*sqrt(x_hat*(1-x_hat)/samplesize), lower=d_hat-pnorm(0.975)*se_hat, upper=d_hat+pnorm(0.975)*se_hat, hit=lower<=0.021 & upper>=0.021) %>% select(pollster, enddate, d_hat, lower, upper, hit)
## pollster enddate
## 1 ABC News/Washington Post 2016-11-06
## 2 Google Consumer Surveys 2016-11-07
## 3 Ipsos 2016-11-06
## 4 YouGov 2016-11-07
## 5 Gravis Marketing 2016-11-06
## 6 Fox News/Anderson Robbins Research/Shaw & Company Research 2016-11-06
## 7 CBS News/New York Times 2016-11-06
## 8 NBC News/Wall Street Journal 2016-11-05
## 9 IBD/TIPP 2016-11-07
## 10 Selzer & Company 2016-11-06
## 11 Angus Reid Global 2016-11-04
## 12 Monmouth University 2016-11-06
## 13 Marist College 2016-11-03
## 14 The Times-Picayune/Lucid 2016-11-07
## 15 USC Dornsife/LA Times 2016-11-07
## 16 RKM Research and Communications, Inc. 2016-11-05
## 17 CVOTER International 2016-11-06
## 18 Morning Consult 2016-11-05
## 19 SurveyMonkey 2016-11-06
## 20 Rasmussen Reports/Pulse Opinion Research 2016-11-06
## 21 Insights West 2016-11-07
## 22 RAND (American Life Panel) 2016-11-01
## 23 Fox News/Anderson Robbins Research/Shaw & Company Research 2016-11-03
## 24 CBS News/New York Times 2016-11-01
## 25 ABC News/Washington Post 2016-11-05
## 26 Ipsos 2016-11-04
## 27 ABC News/Washington Post 2016-11-04
## 28 YouGov 2016-11-06
## 29 IBD/TIPP 2016-11-06
## 30 ABC News/Washington Post 2016-11-03
## 31 IBD/TIPP 2016-11-03
## 32 IBD/TIPP 2016-11-05
## 33 ABC News/Washington Post 2016-11-02
## 34 ABC News/Washington Post 2016-11-01
## 35 ABC News/Washington Post 2016-10-31
## 36 Ipsos 2016-11-03
## 37 IBD/TIPP 2016-11-04
## 38 YouGov 2016-11-01
## 39 IBD/TIPP 2016-10-31
## 40 Ipsos 2016-11-02
## 41 Rasmussen Reports/Pulse Opinion Research 2016-11-03
## 42 The Times-Picayune/Lucid 2016-11-06
## 43 Ipsos 2016-11-01
## 44 IBD/TIPP 2016-11-02
## 45 IBD/TIPP 2016-11-01
## 46 Rasmussen Reports/Pulse Opinion Research 2016-11-02
## 47 Ipsos 2016-10-31
## 48 The Times-Picayune/Lucid 2016-11-05
## 49 Rasmussen Reports/Pulse Opinion Research 2016-10-31
## 50 Google Consumer Surveys 2016-10-31
## 51 CVOTER International 2016-11-05
## 52 Rasmussen Reports/Pulse Opinion Research 2016-11-01
## 53 CVOTER International 2016-11-04
## 54 The Times-Picayune/Lucid 2016-11-04
## 55 USC Dornsife/LA Times 2016-11-06
## 56 CVOTER International 2016-11-03
## 57 The Times-Picayune/Lucid 2016-11-03
## 58 USC Dornsife/LA Times 2016-11-05
## 59 CVOTER International 2016-11-02
## 60 USC Dornsife/LA Times 2016-11-04
## 61 CVOTER International 2016-11-01
## 62 The Times-Picayune/Lucid 2016-11-02
## 63 Gravis Marketing 2016-10-31
## 64 USC Dornsife/LA Times 2016-11-03
## 65 The Times-Picayune/Lucid 2016-11-01
## 66 USC Dornsife/LA Times 2016-11-02
## 67 Gravis Marketing 2016-11-02
## 68 USC Dornsife/LA Times 2016-11-01
## 69 The Times-Picayune/Lucid 2016-10-31
## 70 USC Dornsife/LA Times 2016-10-31
## d_hat lower upper hit
## 1 0.0400 0.0222876324 0.057712368 FALSE
## 2 0.0234 0.0182778367 0.028522163 TRUE
## 3 0.0300 0.0121808069 0.047819193 TRUE
## 4 0.0400 0.0262372162 0.053762784 FALSE
## 5 0.0400 0.0335302203 0.046469780 FALSE
## 6 0.0400 0.0168090671 0.063190933 TRUE
## 7 0.0400 0.0178999457 0.062100054 TRUE
## 8 0.0400 0.0166917809 0.063308219 TRUE
## 9 -0.0150 -0.0401002556 0.010100256 FALSE
## 10 0.0300 0.0004653446 0.059534655 TRUE
## 11 0.0400 0.0154011139 0.064598886 TRUE
## 12 0.0600 0.0295163562 0.090483644 FALSE
## 13 0.0100 -0.0172405063 0.037240506 TRUE
## 14 0.0500 0.0333861285 0.066613871 FALSE
## 15 -0.0323 -0.0476126286 -0.016987371 FALSE
## 16 0.0320 0.0057195521 0.058280448 TRUE
## 17 0.0278 0.0070887529 0.048511247 TRUE
## 18 0.0300 0.0083139130 0.051686087 TRUE
## 19 0.0600 0.0568532106 0.063146789 FALSE
## 20 0.0200 -0.0015609708 0.041560971 TRUE
## 21 0.0400 0.0127799338 0.067220066 TRUE
## 22 0.0910 0.0735386472 0.108461353 FALSE
## 23 0.0150 -0.0101002556 0.040100256 TRUE
## 24 0.0300 0.0025801276 0.057419872 TRUE
## 25 0.0400 0.0210378187 0.058962181 FALSE
## 26 0.0400 0.0223826056 0.057617394 FALSE
## 27 0.0500 0.0296784272 0.070321573 FALSE
## 28 0.0390 0.0361254021 0.041874598 FALSE
## 29 -0.0240 -0.0500676586 0.002067659 FALSE
## 30 0.0400 0.0178455024 0.062154498 TRUE
## 31 0.0050 -0.0228712997 0.032871300 TRUE
## 32 -0.0100 -0.0377929872 0.017792987 FALSE
## 33 0.0300 0.0053924920 0.054607508 TRUE
## 34 0.0200 -0.0044443511 0.044444351 TRUE
## 35 0.0000 -0.0242936114 0.024293611 TRUE
## 36 0.0490 0.0304435172 0.067556483 FALSE
## 37 0.0050 -0.0244555655 0.034455565 TRUE
## 38 0.0300 0.0062248230 0.053775177 TRUE
## 39 0.0090 -0.0171763651 0.035176365 TRUE
## 40 0.0820 0.0630128799 0.100987120 FALSE
## 41 0.0000 -0.0215652843 0.021565284 TRUE
## 42 0.0500 0.0335899080 0.066410092 FALSE
## 43 0.0730 0.0532116908 0.092788309 FALSE
## 44 -0.0010 -0.0293655397 0.027365540 TRUE
## 45 -0.0040 -0.0324474741 0.024447474 TRUE
## 46 -0.0300 -0.0515555778 -0.008444422 FALSE
## 47 0.0680 0.0462217057 0.089778294 FALSE
## 48 0.0500 0.0334025795 0.066597420 FALSE
## 49 0.0000 -0.0215652843 0.021565284 TRUE
## 50 0.0262 0.0208456638 0.031554336 TRUE
## 51 0.0333 0.0122460484 0.054353952 TRUE
## 52 0.0000 -0.0215652843 0.021565284 TRUE
## 53 0.0124 -0.0093161753 0.034116175 TRUE
## 54 0.0600 0.0435254755 0.076474524 FALSE
## 55 -0.0475 -0.0628994877 -0.032100512 FALSE
## 56 0.0009 -0.0214621491 0.023262149 TRUE
## 57 0.0500 0.0337247663 0.066275234 FALSE
## 58 -0.0553 -0.0705561675 -0.040043833 FALSE
## 59 0.0056 -0.0173103391 0.028510339 TRUE
## 60 -0.0540 -0.0692598084 -0.038740192 FALSE
## 61 0.0067 -0.0157584604 0.029158460 TRUE
## 62 0.0500 0.0337030391 0.066296961 FALSE
## 63 0.0100 -0.0014076636 0.021407664 TRUE
## 64 -0.0351 -0.0504370064 -0.019762994 FALSE
## 65 0.0300 0.0136806282 0.046319372 TRUE
## 66 -0.0503 -0.0656895111 -0.034910489 FALSE
## 67 0.0200 0.0030775025 0.036922498 TRUE
## 68 -0.0547 -0.0699159879 -0.039484012 FALSE
## 69 0.0200 0.0036232670 0.036376733 TRUE
## 70 -0.0362 -0.0510835232 -0.021316477 FALSE
polls %>% mutate(x_hat=(d_hat+1)/2, se_hat =2*sqrt(x_hat*(1-x_hat)/samplesize), lower=d_hat- pnorm(0.975)*se_hat, upper=d_hat+pnorm(0.975)*se_hat,
hit=lower<=0.021 & upper>=.021) %>% select(pollster, enddate, x_hat, lower, upper, hit) %>% summarize(mean(hit))
## mean(hit)
## 1 0.5428571
polls %>% mutate(error=d_hat-0.021) %>% ggplot(aes(pollster, error)) + geom_point() +
theme(axis.text.x=element_text(angle=90, hjust=1))
polls %>% mutate(error=d_hat-0.021) %>% group_by(pollster) %>% filter(n() >= 5) %>% ggplot(aes(pollster, error)) + geom_point() + theme(axis.text.x=element_text(angle=90, hjust=1))