From lines 9-14 I loaded tidyverse and dslabs; the dslabs package contains the datasets. I decided to use the brexit_polls dataset and used the summary function to see which pollsters had the most polls.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)
summary(brexit_polls)
startdate enddate pollster poll_type
Min. :2016-01-08 Min. :2016-01-10 ICM :28 Online :85
1st Qu.:2016-03-04 1st Qu.:2016-03-08 YouGov :26 Telephone:42
Median :2016-04-22 Median :2016-04-26 ORB :14
Mean :2016-04-16 Mean :2016-04-18 ComRes :10
3rd Qu.:2016-05-31 3rd Qu.:2016-06-01 Opinium: 9
Max. :2016-06-23 Max. :2016-06-23 TNS : 9
(Other):31
samplesize remain leave undecided
Min. : 497 Min. :0.3500 Min. :0.3200 Min. :0.0000
1st Qu.:1010 1st Qu.:0.4100 1st Qu.:0.3900 1st Qu.:0.0900
Median :1693 Median :0.4400 Median :0.4200 Median :0.1300
Mean :1694 Mean :0.4424 Mean :0.4223 Mean :0.1265
3rd Qu.:2010 3rd Qu.:0.4800 3rd Qu.:0.4500 3rd Qu.:0.1700
Max. :4772 Max. :0.5500 Max. :0.5500 Max. :0.3000
spread
Min. :-0.10000
1st Qu.:-0.02000
Median : 0.01000
Mean : 0.02008
3rd Qu.: 0.05000
Max. : 0.19000
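Because summary() lumps the smaller pollsters into "(Other)", a per-pollster count can make the ranking easier to read. The following is a minimal sketch rather than part of the original analysis; the name pollster_counts is chosen here for illustration.

```r
library(dplyr)
library(dslabs)

# Count the number of polls per pollster, most frequent first
pollster_counts <- brexit_polls %>%
  count(pollster, sort = TRUE)

# Show the four pollsters with the most polls
head(pollster_counts, 4)
```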
From lines 17-20 I filtered brexit_polls so that it only contained the four largest polling organizations, as some organizations had only 2-3 entries. The one issue I had was that some of these polling organizations had overlapping names (e.g. ORB and ORB/Telegraph), so I had to remove those as well.
BP <- brexit_polls %>%
  filter(str_detect(pollster, "ICM|YouGov|ORB|ComRes")) %>%
  filter(pollster != "ORB/Telegraph") %>%
  filter(pollster != "YouGov/The Times")
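The overlapping-name problem comes from str_detect() matching substrings. One possible alternative, sketched here under the assumption that only the four exact pollster names are wanted, is to match whole names with %in%. The names top_pollsters and BP_exact are made up for this example, and the result is not guaranteed to be identical to BP above.

```r
library(dplyr)
library(dslabs)

# The four pollsters with the most polls, matched by exact name
top_pollsters <- c("ICM", "YouGov", "ORB", "ComRes")

BP_exact <- brexit_polls %>%
  filter(pollster %in% top_pollsters) %>%
  # Drop the now-unused factor levels so they do not show up in legends
  mutate(pollster = droplevels(pollster))
```

With exact matching, entries such as ORB/Telegraph are excluded without needing separate filter steps.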
From lines 24-38 I created a density plot showing, for each of these pollsters, the distribution of the share of respondents choosing leave (the eventual outcome of the 2016 Brexit referendum). As we can see, ORB's polls had the highest leave shares. I also made undecided and remain density plots for fun.
Warning: No shared levels found between `names(values)` of the manual scale and the
data's colour values.
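The plotting code is not shown here, so the following is only a sketch of how such a density plot could be built, not the code that produced the warning. That warning typically appears when the names passed to scale_colour_manual(values = ...) do not match the values of the variable mapped to colour, so this sketch names the colours after the pollster levels. The colour choices, the names pollster_colours and leave_density, and the exact-name filter used to rebuild BP are all assumptions.

```r
library(dplyr)
library(ggplot2)
library(dslabs)

# Rebuild the filtered data (exact-name matching is an assumption here)
BP <- brexit_polls %>%
  filter(pollster %in% c("ICM", "YouGov", "ORB", "ComRes")) %>%
  mutate(pollster = droplevels(pollster))

# Names must match the pollster levels, otherwise the manual scale finds no shared levels
pollster_colours <- c(ICM = "steelblue", YouGov = "firebrick",
                      ORB = "darkgreen", ComRes = "orange")

leave_density <- ggplot(BP, aes(x = leave, colour = pollster)) +
  geom_density() +
  scale_colour_manual(values = pollster_colours) +
  labs(x = "Share choosing leave", y = "Density", colour = "Pollster")

leave_density
```

Swapping leave for remain or undecided in aes() would give the other two plots.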
I used the brexit_polls dataset, which covers a variety of polling organizations and their recorded polling results. It also lists each poll's sample size and how the poll was conducted (online or by telephone).