DS Labs Homework

Author

Bryan Argueta

Introduction

I will be using the “brexit_polls” data set from the dslabs package. This data set contains poll results from January to June 2016, the months leading up to the EU (Brexit) referendum. The referendum question was “Should the United Kingdom remain a member of the European Union or leave the European Union?” I want to look at the months and pollsters to find any patterns.

Load the libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dslabs)
library(lubridate)

Load the data

data("brexit_polls")
head(brexit_polls)
   startdate    enddate   pollster poll_type samplesize remain leave undecided
1 2016-06-23 2016-06-23     YouGov    Online       4772   0.52  0.48      0.00
2 2016-06-22 2016-06-22    Populus    Online       4700   0.55  0.45      0.00
3 2016-06-20 2016-06-22     YouGov    Online       3766   0.51  0.49      0.00
4 2016-06-20 2016-06-22 Ipsos MORI Telephone       1592   0.49  0.46      0.01
5 2016-06-20 2016-06-22    Opinium    Online       3011   0.44  0.45      0.09
6 2016-06-17 2016-06-22     ComRes Telephone       1032   0.54  0.46      0.00
  spread
1   0.04
2   0.10
3   0.02
4   0.03
5  -0.01
6   0.08

Extract month

brexit_polls2 <- brexit_polls %>%
  mutate(month = month(startdate))

head(brexit_polls2)
   startdate    enddate   pollster poll_type samplesize remain leave undecided
1 2016-06-23 2016-06-23     YouGov    Online       4772   0.52  0.48      0.00
2 2016-06-22 2016-06-22    Populus    Online       4700   0.55  0.45      0.00
3 2016-06-20 2016-06-22     YouGov    Online       3766   0.51  0.49      0.00
4 2016-06-20 2016-06-22 Ipsos MORI Telephone       1592   0.49  0.46      0.01
5 2016-06-20 2016-06-22    Opinium    Online       3011   0.44  0.45      0.09
6 2016-06-17 2016-06-22     ComRes Telephone       1032   0.54  0.46      0.00
  spread month
1   0.04     6
2   0.10     6
3   0.02     6
4   0.03     6
5  -0.01     6
6   0.08     6

Change the months from numbers to words & reorder

# Map month numbers 1-6 to abbreviated names and keep them in calendar order
brexit_polls2$month <- factor(month.abb[brexit_polls2$month], levels = month.abb[1:6])
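The extraction and relabeling steps above could also be combined into a single call. The following is a minimal sketch, assuming that lubridate's `month()` is used with `label = TRUE`, which returns an ordered factor of abbreviated month names. The name `brexit_polls_alt` is introduced here for illustration only, and the labels depend on the session's locale.

```r
library(dslabs)
library(dplyr)
library(lubridate)

data("brexit_polls")

# month(label = TRUE) returns an ordered factor with all twelve months as levels;
# droplevels() keeps only the months that actually occur in the data.
brexit_polls_alt <- brexit_polls %>%
  mutate(month = droplevels(month(startdate, label = TRUE, abbr = TRUE)))

levels(brexit_polls_alt$month)
```

Because the factor is already in calendar order, no manual `levels = ...` argument is needed.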

Create a heatmap

# Heatmap of poll sample size by month (x) and pollster (y)
ggplot(brexit_polls2, aes(x = month, y = pollster, fill = samplesize)) +
  geom_tile() +
  labs(x = "Month", y = "Pollster") +
  ggtitle("Brexit Poll Sample Sizes by Pollster and Month, 2016") +
  scale_fill_continuous(name = "Sample Size") +
  # Spell out the abbreviated pollster names on the y axis
  scale_y_discrete(labels = c("TNS" = "Taylor Nelson Sofres (TNS)", 
                              "ORB" = "Opinion Research Business (ORB)", 
                              "ORB/Telegraph" = "Opinion Research Business (ORB)/Telegraph", 
                              "ICM" = "Independent Communications and Marketing (ICM)")) +
  theme_classic()
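One caveat with the heatmap above: each poll is drawn as its own tile, so when a pollster ran several polls in the same month the tiles are drawn on top of one another and only the last one is visible. A possible alternative, sketched here under the assumption that the number of polls is the quantity of interest, is to aggregate to one row per month and pollster before plotting. The names `poll_counts`, `n_polls`, and `p` are illustrative and not part of the original analysis.

```r
library(dslabs)
library(dplyr)
library(ggplot2)
library(lubridate)

data("brexit_polls")

# One row per month/pollster pair, with the number of polls in that cell
poll_counts <- brexit_polls %>%
  mutate(month = factor(month.abb[month(startdate)], levels = month.abb[1:6])) %>%
  count(month, pollster, name = "n_polls")

# Each tile now represents exactly one month/pollster combination
p <- ggplot(poll_counts, aes(x = month, y = pollster, fill = n_polls)) +
  geom_tile() +
  labs(x = "Month", y = "Pollster", fill = "Number of polls",
       title = "Brexit polls per pollster and month, 2016") +
  theme_classic()

p
```

The same pattern works for other summaries, for example replacing `count()` with `group_by(month, pollster)` and `summarise()` to plot the mean sample size per cell.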

Conclusion

It appears that June was the month with the most polling activity, and that YouGov and ICM were the most active pollsters.
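This reading of the heatmap can be checked numerically. The sketch below simply counts polls per month and per pollster; the names `polls_by_month` and `polls_by_pollster` are illustrative, and the resulting counts are not reproduced here.

```r
library(dslabs)
library(dplyr)
library(lubridate)

data("brexit_polls")

# Number of polls started in each month, busiest month first
polls_by_month <- brexit_polls %>%
  count(month = month(startdate), name = "n_polls") %>%
  arrange(desc(n_polls))

# Number of polls per pollster, most active pollster first
polls_by_pollster <- brexit_polls %>%
  count(pollster, sort = TRUE, name = "n_polls")

polls_by_month
head(polls_by_pollster)
```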