First, attach the libraries that are needed.
library(dslabs)
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v tibble 2.1.3 v purrr 0.3.2
## v tidyr 0.8.3 v dplyr 0.8.3
## v readr 1.3.1 v stringr 1.4.0
## v tibble 2.1.3 v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(RColorBrewer)
Next, list the data sets available in the “dslabs” package.
data(package="dslabs")
I decided to use the Brexit polls data set (brexit_polls), since the topic was something I was personally interested in.
data("brexit_polls")
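Before plotting, it can help to look at what the data set contains. The following is a minimal sketch, not part of the original analysis; it only assumes that the columns used in the plots below (startdate, leave, pollster, poll_type) are present, and the helper name plot_cols is made up for illustration.

```r
library(dslabs)

# data() returns a "packageIQR" object; its $results matrix has an
# "Item" column holding the names of the data sets in the package.
available <- data(package = "dslabs")$results[, "Item"]
head(available)

data("brexit_polls")

# plot_cols is a hypothetical helper listing the columns the plots rely on
plot_cols <- c("startdate", "leave", "pollster", "poll_type")
plot_cols %in% names(brexit_polls)

# Compact overview of column types and the first few values
str(brexit_polls)
```

Checking column types up front shows whether startdate is stored as a Date and whether pollster and poll_type are factors, which affects how ggplot2 handles the axes and colour groups.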
For some reason I just couldn’t get this section of code to work in a chunk, so I decided to run it like this. I first made a simple scatter plot of all the polls based on the leave response.
p1 <- ggplot(brexit_polls, aes(x = startdate, y= leave, colour = pollster)) +
xlab("Date") +
ylab("Leave") +
ggtitle("Percent of respondents voting Leave")
p1 + geom_point() + theme_dark()
Then I decided to add geom_smooth to help show the pattern. You can see some semblance of a pattern in the plot above, but the smoothed line stands out and helps the viewer see what’s going on.
p2 <- p1 + geom_point() + geom_smooth(color = "blue")
p2 + theme_dark()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Next I decided to see how accurate individual pollsters were, and to look for any irregularities. The only abnormality I noticed was the trend for the pollster ORB, which actually showed a decline in respondents choosing leave. However, by the end the ORB numbers were similar to those of the other pollsters; ORB simply started with a higher share of respondents choosing leave.
p2 <- p1 + geom_point() + geom_smooth(se = FALSE, method = lm)
ggplotly(p2)
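The per-pollster trend lines can also be summarised as numbers. This is a rough sketch rather than something from the original analysis: it fits a straight line of leave against startdate for each pollster and keeps the slope, so a negative value corresponds to a declining line in the plot. The cutoff min_polls and the output column names are assumptions chosen for illustration.

```r
library(dslabs)
library(dplyr)

data("brexit_polls")

# Hypothetical cutoff: skip pollsters with too few polls to fit a line sensibly
min_polls <- 5

pollster_trends <- brexit_polls %>%
  group_by(pollster) %>%
  filter(n() >= min_polls) %>%
  summarise(
    n_polls = n(),
    # Slope of leave over time: change in the leave column per day
    # (startdate is converted to a number of days before fitting)
    slope_per_day = coef(lm(leave ~ as.numeric(startdate)))[[2]]
  ) %>%
  arrange(slope_per_day)

pollster_trends
```

Sorting by slope puts any pollster with a downward trend at the top of the table, which makes it easy to compare against what the interactive plot shows.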
Considering the controversy surrounding the Brexit vote, I was curious to see whether you could spot any patterns in the popularity of leave depending on the medium the polls were conducted in. The result was interesting, as you can see. Leave was consistently more popular in online polls than in telephone polls. This could be an indication of foreign meddling, or of people not being willing to state their real opinions to another person in a conversation. The Brexit referendum took place on June 23rd 2016, so I would love to see polling data from before January 2016 to check whether online polls reflected Brexit polls conducted by other means, and to see how much of an influence online opinions had on the outcome.
p1 <- ggplot(brexit_polls, aes(x = startdate, y= leave, colour = poll_type)) +
xlab("Date") +
ylab("Leave") +
ggtitle("Percent of respondents voting Leave by poll type")
p2 <- p1 + geom_point() + geom_smooth(method = "loess") +theme_dark()
ggplotly(p2)
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 16857
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 5.035
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 49.491
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 16857
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 5.035
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 49.491
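Warnings like the ones above typically appear when loess is fit to a group with very few observations or with many tied x values. As a sketch of one possible alternative (an assumption on my part, not what was run here), the comparison by poll type can be summarised numerically and redrawn with a straight-line fit per poll type, which avoids loess altogether. The names poll_type_summary and p_lm are made up for this example.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

data("brexit_polls")

# Number of polls and average leave value for each poll type
poll_type_summary <- brexit_polls %>%
  group_by(poll_type) %>%
  summarise(n_polls = n(), mean_leave = mean(leave))

poll_type_summary

# Same axes and labels as the plot by poll type, but with a linear fit
# per poll type instead of a loess curve
p_lm <- ggplot(brexit_polls, aes(x = startdate, y = leave, colour = poll_type)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  xlab("Date") +
  ylab("Leave") +
  ggtitle("Percent of respondents voting Leave by poll type") +
  theme_dark()

p_lm
```

A linear fit is cruder than loess, but it needs far fewer points per group, so it is less likely to run into the near-singularity problems reported in the warnings.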