Using data from Poll of Polls, I show the polling results for the four most popular parties in Italy over the last year, using only data from the 8 most active pollsters.
Code is included for pedagogical reasons. In an actual report, I would leave it out.
knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
library(lubridate)
# There are a lot of NAs in the first few years, so we can't rely on accurately
# guessing the variable type with the default (looking at the first 1,000 rows).
x <- read_csv("https://pollofpolls.eu/get/polls/IT-parliament/format/csv",
              guess_max = 3000)
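To see why guess_max matters, here is a minimal sketch on a hypothetical toy file (not the Poll of Polls data) whose "value" column is empty in the first five rows. The file contents and the object names are made up for illustration, and exactly which rows readr samples when guessing may depend on the readr version, so treat the too-small case as an illustration rather than a guarantee.

```r
library(readr)

# Hypothetical toy file: "value" is missing in rows 1-5 and numeric in row 6.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", paste0(1:5, ","), "6,12.5"), tmp)

# Too few rows for guessing: readr may see only NAs in "value" and pick the
# wrong type (typically logical), reporting a parsing problem for row 6.
narrow <- read_csv(tmp, guess_max = 5)

# Enough rows for guessing: the number in row 6 is seen, so "value" is numeric.
wide <- read_csv(tmp, guess_max = 10)

# Alternative: skip guessing entirely and declare the column types up front.
explicit <- read_csv(tmp, col_types = cols(id = col_integer(),
                                           value = col_double()))

# Compare the column specifications that were actually used.
spec(narrow)
spec(wide)
spec(explicit)
```

Declaring col_types is more typing, but it removes the dependence on how many rows happen to be NA at the top of the file.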
# Generate a list of polling firms for which we have records of more than 100
# polls. Why is the count variable "nn" rather than the usual "n"? Because the
# input data already includes a variable named "n", which is, I think, the
# number of people in each poll. Setting name = "nn" explicitly means the code
# does not depend on how count() chooses its default column name.
good_firms <- x %>%
  count(firm, name = "nn") %>%
  filter(nn > 100) %>%
  pull(firm)
# The beginning portion of an analysis often involves creating variables that we
# will need when we create the main pipe. Above, we created a list of active
# firms. We might also make the list of the important parties data-dependent,
# but, for now, we will just hard-code them.
# Another value we need is the date a year ago, since we don't want data older
# than that for the plot. We could just calculate this on the fly in the main
# pipe, but calculating these things ahead of time is often sensible.
first_date <- today() - years(1)
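One thing worth knowing about this kind of date arithmetic: subtracting a period like years(1) returns NA when the resulting date does not exist. A minimal sketch, using fixed dates (my own, chosen for illustration) instead of today() so that the results are reproducible:

```r
library(lubridate)

# A fixed, hypothetical reference date in place of today().
ref_date <- ymd("2019-03-15")
year_before <- ref_date - years(1)         # 2018-03-15

# Edge case: 29 February has no counterpart in a non-leap year.
leap_day <- ymd("2020-02-29")
strict <- leap_day - years(1)              # NA
rolled <- leap_day %m-% years(1)           # 2019-02-28

year_before
strict
rolled
```

If the report might be run on a leap day, `today() %m-% years(1)` rolls back to the last valid day of the month instead of producing NA, which would otherwise make the later `filter(date >= first_date)` drop every row.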
# We only want to show the polling over the last year, for the four most
# important parties and using polling data from the most active survey firms.
x %>%
  filter(firm %in% good_firms) %>%
  select(date, firm, n, LN, M5S, PD, FIPDLFI) %>%
  filter(date >= first_date) %>%
  # gather() is the trickiest part. The tidy data chapter in R4DS is a great
  # resource.
  gather(key = "party", value = "poll", LN:FIPDLFI) %>%
  ggplot(aes(x = date, y = poll, color = party)) +
    geom_smooth(se = FALSE) +
    geom_point(size = 1, alpha = .8) +
    # There is a lot more that I ought to do with the plot. First, use the full
    # names of the parties. Second, order the parties in the legend by their
    # current standing in the polls, i.e., with Lega at the top. Third, no need
    # for a legend title.
    # In fact, best would be to get rid of the legend altogether and put the
    # party names (appropriately colored) directly into the plot itself. I leave
    # all those improvements as exercises for the reader.
    xlab(NULL) + # Putting "Date" here would be fairly redundant.
    ylab("Percentage Support") +
    labs(title =
           "Popular support for League surges over the last year",
         subtitle =
           "Democratic Party and Forward Italy continue their long-term decline")