Polling in Italy

Using data from Poll of Polls, I show the polling results for the four most popular parties in Italy over the last year, using only data from the 8 most active pollsters.

Appendix

Code included for pedagogical reasons. In an actual report, I would leave this out.

knitr::opts_chunk$set(echo = FALSE)
library(tidyverse)
library(lubridate)

# There are a lot of NAs in the first few years, so we can't rely on accurately
# guessing the variable type with the default (looking at the first 1,000 rows).

x <- read_csv("https://pollofpolls.eu/get/polls/IT-parliament/format/csv", 
              guess_max = 3000)

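# Generate a list of polling firms for which we have records of more than 100
# polls. Why name the count variable explicitly rather than rely on the usual
# "n"? Because the input data already includes a variable named "n", which is,
# I think, the number of people in each poll. Older versions of dplyr respond to
# that clash by calling the count "nn", but the default depends on the dplyr
# version, so an explicit name is safer.

good_firms <- x %>% 
  count(firm, name = "n_polls") %>% 
  filter(n_polls > 100) %>% 
  pull(firm)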

# The beginning portion of an analysis often involves creating variables that we
# will need when we create the main pipe. Above, we create a list of active
# firms. We might also make the list of the important parties data dependent,
# but, for now, we will just hard code them.
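
A minimal sketch of what a data-dependent party list could look like, assuming dplyr 1.0 or later and tidyr 1.0 or later. The tibble `toy`, its numbers, and the party column `XX` are made up for illustration. With the real data, the non-party columns (date, firm, n, and so on) would have to be dropped first.

```r
library(tidyverse)

# Toy stand-in for the polling data. All values are invented.
toy <- tibble(
  firm = c("A", "A", "B", "B"),
  LN   = c(30, 32, 31, NA),
  M5S  = c(28, 27, 29, 28),
  PD   = c(17, 18, NA, 17),
  XX   = c(3, 2, 3, 2)
)

# Average each party's support, ignoring missing polls, and keep the
# three parties with the highest average (four in the real report).
top_parties <- toy %>%
  select(-firm) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "party", values_to = "avg") %>%
  slice_max(avg, n = 3) %>%
  pull(party)

print(top_parties)
```

Averaging over all rows is only one possible definition of "important". Restricting to the last year of polls before averaging would be an equally reasonable choice.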

# Another value we need is the date a year ago, since we don't want data older
# than that for the plot. We could just calculate this on the fly in the main
# pipe, but calculating these things ahead of time is often sensible.

first_date <- today() - years(1)
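
One caveat, sketched below with a fixed date chosen only for reproducibility: subtracting a lubridate period can return NA when the target date does not exist, as happens one year before 29 February. The `%m-%` operator rolls back to the last valid day of the month instead. Whether this matters depends on when the report is knit.

```r
library(lubridate)

# Hypothetical reference date, standing in for today().
leap_day <- ymd("2020-02-29")

# Plain period subtraction aims at the nonexistent 2019-02-29 and gives NA.
naive <- leap_day - years(1)

# %m-% rolls back to the last real day of that month.
safe <- leap_day %m-% years(1)

print(naive)
print(safe)
```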

# We only want to show the polling over the last year, for the four most
# important parties and using polling data from the most active survey firms.

x %>% 
  filter(firm %in% good_firms) %>% 
  select(date, firm, n, LN, M5S, PD, FIPDLFI) %>%
  filter(date >= first_date) %>% 
  
  # gather() is the trickiest part. The tidy chapter in R4DS is a great
  # resource.
  
  gather(key = "party", value = "poll", LN:FIPDLFI) %>% 
  
  ggplot(aes(x = date, y = poll, color = party)) +
    geom_smooth(se = FALSE) +
    geom_point(size = 1, alpha = .8) +
  
    # There is a lot more that I ought to do with the plot. First, use the full
    # names of the parties. Second, order the parties in the legend by their
    # current standing in the polls, i.e., with Lega at the top. Third, no need
    # for a legend title.
  
    # In fact, best would be to get rid of the legend altogether and put the
    # party names (appropriately colored) directly into the plot itself. I leave
    # all those improvements as exercises for the reader.
  
    xlab(NULL) +   # Putting "Date" here would be fairly redundant.    
    ylab("Percentage Support") +
    labs(title = 
               "Popular support for League surges over the last year",
         subtitle = 
               "Democratic Party and Forward Italy continue their long-term decline")
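
A rough sketch of the first three improvements mentioned in the comments (full party names, legend ordered by current standing, no legend title), using made-up numbers rather than the real polls. It assumes tidyr 1.0 or later, where `pivot_longer()` does the job of `gather()`. The English party names for LN, PD, and FIPDLFI follow the plot title and subtitle. "Five Star Movement" for M5S is my assumption.

```r
library(tidyverse)

# Invented numbers standing in for the real polling data.
toy <- tibble(
  date    = as.Date(c("2018-01-01", "2018-06-01", "2018-10-01")),
  LN      = c(14, 27, 31),
  M5S     = c(28, 31, 29),
  PD      = c(23, 18, 17),
  FIPDLFI = c(16, 10, 8)
)

# Lookup from column codes to display names (partly assumed, see above).
party_names <- c(LN = "League", M5S = "Five Star Movement",
                 PD = "Democratic Party", FIPDLFI = "Forward Italy")

long <- toy %>%
  pivot_longer(LN:FIPDLFI, names_to = "party", values_to = "poll") %>%
  mutate(party = recode(party, !!!party_names),
         # Order levels by each party's poll at the latest date, highest first,
         # so the legend matches the right-hand edge of the plot.
         party = fct_reorder2(party, date, poll))

p <- ggplot(long, aes(x = date, y = poll, color = party)) +
  geom_line() +
  labs(x = NULL, y = "Percentage Support", color = NULL)  # color = NULL drops the legend title

print(levels(long$party))
```

Placing the names directly in the plot, the fourth suggestion, is not attempted here.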


David Kane

October 03, 2018
