---
title: 'GE2017 quick posts: May vs. Neil'
author: "Robert Roberts"
date: 2017-05-21T02:23:15-05:00
categories: ["R"]
tags: ["Tidytext", "Wordcloud", "GE2017", "Sentiment"]
---
Moulin Rouge OST - Come What May
#personalised plot theme (needs ggplot2 and ggthemes, which are loaded below)
theme_rob <- function(base_size = 12, base_family = "sans") {
  theme_foundation(base_size = base_size, base_family = base_family) +
    theme(line = element_line(colour = "grey60"),
          rect = element_rect(fill = "grey90", linetype = 0, colour = NA),
          text = element_text(colour = "grey20"),
          axis.title = element_text(colour = "grey30"),
          axis.text = element_text(),
          axis.ticks = element_blank(),
          axis.line = element_blank(),
          legend.background = element_rect(),
          legend.position = "bottom",
          legend.direction = "horizontal",
          legend.box = "vertical",
          panel.grid = element_line(colour = NULL),
          panel.grid.major = element_line(colour = "grey60"),
          panel.grid.minor = element_blank(),
          plot.title = element_text(hjust = 0, size = rel(1.5), face = "bold"),
          plot.margin = unit(c(1, 1, 1, 1), "lines"),
          strip.background = element_rect())
}
On the evening of the 21st May, Conservative Prime Minister Theresa May was interviewed by Andrew Neil in the first of a series of interviews before the upcoming general election. All data for this post is taken from the Spectator transcription of that interview.
Before analysing, the data is grabbed using a simple rvest script:
#libraries
library(rvest)
library(magrittr)
library(dplyr)
library(stringr)
library(tidytext)
library(ggplot2)
library(ggthemes)
library(tidyr)
library(wordcloud)
#get the text from the interview
url <- "https://blogs.spectator.co.uk/2017/05/andrew-neil-interviews-theresa-may-full-transcript/"
text <- read_html(url) %>%
html_nodes(".spev-countdown__close-trigger , .ev-meter-content-class p") %>%
html_text() %>%
#subset out only the Prime Minister's answers
str_subset("PM:") %>%
str_replace("PM: ", "")
#sort into a data frame
May_df <- data.frame(Answer = seq_along(text),
Text = as.character(text),
#remove all punctuation
No_punc = as.character(gsub("[[:punct:]]", "", text)),
Topics = as.character(c(rep("Narrowing Polls", 3),
rep("Social Care", 14),
rep("NHS", 12),
rep("VAT / NI Rises", 8),
rep("JAMs", 4),
rep("Winter Fuel", 5),
rep("Immigration", 7),
rep("Deficit", 3),
rep("Brexit", 7),
rep("Term", 3))),
stringsAsFactors = FALSE)
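Because the topic labels are assigned by hand with rep(), they only line up with the answers if the counts add up to the number of answers scraped. A minimal sketch of an explicit check, reusing the counts above; the names topic_counts, topics and n_answers are illustrative and not part of the original script:

```r
# Answers per topic, copied from the rep() calls above
topic_counts <- c("Narrowing Polls" = 3, "Social Care" = 14, "NHS" = 12,
                  "VAT / NI Rises" = 8, "JAMs" = 4, "Winter Fuel" = 5,
                  "Immigration" = 7, "Deficit" = 3, "Brexit" = 7, "Term" = 3)

# Expand the counts into one label per answer, in interview order
topics <- rep(names(topic_counts), times = topic_counts)

# The post reports 66 answers; in the real script this would be length(text)
n_answers <- 66
stopifnot(length(topics) == n_answers)
```

If the transcript page changes and the scrape returns a different number of answers, a check like this fails early instead of silently mislabelling topics.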
The interview came on the same day as a much-publicised U-turn over caps in social care for one of the Conservatives’ manifesto policies. In addition to this, Andrew Neil challenged the PM aggressively over the recent narrowing in opinion polls, and over missed targets in past Conservative budgets and immigration pledges.
Especially given May’s election strategy of denying ‘fake news’ (“Nothing has changed! Nothing has changed!” was the cry in the morning when questioned over the ‘Dementia Tax’), it’s perhaps unsurprising that many of her answers were rebuttals to the questions put to her. Indeed, of the 66 answers given in the interview, 20% began with the word “No”, far more than any other word.
#find how often words are used to start answers
Answer_starts <- data.frame(table(gsub(" .*", "", May_df$No_punc)))
Answer_starts$Var1 <- as.character(Answer_starts$Var1)
Answer_starts$Var1[1] <- "(interrupted)"
#get the shade of blue the Conservatives use
con_colour <- "#0087DC"
#plot answer beginnings
ggplot(Answer_starts, aes(x = Var1, y = Freq)) +
geom_bar(stat = "identity", fill = con_colour) +
geom_text(aes(label = Var1, angle = 90), nudge_y = 1) +
#to reproduce this, delete my personalised theme
theme_rob() +
xlab("First word of Answer") +
ylab("Frequency") +
theme(axis.text.x=element_blank()) +
ggtitle("I'm Just A Girl Who Can Say No",
"First Word Frequency of Theresa May's Answers to Andrew Neil")

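The share of answers opening with a given word can be read straight off a frequency table. A minimal sketch on made-up answers (not the real transcript), assuming the same "everything before the first space" rule used above; the object names are illustrative:

```r
# Toy answers standing in for May_df$No_punc (not the real transcript)
answers <- c("No what I have done is set out a plan",
             "Well we are putting more money in",
             "No nothing has changed",
             "What we have said is clear",
             "")  # an empty string stands in for an interrupted answer

# Keep everything before the first space, as in the gsub() call above
first_word <- gsub(" .*", "", answers)

# Percentage of answers starting with each word
start_share <- round(100 * prop.table(table(first_word)), 1)
start_share[["No"]]
```

Run against the real data frame, the same two lines would give the percentage quoted for "No".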
The interview covered a reasonably wide range of topics given its short length. After a few quick questions on the recent polls, two long sections, on the aforementioned social care policy and on NHS funding, accounted for almost half the total words spoken. After that, the time spent on each topic generally shortened as the interview progressed, except for some lingering on the government’s previous immigration targets and a potential “no-deal” outcome in the Brexit negotiations.
#find the number of words in each answer
May_df$Words <- sapply(gregexpr("\\W+", May_df$No_punc), length) + 1
#plot the number of words vs. topic
ggplot(May_df, aes(x = factor(May_df$Topics, levels = unique(May_df$Topics)), y = Words,
label = factor(May_df$Topics, levels = unique(May_df$Topics)),
fill = factor(May_df$Topics, levels = unique(May_df$Topics)))) +
stat_summary(fun = "sum", geom = "bar", position = "identity") +
scale_fill_brewer(palette = "Spectral") +
guides(fill = "none") +
theme_rob() +
xlab("Topic") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Total Answers Length") +
ggtitle("Hot Topics",
"How Many Words May Spent on Topics")

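To put a number on each topic's share of the words spoken, the per-answer counts can be summed by topic and divided by the total. A minimal sketch on made-up data (not the real transcript); toy_df and the other names are illustrative. It counts words by splitting on whitespace, because the separator-counting approach above adds one to the number of matches and so would count an empty answer as two words:

```r
# Toy data standing in for May_df (not the real transcript)
toy_df <- data.frame(
  Topics = c("Polls", "Social Care", "Social Care", "NHS"),
  No_punc = c("No I do not think so",
              "We face a great challenge with an ageing population",
              "Nothing has changed",
              "We are putting record funding in"),
  stringsAsFactors = FALSE)

# Count words as runs of non-space characters
toy_df$Words <- lengths(strsplit(toy_df$No_punc, "\\s+"))

# Total words per topic, kept in interview order, then each topic's share
topic_words <- tapply(toy_df$Words,
                      factor(toy_df$Topics, levels = unique(toy_df$Topics)),
                      sum)
topic_share <- round(100 * topic_words / sum(topic_words), 1)
```

Setting the factor levels to unique(Topics) keeps the topics in the order they came up in the interview, the same trick the plotting code uses.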
In her answers, Theresa May was mostly neutral on each of these topics, which is perhaps surprising given previous answers on (for instance) Brexit. However, there was one very stark exception to this.
Analysing the sentiment of the text for each topic using Julia Silge and David Robinson’s tidytext package shows a remarkable dip towards negative sentiment (words with negative connotations, for example “hideous”, “sad”, or “abomination”) in the social care section. This is probably a combination of two factors: 1) the actual context of the policy, as Theresa May spent a long time talking about the “challenges” society faces in years to come with an ageing population; 2) the mood of the interview, which was at its most tense when Andrew Neil was skewering May over the morning’s U-turn.
#tidy the answers and remove stopwords
May_tidytext <- May_df %>%
unnest_tokens(word, No_punc) %>%
anti_join(stop_words)
#get a df of words vs. sentiment
bing <- get_sentiments("bing")
#find the sentiment of each topic
May_sentiment <- May_tidytext %>%
inner_join(bing) %>%
count(Topics, index = Answer %/% 80, sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative)
#plot each topic's sentiment
ggplot(May_sentiment, aes(x = factor(May_sentiment$Topics, levels = unique(May_df$Topics)), y = sentiment,
fill = factor(May_sentiment$Topics, levels = unique(May_df$Topics)))) +
geom_bar(stat = "identity", show.legend = FALSE) +
scale_fill_brewer(palette = "Spectral") +
guides(fill = "none") +
theme_rob() +
xlab("Topic") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Total Answers Sentiment") +
ggtitle("Care-Fee Attitude",
"The Sentiment (+/-) Associated With May's Answers")

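The scoring behind this plot is simple: each word that appears in the lexicon counts as one positive or one negative, and a topic's score is positives minus negatives. A minimal base R sketch of that logic on made-up data; toy_lexicon is a hypothetical five-word stand-in for the bing lexicon, and none of these names come from the original script:

```r
# Hypothetical mini-lexicon in the same shape as get_sentiments("bing")
toy_lexicon <- data.frame(
  word = c("challenge", "difficult", "strong", "stable", "fair"),
  sentiment = c("negative", "negative", "positive", "positive", "positive"),
  stringsAsFactors = FALSE)

# Toy one-word-per-row data, the shape unnest_tokens() produces
toy_words <- data.frame(
  Topics = c("Social Care", "Social Care", "Social Care",
             "Brexit", "Brexit", "Brexit"),
  word = c("challenge", "difficult", "fair", "strong", "stable", "deal"),
  stringsAsFactors = FALSE)

# Inner join: words missing from the lexicon ("deal") drop out
scored <- merge(toy_words, toy_lexicon, by = "word")

# Cross-tabulate topic by sentiment, then take positive minus negative
counts <- table(scored$Topics, scored$sentiment)
net_sentiment <- counts[, "positive"] - counts[, "negative"]
```

Note that in the code above, Answer %/% 80 is 0 for all 66 answers, so the index column does not split the topics any further. The scores are also raw counts, so longer topics have more room to swing in either direction.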
Finally, much of the Conservative election campaign thus far has focused on the opposition, rather than on their own policies. The meme of a “strong and stable” government only works in the context of a “coalition of chaos” which could potentially replace it. During the interview, the Twitter feed of CCHQ posted a number of tweets asking voters to compare May with the opposition leader, Jeremy Corbyn.
This strategy was also borne out in the interview itself. The Prime Minister only mentioned her own party 3 times, a quarter of the number of mentions the Labour Party received.
#find the mentions of Labour vs. Conservatives
Party_mentions <- data.frame(Parties = c("Labour", "Conservatives"),
Values = c(sum(str_count(May_df$No_punc, "Labour")),
sum(str_count(May_df$No_punc, "Conservative"))))
#get the shade of red Labour use
lab_colour <- "#DC241F"
#plot the mentions in a bar graph
ggplot(Party_mentions, aes(x = Parties, y = Values, fill = Parties)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c(con_colour, lab_colour)) +
scale_y_continuous(breaks = seq(0, 12, by = 3)) +
guides(fill = "none") +
theme_rob() +
xlab("Party Mentioned") +
ylab("Mentions") +
ggtitle("Labouring the Point",
"Number of Times May Mentioned Her Own Party vs. the Opposition")

(Jeremy) “Corbyn” was also mentioned a further 9 times over the course of the interview.
This perhaps isn’t so surprising given how heavily the Conservative campaign has focused on Theresa May herself. At her manifesto launch last week, she had to reassure the press that “There is no May-ism”. Many events have had “May”, not “Conservatives”, on the backdrop, and the manifesto itself had a remarkable number of singular pronouns.
However, following the denial of May-ism last week, perhaps this is shifting. In the interview with Andrew Neil, plural pronouns (“We”, “Our”) were used much more than the singular (“I”, “My”).
#find the mentions of Corbyn, or how Theresa May refers to herself vs. her party
Pronoun_mentions <- data.frame(Person = c("Corbyn", "I", "We", "My", "Our"),
Values = c(sum(str_count(May_df$No_punc, "Corbyn")),
sum(str_count(May_df$No_punc, " I ")),
sum(str_count(tolower(May_df$No_punc), " we ")),
sum(str_count(tolower(May_df$No_punc), " my ")),
sum(str_count(tolower(May_df$No_punc), " our "))))
#plot this
ggplot(Pronoun_mentions, aes(x = factor(Pronoun_mentions$Person, levels = unique(Pronoun_mentions$Person)), y = Values,
fill = factor(Pronoun_mentions$Person, levels = unique(Pronoun_mentions$Person)))) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c(lab_colour, rep(con_colour, 4))) +
guides(fill = "none") +
theme_rob() +
xlab("Person Mentioned") +
ylab("Mentions") +
ggtitle("For the Many, Not the Few",
"Number of Times May Used First Person Plural vs. Singular Pronouns")

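One caveat on the counts above: space-padded patterns such as " I " and " we " only match a pronoun sitting between two spaces, so one that opens or closes an answer is missed. A sketch of a word-boundary alternative in base R, on made-up answers (not the real transcript); count_word is a hypothetical helper, not part of the original script, and it would likely give somewhat different totals from those plotted:

```r
# Toy answers standing in for May_df$No_punc (not the real transcript)
answers <- c("I think we have been clear",
             "We will deliver and I will lead",
             "Our plan is my plan")

# Count whole-word matches; \\b marks a word boundary, so matches at the
# start or end of an answer are included
count_word <- function(x, word, ignore_case = TRUE) {
  pattern <- paste0("\\b", word, "\\b")
  hits <- gregexpr(pattern, x, ignore.case = ignore_case, perl = TRUE)
  sum(lengths(regmatches(x, hits)))
}

# "I" is matched case-sensitively so that a lowercase "i" is never counted
pronouns <- c(I = count_word(answers, "I", ignore_case = FALSE),
              We = count_word(answers, "we"),
              My = count_word(answers, "my"),
              Our = count_word(answers, "our"))
```

On these toy answers the padded patterns would find one "I" and one "we", while the word-boundary version finds two of each.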
Because everyone loves wordclouds, here’s a wordcloud of all Theresa May’s answers:
#I don't think they look good, but whatever
May_tidytext %>%
count(word) %>%
with(wordcloud(word, n, max.words = 100, colors = con_colour))

Cheers!