Sentiment analysis of interactions with the UK Environment Agency on Twitter

Synopsis

This study investigates sentiment and substance of interactions with the UK Environment Agency (EA) on Twitter. The study builds on previous small-n qualitative research finding that both London residents and environmental charity workers express low levels of trust in the ability of the EA to effectively address urban water quality issues. Further findings from narrative interviews suggest that EA responsiveness differs by location and issue, with flood risk management taking precedence over urban stream pollution incidents. Some respondents spoke in stark terms of the lacklustre response they had received from the EA to repeated attempts at bringing urban water pollution to the attention of the organisation. Multiple research participants even expressed the sentiment that the EA had all but “given up” on its responsibility to protect and improve their local water environment.

Consequently, this study explores how members of the public interact with the EA on Twitter: which issues are Twitter users likely to bring up in their interactions with the EA? How are their interactions emotionally charged and expressive of trust or distrust? Where do these interactions originate?

Based on sentiment analysis of approximately 18,000 tweets, we identify three distinct sentiment patterns for tweets on the issues of conservation, flooding and pollution. These categories are characterised by tweets expressing high levels of positive emotions and trust (conservation), high levels of positivity and trust modulated by fear and anticipation (flooding), and high levels of negative emotion and disgust and a lack of trust (pollution). We discuss a possible interpretation of these findings as well as limitations.

Sample

Our sample was constructed from tweets mentioning the six regional EA accounts and the primary national account. Data was mined using the rtweet package in two phases. Firstly, tweets from the 1st of May 2020 to the 22nd of November 2020 were mined using the search_fullarchive function (paid Twitter API functionality). This mine provided ~14,000 tweets. Secondly, a weekly scrape was set up from the 23rd of November using the search_tweets function. After two mines, this collected ~2,000 tweets. In total we collected 18,784 tweets, ranging from the 1st of May 2020 to the 7th of December 2020.

library(knitr)
setwd("Environmental-Twitter-Analaysis-master/")
sample <- read.csv("Hyp2-Sample.csv")
kable(sample, caption = "Accounts used for sample (followers/following correct as of 11/12/2020)")

Accounts used for sample (followers/following correct as of 11/12/2020)
Twitter.Handle	Region	Following	Followers
EnvAgency	National	3651	535100
EnvAgencyYNE	Yorkshire & North East	1824	25400
EnvAgencyNW	North West	3213	23900
EnvAgencySE	South East	779	25200
EnvAgencySW	South West	3367	25200
envagencymids	Midlands	1535	21600
EnvAgencyAnglia	East Anglia	1429	11000

Methodology

To understand the interactions between the public and the EA quantitatively, we conducted a sentiment analysis on our dataset of tweets. We selected tweets mentioning EA accounts because the pragmatics governing speech on the Twitter platform suggest that mentioning a user name in a tweet should be interpreted as implicating that user in a conversation. For instance, mentioning a user name can mean addressing a user directly with a question or remark, or tweeting to followers in such a way as to bring the tweet to the mentioned user’s attention. In both cases the mention invites a reply from the mentioned user.

The following libraries were used in order to build our sentiment model.

library(sentimentr)
library(SentimentAnalysis)
library(RSQLite)
library(DBI)
library(tm)
library(ggplot2)
library(quanteda) 
library(dplyr)
library(patchwork)
library(tmap)
library(remotes)
library(tidytext)
library(textdata)
get_sentiments("nrc")

A cron job was set up to mine tweets and store them in an SQLite database (accessed through the RSQLite package). This data was then loaded as a data frame and subsetted to include only the text of each tweet.

setwd("Twitter-Mining/environment_Agency_Analysis/")
EnvAgencyTweetsDB <- dbConnect(RSQLite::SQLite(), "EnvAgency_Tweet_DB.db")
envAgencyMentions <- dbGetQuery(EnvAgencyTweetsDB, 'SELECT * FROM EnvAgency_Mentions')
mergedTweetsSub <- subset(envAgencyMentions,
                          select = c(
                            'text'))
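
The storage step can be sketched as follows. This is a minimal illustration rather than the cron script used in the study: the in-memory database, the two toy batches, the helper append_new and the de-duplication on status_id are all assumptions made for the example. Only the table name EnvAgency_Mentions is taken from the query above.

```r
library(DBI)

# In-memory database standing in for the on-disk file (assumption for illustration)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Two hypothetical weekly batches; the second repeats one tweet from the first
batch1 <- data.frame(status_id = c("1", "2"),
                     text = c("flood warning", "river looks great"),
                     stringsAsFactors = FALSE)
batch2 <- data.frame(status_id = c("2", "3"),
                     text = c("river looks great", "sewage again"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: append a batch, skipping tweets whose id is already stored
append_new <- function(con, batch, table = "EnvAgency_Mentions") {
  if (dbExistsTable(con, table)) {
    seen <- dbGetQuery(con, paste0("SELECT status_id FROM ", table))$status_id
    batch <- batch[!batch$status_id %in% seen, , drop = FALSE]
  }
  dbWriteTable(con, table, batch, append = TRUE)
}

append_new(con, batch1)
append_new(con, batch2)

# Read everything back as a data frame, as in the query above
stored <- dbGetQuery(con, "SELECT * FROM EnvAgency_Mentions")
print(nrow(stored))
dbDisconnect(con)
```

Skipping already stored ids matters if consecutive weekly scrapes overlap, since overlapping tweets would otherwise be counted twice in the sentiment totals.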

We then calculated the average sentiment score of each tweet (averaged over its sentences) to understand the distribution of our dataset. The resulting histogram suggests that the scores are approximately normally distributed.

# get average sentiment score for each tweet (averaged over its sentences)
sentiment_support <- sentiment_by(get_sentences(envAgencyMentions$text))

#plot the score distribution
ggplot(sentiment_support,aes(ave_sentiment)) +
  geom_histogram(bins = 50) +
  labs(title = "Sentiment Histogram of Tweets", x = "Sentiment Score") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, face = "bold",hjust = 0.5)) +
  geom_vline(xintercept = 0, color = "red")
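
A histogram gives only a visual impression of normality. As a hedged sketch that is not part of the original analysis, a Shapiro-Wilk test on a random subsample is one way to probe the claim further. The scores below are simulated stand-ins for sentiment_support$ave_sentiment, and the subsample size of 500 is an arbitrary choice.

```r
set.seed(42)

# Simulated stand-in for sentiment_support$ave_sentiment (assumption, not study data)
scores <- rnorm(18000, mean = 0.05, sd = 0.2)

# shapiro.test() accepts between 3 and 5000 values, so test a random subsample
subsample <- sample(scores, 500)
sw <- shapiro.test(subsample)

# A small p-value would indicate a departure from normality
print(sw$p.value)
```

Calling qqnorm(subsample) followed by qqline(subsample) would give a visual counterpart to the test.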

To explore the emotion behind the tweets directed at the EA we used the NRC Word-Emotion Association Lexicon (Mohammad and Turney 2013). Each word in the tweets was matched against the lexicon's categories (the emotions trust, sadness, joy, fear, disgust, anticipation, surprise and anger, plus positive and negative sentiment); the matches were then counted per category and expressed as a percentage of all matches in our dataset.

#Tokenizing the tweet text into one word per row
token = data.frame(text=mergedTweetsSub, stringsAsFactors = FALSE) %>% unnest_tokens(word, text)

#Matching sentiment words from the 'NRC' sentiment lexicon
senti = inner_join(token, get_sentiments("nrc"), by = "word") %>%
  count(sentiment)
senti$percent = (senti$n/sum(senti$n))*100
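
To make the arithmetic concrete, the same match-count-percentage steps can be reproduced in base R on a toy example. Both the two tweets and the five-row lexicon below are invented for illustration; the study itself uses the full NRC lexicon via tidytext.

```r
# Toy tweets and a toy lexicon (both hypothetical; the study uses the NRC lexicon)
tweets <- c("Great work clearing the river, thank you",
            "Disgusting sewage in the river again")
toy_lexicon <- data.frame(
  word      = c("great", "thank", "disgusting", "sewage", "sewage"),
  sentiment = c("positive", "positive", "disgust", "disgust", "negative"),
  stringsAsFactors = FALSE
)

# Lower-case the text, strip punctuation and split on whitespace
words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", tweets)), "\\s+"))
token <- data.frame(word = words, stringsAsFactors = FALSE)

# Keep only words found in the lexicon; a word listed under two categories counts twice
matched <- merge(token, toy_lexicon, by = "word")

# Count matches per category and convert to a percentage of all matches
senti_toy <- as.data.frame(table(sentiment = matched$sentiment), responseName = "n")
senti_toy$percent <- senti_toy$n / sum(senti_toy$n) * 100
print(senti_toy)
```

Note that the percentages are shares of lexicon matches, not shares of tweets: words absent from the lexicon do not enter the denominator.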

While the overall sentiment gives a general overview of how users interact with the EA, we wanted to explore the sentiment regarding a number of frequent topics. To do so, we created a function that extracts all tweets containing a given word stem (topic) from the dataset and transforms them into their own corpus. The NRC Word-Emotion Association Lexicon was then applied to that corpus to understand the emotion expressed in tweets on that topic.

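sentimentPerWord <- function(word) {
  # Build a corpus from the tweet text and keep only tweets containing the search term
  corpus = corpus(mergedTweetsSub)
  corpus = corpus[grepl(word, as.character(corpus))]
  # Tokenise the matching tweets into one word per row
  token_word = data.frame(text=as.character(corpus), stringsAsFactors = FALSE) %>% unnest_tokens(word, text)
  # Count NRC lexicon matches per category and convert to percentages
  senti_word = inner_join(token_word, get_sentiments("nrc"), by = "word") %>%
    count(sentiment)
  senti_word$percent = (senti_word$n/sum(senti_word$n))*100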
  ggplot(senti_word, aes(sentiment, percent)) +
    geom_bar(aes(fill = sentiment), position = 'dodge', stat = 'identity')+
    ggtitle(paste(word, sep = " ","word sentiment \nfrom Environment Agency Mentions"))+
    coord_flip() +
    theme(legend.position = 'none', plot.title = element_text(size=18, face = 'bold'),
          axis.text=element_text(size=16),
          axis.title=element_text(size=14,face="bold"))

}
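
Because the function selects tweets with grepl, the search term acts as a substring pattern, so a stem such as 'pollut' matches "pollution", "polluted" and "polluting". The sketch below, on three invented tweets, illustrates this and also shows that the match is case-sensitive by default. Whether the stored tweet text was lower-cased before analysis is not stated here, so the second call is shown only as a possible alternative.

```r
# Three hypothetical tweets for illustration
tweets <- c("Pollution in the brook again",
            "who polluted the canal?",
            "lovely walk by the river")

# Default matching is case-sensitive: "Pollution" is missed
case_sensitive <- grepl("pollut", tweets)
print(case_sensitive)

# Ignoring case also catches capitalised forms
case_insensitive <- grepl("pollut", tweets, ignore.case = TRUE)
print(case_insensitive)
```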

Results

The following results are from the sample of approximately 18,000 tweets. Overall sentiment of tweets mentioning the EA is characterised by terms associated with positive emotions clearly outweighing those linked to negative emotions. Emotionally charged tweets also express high levels of trust, followed by anticipation and, to a lesser degree, fear.

#Plotting the sentiment summary
ggplot(senti, aes(sentiment, percent)) +
  geom_bar(aes(fill = sentiment), position = 'dodge', stat = 'identity')+
  ggtitle("Overall Sentiment of Environment Agency mentions \n on Twitter (based on lexicon: 'NRC')")+
  coord_flip()

With slight variation, this pattern repeats across the terms fish, nature, river, water and wildlife: positive emotions outweigh negative emotions and tweets display high levels of trust. An interesting outlier is the term flood: whereas positive emotions dramatically outweigh negative emotions and sentiment analysis detects high levels of trust, tweets also frequently contain terms associated with fear and anticipation. This could reflect the character of flooding as a highly disruptive and potentially hazardous event as well as the use of Twitter by the EA to issue flood risk warnings.

sentimentPerWord(word = 'flood')

In contrast, sentiment for tweets containing the terms pollut, sewag and wast is characterised by negative emotion dramatically outweighing positive emotion. Markedly lower levels of trust are outweighed by a high level of disgust, followed to a lesser degree by fear and sadness.

sentimentPerWord(word = 'pollut')

sentimentPerWord(word = 'sewag')

sentimentPerWord(word = 'wast')

Discussion

We group the issues about which Twitter users are likely to interact with the EA into three broad categories with distinct sentiment patterns:

A) Conservation (fish, wildlif, river, natur, water)
B) Flooding (flood)
C) Pollution (sewag, wast, pollut)

A comparison of the three distinct sentiment patterns found in the results suggests that tweets in category A) tend to be positive and display high levels of trust. Tweets in category B) tend to be positive with high levels of trust, yet also show elevated levels of fear and anticipation. In contrast, tweets in category C) tend to be negative, with low levels of trust and elevated levels of disgust.
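
The grouping of search terms into categories can be expressed as a small lookup. The sketch below is illustrative only: the study ran sentimentPerWord on each term separately, whereas here the stems of each category are pooled into one pattern, and the three tweets are invented.

```r
# Search stems per category, as used in the sentimentPerWord calls
categories <- list(
  conservation = c("fish", "wildlif", "river", "natur", "water"),
  flooding     = c("flood"),
  pollution    = c("sewag", "wast", "pollut")
)

# Three hypothetical tweets for illustration
tweets <- c("flood warning for the river",
            "raw sewage by the weir",
            "saw a kingfisher today")

# One logical column per category: does the tweet contain any of its stems?
in_category <- sapply(categories, function(stems) {
  grepl(paste(stems, collapse = "|"), tweets)
})
print(in_category)
```

Two properties of substring matching are visible here: a tweet can fall into more than one category (the first tweet matches both 'flood' and 'river'), and a stem can match inside an unrelated word ('fish' in "kingfisher").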

Overall, we find that a majority of the interactions on the general issues of ‘water’ and ‘river’ as well as terms related to fish and wildlife conservation appear positive and confident. We find the opposite for terms related to pollution. However, without systematic close reading of a sample of tweets for each category it is difficult to ascertain with any confidence the extent to which the sentiment patterns in each category relate to the substance of the issue or the EA’s perceived competence in handling the issue.

For instance, the co-occurrence of high levels of fear and trust in tweets about flooding could indicate the perceived severity of flooding as a hazard while simultaneously expressing confidence in the agency’s competence at handling flood risk. Similarly, the co-occurrence of low levels of trust and high levels of disgust in tweets about pollution could indicate the revulsion provoked by an experience of pollution as a transgressive act of contamination, together with disappointment in the agency’s response to such incidents. While this interpretation would be consonant with small-n qualitative evidence, we caution that, absent further investigation, other possible explanations remain equally plausible. For instance, low levels of trust in the category pollution could also result from user sentiment towards pollution as anti-social, i.e. rule-breaking, behaviour.

sentimentPerWord(word = 'wildlif')

sentimentPerWord(word = 'water')

sentimentPerWord(word = 'natur')

sentimentPerWord(word = 'environ')

sentimentPerWord(word = 'river')

sentimentPerWord(word = 'fish')

As a provisional attempt at validating the model through qualitative analysis, we manually coded 100 random tweets from the sample containing the key term ‘flood’, using a simple coding scheme comprising three codes for the presence of positive emotion, negative emotion, and trust. We subsequently subjected the tweets to close reading. Overall, validation confirmed both the proportionality of the computational estimate of sentiment distribution and our interpretation that tweets expressed confidence in flood risk management. However, it is important to note that users expressed confidence in a variety of actors, such as local authorities, rather than in the EA alone. Moreover, there were notable exceptions in the sample which had not been picked up by the sentiment analysis model. For instance, a user replied to a BBC report on the reintroduction of beavers in Exmoor National Park with a tweet remarking that beavers would “complete more positive and practical flood prevention work in a year than the EnvAgency has done in a decade and at a fraction of the cost”. Whereas the model ranked this statement as positive, it can hardly be read as a vote of confidence in the flood risk management competence of the EA. This example is notable because there is currently a controversy about the role of beavers in natural flood management in the UK, a subtlety which would be missed by computational analysis alone. Nevertheless, positive and confident statements dominated the manually coded sample for the category flooding.
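
Drawing the coding sample can be sketched as below. This is an assumption about how such a sample might be drawn reproducibly, not a record of the procedure used: the seed, the toy data frame and the empty coding columns are all hypothetical.

```r
set.seed(2020)  # fixed seed so the same 100 tweets can be drawn again

# Hypothetical stand-in for the mined tweets
tweets <- data.frame(
  text = c(paste("flood alert number", 1:150),
           paste("river update number", 1:150)),
  stringsAsFactors = FALSE
)

# Restrict to tweets containing the key term, then draw 100 without replacement
flood_tweets  <- tweets[grepl("flood", tweets$text), , drop = FALSE]
coding_sample <- flood_tweets[sample(nrow(flood_tweets), 100), , drop = FALSE]

# Empty columns for the three manual codes
coding_sample$positive <- NA
coding_sample$negative <- NA
coding_sample$trust    <- NA

print(dim(coding_sample))
```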

We repeated manual coding for the term ‘sewag’. Of 100 tweets, only 12 expressed trust. The majority of tweets either reported incidents of sewage pollution or expressed various forms of disappointment in the EA’s management of sewage pollution. The sizable number of complaints about perceived mismanagement of sewage pollution which did not specify a location or incident likely responded to prominent media reports, published and broadcast over the previous year, of water companies’ practice of spilling sewage into rivers. In this sample, tweets that were highly critical of EA performance dominated, frequently deploying emotive language and mentioning EA leadership by name. Users took issue with the agency’s perceived lack of monitoring and enforcement of water quality vis-à-vis the private sector, which they held responsible for polluting water bodies. By choosing to mention the EA in such a way as to bring these highly critical tweets to the attention of agency communications staff, users deploy a technical affordance of the social media platform in order to actively perform public opinion vis-à-vis a publicly accountable statutory body (cf. Marres, 2017: 156). In addition, some of the complaints were directed at water companies or housing developers instead of the EA. Others expressed more ambiguous sentiments, thanking the EA for its work yet inquiring about its measures against sewage pollution. In this sample, too, we found sarcastic and ironic tweets that are likely to be misclassified by computational methods.

Finally, of the approximately 18,000 tweets we mined, we found that 900 had geoinformation that would allow plotting their location on a map.

library(sf)
tweets_location <- filter(envAgencyMentions, !is.na(envAgencyMentions$lat))
tweets_location_sf <- st_as_sf(tweets_location, coords = c("lng", "lat"), crs = 4326)

#Analyze sentiment
sentiment <- analyzeSentiment(tweets_location_sf$text)
#Extract dictionary-based sentiment according to the QDAP dictionary
sentiment2 <- sentiment$SentimentQDAP
#View sentiment direction (i.e. positive, neutral and negative)
tweets_location_sf$sentiment <- convertToDirection(sentiment$SentimentQDAP)

tmap_mode("view")
tm_basemap(leaflet::providers$Stamen.TonerLite) +
    tm_shape(tweets_location_sf) +
    tm_dots(col = "sentiment", palette="RdYlBu",popup.vars = c("text")) +
    tm_layout(
        main.title.size = 0.7 ,
        legend.position = c("right", "bottom"),
        legend.title.size = 0.8
    )

However, the sample size is too small at this point to discern any spatial patterns.

Hyp 2

Helge Peters and Nathanael Sheehan