UK and US: Speaking with One Voice in Politics?

A Comparative Study on the UK and US Political Discourse from the Perspective of Digital Humanities

Author

ZHANG WAER, School of English and International Studies
Beijing Foreign Studies University

1. Introduction

It is no secret that the United States and the United Kingdom have long maintained a special relationship, not only culturally, bound by the same language and a shared history, but also politically. Winston Churchill first coined the term “special relationship” to describe the close alliance and cooperation between the two countries as early as 1944. Since then, the term has taken on a much broader sense, referring to their shared values and common positions as “the democracies” in the face of global events. This closeness is also frequently invoked in the political narratives of leaders in both countries. George W. Bush anchored the relationship “on the surest foundations”: their “deep and abiding love of liberty”.

Nevertheless, this special relationship does not translate into similarities in their domestic politics. The two countries diverge significantly in the systems under which their governments are formed, the ways their political leaders are elected, and the goals people expect their leaders to deliver. These key differences could have a considerable impact on what political leaders or candidates choose to pitch in their electoral speeches. Intuitively, one may assume that US presidents are more inclined to play on identity issues, competition with China, employment rates and the like, while British Prime Ministers prioritise social welfare, including housing, jobs and the NHS in particular, and more recently Brexit and its aftermath. Yet this assumption usually rests on personal impressions and second-hand news rather than on impartial facts and statistics. Digital humanities tools now enable us to look at this issue from a different angle. By conducting fact-based statistical research into political discourse, we may find firm grounds to support or revise our guesses. Therefore, the present research uses digital tools to explore the following questions:

Are there any differences in the topics that constitute the core of political discourse in the UK and the US? If so, what are their preferences?
What are the general sentiments in their political narratives respectively?

By juxtaposing the political discourse of the two countries, the study also intends to show that, although connected by close cultural ties, the UK and the US may not share the same universal values as they assume.

2. Data & Methods

2.1 Data

This research uses two data sources, The American Presidency Project ¹ and the House of Commons Library ², to represent the political discourse in the two countries. For the US, the author used the transcripts of Presidential Candidate Debates since 1960 (at four-year intervals) from The American Presidency Project. These debates are pivotal moments in the US electoral process where candidates present their policies, visions, and critiques of their opponents. They are highly publicised events that capture what candidates believe are the key issues resonating with voters, making them excellent sources for understanding the core themes of US political discourse.

Yet on the UK side, it is not easy to find a perfect counterpart because of the differences between the two electoral systems. The UK Prime Minister is not directly elected by the people but emerges from a representative process: voters elect Members of Parliament (MPs) for their constituencies, and the leader of the party that commands a majority in the House of Commons becomes Prime Minister. This means that Prime Ministers do not usually give national-level campaign speeches or take part in debates in the way US presidential candidates do. To maximise comparability, the author used the maiden speeches delivered in the House of Commons in the same years, taken from the House of Commons Library. Although not directly tied to election campaigns, these speeches, as the first formal addresses by newly elected MPs, often outline their political priorities, which can reflect the broader political climate and the issues they think are important to their electorate, thus providing insights into UK political discourse.

In addition, the temporal range and alignment of the two datasets make it possible to track how political sentiments have evolved over time in both countries, which could be influenced by global events, domestic policy shifts, or leaders’ personalities.

2.2 Methods

Digital humanities tools allow us to carry out research that is broader in scale and more firmly grounded in data than close reading alone. This research leverages two widely used techniques, topic modelling and sentiment analysis, to reveal the core topics of US and UK political discourse and the evolution of political sentiments in the two countries over the past 60 years.

2.2.1 Topic Modelling: the cores of US and UK political discourse

Topic modelling is essentially a text mining technique that applies unsupervised learning to large sets of texts to produce a summary set of terms, derived from the documents, that represent the collection’s primary topics. Multiple algorithms have been developed for topic modelling. This study adopts the Latent Dirichlet Allocation (LDA) approach, which is “a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics” ³. This section only shows the code for processing the American data, as most of the UK code is identical; only the configurations that differ are noted.
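To make the “three-level” description more concrete, the following is a minimal sketch in base R of the generative story that LDA assumes. It is not part of the study's pipeline: the vocabulary, the number of topics, the prior values and the helper names (rdirichlet1, generate_document) are all hypothetical and chosen only for illustration.

```r
# Toy simulation of LDA's generative story (all values are hypothetical).
set.seed(1)

vocab <- c("tax", "jobs", "war", "troops", "school", "health")
k     <- 2    # number of topics
alpha <- 0.5  # Dirichlet prior on document-topic mixtures
eta   <- 0.5  # Dirichlet prior on topic-word distributions

# Draw one sample from a symmetric Dirichlet via normalised gamma variates
rdirichlet1 <- function(n, conc) {
  g <- rgamma(n, shape = conc)
  g / sum(g)
}

# Level 1 (corpus): each topic is a probability distribution over the vocabulary
beta <- t(replicate(k, rdirichlet1(length(vocab), eta)))
colnames(beta) <- vocab

# Level 2 (document): each document is a mixture over topics
generate_document <- function(n_words) {
  theta <- rdirichlet1(k, alpha)
  # Level 3 (word): pick a topic, then pick a word from that topic
  topics <- sample(k, n_words, replace = TRUE, prob = theta)
  words  <- vapply(
    topics,
    function(z) sample(vocab, 1, prob = beta[z, ]),
    character(1)
  )
  list(theta = theta, words = words)
}

doc <- generate_document(20)
round(doc$theta, 2)  # topic proportions of this document
table(doc$words)     # resulting bag of words
```

Fitting an LDA model runs this story in reverse: given only the bags of words, it estimates the topic-word probabilities (beta) and the per-document topic mixtures that most plausibly produced them.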

Setup and Text Preprocessing

This section loads the required packages in RStudio and preprocesses the texts to be analysed.

# Load required packages
library(pacman)
p_load(    
  tidyverse,
  tidytext,        
  topicmodels,    # For LDA
  LDAvis,         # For interactive visualization
  stringr,         
  stopwords      
)

# Load the dataset 
us_speeches <- read_csv("~/Desktop/Digital Humanities/term project/data/us/6.2 Presidential Candidates Debates _1960-2024_ _ The American Presidency Project.csv")

Rows: 175 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Text
dbl (1): Year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Load the stopwords database
data("stop_words")

# Preprocess the texts
us_tokenised_speeches <- us_speeches %>%
  # Create unique document IDs for easy tracing
  mutate(doc_id = row_number()) %>%
  # Tokenise the texts with n-grams (1-3 words)
  unnest_tokens(
    word, "Text",
    token = "ngrams", 
    n_min = 1, n = 3,
    stopwords = c(
      stopwords::stopwords("en"),
      stopwords::stopwords(source = "smart"),"people","senator","mr","president","applause","america","bush","trump","biden","make","obama","state","secretary","sanders","country","government","clinton","governor","american","percent","lehrer","back","things","states","year","governor romney","romney","hillary","mccain","years","edwards","donald","christie","keyes","bradley","paul","gingrich","united","klobuchar","rubio","wallace","put","vice","giuliani","huckabee","cooper","warren","harris","americans","work","lot","made","smith","nixon","kennedy","tapper","shadel","question","time","howe","hoge","mondale","carter","moderator","drew","gannon","valeriani","bush","mashek","ferraro","vanocur","good","ryerson","gore","shaw","kerry","brokaw","sharpton","dean","tom","ifill","cheney","gibson","brownback","tancredo","hunter","gilmore","hume","thompson","goler","sen","blitzer","vaughn","gov","ms","hall","rep","russert","williams","gravel","richardson","smiley","kucinich","dodd","olbermann","stephanopoulos","laughter","crosstalk","mccarthy","video","palin","schieffer","demint","bachmann","speaker","representative","ron","mitt","michele","ramaswamy","joe","plan","cruz","mayor","buttigieg","working","care","thing","talk","fact","day","give","cain","world","republican","today","campaign","haley","policy","john","dole","raddatz","issue","bauer","desantis","forbes","washington","santorum","perry","congressman","part","kind","system","important","big","congress","debate","great","cut","bentsen","pence","quayle","kaine","uh","baier","talking","king","ford","administration","problem","pay","middle","federal","number","program","kasich","florina","ryan","welker","reagan","party","scott","move","bring","end","start","making","deal","wall","o'malley","graham","tonight","todd","stage","respond","issues","point","programs","carson","holt","coming","win","kelly","candidates","barack","obama","republicans","fiorina","street","huntsman","hatch"
    ) # This is a customised stopwords dictionary collated by the author
  ) %>%
  anti_join(stop_words) %>% #Re-executing stopwords removal due to unexpected stopwords in results
  # Removing 'null' values in the data frame
  filter(!is.na(word)&!is.null(word))

Joining with `by = join_by(word)`

Explanations:

Stopwords removal was repeated because stopwords unexpectedly appeared in the processed result. The number of observations did drop after re-executing stopwords removal using anti_join();
The customised stopwords dictionary was collated by the author through an iterative process of executing the code and examining the results. The customised dictionary for the UK topic modelling is as follows:

stopwords = c(
      stopwords::stopwords("en"),
      stopwords::stopwords(source = "smart"),"people","senator","mr","president","applause","america","bush","trump","biden","make","obama","state","secretary","sanders","country","government","clinton","governor","american","percent","lehrer","back","things","states","year","governor romney","romney","hillary","mccain","years","edwards","donald","christie","keyes","bradley","paul","gingrich","united","klobuchar","rubio","wallace","put","vice","giuliani","huckabee","cooper","warren","harris","americans","work","lot","made","smith","nixon","kennedy","tapper","shadel","question","time","howe","hoge","mondale","carter","moderator","drew","gannon","valeriani","bush","mashek","ferraro","vanocur","good","ryerson","gore","shaw","kerry","brokaw","sharpton","dean","tom","ifill","cheney","gibson","brownback","tancredo","hunter","gilmore","hume","thompson","goler","sen","blitzer","vaughn","gov","ms","hall","rep","russert","williams","gravel","richardson","smiley","kucinich","dodd","olbermann","stephanopoulos","laughter","crosstalk","mccarthy","video","palin","schieffer","demint","bachmann","speaker","representative","ron","mitt","michele","ramaswamy","joe","plan","cruz","mayor","buttigieg","working","care","thing","talk","fact","day","give","cain","world","republican","today","campaign","haley","policy","john","dole","raddatz","issue","bauer","desantis","forbes","washington","santorum","perry","congressman","part","kind","system","important","big","congress","debate","great","cut","bentsen","pence","quayle","kaine","uh","baier","talking","king","ford","administration","problem","pay","middle","federal","number","program","kasich","florina","ryan","welker","reagan","party","scott","move","bring","end","start","making","deal","wall","o'malley","graham","tonight","todd","stage","respond","issues","point","programs","carson","holt","coming","win","kelly","candidates","barack","obama","republicans","fiorina","street","huntsman","hatch","deputy","speaker","hon","leicester","windsor"
,"speech","maiden","hon members","member","members","place","birmingham","tottenham","hodge","hill","hodge hill","glasgow","parliament","friend","service","portsmouth","commitment","rotherham","sir","council","cardiff","sheffield","column","location","cent","hartlepool","area","predecessor","north","hope","support","election","elected","constituency","constituents","problems","high","showing","show"
    )

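Because many missing (NA) values unexpectedly appeared in the processed data frame, the author ran the filter(!is.na(word)&!is.null(word)) line to remove them. The !is.na(word) condition does the actual work; is.null() returns FALSE for any existing column, so the second condition has no effect.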
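Since the UK dictionary repeats the US dictionary and then appends UK-specific terms, the shared part could be stored once and extended. The sketch below illustrates this in base R; the object names are hypothetical and the word lists are shortened to a handful of entries taken from the dictionaries above.

```r
# Sketch: build the custom stopword lists once and reuse them.
# Object names are hypothetical; the word lists are shortened for illustration.
shared_custom <- c("people", "senator", "mr", "president",
                   "applause", "bush", "obama", "bush")
uk_extra      <- c("hon", "maiden", "speech", "constituency", "speaker")

# unique() drops accidental duplicates such as the repeated "bush"
us_custom_stopwords <- unique(shared_custom)
uk_custom_stopwords <- unique(c(shared_custom, uk_extra))

# Remove stopwords from a tokenised vector and drop missing tokens
tokens <- c("mr", "speaker", NA, "education", "hon", "housing")
keep   <- !is.na(tokens) & !(tokens %in% uk_custom_stopwords)
tokens[keep]  # "education" "housing"

# is.null() tests the whole object, not its elements, so it is FALSE here
is.null(tokens)
```

The resulting vectors could then be passed to the stopwords argument in place of the long inline lists, which keeps the US and UK configurations in sync.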

Create the Document-Term Matrix and filter rare terms (n=1)

us_dtm <- us_tokenised_speeches %>% 
  count(doc_id, word) %>%
  filter(n > 1) %>%
  cast_dtm(doc_id, word, n)

us_dtm

<<DocumentTermMatrix (documents: 175, terms: 50040)>>
Non-/sparse entries: 190257/8566743
Sparsity           : 98%
Maximal term length: 39
Weighting          : term frequency (tf)
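For readers unfamiliar with the structure, a document-term matrix has one row per document, one column per term, and the term count in each cell. The following base-R sketch reproduces the count, filter and cast steps on toy data; the tokens are hypothetical and the functions are base-R stand-ins rather than the tidytext calls used above.

```r
# Minimal base-R illustration of a document-term matrix (toy data, hypothetical).
tokens <- data.frame(
  doc_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  word   = c("tax", "tax", "jobs", "war", "war", "war", "tax", "jobs")
)

# Count each (document, word) pair; similar in spirit to count(doc_id, word)
counts <- as.data.frame(table(tokens), stringsAsFactors = FALSE)
names(counts)[3] <- "n"

# Keep only terms that occur more than once within a document
counts <- counts[counts$n > 1, ]

# Cast to a documents x terms matrix; absent pairs become 0 (the "sparse" cells)
dtm <- xtabs(n ~ doc_id + word, data = counts)
dtm

# Share of zero cells, comparable to the "Sparsity" line in the printed summary
mean(dtm == 0)
```

In this toy case "jobs" disappears from the matrix entirely because it never occurs more than once in any single document, which is the effect the n > 1 filter has on rare terms.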

Running the LDA Model

The author tried several values of k and settled on k = 8 for the US model. For the UK analysis, k is set to 6.

# Set global seed for reproducibility
set.seed(1239)

# Create our topic model
us_lda_model <- LDA(
  us_dtm,
  k = 8,                # Number of topics to find
  method = "Gibbs",     
  control = list(
    seed = 1239,        # For reproducible results
    iter = 2000,        # Number of iterations
    thin = 100,         # Save every n iteration
    best = TRUE,        # Return the best model
    verbose = FALSE     # Switch on to show progress and monitor convergence
  )
)

# Extract top terms per topic
us_top_terms <- tidy(us_lda_model, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup()

Visualising the LDA Results

The results of LDA topic modelling are visualised using ggplot2.

# Visualize top terms
ggplot(us_top_terms) +
  geom_col(aes(beta, reorder_within(term, beta, topic), fill = factor(topic))) +
  facet_wrap(~topic, scales = "free_y") +
  scale_y_reordered() +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(
    title = "LDA Topic Modelling: Major Topics in American Political Narrative",
    subtitle= "Top Terms by Topic",
    x = "Beta (Topic-Term Probability)",
    y = ""
  )

2.2.2 Sentiment Analysis: the evolution of political sentiments

Sentiment analysis, also known as opinion mining, is a technique used to determine the emotional tone behind a body of text. Commonly used lexicons include AFINN, Bing and NRC. In this study, AFINN, which assigns each word an integer score from -5 (most negative) to +5 (most positive), is used to analyse the sentiment shifts in political discourse. The following section displays the code for analysing the US presidential candidate debates. The UK data are analysed using the same method.
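The core of a lexicon-based approach can be sketched in a few lines of base R: match each token against the lexicon, discard unmatched tokens, and average the scores per document. The lexicon below only mimics the AFINN layout (a word column and a value column); its entries and scores are illustrative placeholders, not values copied from AFINN, and the tokens are hypothetical.

```r
# Toy lexicon in the AFINN layout (word, value); the scores here are
# illustrative placeholders, not copied from the real AFINN list.
toy_lex <- data.frame(
  word  = c("crisis", "threat", "hope", "proud"),
  value = c(-3, -2, 2, 2)
)

tokens <- data.frame(
  doc_id = c(1, 1, 1, 2, 2, 2),
  word   = c("crisis", "threat", "economy", "hope", "proud", "crisis")
)

# Inner join: words absent from the lexicon ("economy") are dropped
scored <- merge(tokens, toy_lex, by = "word")

# Mean score per document
doc_sentiment <- aggregate(value ~ doc_id, data = scored, FUN = mean)
doc_sentiment
```

Because unmatched words are dropped rather than scored as zero, the mean reflects only the sentiment-bearing vocabulary of each document, not its overall length.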

Setup and Text Preprocessing

This section loads the required packages in RStudio and preprocesses the texts to be analysed.

library(pacman)
p_load(
  tidyverse,       
  tidytext,        
  textdata,       # Contains sentiment lexicons (dictionaries)
  stringr
)

# Get the sentiment lexicon first
afinn_lex <- get_sentiments("afinn")

# Load the dataset 
us_speeches <- read_csv("~/Desktop/Digital Humanities/term project/data/us/6.2 Presidential Candidates Debates _1960-2024_ _ The American Presidency Project.csv") %>% 
  mutate(chapter=row_number())

Rows: 175 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Text
dbl (1): Year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

us_tidy_speeches <- us_speeches %>% 
  separate_rows(Text, sep="(?<=[.!?])\\s+(?=[A-Z])") %>% 
  mutate(linenumber=row_number()) %>% 
  # Remove punctuation
  mutate(Text = str_replace_all(Text, "[[:punct:]]", "")) %>%
  # Remove special characters (including emojis)
  mutate(Text = str_replace_all(Text, "[^[:alnum:][:space:]]", "")) %>% 
  # Remove numbers
  mutate(Text = str_replace_all(Text, "\\d+", "")) %>% 
  # Shift all words to lowercase
  mutate(Text=tolower(Text))
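The effect of these cleaning steps can be traced on a single toy string. The sketch below uses base-R equivalents (strsplit, gsub) of the calls above on a hypothetical input; it also shows why contraction tokens such as "ive" end up in the customised stopword list.

```r
# Toy input (hypothetical) to trace the cleaning steps one at a time
x <- "I've got a plan. We won't raise taxes in 2024! Thank you."

# Split where sentence-final punctuation is followed by whitespace and a capital
sentences <- strsplit(x, "(?<=[.!?])\\s+(?=[A-Z])", perl = TRUE)[[1]]
sentences

# Strip punctuation and digits, then lowercase
cleaned <- tolower(gsub("\\d+", "", gsub("[[:punct:]]", "", sentences)))
cleaned
```

Once the apostrophe is stripped, "I've" becomes "ive" and "won't" becomes "wont", so these tokens no longer match the entries of standard stopword dictionaries and have to be listed explicitly.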

# Load the stopwords dictionary
data("stop_words")

#Tokenisation
us_tokenised_speeches <- us_tidy_speeches %>% 
  unnest_tokens(
    word, "Text",
    token = "words", 
    stopwords = c(
      stopwords::stopwords("en"),
      stopwords::stopwords(source = "smart"),"people","senator","mr","president","applause","america","bush","trump","biden","make","obama","state","secretary","sanders","country","government","clinton","governor","american","percent","lehrer","back","things","states","year","governor romney","romney","hillary","mccain","years","edwards","donald","christie","keyes","bradley","paul","gingrich","united","klobuchar","rubio","wallace","put","vice","giuliani","huckabee","cooper","warren","harris","americans","work","lot","made","smith","nixon","kennedy","tapper","shadel","question","time","howe","hoge","mondale","carter","moderator","drew","gannon","valeriani","bush","mashek","ferraro","vanocur","good","ryerson","gore","shaw","kerry","brokaw","sharpton","dean","tom","ifill","cheney","gibson","brownback","tancredo","hunter","gilmore","hume","thompson","goler","sen","blitzer","vaughn","gov","ms","hall","rep","russert","williams","gravel","richardson","smiley","kucinich","dodd","olbermann","stephanopoulos","laughter","crosstalk","mccarthy","video","palin","schieffer","demint","bachmann","speaker","representative","ron","mitt","michele","ramaswamy","joe","plan","cruz","mayor","buttigieg","working","care","thing","talk","fact","day","give","cain","world","republican","today","campaign","haley","policy","john","dole","raddatz","issue","bauer","desantis","forbes","washington","santorum","perry","congressman","part","kind","system","important","big","congress","debate","great","cut","bentsen","pence","quayle","kaine","uh","baier","talking","king","ford","administration","problem","pay","middle","federal","number","program","kasich","florina","ryan","welker","reagan","party","scott","move","bring","end","start","making","deal","wall","o'malley","graham","tonight","todd","stage","respond","issues","point","programs","carson","holt","coming","win","kelly","candidates","barack","obama","republicans","fiorina","street","huntsman","hatch","ive","theyve","youre","theyre","weve","im","dont","didnt","id","hes","youve","whats","stand"
    ) 
  )%>% 
  anti_join(stop_words) %>%
  filter(!is.na(word)&!is.null(word))

Joining with `by = join_by(word)`

AFINN Analysis

# Summative Analysis for AFINN
us_afinn_summary <- us_tokenised_speeches %>%
  inner_join(afinn_lex, by = "word", relationship = "many-to-many") %>%  # Match words with their sentiment scores
  group_by(chapter) %>%                                                  # Group all words by document (one transcript per ID)
  summarise(
    mean_sentiment = mean(value),                                        # Average sentiment per document
    total_words = n(),                                                   # Count of sentiment-scored words
    most_negative = min(value),                                          # Most negative score
    most_positive = max(value)                                           # Most positive score
  ) 

# Re-join the 'Year' column for diachronic analysis
us_afinn_summary <- left_join(us_afinn_summary,us_speeches,by="chapter", relationship="many-to-one")

Diachronic Analysis of Sentiment Shifts

This study does not include an emotional arc because, unlike literary works, the debate transcripts and maiden speeches do not form one continuous narrative. An emotional arc showing sentiment per hundred lines would therefore support only very limited conclusions.

Instead, the present research traces diachronic changes in political sentiment by averaging the per-document mean sentiment values within each year.

# Calculate the mean sentiment value grouped by year
us_afinn_summary_year <- us_afinn_summary %>% 
  group_by(Year) %>% 
  summarise(sentiment=mean(mean_sentiment), .groups="drop")
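Averaging per-document means gives every transcript the same weight within its year, regardless of how many sentiment-scored words it contains. The base-R sketch below contrasts this with a word-weighted mean. The numbers are hypothetical placeholders arranged like us_afinn_summary; the weighted variant is shown only as a possible alternative and is not the method used in this study.

```r
# Hypothetical per-document summaries in the shape of us_afinn_summary
doc_summary <- data.frame(
  Year           = c(1960, 1960, 1976, 1976),
  mean_sentiment = c(0.50, -0.10, 0.20, 0.40),
  total_words    = c(100, 300, 200, 200)
)

# (a) Unweighted: every document counts equally within its year
unweighted <- aggregate(mean_sentiment ~ Year, data = doc_summary, FUN = mean)

# (b) Weighted by the number of sentiment-scored words in each document
weighted <- sapply(
  split(doc_summary, doc_summary$Year),
  function(d) weighted.mean(d$mean_sentiment, d$total_words)
)

unweighted
weighted
```

The two versions agree when documents within a year are of similar length and diverge when one long transcript dominates, so comparing them is a quick robustness check on the yearly trend.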

Visualising Sentiment Shifts

ggplot(us_afinn_summary_year,aes(x=Year,y=sentiment))+
  geom_line()+
  geom_smooth(method = "loess", color = "red", se = TRUE)+
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  theme(legend.position = "none")+
  labs(
    title = "AFINN Sentiment Analysis: Emotional Trajectories in American Political Narrative",
    subtitle = "Tracking sentiment on a scale from -5 (negative) to +5 (positive)",
    x = "Year",
    y = "Average Sentiment Score"
  )

`geom_smooth()` using formula = 'y ~ x'

3. Analysis & Findings

3.1 Core Topics

Figure 1: LDA Topic Modelling: Major Topics in American Political Narrative

Figure 2: LDA Topic Modelling: Major Topics in British Political Narrative

According to the results from topic modelling, the most prominent topics in US and UK political narratives are as follows:

🇺🇸US Topics:

War in Middle East
Trade war with China
Social policies
Employment and taxes
Cold War
Immigration and Security

⚠️Topics 7 and 8 do not show any distinct theme and largely repeat earlier topics, possibly because of the instability of the model. They are therefore treated as invalid and merged into the earlier topics.

🇬🇧UK Topics:

Education
Industrial development
Local communities
Service sector
Security and defence
Pride and “Britishness”

Despite the limitations of machine learning in producing fully satisfactory topics, the distinctions between the political narratives of the two countries are already easy to discern, and some of them align closely with everyday impressions. American political narratives show a clear inclination towards external affairs, a tendency that Chinese diplomats often label, critically, as “long-arm jurisdiction”, whereas British politicians pay more attention to what is going on within their borders, for example education, local development and the service sector.

Besides, the American discourse also shows a propensity for realism, with a significant emphasis on security and national defence. Though the British also have security on their list, it carries nowhere near the weight it does for Americans. This is consistent with the realist foreign policy the US pursued during the Cold War, which focused on national interest and propped up or tolerated dictatorships as long as they opposed the Soviet Union. The two rivals had little faith in international institutions or universal ideals except for propaganda purposes, but leveraged regional arrangements to knit together their allies. Even after the end of the Cold War, US foreign policy still involved a confusing mixture of hard-nosed security interests and idealistic rhetoric.⁴

3.2 Evolution of Political Sentiments

Figure 3: AFINN Sentiment Analysis: Emotional Trajectories in American Political Narrative

Political sentiment in the United States has declined continuously since the 1960s, with a particularly pronounced downturn in the 2020s. This trend can be partially attributed to the escalating political tensions that emerged around the end of the first Trump administration, often referred to as Trump 1.0. The polarisation between the two major parties, the Republicans and the Democrats, has intensified, manifesting in violent incidents such as the January 6 U.S. Capitol attack and the attempted assassination of Donald Trump in Pennsylvania. This heightened polarisation has significantly fuelled negative sentiment in politics. Besides domestic instability, the 2020s have also been characterised by a global pandemic, geopolitical tensions, and brutal wars in Ukraine and Gaza, all of which helped shape the negative political discourse.

Figure 4: AFINN Sentiment Analysis: Emotional Trajectories in British Political Narrative

In contrast, political sentiment in the United Kingdom has remained largely stable and predominantly positive over the same period, with the notable exception of an extreme dip around 1980. This anomaly can be attributed to at least two significant factors:

Economic Recession and Unemployment: From the late 1970s into the early 1980s, the UK grappled with a severe economic downturn. The period included the “Winter of Discontent” of 1978-1979 and was characterised by widespread strikes, high inflation, and significant unemployment. By 1980, unemployment was at its highest since the Great Depression, which would naturally lead to negative sentiments in political discourse.
Thatcher’s Austerity and Economic Policies: Right after Margaret Thatcher came to power in May 1979, she initiated a series of economic reforms aimed at reducing inflation, which included significant cuts in public spending and privatisation of state industries. These policies were highly controversial and led to immediate economic hardship for many sectors of society, fostering a negative public and political mood.

Figure 5: AFINN Sentiment Analysis: US vs. UK

Furthermore, presenting the sentiment trajectories of the UK and the US on the same graph makes it evident that the mean sentiment value in the UK has almost consistently exceeded that of the US across the decades. This raises a question about genre differences: maiden speeches in Parliament generally tend to use more positive language than presidential debates. However, genre does not determine the message itself, which, if intended to drive positive change, will inevitably bring up the unsatisfactory reality. Further, this result is consistent with a social media-based survey on the tone of political discourse conducted by the Jandoli Institute of St Bonaventure University, which found that political discourse in the United Kingdom is more civil than in the United States: 84% of British participants said that political discourse is generally positive, compared with only 43% of US participants.

Figure 6: Survey on the Tone of Political Discourse, St Bonaventure University

Interestingly, the combined trajectories appear to corroborate the “special relationship” between the two countries. The unsmoothed curves between 2000 and 2020 show a highly synchronised “M” shape, representing parallel sentiment changes during the same period. Alternatively, this could be the result of global events such as the 2008 financial crisis, which took a huge toll on most capitalist countries and may have elicited similar sentiments and responses.
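The visual impression of synchrony could be checked numerically by correlating the year-on-year changes of the two series. The following is a minimal base-R sketch of that idea. The yearly values are placeholders invented for illustration, NOT the results of this study, and the object names are hypothetical; in practice the per-year summaries computed earlier for each country would take their place.

```r
# Placeholder yearly means (NOT the study's results), shaped like the
# per-year summaries produced earlier
years <- seq(2000, 2020, by = 4)
us <- data.frame(Year = years,
                 sentiment = c(0.10, 0.30, 0.05, 0.25, 0.00, -0.20))
uk <- data.frame(Year = years,
                 sentiment = c(0.60, 0.75, 0.55, 0.70, 0.50, 0.45))

# Align the two series on Year
both <- merge(us, uk, by = "Year", suffixes = c("_us", "_uk"))

# Correlate year-on-year changes rather than levels, so that a shared
# long-term trend alone does not inflate the result
r <- cor(diff(both$sentiment_us), diff(both$sentiment_uk))
r
```

With only six time points per country the coefficient rests on five differences, so it would be indicative at best and would need to be read alongside the plot rather than in place of it.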

4. Reflections

The application of digital humanities tools such as sentiment analysis and topic modelling supports research on a larger scale and across a longer period of time, allowing researchers to discover more general and overarching patterns that would be impossible to detect from experience alone. However, the digital tools do have a few limitations:

Lack of flexibility vis-à-vis human language: A typical example is stopwords removal. Even after removing the words in the stopwords dictionary, the author still found a large number of clearly meaningless tokens, most of them contractions such as “ive”, “youre”, “hes” and “theyre”. These forms arise because the sentiment analysis pipeline strips punctuation before tokenisation, so the resulting tokens no longer match the dictionary entries. In addition, some task-specific stopwords must be removed manually through an iterative process.
Low explainability: LDA topic modelling fits the model by running thousands of sampling iterations, and the resulting topics are unlabelled word lists that the researcher must interpret. The categorisation of words can be highly repetitive, with tax-related topics occurring three times in the US model, and is sometimes simply illogical, as in Topic 7 and Topic 8 for the US.

Apart from these limitations of the tools, the selection of data is also likely to undermine the external validity of the research. As explained in previous sections, the different electoral processes make it difficult to find two fully comparable corpora for the present research. Improving the methods and the data selection is left to future studies.

Footnotes

¹ The American Presidency Project. (n.d.). Presidential Candidate Debates. Retrieved from https://www.presidency.ucsb.edu/people/other/presidential-candidate-debates
² House of Commons Library. (2021, September 13). Maiden speeches in the House of Commons since 1918. Retrieved from https://commonslibrary.parliament.uk/research-briefings/sn04588/
³ Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
⁴ Posner, E. (2021, September 3). America’s return to realism. Project Syndicate. Retrieved from https://www.project-syndicate.org/commentary/america-return-to-foreign-policy-realism-by-eric-posner-2021-09