Text Mining for Uncertainty: A Case Study Using the Loughran-McDonald Dictionary
Author
Saurabh C Srivastava
Published
May 1, 2025
Objective of the Analysis
The goal of this project is to examine the language of uncertainty in the Unabomber Manifesto using a financial sentiment lexicon — the Loughran-McDonald Dictionary. By identifying and analyzing uncertainty-related terms, this project aims to provide insight into the author’s psychological state and rhetorical strategies, particularly in terms of expressing ambiguity, doubt, or fear about modern society.
Practical Implementation
The techniques and approach used in this project can be extended to a wide range of real-world applications across different fields:
Business & Finance: Detecting uncertainty in earnings calls, annual reports (10-Ks), and investor communications to assess market sentiment or potential risks.
Public Policy & Government: Analyzing political speeches, public health updates, or government statements to monitor shifts in tone, public confidence, or emerging societal concerns.
Legal & Compliance: Identifying vague or hedging language in contracts, testimonies, or legal opinions to evaluate risk, clarity, or intent.
Media & Journalism: Exploring how uncertainty is framed in news articles or editorials, especially during crises or breaking events (e.g., elections, pandemics, economic downturns).
Education & Research: Examining academic writing or student essays to assess argumentative clarity or confidence in claims.
Customer Experience & Product Research: Applying uncertainty detection in product reviews or survey responses to highlight areas where customers express hesitation, doubt, or dissatisfaction.
By combining dictionary-based sentiment analysis with NLP techniques like tokenization, stopword removal, and stemming, this approach provides a structured way to turn raw text into insights — adaptable across domains where understanding language tone and intent is critical.
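The three preprocessing steps named above can be sketched on a small example. This is a minimal illustration, not the project's actual pipeline: the two sample sentences and the names `sample_text`, `tokens`, and `stem_counts` are hypothetical, and the stopword list is assumed to be the `stop_words` table that ships with tidytext.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(SnowballC)

# Hypothetical two-line sample; the real input would be the document text.
sample_text <- tibble(
  line = 1:2,
  text = c(
    "The outcome may depend on uncertain conditions.",
    "Predictions about technology are doubtful and depend on assumptions."
  )
)

tokens <- sample_text %>%
  unnest_tokens(word, text) %>%            # lowercase and split into one word per row
  anti_join(stop_words, by = "word") %>%   # drop common stopwords
  mutate(word = wordStem(word))            # reduce each word to its Porter stem

# Frequency of each stem after preprocessing
stem_counts <- tokens %>% count(word, sort = TRUE)
```

Stemming before counting means that inflected forms of a word are tallied under one stem, which is why the dictionary would need the same treatment before the two tables are matched.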
Brief Overview of Code
library(lingmatch) # Download and read sentiment dictionaries
library(dplyr) # Data manipulation
library(tidytext) # Tokenization and text preprocessing
library(tibble) # Tibble format for clean data frames
library(stringr) # String processing
library(rvest) # (Unused here but often for scraping)
library(ggplot2) # Data visualization
library(SnowballC) # Word stemming
1. Load the Loughran-McDonald Dictionary
Downloads the financial sentiment dictionary.
Reads it into a named list, where categories like "Uncertainty" hold words associated with that sentiment.
lingmatch::download.dict("loughranmcdonald", dir = tempdir())
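The matching step joins on a `word` column and filters on a `sentiment` column, so the named list has to be reshaped into a two-column table first. That reshaping is not shown in this write-up. The sketch below is one way it might look, using a hypothetical `toy_dict` with made-up entries in place of the real dictionary. Stemming the dictionary words is an assumption, inferred from the "Stemmed Uncertainty Terms" axis label in the chart.

```r
library(dplyr)
library(SnowballC)

# Hypothetical stand-in for the named list; these are not the real dictionary entries.
toy_dict <- list(
  Uncertainty = c("depend", "depends", "uncertain", "possibly"),
  Negative    = c("loss", "failure")
)

# stack() turns a named list into a data frame with columns `values` and `ind`
dict_df <- stack(toy_dict) %>%
  transmute(
    word      = wordStem(values),   # stem so entries match stemmed tokens
    sentiment = as.character(ind)   # category name as plain text
  )

# Note: "depend" and "depends" now share the stem "depend",
# so the table contains duplicate (word, sentiment) rows.
```

Duplicate rows of this kind matter for the join, because every token that matches a duplicated stem is returned once per duplicate.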
Matches words in the manifesto with the dictionary.
Filters only uncertainty-related words.
Counts their frequency.
Creates a horizontal bar chart to display the top uncertainty terms.
uncertainty_df <- bomber_tib %>%
  inner_join(dict_df, by = "word") %>%
  filter(sentiment == "Uncertainty") %>%
  dplyr::count(word, sort = TRUE) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "#008080") +
  coord_flip() +
  labs(
    title = "Uncertainty in the Unabomber Manifesto",
    x = "Stemmed Uncertainty Terms",
    y = "Word Frequency",
    subtitle = "Based on Loughran-McDonald Dictionary | April 9, 2025",
    caption = "Prepared by Saurabh Srivastava"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5)
  )
Warning in inner_join(., dict_df, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6 of `x` matches multiple rows in `y`.
ℹ Row 2027 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
uncertainty_df
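The many-to-many warning means that some words occur more than once in `dict_df`. One plausible cause, stated here as an assumption because the construction of `dict_df` is not shown, is that stemming collapses several dictionary entries onto one stem, or that a word belongs to more than one category. Duplicates within the same category multiply the counts. The sketch below uses hypothetical toy data and a hypothetical helper, `count_uncertainty`, to show the effect and how `distinct()` removes it.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens: "depend" really occurs twice.
tokens <- tibble(word = c("depend", "depend", "predict", "machine"))

# Hypothetical dictionary with a duplicated ("depend", "Uncertainty") row.
dict_dup <- tibble(
  word      = c("depend", "depend", "predict", "depend"),
  sentiment = c("Uncertainty", "Uncertainty", "Uncertainty", "Negative")
)

count_uncertainty <- function(tokens, dict) {
  tokens %>%
    inner_join(dict, by = "word", relationship = "many-to-many") %>%
    filter(sentiment == "Uncertainty") %>%
    count(word, sort = TRUE)
}

inflated <- count_uncertainty(tokens, dict_dup)            # "depend" counted 4 times
deduped  <- count_uncertainty(tokens, distinct(dict_dup))  # "depend" counted 2 times
```

If the real `dict_df` contains duplicates like these, deduplicating it before the join would be worth checking, since the reported frequencies depend on it.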
Conclusion
The analysis of the Unabomber Manifesto using the Loughran-McDonald Uncertainty category reveals a linguistic pattern centered on conditional or qualified statements. The word "depend" is by far the most frequent uncertainty term, which indicates a tendency to frame arguments in terms of contingencies rather than absolutes. Other uncertainty-related terms such as "predict", "suggest", and "believe" also appear, but their comparatively low frequency suggests that the narrative leans toward cautious justification rather than chaotic ambiguity.
The frequent use of uncertainty terms suggests that the writer expressed ideas cautiously, especially about modern technology. The doubt is conveyed through measured, logical phrasing rather than wild or emotional language. This points to a message that reads less as an outburst of anger and more as a planned, reasoned criticism, a distinction that matters when analyzing such extreme writings.