Sentiment Profiling of CT Mirror Using the Loughran-McDonald Financial Dictionary

Objective of the Analysis

The objective of this analysis is to scrape the current content of the CT Mirror news website and profile its sentiment using the Loughran-McDonald dictionary, which is widely used in financial and risk communication contexts.

Unlike general-purpose sentiment dictionaries, this one classifies words into categories such as Uncertainty, Litigious, Negative, and Positive, which offers insight into how media language may reflect risk, doubt, or optimism in ongoing news cycles.

Because the script scrapes the live site, it can be re-run on different days to reflect changes in media coverage.

Practical Implementation

This method can be used in areas such as:

Financial journalism analysis: To assess how often uncertain or risk-related language is used in economic reporting.
Policy communication audits: Evaluating sentiment in government or institutional statements.
Investor behavior modeling: Linking tone in financial news to stock market sentiment.
Media bias or agenda studies: Observing dominant tones across news outlets over time.
Crisis communication: Rapidly assessing emotional signals during emergencies.

Brief Overview of Code

1. Libraries loaded:

Packages for web scraping, sentiment analysis, visualization, and text processing are loaded.

library(lingmatch)    # For downloading and using financial sentiment dictionary
library(dplyr)        # Data manipulation
library(tidytext)     # Text tokenization
library(tibble)       # Working with tibbles
library(stringr)      # String operations
library(rvest)        # Web scraping
library(ggplot2)      # Data visualization
library(SnowballC)    # Word stemming

2. Dictionary download:

The Loughran-McDonald dictionary is downloaded to a temporary directory and read with lingmatch.

lingmatch::download.dict("loughranmcdonald", dir = tempdir())

lm_dict <- read.dic(file.path(tempdir(), "loughranmcdonald.dic"))

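# Convert dictionary to tidy format with stemming
dict_df <- tibble::enframe(lm_dict, name = "sentiment", value = "word") %>%
  tidyr::unnest(word) %>%
  mutate(word = wordStem(word)) %>%
  # Several dictionary words can share one stem; keep each category-stem pair
  # once so that a single token is not counted repeatedly in the join
  distinct(sentiment, word)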
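
The conversion above can be tried on a small hand-made list before running it on the full dictionary. The sketch below is illustrative only: toy_dict and its words are hypothetical stand-ins for the list returned by read.dic(), not entries checked against the actual Loughran-McDonald file. It shows how stemming can collapse several words onto one stem, and how distinct() keeps each category-stem pair once.

```r
library(dplyr)
library(SnowballC)

# Hypothetical stand-in for the list returned by read.dic()
toy_dict <- list(
  negative    = c("loss", "losses", "litigation"),
  litigious   = c("litigation", "lawsuit"),
  uncertainty = c("risk", "risks")
)

toy_df <- tibble::enframe(toy_dict, name = "sentiment", value = "word") %>%
  tidyr::unnest(word) %>%            # one row per (category, word)
  mutate(word = wordStem(word)) %>%  # "loss" and "losses" share a stem
  distinct(sentiment, word)          # drop repeated category-stem pairs

print(toy_df)
```

A stem may still appear under more than one category after this step, which is expected when a word belongs to several categories.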

3. Data collection:

All <p> elements from https://ctmirror.org/ are scraped as text.

page <- read_html("https://ctmirror.org/")
paragraphs <- page %>% html_nodes("p") %>% html_text()
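
The extraction step can be checked offline by parsing a literal HTML string instead of the live site. This is a minimal sketch under stated assumptions: the HTML snippet is made up for illustration, and it uses html_elements() and html_text2(), the names available in rvest 1.0.0 and later, rather than the html_nodes() and html_text() calls in the script above. html_text2() also normalizes whitespace, so results may differ slightly from html_text().

```r
library(rvest)

# Made-up HTML standing in for a downloaded page
html <- paste0(
  "<html><body><h1>Headline</h1>",
  "<p>First paragraph.</p><p>Second paragraph.</p>",
  "<div>Not a paragraph</div></body></html>"
)

toy_page <- read_html(html)                              # parse the literal string
toy_paragraphs <- html_text2(html_elements(toy_page, "p"))  # text of every <p>
toy_paragraphs <- toy_paragraphs[nzchar(toy_paragraphs)]    # drop empty strings

print(toy_paragraphs)
```

Only the two p elements are returned; the heading and the div are ignored, which matches what the scraping step does on the real page.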

4. Preprocessing:

Text is tokenized, cleaned of stopwords, and stemmed.

ct_df <- tibble(text = paragraphs) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  mutate(word = wordStem(word))
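
To see what the preprocessing does to individual sentences, the same three calls can be run on a couple of invented lines. The sketch below assumes nothing beyond the packages already loaded; toy_text and its sentences are hypothetical.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Invented sentences standing in for scraped paragraphs
toy_text <- tibble::tibble(text = c(
  "The state faces uncertain risks and possible losses.",
  "Lawmakers challenged the litigation."
))

toy_tokens <- toy_text %>%
  unnest_tokens(word, text) %>%            # one lowercase token per row
  anti_join(stop_words, by = "word") %>%   # remove stop words before stemming
  mutate(word = wordStem(word))            # reduce tokens to their stems

print(toy_tokens)
```

Stop words are removed before stemming because the stop_words table holds unstemmed forms; stemming first could leave some of them unmatched.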

5. Filtering:

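Numeric tokens and common non-informative local words (like “CT” or “Connecticut”) are removed. Tokens are already lowercase at this point because unnest_tokens() lowercases text by default. Part of the exclusion list appears to reflect the page content on the day of the run, so it may need updating on later runs.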

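ct_df <- ct_df %>%
  # Drop numeric tokens, including formatted ones such as "70,000" or "2.9"
  filter(!str_detect(word, "^[0-9][0-9.,]*$")) %>%
  # Drop site-specific terms and names seen in the scraped page
  filter(!word %in% c("ct", "connecticut", "connecticut's", "mirror", "mirror's",
                      "john", "rosen"))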

6. Sentiment analysis and Visualization:

The processed words are matched to sentiment categories using the Loughran-McDonald dictionary. A horizontal bar chart then shows how many matched terms fall into each category, giving a snapshot of the tone of the scraped page at the time of the run.

ct_df %>%
  inner_join(dict_df, by = "word") %>%
  dplyr::count(sentiment, sort = TRUE) %>%
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "CT Mirror Sentiment - Using Loughran-McDonald Dictionary",
    x = "Sentiment Category", y = "Frequency",
    subtitle = paste("CT Mirror Analysis |", format(Sys.Date(), "%B %d, %Y")),
    caption = "Prepared by Saurabh Srivastava"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5)
  )
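
The join in this step can pair one stemmed token with more than one dictionary row, because some stems fall under several categories, and the same stem usually occurs many times in the text. dplyr 1.1.0 and later report this as a many-to-many relationship. The sketch below uses toy data (toy_tokens and toy_dict are hypothetical stand-ins for ct_df and dict_df) to declare that relationship explicitly. It also adds a share column, which is not part of the original script, as one possible way to compare runs on pages of different length.

```r
library(dplyr)

# Hypothetical stemmed tokens and dictionary rows
toy_tokens <- tibble::tibble(word = c("litig", "risk", "litig", "budget"))
toy_dict <- tibble::tibble(
  sentiment = c("negative", "litigious", "uncertainty"),
  word      = c("litig", "litig", "risk")
)

toy_counts <- toy_tokens %>%
  # A token may belong to several categories, so state that this is intended
  inner_join(toy_dict, by = "word", relationship = "many-to-many") %>%
  count(sentiment, sort = TRUE) %>%
  mutate(share = n / sum(n))   # proportion of all matched terms

print(toy_counts)
```

Here each occurrence of "litig" counts once toward negative and once toward litigious, while "budget" has no dictionary match and is dropped by the inner join.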

Warning in inner_join(., dict_df, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 18 of `x` matches multiple rows in `y`.
ℹ Row 3575 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Conclusion

By applying the Loughran-McDonald dictionary to CT Mirror’s live news content, we get a quick, structured view of how current media narratives frame uncertainty, risk, or positivity. The same steps can be pointed at other sources to monitor tone in areas such as finance, governance, or social policy.

It also shows how scraping combined with dictionary-based sentiment analysis can summarize tone on each run without manual tagging.