Tracking the Emotional Pulse of News: A Text Mining Case Study on CT News Junkie
Author
Saurabh C Srivastava
Published
May 1, 2025
Objective of the Analysis
The goal of this project is to analyze the emotional tone of current articles from CT News Junkie using text mining techniques. Because the site's content changes daily, rerunning the script gives an automated, up-to-date view of emotional trends, with no manual changes to the code.
Practical Implementation
This approach can be extended to:
News sentiment tracking: Analyze how emotional tones shift in political or economic coverage.
Brand monitoring: Track emotion trends in online reviews or news articles about a company.
Public opinion mining: Scrape and assess comment sections or blogs on ongoing societal issues.
Academic or media research: Compare emotional tones across multiple outlets or over time.
Crisis communication: Rapidly assess emotional signals in real-time during emergencies.
Brief Overview of Code
1. Required Libraries
library(rvest)   # For web scraping HTML content
library(tibble)  # For creating tidy data frames
library(dplyr)   # For data manipulation (filtering, grouping, counting)
library(tidytext)  # For text tokenization and sentiment lexicons
library(stringr)   # For regex-based string filtering
library(ggplot2)   # For plotting frequency and emotion graphs
2. Scrapes all <p> elements from the webpage to extract article text.
# 1. Web Scraping: Pull content from CT News Junkie
john = rvest::read_html("https://ctnewsjunkie.com/")
jr = john %>%
  html_nodes("p") %>%
  html_text()
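To see what this selection returns without depending on the live site, here is a minimal sketch that runs the same kind of extraction on a small inline HTML snippet. The snippet and the names `page` and `paras` are made up for illustration, and `minimal_html()` and `html_elements()` assume rvest 1.0 or later (`html_elements()` is the newer name for `html_nodes()`).

```r
library(rvest)

# Hypothetical stand-in for the downloaded page (not real site content)
page = minimal_html('
  <h1>Headline</h1>
  <p>Lawmakers debate the budget.</p>
  <div><p>Residents fear higher taxes.</p></div>
')

# Select every <p> element, at any nesting depth, and keep only its text
paras = page %>%
  html_elements("p") %>%
  html_text()

print(paras)
```

The heading is skipped and both paragraphs are returned, including the nested one. On a real page the same selector may also pick up paragraphs that are not article text, which is one reason for the cleaning step later on.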
3. Converts the raw text into a structured tibble format with a source label.
# 2. Create a dataframe for analysis
jr.df = tibble(text = jr, name = "junkie")
4. Uses tidytext to tokenize text and remove common English stopwords.
# 3. Tokenization: Break text into individual words
jr.df = jr.df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")  # explicit join key, so no join message is printed
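As a small illustration of what tokenization and stopword removal do, the sketch below applies the same two calls to a made-up two-sentence input. The tibble `toy` and its sentences are hypothetical and only mirror the shape of `jr.df` before this step.

```r
library(tibble)
library(dplyr)
library(tidytext)

# Hypothetical input in the same shape as jr.df before tokenization
toy = tibble(text = c("The governor signed the budget.",
                      "Voters fear the new tax."),
             name = "junkie")

toy_words = toy %>%
  unnest_tokens(word, text) %>%        # one lowercase word per row, punctuation dropped
  anti_join(stop_words, by = "word")   # drop rows whose word is in the stop word list

print(toy_words)
```

Each remaining row holds a single word together with its `name` label, so function words such as "the" disappear while content words such as "governor" and "budget" stay.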
5. Removes numbers and known irrelevant words to improve result quality, then plots the most common meaningful words in the scraped text.
# 4. Text Cleaning: Filter out noise terms
# Site-specific noise words, matched as whole tokens so that words that
# merely contain "ct" (such as "election") are kept
noise_words = c("ct", "connecticut", "john", "rosen")
jr.df %>%
  filter(!str_detect(word, "^[0-9.,]+$")) %>%  # numeric tokens, including "70,000" and "2.9"
  filter(!word %in% noise_words) %>%
  dplyr::count(word, sort = TRUE) %>%          # frequency of each remaining word
  slice(1:12) %>%                              # keep the 12 most frequent
  ggplot(aes(reorder(word, n), n, fill = as.factor(n))) +
  geom_col() +
  coord_flip() +
  theme(legend.position = "none")
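When filtering tokens with `str_detect()`, it helps to remember that an unanchored pattern matches anywhere inside a word. The sketch below compares unanchored and anchored patterns on a made-up token vector (`words` is hypothetical) to show which tokens each one would remove.

```r
library(stringr)

# Hypothetical tokens, including numbers that contain punctuation
words = c("ct", "election", "district", "70,000", "2.9", "2025", "budget")

# Unanchored pattern: matches "ct" anywhere inside a word
substring_hits = words[str_detect(words, "ct")]

# Anchored pattern: matches only the whole token "ct"
exact_hits = words[str_detect(words, "^ct$")]

# Digits only, versus digits with commas or decimal points
digits_only = words[str_detect(words, "^[0-9]+$")]
numeric_like = words[str_detect(words, "^[0-9.,]+$")]

print(substring_hits)  # "ct" "election" "district"
print(exact_hits)      # "ct"
print(digits_only)     # "2025"
print(numeric_like)    # "70,000" "2.9" "2025"
```

Anchoring with `^` and `$` (or comparing with `%in%`) keeps ordinary words like "election" in the analysis, and allowing `.` and `,` inside the numeric pattern catches formatted numbers in one filter.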
6. Loads NRC Emotion Lexicon and filters only emotion categories (not polarity).
# 6. Emotion Mapping: Get emotion labels from NRC lexicon
nrc = get_sentiments("nrc")
nrc_emotion = nrc %>%
  filter(sentiment != "negative" & sentiment != "positive")
7. Generates a faceted bar plot showing the top emotion words for each category.
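The code for this step does not appear in the text, but the warning printed below shows that it used `inner_join()` against `nrc_emotion` and that the match was many-to-many: a word can occur many times in the article text and can carry several emotions in the lexicon. Here is a minimal sketch of such a join and the per-emotion word counts. It uses a made-up token list and a made-up mini-lexicon (`toy_tokens`, `toy_lexicon`) so that it runs without downloading NRC, and it assumes dplyr 1.1.0 or later for the `relationship` argument.

```r
library(tibble)
library(dplyr)

# Hypothetical tokens (stand-in for the cleaned jr.df)
toy_tokens = tibble(word = c("fear", "fear", "budget", "hope",
                             "threat", "hope", "hope"))

# Hypothetical mini-lexicon in the same word/sentiment layout as NRC
toy_lexicon = tibble(
  word      = c("fear", "fear", "hope", "hope", "threat", "threat"),
  sentiment = c("fear", "negative", "anticipation", "positive", "fear", "anger")
)

# Keep the emotion categories only, dropping the two polarity labels
toy_emotion = toy_lexicon %>%
  filter(!sentiment %in% c("negative", "positive"))

# Attach emotions to tokens, then count each word within each emotion
emotion_counts = toy_tokens %>%
  inner_join(toy_emotion, by = "word", relationship = "many-to-many") %>%
  count(sentiment, word, sort = TRUE)

print(emotion_counts)
```

Setting `relationship = "many-to-many"` declares the duplication as expected, which is what the warning suggests. The counts could then be passed to ggplot2 with `facet_wrap(~ sentiment, scales = "free_y")` to get one panel per emotion; that plotting step is an assumption about the original figure, not a reproduction of it.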
Warning in inner_join(., nrc_emotion): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 59 of `x` matches multiple rows in `y`.
ℹ Row 1184 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Conclusion
The project demonstrates how a short R script can automatically pull news text and extract emotion-related signals from it. By filtering common noise words and applying the NRC Emotion Lexicon, it becomes possible to profile how emotionally charged the content is, giving readers, analysts, and media watchers a new layer of interpretation. Because each run picks up the day's fresh headlines, the output offers a useful tool for sentiment trendspotting.