Tracking the Emotional Pulse of News: A Text Mining Case Study on CT News Junkie

Objective of the Analysis

The goal of this project is to analyze the emotional tone of current articles from CT News Junkie using text mining techniques. Because the site's content changes daily, rerunning the script gives a fresh snapshot each time, which makes it possible to follow how the emotional tone shifts over time without editing the code.

Practical Applications

This approach can be extended to:

News sentiment tracking: Analyze how emotional tones shift in political or economic coverage.
Brand monitoring: Track emotion trends in online reviews or news articles about a company.
Public opinion mining: Scrape and assess comment sections or blogs on ongoing societal issues.
Academic or media research: Compare emotional tones across multiple outlets or over time.
Crisis communication: Rapidly assess emotional signals in real-time during emergencies.

Brief Overview of Code

1. Required Libraries

library(rvest)        # For web scraping HTML content
library(tibble)       # For creating tidy data frames
library(dplyr)        # For data manipulation (filtering, grouping, counting)


library(tidytext)     # For text tokenization and sentiment lexicons
library(stringr)      # For regex-based string filtering
library(ggplot2)      # For plotting frequency and emotion graphs

2. Scrapes all `<p>` elements from the webpage to extract article text.

# 2. Web Scraping: Pull content from CT News Junkie
john = rvest::read_html("https://ctnewsjunkie.com/")    # download and parse the homepage
jr = john %>% html_elements("p") %>% html_text()        # text of every <p> element
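To see what the paragraph selector returns without hitting the live site, the same calls can be tried on a small inline page. This is a minimal sketch: `sample_page` and its contents are made up for illustration, and `minimal_html()` and `html_text2()` assume rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical stand-in for a downloaded page (not real CT News Junkie content)
sample_page <- minimal_html('
  <h1>Sample headline</h1>
  <p>Lawmakers debated the state budget on Tuesday.</p>
  <div><p>Residents voiced concern about rising costs.</p></div>
')

# Select every <p> node, at any nesting depth, and keep only its text
paragraphs <- sample_page %>%
  html_elements("p") %>%
  html_text2()

paragraphs
```

The headline is skipped because it sits in an `<h1>`, which suggests that a scrape limited to `<p>` picks up body text and teasers but not headings.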

3. Converts the raw text into a structured tibble format with a source label.

# 3. Create a data frame for analysis: one row per paragraph, plus a source label
jr.df = tibble(text = jr, name = "junkie")

4. Uses `tidytext` to tokenize text and remove common English stopwords.

# 4. Tokenization: Break text into individual words and drop stopwords
jr.df = jr.df %>% unnest_tokens(word, text) %>% anti_join(stop_words, by = "word")
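The effect of tokenizing and removing stopwords is easier to see on a single sentence. The sketch below uses a made-up sentence (`toy` is a hypothetical name) and the `stop_words` table that ships with tidytext.

```r
library(tibble)
library(dplyr)
library(tidytext)

# Hypothetical one-row input in the same text/name layout as jr.df
toy <- tibble(text = "The Senate passed the budget.", name = "toy")

toy_tokens <- toy %>%
  unnest_tokens(word, text) %>%         # one row per word, lowercased, punctuation stripped
  anti_join(stop_words, by = "word")    # drop rows whose word appears in the stop list

toy_tokens$word
```

Passing `by = "word"` explicitly keeps the join from printing its "Joining with" message.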

5. Removes numbers and known irrelevant words to improve result quality, then plots the most common meaningful words in the scraped text.

# 5. Text Cleaning: Filter out noise terms, then plot the 12 most frequent words
noise_words = c("ct", "connecticut", "john", "rosen")
jr.df %>% 
      filter(!str_detect(word, "^[0-9.,]+$")) %>%   # numeric tokens such as 70,000 or 2.9
      filter(!word %in% noise_words) %>%            # exact matches only, so "act" or "district" are kept
      dplyr::count(word, sort = TRUE) %>%
      slice(1:12) %>%
      ggplot(aes(reorder(word, n), n, fill = as.factor(n))) + 
      geom_col() + 
      coord_flip() +
      labs(x = NULL, y = "Count") +
      theme(legend.position = "none")
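When filtering noise terms with `str_detect()`, the pattern is treated as a regular expression and matches anywhere inside the word. The sketch below, using a made-up word list, shows how much difference anchoring makes.

```r
library(stringr)

# Hypothetical tokens, chosen because each one contains the letters "ct"
words <- c("ct", "act", "district", "connecticut", "election")

# Unanchored pattern: TRUE for any word that contains "ct" somewhere
substring_hit <- str_detect(words, "ct")

# Anchored pattern: TRUE only when the whole token is exactly "ct"
exact_hit <- str_detect(words, "^ct$")

data.frame(words, substring_hit, exact_hit)

# A bare "." is a wildcard, so "2.9" also matches "219"; fixed() treats it literally
dot_as_regex   <- str_detect("219", "2.9")
dot_as_literal <- str_detect("219", fixed("2.9"))
```

With the unanchored pattern every word in the list would be dropped, while the anchored one removes only the token `ct` itself.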

6. Loads the NRC Emotion Lexicon and keeps only the emotion categories, dropping the two polarity labels.

# 6. Emotion Mapping: Get emotion labels from NRC lexicon
# (get_sentiments("nrc") needs the textdata package and asks to download the lexicon on first use)
nrc = get_sentiments("nrc")
nrc_emotion = nrc %>% filter(!sentiment %in% c("negative", "positive"))
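Because the NRC lexicon has to be downloaded, the mapping step can be illustrated with a tiny hand-made lexicon in the same `word`/`sentiment` layout. Everything in this sketch (`toy_lexicon`, `toy_tokens`, and the word-to-emotion pairs) is invented for illustration and is not taken from NRC; the `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(tibble)
library(dplyr)

# Hypothetical mini-lexicon: one row per (word, label) pair, as in the NRC table
toy_lexicon <- tibble(
  word      = c("crisis", "crisis", "hope", "hope", "hope", "win"),
  sentiment = c("fear", "negative", "anticipation", "joy", "positive", "joy")
)

# Keep emotion labels only, dropping the polarity labels
toy_emotion <- toy_lexicon %>% filter(!sentiment %in% c("negative", "positive"))

# Hypothetical tokens after stopword removal
toy_tokens <- tibble(word = c("crisis", "hope", "crisis", "budget", "win"))

# A word can map to several emotions and can occur several times in the text,
# so the join is declared many-to-many
emotion_totals <- toy_tokens %>%
  inner_join(toy_emotion, by = "word", relationship = "many-to-many") %>%
  count(sentiment, sort = TRUE)

emotion_totals
```

Words missing from the lexicon, such as "budget" here, drop out of the inner join, so the totals describe only the part of the vocabulary the lexicon covers.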

7. Generates a faceted bar plot showing the top emotion words for each category.

# 7. Emotion Visualization: Top 20 emotion words per category
jr.df %>% 
      inner_join(nrc_emotion, by = "word", relationship = "many-to-many") %>%   # one word can carry several emotions
      dplyr::count(sentiment, word, sort = TRUE) %>%
      group_by(sentiment) %>% 
      slice_max(n, n = 20, with_ties = FALSE) %>%             # top 20 within each emotion, not overall
      ungroup() %>%
      mutate(word = reorder_within(word, n, sentiment)) %>%   # order bars inside each facet
      ggplot(aes(n, word)) + 
      geom_col(aes(fill = sentiment)) +
      scale_y_reordered() +
      facet_wrap(~sentiment, scales = "free_y") + 
      labs(
           title = "Emotional Content", 
           subtitle = "CT News Junkie",
           caption = "Saurabh's Work",
           x = "Count", y = NULL) +
      theme(legend.position = "none")

Conclusion

The project demonstrates how a short R script can automatically pull news text and extract emotion-related signals from it. By filtering common noise words and applying the NRC Emotion Lexicon, it becomes possible to profile which emotions dominate the content, giving readers, analysts, and media watchers an additional layer of interpretation. Because the site publishes fresh headlines daily, rerunning the script on a schedule offers a simple starting point for sentiment trendspotting.
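Following a trend over time requires keeping each day's result, which the script as written does not do. One possible approach, sketched here under assumptions, is to append a dated row per emotion to a CSV log on every run. The helper `append_snapshot()`, the file layout, and the toy counts are all hypothetical and not part of the original script.

```r
library(tibble)

# Hypothetical helper: append one run's emotion counts to a CSV log, tagged with a date
append_snapshot <- function(emotion_counts, path, date = Sys.Date()) {
  is_new <- !file.exists(path)                       # write the header only once
  snapshot <- data.frame(date = as.character(date), emotion_counts)
  write.table(snapshot, path, sep = ",", row.names = FALSE,
              col.names = is_new, append = !is_new)
  invisible(snapshot)
}

# Toy counts standing in for the per-emotion totals of one run
toy_counts <- tibble(sentiment = c("fear", "trust"), n = c(12L, 30L))

log_path <- tempfile(fileext = ".csv")               # a real log would use a fixed path
append_snapshot(toy_counts, log_path, as.Date("2024-01-01"))
append_snapshot(toy_counts, log_path, as.Date("2024-01-02"))

history <- read.csv(log_path)
history
```

The accumulated `history` table could then be plotted with `ggplot2`, for example as one line per emotion with `date` on the horizontal axis.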