Sentiment Analysis and Keyword Extraction of an Economic Commentary on Inflation

Author

Saurabh C Srivastava

Published

March 26, 2025

Objective of the Analysis

The goal of this project is to extract meaningful insights from a public economic article (“Fighting Inflation: Divided We Fall” from the Hoover Institution) by applying Natural Language Processing (NLP) techniques in R. This includes:

  • Identifying the top 20 most frequently used keywords in the article.

  • Performing sentiment analysis using the NRC lexicon to visualize and differentiate between the most prominent positive and negative emotional words.

  • Presenting the findings through informative visualizations using ggplot2 and cowplot.

Practical Implementation

  • Marketing Teams can apply similar methods to assess public sentiment in editorial content, customer feedback, or competitor messaging to refine campaigns and communication strategies.

  • Financial Analysts and Economists can rapidly extract themes and emotional tone from policy papers, economic commentary, or market outlooks to inform macro-level analysis and forecasting.

  • Content Strategists may use this approach to evaluate the emotional impact of thought leadership content and align messaging with target audience sentiment.

  • Criminologists and Investigative Analysts could leverage sentiment and thematic analysis — like in the Unabomber Manifesto case — to uncover psychological patterns, emotional tone shifts, or ideological drivers embedded in long-form manifestos or threatening letters. This approach may assist in profiling or detecting radicalization narratives based on linguistic cues.

Brief Overview of Code

This R script performs a Natural Language Processing (NLP) analysis on a piece of economic commentary titled “Fighting Inflation: Divided We Fall”. The goal is to extract insights into the most frequently used terms and analyze their emotional tone using lexicon-based sentiment analysis.

library(stringr)        # For string manipulation: str_remove_all(), str_trim(), str_squish()
library(dplyr)          # For data manipulation: %>%, filter(), count(), mutate(), etc.
library(purrr)          # For functional programming: reduce()
library(tibble)         # For creating tibbles: tibble()
library(tidytext)       # For tokenization: unnest_tokens(), stop_words, get_sentiments()
library(ggplot2)        # For creating bar plots: ggplot(), geom_col(), etc.
library(cowplot)        # For combining multiple ggplots: plot_grid()
library(textclean)      # For text normalization: replace_number() to convert digits into written words

grumpy_text = c(" The Fed cannot cure this inflation alone. 
    Relying on it to do so will only lead to cycles of stagflation.
    Our inflation stems from fiscal policy. We are seeing the effects of 
    about $5 trillion of printed or borrowed money, most sent out as checks. 
    But that alone need not cause inflation. The new money is reserves,  
    which pay interest, and so are equivalent to Treasury debt. 
    The United States can borrow and spend without inflation, 
    if people have faith that debt will be repaid, and that Treasury debt 
    is a good investment. Then those who wish to spend will sell it to those who 
    wish to save. With this faith, the United States has had many deficits without 
    inflation. The fact that this stimulus led to infla-tion implies a broader 
    loss of faith that the United States will repay debt.The Fed’s tools to 
    offset this inflation are blunt. By raising interest rates, 
    the Fed pushes the economy toward recession. 
    It hopes to push just enough to offset the fiscal boost.  
    But an economy with a floored fiscal gas pedal and monetary brakes is not healthy. 
    Our economy is not a simple Keynesian cup, which one can fill or empty 
    with “aggregate demand” from any source. Raising interest rates can tank 
    asset markets and raise borrowing costs, cutting house building, car purchases, 
    and corporate investment. The Fed can interrupt the flow of credit. 
    But higher interest rates don’t do much to discourage the consumption 
    spending that fiscal stimulus checks shot off—the desire to spend 
    the government’s money and debt on something. We have at best an 
    unbalanced economy. Our economy needs investment and housing. 
    Today’s demand is tomorrow’s supply.
VICIOUS CIRCLES
Slowing the economy is not guaranteed to durably lower infla-tion anyway. 
Even during the 2008 recession, with unemploy-ment above 8 percent, 
core inflation fell only from 2.4 percent in December 2007 to 0.6 percent in October
2010, and then bounced right back to 2.3 percent in December 2011. 
At this rate, even temporarily curing the 6 percent May 2022 core inflation 
would take an astronomical recession.In 1970 and 1974, 
the Fed raised interest rates more promptly and more sharply than now, 
from 4 percent to 9 percent in 1970, and from 3.5 percent to 13 percent in 1974. 
Each increase produced a bruising recession. Each lowered inflation. 
Each time, inflation roared back.The “Phillips curve,” by which the 
Fed believes slowing economic activity via interest rates lowers inflation, 
is ephemeral. Some recessions and rate hikes even feature higher inflation, 
especially in countries with fiscal problems. A recession would trigger more 
stimulus and another financial bailout. But that’s how we got in this mess in 
the first place. Those would lead to more inflation. A recession without 
the expected spending, stimulus, and bailout  would be severe.
Higher interest rates would directly worsen deficits by adding to 
the inter-est costs on the debt. In 1980, federal debt was under 
25 percent of GDP. Lowering inflation was hard enough. Now the debt 
exceeds 100 percent. Each percentage point of higher interest rate 
means $250 billion more inflation-inducing deficit.Our governments 
are now addressing inflation by borrowing or printing even more money 
to pay people’s higher bills. That will just make matters worse. 
A witch hunt for “greed,” “monopoly,” and “profiteers” will fail, 
as it has for centuries. Price controls or political pres-sure 
to lower prices will  just create long lines and worsen supply-chain snafus. 
Endless excuses and spin just convince people that our governments 
have no idea what they’re doing
The Fed can’t do it alone. To durably end inflation, 
the government also has to fix the underlying fiscal problem. 
Short-run deficit reduction, temporary measures, or accounting gimmicks 
will not work. Nor will a bout of high-tax, growth-killing “austerity,” 
which would only make matters worse. The United States has to persuade 
people that over the long haul of several decades, 
it  will return to its tradition of running small primary surpluses 
that gradually repay debts")

The analysis begins by cleaning the raw text (grumpy_text) with a pipeline of string operations: newline characters are removed, elements consisting only of digits are blanked out, and extra whitespace is trimmed and condensed. The cleaned string is then split on spaces, and purrr::reduce(paste) collapses the resulting list into a character vector with one element per word.

result = grumpy_text %>% 
      str_remove_all("\n") %>%          # drop newline characters
      str_remove_all("^[0-9]*$") %>%    # blank out any elements that are digits only
      str_trim(side = "both") %>%       # trim leading and trailing whitespace
      str_squish() %>%                  # collapse repeated internal spaces
      strsplit(split = " ") %>%         # split the cleaned string on spaces
      purrr::reduce(paste)              # returns a character vector of words
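
As a quick illustrative check on a small made-up snippet, the same pipeline returns a character vector with one element per word rather than a single reassembled string:

toy <- "The Fed cannot cure \n    this inflation alone. "
toy %>% 
      str_remove_all("\n") %>% 
      str_remove_all("^[0-9]*$") %>%
      str_trim(side = "both") %>% 
      str_squish() %>%
      strsplit(split = " ") %>%
      purrr::reduce(paste)              # one element per word, not a single string
# Expected: "The" "Fed" "cannot" "cure" "this" "inflation" "alone."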

Next, replace_number() converts any remaining digits into their written equivalents without removing them. A tibble (result_df) is then created with one row per word, pairing each word with its position in the text. This forms the foundation for tokenization.

result = replace_number(result, num.paste = FALSE, remove = FALSE)
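
For example, replace_number() spells out standalone digits while leaving the surrounding words untouched (exact wording may vary slightly across textclean versions):

replace_number("The Fed raised rates from 4 percent to 9 percent")
# Expected output along the lines of:
# "The Fed raised rates from four percent to nine percent"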

Using unnest_tokens(), the text column of result_df is broken into individual lowercase word tokens. Purely numeric tokens are filtered out, common English stop words are removed with the built-in stop_words dataset, and distinct() drops any duplicated line-word pairs, leaving only meaningful tokens in prod_tokens. A sequential linenumber column is then attached to create grumpy_tokens.

result_df <- tibble(line = 1:length(result), text = result)
tail(result_df)
# A tibble: 6 × 2
   line text     
  <int> <chr>    
1   651 primary  
2   652 surpluses
3   653 that     
4   654 gradually
5   655 repay    
6   656 debts    
head(result_df,30)
# A tibble: 30 × 2
    line text     
   <int> <chr>    
 1     1 The      
 2     2 Fed      
 3     3 cannot   
 4     4 cure     
 5     5 this     
 6     6 inflation
 7     7 alone.   
 8     8 Relying  
 9     9 on       
10    10 it       
# ℹ 20 more rows
str(result_df)
tibble [656 × 2] (S3: tbl_df/tbl/data.frame)
 $ line: int [1:656] 1 2 3 4 5 6 7 8 9 10 ...
 $ text: chr [1:656] "The" "Fed" "cannot" "cure" ...
prod_tokens = result_df %>%
      unnest_tokens(word, text) %>%                 # one lowercase token per row
      filter(!str_detect(word, "^[0-9]*$")) %>%     # drop purely numeric tokens
      anti_join(stop_words) %>%                     # remove common English stop words
      distinct()                                    # drop duplicated line-word pairs
Joining with `by = join_by(word)`
head(prod_tokens)    
# A tibble: 6 × 2
   line word     
  <int> <chr>    
1     2 fed      
2     4 cure     
3     6 inflation
4     8 relying  
5    16 lead     
6    18 cycles   
tail(prod_tokens)
# A tibble: 6 × 2
   line word     
  <int> <chr>    
1   649 running  
2   651 primary  
3   652 surpluses
4   654 gradually
5   655 repay    
6   656 debts    
grumpy_tokens = cbind.data.frame(linenumber = 1:nrow(prod_tokens), prod_tokens)
tail(grumpy_tokens)
    linenumber line      word
324        324  649   running
325        325  651   primary
326        326  652 surpluses
327        327  654 gradually
328        328  655     repay
329        329  656     debts
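
An equivalent, more pipe-friendly way to attach the running index is dplyr::row_number(); this is only a stylistic alternative to the cbind.data.frame() call above:

# Equivalent to cbind.data.frame(linenumber = 1:nrow(prod_tokens), prod_tokens)
grumpy_tokens <- prod_tokens %>%
      mutate(linenumber = row_number()) %>%
      select(linenumber, line, word)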

To visualize the most frequently used terms, the top 20 words are counted and plotted in a horizontal bar chart (g1) using ggplot2. This chart provides insight into the core vocabulary used throughout the article.

g1 <- grumpy_tokens %>%
      dplyr::count(word, sort = TRUE) %>% 
      slice(1:20) %>% 
      ggplot(aes(reorder(word, n), n, fill = word)) +
      geom_col() +
      coord_flip() + 
      xlab("Word") + 
      ylab("Count") +
      ggtitle("Top 20 Words") +
      theme(
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, size = 16),
        axis.title.x = element_text(size = 14),
        axis.title.y = element_text(size = 14)
      )
    
g1
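
Note that slice(1:20) relies on the counts already being sorted by count(sort = TRUE). A slightly more defensive variant, mirroring the slice_max() call used for g2 below, picks the 20 largest counts regardless of row order:

top_20 <- grumpy_tokens %>%
      dplyr::count(word, sort = TRUE) %>%
      slice_max(n, n = 20, with_ties = FALSE)   # keep exactly 20 rows even when counts tie
head(top_20)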

To analyze the emotional tone of the text, the NRC sentiment lexicon is loaded into nrc_dict with get_sentiments("nrc") (the first call may prompt a one-time download via the textdata package). The tokenized words are joined with this dictionary, restricted to the “positive” and “negative” categories, and the 20 most frequent sentiment-bearing words are visualized in a second horizontal bar chart (g2), color-coded by sentiment.

nrc_dict = get_sentiments("nrc")
head(nrc_dict)
# A tibble: 6 × 2
  word      sentiment
  <chr>     <chr>    
1 abacus    trust    
2 abandon   fear     
3 abandon   negative 
4 abandon   sadness  
5 abandoned anger    
6 abandoned fear     
table(nrc_dict$sentiment)

       anger anticipation      disgust         fear          joy     negative 
        1245          837         1056         1474          687         3316 
    positive      sadness     surprise        trust 
        2308         1187          532         1230 
g2 <- nrc_dict %>%
      inner_join(grumpy_tokens, by = "word") %>% 
      filter(sentiment %in% c("positive", "negative")) %>% 
      count(word, sentiment, sort = TRUE) %>%
      slice_max(n, n = 20, with_ties = FALSE) %>%
      ggplot(aes(x = reorder(word, n), y = n, fill = sentiment)) +
      geom_col() +
      coord_flip() +
      labs(title = "Top Sentiment Words", x = "Word", y = "Count", caption = "Saurabh's Work") +
      theme(
        plot.title = element_text(hjust = 0.5, size = 16),
        axis.title.x = element_text(size = 14),
        axis.title.y = element_text(size = 14)
      )
Warning in inner_join(., grumpy_tokens, by = "word"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 2610 of `x` matches multiple rows in `y`.
ℹ Row 310 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
g2
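
The many-to-many warning above is expected: some words carry several NRC tags (for example both “fear” and “negative”), and the same word can appear on several lines of the text. It is harmless, but the relationship can be declared explicitly to silence the warning, using the relationship argument available in dplyr 1.1.0 and later:

sentiment_counts <- nrc_dict %>%
      inner_join(grumpy_tokens, by = "word",
                 relationship = "many-to-many") %>%   # declare the expected match pattern
      filter(sentiment %in% c("positive", "negative")) %>%
      count(word, sentiment, sort = TRUE)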

Finally, both charts (g1 and g2) are combined side by side with cowplot::plot_grid() to create a unified view of the most frequent words and the positive and negative vocabulary used in the article.

cowplot::plot_grid(g1,g2)
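
If the combined panel needs to be exported, cowplot::save_plot() or ggplot2::ggsave() can write it to disk; the file name and dimensions here are placeholders:

combined <- cowplot::plot_grid(g1, g2)
cowplot::save_plot("top_words_and_sentiment.png",    # placeholder file name
                   combined, base_width = 12, base_height = 6)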

Conclusion

The analysis reveals that emotionally charged terms such as inflation, recession, debt, faith, and stimulus dominate the discussion. Negative sentiment is more prominent in the text, which aligns with the article's tone of economic caution and its criticism of current fiscal and monetary policy. Sentiment analysis thus highlights the emotional undertone of the article, offering both a quantitative and a qualitative view of the text.