Chapter 2 Base Codes

This section reproduces code from Chapter 2 of “Text Mining with R: A Tidy Approach” (Silge & Robinson, 2017).

library(tidytext)
library(janeaustenr)
library(httr)
library(jsonlite)
library(tidyverse)
library(syuzhet)

# extracts novels, assigns line numbers
# divides into chapters
tidy_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    line_number = row_number(),
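    # chapter headings look like "CHAPTER 1" or "Chapter IV",
    # so the pattern needs a space before the number
    chapter = cumsum(str_detect(
      text, regex(
        "^chapter [\\divxlc]", ignore_case = TRUE
      )
    ))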
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text) # splits text into words

head(tidy_books)

## # A tibble: 6 × 4
##   book                line_number chapter word       
##   <fct>                     <int>   <int> <chr>      
## 1 Sense & Sensibility           1       0 sense      
## 2 Sense & Sensibility           1       0 and        
## 3 Sense & Sensibility           1       0 sensibility
## 4 Sense & Sensibility           3       0 by         
## 5 Sense & Sensibility           3       0 jane       
## 6 Sense & Sensibility           3       0 austen
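
The chapter counter above relies on a running sum of heading matches. A minimal sketch of that idea in base R, using made-up lines rather than the novels themselves (the `lines` vector is hypothetical, and `grepl()` with `perl = TRUE` stands in for `str_detect()`):

```r
# Toy lines standing in for a novel's text column (hypothetical data)
lines <- c("SENSE AND SENSIBILITY", "", "CHAPTER 1", "first paragraph text",
           "CHAPTER 2", "second paragraph text", "a chapter of accidents")

# TRUE only where a line starts with "chapter " followed by a digit or
# a roman-numeral letter; perl = TRUE makes \d work inside the brackets
is_heading <- grepl("^chapter [\\divxlc]", lines, ignore.case = TRUE, perl = TRUE)

# The running count of headings gives each line its chapter number;
# lines before the first heading stay in chapter 0
chapter <- cumsum(is_heading)
print(data.frame(lines, chapter))
```

The last toy line mentions "chapter" mid-sentence and is not counted, because the pattern is anchored to the start of the line.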

# uses Bing here as an example to label positive or negative
bing <- get_sentiments("bing")
book_sentiment <- tidy_books %>%
  inner_join(bing, by="word") %>%
  count(book, index=line_number %/%80, sentiment) %>%
  pivot_wider(
    names_from = sentiment, 
    values_from = n,
    values_fill = 0) %>%
  mutate(sentiment=positive-negative)

head(book_sentiment)

## # A tibble: 6 × 5
##   book                index negative positive sentiment
##   <fct>               <dbl>    <int>    <int>     <int>
## 1 Sense & Sensibility     0       16       32        16
## 2 Sense & Sensibility     1       19       53        34
## 3 Sense & Sensibility     2       12       31        19
## 4 Sense & Sensibility     3       15       31        16
## 5 Sense & Sensibility     4       16       34        18
## 6 Sense & Sensibility     5       16       51        35
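
The `index` column comes from integer division of the line number by 80, and the net score is the positive count minus the negative count within each section. A small base R sketch of both steps, on hypothetical data (the `words` data frame below is invented for illustration):

```r
# Hypothetical sentiment-bearing words with their line numbers
words <- data.frame(
  line_number = c(3, 40, 79, 80, 120, 159, 160),
  sentiment   = c("positive", "negative", "positive", "negative",
                  "negative", "positive", "positive")
)

# Integer division maps lines 0-79 to section 0, 80-159 to section 1, and so on
words$index <- words$line_number %/% 80

# Count positive and negative words per section, then take the difference
counts <- table(words$index, words$sentiment)
net <- counts[, "positive"] - counts[, "negative"]
print(net)
```

Sections with more negative than positive words get a negative net score, which is what the plotted bars below zero represent.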

Visualize Bing sentiments

# x is the 80-line section index; y is net sentiment (positive - negative)
ggplot(book_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, scales = "free_x") +
  labs(x = "Index (80-line section)",
       y = "Net sentiment")

NYT Political Articles with Syuzhet

This section imports political news from the NYT Top Stories API, calculates a numeric sentiment score for each article using syuzhet, and visualizes the distribution of those scores.

# conceal api key
api_key <- Sys.getenv("NYT_API_KEY")

# import political articles from top stories API
url <- paste0("https://api.nytimes.com/svc/topstories/v2/politics.json?api-key=", api_key)

res <- GET(url)
# stop early on a failed request instead of parsing an error response
if (status_code(res) != 200) {
  stop("Request failed with status: ", status_code(res))
}

# parse the JSON body once; results holds one row per article
nyt_data <- fromJSON(content(res, "text", encoding = "UTF-8"), flatten = TRUE)
df <- nyt_data$results %>%
  transmute(
    title = title,
    abstract = abstract,
    section = section,
    published_date = as.Date(published_date),
    url = url,
    text = paste(title, abstract, sep = ".")
  )
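
To see what the parsing step produces without calling the API, the sketch below runs the same transformations on a hand-written payload. The JSON string, its values, and the example.com URLs are hypothetical; only the field names are taken from the code above.

```r
library(jsonlite)

# Hypothetical payload with the same fields the code above selects
payload <- '{
  "status": "OK",
  "results": [
    {"title": "First headline", "abstract": "First summary.",
     "section": "us", "published_date": "2025-11-01T09:30:00-04:00",
     "url": "https://example.com/a"},
    {"title": "Second headline", "abstract": "Second summary.",
     "section": "us", "published_date": "2025-11-02T18:05:00-05:00",
     "url": "https://example.com/b"}
  ]
}'

parsed <- fromJSON(payload, flatten = TRUE)

# results is parsed into a data frame with one row per article
articles <- parsed$results

# as.Date() keeps the leading YYYY-MM-DD part of the timestamp
articles$published_date <- as.Date(articles$published_date)

# title and abstract are joined into the single string that gets scored
articles$text <- paste(articles$title, articles$abstract, sep = ".")
print(articles[, c("title", "published_date", "text")])
```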

Let’s move on to the sentiment analysis using syuzhet.

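Syuzhet is a lexicon-based sentiment method that turns raw text into numeric values. With method = "syuzhet", get_sentiment() looks up the words of each text in the syuzhet lexicon and sums the matching word scores; the total is not divided by the number of tokens, so longer texts can reach larger absolute scores. A positive total means the text leans positive overall, and a negative total means it leans negative.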
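
The general idea of lexicon-based scoring can be sketched with a toy lexicon. Everything below is an assumption for illustration: the four word values are made up, `score_text()` is a hypothetical helper, and syuzhet's own dictionary and tokenizer differ. The helper reports both the summed score and, for comparison, the sum divided by the token count.

```r
# Made-up lexicon for illustration only; these are not syuzhet's actual values
toy_lexicon <- c(peace = 0.8, win = 0.5, strike = -0.5, anger = -0.75)

score_text <- function(text, lexicon) {
  # Lowercase and split on anything that is not a letter or apostrophe
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  # Look up each token; words missing from the lexicon contribute 0
  values <- lexicon[tokens]
  values[is.na(values)] <- 0
  total <- sum(values)
  c(sum = total, per_token = total / length(tokens))
}

res <- score_text("Anger over the strike. Talks win praise.", toy_lexicon)
print(res)
```

Here two negative words outweigh one positive word, so both the sum and the per-token value are negative.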

texts <- df$text
# create a new variable for sentiment score per article
df$sentiment_score <- get_sentiment(texts, method = "syuzhet")
head(df %>% select(title, sentiment_score))

##                                                                         title
## 1 New Weapons Testing Won’t Include Nuclear Explosions, Energy Secretary Says
## 2     N.Y.C. rabbi moves off sidelines to condemn Mamdani amid Jewish divide.
## 3     Syria’s President to Visit Washington for First Time Since Taking Power
## 4                Latest Strike on Boat in Caribbean Sea Kills 3, Hegseth Says
## 5   Food Stamp Cuts Expose Trump’s Strategy to Use Shutdown to Advance Agenda
## 6             Anger Over ICE Raids Is Driving Some Latino Voters to the Polls
##   sentiment_score
## 1            2.00
## 2           -0.25
## 3            0.40
## 4           -2.30
## 5            0.70
## 6           -2.85

summary(df$sentiment_score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -2.950  -1.050   0.400  -0.018   0.850   2.250

# the middle 50% of data is between [-1.05, 0.85]

Visualization

# histogram to check the distribution
ggplot(df, aes(x = sentiment_score))+ 
  # binwidth overrides bins, so only binwidth is set
  geom_histogram(binwidth = 0.4, fill = "steelblue", color = "white") +
  labs(
    title = "Sentiment Distribution of NYT Political Articles",
    x = "Sentiment Score",
    y = "Count"
  )

Among the 25 political articles, most score slightly positive, between about 0 and 1, while a few are strongly negative, which makes the distribution left-skewed. This is expected for political coverage, which often contains words about conflict, criticism, and political bias that pull scores down. The negative tail drags the mean to about zero (-0.018), but the median of 0.4 shows that the typical article is mildly positive.
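
A left skew can also be checked numerically. The sketch below uses a moment-based skewness formula on hypothetical scores (the `scores` vector is invented, not the NYT data, and `skewness()` is a helper defined here, not a base R function):

```r
# Moment-based sample skewness: negative values indicate a longer left tail
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Hypothetical scores: most mildly positive, two strongly negative
scores <- c(0.4, 0.5, 0.7, 0.85, 1.0, 0.25, -2.3, -2.85)

print(skewness(scores))                # negative for this toy sample
print(mean(scores) < median(scores))   # the mean sits below the median
```

Applied to `df$sentiment_score`, the same function would give a single number to report alongside the histogram.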

ggplot(df, aes(sample = sentiment_score)) +
  stat_qq() +
  stat_qq_line(color = "red", linetype = "dashed") +
  labs(
    title = "Q-Q Plot of Sentiment Scores",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles")

The right tail lies on the line, so the most positive articles are consistent with a normal distribution. The middle sits slightly above the line, meaning those quantiles are higher than a normal distribution would predict. The left tail falls below the line, which matches the histogram: the most negative articles are more extreme than expected. The Q-Q plot therefore suggests that the scores are only approximately normal, and with just 25 observations, deviations of this size are hard to separate from sampling noise.
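
For readers unfamiliar with how the points of a normal Q-Q plot are built, here is a minimal base R sketch. It uses simulated data (`x` is a hypothetical sample of 25 normal draws, not the article scores) and checks the hand-computed quantiles against `qqnorm()`:

```r
set.seed(1)
# Hypothetical sample of 25 values drawn from a normal distribution
x <- rnorm(25)

# Theoretical quantiles: normal quantiles at evenly spaced probabilities
theoretical <- qnorm(ppoints(length(x)))

# Sample quantiles: the observed values in increasing order
sample_q <- sort(x)

# qqnorm() returns the same pairs, in the original order of x
qq <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(qq$x), theoretical))
print(all.equal(sort(qq$y), sample_q))
```

Each plotted point pairs the k-th smallest observation with the k-th theoretical quantile, so points below the line in the left tail correspond to observations that are more negative than a normal distribution would predict.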

Conclusion

An analysis of 25 political articles from the NYT Top Stories API, scored on title and abstract with the syuzhet lexicon, indicates that the typical article is mildly positive in tone (median 0.4), although the mean is close to zero. The distribution of scores is slightly left-skewed, with most articles showing mild positivity and a few strongly negative outliers. The Q-Q plot agrees with the histogram: the right tail aligns with normality, while the left tail is heavier. Overall, the sample suggests that NYT political articles are fairly balanced in tone, with occasional strongly negative pieces.

Sentiment Analysis with Political Articles and Syuzhet

Haoming Chen

2025-11-02
