Data 607 Week 10 Assignment

Introduction

For this assignment, I continued working with the New York Times API, as I did last week. I want to explore whether the most popular recent articles in the NY Times’ U.S. section lean positive or negative in overall sentiment, which seems particularly relevant given the ongoing discussions about bias in news sources. My goal is to check whether the NY Times’ recent U.S. coverage actually skews negative.

Load sentiment datasets

library(tidytext)

## Warning: package 'tidytext' was built under R version 4.4.3

library(textdata)

## Warning: package 'textdata' was built under R version 4.4.3

library(gutenbergr)

## Warning: package 'gutenbergr' was built under R version 4.4.3

get_sentiments("afinn")

## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ℹ 2,467 more rows

get_sentiments("bing")

## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ℹ 6,776 more rows

get_sentiments("nrc")

## # A tibble: 13,872 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ℹ 13,862 more rows

Load other necessary packages

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.4.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(stringr)
library(httr)

## Warning: package 'httr' was built under R version 4.4.3

## 
## Attaching package: 'httr'

## The following object is masked from 'package:textdata':
## 
##     cache_info

library(jsonlite)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.4.2

Load NY Times articles

nyt_data <- fromJSON("https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Kvwbcb6A0F0rOKRfIMVlCWUPGNVbpSVn")

nyt_df <- as.data.frame(nyt_data)
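
A key typed directly into the request URL ends up in every shared copy of the report. As a minimal sketch, assuming the key is stored in an environment variable (the name `NYT_API_KEY` and the helper `build_nyt_url()` are hypothetical and not part of the original analysis), the URL could be built like this:

```r
# Hypothetical helper: builds the Most Popular "viewed" URL used above,
# reading the key from an environment variable instead of the script.
build_nyt_url <- function(period = 7, key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(key)) {
    stop("NYT_API_KEY is not set; add it to ~/.Renviron or call Sys.setenv()")
  }
  paste0(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/",
    period, ".json?api-key=", key
  )
}

# Usage (needs a valid key and network access, so it is left commented out):
# nyt_data <- jsonlite::fromJSON(build_nyt_url())
```

The rest of the pipeline would stay the same, since only the way the URL string is assembled changes.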

Extract necessary variables

articles <- nyt_df %>%
  filter(results.section == "U.S.") %>%
  select(results.title, results.abstract)

Count sentiment categories using nrc

articles_sentiment <- articles %>%
  unite("text", results.abstract, results.title, sep = "") %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("nrc")) %>%
  count(sentiment, sort=TRUE)

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("nrc")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 56 of `x` matches multiple rows in `y`.
## ℹ Row 11669 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
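
Two details of the chunk above are worth a closer look. With `sep = ""`, `unite()` leaves no whitespace between the last word of the abstract and the first word of the title, so the tokenizer may treat them as one token that matches nothing in the lexicon. The join warning appears because nrc assigns several labels to the same word, which is expected here. The sketch below uses invented toy data and a toy lexicon (not real NYT output or the real nrc table) to show a space separator and an explicit `relationship = "many-to-many"`:

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy article, invented for illustration only
toy <- tibble(
  abstract = "Officials abandon the plan.",
  title    = "Critics abandon hope, claim victory"
)

# Toy lexicon shaped like nrc: one word can carry several labels
toy_lexicon <- tibble(
  word      = c("abandon", "abandon", "victory"),
  sentiment = c("fear", "negative", "positive")
)

toy_counts <- toy %>%
  # A space keeps "plan." and "Critics" as separate tokens
  unite("text", abstract, title, sep = " ") %>%
  unnest_tokens(word, text) %>%
  # Declaring the relationship documents that repeated matches are intended
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") %>%
  count(sentiment, sort = TRUE)

print(toy_counts)
```

Applying the same two arguments to the real data would likely change the counts slightly, so the table would need to be regenerated rather than reused.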

print(articles_sentiment)

##       sentiment  n
## 1      positive 20
## 2      negative 16
## 3         trust 14
## 4         anger  9
## 5  anticipation  7
## 6          fear  6
## 7           joy  6
## 8       sadness  6
## 9      surprise  5
## 10      disgust  2
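
To put the positive and negative rows on a common scale, the two counts from the table above can be turned into a share and a net difference. This is only a quick arithmetic sketch using the printed numbers; the variable names are illustrative:

```r
# Counts copied from the printed table above
polarity <- c(positive = 20, negative = 16)

# Share of polarity-tagged matches that are positive
positive_share <- polarity[["positive"]] / sum(polarity)

# Net difference between positive and negative matches
net_difference <- polarity[["positive"]] - polarity[["negative"]]

round(positive_share, 3)  # 0.556
net_difference            # 4
```

A gap of 4 matches out of 36 is small, so a different week of articles could plausibly tip the balance the other way.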

Data visualization

articles_sentiment %>%
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +  
  labs(title = "Sentiment in Most Popular NYT Articles in Last 7 Days",
       x = "Emotion",
       y = "Frequency")

Word cloud analysis

library(wordcloud)

## Warning: package 'wordcloud' was built under R version 4.4.3

## Loading required package: RColorBrewer

library(reshape2)

## Warning: package 'reshape2' was built under R version 4.4.2

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

articles %>%
  unnest_tokens(word, results.abstract) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud()

Conclusion

From this analysis, positive words slightly outnumbered negative ones (20 vs. 16 using the nrc lexicon) in the most popular U.S. articles from the last 7 days, so the difference is small. However, in the word cloud I noticed that “trump” was placed in the positive bucket even though it appears in these articles as a name. I suspect this is because the bing lexicon scores “trump” as an ordinary word rather than recognizing it as a name. This may have skewed my results somewhat, so next time I would try a different sentiment lexicon or filter the name out before scoring. A quick search suggests that Named Entity Recognition could also help if I try this again.
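
The filtering idea could look roughly like the sketch below. It uses a small invented lexicon and token list so it runs without downloading anything; in the real analysis, `toy_lexicon` would be replaced by `get_sentiments("bing")`. The `name_words` table is a hypothetical, hand-maintained list of names to exclude.

```r
library(dplyr)

# Toy stand-ins, invented for illustration only
toy_lexicon <- tibble(
  word      = c("trump", "win", "crisis"),
  sentiment = c("positive", "positive", "negative")
)
tokens <- tibble(word = c("trump", "win", "crisis", "trump"))

# Hypothetical list of names that should not be scored as ordinary words
name_words <- tibble(word = c("trump"))

# Remove the names from the lexicon before joining
filtered_lexicon <- toy_lexicon %>%
  anti_join(name_words, by = "word")

scored <- tokens %>%
  inner_join(filtered_lexicon, by = "word") %>%
  count(sentiment)

print(scored)
```

Dropping the name from the lexicon, rather than from the articles, keeps the original text intact and makes the exclusion easy to review or extend.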