For this assignment, I continued working with the New York Times API, as I did last week. I wanted to explore whether the overall sentiment of recent top articles from the NY Times’ U.S. section shows any bias. This seems particularly relevant given the ongoing discussions about bias in news sources. My goal is to investigate whether the NY Times exhibits a negative sentiment toward recent U.S. news.
library(tidytext)
## Warning: package 'tidytext' was built under R version 4.4.3
library(textdata)
## Warning: package 'textdata' was built under R version 4.4.3
library(gutenbergr)
## Warning: package 'gutenbergr' was built under R version 4.4.3
get_sentiments("afinn")
## # A tibble: 2,477 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # ℹ 2,467 more rows
get_sentiments("bing")
## # A tibble: 6,786 × 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # ℹ 6,776 more rows
get_sentiments("nrc")
## # A tibble: 13,872 × 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # ℹ 13,862 more rows
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(httr)
## Warning: package 'httr' was built under R version 4.4.3
##
## Attaching package: 'httr'
## The following object is masked from 'package:textdata':
##
## cache_info
library(jsonlite)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
nyt_data <- fromJSON("https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Kvwbcb6A0F0rOKRfIMVlCWUPGNVbpSVn")
nyt_df <- as.data.frame(nyt_data)
articles <- nyt_df %>%
  filter(results.section == "U.S.") %>%
  select(results.title, results.abstract)
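articles_sentiment <- articles %>%
  # Join abstract and title with a space so the last word of the abstract
  # does not fuse with the first word of the title during tokenization
  unite("text", results.abstract, results.title, sep = " ") %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("nrc")) %>%
  count(sentiment, sort = TRUE)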
## Joining with `by = join_by(word)`
## Warning in inner_join(., get_sentiments("nrc")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 56 of `x` matches multiple rows in `y`.
## ℹ Row 11669 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
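The warning above appears because NRC can list one word under several sentiments, and the same word can occur more than once in the text. A minimal sketch of the fix the message suggests, using a few NRC rows copied from the lexicon preview earlier in this document and a hypothetical `tokens` table (both are stand-ins, not the real article data):

```r
library(dplyr)

# A few NRC rows copied from the lexicon preview printed earlier
nrc_sample <- tibble(
  word      = c("abandon", "abandon", "abandon", "abandoned"),
  sentiment = c("fear", "negative", "sadness", "anger")
)

# Hypothetical tokens; "abandon" appears twice, so the match is many-to-many
tokens <- tibble(word = c("abandon", "city", "abandon", "abandoned"))

# Declaring the relationship tells dplyr the duplication is expected
joined <- tokens %>%
  inner_join(nrc_sample, by = "word", relationship = "many-to-many")

print(joined)
```

Each occurrence of a word is counted once per sentiment it carries, which is the behavior the sentiment counts rely on. The `relationship` argument assumes dplyr 1.1.0 or later.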
print(articles_sentiment)
## sentiment n
## 1 positive 20
## 2 negative 16
## 3 trust 14
## 4 anger 9
## 5 anticipation 7
## 6 fear 6
## 7 joy 6
## 8 sadness 6
## 9 surprise 5
## 10 disgust 2
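To make the positive versus negative comparison easier to read, the two polarity rows can be turned into shares. This is only a sketch: the counts are copied by hand from the printed output above, and the `polarity` and `share` names are my own.

```r
library(dplyr)

# Counts copied from the printed articles_sentiment output above
articles_sentiment <- tibble(
  sentiment = c("positive", "negative", "trust", "anger", "anticipation",
                "fear", "joy", "sadness", "surprise", "disgust"),
  n = c(20, 16, 14, 9, 7, 6, 6, 6, 5, 2)
)

# Keep only the two polarity categories and compute each one's share
polarity <- articles_sentiment %>%
  filter(sentiment %in% c("positive", "negative")) %>%
  mutate(share = n / sum(n))

print(polarity)
```

With these counts, positive words make up 20 of the 36 polarity matches, so the lean toward positive is modest.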
articles_sentiment %>%
  ggplot(aes(x = reorder(sentiment, n), y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(title = "Sentiment in Most Popular NYT Articles in Last 7 Days",
       x = "Emotion",
       y = "Frequency")
library(wordcloud)
## Warning: package 'wordcloud' was built under R version 4.4.3
## Loading required package: RColorBrewer
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.4.2
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
articles %>%
  unnest_tokens(word, results.abstract) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud()
From this analysis, we can see that the top US articles from the last 7 days lean slightly positive: the NRC lexicon matched 20 positive words and 16 negative words, so the gap is small. However, after looking at the word cloud, I noticed that “trump” was placed in the positive group even though it is a name here. I suspect this is because bing scores “trump” as the ordinary word, not as a name. This may have skewed my results a bit, so next time I may try a different sentiment lexicon or filter the name out before scoring. A quick search suggests that Named Entity Recognition may also be helpful if I try this again.
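One way to filter the name out, sketched under assumptions: `sample_articles` is made-up text standing in for the NYT abstracts, and `name_words` is a hypothetical hand-built list of names to drop. The idea is to remove those tokens with `anti_join()` before joining to the bing lexicon.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample text standing in for the NYT abstracts
sample_articles <- tibble(
  results.abstract = c(
    "Trump praised the strong economy",
    "Critics fear the ruling will hurt voters"
  )
)

# Hypothetical list of proper names to exclude from sentiment scoring
name_words <- tibble(word = c("trump"))

scored <- sample_articles %>%
  unnest_tokens(word, results.abstract) %>%   # lowercases and splits into words
  anti_join(name_words, by = "word") %>%      # drop the listed names
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

print(scored)
```

A hand-built list only catches the names I think of in advance, which is why Named Entity Recognition might be the better long-term option.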