For this assignment, I created an API key for the New York Times Most Popular API and imported the JSON data into RStudio. Using the data, I wanted to see whether any keywords came up frequently across the most popular articles.
library(httr)
## Warning: package 'httr' was built under R version 4.4.3
library(jsonlite)
library(tidyr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
# Pull the most-viewed articles from the last 30 days (substitute your own API key)
nyt_data <- fromJSON("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=Kvwbcb6A0F0rOKRfIMVlCWUPGNVbpSVn")
nyt_df <- as.data.frame(nyt_data)
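# Aside: httr is loaded above but never used. The same request could be made
# with GET(), keeping the key in a variable; this is a sketch, not the original
# approach (nyt_resp and nyt_data_httr are names introduced here):
api_key <- "Kvwbcb6A0F0rOKRfIMVlCWUPGNVbpSVn"
nyt_resp <- GET("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json",
                query = list(`api-key` = api_key))
nyt_data_httr <- fromJSON(content(nyt_resp, as = "text", encoding = "UTF-8"))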
# Split keywords column into individual rows
nyt_df_split <- nyt_df %>%
  separate_rows(results.adx_keywords, sep = ";")
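# Toy illustration (made-up string) of what separate_rows() does:
# one input row becomes one output row per semicolon-separated keyword
demo <- tibble(kw = "Trump, Donald J;United States Politics and Government")
demo_split <- separate_rows(demo, kw, sep = ";") # two rows, one keyword each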
# Replace blank cells with NA (actual NA values, not the string "NA")
nyt_df_split <- nyt_df_split %>%
  mutate(across(everything(), ~ ifelse(. == "" | is.na(.), NA, .)))
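# Equivalent, more targeted alternative (a no-op at this point, since blanks
# were already handled above): na_if() blanks out empty strings in character
# columns only, leaving list-columns and other types untouched
nyt_df_split <- nyt_df_split %>%
  mutate(across(where(is.character), ~ na_if(., "")))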
# Flatten the facet list-columns into plain character vectors
nyt_df_split <- nyt_df_split %>%
  mutate(across(c(results.des_facet, results.org_facet,
                  results.per_facet, results.geo_facet), as.character))
From a quick glance, I noticed that many of the popular articles are in the U.S. section, so I thought it would be more interesting to narrow my analysis to popular keywords within the U.S.-section articles. To improve readability, I limited the keywords shown in the graph to those that came up more than once.
# Filter only US articles
nyt_us <- nyt_df_split %>%
  filter(results.section == "U.S.")
# Count keywords in the US articles
popular_keywords <- nyt_us %>%
  count(results.adx_keywords, sort = TRUE)
# Create a bar chart of keywords that appear more than once
ggplot(popular_keywords %>% filter(n > 1),
       aes(x = reorder(results.adx_keywords, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Most Common Keywords in Popular US Articles",
       x = "Keywords",
       y = "Frequency")
Unsurprisingly, Donald Trump and US politics and government as a whole are the keywords that appear most often in the most popular US-focused articles of the last 30 days. Beyond news about the US government, the death of Gene Hackman was the next story of interest to readers of the US section of the NY Times. I then did a quick count of articles by section, since I was surprised that Donald Trump was a keyword in only 5 articles. Since there are 9 US articles in the data frame, 5 out of 9 (more than half) still seems like a significant share! A short sketch after the table double-checks that figure.
section_count <- nyt_df %>%
  count(results.section)
print(section_count)
## results.section n
## 1 Arts 1
## 2 Movies 2
## 3 Opinion 2
## 4 Polls 1
## 5 Style 1
## 6 U.S. 9
## 7 Well 2
## 8 World 2
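To make the 5-out-of-9 share explicit, here is a small sanity-check sketch. The exact keyword string "Trump, Donald J" is my assumption about how the API formats that keyword, and results.url is assumed to uniquely identify an article:
# Share of U.S. articles carrying the Trump keyword (about 5/9 per the counts above)
trump_share <- nyt_us %>%
  summarise(share = n_distinct(results.url[results.adx_keywords %in% "Trump, Donald J"]) /
              n_distinct(results.url))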