For this assignment, I created an API key for the New York Times Most Popular API and imported the JSON data into RStudio. Using the data, I wanted to see whether any keywords came up frequently across the most popular articles.
library(httr)
## Warning: package 'httr' was built under R version 4.4.3
library(jsonlite)
library(tidyr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
# Pull the most-viewed articles from the last 30 days (substitute your own API key)
nyt_data <- fromJSON("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=Kvwbcb6A0F0rOKRfIMVlCWUPGNVbpSVn")
nyt_df <- as.data.frame(nyt_data)
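# Aside: httr is loaded above but never used. The same request could be made
# with GET(), keeping the key in a variable; this is a sketch, not the original
# approach (nyt_resp and nyt_data_httr are names introduced here):
api_key <- "Kvwbcb6A0F0rOKRfIMVlCWUPGNVbpSVn"
nyt_resp <- GET("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json",
                query = list(`api-key` = api_key))
nyt_data_httr <- fromJSON(content(nyt_resp, as = "text", encoding = "UTF-8"))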
# Split keywords column into individual rows
nyt_df_split <- nyt_df %>%
  separate_rows(results.adx_keywords, sep = ";")
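# Toy illustration (made-up string) of what separate_rows() does:
# one input row becomes one output row per semicolon-separated keyword
demo <- tibble(kw = "Trump, Donald J;United States Politics and Government")
demo_split <- separate_rows(demo, kw, sep = ";") # two rows, one keyword each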
# Replace blank cells with NA (actual NA values, not the string "NA")
nyt_df_split <- nyt_df_split %>%
  mutate(across(everything(), ~ ifelse(. == "" | is.na(.), NA, .)))
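# Equivalent, more targeted alternative (a no-op at this point, since blanks
# were already handled above): na_if() blanks out empty strings in character
# columns only, leaving list-columns and other types untouched
nyt_df_split <- nyt_df_split %>%
  mutate(across(where(is.character), ~ na_if(., "")))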
# Flatten the facet list-columns into plain character vectors
nyt_df_split <- nyt_df_split %>%
  mutate(across(c(results.des_facet, results.org_facet,
                  results.per_facet, results.geo_facet), as.character))
From a quick glance, I noticed that many of the popular articles are in the U.S. section, so I thought it would be more interesting to narrow my analysis to popular keywords within the U.S.-section articles. To improve readability, I limited the keywords shown in the graph to those that came up more than once.
# Filter only US articles
nyt_us <- nyt_df_split %>%
  filter(results.section == "U.S.")
# Count keywords in the US articles
popular_keywords <- nyt_us %>%
  count(results.adx_keywords, sort = TRUE)
# Create a bar chart of keywords that appear more than once
ggplot(popular_keywords %>% filter(n > 1),
       aes(x = reorder(results.adx_keywords, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Most Common Keywords in Popular US Articles",
       x = "Keywords",
       y = "Frequency")
Unsurprisingly, Donald Trump and US politics and government as a whole are the keywords that appear most often in the most popular US-focused articles of the last 30 days. Beyond news about the US government, the death of Gene Hackman was the next story of interest to readers of the US section of the NY Times. I then did a quick count of articles by section, since I was surprised that Donald Trump was a keyword in only 5 articles. Since there are 9 US articles in the data frame, 5 out of 9 (more than half) still seems like a significant share! A short sketch after the table double-checks that figure.
section_count <- nyt_df %>%
  count(results.section)
print(section_count)
## results.section n
## 1 Arts 1
## 2 Movies 2
## 3 Opinion 2
## 4 Polls 1
## 5 Style 1
## 6 U.S. 9
## 7 Well 2
## 8 World 2
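To make the 5-out-of-9 share explicit, here is a small sanity-check sketch. The exact keyword string "Trump, Donald J" is my assumption about how the API formats that keyword, and results.url is assumed to uniquely identify an article:
# Share of U.S. articles carrying the Trump keyword (about 5/9 per the counts above)
trump_share <- nyt_us %>%
  summarise(share = n_distinct(results.url[results.adx_keywords %in% "Trump, Donald J"]) /
              n_distinct(results.url))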