In this assignment, I use the New York Times Top Stories API to import the JSON for the stories on the paper's homepage and analyze it.
In this code block, I load the necessary libraries and import the JSON data.
library(jsonlite)
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
times_data <- fromJSON("https://api.nytimes.com/svc/topstories/v2/home.json?api-key=wdVU8ZzTUEpitgVaTSo3nT2jDAaKKvXl")
times_data_frame <- as.data.frame(times_data)
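Hard-coding the API key in the request URL works for a one-off run, but it also publishes the key with the report. A minimal sketch of one alternative is below, assuming the key is stored in an environment variable; the helper `build_topstories_url()` and the variable name `NYT_API_KEY` are hypothetical names chosen for this illustration, not part of the NYT API or of `jsonlite`.

```r
# Hypothetical helper: builds the Top Stories request URL for one section.
# The key defaults to the NYT_API_KEY environment variable (an assumed name).
build_topstories_url <- function(section = "home",
                                 api_key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("No API key found: set the NYT_API_KEY environment variable first.")
  }
  paste0("https://api.nytimes.com/svc/topstories/v2/",
         section, ".json?api-key=", api_key)
}

# Usage (left commented out because it needs network access and a real key):
# times_data <- jsonlite::fromJSON(build_topstories_url())
```

The key could then be set once per session with `Sys.setenv()` or in a `.Renviron` file, so it never appears in the knitted document.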
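In this code block, I convert the facet list-columns to character strings and unnest the multimedia column, which widens the data frame (one column per multimedia field) and lengthens it (one row per multimedia item).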
times_data_frame$results.des_facet <- as.character(times_data_frame$results.des_facet)
times_data_frame$results.org_facet <- as.character(times_data_frame$results.org_facet)
times_data_frame$results.per_facet <- as.character(times_data_frame$results.per_facet)
times_data_frame$results.geo_facet <- as.character(times_data_frame$results.geo_facet)
times_data_frame_wide <- times_data_frame %>%
  unnest(results.multimedia, names_sep = ".")
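To illustrate what the `as.character()` conversion does to a list-column, here is a small sketch on made-up data. The `facets` list and its values are hypothetical stand-ins for a column such as `results.des_facet`. Entries with several tags are deparsed into R syntax (a string beginning with `c(`), so collapsing each entry with `paste()` is shown as one possible alternative when readable tags are wanted.

```r
# Toy stand-in for a facet list-column (hypothetical values)
facets <- list(c("Politics", "Elections"), "Movies", character(0))

# as.character() on a list deparses multi-element entries into R syntax,
# so the first entry becomes a single string that starts with "c("
deparsed <- as.character(facets)

# Alternative: collapse each entry into one delimited string;
# empty entries become empty strings
collapsed <- vapply(facets, function(x) paste(x, collapse = "; "), character(1))

print(deparsed)
print(collapsed)
```

Either form yields a plain character column; the collapsed version is simply easier to read or split again later.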
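In this code block, I count the number of articles from each section of the paper that appear on the homepage and display the results in a table and bar plot. Because unnesting repeats each article once per multimedia item, I first reduce the data to distinct section, subsection, and title combinations.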
analysis_data <- times_data_frame_wide %>%
  select(results.section, results.subsection, results.title) %>%
  distinct() %>%
  group_by(results.section) %>%
  mutate(section_count = n()) %>%
  ungroup() %>%
  select(results.section, section_count) %>%
  distinct() %>%
  arrange(desc(section_count))
library(knitr)
kable(analysis_data, format = "pipe", caption = "New York Times Homepage Breakdown", col.names = c("Section", "# of Articles"), align = "lc")
Table: New York Times Homepage Breakdown

|Section  | # of Articles |
|:--------|:-------------:|
|us       |       6       |
|world    |       4       |
|nyregion |       4       |
|opinion  |       4       |
|upshot   |       1       |
|business |       1       |
|movies   |       1       |
|style    |       1       |
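The per-section tally above can also be sketched more compactly with `dplyr::count()`. The example below runs on a made-up data frame (`toy_articles` and its values are hypothetical) rather than the live API data. Note that `count(sort = TRUE)` may order tied sections differently from the `arrange()` call used in the analysis, so this is an equivalent count, not a drop-in reproduction of the table.

```r
library(dplyr)

# Hypothetical toy data: article "A" appears twice, as it would after
# unnesting a story that has two multimedia items
toy_articles <- data.frame(
  section = c("us", "us", "world", "us", "movies"),
  title   = c("A", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

toy_counts <- toy_articles %>%
  distinct(section, title) %>%                            # one row per article
  count(section, name = "section_count", sort = TRUE)     # articles per section

print(toy_counts)
```

Here `distinct()` plays the same deduplicating role as in the original pipeline, and `count()` replaces the `group_by()`, `mutate()`, `select()`, `distinct()` sequence.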
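library(ggplot2)
# Horizontal bar chart of article counts, with sections ordered by count
ggplot(data = analysis_data, aes(x = reorder(results.section, section_count), y = section_count, fill = results.section)) +
  geom_col(show.legend = FALSE) +
  labs(title = "New York Times Homepage Breakdown", x = "Section", y = "# of Articles") +
  # Integer breaks up to the largest count, so the axis still fits when the homepage changes
  scale_y_continuous(breaks = seq_len(max(analysis_data$section_count))) +
  coord_flip()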
Unsurprisingly, the New York Times homepage focuses mostly on US stories (disappointed, not surprised). World is tied for second place with the NY region and opinion sections. We’re currently in a news cycle with a very high-profile international story, which makes me wonder how often the world section ranks this high. I was curious to know what the one movie article on the homepage was (the code I used to answer this is shown below), and it turned out to be an article about Nicolas Cage. I’m not sure what I was expecting, but it wasn’t that.
movie_frontpage <- times_data_frame_wide %>%
  filter(results.section == "movies") %>%
  select(results.section, results.title, results.abstract) %>%
  distinct()
kable(movie_frontpage, format = "pipe", caption = "New York Times Homepage - Movie Section", col.names = c("Section", "Title", "Abstract"), align = "lll")
Table: New York Times Homepage - Movie Section

|Section |Title |Abstract |
|:-------|:-----|:--------|
|movies |With ‘Dream Scenario,’ Nicolas Cage Reclaims the Memes |In his new dark comedy, the star plays a man who begins popping up in people’s dreams. It’s a metaphor for viral fame that he found cathartic. |