Data 607 Assignment 9

Overview

In this assignment, I use the New York Times Top Stories API to import the JSON listing of stories on the NYT home page and analyze which sections of the paper they come from.

Importing Data

In this code block, I load the necessary libraries and import the JSON data.

library(jsonlite)
library(tidyr)
library(dplyr)


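# Read the key from an environment variable (NYT_API_KEY, an assumed name) rather than hard-coding it
api_key <- Sys.getenv("NYT_API_KEY")
times_data <- fromJSON(paste0("https://api.nytimes.com/svc/topstories/v2/home.json?api-key=", api_key))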

times_data_frame <- as.data.frame(times_data)
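To see why every story column used later carries a `results.` prefix, here is a minimal offline sketch. The JSON string `sample_json` is a hypothetical, heavily trimmed stand-in for the Top Stories response (only `status`, `num_results`, and two fields per story, all values invented), not real API output.

```r
library(jsonlite)

# Hypothetical, trimmed response shaped like the Top Stories payload
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"section": "us", "title": "Story A"},
    {"section": "movies", "title": "Story B"}
  ]
}'

# fromJSON() simplifies the array of story objects into a data frame
sample_data <- fromJSON(sample_json)
is.data.frame(sample_data$results)  # TRUE

# as.data.frame() recycles the scalar fields across rows and prefixes the
# nested story columns with "results."
sample_frame <- as.data.frame(sample_data)
names(sample_frame)
# c("status", "num_results", "results.section", "results.title")
```

In the real response the same mechanism produces columns such as `results.section`, `results.title`, and the nested `results.multimedia`.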

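In this code block, I convert the list-columns of facet keywords (descriptors, organizations, people, and places) to character strings and unnest the nested multimedia data. Unnesting both widens the data frame (one new column per multimedia field) and lengthens it (one row per image for each story).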

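# Convert the list-columns of facet keywords to character strings
times_data_frame$results.des_facet <- as.character(times_data_frame$results.des_facet)
times_data_frame$results.org_facet <- as.character(times_data_frame$results.org_facet)
times_data_frame$results.per_facet <- as.character(times_data_frame$results.per_facet)
times_data_frame$results.geo_facet <- as.character(times_data_frame$results.geo_facet)

# Unnest the multimedia data frames: one row per image, columns prefixed
# "results.multimedia."; keep_empty = TRUE keeps stories that have no images
times_data_frame_wide <- times_data_frame %>%
  unnest(results.multimedia, names_sep = ".", keep_empty = TRUE)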
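A small self-contained sketch of what this step does, using a hypothetical two-story tibble (the titles, keywords, and image file names are invented). It shows that `as.character()` turns each list element into deparsed R code, that joining the keywords with `paste(..., collapse = "; ")` is one more readable alternative, and that `unnest()` drops a story whose multimedia entry is empty unless `keep_empty = TRUE` is set.

```r
library(tidyr)
library(dplyr)

# Hypothetical stories: Story B has no keywords and no images
stories <- tibble(
  results.title = c("Story A", "Story B"),
  results.des_facet = list(c("Elections", "Polls"), character(0)),
  results.multimedia = list(
    data.frame(url = c("a1.jpg", "a2.jpg"), format = c("Large", "Thumb")),
    NULL
  )
)

# as.character() deparses each list element, e.g. 'c("Elections", "Polls")'
deparsed <- as.character(stories$results.des_facet)

# Joining the keywords instead gives "Elections; Polls", and "" when empty
collapsed <- vapply(stories$results.des_facet, paste, character(1), collapse = "; ")

# Default unnest(): one row per image, and Story B (NULL multimedia) is dropped
dropped <- stories %>% unnest(results.multimedia, names_sep = ".")

# keep_empty = TRUE keeps Story B, with NA in the multimedia columns
kept <- stories %>% unnest(results.multimedia, names_sep = ".", keep_empty = TRUE)
```

Because the analysis below counts stories rather than images, keeping image-less stories matters more than the extra multimedia columns.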

Data Analysis

In this code block, I reduce the data frame to one row per article, count how many homepage articles come from each section of the paper, and display the results in a table and a bar plot.

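analysis_data <- times_data_frame_wide %>%
  # Keep one row per article (unnesting repeated each story once per image)
  select(results.section, results.subsection, results.title) %>%
  distinct() %>%
  # Count the articles in each section
  group_by(results.section) %>%
  mutate(section_count = n()) %>%
  ungroup() %>%
  # Reduce to one row per section, largest sections first
  select(results.section, section_count) %>%
  distinct() %>%
  arrange(desc(section_count))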
  
library(knitr)

kable(analysis_data, format = "pipe", caption = "New York Times Homepage Breakdown", col.names = c("Section", "# of Articles"), align = "lc")

|Section  | # of Articles |
|:--------|:-------------:|
|us       |       6       |
|world    |       4       |
|nyregion |       4       |
|opinion  |       4       |
|upshot   |       1       |
|business |       1       |
|movies   |       1       |
|style    |       1       |
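Because unnesting repeats each story once per image, the `distinct()` step is what keeps a story from being counted more than once. The sketch below uses a hypothetical toy frame (invented sections and titles) to check that dplyr's `count()` gives the same per-section totals as the `group_by()`/`mutate()`/`distinct()` chain used above. One caveat: `count(sort = TRUE)` may order tied sections differently from the chain, so the check compares the two after sorting by section.

```r
library(dplyr)

# Hypothetical unnested frame: "Story A" appears twice (one row per image)
toy <- tibble(
  results.section = c("us", "us", "us", "world", "movies"),
  results.subsection = "",
  results.title = c("Story A", "Story A", "Story C", "Story D", "Story E")
)

# The chain used above: one row per article, then a per-section count
chain_counts <- toy %>%
  select(results.section, results.subsection, results.title) %>%
  distinct() %>%
  group_by(results.section) %>%
  mutate(section_count = n()) %>%
  ungroup() %>%
  select(results.section, section_count) %>%
  distinct() %>%
  arrange(desc(section_count))

# A shorter alternative with count()
count_counts <- toy %>%
  distinct(results.section, results.subsection, results.title) %>%
  count(results.section, name = "section_count", sort = TRUE)
```

Here "us" gets 2 rather than 3, since the duplicated "Story A" row is collapsed before counting.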

library(ggplot2)

# Horizontal bar chart of articles per section, largest section at the top
ggplot(data = analysis_data, aes(x = reorder(results.section, section_count), y = section_count, fill = results.section)) +
  geom_col(show.legend = FALSE) +
  labs(title = "New York Times Homepage Breakdown", x = "Section", y = "# of Articles") +
  # Integer ticks from 1 up to the largest section count
  scale_y_continuous(breaks = seq_len(max(analysis_data$section_count))) +
  coord_flip()

Findings and Recommendations

US stories make up the largest share of the New York Times homepage (6 of the 22 articles), which is disappointing but not surprising. The world section is tied for second place with the NY region and opinion sections (4 articles each). We’re currently in a news cycle with a very high-profile international story, which makes me wonder how often the world section ranks this high. I was also curious what the one movie article on the homepage was (the code I used to answer this is shown below), and it turned out to be an article about Nicolas Cage. I’m not sure what I was expecting, but it wasn’t that.
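The comparisons in this paragraph can be made explicit. This sketch re-enters the section counts from the table above by hand (so it runs without an API key) and computes each section's share of the homepage and a tie-aware rank with `min_rank()`; the same `mutate()` could be applied directly to `analysis_data`.

```r
library(dplyr)

# Section counts copied from the homepage table above
homepage_counts <- tibble(
  results.section = c("us", "world", "nyregion", "opinion",
                      "upshot", "business", "movies", "style"),
  section_count = c(6, 4, 4, 4, 1, 1, 1, 1)
)

homepage_counts <- homepage_counts %>%
  mutate(
    share = section_count / sum(section_count),  # fraction of all 22 articles
    rank = min_rank(desc(section_count))         # tied sections share a rank
  )
```

On these counts, US stories account for 6/22 (about 27%) of the homepage, and world, nyregion, and opinion all share rank 2. Saving this table on different days would be one way to track how often the world section ranks this high.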

# Pull the title and abstract of the homepage's movie article(s), one row each
movie_frontpage <- times_data_frame_wide %>%
  filter(results.section == "movies") %>%
  select(results.section, results.title, results.abstract) %>%
  distinct()

kable(movie_frontpage, format = "pipe", caption = "New York Times Homepage - Movie Section", col.names = c("Section", "Title", "Abstract"), align = "lll")

|Section |Title |Abstract |
|:-------|:-----|:--------|
|movies |With ‘Dream Scenario,’ Nicolas Cage Reclaims the Memes |In his new dark comedy, the star plays a man who begins popping up in people’s dreams. It’s a metaphor for viral fame that he found cathartic. |
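The movie lookup generalizes to any section. `section_articles()` below is a hypothetical helper, not part of the original analysis: a minimal sketch that assumes a data frame with the `results.section`, `results.title`, and `results.abstract` columns produced earlier. It is demonstrated on an invented toy frame so it runs offline.

```r
library(dplyr)

# Hypothetical helper: one row per article in the requested section
section_articles <- function(df, section_name) {
  df %>%
    # .env$ refers to the function argument, not a data column
    filter(results.section == .env$section_name) %>%
    select(results.section, results.title, results.abstract) %>%
    distinct()
}

# Invented toy data; the repeated row mimics the duplicates left by unnesting images
toy_wide <- tibble(
  results.section = c("movies", "movies", "style"),
  results.title = c("Film story", "Film story", "Style story"),
  results.abstract = c("About a film.", "About a film.", "About style.")
)

movies <- section_articles(toy_wide, "movies")
```

With the real data, a call such as `section_articles(times_data_frame_wide, "world")` would list the world stories in the same way.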