APIs are a convenient way of obtaining data from the web without resorting to scraping techniques. In this assignment, data is pulled from the New York Times Most Popular API and stored in R as a data frame. The most viewed articles from the week starting March 24, 2019 are retrieved.
library(jsonlite)
library(dplyr)
library(magrittr)
library(ggplot2)
The jsonlite package makes it easy to get the data into R: its fromJSON function requests a URL and parses the JSON response into R objects.
# url obtained from https://developer.nytimes.com/docs/most-popular-product/1/overview
# the number before ".json" is the period in days (1, 7, or 30); 7 would cover a full week
url <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/1.json?q=&facet_filter=true&api-key=UqvZ79vy3AyYhFwEMTHCT9mpyzlC6GPp"
# fromJSON returns a nested list; flatten = TRUE expands nested data frames into
# ordinary columns so that the result can be coerced to a single data frame
NYT.mostviewed <- fromJSON(url, flatten = TRUE) %>% data.frame()
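To make the effect of the flatten argument concrete, here is a minimal sketch that parses a small inline JSON string instead of calling the API. The payload and its field names (title, media, type, count) are made up for illustration and are not the actual NYT response.

```r
library(jsonlite)

# Hypothetical payload mimicking a nested API response (not real NYT data)
json_txt <- '[
  {"title": "A", "media": {"type": "image", "count": 2}},
  {"title": "B", "media": {"type": "video", "count": 1}}
]'

# Without flatten, the nested object is kept as a data-frame column named "media"
nested <- fromJSON(json_txt)

# With flatten = TRUE, the nested fields become top-level columns
flat <- fromJSON(json_txt, flatten = TRUE)

names(nested)
names(flat)
```

The flattened version has one column per nested field, which is why the API result above can be passed straight to data.frame() and then to select().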
The returned data frame can now be tidied. The variables of interest are selected and renamed to give a more concise table. Finally, a bar plot shows which sections the most viewed articles belong to.
# Select the metadata variables of interest and rename them:
# article url, section, author, title, publication date, and view rank
NYT.mostviewed %<>% select(link           = results.url,
                           section        = results.section,
                           author         = results.byline,
                           title          = results.title,
                           published_date = results.published_date,
                           view_rank      = results.views)
# Bar plot of the number of most viewed articles per section (horizontal bars)
ggplot(NYT.mostviewed, aes(section)) +
  geom_bar() +
  coord_flip()
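The tidy-then-summarise step can also be checked without network access. The sketch below applies the same select-and-rename pattern to a toy data frame and then tabulates articles per section with dplyr's count(), which gives the numbers behind the bar plot. The toy values are invented and only three of the columns are reproduced.

```r
library(dplyr)

# Toy stand-in for the flattened API result (values are made up)
toy <- data.frame(
  results.url     = c("u1", "u2", "u3"),
  results.section = c("Science", "U.S.", "Science"),
  results.views   = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Same pattern as above: select() keeps and renames columns in one call
tidy <- toy %>%
  select(link = results.url, section = results.section, view_rank = results.views)

# Tabular equivalent of the bar plot: number of articles per section
section_counts <- tidy %>% count(section, sort = TRUE)
section_counts
```

Using count() alongside the plot is a design choice rather than a requirement; it simply makes the exact per-section totals available for reporting.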
# Information on the most viewed article of the week
NYT.mostviewed %>% filter(view_rank==1)
## link
## 1 https://www.nytimes.com/2019/03/29/science/dinosaurs-extinction-asteroid.html
## section author
## 1 Science By WILLIAM J. BROAD and KENNETH CHANG
## title
## 1 Fossil Site Reveals Day That Meteor Hit Earth and, Maybe, Wiped Out Dinosaurs
## published_date view_rank
## 1 2019-03-29 1