NYT_Most_Viewed

Author

Amanda Fox

Published

March 22, 2024

Introduction

The New York Times (NYT) offers several APIs for accessing its data. In this example, I used the jsonlite package to pull the 20 most viewed articles of the last seven days into a dataframe in R.

I began with the libraries:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(jsonlite)

Attaching package: 'jsonlite'

The following object is masked from 'package:purrr':

    flatten

The site provided very clear documentation and instructions to get started. First I obtained an API key, then I located the data I wanted in the “Most Popular” API and followed its examples for syntax. From there, the fromJSON function pulled and flattened the data, which I converted to a dataframe:

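# Read the key from an environment variable instead of hard-coding it in the script.
# NYT_API_KEY is an assumed variable name; set it in .Renviron or the shell first.
api_key <- Sys.getenv("NYT_API_KEY")
myurl <- paste0("https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=", api_key)

# Pull the JSON, flatten nested fields into columns, and convert to a dataframe
df_mostviewed <- fromJSON(myurl, flatten = TRUE) %>%
  data.frame()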
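
The 7 in the request URL is the look-back window in days. As a minimal sketch, the URL can be built by a small helper so the window is easy to change. The helper name `build_mostviewed_url` and the environment variable `NYT_API_KEY` are my own assumptions, not part of the NYT documentation, and the allowed periods of 1, 7, and 30 days reflect my reading of the Most Popular API docs, so check them before relying on this.

```r
# Hypothetical helper: build a "most viewed" URL for a given look-back period.
# Assumes the API accepts periods of 1, 7, or 30 days.
# NYT_API_KEY is an assumed environment variable holding the API key.
build_mostviewed_url <- function(period = 7, api_key = Sys.getenv("NYT_API_KEY")) {
  stopifnot(period %in% c(1, 7, 30))
  paste0(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/",
    period,
    ".json?api-key=",
    api_key
  )
}

# Example with a placeholder key (no request is sent here)
url_30_days <- build_mostviewed_url(30, api_key = "MY_KEY")
```

The resulting string can be passed to fromJSON in the same way as myurl.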

The resulting dataframe was clean and tidy, with one row per article. It had 25 columns, including date, section, subsection, title, author, and an abstract. The keyword column (“results.adx_keywords”) may be particularly useful for analysis: it holds a semicolon-delimited list of standardized categories for each article, which makes it possible to analyze articles by topic instead of searching for words in titles or abstracts.
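
To illustrate the keyword idea, here is a rough sketch that splits the semicolon-delimited keywords into one row per keyword and counts them. So that it runs without an API key, it uses a one-row stand-in (`sample_df`, a name I made up) built from the sample row shown at the end of this post. With the real data, df_mostviewed would take its place.

```r
library(dplyr)
library(tidyr)

# One-row stand-in for df_mostviewed, copied from the sample row in this post
sample_df <- tibble(
  results.title = "House Passes Spending Bill to Avert Shutdown, Prompting G.O.P. Revolt",
  results.adx_keywords = "Law and Legislation;United States Politics and Government;Federal Budget (US);Johnson, Mike (1972- );House of Representatives;Senate"
)

# Split on ";" so each keyword gets its own row, keeping the article title alongside
keywords_long <- sample_df %>%
  separate_rows(results.adx_keywords, sep = ";") %>%
  rename(keyword = results.adx_keywords)

# Count how often each keyword appears across articles
keyword_counts <- keywords_long %>%
  count(keyword, sort = TRUE)
```

With all 20 articles, keyword_counts would show which topics dominate the week's most viewed list.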

One limitation is that the query returned only 20 rows, and I did not find an option to request more “top viewed” articles. It would be interesting to explore some of the other APIs, such as the general article search, to see what they can do.

In any case, pulling this data from an API into a dataframe was quick and easy, and it appears to be an efficient way to tap into many new data sources.

# Sample row:

head(df_mostviewed,1)
  status                                                            copyright
1     OK Copyright (c) 2024 The New York Times Company.  All Rights Reserved.
  num_results                                        results.uri
1          20 nyt://article/d10d7d61-ce13-51d2-8294-aef6357bb947
                                                                                     results.url
1 https://www.nytimes.com/2024/03/22/us/politics/congress-spending-bill-government-shutdown.html
  results.id results.asset_id results.source results.published_date
1      1e+14            1e+14 New York Times             2024-03-22
      results.updated results.section results.subsection results.nytdsection
1 2024-03-22 21:18:50            U.S.           Politics                u.s.
                                                                                                                  results.adx_keywords
1 Law and Legislation;United States Politics and Government;Federal Budget (US);Johnson, Mike (1972- );House of Representatives;Senate
  results.column     results.byline results.type
1             NA By Catie Edmondson      Article
                                                          results.title
1 House Passes Spending Bill to Avert Shutdown, Prompting G.O.P. Revolt
                                                                                                                                                                                             results.abstract
1 The bipartisan vote split Republicans and prompted a threat to oust Speaker Mike Johnson. It was not clear whether Senate Republicans would allow the spending measure to pass in time to avoid a shutdown.
                                                                results.des_facet
1 Law and Legislation, United States Politics and Government, Federal Budget (US)
                 results.org_facet      results.per_facet results.geo_facet
1 House of Representatives, Senate Johnson, Mike (1972- )                  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             results.media
1 image, photo, “Democracy is messy,” Speaker Mike Johnson said on Thursday., Kent Nishimura for The New York Times, 1, https://static01.nyt.com/images/2024/03/22/multimedia/22dc-spend-sub/22dc-johnson-hgtk-thumbStandard.jpg, https://static01.nyt.com/images/2024/03/22/multimedia/22dc-spend-sub/22dc-johnson-hgtk-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2024/03/22/multimedia/22dc-spend-sub/22dc-johnson-hgtk-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
  results.eta_id
1              0