NY Times Web API

Introduction

The New York Times offers a standardized way to search its database of articles under different criteria. This API allows users to embed in their code the calls required to pull the information they need.

Signup Process and Personal API Key

To access its data through the API, the NY Times requires every user to register and request a unique key. The key must also be enabled for each of the data categories the NY Times offers. For this assignment I enabled the Most Popular articles category.

Let’s go ahead and initialize our packages and our assigned API Key.

library(httr)
library("jsonlite")
## Warning: package 'jsonlite' was built under R version 4.1.3
library("rjson")
## 
## Attaching package: 'rjson'
## The following objects are masked from 'package:jsonlite':
## 
##     fromJSON, toJSON
library("rvest")
library("dplyr")  # provides select(), used further below
rm(list = ls())

Now let’s initialize our personal key.

# My personal key
my_api_key <- "Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
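
Hard-coding a key in a script makes it easy to leak when the file is shared. One possible alternative, sketched here, reads the key from an environment variable. The variable name NYT_API_KEY and the helper get_api_key() are assumptions made for illustration; they are not part of the NY Times API.

```r
# Hypothetical helper: read the key from an environment variable instead of
# storing it in the script. The variable name "NYT_API_KEY" is an assumption.
get_api_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Usage, once the variable is set (for example in ~/.Renviron):
# my_api_key <- get_api_key()
```

The rest of the code can stay unchanged, since it only relies on my_api_key holding the key as a character string.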

Accessing Data through the API

Setting things up to use the Web API

We decided to use the Most Popular API and will test it with its most viewed articles endpoint. This endpoint returns the most viewed articles of the last 1, 7, or 30 days.

Let’s start by accessing the most viewed articles in the last 7 days.

We will construct the proper URL as per the NY Times API specs.

# Can be 1, 7, or 30
my_period <- 7

times_url_p1 <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/"

times_url_p2 <- ".json?api-key="

my_times_url <- paste0(times_url_p1, my_period, times_url_p2, my_api_key)
my_times_url
## [1] "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"

We checked the URL and it looks well constructed. So far so good.
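
The URL construction above can also be wrapped in a small function that rejects unsupported periods before any request is made. This is only a sketch: build_viewed_url() is a hypothetical helper name, and the allowed values 1, 7, and 30 are taken from the text above.

```r
# Hypothetical helper that builds the "most viewed" URL and validates the period.
build_viewed_url <- function(period, api_key) {
  # The period can be 1, 7, or 30 days, as stated above
  if (length(period) != 1 || !period %in% c(1, 7, 30)) {
    stop("period must be 1, 7, or 30", call. = FALSE)
  }
  paste0(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/",
    period,
    ".json?api-key=",
    api_key
  )
}

# Example call with a placeholder key
build_viewed_url(7, "YOUR_KEY")
```

Validating the period up front turns a typo such as 5 into an immediate, readable error instead of a failed API call.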

Testing API Calls to access data

We will use the GET command to test that our API calls work without issues and that everything is properly set up.

most_popular1 <- GET(my_times_url)
str(most_popular1)
## List of 10
##  $ url        : chr "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
##  $ status_code: int 200
##  $ headers    :List of 22
##   ..$ date                         : chr "Sun, 27 Mar 2022 13:33:20 GMT"
##   ..$ content-type                 : chr "application/json; charset=utf-8"
##   ..$ transfer-encoding            : chr "chunked"
##   ..$ connection                   : chr "keep-alive"
##   ..$ cache-control                : chr "max-age=60"
##   ..$ x-nyt-most-popular-values    : chr "VIEWED 7"
##   ..$ x-request-id                 : chr "1648387967758583052"
##   ..$ content-encoding             : chr "gzip"
##   ..$ x-cloud-trace-context        : chr "b697d46a308477707ec8df6fba8a9389;o=1"
##   ..$ server                       : chr "Google Frontend"
##   ..$ accept-ranges                : chr "bytes"
##   ..$ via                          : chr "1.1 varnish"
##   ..$ age                          : chr "33"
##   ..$ x-served-by                  : chr "cache-iad-kjyo7100122-IAD"
##   ..$ x-cache                      : chr "HIT"
##   ..$ x-cache-hits                 : chr "1"
##   ..$ x-timer                      : chr "S1648388000.489232,VS0,VE1"
##   ..$ vary                         : chr "Accept-Encoding"
##   ..$ access-control-allow-origin  : chr "*"
##   ..$ access-control-allow-headers : chr "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
##   ..$ access-control-expose-headers: chr "Content-Length, X-JSON"
##   ..$ access-control-allow-methods : chr "GET, OPTIONS"
##   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ all_headers:List of 1
##   ..$ :List of 3
##   .. ..$ status : int 200
##   .. ..$ version: chr "HTTP/1.1"
##   .. ..$ headers:List of 22
##   .. .. ..$ date                         : chr "Sun, 27 Mar 2022 13:33:20 GMT"
##   .. .. ..$ content-type                 : chr "application/json; charset=utf-8"
##   .. .. ..$ transfer-encoding            : chr "chunked"
##   .. .. ..$ connection                   : chr "keep-alive"
##   .. .. ..$ cache-control                : chr "max-age=60"
##   .. .. ..$ x-nyt-most-popular-values    : chr "VIEWED 7"
##   .. .. ..$ x-request-id                 : chr "1648387967758583052"
##   .. .. ..$ content-encoding             : chr "gzip"
##   .. .. ..$ x-cloud-trace-context        : chr "b697d46a308477707ec8df6fba8a9389;o=1"
##   .. .. ..$ server                       : chr "Google Frontend"
##   .. .. ..$ accept-ranges                : chr "bytes"
##   .. .. ..$ via                          : chr "1.1 varnish"
##   .. .. ..$ age                          : chr "33"
##   .. .. ..$ x-served-by                  : chr "cache-iad-kjyo7100122-IAD"
##   .. .. ..$ x-cache                      : chr "HIT"
##   .. .. ..$ x-cache-hits                 : chr "1"
##   .. .. ..$ x-timer                      : chr "S1648388000.489232,VS0,VE1"
##   .. .. ..$ vary                         : chr "Accept-Encoding"
##   .. .. ..$ access-control-allow-origin  : chr "*"
##   .. .. ..$ access-control-allow-headers : chr "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
##   .. .. ..$ access-control-expose-headers: chr "Content-Length, X-JSON"
##   .. .. ..$ access-control-allow-methods : chr "GET, OPTIONS"
##   .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ cookies    :'data.frame': 0 obs. of  7 variables:
##   ..$ domain    : logi(0) 
##   ..$ flag      : logi(0) 
##   ..$ path      : logi(0) 
##   ..$ secure    : logi(0) 
##   ..$ expiration: 'POSIXct' num(0) 
##   ..$ name      : logi(0) 
##   ..$ value     : logi(0) 
##  $ content    : raw [1:37331] 7b 22 73 74 ...
##  $ date       : POSIXct[1:1], format: "2022-03-27 13:33:20"
##  $ times      : Named num [1:6] 0 0.00718 0.03715 0.11006 0.3646 ...
##   ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##  $ request    :List of 7
##   ..$ method    : chr "GET"
##   ..$ url       : chr "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
##   ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
##   .. ..- attr(*, "names")= chr "Accept"
##   ..$ fields    : NULL
##   ..$ options   :List of 2
##   .. ..$ useragent: chr "libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2"
##   .. ..$ httpget  : logi TRUE
##   ..$ auth_token: NULL
##   ..$ output    : list()
##   .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
##   ..- attr(*, "class")= chr "request"
##  $ handle     :Class 'curl_handle' <externalptr> 
##  - attr(*, "class")= chr "response"
headers(most_popular1)
## $date
## [1] "Sun, 27 Mar 2022 13:33:20 GMT"
## 
## $`content-type`
## [1] "application/json; charset=utf-8"
## 
## $`transfer-encoding`
## [1] "chunked"
## 
## $connection
## [1] "keep-alive"
## 
## $`cache-control`
## [1] "max-age=60"
## 
## $`x-nyt-most-popular-values`
## [1] "VIEWED 7"
## 
## $`x-request-id`
## [1] "1648387967758583052"
## 
## $`content-encoding`
## [1] "gzip"
## 
## $`x-cloud-trace-context`
## [1] "b697d46a308477707ec8df6fba8a9389;o=1"
## 
## $server
## [1] "Google Frontend"
## 
## $`accept-ranges`
## [1] "bytes"
## 
## $via
## [1] "1.1 varnish"
## 
## $age
## [1] "33"
## 
## $`x-served-by`
## [1] "cache-iad-kjyo7100122-IAD"
## 
## $`x-cache`
## [1] "HIT"
## 
## $`x-cache-hits`
## [1] "1"
## 
## $`x-timer`
## [1] "S1648388000.489232,VS0,VE1"
## 
## $vary
## [1] "Accept-Encoding"
## 
## $`access-control-allow-origin`
## [1] "*"
## 
## $`access-control-allow-headers`
## [1] "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
## 
## $`access-control-expose-headers`
## [1] "Content-Length, X-JSON"
## 
## $`access-control-allow-methods`
## [1] "GET, OPTIONS"
## 
## attr(,"class")
## [1] "insensitive" "list"

Based on the output of the GET command (status code 200 and a JSON content type), all seems fine and we can continue.
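
Instead of reading the printed structure by eye, the same two checks can be done in code. The sketch below uses the httr functions http_error(), status_code(), and headers(); the wrapper check_response() is a hypothetical name introduced here for illustration.

```r
library(httr)

# Hypothetical helper: stop early unless the response looks like a
# successful JSON reply.
check_response <- function(resp) {
  # http_error() is TRUE for status codes of 400 and above
  if (http_error(resp)) {
    stop("Request failed with HTTP status ", status_code(resp), call. = FALSE)
  }
  content_type <- headers(resp)[["content-type"]]
  if (is.null(content_type) ||
      !grepl("application/json", content_type, fixed = TRUE)) {
    stop("Unexpected content type: ", content_type, call. = FALSE)
  }
  invisible(TRUE)
}

# Usage with the response obtained above:
# check_response(most_popular1)
```

httr also provides stop_for_status(), which covers the status check alone when the content type does not need to be verified.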

Making an API Call to access the most viewed articles of the last 7 days

To pull the articles we will use the fromJSON function from the jsonlite package. We call it as jsonlite::fromJSON because rjson masks the function of the same name. We will also use the as.data.frame function to convert the pulled data into a neat data frame.

data_json <- jsonlite::fromJSON(my_times_url, flatten = TRUE)
df_json <- as.data.frame(data_json)
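
To see what these two calls do without contacting the API, the sketch below runs them on a tiny made-up payload. The JSON string is invented for illustration and only mirrors the overall shape of the reply (top-level fields plus a results array); it is not real API output.

```r
library(jsonlite)

# Made-up, heavily shortened payload; only the shape mirrors the API reply.
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"section": "World",  "title": "Example title A"},
    {"section": "Health", "title": "Example title B"}
  ]
}'

# fromJSON() returns a list whose "results" element is a data frame
parsed <- jsonlite::fromJSON(sample_json, flatten = TRUE)

# as.data.frame() combines the list into one data frame; top-level values are
# repeated on every row and nested columns get a "results." prefix
sample_df <- as.data.frame(parsed)
colnames(sample_df)
```

This explains why every article column in df_json carries the results. prefix while status and num_results hold the same value on every row.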

The data frame is too large to print comfortably. I suggest using the View command in RStudio to inspect it.

For this assignment, let’s list all the columns to check that the fields described in the specs were pulled correctly.

colnames(df_json)
##  [1] "status"                 "copyright"              "num_results"           
##  [4] "results.uri"            "results.url"            "results.id"            
##  [7] "results.asset_id"       "results.source"         "results.published_date"
## [10] "results.updated"        "results.section"        "results.subsection"    
## [13] "results.nytdsection"    "results.adx_keywords"   "results.column"        
## [16] "results.byline"         "results.type"           "results.title"         
## [19] "results.abstract"       "results.des_facet"      "results.org_facet"     
## [22] "results.per_facet"      "results.geo_facet"      "results.media"         
## [25] "results.eta_id"

The colnames call prints all column names, and we can check that they match the specs of the API.

Viewing pulled data

As explained before, the data is too large to print in full, but we can check some of the fields to make sure everything is fine.

View Sections for the most viewed articles

df_json %>%
  select(results.section)
##    results.section
## 1         New York
## 2             U.S.
## 3           Health
## 4            World
## 5            World
## 6             U.S.
## 7           Health
## 8           Health
## 9             U.S.
## 10           Style
## 11         Science
## 12           World
## 13            U.S.
## 14        New York
## 15           Style
## 16            Well
## 17      Technology
## 18            Well
## 19           World
## 20        Magazine

View Keywords for the most viewed articles

df_json %>%
  select(results.adx_keywords) %>%
  head()
##                                                                                                                                                                                                                                                                                                                  results.adx_keywords
## 1                                                                                                      Trump Tax Returns;Frauds and Swindling;Tax Evasion;District Attorneys;United States Politics and Government;Pomerantz, Mark F;Dunne, Carey R;Bragg, Alvin;Trump, Donald J;Vance, Cyrus R Jr;Trump Organization;Manhattan (NYC)
## 2 Presidential Election of 2020;United States Politics and Government;News and News Media;Diaries;First Amendment (US Constitution);Biden, Ashley (1981- );O'Keefe, James E III;Biden, Joseph R Jr;Trump, Donald J Jr;Trump, Donald J;Harris, Aimee (1982- );Paoletta, Mark;Kurlander, Robert (1963- );Project Veritas;New York Times
## 3                                                                                                            Grief (Emotion);Mental Health and Disorders;Depression (Mental);Therapy and Rehabilitation;Psychiatry and Psychiatrists;your-feed-healthcare;Prigerson, Holly G;Shear, Katherine;American Psychiatric Assn;United States
## 4                                                                                                                                                                                                  Russian Invasion of Ukraine (2022);Defense and Military Forces;Military Aircraft;North Atlantic Treaty Organization;Ukraine;Russia
## 5                                                                                                                                                                        Russian Invasion of Ukraine (2022);Defense and Military Forces;Deaths (Fatalities);Politics and Government;Putin, Vladimir V;Shoigu, Sergei K;Russia;Ukraine
## 6                                     Presidential Election of 2020;Storming of the US Capitol (Jan, 2021);United States Politics and Government;Text Messaging;Project: Democracy;Voter Fraud (Election Fraud);Right-Wing Extremism and Alt-Right;Conspiracy Theories;Thomas, Virginia Lamp;Meadows, Mark R (1959- );Trump, Donald J
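
Each results.adx_keywords value packs several keywords into one string separated by semicolons, which is why the column prints so wide. A minimal sketch of splitting them with base R follows; the two input strings are shortened copies of rows 4 and 3 above, used here only as sample data.

```r
# Two keyword strings, shortened from the output above
keywords <- c(
  "Russian Invasion of Ukraine (2022);Defense and Military Forces;Ukraine;Russia",
  "Grief (Emotion);Mental Health and Disorders"
)

# Split each string on ";" to get one character vector per article
keyword_list <- strsplit(keywords, ";", fixed = TRUE)

# Number of keywords per article
lengths(keyword_list)

# Frequency of each keyword across all articles, most common first
keyword_counts <- sort(table(unlist(keyword_list)), decreasing = TRUE)
keyword_counts
```

With the real data, the same split would be strsplit(df_json$results.adx_keywords, ";", fixed = TRUE).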

Thank you!