The New York Times offers a standardized way to search its database of articles by different criteria. Its API lets users embed the appropriate calls in their own code to pull the information they need.
To access data through the API, the NY Times requires every user to register and request a unique key. The key must also be enabled for each category of data the NY Times offers. For this assignment, I enabled the Most Popular articles category.
Let’s start by loading the packages we need.
library(httr)
library("jsonlite")
## Warning: package 'jsonlite' was built under R version 4.1.3
library("rjson")
##
## Attaching package: 'rjson'
## The following objects are masked from 'package:jsonlite':
##
## fromJSON, toJSON
library("rvest")
library("dplyr")  # provides select(), which is used with %>% further below
rm(list = ls())
Now let’s initialize our personal key.
# My personal key
my_api_key <- "Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
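Since an API key is a credential, one alternative to writing it in the script is to read it from an environment variable. The sketch below is only an illustration: the variable name NYT_API_KEY and the helper get_nyt_key are hypothetical and are not part of the NY Times API or of this assignment.

```r
# Hypothetical helper: read the API key from an environment variable
# instead of storing it in the script. The variable name NYT_API_KEY
# is an assumption made for this example.
get_nyt_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Usage with a placeholder value (not a real credential):
Sys.setenv(NYT_API_KEY = "my-placeholder-key")
my_api_key_from_env <- get_nyt_key()
```

In practice the variable would be set outside the script, for example in a .Renviron file, so that the key never appears in the knitted output.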
We decided to use the Most Popular API and will test it with the most viewed articles, which this part of the API can return for the last 1, 7, or 30 days.
Let’s start by accessing the most viewed articles in the last 7 days.
We will construct the proper URL as per the NY Times API specs.
# Can be 1, 7, or 30
my_period <- 7
times_url_p1 <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/"
times_url_p2 <- ".json?api-key="
my_times_url <- paste0(times_url_p1, my_period, times_url_p2, my_api_key)
my_times_url
## [1] "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
We checked the URL and it looks well constructed. So far so good.
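Because only three periods are valid, the URL construction can also be wrapped in a small function that rejects anything else. This is a minimal sketch: build_most_viewed_url is a hypothetical helper name, and "YOUR_API_KEY" is a placeholder rather than a real key.

```r
# Hypothetical helper wrapping the URL construction used in this report.
build_most_viewed_url <- function(period, api_key) {
  # The Most Viewed data is described here as covering 1, 7, or 30 days.
  if (!period %in% c(1, 7, 30)) {
    stop("period must be 1, 7, or 30", call. = FALSE)
  }
  paste0(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/",
    period, ".json?api-key=", api_key
  )
}

# Usage with a placeholder key (not a real credential):
one_url <- build_most_viewed_url(7, "YOUR_API_KEY")
all_urls <- vapply(c(1, 7, 30), build_most_viewed_url,
                   character(1), api_key = "YOUR_API_KEY")
```

Validating the period up front means a typo such as 14 fails immediately in R instead of producing a request the API would reject.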
We will use the GET function from httr to confirm that the API call works and that everything is set up properly.
most_popular1 <- GET(my_times_url)
str(most_popular1)
## List of 10
## $ url : chr "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
## $ status_code: int 200
## $ headers :List of 22
## ..$ date : chr "Sun, 27 Mar 2022 13:33:20 GMT"
## ..$ content-type : chr "application/json; charset=utf-8"
## ..$ transfer-encoding : chr "chunked"
## ..$ connection : chr "keep-alive"
## ..$ cache-control : chr "max-age=60"
## ..$ x-nyt-most-popular-values : chr "VIEWED 7"
## ..$ x-request-id : chr "1648387967758583052"
## ..$ content-encoding : chr "gzip"
## ..$ x-cloud-trace-context : chr "b697d46a308477707ec8df6fba8a9389;o=1"
## ..$ server : chr "Google Frontend"
## ..$ accept-ranges : chr "bytes"
## ..$ via : chr "1.1 varnish"
## ..$ age : chr "33"
## ..$ x-served-by : chr "cache-iad-kjyo7100122-IAD"
## ..$ x-cache : chr "HIT"
## ..$ x-cache-hits : chr "1"
## ..$ x-timer : chr "S1648388000.489232,VS0,VE1"
## ..$ vary : chr "Accept-Encoding"
## ..$ access-control-allow-origin : chr "*"
## ..$ access-control-allow-headers : chr "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
## ..$ access-control-expose-headers: chr "Content-Length, X-JSON"
## ..$ access-control-allow-methods : chr "GET, OPTIONS"
## ..- attr(*, "class")= chr [1:2] "insensitive" "list"
## $ all_headers:List of 1
## ..$ :List of 3
## .. ..$ status : int 200
## .. ..$ version: chr "HTTP/1.1"
## .. ..$ headers:List of 22
## .. .. ..$ date : chr "Sun, 27 Mar 2022 13:33:20 GMT"
## .. .. ..$ content-type : chr "application/json; charset=utf-8"
## .. .. ..$ transfer-encoding : chr "chunked"
## .. .. ..$ connection : chr "keep-alive"
## .. .. ..$ cache-control : chr "max-age=60"
## .. .. ..$ x-nyt-most-popular-values : chr "VIEWED 7"
## .. .. ..$ x-request-id : chr "1648387967758583052"
## .. .. ..$ content-encoding : chr "gzip"
## .. .. ..$ x-cloud-trace-context : chr "b697d46a308477707ec8df6fba8a9389;o=1"
## .. .. ..$ server : chr "Google Frontend"
## .. .. ..$ accept-ranges : chr "bytes"
## .. .. ..$ via : chr "1.1 varnish"
## .. .. ..$ age : chr "33"
## .. .. ..$ x-served-by : chr "cache-iad-kjyo7100122-IAD"
## .. .. ..$ x-cache : chr "HIT"
## .. .. ..$ x-cache-hits : chr "1"
## .. .. ..$ x-timer : chr "S1648388000.489232,VS0,VE1"
## .. .. ..$ vary : chr "Accept-Encoding"
## .. .. ..$ access-control-allow-origin : chr "*"
## .. .. ..$ access-control-allow-headers : chr "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
## .. .. ..$ access-control-expose-headers: chr "Content-Length, X-JSON"
## .. .. ..$ access-control-allow-methods : chr "GET, OPTIONS"
## .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
## $ cookies :'data.frame': 0 obs. of 7 variables:
## ..$ domain : logi(0)
## ..$ flag : logi(0)
## ..$ path : logi(0)
## ..$ secure : logi(0)
## ..$ expiration: 'POSIXct' num(0)
## ..$ name : logi(0)
## ..$ value : logi(0)
## $ content : raw [1:37331] 7b 22 73 74 ...
## $ date : POSIXct[1:1], format: "2022-03-27 13:33:20"
## $ times : Named num [1:6] 0 0.00718 0.03715 0.11006 0.3646 ...
## ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
## $ request :List of 7
## ..$ method : chr "GET"
## ..$ url : chr "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=Aoc02FjJ8N3rShPDaZXD05UAcEFtpQMO"
## ..$ headers : Named chr "application/json, text/xml, application/xml, */*"
## .. ..- attr(*, "names")= chr "Accept"
## ..$ fields : NULL
## ..$ options :List of 2
## .. ..$ useragent: chr "libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2"
## .. ..$ httpget : logi TRUE
## ..$ auth_token: NULL
## ..$ output : list()
## .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
## ..- attr(*, "class")= chr "request"
## $ handle :Class 'curl_handle' <externalptr>
## - attr(*, "class")= chr "response"
headers(most_popular1)
## $date
## [1] "Sun, 27 Mar 2022 13:33:20 GMT"
##
## $`content-type`
## [1] "application/json; charset=utf-8"
##
## $`transfer-encoding`
## [1] "chunked"
##
## $connection
## [1] "keep-alive"
##
## $`cache-control`
## [1] "max-age=60"
##
## $`x-nyt-most-popular-values`
## [1] "VIEWED 7"
##
## $`x-request-id`
## [1] "1648387967758583052"
##
## $`content-encoding`
## [1] "gzip"
##
## $`x-cloud-trace-context`
## [1] "b697d46a308477707ec8df6fba8a9389;o=1"
##
## $server
## [1] "Google Frontend"
##
## $`accept-ranges`
## [1] "bytes"
##
## $via
## [1] "1.1 varnish"
##
## $age
## [1] "33"
##
## $`x-served-by`
## [1] "cache-iad-kjyo7100122-IAD"
##
## $`x-cache`
## [1] "HIT"
##
## $`x-cache-hits`
## [1] "1"
##
## $`x-timer`
## [1] "S1648388000.489232,VS0,VE1"
##
## $vary
## [1] "Accept-Encoding"
##
## $`access-control-allow-origin`
## [1] "*"
##
## $`access-control-allow-headers`
## [1] "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
##
## $`access-control-expose-headers`
## [1] "Content-Length, X-JSON"
##
## $`access-control-allow-methods`
## [1] "GET, OPTIONS"
##
## attr(,"class")
## [1] "insensitive" "list"
Based on the output of the GET command (a status code of 200 and a JSON content type), everything looks fine and we can continue.
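Reading the full str() output by eye works, but the same check can be made programmatically. The sketch below uses only the status_code and headers fields that appear in the output; check_nyt_response is a hypothetical helper, and fake_response is a stand-in object so the example runs without a network call. httr also provides status_code() and stop_for_status() for the same purpose.

```r
# Hypothetical helper: confirm a response looks usable before parsing it.
# It relies only on the status_code and headers fields of the response.
check_nyt_response <- function(resp) {
  if (resp$status_code != 200L) {
    stop("Request failed with status ", resp$status_code, call. = FALSE)
  }
  content_type <- resp$headers[["content-type"]]
  if (is.null(content_type) ||
      !grepl("application/json", content_type, fixed = TRUE)) {
    stop("Unexpected content type", call. = FALSE)
  }
  invisible(TRUE)
}

# Stand-in object with the same two fields, so the sketch runs offline.
fake_response <- list(
  status_code = 200L,
  headers = list(`content-type` = "application/json; charset=utf-8")
)
check_ok <- check_nyt_response(fake_response)
# With the real object this would be: check_nyt_response(most_popular1)
```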
To pull the articles, we will use the fromJSON function from the jsonlite package, then convert the result into a neat data frame with the as.data.frame function.
data_json <- jsonlite::fromJSON(my_times_url, flatten = TRUE)
df_json <- as.data.frame(data_json)
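To see what as.data.frame does with the parsed list, it helps to try it on a tiny example. The JSON string below is made up to mimic the shape of the response (a few top-level fields plus a results array); it is not real API output, and the names parsed, df_all, and articles are illustrative.

```r
library(jsonlite)

# Made-up JSON that mimics the shape of the response (not real API output).
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"url": "https://example.com/a", "section": "World"},
    {"url": "https://example.com/b", "section": "Health"}
  ]
}'

parsed <- jsonlite::fromJSON(sample_json, flatten = TRUE)

# Whole list to data frame: top-level scalars are repeated on every row,
# and the article fields get a "results." prefix.
df_all <- as.data.frame(parsed)
colnames(df_all)

# Alternative: keep only the articles, without the "results." prefix.
articles <- parsed$results
colnames(articles)
```

Note that calling fromJSON on the URL sends a second request to the API. Assuming the earlier response is still in memory, its body could be parsed instead with content(most_popular1, as = "text", encoding = "UTF-8") from httr.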
The data frame is too large to print comfortably. I suggest using the View command in RStudio to inspect it.
For this assignment, let’s list all the column names to check that the fields described in the specs were pulled correctly.
colnames(df_json)
## [1] "status" "copyright" "num_results"
## [4] "results.uri" "results.url" "results.id"
## [7] "results.asset_id" "results.source" "results.published_date"
## [10] "results.updated" "results.section" "results.subsection"
## [13] "results.nytdsection" "results.adx_keywords" "results.column"
## [16] "results.byline" "results.type" "results.title"
## [19] "results.abstract" "results.des_facet" "results.org_facet"
## [22] "results.per_facet" "results.geo_facet" "results.media"
## [25] "results.eta_id"
The colnames call prints all the column names, and we can check that they match the API specs.
As explained before, the data frame is too large to print in full, but we can check some of the fields to make sure everything is fine.
df_json %>%
select(results.url)
## results.url
## 1 https://www.nytimes.com/2022/03/23/nyregion/trump-investigation-felony-resignation-pomerantz.html
## 2 https://www.nytimes.com/2022/03/20/us/politics/project-veritas-ashley-biden-diary.html
## 3 https://www.nytimes.com/2022/03/18/health/prolonged-grief-disorder.html
## 4 https://www.nytimes.com/2022/03/22/world/europe/ukraine-air-force-russia.html
## 5 https://www.nytimes.com/2022/03/22/world/europe/putin-russia-military-planning.html
## 6 https://www.nytimes.com/2022/03/24/us/politics/ginni-thomas-trump-mark-meadows.html
## 7 https://www.nytimes.com/2022/03/23/health/covid-africa-deaths.html
## 8 https://www.nytimes.com/2022/03/19/health/covid-ba2-surge-variant.html
## 9 https://www.nytimes.com/2022/03/23/us/politics/biden-russia-nuclear-weapons.html
## 10 https://www.nytimes.com/2022/03/24/style/ketanji-brown-jackson-daughter-photo.html
## 11 https://www.nytimes.com/2022/03/21/science/russia-nuclear-ukraine.html
## 12 https://www.nytimes.com/2022/03/24/world/europe/switzerland-montreux-family-balcony-deaths.html
## 13 https://www.nytimes.com/2022/03/23/us/madeleine-albright-dead.html
## 14 https://www.nytimes.com/2022/03/23/nyregion/mark-pomerantz-resignation-letter.html
## 15 https://www.nytimes.com/2021/01/11/style/kamala-harris-vogue.html
## 16 https://www.nytimes.com/2022/03/23/well/how-to-do-squats.html
## 17 https://www.nytimes.com/2022/03/23/technology/russia-american-far-right-ukraine.html
## 18 https://www.nytimes.com/article/4th-covid-shot-2nd-booster.html
## 19 https://www.nytimes.com/2022/03/21/world/asia/flight-path-eastern-airlines.html
## 20 https://www.nytimes.com/2022/03/22/magazine/ethicist-teenage-sex-parenting.html
df_json %>%
select(results.section)
## results.section
## 1 New York
## 2 U.S.
## 3 Health
## 4 World
## 5 World
## 6 U.S.
## 7 Health
## 8 Health
## 9 U.S.
## 10 Style
## 11 Science
## 12 World
## 13 U.S.
## 14 New York
## 15 Style
## 16 Well
## 17 Technology
## 18 Well
## 19 World
## 20 Magazine
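A quick count per section is another way to sanity-check this column. The sketch below copies the 20 section values from the printed output into a vector so it runs on its own; with the real data, the same call would be applied to df_json$results.section.

```r
# Section values copied from the printed results.section column.
sections <- c("New York", "U.S.", "Health", "World", "World", "U.S.",
              "Health", "Health", "U.S.", "Style", "Science", "World",
              "U.S.", "New York", "Style", "Well", "Technology", "Well",
              "World", "Magazine")

# Count articles per section, most frequent first.
section_counts <- sort(table(sections), decreasing = TRUE)
section_counts
```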
df_json %>%
select(results.adx_keywords) %>%
head()
## results.adx_keywords
## 1 Trump Tax Returns;Frauds and Swindling;Tax Evasion;District Attorneys;United States Politics and Government;Pomerantz, Mark F;Dunne, Carey R;Bragg, Alvin;Trump, Donald J;Vance, Cyrus R Jr;Trump Organization;Manhattan (NYC)
## 2 Presidential Election of 2020;United States Politics and Government;News and News Media;Diaries;First Amendment (US Constitution);Biden, Ashley (1981- );O'Keefe, James E III;Biden, Joseph R Jr;Trump, Donald J Jr;Trump, Donald J;Harris, Aimee (1982- );Paoletta, Mark;Kurlander, Robert (1963- );Project Veritas;New York Times
## 3 Grief (Emotion);Mental Health and Disorders;Depression (Mental);Therapy and Rehabilitation;Psychiatry and Psychiatrists;your-feed-healthcare;Prigerson, Holly G;Shear, Katherine;American Psychiatric Assn;United States
## 4 Russian Invasion of Ukraine (2022);Defense and Military Forces;Military Aircraft;North Atlantic Treaty Organization;Ukraine;Russia
## 5 Russian Invasion of Ukraine (2022);Defense and Military Forces;Deaths (Fatalities);Politics and Government;Putin, Vladimir V;Shoigu, Sergei K;Russia;Ukraine
## 6 Presidential Election of 2020;Storming of the US Capitol (Jan, 2021);United States Politics and Government;Text Messaging;Project: Democracy;Voter Fraud (Election Fraud);Right-Wing Extremism and Alt-Right;Conspiracy Theories;Thomas, Virginia Lamp;Meadows, Mark R (1959- );Trump, Donald J
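As the output shows, each adx_keywords value packs several keywords into one string separated by semicolons. A minimal sketch of splitting one of them with base R follows; the string is copied from row 4 of the output, and the variable names are illustrative.

```r
# One keyword string copied from the adx_keywords output (row 4).
keywords_raw <- "Russian Invasion of Ukraine (2022);Defense and Military Forces;Military Aircraft;North Atlantic Treaty Organization;Ukraine;Russia"

# Split on ";" rather than "," because names such as "Putin, Vladimir V"
# in other rows contain commas.
keywords <- strsplit(keywords_raw, ";", fixed = TRUE)[[1]]
length(keywords)
```

Applied to the whole column, strsplit(df_json$results.adx_keywords, ";", fixed = TRUE) would return a list with one character vector per article.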