Preparing Data (25 points)
Choose one of the New York Times APIs, request API key (1 point)
Construct an interface in R to read in the JSON data (14 points)
Transform data to an R dataframe (10 points)
Reproducibility (2 points)
Using R Markdown text and headers (2 points)
Workflow (2 points)
Included a brief description of the assigned problem.
Included an overview of your approach.
Explained your reasoning.
Provided a conclusion (including any findings and recommendations).
Submission (1 point)
Published to rpubs and provided a link in your assignment submission.
Published to GitHub and provided a link in your assignment submission.
In this assignment we are asked to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame. To do this, I went to the NYT developer portal at https://developer.nytimes.com/apis and signed up for my own API key. I selected the New York Times Most Popular API for this assignment. Here is my API key: xVAwOUxWWObztqjKP4SE1i11AuG1Gb57
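An API key is a credential, so it is generally safer to keep it out of a published report. The following is a minimal sketch, assuming the key has been stored in an environment variable called `NYT_API_KEY` (a hypothetical name, for example set in `~/.Renviron`). The helper `build_most_popular_url()` is also hypothetical and only assembles the same endpoint URL used in this report.

```r
# Hypothetical helper (not from any package): builds the request URL for the
# "shared" Most Popular endpoint used in this report.
build_most_popular_url <- function(api_key, period = 1, share_type = "facebook") {
  stopifnot(is.character(api_key), length(api_key) == 1, nzchar(api_key))
  sprintf(
    "https://api.nytimes.com/svc/mostpopular/v2/shared/%d/%s.json?api-key=%s",
    as.integer(period),
    share_type,
    utils::URLencode(api_key, reserved = TRUE)
  )
}

# NYT_API_KEY is an assumed variable name; Sys.getenv() returns "" when unset.
api_key <- Sys.getenv("NYT_API_KEY")
if (nzchar(api_key)) {
  my_url <- build_most_popular_url(api_key)
}
```

With this approach the key never appears in the source file or in the knitted output, and the URL can be rebuilt for a different period or share type by changing the arguments.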
library(httr)
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
my_url <- "https://api.nytimes.com/svc/mostpopular/v2/shared/1/facebook.json?api-key=xVAwOUxWWObztqjKP4SE1i11AuG1Gb57"
my_raw_result<-httr::GET(my_url)
str(my_raw_result)
## List of 10
## $ url : chr "https://api.nytimes.com/svc/mostpopular/v2/shared/1/facebook.json?api-key=xVAwOUxWWObztqjKP4SE1i11AuG1Gb57"
## $ status_code: int 200
## $ headers :List of 22
## ..$ date : chr "Sat, 02 Apr 2022 15:16:56 GMT"
## ..$ content-type : chr "application/json; charset=utf-8"
## ..$ transfer-encoding : chr "chunked"
## ..$ connection : chr "keep-alive"
## ..$ cache-control : chr "max-age=60"
## ..$ x-nyt-most-popular-values : chr "FACEBOOK 1"
## ..$ x-request-id : chr "1648912615674601221"
## ..$ content-encoding : chr "gzip"
## ..$ x-cloud-trace-context : chr "fd26e1e2d8d585d002bd188051b2e8dc;o=1"
## ..$ server : chr "Google Frontend"
## ..$ accept-ranges : chr "bytes"
## ..$ via : chr "1.1 varnish"
## ..$ age : chr "0"
## ..$ x-served-by : chr "cache-fty21329-FTY"
## ..$ x-cache : chr "MISS"
## ..$ x-cache-hits : chr "0"
## ..$ x-timer : chr "S1648912616.657659,VS0,VE215"
## ..$ vary : chr "Accept-Encoding"
## ..$ access-control-allow-origin : chr "*"
## ..$ access-control-allow-headers : chr "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
## ..$ access-control-expose-headers: chr "Content-Length, X-JSON"
## ..$ access-control-allow-methods : chr "GET, OPTIONS"
## ..- attr(*, "class")= chr [1:2] "insensitive" "list"
## $ all_headers:List of 1
## ..$ :List of 3
## .. ..$ status : int 200
## .. ..$ version: chr "HTTP/1.1"
## .. ..$ headers:List of 22
## .. .. ..$ date : chr "Sat, 02 Apr 2022 15:16:56 GMT"
## .. .. ..$ content-type : chr "application/json; charset=utf-8"
## .. .. ..$ transfer-encoding : chr "chunked"
## .. .. ..$ connection : chr "keep-alive"
## .. .. ..$ cache-control : chr "max-age=60"
## .. .. ..$ x-nyt-most-popular-values : chr "FACEBOOK 1"
## .. .. ..$ x-request-id : chr "1648912615674601221"
## .. .. ..$ content-encoding : chr "gzip"
## .. .. ..$ x-cloud-trace-context : chr "fd26e1e2d8d585d002bd188051b2e8dc;o=1"
## .. .. ..$ server : chr "Google Frontend"
## .. .. ..$ accept-ranges : chr "bytes"
## .. .. ..$ via : chr "1.1 varnish"
## .. .. ..$ age : chr "0"
## .. .. ..$ x-served-by : chr "cache-fty21329-FTY"
## .. .. ..$ x-cache : chr "MISS"
## .. .. ..$ x-cache-hits : chr "0"
## .. .. ..$ x-timer : chr "S1648912616.657659,VS0,VE215"
## .. .. ..$ vary : chr "Accept-Encoding"
## .. .. ..$ access-control-allow-origin : chr "*"
## .. .. ..$ access-control-allow-headers : chr "Accept, Content-Type, X-Forwarded-For, X-Prototype-Version, X-Requested-With"
## .. .. ..$ access-control-expose-headers: chr "Content-Length, X-JSON"
## .. .. ..$ access-control-allow-methods : chr "GET, OPTIONS"
## .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
## $ cookies :'data.frame': 0 obs. of 7 variables:
## ..$ domain : logi(0)
## ..$ flag : logi(0)
## ..$ path : logi(0)
## ..$ secure : logi(0)
## ..$ expiration: 'POSIXct' num(0)
## ..$ name : logi(0)
## ..$ value : logi(0)
## $ content : raw [1:36776] 7b 22 73 74 ...
## $ date : POSIXct[1:1], format: "2022-04-02 15:16:56"
## $ times : Named num [1:6] 0 0.0389 0.0662 0.1273 0.5744 ...
## ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
## $ request :List of 7
## ..$ method : chr "GET"
## ..$ url : chr "https://api.nytimes.com/svc/mostpopular/v2/shared/1/facebook.json?api-key=xVAwOUxWWObztqjKP4SE1i11AuG1Gb57"
## ..$ headers : Named chr "application/json, text/xml, application/xml, */*"
## .. ..- attr(*, "names")= chr "Accept"
## ..$ fields : NULL
## ..$ options :List of 2
## .. ..$ useragent: chr "libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2"
## .. ..$ httpget : logi TRUE
## ..$ auth_token: NULL
## ..$ output : list()
## .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
## ..- attr(*, "class")= chr "request"
## $ handle :Class 'curl_handle' <externalptr>
## - attr(*, "class")= chr "response"
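The structure above shows that the response carries a `status_code` field (200 here). Before parsing, it can help to confirm that the request succeeded. The sketch below uses a hypothetical helper, `check_response()`, that reads only the `status_code` element visible in the `str()` output; httr also offers `httr::stop_for_status()` and `httr::http_error()` for the same purpose.

```r
# Hypothetical helper: stops with an informative message unless the
# response-like list carries a 2xx status code.
check_response <- function(resp) {
  status <- resp$status_code
  if (is.null(status) || status < 200 || status >= 300) {
    stop(sprintf("Request failed with HTTP status %s", format(status)), call. = FALSE)
  }
  invisible(resp)
}

# Stand-in objects with the same field name as an httr response (assumption:
# only status_code is needed for this check).
ok_response <- list(status_code = 200L)
rate_limited <- list(status_code = 429L)

check_response(ok_response)                    # passes silently
result <- try(check_response(rate_limited), silent = TRUE)
inherits(result, "try-error")                  # TRUE: the helper raised an error
```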
str(my_raw_result$content)
## raw [1:36776] 7b 22 73 74 ...
my_content<-httr::content(my_raw_result, as= 'text')
str(my_content)
## chr "{\"status\":\"OK\",\"copyright\":\"Copyright (c) 2022 The New York Times Company. All Rights Reserved.\",\"num"| __truncated__
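To see why the text conversion is needed, note that the `content` field holds raw bytes: the first four, `7b 22 73 74`, are the characters `{"st` that begin the JSON string. A small base R sketch of the same idea follows; `httr::content(..., as = 'text')` additionally takes care of the character encoding, which this sketch ignores.

```r
# The first four bytes shown in the str() output of the raw content.
first_bytes <- as.raw(c(0x7b, 0x22, 0x73, 0x74))
first_chars <- rawToChar(first_bytes)
first_chars

# Round trip on a small made-up JSON string: text -> raw bytes -> text.
small_json <- '{"status":"OK"}'
as_bytes <- charToRaw(small_json)
round_trip <- rawToChar(as_bytes)
identical(round_trip, small_json)
```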
my_content_from_json<-jsonlite::fromJSON(my_content)
dplyr::glimpse(my_content_from_json)
## List of 4
## $ status : chr "OK"
## $ copyright : chr "Copyright (c) 2022 The New York Times Company. All Rights Reserved."
## $ num_results: int 20
## $ results :'data.frame': 20 obs. of 22 variables:
## ..$ uri : chr [1:20] "nyt://article/f48c9735-b5ed-5bfe-9eb3-3af102646af4" "nyt://article/f96527c0-8479-5b72-b48f-ee5e2577a8a1" "nyt://article/1e78a9a9-52c8-5d0f-807d-8f6d3c69b14f" "nyt://article/a7209726-b424-542d-a283-1f659d2002b5" ...
## ..$ url : chr [1:20] "https://www.nytimes.com/2022/03/31/us/census-data-1950.html" "https://www.nytimes.com/2022/03/31/opinion/putin-history-russians.html" "https://www.nytimes.com/2022/04/01/opinion/biden-putin-ukraine-nuclear-weapons.html" "https://www.nytimes.com/2022/03/31/health/insulin-price-house-bill-democrats.html" ...
## ..$ id : num [1:20] 1e+14 1e+14 1e+14 1e+14 1e+14 ...
## ..$ asset_id : num [1:20] 1e+14 1e+14 1e+14 1e+14 1e+14 ...
## ..$ source : chr [1:20] "New York Times" "New York Times" "New York Times" "New York Times" ...
## ..$ published_date: chr [1:20] "2022-03-31" "2022-03-31" "2022-04-01" "2022-03-31" ...
## ..$ updated : chr [1:20] "2022-04-01 11:28:24" "2022-04-01 16:49:40" "2022-04-01 07:33:06" "2022-03-31 23:13:20" ...
## ..$ section : chr [1:20] "U.S." "Opinion" "Opinion" "Health" ...
## ..$ subsection : chr [1:20] "" "" "" "" ...
## ..$ nytdsection : chr [1:20] "u.s." "opinion" "opinion" "health" ...
## ..$ adx_keywords : chr [1:20] "Census;Genealogy;Population;Archives and Records;Computers and the Internet;History (Academic Subject);Nineteen"| __truncated__ "Russian Invasion of Ukraine (2022);Politics and Government;Propaganda;Immigration and Emigration;Putin, Vladimi"| __truncated__ "Russian Invasion of Ukraine (2022);Nuclear Weapons;United States International Relations;Biden, Joseph R Jr;Put"| __truncated__ "Diabetes;Law and Legislation;Drugs (Pharmaceuticals);Insulin;Prices (Fares, Fees and Rates);United States Polit"| __truncated__ ...
## ..$ column : logi [1:20] NA NA NA NA NA NA ...
## ..$ byline : chr [1:20] "By Michael Wines" "By Serge Schmemann" "By Steven Simon and Jonathan Stevenson" "By Margot Sanger-Katz" ...
## ..$ type : chr [1:20] "Article" "Article" "Article" "Article" ...
## ..$ title : chr [1:20] "Seven Decades Later, the 1950 Census Bares Its Secrets" "Putin ‘Just Threw Over the Chess Board,’ and Russians Feel Shame and Dismay" "Why Putin Went Straight for the Nuclear Threat" "House Passes Bill to Limit Cost of Insulin to $35 a Month" ...
## ..$ abstract : chr [1:20] "Federal law kept the answers on millions of census forms secret for 72 years. The forms went online on Friday, "| __truncated__ "Once again, Russia has become a pariah spreading lies and death." "Vladimir Putin made a threat of nuclear weapons. The U.S. and NATO should be less deferential." "The bill stands to benefit millions of Americans with diabetes, but to become law, it will need to attract at l"| __truncated__ ...
## ..$ des_facet :List of 20
## .. ..$ : chr [1:9] "Census" "Genealogy" "Population" "Archives and Records" ...
## .. ..$ : chr [1:4] "Russian Invasion of Ukraine (2022)" "Politics and Government" "Propaganda" "Immigration and Emigration"
## .. ..$ : chr [1:3] "Russian Invasion of Ukraine (2022)" "Nuclear Weapons" "United States International Relations"
## .. ..$ : chr [1:6] "Diabetes" "Law and Legislation" "Drugs (Pharmaceuticals)" "Insulin" ...
## .. ..$ : chr [1:6] "Academy Awards (Oscars)" "Movies" "Television" "Assaults" ...
## .. ..$ : chr [1:5] "Elections, Senate" "Primaries and Caucuses" "Midterm Elections (2022)" "United States Politics and Government" ...
## .. ..$ : chr(0)
## .. ..$ : chr [1:3] "Russian Invasion of Ukraine (2022)" "United States Politics and Government" "Defense and Military Forces"
## .. ..$ : chr [1:5] "Organized Labor" "Labor and Jobs" "Computers and the Internet" "E-Commerce" ...
## .. ..$ : chr [1:8] "Real Estate and Housing (Residential)" "Single Persons" "Dating and Relationships" "Divorce, Separations and Annulments" ...
## .. ..$ : chr [1:3] "Presidents and Presidency (US)" "Photography" "Book Trade and Publishing"
## .. ..$ : chr(0)
## .. ..$ : chr [1:3] "Marijuana" "Law and Legislation" "United States Politics and Government"
## .. ..$ : chr [1:5] "School Discipline (Students)" "Suits and Litigation (Civil)" "Pledge of Allegiance" "Discrimination" ...
## .. ..$ : chr [1:4] "Voting Rights, Registration and Requirements" "State Legislatures" "Citizenship and Naturalization" "United States Politics and Government"
## .. ..$ : chr [1:5] "Comedy and Humor" "Discrimination" "Women and Girls" "Black People" ...
## .. ..$ : chr [1:7] "Voting Rights, Registration and Requirements" "Voting Rights Act (1965)" "Decisions and Verdicts" "United States Politics and Government" ...
## .. ..$ : chr [1:2] "Opera" "Russian Invasion of Ukraine (2022)"
## .. ..$ : chr [1:2] "Demonstrations, Protests and Riots" "Abortion"
## .. ..$ : chr [1:3] "Indigenous People" "Apologies" "Child Abuse and Neglect"
## ..$ org_facet :List of 20
## .. ..$ : chr [1:2] "Census Bureau" "National Archives and Records Administration"
## .. ..$ : chr(0)
## .. ..$ : chr "North Atlantic Treaty Organization"
## .. ..$ : chr [1:3] "Democratic Party" "House of Representatives" "Senate"
## .. ..$ : chr "Academy of Motion Picture Arts and Sciences"
## .. ..$ : chr [1:2] "Senate" "Supreme Court (US)"
## .. ..$ : chr(0)
## .. ..$ : chr "North Atlantic Treaty Organization"
## .. ..$ : chr [1:3] "Amazon Labor Union" "Amazon.com Inc" "Retail, Wholesale and Department Store Union"
## .. ..$ : chr(0)
## .. ..$ : chr(0)
## .. ..$ : chr "Republican Party"
## .. ..$ : chr [1:3] "House of Representatives" "Democratic Party" "Republican Party"
## .. ..$ : chr [1:2] "American Atheists" "Klein Oak High School (Spring, Tex)"
## .. ..$ : chr "Republican Party"
## .. ..$ : chr [1:2] "Senate Committee on the Judiciary" "Supreme Court (US)"
## .. ..$ : chr(0)
## .. ..$ : chr "Novosibirsk Opera and Ballet Theater"
## .. ..$ : chr "Metropolitan Police Department (DC)"
## .. ..$ : chr [1:4] "Assembly of First Nations" "Kamloops Indian Residential School (British Columbia)" "Roman Catholic Church" "Truth and Reconciliation Commission (Canada)"
## ..$ per_facet :List of 20
## .. ..$ : chr(0)
## .. ..$ : chr "Putin, Vladimir V"
## .. ..$ : chr [1:2] "Biden, Joseph R Jr" "Putin, Vladimir V"
## .. ..$ : chr "Biden, Joseph R Jr"
## .. ..$ : chr [1:3] "Rock, Chris" "Smith, Will" "Packer, Will (1974- )"
## .. ..$ : chr [1:4] "Murkowski, Lisa" "Jackson, Ketanji Brown (1970- )" "Tshibaka, Kelly" "Trump, Donald J"
## .. ..$ : chr(0)
## .. ..$ : chr [1:3] "Biden, Joseph R Jr" "Putin, Vladimir V" "Zelensky, Volodymyr"
## .. ..$ : chr "O'Brien, Sean M"
## .. ..$ : chr(0)
## .. ..$ : chr [1:2] "Trump, Donald J" "Craighead, Shealah"
## .. ..$ : chr [1:2] "Palin, Sarah" "Young, Don"
## .. ..$ : chr "Mace, Nancy"
## .. ..$ : chr [1:2] "Oliver, Mari" "Arnold, Benjie"
## .. ..$ : chr "Ducey, Doug (1964- )"
## .. ..$ : chr [1:7] "Smith, Will" "Rock, Chris" "Smith, Jada Pinkett" "Jackson, Ketanji Brown (1970- )" ...
## .. ..$ : chr [1:2] "Walker, Mark E (1967- )" "DeSantis, Ron"
## .. ..$ : chr [1:2] "Netrebko, Anna" "Putin, Vladimir V"
## .. ..$ : chr "Handy, Lauren (1993- )"
## .. ..$ : chr "Benedict XVI"
## ..$ geo_facet :List of 20
## .. ..$ : chr "United States"
## .. ..$ : chr [1:2] "Russia" "Ukraine"
## .. ..$ : chr [1:3] "Ukraine" "Europe" "Russia"
## .. ..$ : chr(0)
## .. ..$ : chr(0)
## .. ..$ : chr "Alaska"
## .. ..$ : chr(0)
## .. ..$ : chr [1:2] "Ukraine" "Russia"
## .. ..$ : chr "Staten Island (NYC)"
## .. ..$ : chr(0)
## .. ..$ : chr(0)
## .. ..$ : chr "Alaska"
## .. ..$ : chr(0)
## .. ..$ : chr [1:2] "Texas" "Spring (Tex)"
## .. ..$ : chr "Arizona"
## .. ..$ : chr(0)
## .. ..$ : chr "Florida"
## .. ..$ : chr(0)
## .. ..$ : chr "Washington (DC)"
## .. ..$ : chr "Canada"
## ..$ media :List of 20
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 0 obs. of 0 variables
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 1 obs. of 6 variables:
## .. ..$ :'data.frame': 0 obs. of 0 variables
## ..$ eta_id : int [1:20] 0 0 0 0 0 0 0 0 0 0 ...
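Several columns in `results` (`des_facet`, `org_facet`, `per_facet`, `geo_facet`) are list columns, where each row holds a character vector of varying length. These do not print or export cleanly. One possible approach, shown here on a small hand-built data frame rather than the live API result, is to collapse each vector into a single string. The column name `des_facet_flat` is made up for this illustration.

```r
# Small stand-in for the API result: one ordinary column and one list column.
articles <- data.frame(
  title = c("Census story", "Student loans story", "Opera story"),
  stringsAsFactors = FALSE
)
articles$des_facet <- list(
  c("Census", "Genealogy", "Population"),
  character(0),                                  # some articles have no facets
  c("Opera", "Russian Invasion of Ukraine (2022)")
)

# Collapse each character vector into one "; "-separated string.
# An empty vector becomes an empty string.
articles$des_facet_flat <- vapply(
  articles$des_facet, paste, character(1), collapse = "; "
)

articles[, c("title", "des_facet_flat")]
```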
df_json <- as.data.frame(my_content_from_json)
View(df_json)
## I use the colnames function to see all the columns
colnames(df_json)
## [1] "status" "copyright" "num_results"
## [4] "results.uri" "results.url" "results.id"
## [7] "results.asset_id" "results.source" "results.published_date"
## [10] "results.updated" "results.section" "results.subsection"
## [13] "results.nytdsection" "results.adx_keywords" "results.column"
## [16] "results.byline" "results.type" "results.title"
## [19] "results.abstract" "results.des_facet" "results.org_facet"
## [22] "results.per_facet" "results.geo_facet" "results.media"
## [25] "results.eta_id"
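Calling `as.data.frame()` on the whole parsed list repeats `status`, `copyright`, and `num_results` on every row and prefixes the article columns with `results.`. An alternative, sketched below on a two-article sample that mimics the shape of the response, is to take the `results` element directly, since it is already a data frame. The sample JSON is abbreviated by hand and is not the full API payload.

```r
library(jsonlite)

# Abbreviated sample with the same top-level shape as the API response.
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"url": "https://www.nytimes.com/2022/03/31/us/census-data-1950.html",
     "published_date": "2022-03-31", "section": "U.S."},
    {"url": "https://www.nytimes.com/2022/03/31/opinion/putin-history-russians.html",
     "published_date": "2022-03-31", "section": "Opinion"}
  ]
}'

parsed <- jsonlite::fromJSON(sample_json)

# The results element is already a data frame with unprefixed column names.
articles <- parsed$results

# Several columns can be selected in one step instead of one pipeline each.
articles[, c("published_date", "section", "url")]
```

With dplyr loaded, the last step could equally be written as `articles %>% select(published_date, section, url)`.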
df_json %>%
select(results.url)
## results.url
## 1 https://www.nytimes.com/2022/03/31/us/census-data-1950.html
## 2 https://www.nytimes.com/2022/03/31/opinion/putin-history-russians.html
## 3 https://www.nytimes.com/2022/04/01/opinion/biden-putin-ukraine-nuclear-weapons.html
## 4 https://www.nytimes.com/2022/03/31/health/insulin-price-house-bill-democrats.html
## 5 https://www.nytimes.com/2022/04/01/movies/oscars-will-smith-slap.html
## 6 https://www.nytimes.com/2022/03/31/us/politics/lisa-murkowski-ketanji-brown-jackson.html
## 7 https://www.nytimes.com/2022/04/01/business/student-loan-payment-pause-biden.html
## 8 https://www.nytimes.com/2022/04/01/us/politics/us-tanks-ukraine.html
## 9 https://www.nytimes.com/2022/04/01/technology/amazon-union-staten-island.html
## 10 https://www.nytimes.com/2022/04/01/realestate/separated-living-together.html
## 11 https://www.nytimes.com/2022/03/31/us/politics/trump-photographer-shealah-craighead.html
## 12 https://www.nytimes.com/2022/04/01/us/politics/sarah-palin-running-congress-alaska.html
## 13 https://www.nytimes.com/2022/04/01/us/politics/marijuana-legalization.html
## 14 https://www.nytimes.com/2022/03/31/us/texas-pledge-of-allegiance-lawsuit.html
## 15 https://www.nytimes.com/2022/03/31/us/politics/arizona-voting-bill-citizenship.html
## 16 https://www.nytimes.com/2022/03/29/opinion/culture/will-smith-oscars-roxane-gay.html
## 17 https://www.nytimes.com/2022/03/31/us/politics/florida-voting-law.html
## 18 https://www.nytimes.com/2022/04/01/arts/music/anna-netrebko-putin-ukraine-backlash.html
## 19 https://www.nytimes.com/2022/03/31/us/fetus-anti-abortion-home.html
## 20 https://www.nytimes.com/2022/04/01/world/europe/pope-apology-indigenous-people-canada.html
df_json %>%
select(results.published_date)
## results.published_date
## 1 2022-03-31
## 2 2022-03-31
## 3 2022-04-01
## 4 2022-03-31
## 5 2022-04-01
## 6 2022-03-31
## 7 2022-04-01
## 8 2022-04-01
## 9 2022-04-01
## 10 2022-04-01
## 11 2022-03-31
## 12 2022-04-01
## 13 2022-04-01
## 14 2022-03-31
## 15 2022-03-31
## 16 2022-03-29
## 17 2022-03-31
## 18 2022-04-01
## 19 2022-03-31
## 20 2022-04-01
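The `published_date` column comes back as character strings. If the dates are going to be sorted or compared, converting them to the `Date` class may be useful. A short sketch using three of the values printed above:

```r
# Three of the published_date values from the output above, as character.
published_chr <- c("2022-03-31", "2022-03-29", "2022-04-01")

# as.Date() parses the ISO "YYYY-MM-DD" format without a format string.
published <- as.Date(published_chr)

earliest <- min(published)
latest <- max(published)
span_days <- as.numeric(latest - earliest)

earliest
latest
span_days
```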
df_json %>%
select(results.section)
## results.section
## 1 U.S.
## 2 Opinion
## 3 Opinion
## 4 Health
## 5 Movies
## 6 U.S.
## 7 Business
## 8 U.S.
## 9 Technology
## 10 Real Estate
## 11 U.S.
## 12 U.S.
## 13 U.S.
## 14 U.S.
## 15 U.S.
## 16 Opinion
## 17 U.S.
## 18 Arts
## 19 U.S.
## 20 World
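A quick way to summarize the section column is to count how many of the 20 articles fall in each section. The sketch below re-enters the values printed above by hand so that it runs without an API call; with the live data, `df_json$results.section` would take the place of the `sections` vector.

```r
# The 20 section values from the output above, in row order.
sections <- c(
  "U.S.", "Opinion", "Opinion", "Health", "Movies",
  "U.S.", "Business", "U.S.", "Technology", "Real Estate",
  "U.S.", "U.S.", "U.S.", "U.S.", "U.S.",
  "Opinion", "U.S.", "Arts", "U.S.", "World"
)

# Tabulate and sort so the most frequent section comes first.
section_counts <- sort(table(sections), decreasing = TRUE)
section_counts
```

On this snapshot, U.S. accounts for 10 of the 20 most-shared articles and Opinion for 3, with every other section appearing once.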
This was my first time working with a web API and converting its response into a data frame in R. I found that the GET function from httr is a simple way to call a web API from R. However, I had some difficulty converting the raw response into a usable form. I solved this by calling httr's content function with as = 'text', which turns the raw bytes into a JSON string. I then used the jsonlite package to parse that JSON. Its main strength is that it implements a bidirectional mapping between JSON data and the most important R data types.