Brief Description
This assignment uses the New York Times web API to obtain information about top science stories from NYT.
Approach
To access the NYT web API, a developer account is needed, so I created one on the NYT website and then requested an API key for their top science stories. To retrieve the data, I used the GET() function from the httr package to send a GET request to the NYT server. Once the request returned a status code of 200, which signifies success, I parsed the content of the JSON response, used fromJSON() from the jsonlite package to read the JSON, flattened the output, and converted the results to a data frame.
Load the required libraries
library(tidyverse)
library(rvest)
library(httr)
library(jsonlite)
library(rlist)
Make API call
nyt_top_stories <- GET("https://api.nytimes.com/svc/topstories/v2/science.json?api-key=zYpgBL2MhyOOlSeLaDAKfQJQo2BeQZlz")
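Because the key is written directly into the URL above, it also shows up wherever the response is printed. A minimal sketch of one alternative is to read the key from an environment variable and let httr build the query string. The variable name NYT_API_KEY and the helper build_topstories_url() are hypothetical and were not part of the original analysis; the request itself is left commented out so the sketch runs offline.

```r
library(httr)

# Hypothetical helper: build the Top Stories URL for a section without
# hard-coding the key in the script
build_topstories_url <- function(section, api_key) {
  stopifnot(nzchar(section), nzchar(api_key))
  base_url <- paste0("https://api.nytimes.com/svc/topstories/v2/", section, ".json")
  # modify_url() appends the query string to the base URL
  modify_url(base_url, query = list(`api-key` = api_key))
}

# NYT_API_KEY is an assumed variable name; the fallback is a placeholder,
# not a working key
api_key <- Sys.getenv("NYT_API_KEY", unset = "placeholder-key")
request_url <- build_topstories_url("science", api_key)

# nyt_top_stories <- GET(request_url)  # uncomment to send the request
```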
Inspect the status of the API request
# Check the status of the API call
nyt_top_stories
## Response [https://api.nytimes.com/svc/topstories/v2/science.json?api-key=zYpgBL2MhyOOlSeLaDAKfQJQo2BeQZlz]
## Date: 2021-10-25 00:38
## Status: 200
## Content-Type: application/json
## Size: 67.8 kB
The status code is 200, which means the request succeeded.
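The status here was read off the printed response. As a rough sketch of how the same check could be done in code, the hypothetical helper below (describe_status() is an illustrative name, not an httr function) maps a status code to a coarse category using the standard HTTP ranges.

```r
# Hypothetical helper: map an HTTP status code to a coarse category
describe_status <- function(code) {
  code <- as.integer(code)
  if (code >= 200L && code < 300L) {
    "success"
  } else if (code >= 400L && code < 500L) {
    "client error"
  } else if (code >= 500L && code < 600L) {
    "server error"
  } else {
    "other"
  }
}

describe_status(200L)
describe_status(404L)
```

With a real response, status_code(nyt_top_stories) from httr returns the integer code to pass in, and stop_for_status(nyt_top_stories) raises an R error when the request did not succeed.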
# Parse the content of the response from the API call
nyt_parse <- content(nyt_top_stories, "parsed")
nyt_parse_results <- nyt_parse$results
# Read the raw JSON text so that fromJSON() can simplify it into data frames
nyt_top <- fromJSON(rawToChar(nyt_top_stories$content))
# Check the names of nyt_top
names(nyt_top)
## [1] "status" "copyright" "section" "last_updated" "num_results"
## [6] "results"
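# Top stories data frame
# jsonlite::flatten() is named explicitly because purrr (loaded with tidyverse) also exports flatten()
nyt_top_flat <- jsonlite::flatten(nyt_top$results)
nyt_top_flat <- as.data.frame(nyt_top_flat)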
nyt_top_flat
Some columns came in as lists, and the multimedia column came in as a data frame.
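To see why this happens, here is a small sketch using a made-up JSON string (the field names mimic the NYT response, but the data and the byline object are invented for illustration). Arrays of different lengths become list columns, nested objects become data frame columns, and jsonlite::flatten() expands only the latter.

```r
library(jsonlite)

# Toy JSON: "des_facet" is an array of varying length, "byline" is a nested object
toy_json <- '{
  "status": "OK",
  "results": [
    {"title": "A", "des_facet": ["x", "y"], "byline": {"name": "Ann", "org": "NYT"}},
    {"title": "B", "des_facet": ["z"],      "byline": {"name": "Bob", "org": "NYT"}}
  ]
}'

toy <- fromJSON(toy_json)

# Column classes before flattening: character, list, data.frame
col_classes <- vapply(toy$results, function(col) class(col)[1], character(1))

# flatten() turns the nested data frame into prefixed columns;
# the list column is left as it is
toy_flat <- jsonlite::flatten(toy$results)
names(toy_flat)
```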
# Take a look at the multimedia data frame, which is a column in nyt_top_flat
as.data.frame(nyt_top_flat$multimedia)
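If a column like this holds one small data frame per story, another way to inspect it is to stack the pieces into a single long table keyed by story. The sketch below assumes that shape and uses invented toy data in place of the real response; the names stories and media_long are hypothetical.

```r
# Toy stand-in: a list column holding one data frame per story (assumed shape)
stories <- data.frame(title = c("A", "B"), stringsAsFactors = FALSE)
stories$multimedia <- list(
  data.frame(url = c("a1.jpg", "a2.jpg"), height = c(100L, 200L),
             stringsAsFactors = FALSE),
  data.frame(url = "b1.jpg", height = 150L, stringsAsFactors = FALSE)
)

# Tag each nested data frame with its parent title, then stack them row-wise
media_long <- do.call(
  rbind,
  Map(
    function(title, media) cbind(title = title, media, stringsAsFactors = FALSE),
    stories$title,
    stories$multimedia
  )
)
rownames(media_long) <- NULL
media_long
```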
# Check the elements of des_facet column that came in as a list
nyt_top_flat$des_facet
## [[1]]
## [1] "Women and Girls" "Space and Astronomy"
## [3] "Politics and Government" "Content Type: Personal Profile"
## [5] "Discrimination"
##
## [[2]]
## [1] "Coronavirus Delta Variant" "Disease Rates"
##
## [[3]]
## [1] "Clinical Trials"
## [2] "Coronavirus (2019-nCoV)"
## [3] "Vaccination and Immunization"
## [4] "Children and Childhood"
## [5] "Coronavirus Risks and Safety Concerns"
##
## [[4]]
## [1] "Global Warming" "Greenhouse Gas Emissions"
## [3] "El Nino Southern Oscillation" "Weather"
## [5] "Hurricanes and Tropical Storms"
##
## [[5]]
## [1] "Global Warming"
## [2] "Greenhouse Gas Emissions"
## [3] "Alternative and Renewable Energy"
## [4] "Regulation and Deregulation of Industry"
## [5] "American Jobs Plan (2021)"
## [6] "United States Politics and Government"
##
## [[6]]
## [1] "Vaccination and Immunization"
## [2] "Laboratories and Scientific Equipment"
## [3] "Coronavirus (2019-nCoV)"
## [4] "Drugs (Pharmaceuticals)"
## [5] "RNA (Ribonucleic Acid)"
## [6] "Biotechnology and Bioengineering"
## [7] "Factories and Manufacturing"
## [8] "Third World and Developing Countries"
## [9] "Intellectual Property"
## [10] "Public-Private Sector Cooperation"
## [11] "International Trade and World Market"
##
## [[7]]
## [1] "Data-Mining and Database Marketing" "Quantum Computing"
## [3] "Artificial Intelligence" "Espionage and Intelligence Services"
## [5] "Genetics and Heredity"
##
## [[8]]
## [1] "your-feed-science"
## [2] "Vaccination and Immunization"
## [3] "Coronavirus (2019-nCoV)"
## [4] "Immune System"
## [5] "Disease Rates"
## [6] "United States Politics and Government"
##
## [[9]]
## [1] "Bats"
## [2] "Coronavirus (2019-nCoV)"
## [3] "Research"
## [4] "United States Politics and Government"
##
## [[10]]
## [1] "Global Warming" "Greenhouse Gas Emissions"
## [3] "El Nino Southern Oscillation" "Drought"
##
## [[11]]
## [1] "Coronavirus (2019-nCoV)" "Vaccination and Immunization"
## [3] "internal-open-access" "Children and Childhood"
## [5] "Parenting"
##
## [[12]]
## [1] "Global Warming"
## [2] "Greenhouse Gas Emissions"
## [3] "Migrants (Environmental)"
## [4] "Disasters and Emergencies"
## [5] "National Intelligence Estimates"
## [6] "United States Defense and Military Forces"
## [7] "United States Politics and Government"
## [8] "International Relations"
##
## [[13]]
## [1] "Dinosaurs" "Paleontology" "Research"
## [4] "your-feed-science" "your-feed-animals"
##
## [[14]]
## [1] "Satellites"
## [2] "Missiles and Missile Defense Systems"
## [3] "Space and Astronomy"
##
## [[15]]
## [1] "Vaccination and Immunization"
## [2] "Coronavirus (2019-nCoV)"
## [3] "Drugs (Pharmaceuticals)"
## [4] "United States Politics and Government"
##
## [[16]]
## [1] "Sex Crimes" "Pain" "Opioids and Opiates"
## [4] "your-feed-science" "your-feed-health"
##
## [[17]]
## [1] "Sharks" "Animal Behavior"
## [3] "Fish and Other Marine Life" "Surfing"
## [5] "Oceans and Seas" "Endangered and Extinct Species"
## [7] "Drones (Pilotless Planes)" "Seals (Animals) and Sealing"
## [9] "Summer (Season)" "First Aid"
## [11] "Maritime Accidents and Safety" "Swimming"
## [13] "Beaches" "Fear (Emotion)"
##
## [[18]]
## [1] "Horses" "Genetics and Heredity"
## [3] "Archaeology and Anthropology" "Research"
## [5] "your-feed-science" "your-feed-animals"
##
## [[19]]
## [1] "Meteors and Meteorites" "Comets" "Space and Astronomy"
## [4] "Content Type: Service" "Moon"
##
## [[20]]
## [1] "Colleges and Universities" "Science and Technology"
## [3] "Freedom of Speech and Expression" "Academic Freedom"
## [5] "Black People" "Race and Ethnicity"
## [7] "Discrimination" "Affirmative Action"
## [9] "Admissions Standards"
##
## [[21]]
## [1] "Global Warming"
## [2] "Greenhouse Gas Emissions"
## [3] "Oil (Petroleum) and Gasoline"
## [4] "Natural Gas"
## [5] "Coal"
## [6] "Production"
## [7] "Mines and Mining"
## [8] "Alternative and Renewable Energy"
## [9] "United Nations Framework Convention on Climate Change"
##
## [[22]]
## [1] "PFAS (Per- and Polyfluoroalkyl Substances)"
## [2] "Chemicals"
## [3] "Hazardous and Toxic Substances"
## [4] "Water Pollution"
## [5] "Air Pollution"
## [6] "Suits and Litigation (Civil)"
## [7] "Factories and Manufacturing"
## [8] "Conflicts of Interest"
## [9] "Cancer"
##
## [[23]]
## [1] "Molnupiravir (Drug)"
## [2] "Coronavirus (2019-nCoV)"
## [3] "International Trade and World Market"
## [4] "Drugs (Pharmaceuticals)"
##
## [[24]]
## [1] "Nursing Homes"
## [2] "Vaccination and Immunization"
## [3] "Coronavirus (2019-nCoV)"
## [4] "Disease Rates"
## [5] "Regulation and Deregulation of Industry"
## [6] "United States Politics and Government"
## [7] "your-feed-healthcare"
# Check the elements of the org_facet column that also came in as a list
nyt_top_flat$org_facet
## [[1]]
## [1] "China Aerospace"
##
## [[2]]
## [1] "Centers for Disease Control and Prevention"
##
## [[3]]
## [1] "Pfizer Inc" "BioNTech SE"
## [3] "Food and Drug Administration"
##
## [[4]]
## character(0)
##
## [[5]]
## [1] "Senate"
##
## [[6]]
## [1] "Serum Institute of India" "BioNTech SE"
## [3] "Pfizer Inc" "Moderna Inc"
##
## [[7]]
## [1] "National Counterintelligence and Security Center"
##
## [[8]]
## [1] "Advisory Committee on Immunization Practices"
## [2] "Centers for Disease Control and Prevention"
## [3] "Johnson & Johnson"
## [4] "Moderna Inc"
## [5] "BioNTech SE"
## [6] "Pfizer Inc"
##
## [[9]]
## [1] "EcoHealth Alliance" "National Institutes of Health"
## [3] "Wuhan Institute of Virology (China)"
##
## [[10]]
## [1] "National Oceanic and Atmospheric Administration"
##
## [[11]]
## character(0)
##
## [[12]]
## [1] "National Security Council"
## [2] "Homeland Security Department"
## [3] "Office of the Director of National Intelligence"
##
## [[13]]
## [1] "Historical Biology (Journal)"
##
## [[14]]
## [1] "Space Exploration Technologies Corp"
##
## [[15]]
## [1] "Food and Drug Administration" "Johnson & Johnson"
## [3] "Moderna Inc" "Pfizer Inc"
## [5] "BioNTech SE"
##
## [[16]]
## [1] "Justice Department" "Beth Israel Medical Center"
## [3] "Capital Health Hospitals" "Drexel University"
##
## [[17]]
## [1] "Ocearch" "Center for Coastal Studies"
##
## [[18]]
## [1] "Nature (Journal)"
##
## [[19]]
## character(0)
##
## [[20]]
## [1] "Massachusetts Institute of Technology"
##
## [[21]]
## [1] "United Nations"
##
## [[22]]
## [1] "Chemours Company" "DuPont Co"
## [3] "3M Company" "Environmental Protection Agency"
##
## [[23]]
## [1] "Gates, Bill and Melinda, Foundation" "Merck & Company Inc"
##
## [[24]]
## [1] "Centers for Medicare and Medicaid Services"
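List columns such as des_facet and org_facet print one element per story, which is hard to scan. As a minimal sketch, each element can be collapsed into a single string so the column behaves like an ordinary character column. The helper collapse_facets() and the toy values are illustrative only.

```r
# Toy list column with varying lengths, including an empty entry
facets <- list(c("Global Warming", "Drought"), character(0), "Bats")

# Hypothetical helper: join each element into one "; "-separated string
collapse_facets <- function(x) {
  vapply(x, function(v) paste(v, collapse = "; "), character(1))
}

collapsed <- collapse_facets(facets)
collapsed

# Number of facets per story
lengths(facets)
```

An empty entry collapses to an empty string, so the result always has one value per story.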
Conclusion
The response from the NYT server was nested JSON. When I converted it to a data frame, I found that some columns contained lists while another column contained a data frame, so a data frame can itself be a column of another data frame. To access its content, I extracted that column and displayed its results.
In conclusion, when highly nested JSON is read into a data frame, some columns may be lists and others may be data frames. JSON is effective for storing large amounts of nested, semi-structured data, which helps explain why so many web APIs return JSON responses.