library(bslib)
library(readr)
library(RCurl)
library(stringr)
library(dplyr)
library(tidyr)
library(tidyverse)
library(ggplot2)
library(knitr)
library(kableExtra)
library(xml2)
library(rvest)
library(jsonlite)
library(httr)
library(rjson)
library(syuzhet)
Data607: Web APIs
Introduction
This assignment demonstrates how to use web APIs to retrieve and analyze data. It uses the New York Times APIs, which return current, structured data in response to programmatic requests. An interface built in R requests JSON from the Article Search API, parses it, and converts it into an R data frame for analysis. The search targets articles about Democrats and Republicans, which gives a starting point for examining media coverage of, and political discourse about, the two major U.S. political parties.
This assignment is available in my GitHub repository.
Required Libraries
The following libraries are required to run the code:
httr: Functions for making HTTP requests to web APIs. Used to send requests to the New York Times Article Search API and retrieve articles on Democrats and Republicans.
jsonlite: Functions for parsing JSON data, such as the responses returned by the New York Times Article Search API.
dplyr: Functions for data manipulation and analysis. Used to transform and analyze the data once it is in an R data frame.
ggplot2: Functions for creating plots and visualizations of the data.
kableExtra: Functions for creating formatted tables of the data.
rvest: Functions for scraping data from web pages.
xml2: Functions for parsing XML and HTML. The Article Search API responds with JSON, so XML parsing is not needed for the request in this assignment.
RCurl: Functions for making HTTP requests.
stringr: Functions for string manipulation. Used to manipulate strings in the data.
readr: Functions for reading and writing data in various formats.
tidyverse: A collection of packages (including dplyr, tidyr, readr, stringr, and ggplot2) for data manipulation, analysis, and visualization.
bslib: Functions for building custom Bootstrap themes for R Markdown and Shiny output.
syuzhet: Functions for sentiment analysis of text.
Data Collection
The New York Times Article Search API is a RESTful API for searching New York Times articles published from September 18, 1851 to today. It returns article metadata and links to the full articles. The API is free to use but requires an API key, which can be obtained by registering for an account on the New York Times Developer Network. The key authenticates each request and tracks usage, and it is passed as a query parameter (api-key) in the request URL.
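Because the key travels in the URL, it is easy to leak by committing it together with the code. The following is a minimal sketch of one way to avoid that, assuming the key is stored in an environment variable named NYT_API_KEY. Both that variable name and the helper build_search_url() are hypothetical and not part of the original code.

```r
# Hypothetical helper: builds an Article Search request URL.
# Assumption: the key is stored in an environment variable called NYT_API_KEY.
build_search_url <- function(query, api_key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("No API key found; set NYT_API_KEY or pass api_key explicitly.")
  }
  base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
  # reserved = TRUE also escapes characters such as & and = inside the query
  encoded_query <- utils::URLencode(query, reserved = TRUE)
  paste0(base_url, "?q=", encoded_query, "&api-key=", api_key)
}

# Example with a placeholder key (not a real credential)
example_url <- build_search_url("Democrats AND Republicans", api_key = "YOUR_KEY")
print(example_url)
```

The environment variable can be set once per session with Sys.setenv() or in a local .Renviron file that is kept out of version control.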
The API provides a number of parameters that can be used to filter and sort the results of a search. Some of the parameters include:
q: The search query. This can be a word or phrase that appears in the article.
fq: A filter query. This restricts the results by criteria such as the publication date, the section of the newspaper, or the news desk.
begin_date: The beginning of the search range, in YYYYMMDD format. Only articles published on or after this date are returned.
end_date: The end of the search range, in YYYYMMDD format. Only articles published on or before this date are returned.
sort: The sort order of the results: relevance, newest, or oldest.
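As a rough illustration of how these parameters combine into one request, the sketch below assembles a query string from a named list using only base R. The helper build_query_string() is hypothetical, and the date range is an illustrative assumption rather than a value taken from this assignment.

```r
# Hypothetical helper: turns a named list of parameters into a URL query string.
build_query_string <- function(params) {
  # Drop parameters that were left unset (NULL)
  params <- Filter(Negate(is.null), params)
  pairs <- vapply(names(params), function(name) {
    value <- utils::URLencode(as.character(params[[name]]), reserved = TRUE)
    paste0(name, "=", value)
  }, character(1))
  paste(pairs, collapse = "&")
}

# Illustrative values only; dates use the YYYYMMDD format
params <- list(
  q = "Democrats AND Republicans",
  fq = NULL,                 # no filter query in this sketch
  begin_date = "20240101",
  end_date = "20241031",
  sort = "newest"
)

query_string <- build_query_string(params)
print(query_string)
```

The resulting string can be appended after the "?" in the base URL, followed by the api-key parameter.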
In this exercise, the API is used to search for articles on Democrats and Republicans. The search query is “Democrats AND Republicans”. The request sets only the q and api-key parameters, so no date filter is applied and the results come back in the API's default sort order. Adding begin_date, end_date, and sort would restrict the results to a date range and, with sort set to newest, return the newest articles first.
# Define your API key
api_key <- "cZvX0S19mHeFlxNUCFOX8vj9EbNX84l1"

# Base URL for the Article Search API
base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Define the search query for "Democrats" and "Republicans"
query <- "Democrats AND Republicans"

# Use URLencode to ensure special characters are correctly handled in the query
query <- URLencode(query)

# Construct the URL with the query and API key
url <- paste0(base_url, "?q=", query, "&api-key=", api_key)

# Send the request to the API
response <- GET(url)

# Check if the request was successful
if (status_code(response) == 200) {
  # Parse the JSON response
  data_json <- content(response, as = "text")
  data_list <- fromJSON(data_json)

  # Check the structure of the response to see the format of the articles
  if (!is.null(data_list$response$docs) && length(data_list$response$docs) > 0) {
    # Loop through articles and handle missing fields
    articles <- lapply(data_list$response$docs, function(article) {
      # Safely extract each field with tryCatch to handle missing fields
      headline <- tryCatch(article$headline$main, error = function(e) NA)
      pub_date <- tryCatch(article$pub_date, error = function(e) NA)
      snippet <- tryCatch(article$snippet, error = function(e) NA)
      web_url <- tryCatch(article$web_url, error = function(e) NA)

      # Return as a list
      return(list(headline = headline, pub_date = pub_date, snippet = snippet, web_url = web_url))
    })

    # Convert the list to a DataFrame
    articles_df <- do.call(rbind, lapply(articles, as.data.frame, stringsAsFactors = FALSE))

    # Display the first few rows of the DataFrame
    head(articles_df)
  } else {
    print("No articles found for the given query.")
  }
} else {
  print(paste("Failed to fetch data. Status code:", status_code(response)))
}
No encoding supplied: defaulting to UTF-8.
headline
1 Republicans Assumed a Nebraska Senate Seat Was Safe. Then This Candidate Came Along.
2 As Election Day Nears, Democrats Test Just How Powerful Abortion Really Is
3 More Republicans Appear to Be Voting Early, Despite Trump’s Mixed Messages
4 In Maine Battleground, Democrat Golden Grasps to Win Over Trump Voters
5 A Swing District in Red Nebraska Hosts a Hotly Contested House Race
6 Democrats Keep the Dream Alive in Texas
pub_date
1 2024-10-25T09:07:05+0000
2 2024-10-24T22:00:09+0000
3 2024-10-22T21:08:56+0000
4 2024-10-26T14:42:36+0000
5 2024-10-20T20:37:59+0000
6 2024-10-23T09:07:23+0000
snippet
1 Nebraska’s Senate race is far closer than anyone predicted.
2 They hope the issue helps their candidates. But some voters may support Republican candidates as well as abortion-rights ballot measures.
3 In 2020, Donald Trump convinced his supporters that anything but a vote cast in person on Election Day could not be trusted — and lost. In response, Republicans shifted their stance.
4 To win his toughest re-election bid yet, Representative Jared Golden needs Trump voters to back him over a young Republican prospect, a former NASCAR driver.
5 Tony Vargas, a Democrat vying to become the state’s first Latino representative, lost to Don Bacon, the Republican incumbent, in 2022. But the presidential election could help him in his rematch.
6 A tightening in the polls for the Senate race, and millions from George Soros, has rekindled old hopes of turning Texas blue. But demographic changes alone may not be enough to flip the state, party organizers say.
web_url
1 https://www.nytimes.com/2024/10/25/opinion/nebraska-senate-dan-osborn.html
2 https://www.nytimes.com/2024/10/24/us/politics/trump-harris-abortion-rights.html
3 https://www.nytimes.com/2024/10/22/us/politics/trump-republicans-early-voting.html
4 https://www.nytimes.com/2024/10/26/us/elections/jared-golden-maine.html
5 https://www.nytimes.com/2024/10/20/us/politics/nebraska-walz-tony-vargas.html
6 https://www.nytimes.com/2024/10/23/us/texas-election-democrat-hopes.html
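Two details of the extraction step are worth noting. First, the lapply() loop expects docs to be a list of per-article lists. That is what rjson::fromJSON returns, and because rjson is attached after jsonlite in the library calls, its fromJSON is the one in effect; jsonlite::fromJSON would by default simplify docs into a data frame instead. Second, using $ on a list element that does not exist returns NULL rather than raising an error, so tryCatch() does not substitute NA for a missing field, and a NULL may break the later rbind() step. The sketch below shows one possible NULL-safe alternative. The helpers or_na() and docs_to_df() are hypothetical, and the mock input reuses two records from the output above, with the snippet deliberately removed from the second to simulate a missing field.

```r
# Hypothetical helper: return NA when a field is missing (NULL) or empty.
or_na <- function(x) {
  if (is.null(x) || length(x) == 0) NA_character_ else as.character(x[[1]])
}

# Hypothetical helper: convert a list of article lists into a data frame.
docs_to_df <- function(docs) {
  rows <- lapply(docs, function(article) {
    data.frame(
      headline = or_na(article$headline$main),
      pub_date = or_na(article$pub_date),
      snippet  = or_na(article$snippet),
      web_url  = or_na(article$web_url),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Mock input shaped like a list of per-article lists
mock_docs <- list(
  list(
    headline = list(main = "Democrats Keep the Dream Alive in Texas"),
    pub_date = "2024-10-23T09:07:23+0000",
    snippet  = "A tightening in the polls for the Senate race, and millions from George Soros, has rekindled old hopes of turning Texas blue.",
    web_url  = "https://www.nytimes.com/2024/10/23/us/texas-election-democrat-hopes.html"
  ),
  list(
    headline = list(main = "A Swing District in Red Nebraska Hosts a Hotly Contested House Race"),
    pub_date = "2024-10-20T20:37:59+0000",
    web_url  = "https://www.nytimes.com/2024/10/20/us/politics/nebraska-walz-tony-vargas.html"
  )
)

demo_df <- docs_to_df(mock_docs)
str(demo_df)
```

Every row then has the same four columns, so rbind() always receives data frames of matching shape.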
Data Analysis
# Count the total number of articles
num_articles <- nrow(articles_df)
print(paste("Total number of articles:", num_articles))
[1] "Total number of articles: 10"
# Convert pub_date to Date format
articles_df$pub_date <- as.Date(articles_df$pub_date)

# Extract the year and month
articles_df$year <- format(articles_df$pub_date, "%Y")
articles_df$month <- format(articles_df$pub_date, "%Y-%m")

# Count the number of articles by year
table(articles_df$year)
2024
10
The search query “Democrats AND Republicans” returned 10 articles, all published in 2024.
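Calling as.Date() on pub_date keeps only the calendar date and drops the time of day. If the full timestamp matters, a sketch along the following lines could parse it instead. It uses the six pub_date values printed in the output above as sample input, and the format string is an assumption based on how those values look.

```r
# Sample input: the six pub_date values shown in the printed output
pub_dates <- c(
  "2024-10-25T09:07:05+0000",
  "2024-10-24T22:00:09+0000",
  "2024-10-22T21:08:56+0000",
  "2024-10-26T14:42:36+0000",
  "2024-10-20T20:37:59+0000",
  "2024-10-23T09:07:23+0000"
)

# Parse the timestamps, including the +0000 offset, as UTC date-times
timestamps <- as.POSIXct(pub_dates, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Count articles per month and per calendar day
by_month <- table(format(timestamps, "%Y-%m"))
by_day <- table(as.Date(timestamps, tz = "UTC"))

print(by_month)
print(by_day)
```

With only a handful of articles the counts are not very informative, but the same grouping applies unchanged to a larger result set.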
Conclusion
In this assignment, the use of web APIs to retrieve and analyze data was demonstrated. Articles on Democrats and Republicans were retrieved from the New York Times Article Search API and converted into an R data frame for analysis. The search query “Democrats AND Republicans” returned 10 articles, all published in 2024. I believe that if I had used the Archive API, I would have gotten more articles with which to analyze and compare the media coverage of, and political discourse about, these two major U.S. political parties. I was going to use the syuzhet package to perform sentiment analysis on the data, but I could not get it to work because the sentiment scores kept coming back as zero. I also suspect that an API limit capped the result at 10 articles, which would be consistent with the Article Search API returning results in pages of 10.
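If the 10-article cap is indeed a page size, more results could be collected by requesting further pages. The sketch below is one way this might look, assuming the API accepts a zero-indexed page parameter with 10 results per page and that a pause of about 12 seconds between calls keeps the requests within the rate limit. The helpers build_page_urls() and fetch_pages(), the pause length, and the placeholder key are all assumptions, not part of the original code.

```r
# Hypothetical helper: one request URL per results page.
# Assumption: pages are zero-indexed and hold 10 results each.
build_page_urls <- function(query, api_key, pages = 0:2) {
  base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
  encoded_query <- utils::URLencode(query, reserved = TRUE)
  vapply(pages, function(page) {
    paste0(base_url, "?q=", encoded_query, "&page=", page, "&api-key=", api_key)
  }, character(1))
}

# Hypothetical helper: fetch each page in turn, pausing between calls.
# Assumption: a 12-second pause is enough to stay under the rate limit.
# Requires the httr package; defined here but not called in this sketch.
fetch_pages <- function(urls, pause_seconds = 12) {
  lapply(seq_along(urls), function(i) {
    if (i > 1) Sys.sleep(pause_seconds)
    response <- httr::GET(urls[[i]])
    httr::stop_for_status(response)
    httr::content(response, as = "text", encoding = "UTF-8")
  })
}

# Build the URLs for the first three pages with a placeholder key
urls <- build_page_urls("Democrats AND Republicans", api_key = "YOUR_KEY")
print(urls)
```

Each element returned by fetch_pages() would be the JSON text of one page, which can be parsed and combined in the same way as the single response in the Data Collection section.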
References
- New York Times Developer Network. (n.d.). Article Search API. Retrieved from https://developer.nytimes.com/apis