Assignment 6 - Learning a New API

Author

KB

Introduction to APIs and the Art Institute of Chicago

An Application Programming Interface (API) is a structured way for software systems to communicate. APIs let one application request data or functionality from another, and they power much of the technology we use every day, from smartphone apps to online services.

The Art Institute of Chicago’s API is a public, REST-style service that provides access to the museum’s collection in a structured JSON format. REST (Representational State Transfer) organizes the API around “resources” (e.g., artworks, artists, exhibitions), each accessible through a specific URL. Developers can send a GET request to retrieve information and use it in apps, websites, or data projects.
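To make the idea of resources concrete, here is a minimal base R sketch that builds resource URLs by appending a resource name, and optionally a record ID, to the API's base path. The helper name aic_resource_url() is hypothetical. The artworks resource and the ID 656 come from this document; serving a single record at /artworks/{id} is assumed here as the usual REST convention.

```r
# Hypothetical helper: build a REST resource URL for the Art Institute of Chicago API.
# Assumes collections live at <base>/<resource> and single records at <base>/<resource>/<id>.
aic_resource_url <- function(resource, id = NULL,
                             base = "https://api.artic.edu/api/v1") {
  url <- paste(base, resource, sep = "/")
  if (!is.null(id)) {
    url <- paste(url, id, sep = "/")
  }
  url
}

aic_resource_url("artworks")       # "https://api.artic.edu/api/v1/artworks"
aic_resource_url("artworks", 656)  # "https://api.artic.edu/api/v1/artworks/656"

# Sending the request requires network access, for example:
# response <- httr::GET(aic_resource_url("artworks", 656))
```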

In this example, we’ll demonstrate how to:

  • Define an API endpoint.
  • Create a reusable function to generate a search URL.
  • Use GET() to retrieve artworks that match the search term “cats”.
  • Extract and compile the results into a clean data frame.

First, load the necessary R packages.

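library(tidyverse)  # dplyr (bind_rows) and other data tools
library(jsonlite)   # fromJSON() for parsing JSON
library(httr)       # GET() and content() for HTTP requests
library(magrittr)   # %>% pipe and use_series()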

1. Define the API endpoint

We’ll begin by defining the base URL (endpoint) for the Art Institute’s artworks collection. Search requests add /search to this path.

# Base endpoint for the artworks collection
aic_endpoint <- "https://api.artic.edu/api/v1/artworks"

2. Create a function to build an API URL

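This function builds a search URL from a search term, a page number, and the number of results per page (limit). The art_title argument is sent as the API’s full-text q parameter, so it matches the term across artwork records rather than only in titles.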

# Build a search URL for the Art Institute of Chicago API
create_art_url <- function(art_title, page, limit) {
  # Search endpoint for artworks
  art_api_endpoint <- "https://api.artic.edu/api/v1/artworks/search"

  # Search term, percent-encoded so multi-word terms are URL-safe
  art_title <- paste0("?q=", URLencode(art_title, reserved = TRUE))

  # Fields to return for each artwork
  parameters <- "&fields=id,title,artist_title,is_on_view,style_title"

  # Pagination: which page to return and how many results per page
  page <- paste0("&page=", page)
  limit <- paste0("&limit=", limit)

  # Assemble the final API URL
  aic_api_url <- paste0(art_api_endpoint, art_title, parameters, page, limit)

  return(aic_api_url)
}
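Building query strings by hand with paste() means every parameter value has to be URL-safe; a multi-word term such as "cat on a cushion" contains spaces that must be percent-encoded. A more general approach is to encode each value before joining the parameters. The base R sketch below does this with URLencode(); the helper name build_query_url() is hypothetical.

```r
# Hypothetical helper: join a base URL and a named list of query parameters,
# percent-encoding each value with base R's URLencode().
build_query_url <- function(base, params) {
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

build_query_url(
  "https://api.artic.edu/api/v1/artworks/search",
  list(q = "cat on a cushion", page = 1, limit = 5)
)
# "https://api.artic.edu/api/v1/artworks/search?q=cat%20on%20a%20cushion&page=1&limit=5"
```

httr can also handle this encoding: GET() accepts a named query = list(...) argument and assembles the query string itself.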

3. Request Data from the API

With the final URL available, you can use it to bring the JSON document returned by the API into R.

Use the HTTP verb GET through the GET() function from the httr package. GET() sends the request to the API and returns a response object that holds the status code, headers, and body.

The function below then reads the response body as text with content() and converts it with fromJSON() into a structured format that you can filter, visualize, or analyze with tidyverse tools or base R functions.

# Request one page of search results and return the 'data' part as a data frame
request_art_data <- function(art_title = "", page = 1, limit = 10) {
  # Build the full URL
  url <- create_art_url(art_title = art_title, page = page, limit = limit)

  # Make the GET request and stop with an error if the API returns an HTTP error
  response <- GET(url)
  stop_for_status(response)

  # Parse the response body from JSON text into R objects
  result <- content(response, as = "text", encoding = "UTF-8")
  result_json <- fromJSON(result, flatten = TRUE)

  # Extract the 'data' part of the JSON (one row per artwork)
  art_data <- result_json$data

  return(art_data)
}
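Network requests can fail for temporary reasons, such as a dropped connection or a busy server. A common pattern is to retry a few times with a growing pause between attempts. The base R sketch below is hypothetical: with_retry() and its max_tries and wait arguments are illustrative names, not part of httr, and it retries on any R error rather than inspecting status codes. Pairing it with httr's stop_for_status(), which turns HTTP error responses into R errors, lets it react to failed requests too.

```r
# Hypothetical helper: call f() up to max_tries times, sleeping between failed
# attempts (wait, 2 * wait, 4 * wait, ...). Any error triggers a retry; the last
# error is re-raised if every attempt fails.
with_retry <- function(f, max_tries = 3, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      Sys.sleep(wait * 2^(attempt - 1))
    }
  }
  stop(result)
}

# Example usage (requires network access):
# art <- with_retry(function() request_art_data(art_title = "cats", limit = 5))
```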

# View the API URL for searching "cats" (result pages are numbered from 1):

create_art_url(art_title = "cats", 
               page = 1,
               limit = 100)
[1] "https://api.artic.edu/api/v1/artworks/search?q=cats&fields=id,title,artist_title,is_on_view,style_title&page=1&limit=100"

4. Turn API Data into a Data Frame

Now let’s wrap everything into a function that returns a tidy data frame.

# Request one page of search results and return it as a data frame
request_aic_api_df <- function(art_title, page, limit) {
  aic_api_url <- create_art_url(art_title, page, limit)

  # Use the URL to retrieve the JSON data and create the data frame
  museum_data <-
    aic_api_url %>%
    GET() %>%
    content(as = "text", encoding = "UTF-8") %>%
    fromJSON() %>%
    use_series(data) %>%   # keep only the 'data' element of the response
    bind_rows()

  return(museum_data)
}

Request and store a sample data frame:

cats_art_df <- 
  request_aic_api_df(art_title = "cats",
                     page = 1,
                     limit = 15)

5. Loop over pages and bind the results

This loop requests several pages of results, appends each page to a single data frame, and pauses for one second between requests so the API is not flooded with calls.

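# Start with an empty data frame to hold all pages
artworks_df <- data.frame()
n_pages <- 3

# Harvest the first 3 pages of search results (pages are numbered from 1)
for (i in 1:n_pages) {
  page_df <- request_aic_api_df(art_title = "cats",
                                page = i,
                                limit = 5)

  # Append the new page after the rows already collected
  artworks_df <- bind_rows(artworks_df, page_df)

  print(paste("Page", i, "of", n_pages, "results collected"))

  # Pause between requests (no pause needed after the last one)
  if (i < n_pages) {
    Sys.sleep(1)
  }
}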
[1] "Page 1 of 3 results collected"
[1] "Page 2 of 3 results collected"
[1] "Page 3 of 3 results collected"
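Growing a data frame inside a loop works fine for three pages, but a common R alternative is to fetch each page into a list and bind them once at the end, which keeps the pages in order and avoids repeated copying. The base R sketch below is illustrative: pages_needed() and collect_pages() are hypothetical helpers, and fake_fetch() is a stand-in for request_aic_api_df() so the example runs without network access.

```r
# Hypothetical helper: page numbers (1-based) needed to collect n_records
# results at `limit` results per page.
pages_needed <- function(n_records, limit) {
  seq_len(ceiling(n_records / limit))
}

# Stand-in for request_aic_api_df(): returns a small data frame tagged with its page.
fake_fetch <- function(page, limit) {
  data.frame(page = page, rank = seq_len(limit))
}

# Fetch each page into a list, pausing between requests, then bind once.
collect_pages <- function(pages, limit, fetch, pause = 1) {
  results <- lapply(pages, function(p) {
    if (p != pages[1]) Sys.sleep(pause)  # pause before every request except the first
    fetch(page = p, limit = limit)
  })
  do.call(rbind, results)
}

pages <- pages_needed(n_records = 15, limit = 5)  # 1 2 3
all_pages <- collect_pages(pages, limit = 5, fetch = fake_fetch, pause = 0)
nrow(all_pages)  # 15
```

With real API pages, dplyr::bind_rows(results) is the safer binder, since it fills any column missing from a page with NA, whereas rbind() requires matching columns.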

6. View results

head(cats_art_df)
     _score     id                                title is_on_view
1 136.05936    656 Lion (One of a Pair, South Pedestal)       TRUE
2 119.95461 117241                        Girl with Cat       TRUE
3 116.41442  45259                       Nude with Cats      FALSE
4 103.93291  16227                        Cat Making Up      FALSE
5  98.46439  22482                         Homesickness      FALSE
6  93.83458  51719             Winter: Cat on a Cushion      FALSE
                         artist_title                 style_title
1                       Edward Kemeys                        <NA>
2                             Balthus                        <NA>
3                       Pablo Picasso                      Cubism
4                       Inagaki Tomoo Japanese (culture or style)
5                       René Magritte                        <NA>
6 Théophile-Alexandre Pierre Steinlen                        <NA>
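Once the results are in a data frame, ordinary base R or tidyverse tools apply. As a small illustration, the sketch below rebuilds three rows from the output above by hand, so it runs without the API, and keeps only the artworks marked as on view at the time of the request. The relevance column is named _score, which is not a syntactic R name, so it must be wrapped in backticks when accessed, as in cats_art_df$`_score`.

```r
# Three rows copied from the output above, rebuilt by hand for a self-contained example.
sample_df <- data.frame(
  id = c(656L, 117241L, 45259L),
  title = c("Lion (One of a Pair, South Pedestal)", "Girl with Cat", "Nude with Cats"),
  is_on_view = c(TRUE, TRUE, FALSE)
)

# Keep only artworks marked as on view, and only the id and title columns.
on_view <- subset(sample_df, is_on_view, select = c(id, title))
on_view$title
# "Lion (One of a Pair, South Pedestal)" "Girl with Cat"
```

The tidyverse equivalent on the real results would be dplyr::filter(cats_art_df, is_on_view).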