Art Institute of Chicago API instructions

Art Institute of Chicago API Instructions and Call Example

Step 1. Learning the API

To start this assignment, I found the Art Institute of Chicago's free public API online. The API exposes many data fields describing the individual art pieces and collections that have been through the museum. It offers three types of endpoints you can query:

  • Listing (e.g. /artworks)

  • Detail (e.g. /artworks/{id})

  • Search (e.g. /artworks/search)
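
As a rough illustration of how the three endpoint types differ, the sketch below builds one URL of each kind in base R. The base URL is the one used in the call example later in this guide. The helper function names are hypothetical, the id 123 is a placeholder, and the `q` search parameter reflects my reading of the public API documentation rather than anything shown in this guide.

```r
# Base URL taken from the call example later in this guide
base_url <- "https://api.artic.edu/api/v1"

# Listing: a paginated list of records for a resource
listing_url <- function(resource) {
  paste0(base_url, "/", resource)
}

# Detail: a single record, looked up by its id
detail_url <- function(resource, id) {
  paste0(base_url, "/", resource, "/", id)
}

# Search: full-text search; the term goes in the `q` parameter (assumed)
search_url <- function(resource, query) {
  paste0(base_url, "/", resource, "/search?q=",
         utils::URLencode(query, reserved = TRUE))
}

listing_url("artworks")
detail_url("artworks", 123)              # 123 is a placeholder id
search_url("artworks", "water lilies")   # the space is percent-encoded
```

None of these calls touch the network; they only assemble strings that could later be passed to GET().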

Every resource includes an id field and a title field.

For my call example and guide, I decided to use the Listing endpoint for artworks (/artworks), which has the widest range of possible fields to pull from.

Some examples of the available fields, most of which I did not use in my call example, include:

  • api_model string - REST API resource type or endpoint

  • api_link string - REST API link for this resource

  • is_boosted boolean - Whether this document should be boosted in search

  • alt_titles array - Alternate names for this work

  • thumbnail array - Metadata about the image referenced by image_id. Currently, all thumbnails are IIIF images. You must build your own image URLs using IIIF Image API conventions. See our API documentation for more details.

  • main_reference_number string - Unique identifier assigned to the artwork upon acquisition

  • boost_rank number - Manual indication of what rank this artwork should take in search results. Noncontiguous.

  • date_start number - The year of the period of time associated with the creation of this work

  • date_end number - The year of the period of time associated with the creation of this work

  • date_display string - Readable, free-text description of the period of time associated with the creation of this work. This might include date terms like Dynasty, Era etc. Written by curators and editors in house style, and is the preferred field for display on websites and apps.

  • date_qualifier_title string - Readable, text qualifier to the dates provided for this record.

  • date_qualifier_id integer - Unique identifier of the qualifier to the dates provided for this record.

  • artist_display string - Readable description of the creator of this work. Includes artist names, nationality and lifespan dates

  • place_of_origin string - The location where the creation, design, or production of the work took place, or the original location of the work

  • description string - Longer explanation describing the work

  • short_description string - Short explanation describing the work

  • dimensions string - The size, shape, scale, and dimensions of the work. May include multiple dimensions like overall, frame, or dimension for each section of a work. Free-form text formatted in a house style.

  • dimensions_detail object - The height, width, depth, and/or diameter of each section of the work in centimeters

  • medium_display string - The substances or materials used in the creation of a work

  • inscriptions string - A description of distinguishing or identifying physical markings that are on the work

  • credit_line string - Brief statement indicating how the work came into the collection

  • catalogue_display string - Brief text listing all the catalogues raisonnés which include this work. This isn’t an exhaustive list of publications where the work has been mentioned. For that, see publication_history.

  • publication_history string - Bibliographic list of all the places this work has been published

  • exhibition_history string - List of all the places this work has been exhibited

  • provenance_text string - Ownership/collecting history of the work. May include names of owners, dates, and possibly methods of transfer of ownership. Free-form text formatted in a house style.

  • edition text - Edition number if the work is one of many

  • publishing_verification_level string - Indicator of how much metadata on the work is published. Web Basic is the least amount, Web Everything is the greatest.

  • internal_department_id number - An internal department id we use for analytics. Does not correspond to departments on the website.

  • fiscal_year_deaccession number - The fiscal year in which the work was deaccessioned.

  • is_public_domain boolean - Whether the work is in the public domain, meaning it was created before copyrights existed or has left the copyright term

  • is_zoomable boolean - Whether images of the work are allowed to be displayed in a zoomable interface.

  • max_zoom_window_size number - The maximum size of the window the image is allowed to be viewed in, in pixels.

  • copyright_notice string - Statement notifying how the work is protected by copyright. Applies to the work itself, not image or other related assets.

  • has_multimedia_resources boolean - Whether this artwork has any associated microsites, digital publications, or documents tagged as multimedia

  • has_educational_resources boolean - Whether this artwork has any documents tagged as educational

  • has_advanced_imaging boolean - Whether this artwork is enhanced with 3D models, 360 image sequences, Mirador views, etc.

  • colorfulness float - Unbounded positive float representing an abstract measure of colorfulness.

  • color object - Dominant color of this artwork in HSL

  • latitude number - Latitude coordinate of the location of this work in our galleries

  • longitude number - Longitude coordinate of the location of this work in our galleries

  • latlon string - Latitude and longitude coordinates of the location of this work in our galleries

  • on_loan_display string - If an artwork is on loan, this contains details about the loan

  • gallery_title string - The location of this work in our museum

  • gallery_i
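
The thumbnail entry in the list above notes that image URLs have to be built by hand using IIIF Image API conventions. A minimal sketch of that step follows. The function name is hypothetical, the image id is a placeholder, and the default base URL and 843-pixel width are assumptions drawn from my reading of the museum's API documentation, so check them against the current docs before relying on them.

```r
# Build an image URL following the IIIF Image API path pattern:
#   {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
iiif_image_url <- function(image_id,
                           iiif_base = "https://www.artic.edu/iiif/2",  # assumed base URL
                           width = 843) {                               # assumed default width
  # "full" = whole image, "843," = scale to that width, "0" = no rotation
  paste0(iiif_base, "/", image_id, "/full/", width, ",/0/default.jpg")
}

# "abc-123" is a placeholder, not a real image_id
iiif_image_url("abc-123")
```

In practice the image_id would come from the image_id field of an artwork record.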

Step 2. Creating an initial URL

First, I wanted to check that I could use this API to pull JSON data once I had built a URL.

I started by creating an R script and loading the appropriate packages. Then I created objects for the endpoint and the list of fields I wanted and pasted them together into a URL. I used the GET() function to retrieve the data, extracted the response body as JSON text, and parsed it with fromJSON(). Finally, I was able to view the single page of results as a data frame using use_series(data).

For my loop, I first created an empty data frame for each page of results to be appended to.

I used the variable i as the page counter so that each pass through the loop requests a different page.

A status message was also added to show how many pages of data had been collected.
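
The accumulate-in-a-loop pattern described above can be tried offline before any real requests are sent. The sketch below swaps the API call for a stand-in function that returns made-up rows, and uses base R's rbind() in place of bind_rows(). The names fake_page and all_pages are hypothetical and the titles are placeholders.

```r
# Stand-in for a real page request: returns a small made-up data frame
fake_page <- function(page, per_page = 3) {
  ids <- seq_len(per_page) + (page - 1) * per_page
  data.frame(id = ids,
             title = paste("Placeholder work", ids),
             stringsAsFactors = FALSE)
}

n_pages <- 7
all_pages <- data.frame()   # empty data frame that grows page by page

for (i in seq_len(n_pages)) {
  # Append this page's rows to everything collected so far
  all_pages <- rbind(all_pages, fake_page(i))

  # Status message
  message("Page ", i, " of ", n_pages, " collected.")
}

nrow(all_pages)   # 7 pages x 3 rows per page
```

Once the dry run behaves as expected, the stand-in can be replaced with a function that actually calls the API.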

(DO NOT RUN: copied from original R script)

library(jsonlite)  # Converting json data into data frames
library(magrittr)  # Extracting items from list objects using piping grammar
library(httr)      # Interacting with HTTP verbs

### Initial URL build 

inst_endpoint <-
  "https://api.artic.edu/api/v1/artworks"

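fields <- paste("id",
           "title",
           "artist_display",
           "date_display",
           "fiscal_year",
           "has_not_been_viewed_much",
           "is_on_view",
           sep = ","
           )

# "page" is a query parameter rather than a field, so it is not listed above.
# paste0() has no sep argument, so none is passed here.
inst_api_url <- paste0(inst_endpoint, "?fields=", fields)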

inst_api_url 

inst_api_url_response <- GET(inst_api_url)

inst_api_url_data <- 
 inst_api_url_response %>% 
  content(as = "text",
          encoding = "UTF-8") %>% # UTF-8 is the <near> universal standard
  fromJSON()

inst_api_url_data %>%
  use_series(data) %>%
  View()  # utils::View(); lowercase view() needs tibble/tidyverse loaded





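# create empty data frame
inst_api_df <- data.frame()

# Loop through the first 7 pages
# (inst_api_GET_df() is defined in Step 3 and must be run first)
for (i in 1:7) {
  page_data <- inst_api_GET_df(page = i)
  
  # bind_rows() comes from dplyr, which is not loaded above
  inst_api_df <- dplyr::bind_rows(inst_api_df, page_data)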
  
  # Status message
  message("Page ", i, " of 7 collected.")
  
  # Pause between requests to avoid rate limiting
  if (i < 7) Sys.sleep(3)
}

This was good practice to confirm that I could actually use the API, but I wanted to gather at least 7 pages of artwork data to analyze, so I decided to write a function and a loop to grab the rest.

Step 3. Putting the API call into a function and looping through 7 pages

I wanted the API function to vary only the page it collected while my fields stayed the same. There were also no specific filters I wanted to apply, so the only function parameter I added was page. I also wanted only the data portion of the response, so the function returns data$data.
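
If more control were wanted later, the same idea extends to other query parameters. The sketch below is one possible way to assemble the query string with optional arguments. It is not the function used in this guide: build_artworks_url is a hypothetical name, and the limit parameter (records per page) is an assumption based on my reading of the public API documentation.

```r
# Assemble an artworks listing URL from optional query parameters
build_artworks_url <- function(page = 1,
                               limit = NULL,                  # assumed parameter
                               fields = c("id", "title")) {
  endpoint <- "https://api.artic.edu/api/v1/artworks"

  # as.integer() keeps large page numbers from printing as 1e+05
  params <- c(fields = paste(fields, collapse = ","),
              page = as.integer(page))

  # Only add limit when the caller asks for it
  if (!is.null(limit)) {
    params <- c(params, limit = as.integer(limit))
  }

  # Join name=value pairs with "&"
  paste0(endpoint, "?", paste0(names(params), "=", params, collapse = "&"))
}

build_artworks_url(page = 2)
build_artworks_url(page = 2, limit = 50, fields = c("id", "title", "is_on_view"))
```

Keeping URL construction separate from the request makes it easy to print and inspect the URL before calling GET().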

(DO NOT RUN: copied from original R script)



inst_api_GET_df <- function(page) {
  # Define endpoint and fields
  inst_endpoint <- "https://api.artic.edu/api/v1/artworks"
  
  fields <- paste(
    "id",
    "title",
    "artist_display",
    "date_display",
    "fiscal_year",
    "has_not_been_viewed_much",
    "is_on_view",
    sep = ","
  )
  
  # Build full URL for given page
  inst_api_GET <- paste0(inst_endpoint, "?fields=", fields, "&page=", page)
  
  # Make GET request
  response <- GET(inst_api_GET)
  
  # Convert JSON to data frame
  data <- content(response, as = "text", encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE)
  
  # Return just the artwork data portion
  return(data$data)
}

Step 4. Creating a CSV file from the API call and adding it to a .qmd/.Rmd file

After I successfully pulled 7 pages of data, I turned the R data frame into a CSV file that I could load automatically into a Quarto document like this one. I wanted to show an example of using this data link with a simple histogram visualization.

#| echo: false
#| message: false
#| warning: false

# libraries, yo
library(tidyverse)
# Loading in my data, bro.

art_inst_df <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/mccarthyc9_xavier_edu/EefsWa9rpGNJqX5jOyI2ufYB_b8kZ8SbsWJCJvaGKyYv6g?download=1")
Rows: 84 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): title, date_display, artist_display
dbl (2): id, fiscal_year
lgl (2): has_not_been_viewed_much, is_on_view

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
art_inst_df %>%
  ggplot(aes(x = nchar(title))) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
  labs(
    title = "Distribution of Artwork Title Lengths",
    x = "Number of Characters in Title",
    y = "Count"
  )
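
Step 4 mentions turning the data frame into a CSV file, but the export code itself is not shown. One way that step could look, assuming base R's write.csv() and a temporary file path, is sketched below. The example data frame is a made-up stand-in for the real API results, and the file name is hypothetical.

```r
# Made-up stand-in for the data frame collected from the API
example_df <- data.frame(
  id = c(1, 2),
  title = c("Placeholder work A", "Placeholder work B"),
  is_on_view = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Write to a temporary location; row.names = FALSE avoids an extra index column
csv_path <- file.path(tempdir(), "art_inst_example.csv")
write.csv(example_df, csv_path, row.names = FALSE)

# Read it back to confirm the file round-trips cleanly
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
str(round_trip)
```

The resulting file can then be uploaded to a shared location and read with read_csv(), as in the chunk above.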