Art Institute of Chicago API Instructions and Call Example
Step 1. Learning the API
To start this assignment, I found the Art Institute of Chicago’s free API online. The API itself has thousands of data fields relating to individual art pieces and collections that have been through the museum. It has three types of endpoints you can use to build your query:
Listing (e.g. /artworks)
Detail (e.g. /artworks/{id})
Search (e.g. /artworks/search)
Each endpoint, or resource, contains a title and id field.
For my call example and guide, I decided to use the Listing resource, which has the widest range of possible fields to pull from.
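As a quick illustration of the three endpoint types, the sketch below builds one URL of each kind in base R. The artwork id and the search term are example values chosen only for illustration, and the `q` search parameter is assumed from the API's public documentation rather than taken from the original script.

```r
# Base URL shared by all artwork endpoints
base_url <- "https://api.artic.edu/api/v1"

# Listing: returns a page of artworks
listing_url <- paste0(base_url, "/artworks")

# Detail: returns a single artwork; 27992 is only an example id
example_id <- 27992
detail_url <- paste0(base_url, "/artworks/", example_id)

# Search: full-text search; "monet" is only an example search term
search_url <- paste0(base_url, "/artworks/search?q=", "monet")

print(c(listing = listing_url, detail = detail_url, search = search_url))
```

Pasting any of these into a browser is a simple way to inspect the raw JSON before writing more code.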
Some of the available fields, most of which I did not use in my call example, include:
api_model (string) - REST API resource type or endpoint
api_link (string) - REST API link for this resource
is_boosted (boolean) - Whether this document should be boosted in search
alt_titles (array) - Alternate names for this work
thumbnail (array) - Metadata about the image referenced by image_id. Currently, all thumbnails are IIIF images. You must build your own image URLs using IIIF Image API conventions. See our API documentation for more details.
main_reference_number (string) - Unique identifier assigned to the artwork upon acquisition
boost_rank (number) - Manual indication of what rank this artwork should take in search results. Noncontiguous.
date_start (number) - The year of the period of time associated with the creation of this work
date_end (number) - The year of the period of time associated with the creation of this work
date_display (string) - Readable, free-text description of the period of time associated with the creation of this work. This might include date terms like Dynasty, Era etc. Written by curators and editors in house style, and is the preferred field for display on websites and apps.
date_qualifier_title (string) - Readable, text qualifier to the dates provided for this record.
date_qualifier_id (integer) - Unique identifier of the qualifier to the dates provided for this record.
artist_display (string) - Readable description of the creator of this work. Includes artist names, nationality and lifespan dates
place_of_origin (string) - The location where the creation, design, or production of the work took place, or the original location of the work
description (string) - Longer explanation describing the work
short_description (string) - Short explanation describing the work
dimensions (string) - The size, shape, scale, and dimensions of the work. May include multiple dimensions like overall, frame, or dimension for each section of a work. Free-form text formatted in a house style.
dimensions_detail (object) - The height, width, depth, and/or diameter of each section of the work in centimeters
medium_display (string) - The substances or materials used in the creation of a work
inscriptions (string) - A description of distinguishing or identifying physical markings that are on the work
credit_line (string) - Brief statement indicating how the work came into the collection
catalogue_display (string) - Brief text listing all the catalogues raisonnés which include this work. This isn’t an exhaustive list of publications where the work has been mentioned. For that, see publication_history.
publication_history (string) - Bibliographic list of all the places this work has been published
exhibition_history (string) - List of all the places this work has been exhibited
provenance_text (string) - Ownership/collecting history of the work. May include names of owners, dates, and possibly methods of transfer of ownership. Free-form text formatted in a house style.
edition (text) - Edition number if the work is one of many
publishing_verification_level (string) - Indicator of how much metadata on the work is published. Web Basic is the least amount, Web Everything is the greatest.
internal_department_id (number) - An internal department id we use for analytics. Does not correspond to departments on the website.
fiscal_year_deaccession (number) - The fiscal year in which the work was deaccessioned.
is_public_domain (boolean) - Whether the work is in the public domain, meaning it was created before copyrights existed or has left the copyright term
is_zoomable (boolean) - Whether images of the work are allowed to be displayed in a zoomable interface.
max_zoom_window_size (number) - The maximum size of the window the image is allowed to be viewed in, in pixels.
copyright_notice (string) - Statement notifying how the work is protected by copyright. Applies to the work itself, not image or other related assets.
has_multimedia_resources (boolean) - Whether this artwork has any associated microsites, digital publications, or documents tagged as multimedia
has_educational_resources (boolean) - Whether this artwork has any documents tagged as educational
has_advanced_imaging (boolean) - Whether this artwork is enhanced with 3D models, 360 image sequences, Mirador views, etc.
colorfulness (float) - Unbounded positive float representing an abstract measure of colorfulness.
color (object) - Dominant color of this artwork in HSL
latitude (number) - Latitude coordinate of the location of this work in our galleries
longitude (number) - Longitude coordinate of the location of this work in our galleries
latlon (string) - Latitude and longitude coordinates of the location of this work in our galleries
on_loan_display (string) - If an artwork is on loan, this contains details about the loan
gallery_title (string) - The location of this work in our museum
gallery_i
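The thumbnail entry in the list above says image URLs have to be built by hand using IIIF conventions. A minimal sketch of that step follows, assuming the IIIF base URL and the `full/843,/0/default.jpg` suffix suggested in the API documentation. `build_iiif_url()` is a hypothetical helper name, and the `image_id` shown is a made-up placeholder; in practice it would come from an artwork's image_id field.

```r
# Hypothetical helper that assembles an IIIF Image API URL:
# {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
build_iiif_url <- function(image_id,
                           iiif_base = "https://www.artic.edu/iiif/2",
                           width = 843) {
  paste0(iiif_base, "/", image_id, "/full/", width, ",/0/default.jpg")
}

# Placeholder id for illustration only; not a real image
build_iiif_url("00000000-0000-0000-0000-000000000000")
```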
Step 2. Creating an initial URL
First, I wanted to see whether I could use this API to get JSON data back after building a URL.
I started by creating an R script and loading the appropriate packages. Then I added objects for the endpoint and the list of fields that I wanted. I combined these into a URL, then used the GET() function to retrieve the data, which I converted from JSON to text and then parsed. Finally, I was able to view the single-page data frame using use_series(data).
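To make the JSON-to-data-frame step concrete without calling the API, the sketch below parses a small hand-written JSON string shaped like a listing response. The two records and their values are invented for illustration; only the top-level `pagination` and `data` keys are meant to mirror the real response, which carries many more fields.

```r
library(jsonlite)  # fromJSON() turns JSON text into R lists and data frames
library(magrittr)  # %>% and use_series()

# Made-up response text; the real API returns many more fields per artwork
sample_json <- '{
  "pagination": {"total": 2, "limit": 12, "current_page": 1},
  "data": [
    {"id": 1, "title": "Example Work A", "is_on_view": true},
    {"id": 2, "title": "Example Work B", "is_on_view": false}
  ]
}'

parsed <- fromJSON(sample_json)

# use_series(data) is the pipe-friendly equivalent of parsed$data
artworks <- parsed %>% use_series(data)
print(artworks)
```

Because the `data` element is an array of similar records, `fromJSON()` simplifies it into a data frame with one row per artwork.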
For my loop, I first created an empty data frame for each page of results to be appended to.
I used the variable i as the page counter so that each pass of the loop requests a different page.
I also added a status message to show how many pages of data had been collected.
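The accumulate-in-a-loop pattern described here can be tried offline by swapping the API call for a stand-in. In the sketch below, `fake_page()` is a hypothetical placeholder that returns 12 invented rows per page (12 being the page size implied by the 84 rows collected over 7 pages), and base R's `rbind()` is used in place of `bind_rows()` so that no extra packages are needed.

```r
# Stand-in for the real request: returns one "page" of 12 invented rows
fake_page <- function(page, per_page = 12) {
  first_id <- (page - 1) * per_page + 1
  ids <- seq(first_id, length.out = per_page)
  data.frame(id = ids, title = paste("Artwork", ids), page = page)
}

n_pages <- 7
all_pages <- data.frame()  # empty data frame to grow page by page

for (i in seq_len(n_pages)) {
  all_pages <- rbind(all_pages, fake_page(page = i))
  message("Page ", i, " of ", n_pages, " collected.")
  # A real loop would pause here, e.g. Sys.sleep(3), between requests
}

nrow(all_pages)  # 7 pages x 12 rows = 84
```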
(DO NOT RUN: copied from original R script)
library(jsonlite) # Converting JSON data into data frames
library(magrittr) # Extracting items from list objects using piping grammar
library(httr) # Interacting with HTTP verbs
library(dplyr) # bind_rows() for stacking pages in the loop
### Initial URL build
inst_endpoint <- "https://api.artic.edu/api/v1/artworks"
# Fields to request (page is a separate query parameter, not a field)
fields <- paste(
  "id",
  "title",
  "artist_display",
  "date_display",
  "fiscal_year",
  "has_not_been_viewed_much",
  "is_on_view",
  sep = ","
)
inst_api_url <- paste0(inst_endpoint, "?fields=", fields)
inst_api_url
inst_api_url_response <- GET(inst_api_url)
inst_api_url_data <-
  inst_api_url_response %>%
  content(as = "text", encoding = "UTF-8") %>% # UTF-8 is the near-universal standard
  fromJSON()
inst_api_url_data %>%
  use_series(data) %>%
  View() # opens the data frame in the RStudio viewer
# Create empty data frame
inst_api_df <- data.frame()
# Loop through the first 7 pages
# (inst_api_GET_df() is defined in Step 3 and must be run first)
for (i in 1:7) {
  page_data <- inst_api_GET_df(page = i)
  inst_api_df <- bind_rows(inst_api_df, page_data)
  # Status message
  message("Page ", i, " of 7 collected.")
  # Pause between requests to avoid rate limiting
  if (i < 7) Sys.sleep(3)
}
This was good practice to confirm that I could actually use the API. However, I wanted at least 7 pages of artwork data to analyze, so I decided to write a function and a loop to grab the rest.
Step 3. Putting the API call into a function and looping through 7 pages
I wanted the API function to collect different pages while my fields stayed the same. There were also no specific filters I wanted to add to the fields, so the only function parameter I added was page. I also wanted only the data portion of the response to be collected, so the function returns data$data.
(DO NOT RUN: copied from original R script)
inst_api_GET_df <- function(page) {
  # Define endpoint and fields
  inst_endpoint <- "https://api.artic.edu/api/v1/artworks"
  fields <- paste(
    "id",
    "title",
    "artist_display",
    "date_display",
    "fiscal_year",
    "has_not_been_viewed_much",
    "is_on_view",
    sep = ","
  )
  # Build full URL for given page
  inst_api_GET <- paste0(inst_endpoint, "?fields=", fields, "&page=", page)
  # Make GET request
  response <- GET(inst_api_GET)
  # Convert JSON to data frame
  data <- content(response, as = "text", encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE)
  # Return just the artwork data portion
  return(data$data)
}
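If more control over paging is ever needed, the URL-building step could be pulled into its own small helper. This is a sketch rather than part of the original script: `build_artworks_url()` is a hypothetical name, and the `limit` query parameter (rows per page) is assumed from the API's pagination documentation.

```r
# Hypothetical helper: builds a listing URL for one page of results
build_artworks_url <- function(page = 1, limit = 12,
                               fields = c("id", "title", "artist_display")) {
  endpoint <- "https://api.artic.edu/api/v1/artworks"
  paste0(
    endpoint,
    "?fields=", paste(fields, collapse = ","),
    "&page=", page,
    "&limit=", limit
  )
}

# URLs for the first three pages, 12 artworks each
urls <- vapply(1:3, build_artworks_url, character(1))
print(urls)
```

Keeping the URL construction separate from the request makes it easy to print and check a URL before any call is sent.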
Step 4. Creating a CSV file from the API call and adding it to a .qmd/.Rmd file
After I successfully pulled 7 pages of data, I turned the R data frame into a CSV file that I could load automatically into a Quarto document like this one. I wanted to show an example of using this data link with a simple histogram visualization.
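The export step itself is not shown in the original script. One way it might look is sketched below, using base R and a tiny invented data frame in place of the real seven pages of results. The column names match three of the fields requested earlier, but the values and the temporary file path are placeholders.

```r
# Invented rows standing in for the collected artwork data
example_df <- data.frame(
  id = c(1, 2, 3),
  title = c("Example Work A", "Example Work B", "Example Work C"),
  is_on_view = c(TRUE, FALSE, TRUE)
)

# Write to a temporary file; a real script would use a permanent path
csv_path <- tempfile(fileext = ".csv")
write.csv(example_df, csv_path, row.names = FALSE)

# Read it back to confirm the round trip
reloaded <- read.csv(csv_path)
str(reloaded)
```

Setting `row.names = FALSE` keeps R from writing an extra index column that would otherwise show up as an unnamed field when the file is read back.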
library(tidyverse) # read_csv() and ggplot()
# Loading in my data
art_inst_df <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/mccarthyc9_xavier_edu/EefsWa9rpGNJqX5jOyI2ufYB_b8kZ8SbsWJCJvaGKyYv6g?download=1")
Rows: 84 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): title, date_display, artist_display
dbl (2): id, fiscal_year
lgl (2): has_not_been_viewed_much, is_on_view
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
art_inst_df %>%
  ggplot(aes(x = nchar(title))) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
  labs(
    title = "Distribution of Artwork Title Lengths",
    x = "Number of Characters in Title",
    y = "Count"
  )
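For a quick look at the numbers behind a histogram like this one without plotting, title lengths can be counted and binned in base R. The handful of titles below were typed in by hand as sample input, and the bin edges chosen here will not necessarily line up with the ones ggplot2 picks for `binwidth = 5`.

```r
# A few sample titles entered by hand for illustration
titles <- c("Untitled", "Water Lilies", "The Bedroom", "Nighthawks",
            "A Sunday on La Grande Jatte")

# Number of characters in each title
title_lengths <- nchar(titles)

# Count titles in 5-character-wide bins: (0,5], (5,10], ...
bins <- cut(title_lengths, breaks = seq(0, 30, by = 5))
table(bins)
```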