Cornell Print Monograph Acquisitions 2001 - 2018

The original purpose of this experiment was to link a dataset of 1.487 million records of Cornell’s 2001 - 2018 print monograph acquisitions with the corresponding OCLC WorldCat holdings, then group the holdings by language of the material, to measure the extent to which the collection we are building (in each language) is distinctive. This dataset does not imply a position on how distinctiveness is defined, by whom, and for whom. It describes the scarcity of print holdings and the circulation of items in the collection, but does not interpret what the scarcity of a particular set of materials, or their use or non-use, means from the library or user perspective.

Looking at the initial dataset, we raised the following questions and shared them with the curators and selectors:

  1. How can we connect readers to these materials? Across all the languages that we acquired during this time period, there are 68355 books that are held only by Cornell AND have never circulated. The implications are fascinating. There is a high chance that the only people who have ever read these materials are the authors, publishers, and the few individuals who purchased copies.

  2. Are there strategies used by archivists to promote archival collections to local, national, and international communities that could be considered for these circulating materials?

  3. Can we imagine a better quantitative definition of a distinctive collection that retains the simplicity of cornell_only_per1000?

We have since received additional questions and created more tools, presented below as appendices, to aid different collection decisions. If you have questions, please contact Adam Chandler (alc28@cornell.edu).

Variables

language: language of the material as coded in the MARC record

n = number of print monograph titles collected by Cornell in the 2001 - 2018 time period

cornell_only = number of titles collected that are owned only by Cornell

cornell_only_percent = (cornell_only / n) * 100

cornell_only_per1000 = (cornell_only / n) * 1000

percent_has_circulated = percentage of titles that have circulated at least once

mean_ivy = average number of copies across Ivy Plus. For a group of titles we take the total number of Ivy Plus libraries that hold titles in the set (for example, Khmer language books) and divide by the number of titles. An average that is close to 1 means that few other libraries hold the same titles; a high average means that many libraries hold the same titles.

mean_oclc = average number of copies across all OCLC libraries. For a group of titles we take the total number of OCLC WorldCat libraries that hold titles in the set (for example, Khmer language books) and divide by the number of titles. An average that is close to 1 means that few other libraries hold the same titles; a high average means that many libraries hold the same titles.
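The variable definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the report: the field names (ivy_holdings, oclc_holdings, has_circulated), the toy records, and the rule that a title counts as Cornell-only when its WorldCat holdings count is 1 are all assumptions made for the example.

```python
from collections import defaultdict

# Toy title-level records; every value here is invented for illustration.
titles = [
    {"language": "khm", "ivy_holdings": 1, "oclc_holdings": 1, "has_circulated": 0},
    {"language": "khm", "ivy_holdings": 1, "oclc_holdings": 1, "has_circulated": 1},
    {"language": "khm", "ivy_holdings": 2, "oclc_holdings": 4, "has_circulated": 0},
    {"language": "khm", "ivy_holdings": 1, "oclc_holdings": 2, "has_circulated": 0},
    {"language": "fre", "ivy_holdings": 9, "oclc_holdings": 120, "has_circulated": 1},
    {"language": "fre", "ivy_holdings": 7, "oclc_holdings": 80, "has_circulated": 0},
]


def summarize(records):
    """Group title records by language and compute the summary variables."""
    groups = defaultdict(list)
    for record in records:
        groups[record["language"]].append(record)

    summary = {}
    for language, rows in groups.items():
        n = len(rows)
        # Assumption: a holdings count of 1 means Cornell is the only holder.
        cornell_only = sum(1 for r in rows if r["oclc_holdings"] == 1)
        summary[language] = {
            "n": n,
            "cornell_only": cornell_only,
            "cornell_only_percent": cornell_only / n * 100,
            "cornell_only_per1000": cornell_only / n * 1000,
            "percent_has_circulated": sum(r["has_circulated"] for r in rows) / n * 100,
            "mean_ivy": sum(r["ivy_holdings"] for r in rows) / n,
            "mean_oclc": sum(r["oclc_holdings"] for r in rows) / n,
        }
    return summary


summary = summarize(titles)
for language, row in summary.items():
    print(language, row)
```

With these toy records, the Khmer group has a mean_ivy close to 1 and the French group a much higher one, which is the contrast the definitions describe.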



Case study: Khmer

TABLE 1 shows that during these two decades Cornell added 1232 Khmer language books to the collection. Of these, 817 are held only by Cornell, which means that for every 1000 books we acquired in this language, 663 are unique to Cornell. Continuing with this example, 11% of them have circulated, the average holdings across Ivy Plus libraries is 1.12, and the average number of libraries in WorldCat that hold these titles is 1.78.
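As a quick check of the arithmetic, the two Khmer figures quoted above can be plugged into the formulas for cornell_only_percent and cornell_only_per1000. The snippet uses only numbers stated in this case study.

```python
# Figures quoted in the Khmer case study.
n = 1232            # Khmer titles acquired 2001 - 2018
cornell_only = 817  # titles held only by Cornell

cornell_only_percent = cornell_only / n * 100
cornell_only_per1000 = cornell_only / n * 1000

print(f"cornell_only_percent = {cornell_only_percent:.1f}")  # 66.3
print(f"cornell_only_per1000 = {cornell_only_per1000:.0f}")  # 663
```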

How many Khmer books have circulated?

## # A tibble: 2 × 2
## # Groups:   has_circulated [2]
##   has_circulated     n
##            <dbl> <int>
## 1              0  1093
## 2              1   140

How many Khmer books that have not circulated are held only by Cornell? 758
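A count like this one combines two conditions on each title: it has never circulated, and no other library holds it. The sketch below shows that filter on invented records. The field names, and the assumption that a WorldCat holdings count of 1 means Cornell is the only holder, are ours and are not taken from the underlying dataset.

```python
# Toy records; values are invented for illustration only.
titles = [
    {"language": "khm", "oclc_holdings": 1, "has_circulated": 0},
    {"language": "khm", "oclc_holdings": 1, "has_circulated": 1},
    {"language": "khm", "oclc_holdings": 3, "has_circulated": 0},
    {"language": "khm", "oclc_holdings": 1, "has_circulated": 0},
    {"language": "fre", "oclc_holdings": 1, "has_circulated": 0},
]


def count_uncirculated_cornell_only(records, language):
    """Count titles in one language that never circulated and are held only by Cornell."""
    return sum(
        1
        for r in records
        if r["language"] == language
        and r["has_circulated"] == 0
        and r["oclc_holdings"] == 1  # assumption: 1 holding = Cornell only
    )


khmer_count = count_uncirculated_cornell_only(titles, "khm")
print(khmer_count)
```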





Appendix 1: Language code and LC class


Appendix 2: Language and LC class circulation by borrower category




Appendix 3: Charges (or checkouts) per number of titles acquired

An alternative to the percentage of titles circulated is charges per number of titles acquired. A high number indicates a collection area that has high demand, with many titles that circulate often. A low number indicates an area with little patron activity.
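To illustrate how the two measures differ, here is a minimal sketch that computes both from a list of per-title charge counts. The function names and the sample counts are hypothetical, chosen only to show the contrast.

```python
def charges_per_title(charge_counts):
    """Total charges (checkouts) divided by the number of titles acquired."""
    if not charge_counts:
        raise ValueError("at least one title is required")
    return sum(charge_counts) / len(charge_counts)


def percent_has_circulated(charge_counts):
    """Percentage of titles with at least one charge."""
    if not charge_counts:
        raise ValueError("at least one title is required")
    circulated = sum(1 for c in charge_counts if c > 0)
    return circulated / len(charge_counts) * 100


# Invented per-title charge counts for two collection areas.
busy_area = [12, 7, 0, 9]
quiet_area = [1, 1, 0, 1]

print(charges_per_title(busy_area), percent_has_circulated(busy_area))
print(charges_per_title(quiet_area), percent_has_circulated(quiet_area))
```

Both toy areas have the same percentage of titles circulated, but their charges per title differ by roughly a factor of nine, so the second measure also captures how often titles are borrowed.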



Figure: VERSION 1: Demand and availability by language and lcclass

This figure is interactive. As you move your cursor across it, a pop-up will display data for each point. To zoom in, define an area by clicking on the plot and dragging your cursor to the lower right. You should see the plot change. To reset the plot, double-click.

The red horizontal line represents the median charges per title acquired. The vertical red line represents the median WorldCat holdings.
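The two median lines divide the plot into four quadrants. The sketch below shows one way to compute the medians and label each language and LC class pair by quadrant. The data points, the labels, and the choice to place values equal to the median on the low side are assumptions for illustration, not the method used to draw the figure.

```python
from statistics import median

# Invented points: (language, lcclass) -> (charges per title, mean WorldCat holdings).
points = {
    ("khm", "DS"): (0.2, 1.8),
    ("eng", "QA"): (6.0, 250.0),
    ("fre", "PQ"): (1.5, 90.0),
    ("chi", "PL"): (0.9, 12.0),
    ("spa", "F"): (3.0, 40.0),
}

# Horizontal line: median charges. Vertical line: median holdings.
median_charges = median(charges for charges, _ in points.values())
median_holdings = median(holdings for _, holdings in points.values())


def quadrant(charges, holdings):
    """Label a point relative to the two median lines (ties count as low/scarce)."""
    demand = "high demand" if charges > median_charges else "low demand"
    availability = "widely held" if holdings > median_holdings else "scarce"
    return f"{demand}, {availability}"


labels = {key: quadrant(*value) for key, value in points.items()}
for key, label in labels.items():
    print(key, label)
```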



Figure: VERSION 2: Demand and availability by language and lcclass, with size of label determined by number of books acquired

This figure is interactive. As you move your cursor across it, a pop-up will display data for each point. To zoom in, define an area by clicking on the plot and dragging your cursor to the lower right. You should see the plot change. To reset the plot, double-click.

The red horizontal line represents the median charges per title acquired. The vertical red line represents the median WorldCat holdings.




Appendix 4: Circulation patterns and WorldCat holdings for monographs acquired in CY 2017-2018

Circulation percentages as of mid-2019.



The red horizontal line represents the median percentage of acquired titles that have circulated. The vertical red line represents the median WorldCat holdings.




By language and LC class, for sets with more than 30 items acquired






Table: Circulation of recent acquisitions by affiliate type (Cornell, BD, ILL), for sets with more than 30 items acquired