The original purpose of this experiment was to link a dataset of 1.487 million records of Cornell’s 2001 - 2018 print monograph acquisitions with the corresponding OCLC WorldCat holdings, then group the holdings by language of the material, to measure how distinctive the collection we are building is in each language. This dataset does not imply a position on how distinctiveness is defined, by whom, and for whom. It describes the scarcity of print holdings and the circulation of items in the collection, but does not interpret what the scarcity of a particular set of materials, or their use or non-use, means from the library or user perspective.
Looking at the initial dataset, we raised the following questions and shared them with the curators and selectors:
How can we connect readers to these materials? Across all the languages that we acquired during this time period, there are 68355 books that are held only by Cornell AND have never circulated. The implications are fascinating. There is a high chance that the only people who have ever read these materials are the authors, publishers, and the few individuals who purchased copies.
Are there strategies that archivists use to promote archival collections to local, national, and international communities which could be considered for these circulating materials?
Can we imagine a better quantitative definition of a distinctive collection that retains the simplicity of cornell_only_per1000?
We have since received additional questions and created more tools, represented below as “Appendix”, to aid different collection decisions. If you have questions, please contact Adam Chandler (alc28@cornell.edu).
language: language of the material as coded in the MARC record
n = number of print monograph titles collected by Cornell in the 2001 - 2018 time period
cornell_only = number of titles collected that are owned only by Cornell
cornell_only_percent = (cornell_only / n) * 100
cornell_only_per1000 = (cornell_only / n) * 1000
percent_has_circulated = percentage of titles that have circulated at least once
mean_ivy = average number of copies across Ivy Plus. For a group of titles, we take the total number of Ivy Plus library holdings of the titles in the set (for example, Khmer language books) and divide by the number of titles. An average that is close to 1 means that few other libraries hold the same titles; a high average means that many libraries hold the same titles.
mean_oclc = average number of copies across all OCLC libraries. For a group of titles, we take the total number of OCLC WorldCat library holdings of the titles in the set (for example, Khmer language books) and divide by the number of titles. An average that is close to 1 means that few other libraries hold the same titles; a high average means that many libraries hold the same titles.
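The definitions above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names (ivy_holdings, oclc_holdings, and so on), and every value in it are made up, and the actual analysis may be organized differently.

```r
# Toy title-level data: one row per title. All values and column names are
# hypothetical and exist only to illustrate the definitions.
titles <- data.frame(
  language       = c("khm", "khm", "khm", "khm", "fre", "fre"),
  cornell_only   = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  has_circulated = c(0, 0, 1, 0, 1, 1),      # 1 = circulated at least once
  ivy_holdings   = c(1, 1, 2, 1, 6, 9),      # Ivy Plus libraries holding the title
  oclc_holdings  = c(1, 1, 3, 2, 40, 120)    # WorldCat libraries holding the title
)

# Compute the summary columns for one language group.
summarise_language <- function(d) {
  n <- nrow(d)
  cornell_only <- sum(d$cornell_only)
  data.frame(
    language               = d$language[1],
    n                      = n,
    cornell_only           = cornell_only,
    cornell_only_percent   = cornell_only / n * 100,
    cornell_only_per1000   = cornell_only / n * 1000,
    percent_has_circulated = mean(d$has_circulated == 1) * 100,
    mean_ivy               = sum(d$ivy_holdings) / n,
    mean_oclc              = sum(d$oclc_holdings) / n
  )
}

# Split by language, summarise each group, and stack the results.
by_language <- do.call(
  rbind,
  lapply(split(titles, titles$language), summarise_language)
)
print(by_language)
```

Keeping cornell_only_percent and cornell_only_per1000 side by side shows that they carry the same information at different scales; the per-1000 form simply reads more naturally as "out of every 1000 books acquired".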
TABLE 1 shows that during these two decades Cornell added 1232 Khmer language books to the collection. Of these, 817 are held only by Cornell, which means that for every 1000 books we acquired in this language, 663 are unique to Cornell. Continuing with this example, 11% of them have circulated, the average number of holdings across Ivy Plus libraries is 1.12, and the average number of libraries in WorldCat that hold these titles is 1.78.
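As a quick check of the arithmetic, the Khmer figures quoted above can be plugged into the definitions of cornell_only_percent and cornell_only_per1000. The variable names are ours; the two input numbers come from the text.

```r
# Figures quoted in the text for Khmer language titles.
n            <- 1232   # titles acquired
cornell_only <- 817    # titles held only by Cornell

cornell_only_percent <- cornell_only / n * 100
cornell_only_per1000 <- cornell_only / n * 1000

print(round(cornell_only_percent, 1))  # 66.3
print(round(cornell_only_per1000))     # 663
```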
How many Khmer books have circulated?
## # A tibble: 2 × 2
## # Groups:   has_circulated [2]
##   has_circulated     n
##            <dbl> <int>
## 1              0  1093
## 2              1   140
How many Khmer books that have not circulated are held only by Cornell? 758
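The two counts above are the kind of result a simple tabulation and a combined filter produce. A minimal base R sketch, using a made-up stand-in for the Khmer records (the column names and all values are hypothetical, so the numbers will not match the real counts):

```r
# Toy stand-in for the Khmer title-level records; values are made up.
khmer <- data.frame(
  has_circulated = c(0, 0, 0, 1, 0, 1),
  cornell_only   = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Number of titles by circulation status (0 = never circulated, 1 = circulated).
circ_counts <- table(has_circulated = khmer$has_circulated)
print(circ_counts)

# Titles that have never circulated AND are held only by Cornell.
uncirculated_unique <- sum(khmer$has_circulated == 0 & khmer$cornell_only)
print(uncirculated_unique)
```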
An alternative to the percentage of titles circulated is charges per number of titles acquired. A high number indicates a collection area that has high demand, with many titles that circulate often. A low number indicates an area with little patron activity.
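One way to compute this measure is sketched below in base R. It assumes a hypothetical charges column holding the total number of checkouts per title, and it assumes that medians are taken across the per-language values; neither detail is confirmed by the dataset description, and all values are made up.

```r
# Toy data: one row per title. `charges` is a hypothetical column with the
# total number of checkouts recorded for the title. All values are made up.
titles <- data.frame(
  language      = c("khm", "khm", "khm", "khm", "eng", "eng", "fre", "fre"),
  charges       = c(0, 0, 1, 0, 12, 4, 2, 0),
  oclc_holdings = c(1, 1, 3, 2, 250, 90, 40, 120)
)

# The mean of charges within a language equals total charges divided by the
# number of titles acquired, i.e. charges per title acquired.
per_language <- aggregate(
  cbind(charges, oclc_holdings) ~ language,
  data = titles,
  FUN  = mean
)
names(per_language) <- c("language", "charges_per_title", "mean_oclc")
print(per_language)

# Medians across languages (assumption: these serve as reference values).
median_charges <- median(per_language$charges_per_title)
median_oclc    <- median(per_language$mean_oclc)
print(c(median_charges = median_charges, median_oclc = median_oclc))
```

Unlike percent_has_circulated, this measure rewards repeated use: a title borrowed twelve times counts twelve times rather than once.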
This figure is interactive. As you move your cursor across it, a pop-up will display the data for each point. To zoom in, define an area by clicking on the plot and dragging your cursor to the lower right. You should see the plot change. To reset the plot, double-click.
The red horizontal line represents the median charges per title acquired. The vertical red line represents the median WorldCat holdings.
Circulation percentages as of mid-2019.
The red horizontal line represents the median percentage of acquired titles that have circulated. The vertical red line represents the median WorldCat holdings.