To: Xin Li
From: Adam Chandler (chair), Susie Cobb, Maureen Morris, Wendy Wilcox
Date: December 19, 2017
Subject: Tattle-tape Task Force Final Report
Even though members of our task force are not confident that tattle-taping actually prevents theft, we must recommend continuing it because staff, primarily selectors, clearly oppose changing the policy at this time. Feedback from access services staff was split: smaller units felt that tattle-taping was effective at preventing theft, while larger units felt it was ineffective at protecting open-stack collections. These responses make sense given the units' different approaches to responding to gate alarms. Before the CUL tattle-taping policy is changed, we recommend these steps:
- Recycle replacement fees back into supporting the replacement of missing and lost materials.
- Centralize and streamline the decision-making process and funding for replacing missing and lost materials.
- Consider conducting an inventory of the library's open-stack collections, using the methodology (and perhaps the tools) employed in the EAST validation study, as a baseline to inform present and future decision making on this issue.
“In order to evaluate the statistical likelihood that a retained volume exists on the shelves of any of the institutions, the EAST incorporated sample-based validation studies. The specific goals of this study were to establish and document the degree of confidence, and the possibility of error, in any EAST committed title being available for circulation. Results of the validation sample studies help predict the likelihood that titles selected for retention actually exist and can be located in the collection of a Retention Partner, and are in useable condition.” [https://eastlibraries.org/validation]
EAST Results
Overall, EAST can report a 97% availability rate.
The aggregated results from both cohorts (312,000 holdings across the 52 libraries) showed:
* 97% of monographs in the sample were accounted for: mean: 97%, median: 97.1%, high of 99.8% and low of 91%. (Note: “accounted for” includes those items previously determined to be in circulation based on an automated check of the libraries’ ILS.)
* 2.3% of titles were in circulation at the time of the study
* 90% of the titles were deemed to be in average or excellent condition with 10% marked as in poor condition. Not surprisingly, older titles were in poorer condition.
A few notable observations include:
* Items published pre-1900 were in significantly poorer condition; some 45% of these items ranked “poor” on the condition scale
* An item being in poor condition was also somewhat correlated to its subject area
* The most significant factor for an item being missing was the holding library.
The study was conducted between April and July 2018. We sampled 6,006 monographs across campus; Wendy Wilcox led the team that did the data collection in the stacks. Two exclusions: the Annex, because its stacks are closed, and Fine Arts, because that unit is in the middle of a building transition.
AF (accounted for) = checked out + present
Cornell accounted for rate: 96.4%
glimpse(df)
## Observations: 6,006
## Variables: 31
## $ present_or_not <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ bib_rec_nbr <chr> "1968678", "2249095", "5689943", "8618953",...
## $ mfhd_id <chr> "2389846", "2702959", "6199187", "8994997",...
## $ item_control_nbr <chr> "3723592", "4103508", "7620171", "9494855",...
## $ barcode <chr> "31924062968908", "31924072130184", "319241...
## $ begin_pub_date <dbl> 1960, 1993, 1971, 2013, 2010, 1971, 1994, 1...
## $ location_code <fct> afr, afr, afr, afr, afr, afr, afr, afr, afr...
## $ firstletter <fct> D, D, D, D, D, D, D, E, E, E, E, E, E, E, E...
## $ class <chr> "DT", "DT", "DT", "DT", "DT", "DT", "DT", "...
## $ classnumber <dbl> 32.000, 328.000, 356.000, 433.285, 433.545,...
## $ normalized_call_no <chr> "DT 32 R 61", "DT 328 ...
## $ display_call_no <chr> "DT32 .R61", "DT328.M53 .H3613x 1993", "DT3...
## $ call_nbr_norm_item <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ enumeration <chr> NA, NA, NA, NA, NA, NA, NA, "c.2", NA, NA, ...
## $ length_cn <dbl> 9, 22, 14, 19, 22, 20, 18, 11, 12, 18, 18, ...
## $ pagination <chr> "312 p. 22 cm.", "xv, 199 p. : ill. ; 24 cm...
## $ title <chr> "Death of Africa. By Peter Ritner.", "Victi...
## $ recorded_uses_item <dbl> 1, 0, 2, 0, 0, 10, 2, 0, 56, 1, 2, 4, 3, 2,...
## $ worldcat_oclc_nbr <chr> "412793", "59941146", "148569", "869824175"...
## $ catalog_url <chr> "https://newcatalog.library.cornell.edu/cat...
## $ us_holdings <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ row_number <dbl> 1585081, 735421, 2715241, 850681, 84661, 55...
## $ initials <chr> "mah94", "mah94", "mah94", "mah94", "mah94"...
## $ condition <fct> Acceptable, Excellent, Acceptable, Excellen...
## $ barcode_validation <chr> "yes", "yes", "yes", "yes", "yes", "no", "y...
## $ timestamp <dttm> 2018-07-06 13:56:22, 2018-07-06 13:56:22, ...
## $ item_status_desc <chr> "Not Charged", "Not Charged", "Not Charged"...
## $ has_circulated <dbl> 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1...
## $ is_oversize <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ age <dbl> 58, 25, 47, 5, 8, 47, 24, 53, 28, 39, 36, 3...
## $ age_group <fct> 11plus, 11plus, 11plus, 0-10, 0-10, 11plus,...
library(tidyverse)  # dplyr verbs and the pipe, used throughout
library(infer)
library(broom)      # glance() and augment(), used below
library(knitr)      # kable(), used below

p_hat <- df %>%
  summarise(stat = mean(present_or_not == "1")) %>%
  pull()
p_hat
## [1] 0.9642025
replimit <- 1000

boot <- df %>%
  specify(response = present_or_not, success = "1") %>%
  generate(reps = replimit, type = "bootstrap") %>%
  calculate(stat = "prop")
boot
## # A tibble: 1,000 x 2
## replicate stat
## <int> <dbl>
## 1 1 0.965
## 2 2 0.965
## 3 3 0.964
## 4 4 0.964
## 5 5 0.963
## 6 6 0.964
## 7 7 0.966
## 8 8 0.962
## 9 9 0.959
## 10 10 0.964
## # ... with 990 more rows
se <- boot %>%
  summarise(sd(stat)) %>%
  pull()
se
## [1] 0.002301173
“The standard error is the standard deviation of the sampling distribution of the sample mean” [Geoff Cumming, Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, 2012]
CUL mean and 1,000-replication bootstrap confidence interval: M = 0.964, 95% CI [0.96, 0.969].
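These bounds match a normal approximation built from the bootstrap standard error, and can also be read directly off the bootstrap distribution with infer (a sketch, not the original code):

# 95% CI via normal approximation: 0.9642 +/- 1.96 * 0.0023 -> [0.960, 0.969]
c(lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se)

# Equivalent, using the percentile method on the bootstrap replicates:
boot %>%
  get_confidence_interval(level = 0.95, type = "percentile")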
| location_code | n |
|---|---|
| olin | 3216 |
| was | 596 |
| ech | 462 |
| law | 450 |
| mann | 280 |
| uris | 270 |
| sasa | 223 |
| ilr | 145 |
| math | 116 |
| mus | 102 |
| afr | 49 |
| jgsm | 41 |
| hote | 24 |
| vet | 22 |
| olin,anx | 5 |
| law,anx | 2 |
| asia | 1 |
| mann,anx | 1 |
| was,anx | 1 |
The sample size for Asia is clearly too small. Let’s remove locations with fewer than 40 items in the sample before we start modeling.
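The filtered data frame, df2, is used in all of the models below. Its construction is not shown in the original output; a minimal sketch that reproduces the 5,950 retained observations (of 6,006) would be:

# Keep locations with at least 40 sampled items; this drops hote, vet,
# asia, and the "*,anx" codes.
df2 <- df %>%
  group_by(location_code) %>%
  filter(n() >= 40) %>%
  ungroup() %>%
  droplevels()

The table below summarizes condition (in percent), average age, and average recorded uses for the twelve retained locations.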
| location_code | Excellent (%) | Acceptable (%) | Poor (%) | NA (%) | average_age (years) | average_num_uses |
|---|---|---|---|---|---|---|
| olin | 11 | 81 | 1 | 7 | 33 | 2.25 |
| was | 72 | 21 | 1 | 6 | 24 | 1.23 |
| ech | 61 | 32 | 2 | 6 | 21 | 0.93 |
| law | 56 | 34 | 5 | 6 | 49 | 1.48 |
| mann | 42 | 41 | 1 | 16 | 27 | 5.87 |
| uris | 15 | 75 | NA | 10 | 35 | 4.32 |
| sasa | 35 | 56 | 0 | 8 | 22 | 1.37 |
| ilr | 26 | 60 | 3 | 12 | 44 | 3.45 |
| math | 33 | 52 | 6 | 9 | 37 | 6.69 |
| mus | 47 | 47 | 1 | 5 | 34 | 3.91 |
| afr | 47 | 41 | 6 | 6 | 31 | 3.10 |
| jgsm | 41 | 37 | 10 | 12 | 21 | 13.46 |
mod1 <- glm(present_or_not ~ 1, data=df2, family=binomial)
summary(mod1)
##
## Call:
## glm(formula = present_or_not ~ 1, family = binomial, data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.2687 -0.2687 -0.2687 -0.2687 2.5843
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.3032 0.0701 -47.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1823.6 on 5949 degrees of freedom
## Residual deviance: 1823.6 on 5949 degrees of freedom
## AIC: 1825.6
##
## Number of Fisher Scoring iterations: 6
glance(mod1)
## # A tibble: 1 x 7
## null.deviance df.null logLik AIC BIC deviance df.residual
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1824. 5949 -912. 1826. 1832. 1824. 5949
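A note on direction before interpreting coefficients: glm() on a factor response models the probability of the non-first level. The intercept above maps to the raw missing rate, which implies that "1" (accounted for) is the first level of present_or_not, and that fitted values from these models are probabilities of an item being missing:

# plogis(-3.3032) ~= 0.0355, i.e. the missing rate; 1 - 0.0355 ~= 0.9645
# matches the accounted-for rate in df2.
plogis(coef(mod1))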
mod2 <- glm(present_or_not ~ location_code, data=df2, family=binomial)
summary(mod2)
##
## Call:
## glm(formula = present_or_not ~ location_code, family = binomial,
## data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.4334 -0.2819 -0.2273 -0.2273 3.0414
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6433 0.1119 -32.569 < 2e-16 ***
## location_codewas 0.2299 0.2586 0.889 0.37392
## location_codeech 0.4379 0.2652 1.651 0.09868 .
## location_codelaw 0.6753 0.2456 2.750 0.00596 **
## location_codemann 1.0784 0.2576 4.186 2.84e-05 ***
## location_codeuris 0.7372 0.2964 2.487 0.01287 *
## location_codesasa 0.7764 0.3172 2.448 0.01437 *
## location_codeilr 1.3255 0.3115 4.256 2.08e-05 ***
## location_codemath 0.7346 0.4339 1.693 0.09044 .
## location_codemus -0.9718 1.0111 -0.961 0.33652
## location_codeafr -0.2279 1.0165 -0.224 0.82264
## location_codejgsm 1.1044 0.6101 1.810 0.07025 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1823.6 on 5949 degrees of freedom
## Residual deviance: 1784.7 on 5938 degrees of freedom
## AIC: 1808.7
##
## Number of Fisher Scoring iterations: 7
glance(mod2)
## # A tibble: 1 x 7
## null.deviance df.null logLik AIC BIC deviance df.residual
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1824. 5949 -892. 1809. 1889. 1785. 5938
aug_mod2 <- augment(mod2, type.predict = "response") %>%
  # .fitted is the predicted probability of an item being missing
  filter(location_code %in% c("ilr", "mus", "mann", "olin")) %>%
  group_by(location_code) %>%
  sample_n(3) %>%
  select(location_code, present_or_not, .fitted) %>%
  arrange(.fitted)
kable(aug_mod2)
| location_code | present_or_not | .fitted |
|---|---|---|
| mus | 1 | 0.0098039 |
| mus | 1 | 0.0098039 |
| mus | 1 | 0.0098039 |
| olin | 1 | 0.0254975 |
| olin | 1 | 0.0254975 |
| olin | 1 | 0.0254975 |
| mann | 1 | 0.0714286 |
| mann | 1 | 0.0714286 |
| mann | 1 | 0.0714286 |
| ilr | 0 | 0.0896552 |
| ilr | 1 | 0.0896552 |
| ilr | 1 | 0.0896552 |
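The next model adds call-number length and recorded uses to the location predictor. Its fitting call is not echoed in the original output, but it can be read off the Call line below:

mod3 <- glm(present_or_not ~ location_code + length_cn + recorded_uses_item,
            data = df2, family = binomial)
summary(mod3)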
##
## Call:
## glm(formula = present_or_not ~ location_code + length_cn + recorded_uses_item,
## family = binomial, data = df2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3232 -0.2889 -0.2374 -0.2172 2.8437
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.334379 0.262109 -16.537 < 2e-16 ***
## location_codewas 0.162951 0.260983 0.624 0.532382
## location_codeech 0.365750 0.267962 1.365 0.172275
## location_codelaw 0.668289 0.246645 2.710 0.006738 **
## location_codemann 1.027301 0.259795 3.954 7.68e-05 ***
## location_codeuris 0.589947 0.300343 1.964 0.049502 *
## location_codesasa 0.703886 0.319179 2.205 0.027433 *
## location_codeilr 1.340871 0.315856 4.245 2.18e-05 ***
## location_codemath 0.743028 0.438712 1.694 0.090331 .
## location_codemus -0.934293 1.011942 -0.923 0.355868
## location_codeafr -0.216119 1.017210 -0.212 0.831746
## location_codejgsm 0.714289 0.683650 1.045 0.296108
## length_cn 0.036283 0.012946 2.803 0.005068 **
## recorded_uses_item 0.018457 0.005572 3.312 0.000926 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1823.6 on 5949 degrees of freedom
## Residual deviance: 1768.6 on 5936 degrees of freedom
## AIC: 1796.6
##
## Number of Fisher Scoring iterations: 7
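As with the earlier models, the fit statistics that follow correspond to glance(mod3):

glance(mod3)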
## # A tibble: 1 x 7
## null.deviance df.null logLik AIC BIC deviance df.residual
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1824. 5949 -884. 1797. 1890. 1769. 5936
100 items with the lowest probability of being accounted for
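The construction of aug_mod3 is not shown in the original output. A plausible reconstruction (an assumption: bib_rec_nbr is carried along by passing the source data to augment(), and sorting by descending .fitted puts the most-likely-missing items first):

# Assumption: attach fitted missing-probabilities to the sampled items.
aug_mod3 <- augment(mod3, data = df2, type.predict = "response") %>%
  select(bib_rec_nbr, present_or_not, location_code,
         length_cn, recorded_uses_item, .fitted) %>%
  arrange(desc(.fitted))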
head_aug_mod3 <- head(aug_mod3[, 1:6], 100)
head_aug_mod3 %>%
  count(as.integer(present_or_not))  # 1 = level "1" (accounted for), 2 = level "0" (missing)
## # A tibble: 2 x 2
## `as.integer(present_or_not)` n
## <int> <int>
## 1 1 90
## 2 2 10
kable(head(head_aug_mod3, 15))
| bib_rec_nbr | present_or_not | location_code | length_cn | recorded_uses_item | .fitted |
|---|---|---|---|---|---|
| 7704606 | 0 | jgsm | 17 | 204 | 0.6817843 |
| 2910259 | 1 | ilr | 17 | 147 | 0.5833415 |
| 6194426 | 1 | law | 16 | 145 | 0.3990865 |
| 7093874 | 1 | jgsm | 18 | 108 | 0.2741588 |
| 787271 | 1 | mann | 10 | 98 | 0.2431575 |
| 2229926 | 1 | law | 18 | 94 | 0.2178868 |
| 7787849 | 0 | math | 14 | 89 | 0.1914365 |
| 1564634 | 1 | uris | 15 | 92 | 0.1821081 |
| 5318335 | 0 | mann | 15 | 68 | 0.1812728 |
| 4498531 | 1 | mann | 49 | 1 | 0.1808253 |
| 5347296 | 1 | mann | 49 | 0 | 0.1781073 |
| 4519494 | 1 | mann | 48 | 1 | 0.1755129 |
| 4070688 | 1 | uris | 15 | 84 | 0.1611377 |
| 2936554 | 0 | mann | 16 | 56 | 0.1553881 |
| 4695650 | 1 | uris | 30 | 52 | 0.1549654 |
100 items with the highest probability of being accounted for
tail_aug_mod3 <- tail(aug_mod3[, 1:6], 100)
tail_aug_mod3 %>%
  count(as.integer(present_or_not))
## # A tibble: 1 x 2
## `as.integer(present_or_not)` n
## <int> <int>
## 1 1 100
kable(tail(tail_aug_mod3, 15))
| row | bib_rec_nbr | present_or_not | location_code | length_cn | recorded_uses_item | .fitted |
|---|---|---|---|---|---|---|
| 5936 | 1327723 | 1 | mus | 12 | 0 | 0.0078975 |
| 5937 | 2422716 | 1 | mus | 12 | 0 | 0.0078975 |
| 5938 | 1175404 | 1 | mus | 10 | 3 | 0.0077639 |
| 5939 | 61313 | 1 | mus | 11 | 1 | 0.0077591 |
| 5940 | 450158 | 1 | mus | 11 | 0 | 0.0076182 |
| 5941 | 1762746 | 1 | mus | 11 | 0 | 0.0076182 |
| 5942 | 2412818 | 1 | mus | 11 | 0 | 0.0076182 |
| 5943 | 175007 | 1 | mus | 11 | 0 | 0.0076182 |
| 5944 | 2422304 | 1 | mus | 10 | 1 | 0.0074847 |
| 5945 | 812844 | 1 | mus | 10 | 1 | 0.0074847 |
| 5946 | 749588 | 1 | mus | 10 | 0 | 0.0073488 |
| 5947 | 2028297 | 1 | mus | 10 | 0 | 0.0073488 |
| 5948 | 2420529 | 1 | mus | 10 | 0 | 0.0073488 |
| 5949 | 2099009 | 1 | mus | 10 | 0 | 0.0073488 |
| 5950 | 940063 | 1 | mus | 9 | 0 | 0.0070888 |
At Cornell, a natural experiment was ready to be conducted, because one unit, Law, does not use security stripping or gates. Intuition suggests that the Law AF rate should therefore be lower than the other units'. That is not the case. The Law mean AF (accounted for) rate in this sample is right in the middle of the pack, with confidence intervals overlapping those of units with both higher and lower AF rates. We conclude that the effect size of having a security system is indistinguishable from zero.
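The per-unit comparison behind that claim can be sketched as follows (a normal-approximation interval; the original analysis may have used bootstrap intervals instead):

# Per-location AF rate with an approximate 95% confidence interval.
df2 %>%
  group_by(location_code) %>%
  summarise(n = n(),
            af = mean(present_or_not == "1"),
            se = sqrt(af * (1 - af) / n)) %>%
  mutate(lower = af - 1.96 * se,
         upper = af + 1.96 * se) %>%
  arrange(desc(af))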
Sometime after completing its validation study, EAST surveyed participating libraries about their theft-deterrence practices. Thirty-two libraries responded; the library names are anonymized below.
| library | tattletape_yes_no | validation_score |
|---|---|---|
| anteater | Yes | 0.984 |
| armadillo | No | 0.948 |
| axolotl | No | 0.984 |
| buffalo | Yes | 0.935 |
| camel | Yes | 0.976 |
| chameleon | Yes | 0.951 |
| cheetah | No | 0.990 |
| chipmunk | Yes | 0.953 |
| chupacabra | No | 0.973 |
| crow | Yes | 0.975 |
| dolphin | Yes | 0.989 |
| giraffe | Yes | 0.994 |
| grizzly | Yes | 0.990 |
| hedgehog | No | 0.953 |
| hippo | Yes | 0.963 |
| ifrit | Yes | 0.978 |
| iguana | No | 0.956 |
| jackal | Yes | 0.997 |
| koala | No | 0.983 |
| lemur | Yes | 0.916 |
| leopard | Yes | 0.992 |
| liger | Yes | 0.988 |
| llama | Yes | 0.982 |
| manatee | No | 0.984 |
| monkey | Yes | 0.953 |
| narwhal | No | 0.970 |
| nyan cat | Yes | 0.994 |
| otter | Yes | 0.990 |
| panda | Yes | 0.968 |
| quagga | No | 0.982 |
| squirrel | Yes | 0.995 |
| wombat | Yes | 0.967 |
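The first tibble below is a simple group mean of the validation scores. Assuming the survey table above is loaded as a data frame named east (an assumed name; the original code is not shown), it can be reproduced with:

east %>%
  group_by(tattletape_yes_no) %>%
  summarise(mean = mean(validation_score))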
## # A tibble: 2 x 2
## tattletape_yes_no mean
## <fct> <dbl>
## 1 No 0.972
## 2 Yes 0.974
## # A tibble: 2 x 5
## af se total status n
## <dbl> <dbl> <int> <chr> <chr>
## 1 0.972 0.00448 1000 no 10
## 2 0.974 0.00444 1000 yes 22
In this experiment we divided the EAST libraries into two groups, the 22 survey libraries with security systems and the 10 without, and generated accounted-for (AF) rates and standard errors for each group using bootstrap simulation (1,000 replications each). The difference in AF rates can be explained by random noise, as the overlapping 95% confidence intervals show. We conclude, again, that the effect size of having a security system is indistinguishable from zero.
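A sketch of that bootstrap comparison, again assuming the east data frame (an illustrative reconstruction, not the original code):

library(purrr)

# Bootstrap the mean validation score within one group of libraries.
boot_group <- function(d, reps = 1000) {
  d %>%
    specify(response = validation_score) %>%
    generate(reps = reps, type = "bootstrap") %>%
    calculate(stat = "mean") %>%
    summarise(af = mean(stat), se = sd(stat), total = reps, n = nrow(d))
}

east %>%
  split(.$tattletape_yes_no) %>%
  map_dfr(boot_group, .id = "status")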
Chandler, Adam. “Cornell Validation Study 2018 Initial Findings,” September 12, 2018. http://rpubs.com/acct4rpubs/418599.