Acknowledgements

2017 CUL Tattle-tape task force recommendations

To: Xin Li
From: Adam Chandler (chair), Susie Cobb, Maureen Morris, Wendy Wilcox
Date: December 19, 2017
Subject: Tattle-tape Task Force Final Report

Even though members of our task force are not confident that tattle-taping actually prevents theft, we must recommend continuing it, because staff, primarily selectors, clearly oppose changing the policy at this time. Feedback from access services staff was split: smaller units felt that tattle-taping was effective at preventing theft, while larger units felt it was ineffective at protecting open-stack collections. These responses make sense given the units' different approaches to responding to gate alarms. Before the CUL tattle-taping policy is changed, we recommend these steps:

  • Replacement fees should be recycled back into supporting the replacement of missing and lost materials.
  • Centralize and streamline the decision-making process and funding for replacing missing and lost materials.
  • Consider conducting an inventory of the library’s open-stacks collections using the methodology (and perhaps the tools) employed in the EAST validation study, to serve as a baseline informing present and future decision making on this issue.

What is the EAST validation study?

“In order to evaluate the statistical likelihood that a retained volume exists on the shelves of any of the institutions, the EAST incorporated sample-based validation studies. The specific goals of this study were to establish and document the degree of confidence, and the possibility of error, in any EAST committed title being available for circulation. Results of the validation sample studies help predict the likelihood that titles selected for retention actually exist and can be located in the collection of a Retention Partner, and are in useable condition.” [https://eastlibraries.org/validation]

EAST Results

Overall, EAST can report a 97% availability rate.
The aggregated results from both cohorts (312,000 holdings across the 52 libraries) showed:
* 97% of monographs in the sample were accounted for: mean 97%, median 97.1%, high of 99.8% and low of 91%. (Note: “accounted for” includes those items previously determined to be in circulation based on an automated check of the libraries’ ILS.)
* 2.3% of titles were in circulation at the time of the study
* 90% of the titles were deemed to be in average or excellent condition with 10% marked as in poor condition. Not surprisingly, older titles were in poorer condition.

A few notable observations include:
* Items published pre-1900 were in significantly poorer condition; some 45% of these items ranked “poor” on the condition scale
* An item being in poor condition was also somewhat correlated to its subject area
* The most significant factor for an item being missing was the holding library.

Cornell Validation Study 2018 Results

The study was conducted between April and July 2018. We sampled 6,006 monographs across campus; Wendy Wilcox led the team that did the data collection in the stacks. Two notes: the Annex was excluded because its stacks are closed, and Fine Arts was excluded because that unit is in the middle of a building transition.

AF (accounted for) = checked out + present

Cornell accounted for rate: 96.4%
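Using the is_missing flag in the dataset described below, the AF rate is simply one minus the proportion missing. A minimal sketch (computed over the 5,975 retained rows, so it may differ slightly from the published 96.4%, which covers the full 6,006-item sample):

# AF = checked out + present, i.e. everything not missing
df %>%
  summarise(af_rate = mean(is_missing == "0"))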

Our dataset

# Packages used throughout this analysis
library(tidyverse)  # glimpse(), dplyr verbs, the pipe
library(infer)      # specify(), generate(), calculate() for the bootstrap
library(broom)      # tidy(), augment() for model output
library(knitr)      # kable() for tables

glimpse(df)
## Observations: 5,975
## Variables: 34
## $ present_or_not     <fct> Present, Present, Present, Present, Present...
## $ bib_rec_nbr        <chr> "1968678", "2249095", "5689943", "8618953",...
## $ mfhd_id            <chr> "2389846", "2702959", "6199187", "8994997",...
## $ item_control_nbr   <chr> "3723592", "4103508", "7620171", "9494855",...
## $ barcode            <chr> "31924062968908", "31924072130184", "319241...
## $ begin_pub_date     <dbl> 1960, 1993, 1971, 2013, 2010, 1971, 1994, 1...
## $ location_code      <fct> afr, afr, afr, afr, afr, afr, afr, afr, afr...
## $ firstletter        <fct> D, D, D, D, D, D, D, E, E, E, E, E, E, E, E...
## $ class              <chr> "DT", "DT", "DT", "DT", "DT", "DT", "DT", "...
## $ classnumber        <dbl> 32.000, 328.000, 356.000, 433.285, 433.545,...
## $ normalized_call_no <chr> "DT   32            R 61", "DT  328        ...
## $ display_call_no    <chr> "DT32 .R61", "DT328.M53 .H3613x 1993", "DT3...
## $ call_nbr_norm_item <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ enumeration        <chr> NA, NA, NA, NA, NA, NA, NA, "c.2", NA, NA, ...
## $ length_cn          <dbl> 9, 22, 14, 19, 22, 20, 18, 11, 12, 18, 18, ...
## $ pagination         <chr> "312 p. 22 cm.", "xv, 199 p. : ill. ; 24 cm...
## $ title              <chr> "Death of Africa. By Peter Ritner.", "Victi...
## $ recorded_uses_item <dbl> 1, 0, 2, 0, 0, 10, 2, 0, 56, 1, 2, 4, 3, 2,...
## $ worldcat_oclc_nbr  <chr> "412793", "59941146", "148569", "869824175"...
## $ catalog_url        <chr> "https://newcatalog.library.cornell.edu/cat...
## $ us_holdings        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ row_number         <dbl> 1585081, 735421, 2715241, 850681, 84661, 55...
## $ initials           <chr> "mah94", "mah94", "mah94", "mah94", "mah94"...
## $ condition          <fct> Acceptable, Excellent, Acceptable, Excellen...
## $ barcode_validation <chr> "yes", "yes", "yes", "yes", "yes", "no", "y...
## $ timestamp          <dttm> 2018-07-06 13:56:22, 2018-07-06 13:56:22, ...
## $ item_status_desc   <chr> "Not Charged", "Not Charged", "Not Charged"...
## $ is_missing         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ has_circulated     <dbl> 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1...
## $ is_oversize        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ age                <dbl> 58, 25, 47, 5, 8, 47, 24, 53, 28, 39, 36, 3...
## $ age_group          <fct> 11plus, 11plus, 11plus, 0-10, 0-10, 11plus,...
## $ callnum            <chr> "dt32 .r61", "dt328.m53 .h3613x 1993", "dt3...
## $ num_cn_chars       <int> 3, 5, 3, 3, 4, 4, 4, 3, 2, 2, 3, 3, 3, 2, 3...

Bootstrap simulation to derive a standard error

p_hat <- df %>%
  summarise(stat = mean(is_missing == "1")) %>%
  pull()
p_hat
## [1] 0.03531381
replimit <- 1000

boot <- df %>%
  specify(response = is_missing, success = "1") %>%
  generate(reps = replimit, type = "bootstrap") %>%
  calculate(stat = "prop")
boot
## # A tibble: 1,000 x 2
##    replicate   stat
##        <int>  <dbl>
##  1         1 0.0345
##  2         2 0.0341
##  3         3 0.0330
##  4         4 0.0377
##  5         5 0.0393
##  6         6 0.0380
##  7         7 0.0341
##  8         8 0.0378
##  9         9 0.0346
## 10        10 0.0333
## # ... with 990 more rows
se <- boot %>%
  summarize(sd(stat)) %>%
  pull()
se
## [1] 0.002373401

“The standard error is the standard deviation of the sampling distribution of the sample mean.” [Geoff Cumming, Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, 2012]

CUL mean and 1,000-replication bootstrap confidence interval: M = 0.035, 95% CI [0.031, 0.040].
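The interval above can be reproduced directly from the bootstrap replicates; a sketch using infer's percentile method (the normal approximation, p_hat +/- 1.96 * se, gives essentially the same interval):

# Percentile bootstrap CI from the 1,000 replicates
boot %>%
  get_confidence_interval(level = 0.95, type = "percentile")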

How many monographs do we estimate are missing across the whole collection?

nrow(population_to_draw_from)
## [1] 3079136

nrow(population_to_draw_from) * .0357 = 109,925

Therefore, our best estimate of the total number of unaccounted-for items across these CUL units is 109,925 +/- 524.
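In code, the extrapolation is a rescaling of the sample estimate; a sketch (ci_lower and ci_upper are placeholder names for the bootstrap interval endpoints above):

# Scale the sample estimate to the full population
N <- nrow(population_to_draw_from)
N * p_hat                   # point estimate of unaccounted-for items
N * c(ci_lower, ci_upper)   # the interval at population scale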

Sample size, UF rates, and condition across combined CUL locations

location_code  total  ave_age  ave_num_uses  percent_excellent  percent_acceptable  percent_poor  percent_na
olin            3216       33          2.25                 11                  81             1           7
asia            1282       22          1.15                 62                  31             1           6
law              450       49          1.48                 56                  34             5           6
mann             280       27          5.87                 42                  41             1          16
uris             270       35          4.32                 15                  75            NA          10
hlm              210       38          5.90                 29                  54             6          11
math             116       37          6.69                 33                  52             6           9
mus              102       34          3.91                 47                  47             1           5
afr               49       31          3.10                 47                  41             6           6
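A sketch of how a summary like this can be assembled from the sample (the condition percentages would be tabulated analogously from the condition factor):

# Per-location sample size, average age, and average recorded uses
df %>%
  group_by(location_code) %>%
  summarise(total = n(),
            ave_age = round(mean(age, na.rm = TRUE)),
            ave_num_uses = round(mean(recorded_uses_item, na.rm = TRUE), 2)) %>%
  arrange(desc(total))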

Logistic regression models

Model 1: all the possible explanatory variables

mod1 <- glm(is_missing ~ location_code + firstletter + recorded_uses_item + is_oversize + age + length_cn + num_cn_chars, data=df2, family=binomial)
tidy_mod1 <- tidy(mod1) %>%
    arrange(p.value)
kable(tidy_mod1)
term estimate std.error statistic p.value
recorded_uses_item 0.0182676 0.0055599 3.2855707 0.0010178
location_codemann 1.0551836 0.3601727 2.9296600 0.0033933
location_codehlm 0.9843067 0.3420155 2.8779593 0.0040026
location_codelaw 1.0918400 0.5047513 2.1631245 0.0305316
location_codeasia 0.3533658 0.1971118 1.7927177 0.0730181
location_codemath 1.1232229 0.6563570 1.7112987 0.0870260
location_codemus -2.0436455 1.4227482 -1.4364070 0.1508866
length_cn 0.0202631 0.0184296 1.0994828 0.2715575
num_cn_chars 0.0583434 0.0703174 0.8297150 0.4066999
age 0.0008798 0.0011542 0.7622741 0.4458964
location_codeuris 0.3319163 0.4519798 0.7343609 0.4627288
is_oversize 0.2037515 0.2877821 0.7080063 0.4789413
location_codeafr -0.4105901 1.0265235 -0.3999812 0.6891704
(Intercept) -18.4297290 620.8715397 -0.0296836 0.9763194
firstletterM 15.2509858 620.8722880 0.0245638 0.9804029
firstletterL 15.0742805 620.8715578 0.0242792 0.9806299
firstletterE 14.9261936 620.8715952 0.0240407 0.9808201
firstletterZ 14.8995164 620.8716369 0.0239977 0.9808544
firstletterR 14.7993926 620.8716143 0.0238365 0.9809830
firstletterN 14.7158794 620.8715956 0.0237020 0.9810903
firstletterT 14.5164988 620.8715821 0.0233808 0.9813465
firstletterJ 14.2335562 620.8715645 0.0229251 0.9817100
firstletterC 14.2139690 620.8718576 0.0228936 0.9817352
firstletterF 14.1767694 620.8716603 0.0228337 0.9817830
firstletterP 14.1664464 620.8714571 0.0228170 0.9817962
firstletterH 14.1176910 620.8714729 0.0227385 0.9818589
firstletterG 13.9564668 620.8716530 0.0224788 0.9820660
firstletterD 13.8165996 620.8714845 0.0222536 0.9822457
firstletterB 13.7878154 620.8715151 0.0222072 0.9822827
firstletterQ 13.7203047 620.8716235 0.0220985 0.9823694
firstletterK 13.5972138 620.8716554 0.0219002 0.9825276
firstletterS 12.8060778 620.8723236 0.0206259 0.9835440
firstletterV -0.1306172 1453.9564338 -0.0000898 0.9999283
firstletterU -0.0602283 918.6032630 -0.0000656 0.9999477

Note: the enormous standard errors (around 620) on the intercept and the firstletter terms are a symptom of quasi-complete separation: some call number classes contain no missing items in the sample, so their coefficients are unstable and their p-values uninformative. This is why firstletter is dropped from model 2.

Model 2: we can start making pretty good predictions

## 
## Call:
## glm(formula = is_missing ~ location_code + length_cn + recorded_uses_item, 
##     family = binomial, data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1570  -0.2845  -0.2398  -0.2174   2.8426  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -4.328734   0.261644 -16.544  < 2e-16 ***
## location_codeasia   0.347490   0.187052   1.858 0.063208 .  
## location_codelaw    0.668909   0.246522   2.713 0.006660 ** 
## location_codemann   1.031294   0.259681   3.971 7.15e-05 ***
## location_codeuris   0.593217   0.300234   1.976 0.048172 *  
## location_codehlm    1.081866   0.293362   3.688 0.000226 ***
## location_codemath   0.748466   0.438482   1.707 0.087832 .  
## location_codemus   -0.932390   1.011915  -0.921 0.356836    
## location_codeafr   -0.214804   1.017155  -0.211 0.832746    
## length_cn           0.036105   0.012926   2.793 0.005218 ** 
## recorded_uses_item  0.017584   0.005506   3.194 0.001405 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1825.4  on 5974  degrees of freedom
## Residual deviance: 1775.5  on 5964  degrees of freedom
## AIC: 1797.5
## 
## Number of Fisher Scoring iterations: 7
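The fitted probabilities (.fitted) in the tables below can be generated with broom, assuming the fitted object from the Call above is stored as mod2:

# Per-item predicted probability of being missing, highest first
ranked <- augment(mod2, type.predict = "response") %>%
  arrange(desc(.fitted))
head(ranked, 15)   # most likely to be unaccounted for
tail(ranked, 15)   # most likely to be accounted for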

Items with highest probability of being unaccounted for

## # A tibble: 2 x 2
##   `as.integer(is_missing)`     n
##                      <int> <int>
## 1                        1    91
## 2                        2     9
bib_rec_nbr is_missing location_code length_cn recorded_uses_item .fitted
7704606 1 hlm 17 204 0.7219358
2910259 0 hlm 17 147 0.4879488
6194426 0 law 16 145 0.3699469
7093874 0 hlm 18 108 0.3322796
787271 0 mann 10 98 0.2291452
2229926 0 law 18 94 0.2047236
7787849 1 math 14 89 0.1809669
4498531 0 mann 49 1 0.1808308
5347296 0 mann 49 0 0.1782406
4519494 0 mann 48 1 0.1755441
5318335 1 mann 15 68 0.1736257
1564634 0 uris 15 92 0.1713339
4070688 0 uris 15 84 0.1522739
2936554 1 mann 16 56 0.1499426
4695650 0 uris 30 52 0.1495671

Items with highest probability of being accounted for

## # A tibble: 1 x 2
##   `as.integer(is_missing)`     n
##                      <int> <int>
## 1                        1   100
bib_rec_nbr is_missing location_code length_cn recorded_uses_item .fitted
5961 2402245 0 mus 11 2 0.0079326
5962 1788258 0 mus 9 6 0.0079179
5963 61313 0 mus 11 1 0.0077955
5964 1175404 0 mus 10 3 0.0077882
5965 450158 0 mus 11 0 0.0076606
5966 1762746 0 mus 11 0 0.0076606
5967 2412818 0 mus 11 0 0.0076606
5968 175007 0 mus 11 0 0.0076606
5969 2422304 0 mus 10 1 0.0075211
5970 812844 0 mus 10 1 0.0075211
5971 749588 0 mus 10 0 0.0073910
5972 2028297 0 mus 10 0 0.0073910
5973 2420529 0 mus 10 0 0.0073910
5974 2099009 0 mus 10 0 0.0073910
5975 940063 0 mus 9 0 0.0071308

Model 3: More parsing of call number.

In this model, we first try to remove words from call numbers and then count the number of letters. The thinking is that this might capture some of the complexity of more complicated call numbers. It was not successful: this version does not help, and num_cn_chars is still not a significant predictor (compare the AIC below, 1804.1, with model 2's 1797.5). The simple call number length variable (length_cn) is more predictive.
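The construction of num_cn_chars is not shown in this report; the following is a hypothetical sketch consistent with the sampled values below, in which a small, assumed stop list of shelving words is stripped before the remaining letters are counted:

library(stringr)

# Assumed stop list; the actual list of removed words is not documented
stop_words <- c("oversize")

df2 <- df2 %>%
  mutate(num_cn_chars = callnum %>%
           str_remove_all(str_c("\\b(", str_c(stop_words, collapse = "|"), ")\\b")) %>%
           str_count("[a-z]"))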

df2 %>%
  select(display_call_no, length_cn, num_cn_chars) %>%
  sample_n(5)
## # A tibble: 5 x 3
##   display_call_no       length_cn num_cn_chars
##   <chr>                     <dbl>        <int>
## 1 E207.G81 G792 1871           18            3
## 2 PN1991.77.W3 W37 2013        21            4
## 3 PR6056.F54 W55x 1997         20            5
## 4 HD9000.5 .I582 1998          19            3
## 5 DS121.3 .R57 1992z           18            4
df2 %>%
  select(display_call_no, length_cn, num_cn_chars) %>%
  arrange(desc(num_cn_chars)) %>%
  top_n(5)
## Selecting by num_cn_chars
## # A tibble: 51 x 3
##    display_call_no                           length_cn num_cn_chars
##    <chr>                                         <dbl>        <int>
##  1 Oversize JN5208 .A16 ser.2 div.1 sect.2 +        41           13
##  2 PL5093.C5 B66 v.14,no.460,etc.                   30           10
##  3 HD4813 .I781 3d sess.no.1                        25           10
##  4 Trials KD370.N8 L38 1932                         24           10
##  5 KF26 .A3 90th Apoll                              19           10
##  6 KF26 .A35 92nd Tobac                             20           10
##  7 KF26 .A6 92nd Agric                              19           10
##  8 KF26 .C6 93rd Unive                              19           10
##  9 KF26 .C6 96th Nomin                              19           10
## 10 KF26 .E57 95th North                             20           10
## # ... with 41 more rows
mod3 <- glm(is_missing ~ location_code + recorded_uses_item  + num_cn_chars, data=df2, family=binomial)
summary(mod3)
## 
## Call:
## glm(formula = is_missing ~ location_code + recorded_uses_item + 
##     num_cn_chars, family = binomial, data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1231  -0.2849  -0.2322  -0.2241   2.8518  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.889483   0.264240 -14.720  < 2e-16 ***
## location_codeasia   0.432762   0.184028   2.352 0.018693 *  
## location_codelaw    0.644010   0.250550   2.570 0.010159 *  
## location_codemann   1.012787   0.259366   3.905 9.43e-05 ***
## location_codeuris   0.715768   0.298746   2.396 0.016579 *  
## location_codehlm    1.037640   0.291873   3.555 0.000378 ***
## location_codemath   0.653674   0.436420   1.498 0.134182    
## location_codemus   -1.006419   1.011378  -0.995 0.319689    
## location_codeafr   -0.239331   1.017033  -0.235 0.813959    
## recorded_uses_item  0.017013   0.005454   3.119 0.001812 ** 
## num_cn_chars        0.055468   0.064544   0.859 0.390133    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1825.4  on 5974  degrees of freedom
## Residual deviance: 1782.1  on 5964  degrees of freedom
## AIC: 1804.1
## 
## Number of Fisher Scoring iterations: 7

Can we learn anything new if we restrict the dataset to specific locations?

# olin
df_olin <- df2 %>%
  filter(location_code == "olin")
mod_olin <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_olin, family=binomial)
summary(mod_olin)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_olin)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.4011  -0.2362  -0.2214  -0.2082   2.8683  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -4.463519   0.414020 -10.781   <2e-16 ***
## recorded_uses_item  0.007055   0.020287   0.348   0.7280    
## length_cn           0.044925   0.021268   2.112   0.0347 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 763.64  on 3215  degrees of freedom
## Residual deviance: 759.41  on 3213  degrees of freedom
## AIC: 765.41
## 
## Number of Fisher Scoring iterations: 6
# asia
df_asia <- df2 %>%
  filter(location_code == "asia")
mod_asia <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_asia, family=binomial)
summary(mod_asia)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_asia)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5334  -0.2822  -0.2719  -0.2667   2.6158  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.50735    0.57707  -6.078 1.22e-09 ***
## recorded_uses_item  0.04780    0.03733   1.280    0.200    
## length_cn           0.01086    0.02731   0.398    0.691    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 416.01  on 1281  degrees of freedom
## Residual deviance: 414.58  on 1279  degrees of freedom
## AIC: 420.58
## 
## Number of Fisher Scoring iterations: 6
# law
df_law <- df2 %>%
  filter(location_code == "law")
mod_law <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_law, family=binomial)
summary(mod_law)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_law)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5050  -0.3166  -0.3149  -0.3133   2.4740  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.046662   0.706540  -4.312 1.62e-05 ***
## recorded_uses_item  0.006834   0.018371   0.372     0.71    
## length_cn           0.003802   0.038075   0.100     0.92    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 175.71  on 449  degrees of freedom
## Residual deviance: 175.59  on 447  degrees of freedom
## AIC: 181.59
## 
## Number of Fisher Scoring iterations: 5
# mann
df_mann <- df2 %>%
  filter(location_code == "mann")
mod_mann <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_mann, family=binomial)
summary(mod_mann)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_mann)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3692  -0.3807  -0.3494  -0.3304   2.5107  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.49074    0.78252  -4.461 8.16e-06 ***
## recorded_uses_item  0.03621    0.01567   2.311   0.0209 *  
## length_cn           0.03827    0.04073   0.940   0.3474    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 144.10  on 279  degrees of freedom
## Residual deviance: 139.08  on 277  degrees of freedom
## AIC: 145.08
## 
## Number of Fisher Scoring iterations: 5
# uris
df_uris <- df2 %>%
  filter(location_code == "uris")
mod_uris <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_uris, family=binomial)
summary(mod_uris)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_uris)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8050  -0.3635  -0.2641  -0.2152   2.7233  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -5.33176    1.07742  -4.949 7.47e-07 ***
## recorded_uses_item  0.02336    0.01999   1.168   0.2427    
## length_cn           0.10522    0.04204   2.503   0.0123 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 110.12  on 269  degrees of freedom
## Residual deviance: 103.18  on 267  degrees of freedom
## AIC: 109.18
## 
## Number of Fisher Scoring iterations: 6
# hlm
df_hlm <- df2 %>%
  filter(location_code == "hlm")
mod_hlm <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_hlm, family=binomial)
summary(mod_hlm)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_hlm)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8395  -0.4045  -0.3860  -0.3534   2.4113  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -3.385662   1.367555  -2.476   0.0133 *
## recorded_uses_item  0.011548   0.007932   1.456   0.1454  
## length_cn           0.048610   0.080933   0.601   0.5481  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 113.13  on 209  degrees of freedom
## Residual deviance: 111.02  on 207  degrees of freedom
## AIC: 117.02
## 
## Number of Fisher Scoring iterations: 5
# math
df_math <- df2 %>%
  filter(location_code == "math")
mod_math <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_math, family=binomial)
summary(mod_math)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_math)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7418  -0.2950  -0.2700  -0.2600   2.6149  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -3.457243   1.902465  -1.817   0.0692 .
## recorded_uses_item  0.046144   0.021517   2.145   0.0320 *
## length_cn           0.004214   0.124724   0.034   0.9730  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 47.226  on 115  degrees of freedom
## Residual deviance: 42.931  on 113  degrees of freedom
## AIC: 48.931
## 
## Number of Fisher Scoring iterations: 6
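Rather than repeating the fit for each unit, the same per-location models can be produced in one pass; a sketch:

# Fit is_missing ~ recorded_uses_item + length_cn within each unit
per_loc <- df2 %>%
  filter(location_code %in% c("olin", "asia", "law", "mann", "uris", "hlm", "math")) %>%
  group_by(location_code) %>%
  group_modify(~ tidy(glm(is_missing ~ recorded_uses_item + length_cn,
                          data = .x, family = binomial)))
per_loc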

Tattle-tape redux: What evidence do we have that security stripping improves UF rates? In other words, what is our estimate of the effect size (i.e., return on investment) for libraries that operate security systems?

EAST surveyed libraries that participated in their validation study about security practices

Some time after completing its validation study, EAST surveyed the participating libraries about their theft-deterrence practices. Thirty-two libraries responded; the library names are anonymized below.

library tattletape_yes_no validation_score
anteater Yes 0.016
armadillo No 0.052
axolotl No 0.016
buffalo Yes 0.065
camel Yes 0.024
chameleon Yes 0.049
cheetah No 0.010
chipmunk Yes 0.047
chupacabra No 0.027
crow Yes 0.025
dolphin Yes 0.011
giraffe Yes 0.006
grizzly Yes 0.010
hedgehog No 0.047
hippo Yes 0.037
ifrit Yes 0.022
iguana No 0.044
jackal Yes 0.003
koala No 0.017
lemur Yes 0.084
leopard Yes 0.008
liger Yes 0.012
llama Yes 0.018
manatee No 0.016
monkey Yes 0.047
narwhal No 0.030
nyan cat Yes 0.006
otter Yes 0.010
panda Yes 0.032
quagga No 0.018
squirrel Yes 0.005
wombat Yes 0.033

In this experiment we divided the EAST libraries into two groups, the 22 libraries in the survey with security systems and the 10 with none, and generated unaccounted-for (UF) rates and standard errors for each group using bootstrap simulation. The difference in UF rates can be explained by random noise, as the overlapping 95% confidence intervals show. We conclude, again, that the effect size of having a security system is indistinguishable from zero.
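A minimal sketch of this comparison, assuming the survey table above is loaded as a data frame named east with the columns shown:

# Bootstrap the mean UF (validation) rate within each group
set.seed(2018)
boot_means <- function(x, reps = 1000) {
  replicate(reps, mean(sample(x, replace = TRUE)))
}
east %>%
  group_by(tattletape_yes_no) %>%
  summarise(n = n(),
            mean_uf = mean(validation_score),
            se = sd(boot_means(validation_score)),
            lower = mean_uf - 1.96 * se,
            upper = mean_uf + 1.96 * se)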

Cornell

At Cornell, a natural experiment was ready to be conducted, because one unit, Law, does not use security stripping or gates. Intuition suggests that the Law UF rate should therefore be higher than those of the other units. That is not the case: in this sample, Law sits right in the middle of the pack, with confidence intervals overlapping those of units with both higher and lower UF rates.
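A sketch of the per-unit comparison (normal-approximation intervals here; the report used bootstrap intervals):

# Per-unit UF rate with approximate 95% CI
df %>%
  group_by(location_code) %>%
  summarise(n = n(),
            uf = mean(is_missing == "1"),
            se = sqrt(uf * (1 - uf) / n),
            lower = pmax(uf - 1.96 * se, 0),
            upper = uf + 1.96 * se) %>%
  arrange(desc(uf))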

Recommendations

  1. Improve the model. What variables might we add to improve its prediction accuracy? What other questions should we be asking?
  2. Where confidence intervals are widest, do more sampling in Cornell unit libraries to improve the accuracy of our estimates.
  3. Use our predictive model to improve the user experience for patrons: Replace missing items or make a decision to remove missing items from our catalog before patrons experience frustration. This means allocating more resources to stacks management at units where the open stacks need attention.

Report URL

Chandler, Adam. “Cornell Validation Study 2018 Findings 2,” September 2018. http://rpubs.com/acct4rpubs/419508.