Acknowledgements

2017 CUL Tattle-tape task force recommendations

To: Xin Li
From: Adam Chandler (chair), Susie Cobb, Maureen Morris, Wendy Wilcox
Date: December 19, 2017
Subject: Tattle-tape Task Force Final Report

Even though members of our task force are not confident that tattle-taping actually prevents theft, we must recommend continuing it, because staff, primarily selectors, clearly oppose changing the policy at this time. Feedback from access services staff was split: smaller units felt that tattle-taping was effective at preventing theft, while larger units felt it was ineffective at protecting open-stack collections. These responses make sense given the units' different approaches to responding to gate alarms. Before the CUL tattle-taping policy is changed, we recommend these steps:

  • Replacement fees should be recycled back into supporting the replacement of missing and lost materials.
  • Centralize and streamline the decision-making process and funding for replacing missing and lost materials.
  • Consider conducting an inventory of the library’s open-stacks collections using the methodology (and perhaps the tools) employed in the EAST validation study, to serve as a baseline informing present and future decision making on this issue.

What is the EAST validation study?

“In order to evaluate the statistical likelihood that a retained volume exists on the shelves of any of the institutions, the EAST incorporated sample-based validation studies. The specific goals of this study were to establish and document the degree of confidence, and the possibility of error, in any EAST committed title being available for circulation. Results of the validation sample studies help predict the likelihood that titles selected for retention actually exist and can be located in the collection of a Retention Partner, and are in useable condition.” [https://eastlibraries.org/validation]

EAST Results

Overall, EAST can report a 97% availability rate.
The aggregated results from both cohorts (312,000 holdings across the 52 libraries) showed:
* 97% of monographs in the sample were accounted for: mean 97%, median 97.1%, high of 99.8% and low of 91%. (Note: “accounted for” includes those items previously determined to be in circulation based on an automated check of the libraries’ ILS.)
* 2.3% of titles were in circulation at the time of the study
* 90% of the titles were deemed to be in average or excellent condition with 10% marked as in poor condition. Not surprisingly, older titles were in poorer condition.

A few notable observations include:
* Items published pre-1900 were in significantly poorer condition; some 45% of these items ranked “poor” on the condition scale
* An item being in poor condition was also somewhat correlated to its subject area
* The most significant factor for an item being missing was the holding library.

Cornell Validation Study 2018 Results

The study was conducted between April and July 2018. We sampled 6,006 monographs across campus; Wendy Wilcox led the team that did the data collection in the stacks. Two notes: the Annex was excluded because its stacks are closed, and Fine Arts was excluded because that unit is in the middle of a building transition.

AF (accounted for) = checked out + present

Cornell accounted for rate: 96.4%
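Using the is_missing flag in the dataset described below, the AF rate is simply one minus the proportion missing. A minimal sketch (computed over the 5,975 retained rows, so it may differ slightly from the published 96.4%, which covers the full 6,006-item sample):

# AF = checked out + present, i.e. everything not missing
df %>%
  summarise(af_rate = mean(is_missing == "0"))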

Our dataset

# Packages used throughout this analysis
library(tidyverse)  # glimpse(), dplyr verbs, the pipe
library(infer)      # specify(), generate(), calculate() for the bootstrap
library(broom)      # tidy(), augment() for model output
library(knitr)      # kable() for tables

glimpse(df)
## Observations: 5,975
## Variables: 34
## $ present_or_not     <fct> Present, Present, Present, Present, Present...
## $ bib_rec_nbr        <chr> "1968678", "2249095", "5689943", "8618953",...
## $ mfhd_id            <chr> "2389846", "2702959", "6199187", "8994997",...
## $ item_control_nbr   <chr> "3723592", "4103508", "7620171", "9494855",...
## $ barcode            <chr> "31924062968908", "31924072130184", "319241...
## $ begin_pub_date     <dbl> 1960, 1993, 1971, 2013, 2010, 1971, 1994, 1...
## $ location_code      <fct> afr, afr, afr, afr, afr, afr, afr, afr, afr...
## $ firstletter        <fct> D, D, D, D, D, D, D, E, E, E, E, E, E, E, E...
## $ class              <chr> "DT", "DT", "DT", "DT", "DT", "DT", "DT", "...
## $ classnumber        <dbl> 32.000, 328.000, 356.000, 433.285, 433.545,...
## $ normalized_call_no <chr> "DT   32            R 61", "DT  328        ...
## $ display_call_no    <chr> "DT32 .R61", "DT328.M53 .H3613x 1993", "DT3...
## $ call_nbr_norm_item <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ enumeration        <chr> NA, NA, NA, NA, NA, NA, NA, "c.2", NA, NA, ...
## $ length_cn          <dbl> 9, 22, 14, 19, 22, 20, 18, 11, 12, 18, 18, ...
## $ pagination         <chr> "312 p. 22 cm.", "xv, 199 p. : ill. ; 24 cm...
## $ title              <chr> "Death of Africa. By Peter Ritner.", "Victi...
## $ recorded_uses_item <dbl> 1, 0, 2, 0, 0, 10, 2, 0, 56, 1, 2, 4, 3, 2,...
## $ worldcat_oclc_nbr  <chr> "412793", "59941146", "148569", "869824175"...
## $ catalog_url        <chr> "https://newcatalog.library.cornell.edu/cat...
## $ us_holdings        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ row_number         <dbl> 1585081, 735421, 2715241, 850681, 84661, 55...
## $ initials           <chr> "mah94", "mah94", "mah94", "mah94", "mah94"...
## $ condition          <fct> Acceptable, Excellent, Acceptable, Excellen...
## $ barcode_validation <chr> "yes", "yes", "yes", "yes", "yes", "no", "y...
## $ timestamp          <dttm> 2018-07-06 13:56:22, 2018-07-06 13:56:22, ...
## $ item_status_desc   <chr> "Not Charged", "Not Charged", "Not Charged"...
## $ is_missing         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ has_circulated     <dbl> 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1...
## $ is_oversize        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ age                <dbl> 58, 25, 47, 5, 8, 47, 24, 53, 28, 39, 36, 3...
## $ age_group          <fct> 11plus, 11plus, 11plus, 0-10, 0-10, 11plus,...
## $ callnum            <chr> "dt32 .r61", "dt328.m53 .h3613x 1993", "dt3...
## $ num_cn_chars       <int> 3, 5, 3, 3, 4, 4, 4, 3, 2, 2, 3, 3, 3, 2, 3...

Bootstrap simulation to derive a standard error

p_hat <- df %>%
  summarise(stat = mean(is_missing == "1")) %>%
  pull()
p_hat
## [1] 0.03531381
replimit <- 1000

boot <- df %>%
  specify(response = is_missing, success = "1") %>%
  generate(reps = replimit, type = "bootstrap") %>%
  calculate(stat = "prop")
boot
## # A tibble: 1,000 x 2
##    replicate   stat
##        <int>  <dbl>
##  1         1 0.0345
##  2         2 0.0341
##  3         3 0.0330
##  4         4 0.0377
##  5         5 0.0393
##  6         6 0.0380
##  7         7 0.0341
##  8         8 0.0378
##  9         9 0.0346
## 10        10 0.0333
## # ... with 990 more rows
se <- boot %>%
  summarize(sd(stat)) %>%
  pull()
se
## [1] 0.002373401

“The standard error is the standard deviation of the sampling distribution of the sample mean.” [Geoff Cumming, Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, 2012]

CUL mean and 1,000-replication bootstrap confidence interval: M = 0.035, 95% CI [0.031, 0.040].
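The interval above can be reproduced directly from the bootstrap replicates; a sketch using infer's percentile method (the normal approximation, p_hat +/- 1.96 * se, gives essentially the same interval):

# Percentile bootstrap CI from the 1,000 replicates
boot %>%
  get_confidence_interval(level = 0.95, type = "percentile")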

How many monographs do we estimate are missing across the whole collection?

nrow(population_to_draw_from)
## [1] 3079136

nrow(population_to_draw_from) * .0357 = 109,925

Therefore, our best estimate of the total number of unaccounted-for items across these CUL units is 109,925 +/- 524.
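In code, the extrapolation is a rescaling of the sample estimate; a sketch (ci_lower and ci_upper are placeholder names for the bootstrap interval endpoints above):

# Scale the sample estimate to the full population
N <- nrow(population_to_draw_from)
N * p_hat                   # point estimate of unaccounted-for items
N * c(ci_lower, ci_upper)   # the interval at population scale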

Sample size, UF rates, and condition across combined CUL locations

location_code  total  ave_age  ave_num_uses  percent_excellent  percent_acceptable  percent_poor  percent_na
olin            3216       33          2.25                 11                  81             1           7
asia            1282       22          1.15                 62                  31             1           6
law              450       49          1.48                 56                  34             5           6
mann             280       27          5.87                 42                  41             1          16
uris             270       35          4.32                 15                  75            NA          10
hlm              210       38          5.90                 29                  54             6          11
math             116       37          6.69                 33                  52             6           9
mus              102       34          3.91                 47                  47             1           5
afr               49       31          3.10                 47                  41             6           6
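A sketch of how a summary like this can be assembled from the sample (the condition percentages would be tabulated analogously from the condition factor):

# Per-location sample size, average age, and average recorded uses
df %>%
  group_by(location_code) %>%
  summarise(total = n(),
            ave_age = round(mean(age, na.rm = TRUE)),
            ave_num_uses = round(mean(recorded_uses_item, na.rm = TRUE), 2)) %>%
  arrange(desc(total))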

Logistic regression models

Model 1: all the possible explanatory variables

mod1 <- glm(is_missing ~ location_code + firstletter + recorded_uses_item + is_oversize + age + length_cn + num_cn_chars, data=df2, family=binomial)
tidy_mod1 <- tidy(mod1) %>%
    arrange(p.value)
kable(tidy_mod1)
term estimate std.error statistic p.value
recorded_uses_item 0.0182676 0.0055599 3.2855707 0.0010178
location_codemann 1.0551836 0.3601727 2.9296600 0.0033933
location_codehlm 0.9843067 0.3420155 2.8779593 0.0040026
location_codelaw 1.0918400 0.5047513 2.1631245 0.0305316
location_codeasia 0.3533658 0.1971118 1.7927177 0.0730181
location_codemath 1.1232229 0.6563570 1.7112987 0.0870260
location_codemus -2.0436455 1.4227482 -1.4364070 0.1508866
length_cn 0.0202631 0.0184296 1.0994828 0.2715575
num_cn_chars 0.0583434 0.0703174 0.8297150 0.4066999
age 0.0008798 0.0011542 0.7622741 0.4458964
location_codeuris 0.3319163 0.4519798 0.7343609 0.4627288
is_oversize 0.2037515 0.2877821 0.7080063 0.4789413
location_codeafr -0.4105901 1.0265235 -0.3999812 0.6891704
(Intercept) -18.4297290 620.8715397 -0.0296836 0.9763194
firstletterM 15.2509858 620.8722880 0.0245638 0.9804029
firstletterL 15.0742805 620.8715578 0.0242792 0.9806299
firstletterE 14.9261936 620.8715952 0.0240407 0.9808201
firstletterZ 14.8995164 620.8716369 0.0239977 0.9808544
firstletterR 14.7993926 620.8716143 0.0238365 0.9809830
firstletterN 14.7158794 620.8715956 0.0237020 0.9810903
firstletterT 14.5164988 620.8715821 0.0233808 0.9813465
firstletterJ 14.2335562 620.8715645 0.0229251 0.9817100
firstletterC 14.2139690 620.8718576 0.0228936 0.9817352
firstletterF 14.1767694 620.8716603 0.0228337 0.9817830
firstletterP 14.1664464 620.8714571 0.0228170 0.9817962
firstletterH 14.1176910 620.8714729 0.0227385 0.9818589
firstletterG 13.9564668 620.8716530 0.0224788 0.9820660
firstletterD 13.8165996 620.8714845 0.0222536 0.9822457
firstletterB 13.7878154 620.8715151 0.0222072 0.9822827
firstletterQ 13.7203047 620.8716235 0.0220985 0.9823694
firstletterK 13.5972138 620.8716554 0.0219002 0.9825276
firstletterS 12.8060778 620.8723236 0.0206259 0.9835440
firstletterV -0.1306172 1453.9564338 -0.0000898 0.9999283
firstletterU -0.0602283 918.6032630 -0.0000656 0.9999477

Note: the enormous standard errors (around 620) on the intercept and the firstletter terms are a symptom of quasi-complete separation: some call number classes contain no missing items in the sample, so their coefficients are unstable and their p-values uninformative. This is why firstletter is dropped from model 2.

Model 2: we can start making pretty good predictions

## 
## Call:
## glm(formula = is_missing ~ location_code + length_cn + recorded_uses_item, 
##     family = binomial, data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1570  -0.2845  -0.2398  -0.2174   2.8426  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -4.328734   0.261644 -16.544  < 2e-16 ***
## location_codeasia   0.347490   0.187052   1.858 0.063208 .  
## location_codelaw    0.668909   0.246522   2.713 0.006660 ** 
## location_codemann   1.031294   0.259681   3.971 7.15e-05 ***
## location_codeuris   0.593217   0.300234   1.976 0.048172 *  
## location_codehlm    1.081866   0.293362   3.688 0.000226 ***
## location_codemath   0.748466   0.438482   1.707 0.087832 .  
## location_codemus   -0.932390   1.011915  -0.921 0.356836    
## location_codeafr   -0.214804   1.017155  -0.211 0.832746    
## length_cn           0.036105   0.012926   2.793 0.005218 ** 
## recorded_uses_item  0.017584   0.005506   3.194 0.001405 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1825.4  on 5974  degrees of freedom
## Residual deviance: 1775.5  on 5964  degrees of freedom
## AIC: 1797.5
## 
## Number of Fisher Scoring iterations: 7
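The fitted probabilities (.fitted) in the tables below can be generated with broom, assuming the fitted object from the Call above is stored as mod2:

# Per-item predicted probability of being missing, highest first
ranked <- augment(mod2, type.predict = "response") %>%
  arrange(desc(.fitted))
head(ranked, 15)   # most likely to be unaccounted for
tail(ranked, 15)   # most likely to be accounted for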

Items with highest probability of being unaccounted for

## # A tibble: 2 x 2
##   `as.integer(is_missing)`     n
##                      <int> <int>
## 1                        1    91
## 2                        2     9
bib_rec_nbr is_missing location_code length_cn recorded_uses_item .fitted
7704606 1 hlm 17 204 0.7219358
2910259 0 hlm 17 147 0.4879488
6194426 0 law 16 145 0.3699469
7093874 0 hlm 18 108 0.3322796
787271 0 mann 10 98 0.2291452
2229926 0 law 18 94 0.2047236
7787849 1 math 14 89 0.1809669
4498531 0 mann 49 1 0.1808308
5347296 0 mann 49 0 0.1782406
4519494 0 mann 48 1 0.1755441
5318335 1 mann 15 68 0.1736257
1564634 0 uris 15 92 0.1713339
4070688 0 uris 15 84 0.1522739
2936554 1 mann 16 56 0.1499426
4695650 0 uris 30 52 0.1495671

Items with highest probability of being accounted for

## # A tibble: 1 x 2
##   `as.integer(is_missing)`     n
##                      <int> <int>
## 1                        1   100
bib_rec_nbr is_missing location_code length_cn recorded_uses_item .fitted
5961 2402245 0 mus 11 2 0.0079326
5962 1788258 0 mus 9 6 0.0079179
5963 61313 0 mus 11 1 0.0077955
5964 1175404 0 mus 10 3 0.0077882
5965 450158 0 mus 11 0 0.0076606
5966 1762746 0 mus 11 0 0.0076606
5967 2412818 0 mus 11 0 0.0076606
5968 175007 0 mus 11 0 0.0076606
5969 2422304 0 mus 10 1 0.0075211
5970 812844 0 mus 10 1 0.0075211
5971 749588 0 mus 10 0 0.0073910
5972 2028297 0 mus 10 0 0.0073910
5973 2420529 0 mus 10 0 0.0073910
5974 2099009 0 mus 10 0 0.0073910
5975 940063 0 mus 9 0 0.0071308

Model 3: More parsing of call number.

In this model, we first try to remove words from call numbers and then count the number of letters. The thinking is that this might capture some of the complexity of more complicated call numbers. It was not successful: this version does not help, and num_cn_chars is still not a significant predictor (compare the AIC below, 1804.1, with model 2's 1797.5). The simple call number length variable (length_cn) is more predictive.
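The construction of num_cn_chars is not shown in this report; the following is a hypothetical sketch consistent with the sampled values below, in which a small, assumed stop list of shelving words is stripped before the remaining letters are counted:

library(stringr)

# Assumed stop list; the actual list of removed words is not documented
stop_words <- c("oversize")

df2 <- df2 %>%
  mutate(num_cn_chars = callnum %>%
           str_remove_all(str_c("\\b(", str_c(stop_words, collapse = "|"), ")\\b")) %>%
           str_count("[a-z]"))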

df2 %>%
  select(display_call_no, length_cn, num_cn_chars) %>%
  sample_n(5)
## # A tibble: 5 x 3
##   display_call_no       length_cn num_cn_chars
##   <chr>                     <dbl>        <int>
## 1 E207.G81 G792 1871           18            3
## 2 PN1991.77.W3 W37 2013        21            4
## 3 PR6056.F54 W55x 1997         20            5
## 4 HD9000.5 .I582 1998          19            3
## 5 DS121.3 .R57 1992z           18            4
df2 %>%
  select(display_call_no, length_cn, num_cn_chars) %>%
  arrange(desc(num_cn_chars)) %>%
  top_n(5)
## Selecting by num_cn_chars
## # A tibble: 51 x 3
##    display_call_no                           length_cn num_cn_chars
##    <chr>                                         <dbl>        <int>
##  1 Oversize JN5208 .A16 ser.2 div.1 sect.2 +        41           13
##  2 PL5093.C5 B66 v.14,no.460,etc.                   30           10
##  3 HD4813 .I781 3d sess.no.1                        25           10
##  4 Trials KD370.N8 L38 1932                         24           10
##  5 KF26 .A3 90th Apoll                              19           10
##  6 KF26 .A35 92nd Tobac                             20           10
##  7 KF26 .A6 92nd Agric                              19           10
##  8 KF26 .C6 93rd Unive                              19           10
##  9 KF26 .C6 96th Nomin                              19           10
## 10 KF26 .E57 95th North                             20           10
## # ... with 41 more rows
mod3 <- glm(is_missing ~ location_code + recorded_uses_item  + num_cn_chars, data=df2, family=binomial)
summary(mod3)
## 
## Call:
## glm(formula = is_missing ~ location_code + recorded_uses_item + 
##     num_cn_chars, family = binomial, data = df2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1231  -0.2849  -0.2322  -0.2241   2.8518  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.889483   0.264240 -14.720  < 2e-16 ***
## location_codeasia   0.432762   0.184028   2.352 0.018693 *  
## location_codelaw    0.644010   0.250550   2.570 0.010159 *  
## location_codemann   1.012787   0.259366   3.905 9.43e-05 ***
## location_codeuris   0.715768   0.298746   2.396 0.016579 *  
## location_codehlm    1.037640   0.291873   3.555 0.000378 ***
## location_codemath   0.653674   0.436420   1.498 0.134182    
## location_codemus   -1.006419   1.011378  -0.995 0.319689    
## location_codeafr   -0.239331   1.017033  -0.235 0.813959    
## recorded_uses_item  0.017013   0.005454   3.119 0.001812 ** 
## num_cn_chars        0.055468   0.064544   0.859 0.390133    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1825.4  on 5974  degrees of freedom
## Residual deviance: 1782.1  on 5964  degrees of freedom
## AIC: 1804.1
## 
## Number of Fisher Scoring iterations: 7

Can we learn anything new if we restrict the dataset to specific locations?

# olin
df_olin <- df2 %>%
  filter(location_code == "olin")
mod_olin <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_olin, family=binomial)
summary(mod_olin)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_olin)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.4011  -0.2362  -0.2214  -0.2082   2.8683  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -4.463519   0.414020 -10.781   <2e-16 ***
## recorded_uses_item  0.007055   0.020287   0.348   0.7280    
## length_cn           0.044925   0.021268   2.112   0.0347 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 763.64  on 3215  degrees of freedom
## Residual deviance: 759.41  on 3213  degrees of freedom
## AIC: 765.41
## 
## Number of Fisher Scoring iterations: 6
# asia
df_asia <- df2 %>%
  filter(location_code == "asia")
mod_asia <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_asia, family=binomial)
summary(mod_asia)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_asia)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5334  -0.2822  -0.2719  -0.2667   2.6158  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.50735    0.57707  -6.078 1.22e-09 ***
## recorded_uses_item  0.04780    0.03733   1.280    0.200    
## length_cn           0.01086    0.02731   0.398    0.691    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 416.01  on 1281  degrees of freedom
## Residual deviance: 414.58  on 1279  degrees of freedom
## AIC: 420.58
## 
## Number of Fisher Scoring iterations: 6
# law
df_law <- df2 %>%
  filter(location_code == "law")
mod_law <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_law, family=binomial)
summary(mod_law)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_law)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.5050  -0.3166  -0.3149  -0.3133   2.4740  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.046662   0.706540  -4.312 1.62e-05 ***
## recorded_uses_item  0.006834   0.018371   0.372     0.71    
## length_cn           0.003802   0.038075   0.100     0.92    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 175.71  on 449  degrees of freedom
## Residual deviance: 175.59  on 447  degrees of freedom
## AIC: 181.59
## 
## Number of Fisher Scoring iterations: 5
# mann
df_mann <- df2 %>%
  filter(location_code == "mann")
mod_mann <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_mann, family=binomial)
summary(mod_mann)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_mann)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3692  -0.3807  -0.3494  -0.3304   2.5107  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.49074    0.78252  -4.461 8.16e-06 ***
## recorded_uses_item  0.03621    0.01567   2.311   0.0209 *  
## length_cn           0.03827    0.04073   0.940   0.3474    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 144.10  on 279  degrees of freedom
## Residual deviance: 139.08  on 277  degrees of freedom
## AIC: 145.08
## 
## Number of Fisher Scoring iterations: 5
# uris
df_uris <- df2 %>%
  filter(location_code == "uris")
mod_uris <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_uris, family=binomial)
summary(mod_uris)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_uris)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8050  -0.3635  -0.2641  -0.2152   2.7233  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -5.33176    1.07742  -4.949 7.47e-07 ***
## recorded_uses_item  0.02336    0.01999   1.168   0.2427    
## length_cn           0.10522    0.04204   2.503   0.0123 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 110.12  on 269  degrees of freedom
## Residual deviance: 103.18  on 267  degrees of freedom
## AIC: 109.18
## 
## Number of Fisher Scoring iterations: 6
# hlm
df_hlm <- df2 %>%
  filter(location_code == "hlm")
mod_hlm <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_hlm, family=binomial)
summary(mod_hlm)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_hlm)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8395  -0.4045  -0.3860  -0.3534   2.4113  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -3.385662   1.367555  -2.476   0.0133 *
## recorded_uses_item  0.011548   0.007932   1.456   0.1454  
## length_cn           0.048610   0.080933   0.601   0.5481  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 113.13  on 209  degrees of freedom
## Residual deviance: 111.02  on 207  degrees of freedom
## AIC: 117.02
## 
## Number of Fisher Scoring iterations: 5
# math
df_math <- df2 %>%
  filter(location_code == "math")
mod_math <- glm(is_missing ~  recorded_uses_item  + length_cn, data=df_math, family=binomial)
summary(mod_math)
## 
## Call:
## glm(formula = is_missing ~ recorded_uses_item + length_cn, family = binomial, 
##     data = df_math)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7418  -0.2950  -0.2700  -0.2600   2.6149  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -3.457243   1.902465  -1.817   0.0692 .
## recorded_uses_item  0.046144   0.021517   2.145   0.0320 *
## length_cn           0.004214   0.124724   0.034   0.9730  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 47.226  on 115  degrees of freedom
## Residual deviance: 42.931  on 113  degrees of freedom
## AIC: 48.931
## 
## Number of Fisher Scoring iterations: 6
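Rather than repeating the fit for each unit, the same per-location models can be produced in one pass; a sketch:

# Fit is_missing ~ recorded_uses_item + length_cn within each unit
per_loc <- df2 %>%
  filter(location_code %in% c("olin", "asia", "law", "mann", "uris", "hlm", "math")) %>%
  group_by(location_code) %>%
  group_modify(~ tidy(glm(is_missing ~ recorded_uses_item + length_cn,
                          data = .x, family = binomial)))
per_loc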

Tattle-tape redux: What evidence do we have that security stripping improves UF rates? In other words, what is our estimate of the effect size (i.e., return on investment) for libraries that operate security systems?

EAST surveyed libraries that participated in their validation study about security practices

Some time after completing its validation study, EAST surveyed the participating libraries about their theft-deterrence practices. Thirty-two libraries responded; the library names are anonymized below.

library tattletape_yes_no validation_score
anteater Yes 0.016
armadillo No 0.052
axolotl No 0.016
buffalo Yes 0.065
camel Yes 0.024
chameleon Yes 0.049
cheetah No 0.010
chipmunk Yes 0.047
chupacabra No 0.027
crow Yes 0.025
dolphin Yes 0.011
giraffe Yes 0.006
grizzly Yes 0.010
hedgehog No 0.047
hippo Yes 0.037
ifrit Yes 0.022
iguana No 0.044
jackal Yes 0.003
koala No 0.017
lemur Yes 0.084
leopard Yes 0.008
liger Yes 0.012
llama Yes 0.018
manatee No 0.016
monkey Yes 0.047
narwhal No 0.030
nyan cat Yes 0.006
otter Yes 0.010
panda Yes 0.032
quagga No 0.018
squirrel Yes 0.005
wombat Yes 0.033

In this experiment we divided the EAST libraries into two groups, the 22 libraries in the survey with security systems and the 10 with none, and generated unaccounted-for (UF) rates and standard errors for each group using bootstrap simulation. The difference in UF rates can be explained by random noise, as the overlapping 95% confidence intervals show. We conclude, again, that the effect size of having a security system is indistinguishable from zero.
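A minimal sketch of this comparison, assuming the survey table above is loaded as a data frame named east with the columns shown:

# Bootstrap the mean UF (validation) rate within each group
set.seed(2018)
boot_means <- function(x, reps = 1000) {
  replicate(reps, mean(sample(x, replace = TRUE)))
}
east %>%
  group_by(tattletape_yes_no) %>%
  summarise(n = n(),
            mean_uf = mean(validation_score),
            se = sd(boot_means(validation_score)),
            lower = mean_uf - 1.96 * se,
            upper = mean_uf + 1.96 * se)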

Cornell

At Cornell, a natural experiment was ready to be conducted, because one unit, Law, does not use security stripping or gates. Intuition suggests that the Law UF rate should therefore be higher than those of the other units. That is not the case: in this sample, Law sits right in the middle of the pack, with confidence intervals overlapping those of units with both higher and lower UF rates.
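A sketch of the per-unit comparison (normal-approximation intervals here; the report used bootstrap intervals):

# Per-unit UF rate with approximate 95% CI
df %>%
  group_by(location_code) %>%
  summarise(n = n(),
            uf = mean(is_missing == "1"),
            se = sqrt(uf * (1 - uf) / n),
            lower = pmax(uf - 1.96 * se, 0),
            upper = uf + 1.96 * se) %>%
  arrange(desc(uf))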

Recommendations

  1. Improve the model. What variables might we add to improve its prediction accuracy? What other questions should we be asking?
  2. Where confidence intervals are widest, do more sampling in Cornell unit libraries to improve the accuracy of our estimates.
  3. Use our predictive model to improve the user experience for patrons: Replace missing items or make a decision to remove missing items from our catalog before patrons experience frustration. This means allocating more resources to stacks management at units where the open stacks need attention.

Report URL

Chandler, Adam. “Cornell Validation Study 2018 Findings 2,” September 2018. http://rpubs.com/acct4rpubs/419508.