Analysis of Voyager Load Performance, December 2014

Overview

Compiler
Adam Chandler
Fri Jan 09 12:32:40 2015

Purpose:
1. create a baseline for Voyager load performance
2. identify the characteristics of the slowest jobs

Data source:
This analysis builds on work Pete did in the fall to create a log of job load performance. The source data is at: http://lstools.library.cornell.edu/mysql/bulkimplog.cgi

I added five fields to facilitate analysis and visualization of the data:
1. Durationsecs = Duration converted to seconds
2. Secs.Bib = round(Durationsecs / Records, 2)
3. Datetime = Day and time reformatted
4. Wday = Day of week number (Sunday = 1)
5. Hour = Hour of day job started
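
The derived fields can be illustrated with a short sketch. It is written in Python for illustration only and is not the code used to produce this report. The function names and the assumed Datetime layout ("YYYY-MM-DD HH:MM:SS") are assumptions; the sample values are the first job in the listing below.

```python
from datetime import datetime


def duration_to_secs(duration: str) -> int:
    """Convert an 'H:MM:SS' string such as '0:08:35' to seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def derive_fields(records: int, duration: str, started: str) -> dict:
    """Compute the five derived fields for one job.

    `started` is assumed to look like '2014-12-18 12:02:00'.
    """
    when = datetime.strptime(started, "%Y-%m-%d %H:%M:%S")
    secs = duration_to_secs(duration)
    return {
        "Durationsecs": secs,
        "Secs.Bib": round(secs / records, 2),
        "Datetime": when,
        # isoweekday() is Monday = 1 ... Sunday = 7; shift so Sunday = 1.
        "Wday": when.isoweekday() % 7 + 1,
        "Hour": when.hour,
    }


# First job in the sample: 14 records loaded in one second on a Thursday.
job = derive_fields(records=14, duration="0:00:01", started="2014-12-18 12:02:00")
```

For this job the sketch gives Secs.Bib = 0.07 and Wday = 5, which agrees with the values shown in the listing.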

565 Jobs in Sample

## Classes 'tbl_df', 'tbl' and 'data.frame':    565 obs. of  16 variables:
##  $ WeekDay        : chr  "" "Thu" "Thu" "Sat" ...
##  $ Records        : int  NA 14 14 5496 40000 45000 39515 43939 18996 33198 ...
##  $ Duration       : chr  "" "0:00:01" "0:00:01" "0:08:35" ...
##  $ Bibs.Sec       : num  NA 14 14 10.7 10.5 ...
##  $ BulkImpRule    : chr  "" "001MERG" "001MERG" "BIBMAINT" ...
##  $ Netid          : chr  "" "lms6" "lms6" "lbatch" ...
##  $ KeywordIndexing: chr  "" "yes" "yes" "no" ...
##  $ Added          : int  NA 0 0 0 36 163 5940 6562 0 0 ...
##  $ Merged         : int  NA 0 0 0 39948 44825 33507 37236 0 0 ...
##  $ Replaced       : int  NA 0 0 5496 0 0 0 0 18996 33198 ...
##  $ Filename       : chr  "" "lms6..001MERG.BACH1163.mrc" "lms6..001MERG.BACH1163.dat" "lbatch.cleanup.BIBMAINT.1419133480.mrc" ...
##  $ Durationsecs   : num  NA 1 1 515 3797 ...
##  $ Secs.Bib       : num  NA 0.07 0.07 0.09 0.09 0.1 0.1 0.11 0.12 0.12 ...
##  $ Datetime       : POSIXct, format: NA "2014-12-18 12:02:00" ...
##  $ Wday           : num  NA 5 5 7 4 6 4 7 1 6 ...
##  $ Hour           : Factor w/ 19 levels "2","4","6","7",..: NA 9 8 18 4 15 8 7 13 12 ...

By BulkImport rules

## Source: local data frame [45 x 5]
## 
##    BulkImpRule totalrecords totaljobs totaldurationsecs Secs.Bib
## 1     BIBMAINT      3658794       141            911777     0.25
## 2     AUTHMERG       229323        15             28274     0.12
## 3       BIBSUP        45834        46             28531     0.62
## 4       BIBUPD         5656        21              6787     1.20
## 5     SERSOLUN         2554         1              1249     0.49
## 6        MITCN         1289        26              6355     4.93
## 7     CAMBEIRO         1148        24              1962     1.71
## 8       YANKEE         1109         4              1948     1.76
## 9       SERSOL         1106         1              1325     1.20
## 10    MARCADIA          720         1              1676     2.33
## ..         ...          ...       ...               ...      ...
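
The per-rule totals above can be reproduced with a simple group-and-sum. The sketch below is a Python illustration under stated assumptions, not the code behind the report: it assumes Secs.Bib in this table is total duration divided by total records (which is consistent with the ten rows shown), and it uses only the first three jobs from the sample listing as input.

```python
from collections import defaultdict

# (BulkImpRule, Records, Durationsecs) for the first three jobs in the sample listing.
jobs = [
    ("001MERG", 14, 1),
    ("001MERG", 14, 1),
    ("BIBMAINT", 5496, 515),
]

# Accumulate record, job, and duration totals per rule.
totals = defaultdict(lambda: {"totalrecords": 0, "totaljobs": 0, "totaldurationsecs": 0})
for rule, records, secs in jobs:
    totals[rule]["totalrecords"] += records
    totals[rule]["totaljobs"] += 1
    totals[rule]["totaldurationsecs"] += secs

# One summary row per rule, largest record count first.
summary = []
for rule, t in totals.items():
    secs_per_bib = round(t["totaldurationsecs"] / t["totalrecords"], 2)
    summary.append({"BulkImpRule": rule, **t, "Secs.Bib": secs_per_bib})
summary.sort(key=lambda row: row["totalrecords"], reverse=True)

for row in summary:
    print(row)
```

Applying the same division to the BIBMAINT totals in the table, 911777 / 3658794, gives 0.25 after rounding.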

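By KeywordIndexing mode

Keyword-indexed jobs are small and slow; non-indexed jobs are large and fast. Median job size is 6 records for keyword-indexed jobs versus 39,990 for non-indexed jobs, and median Secs.Bib is 3.125 versus 0.24, roughly 13 times slower per record.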

KeywordIndexing == No

## [1] "# of jobs:  92"
## [1] "# Records in jobs"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2554    8595   39990   42370   65750  203600
## [1] "Secs.Bib"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.090   0.190   0.240   0.254   0.295   0.750
## [1] "Duration in seconds"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     515    2204    8717   10030   15250   31280

KeywordIndexing == Yes

## [1] "# of jobs:  472"
## [1] "# Records in jobs"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    6.00  111.80   28.25 2342.00
## [1] "Secs.Bib"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.070   1.920   3.125   7.828   8.518  88.500
## [1] "Duration in seconds"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     7.0    37.0   189.4   187.5  3405.0
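
Each block above is a six-number summary (minimum, quartiles, median, mean, maximum) computed separately for each KeywordIndexing group. The sketch below shows one way to compute such a summary in Python; it is an illustration, not the report's code. The function name is hypothetical, and the "inclusive" quantile method is chosen on the assumption that it matches the default interpolation used for the report. The input is the nine Secs.Bib values visible in the sample listing, not a full group.

```python
from statistics import mean, median, quantiles


def six_number_summary(values):
    """Return min, quartiles, median, mean and max for a list of numbers."""
    # method="inclusive" treats the data as the whole population, so the
    # quartiles never fall outside the observed minimum and maximum.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median(values),
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# The nine Secs.Bib values shown in the sample listing.
secs_per_bib = [0.07, 0.07, 0.09, 0.09, 0.1, 0.1, 0.11, 0.12, 0.12]
summary = six_number_summary(secs_per_bib)
print(summary)
```

Running the function once per KeywordIndexing value, on Records, Secs.Bib, and Durationsecs in turn, would yield tables of the same shape as those above.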

By day of week: variation in speed

By start hour