Compiler
Adam Chandler
Fri Jan 09 12:32:40 2015
Purpose:
1. create a baseline for Voyager load performance
2. identify the characteristics of the slowest jobs
Data source:
This analysis builds on work Pete did in the fall to create a log of job load performance. The source data is at: http://lstools.library.cornell.edu/mysql/bulkimplog.cgi
I added five fields to facilitate analysis and visualization of the data:
1. Secs.Bib = round(Durationsecs / Records, 2)
2. Durationsecs = Duration converted to seconds
3. Datetime = Day and time reformatted
4. Wday = Day of week number (Sunday = 1)
5. Hour = Hour of day job started
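The derived fields listed above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the original analysis, which is not shown here. The function names `duration_to_secs` and `derive_fields` are hypothetical, and the sketch assumes Duration is always an "H:MM:SS" string and that the job start time has already been parsed into a `datetime`.

```python
from datetime import datetime


def duration_to_secs(duration):
    """Convert an 'H:MM:SS' string such as '0:08:35' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def derive_fields(job, started):
    """Return the five derived fields for one job.

    `job` is a dict with 'Records' and 'Duration' keys (hypothetical layout);
    `started` is the job start time as a datetime.
    """
    secs = duration_to_secs(job["Duration"])
    return {
        "Durationsecs": secs,
        "Secs.Bib": round(secs / job["Records"], 2),
        "Datetime": started,
        # isoweekday() gives Monday=1 .. Sunday=7; shift so that Sunday=1.
        "Wday": started.isoweekday() % 7 + 1,
        "Hour": started.hour,
    }


# Values taken from the first non-empty row of the sample:
# 14 records, duration 0:00:01, started Thu 2014-12-18 12:02:00.
fields = derive_fields(
    {"Records": 14, "Duration": "0:00:01"},
    datetime(2014, 12, 18, 12, 2, 0),
)
print(fields["Durationsecs"], fields["Secs.Bib"], fields["Wday"])
```

For that row the sketch yields Durationsecs = 1, Secs.Bib = 0.07 and Wday = 5, which agrees with the values printed in the structure listing.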
565 Jobs in Sample
## Classes 'tbl_df', 'tbl' and 'data.frame': 565 obs. of 16 variables:
## $ WeekDay : chr "" "Thu" "Thu" "Sat" ...
## $ Records : int NA 14 14 5496 40000 45000 39515 43939 18996 33198 ...
## $ Duration : chr "" "0:00:01" "0:00:01" "0:08:35" ...
## $ Bibs.Sec : num NA 14 14 10.7 10.5 ...
## $ BulkImpRule : chr "" "001MERG" "001MERG" "BIBMAINT" ...
## $ Netid : chr "" "lms6" "lms6" "lbatch" ...
## $ KeywordIndexing: chr "" "yes" "yes" "no" ...
## $ Added : int NA 0 0 0 36 163 5940 6562 0 0 ...
## $ Merged : int NA 0 0 0 39948 44825 33507 37236 0 0 ...
## $ Replaced : int NA 0 0 5496 0 0 0 0 18996 33198 ...
## $ Filename : chr "" "lms6..001MERG.BACH1163.mrc" "lms6..001MERG.BACH1163.dat" "lbatch.cleanup.BIBMAINT.1419133480.mrc" ...
## $ Durationsecs : num NA 1 1 515 3797 ...
## $ Secs.Bib : num NA 0.07 0.07 0.09 0.09 0.1 0.1 0.11 0.12 0.12 ...
## $ Datetime : POSIXct, format: NA "2014-12-18 12:02:00" ...
## $ Wday : num NA 5 5 7 4 6 4 7 1 6 ...
## $ Hour : Factor w/ 19 levels "2","4","6","7",..: NA 9 8 18 4 15 8 7 13 12 ...
## Source: local data frame [45 x 5]
##
## BulkImpRule totalrecords totaljobs totaldurationsecs Secs.Bib
## 1 BIBMAINT 3658794 141 911777 0.25
## 2 AUTHMERG 229323 15 28274 0.12
## 3 BIBSUP 45834 46 28531 0.62
## 4 BIBUPD 5656 21 6787 1.20
## 5 SERSOLUN 2554 1 1249 0.49
## 6 MITCN 1289 26 6355 4.93
## 7 CAMBEIRO 1148 24 1962 1.71
## 8 YANKEE 1109 4 1948 1.76
## 9 SERSOL 1106 1 1325 1.20
## 10 MARCADIA 720 1 1676 2.33
## .. ... ... ... ... ...
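The per-rule summary above appears to total records, jobs and duration for each BulkImpRule and then divide total duration by total records. For BIBMAINT, 911777 / 3658794 ≈ 0.25, which matches the printed Secs.Bib. That reading is an assumption drawn from the printed rows, since the code that produced the table is not shown. A minimal Python sketch of such a summary, with the hypothetical function name `summarize_by_rule`, might look like this:

```python
from collections import defaultdict


def summarize_by_rule(jobs):
    """Aggregate jobs by BulkImpRule, sorted by total records (descending).

    Each job is a dict with 'BulkImpRule', 'Records' and 'Durationsecs'.
    Secs.Bib is assumed to be total duration divided by total records.
    """
    totals = defaultdict(
        lambda: {"totalrecords": 0, "totaljobs": 0, "totaldurationsecs": 0}
    )
    for job in jobs:
        t = totals[job["BulkImpRule"]]
        t["totalrecords"] += job["Records"]
        t["totaljobs"] += 1
        t["totaldurationsecs"] += job["Durationsecs"]

    rows = []
    for rule, t in totals.items():
        # Guard against rules whose jobs loaded no records at all.
        secs_bib = (
            round(t["totaldurationsecs"] / t["totalrecords"], 2)
            if t["totalrecords"]
            else None
        )
        rows.append({"BulkImpRule": rule, **t, "Secs.Bib": secs_bib})
    rows.sort(key=lambda r: r["totalrecords"], reverse=True)
    return rows


# The first three non-empty rows of the sample listing.
sample = [
    {"BulkImpRule": "001MERG", "Records": 14, "Durationsecs": 1},
    {"BulkImpRule": "001MERG", "Records": 14, "Durationsecs": 1},
    {"BulkImpRule": "BIBMAINT", "Records": 5496, "Durationsecs": 515},
]
for row in summarize_by_rule(sample):
    print(row)
```

Dividing the totals, rather than averaging each job's own Secs.Bib, weights large jobs more heavily, so a rule dominated by a few big loads will show a rate close to that of those loads.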
## [1] "# of jobs: 92"
## [1] "# Records in jobs"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2554 8595 39990 42370 65750 203600
## [1] "Secs.Bib"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.090 0.190 0.240 0.254 0.295 0.750
## [1] "Duration in seconds"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 515 2204 8717 10030 15250 31280
## [1] "# of jobs: 472"
## [1] "# Records in jobs"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 6.00 111.80 28.25 2342.00
## [1] "Secs.Bib"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.070 1.920 3.125 7.828 8.518 88.500
## [1] "Duration in seconds"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 7.0 37.0 189.4 187.5 3405.0