1 KIT Cluster: Submissions

Link to the Parallel Workload Archive page: https://www.cs.huji.ac.il/labs/parallel/workload/l_kit_fh2/index.html.

1.1 Cleaning

First, let us load the file and look at its size.

filename <- "kit.csv"
# sep = "" splits the fields on any run of whitespace
df <- read.csv(filename, header = TRUE, sep = "")
nrow(df)
## [1] 114355

Note that the .csv file is the original .swf file without the SWF header.

We compute the start time and finish time of each job.

df$start_time <- df$submit + df$wait
df$end_time   <- df$start_time + df$runtime

The SWF format has a status field for each job. The status codes are (or should be) as follows:

Code  Status
0     Job failed
1     Job completed successfully
2     This partial execution will be continued
3     This is the last partial execution, job completed
4     This is the last partial execution, job failed
5     Job was cancelled (either before starting or during run)
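Before filtering on the status, it can help to check how many jobs carry each code. The following is a minimal sketch on a made-up data frame: the `toy` data and the `status_labels` vector are assumptions for illustration, only the `status` column name and the code meanings come from the text above (-1 is the SWF marker for an unknown value).

```r
# Hypothetical toy trace; the real data would be the `df` loaded above
toy <- data.frame(status = c(1, 1, 0, 5, 1, 2, 3, -1))

# Code meanings as listed above, plus -1 for "unknown"
status_labels <- c("-1" = "unknown",
                   "0"  = "failed",
                   "1"  = "completed",
                   "2"  = "partial, to be continued",
                   "3"  = "last partial, completed",
                   "4"  = "last partial, failed",
                   "5"  = "cancelled")

# Count the jobs per status code; codes that never occur get a count of 0
counts <- table(factor(toy$status, levels = names(status_labels)))
names(counts) <- unname(status_labels)
print(counts)
```

Run on the real `df$status`, such a table shows at a glance which codes actually occur in the trace.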

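As a first approach, we discard the jobs that failed (status 0), were cancelled (status 5), or have an unknown status (-1). As the output below shows, this leaves the number of rows unchanged.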

df <- df[df$status != 5 & df$status != 0 & df$status != -1,]
nrow(df)
## [1] 114355

1.2 Waiting Time

On average, a job in the KIT Cluster waits 8779.8 seconds (about 2.4 hours).

The following figure shows the histogram of the waiting times for all the jobs in the cluster.

We can see that about 70% of the jobs wait less than 10 seconds before they start executing.
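The code behind these numbers is not shown. A minimal sketch of how they could be computed, assuming the `wait` column holds the waiting time in seconds (as in the start time computation above); the values below are made up for illustration:

```r
# Hypothetical waiting times in seconds; the real values would be df$wait
wait <- c(0, 2, 3, 5, 7, 8, 9, 120, 3600, 86400)

# Average waiting time
mean_wait <- mean(wait)

# Share of jobs that wait less than 10 seconds
# (the mean of a logical vector is the fraction of TRUE values)
share_under_10s <- mean(wait < 10)

print(mean_wait)
print(share_under_10s)
```

In this toy example 70% of the jobs start within 10 seconds, yet the mean is above 9000 seconds because of the two longest waits. The same effect explains how a large mean and a short typical wait can coexist.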

1.3 Running Time

On average, a job in the KIT Cluster runs for 17716.572 seconds (about 4.9 hours). The longest running time is 604800 seconds, i.e. 168 hours (7 days). The shortest is 1 second.

The following figure shows the histogram of the running times for all the jobs in the cluster.

We can see that the most common running time is about a minute, and that 50% of the jobs run for 10 minutes or less.
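Statements such as "50% of the jobs run for 10 minutes or less" can be checked with the empirical CDF. The sketch below uses made-up running times; only the `runtime` column name comes from this document.

```r
# Hypothetical running times in seconds; the real values would be df$runtime
runtime <- c(1, 45, 60, 62, 70, 600, 3600, 17000, 86400, 604800)

# ecdf() returns a function giving the fraction of values <= its argument
runtime_cdf <- ecdf(runtime)
share_10min <- runtime_cdf(10 * 60)

# Convert the longest running time from seconds to hours
longest_hours <- max(runtime) / 3600

print(share_10min)
print(longest_hours)
```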

1.4 Submissions per 30 Seconds

We look at the number of submissions per 30-second time frame. This time frame is chosen because it is the sampling period of CiGri.

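library(dplyr)
library(ggplot2)

maxtime <- max(df$submit)
time_interval <- 30

# Bin the submission times into 30-second windows [0, 30), [30, 60), ...
# The breaks extend one step past maxtime so that no submission is dropped.
breaks <- seq(0, maxtime + time_interval, by = time_interval)

# Count the submissions per window (windows without any submission do not appear)
df30s <- df %>%
    group_by(gr = cut(submit, breaks = breaks, right = FALSE)) %>%
    summarise(count = n())

ggplot(df30s) +
    stat_ecdf(aes(count), geom = "step") +
    theme_bw() +
    scale_x_continuous(trans = 'log10') +
    ggtitle("CDF of the number of Submissions per 30 seconds") +
    xlab("Number of Submissions") +
    ylab("P(X <= n)")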
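The grouping above only yields the windows that received at least one submission, since `group_by()` drops empty groups by default. If idle windows should count as zero, one option is to tabulate over all windows instead. This is a sketch on made-up submission times, and an assumption about a possible variant, not what the plot above shows.

```r
# Hypothetical submission times in seconds; the real values would be df$submit
submit <- c(0, 5, 29, 30, 95, 100, 119)
time_interval <- 30

# Window boundaries [0, 30), [30, 60), ... reaching just past the last submission
breaks <- seq(0, max(submit) + time_interval, by = time_interval)

# table() on the factor returned by cut() keeps the empty windows with a count of 0
windows <- cut(submit, breaks = breaks, right = FALSE)
counts <- as.vector(table(windows))
print(counts)
```

Whether the idle windows belong in the distribution depends on the question asked: the load seen at each sampling step includes them, the size of a burst of submissions does not.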

1.5 K-means

We now only focus on three fields:

  • the submission time of the job

  • the runtime of the job

  • the number of processors used

keep <- c("submit", "runtime", "procused")
df <- df[keep]

We then group the submissions by intervals of 30 seconds.

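time_interval <- 30
maxtime <- max(df$submit)
nb_interval <- maxtime / time_interval

# A single number as `breaks` makes cut() split the range of `submit` into
# nb_interval (truncated to an integer) equal-width bins, so each bin is
# approximately 30 seconds wide.
df2 <- df %>% group_by(gr=cut(submit, breaks=nb_interval)) %>%
  summarise(nb_submission = n(), procs = mean(procused), run = mean(runtime)) %>%
  arrange(as.numeric(gr))

# Recover the start of each interval from its label, e.g. "(30,60]" -> 30
df2$c <- as.character(df2$gr)
df2$start <- as.numeric(sub("\\((.+),.*", "\\1", df2$c))

df2$start[1] <- 0

# Keep only the features used for the clustering
keep <- c("nb_submission", "procs", "run")
df <- df2[keep]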

library(plotly)

head(df)
plot_ly(x = df$nb_submission, y = df$procs, z = df$run, type="scatter3d", mode="markers", size = 5)

We can see that most points lie “close to the sides” of the plot.

set.seed(17440)

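N <- 100

# Total within-cluster sum of squares (WSS) for each number of clusters from 1 to N
ns <- seq(1, N)
wss <- seq(1, N)  # placeholder values, overwritten in the loop below
dkm <- data.frame(ns, wss)

for (i in 1:N) {
  clustering <- kmeans(df, i)
  dkm$wss[i] <- clustering$tot.withinss
}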

ggplot(dkm) +
  geom_line(aes(x = ns, y = wss)) +
  theme_bw() +
  scale_y_continuous(trans = "log10")
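A curve like the one above is often read with the "elbow" heuristic: the total within-cluster sum of squares generally keeps shrinking as clusters are added, so one looks for the number of clusters after which the gain becomes small. The sketch below illustrates this on synthetic data. The data `x`, the `nstart = 20` restarts and the 50% threshold are all assumptions made for illustration, not part of the analysis in this document.

```r
set.seed(1)

# Hypothetical data: three well-separated groups of 50 points in 2D
x <- rbind(matrix(rnorm(100, mean = 0),  ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2),
           matrix(rnorm(100, mean = 20), ncol = 2))

ks <- 1:10

# nstart = 20 runs k-means from 20 random initialisations for each k
# and keeps the best result, which makes the curve less noisy
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Relative gain obtained by going from k - 1 to k clusters
gain <- c(NA, -diff(wss) / head(wss, -1))
print(data.frame(k = ks, wss = wss, gain = round(gain, 3)))

# Elbow heuristic (the 0.5 threshold is an arbitrary assumption):
# keep the last k whose gain is still at least 50%
elbow_k <- max(ks[which(gain >= 0.5)])
print(elbow_k)
```

The threshold has to be tuned to the data at hand. Rescaling the columns before clustering (for instance with `scale()`) is another common choice when the features have very different magnitudes.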

mini <- min(dkm$wss)

# Number of clusters with the smallest WSS (the first one in case of ties)
opti_n <- dkm$ns[which.min(dkm$wss)]

clustering <- kmeans(df, opti_n)

df$cluster <- clustering$cluster

plot_ly(x=df$nb_submission, y=df$procs, z=df$run, type="scatter3d", mode="markers", size = 5, color = df$cluster)
clustering$centers
##    nb_submission        procs          run
## 1       1.176152   159.372165  26579.42360
## 2       1.199248   114.141838  19191.35255
## 3       1.171906   265.492932    101.17601
## 4       1.629073    69.425656    464.79111
## 5       1.106509 10514.615385    450.47436
## 6       1.206595     2.983976     61.90965
## 7       2.675136   320.935330  53418.62147
## 8       1.350844   184.294380  13063.88049
## 9       1.224138   479.402299   1449.05435
## 10      1.351528   133.113963   2583.21533
## 11      1.502160   254.723522  73460.66898
## 12      1.388031    48.119510    929.20217
## 13      1.249110   195.888070  29609.67378
## 14      1.315861    93.175151     89.10152
## 15      1.281371    80.977320   2955.61032
## 16      1.587838   224.299784  11033.52369
## 17      1.231646   296.618509    324.03030
## 18      1.806897    79.439618   1184.67842
## 19      2.203812   151.090456   5546.82128
## 20      1.226209   104.097244  15926.34903
## 21      1.196300   596.691670 173136.98793
## 22      1.229358   669.224516    479.05939
## 23      1.284195   312.733858    597.19210
## 24      1.589951   150.283043   4088.18082
## 25      1.354508    77.757036   1612.14660
## 26      2.576271   312.064712  48156.00861
## 27      1.493182    84.008151   2196.55620
## 28      1.379949   121.392964   3570.01835
## 29      1.820896   438.782587    974.04535
## 30      1.234015   211.414777  32891.11806
## 31      2.905612    46.277975    365.45868
## 32      1.518170   235.379948  36610.37536
## 33      1.833625    35.995132    705.86100
## 34      1.294225   512.739677    118.41246
## 35      1.201410   668.211189 270470.12017
## 36      1.244541   541.216157   2031.51346
## 37      1.447164   132.863012   6197.50084
## 38      1.946827   196.402411  22897.20482
## 39      1.384530   146.785476  24249.38542
## 40      1.369967    88.254596    789.17964
## 41      1.371027   286.566286  84986.79909
## 42      1.522358   306.904099  66081.54387
## 43      1.735990   318.674157  42711.62166
## 44      1.531532  1532.284999   1541.01538
## 45      2.042350   166.498553  20854.65895
## 46      1.095845   145.481852    167.04513
## 47      1.661972  3260.253018   1999.53119
## 48      1.229572  4696.464332    335.42315
## 49      1.347727  2533.674598    256.34266
## 50      1.332553   610.506245 148118.35660
## 51      1.414520   300.472443  17868.31742
## 52      1.859375    91.803375   1858.65525
## 53      1.460123   142.548844   4743.32447
## 54      1.854973    33.255076    180.48941
## 55      1.372327   358.691195  96263.40458
## 56      1.526902   199.408265  11998.55768
## 57      2.166202    43.956838    585.51610
## 58      2.250000  1202.368750    182.03259
## 59      1.453704   191.859506   8134.33359
## 60      1.441989   427.046238 224467.25217
## 61      1.234234   448.664916 109973.14894
## 62      2.027316    42.291602    273.38723
## 63      1.724138   211.132670  10022.76740
## 64      1.315141   653.880428 128635.55856
## 65      1.275000  3092.275000   5744.98750
## 66      1.670924    20.792058    100.36633
## 67      1.259259  1297.351852   3548.47454
## 68      1.620047   200.073926   9056.71304
## 69      1.873563   304.477251  60196.43978
## 70      1.557461   213.386980   7131.08241
## 71      1.000000  6172.000000  11527.00000
## 72      1.435045   357.037958  15005.12345
## 73      1.137130    16.831567     47.34374
## 74      1.654515    77.955176   1434.26738
## 75      1.509485   182.104502  14212.10274
## 76      1.234973   120.243716  16426.82451