Link to the Parallel Workload Archive page: https://www.cs.huji.ac.il/labs/parallel/workload/l_kit_fh2/index.html.
First, let us load the file and look at its size.
filename <- "kit.csv"
# sep = "" splits fields on any whitespace, as in the SWF format
df <- read.csv(filename, header = TRUE, sep = "")
nrow(df)
## [1] 114355
Note that the .csv file is the original .swf file without the SWF header.
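As an alternative sketch, the original .swf file could be read directly, because SWF header lines start with `;` and can be treated as comments. This assumes the 18 standard SWF fields in their usual order; the column names are our own labels, chosen to match the names used in this document (submit, wait, runtime, procused, status) where they overlap, and the two example jobs are made up.

```r
# Read an SWF log directly: header lines start with ';' and are skipped as comments.
# Column names are our own (hypothetical) labels for the 18 SWF fields, in order.
swf_fields <- c("job", "submit", "wait", "runtime", "procused", "cpu_used",
                "mem_used", "procreq", "time_req", "mem_req", "status",
                "user", "group", "app", "queue", "partition",
                "prev_job", "think_time")

read_swf <- function(file = NULL, text = NULL) {
  if (!is.null(text)) {
    read.table(text = text, comment.char = ";", col.names = swf_fields)
  } else {
    read.table(file, comment.char = ";", col.names = swf_fields)
  }
}

# Tiny hand-written example with two fake jobs (not taken from the KIT log)
example <- "; Version: 2.2
; Computer: example
1 0 5 60 4 -1 -1 4 120 -1 1 1 1 1 1 -1 -1 -1
2 10 0 30 1 -1 -1 1 60 -1 0 2 1 1 1 -1 -1 -1"
jobs <- read_swf(text = example)
print(jobs[, c("job", "submit", "wait", "runtime", "status")])
```

With the real log, `read_swf(file = ...)` would take the path of the downloaded .swf file.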
We compute the start time and end time of each job.
df$start_time <- df$submit + df$wait
df$end_time <- df$start_time + df$runtime
The SWF has a status field giving the status of each job. The status codes are (or should be) as follows:
| Code | Status |
|---|---|
| 0 | Job Failed |
| 1 | Job completed successfully |
| 2 | This partial execution will be continued |
| 3 | This is the last partial execution, job completed |
| 4 | This is the last partial execution, job failed |
| 5 | Job was cancelled (either before starting or during run) |
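To make these codes readable in tables and plots, a small lookup can translate them. This is a sketch: the label strings are our own shorthand for the table above, and the `-1` entry assumes the usual SWF convention that `-1` marks a missing value.

```r
# Map SWF status codes to short labels (our own wording of the table above;
# -1 is the SWF convention for a missing value).
status_labels <- c("-1" = "Unknown",
                   "0" = "Failed",
                   "1" = "Completed",
                   "2" = "Partial, to be continued",
                   "3" = "Last partial, completed",
                   "4" = "Last partial, failed",
                   "5" = "Cancelled")

# Unknown codes (not in the table) come back as NA
label_status <- function(status) {
  unname(status_labels[as.character(status)])
}

# Usage on a toy vector of codes
codes <- c(1, 1, 0, 5, 3)
print(table(label_status(codes)))
```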
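As a first approach, we remove the jobs that failed (status 0), were cancelled (status 5) or have an unknown status (-1, the SWF value for missing data). Partial executions (status 2 to 4) are kept.
# Drop failed, cancelled and unknown-status jobs
df <- df[!(df$status %in% c(-1, 0, 5)), ]
nrow(df)
## [1] 114355
The number of jobs is unchanged: this filter removes no job from the log.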
On average, a job in the KIT cluster waits 8779.8 seconds (about 2.4 hours) before starting.
The following Figure shows the histogram of the waiting times for all the jobs in the cluster.
We can see that about 70% of the jobs wait less than 10 seconds before they start running.
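The two figures quoted above (mean waiting time and share of jobs waiting less than 10 seconds) can be computed with a small helper. This is a sketch assuming `wait` is the SWF wait column in seconds (df$wait here); the toy vector is made up.

```r
# Waiting-time summaries, in seconds.
# `wait` is assumed to be the SWF wait column (df$wait in this document).
wait_summary <- function(wait, threshold = 10) {
  c(mean_s = mean(wait),
    mean_h = mean(wait) / 3600,
    frac_below = mean(wait < threshold))  # share of jobs waiting < threshold seconds
}

# Toy example (not the KIT data)
toy_wait <- c(0, 2, 5, 30, 7200)
s <- wait_summary(toy_wait)
print(s)
```

On the real data, `wait_summary(df$wait)` should give the mean reported above together with the share of jobs waiting under 10 seconds.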
On average, a job in the KIT cluster runs for 17716.572 seconds (about 4.9 hours). The longest running time is 604800 seconds, i.e. 168 hours (one week). The shortest is 1 second.
The following Figure shows the histogram of the running times for all the jobs in the cluster.
We can see that most jobs run for about a minute, and that 50% of the jobs run for 10 minutes or less.
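The running-time statistics above can be gathered in one place. This sketch assumes `runtime` is the SWF runtime column in seconds (df$runtime here); the 10-minute threshold is 600 seconds and the hour conversion divides by 3600 (604800 / 3600 = 168). The toy vector is made up.

```r
# Runtime summaries, in seconds; `runtime` is assumed to be df$runtime.
runtime_summary <- function(runtime) {
  c(mean_s = mean(runtime),
    min_s = min(runtime),
    max_s = max(runtime),
    max_h = max(runtime) / 3600,
    median_s = median(runtime),
    frac_10min = mean(runtime <= 600))  # share of jobs running 10 minutes or less
}

# Toy example (not the KIT data)
toy_runtime <- c(1, 60, 60, 600, 604800)
r <- runtime_summary(toy_runtime)
print(r)
```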
We now look at the number of submissions in each 30-second window. This window length is chosen because it is the sampling period of CiGri.
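As an illustration of the counting step in base R (our own sketch, not the code behind the figure), each job can be assigned to a window by integer division of its submission time. Here windows are half-open, [k·30, (k+1)·30) seconds, and, unlike a grouping over non-empty bins, windows without any submission are kept as zeros. Whether empty windows belong in the CDF is a modelling choice.

```r
# Number of submissions in each window of `width` seconds, empty windows included.
# Window k (k = 1, 2, ...) covers [(k - 1) * width, k * width) seconds.
submissions_per_window <- function(submit, width = 30) {
  window <- floor(submit / width) + 1   # 1-based window index of each job
  tabulate(window)                      # counts for windows 1..max(window), zeros kept
}

toy_submit <- c(0, 5, 29, 30, 95)
counts <- submissions_per_window(toy_submit)
print(counts)            # windows [0,30), [30,60), [60,90), [90,120)
cdf <- ecdf(counts)      # empirical CDF, P(X <= n)
print(cdf(0))            # share of windows with no submission
```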
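library(dplyr)
library(ggplot2)
maxtime <- max(df$submit)
time_interval <- 30
# Breaks every 30 s, extended past maxtime so that every submission falls in a bin
breaks <- seq(0, maxtime + time_interval, by = time_interval)
# Count submissions per 30 s window (empty windows do not appear in the result)
df30s <- df %>%
  group_by(gr = cut(submit, breaks = breaks, include.lowest = TRUE)) %>%
  summarise(count = n())
ggplot(df30s) +
  stat_ecdf(aes(count), geom = "step") +
  theme_bw() +
  scale_x_continuous(trans = 'log10') +
  ggtitle("CDF of the number of Submissions per 30 seconds") +
  xlab("Number of Submissions") +
  ylab("P(X <= n)")
We now only focus on three fields: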
- the submission time of the job
- the runtime of the job
- the number of processors used
keep <- c("submit", "runtime", "procused")
df <- df[keep]
We then group the submissions into intervals of about 30 seconds.
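The per-interval features (number of submissions, mean number of processors, mean runtime) can also be sketched in base R. This is a hypothetical variant, not the author's code: it assumes exact 30 s windows aligned on t = 0 rather than cut()'s equal-width bins over the range of submit, and it returns each window's start time as a number directly.

```r
# Per-window features: start time, number of submissions, mean processors, mean runtime.
# Windows are exact `width`-second bins aligned on t = 0 (an assumption);
# windows without any job do not appear.
window_features <- function(jobs, width = 30) {
  window <- floor(jobs$submit / width)
  nb_submission <- tapply(jobs$submit, window, length)
  procs <- tapply(jobs$procused, window, mean)
  run <- tapply(jobs$runtime, window, mean)
  data.frame(start = as.numeric(names(nb_submission)) * width,
             nb_submission = as.vector(nb_submission),
             procs = as.vector(procs),
             run = as.vector(run))
}

# Toy jobs (made up): windows 0, 0, 1 and 3
jobs <- data.frame(submit = c(0, 10, 40, 95),
                   procused = c(4, 8, 1, 2),
                   runtime = c(60, 120, 30, 10))
f <- window_features(jobs)
print(f)
```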
library(plotly)
time_interval <- 30
maxtime <- max(df$submit)
# A single number as breaks makes cut() split the range of submit into that many
# (truncated to an integer) equal-width bins, each roughly 30 s wide
nb_interval <- maxtime / time_interval
df2 <- df %>%
  group_by(gr = cut(submit, breaks = nb_interval)) %>%
  summarise(nb_submission = n(), procs = mean(procused), run = mean(runtime)) %>%
  arrange(as.numeric(gr))
# Numeric start time of each bin: lower bound of its "(a,b]" label
df2$start <- as.numeric(sub("\\((.+),.*", "\\1", as.character(df2$gr)))
# The first lower bound lies slightly below min(submit); reset it to 0
df2$start[1] <- 0
keep <- c("nb_submission", "procs", "run")
df <- df2[keep]
head(df)
plot_ly(x = df$nb_submission, y = df$procs, z = df$run, type = "scatter3d", mode = "markers", size = 5)
We can see that most points are “close to the sides” of the plot.
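The elbow curve below plots kmeans()'s `tot.withinss`, the sum over all points of the squared Euclidean distance to their cluster centre. A short sketch recomputing it by hand on made-up data makes this definition explicit.

```r
# Total within-cluster sum of squares, recomputed from a kmeans() result:
# sum over all points of the squared Euclidean distance to their cluster centre.
tot_withinss <- function(x, fit) {
  x <- as.matrix(x)
  centres <- fit$centers[fit$cluster, , drop = FALSE]  # centre of each point's cluster
  sum((x - centres)^2)
}

set.seed(1)
toy <- data.frame(a = c(1, 2, 10, 11), b = c(1, 1, 5, 6))
fit <- kmeans(toy, 2)
print(c(by_hand = tot_withinss(toy, fit), kmeans = fit$tot.withinss))
```

Because the distance is computed on the raw columns, a column with large values such as run (in seconds) weighs more in this sum than nb_submission.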
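set.seed(17440)
N <- 100
ns <- seq(1, N)
wss <- seq(1, N)
dkm <- data.frame(ns, wss)
# Total within-cluster sum of squares of k-means for each number of clusters 1..N
for (i in 1:N) {
  clustering <- kmeans(df, i)
  dkm$wss[i] <- clustering$tot.withinss
}
ggplot(dkm) +
  geom_line(aes(x = ns, y = wss)) +
  theme_bw() +
  scale_y_continuous(trans = "log10")
# Keep the number of clusters with the smallest total within-cluster sum of squares
# (this sum tends to decrease as the number of clusters grows)
opti_n <- dkm$ns[which.min(dkm$wss)]
clustering <- kmeans(df, opti_n)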
df$cluster <- clustering$cluster
plot_ly(x = df$nb_submission, y = df$procs, z = df$run, type = "scatter3d", mode = "markers", size = 5, color = df$cluster)
clustering$centers
## nb_submission procs run
## 1 1.176152 159.372165 26579.42360
## 2 1.199248 114.141838 19191.35255
## 3 1.171906 265.492932 101.17601
## 4 1.629073 69.425656 464.79111
## 5 1.106509 10514.615385 450.47436
## 6 1.206595 2.983976 61.90965
## 7 2.675136 320.935330 53418.62147
## 8 1.350844 184.294380 13063.88049
## 9 1.224138 479.402299 1449.05435
## 10 1.351528 133.113963 2583.21533
## 11 1.502160 254.723522 73460.66898
## 12 1.388031 48.119510 929.20217
## 13 1.249110 195.888070 29609.67378
## 14 1.315861 93.175151 89.10152
## 15 1.281371 80.977320 2955.61032
## 16 1.587838 224.299784 11033.52369
## 17 1.231646 296.618509 324.03030
## 18 1.806897 79.439618 1184.67842
## 19 2.203812 151.090456 5546.82128
## 20 1.226209 104.097244 15926.34903
## 21 1.196300 596.691670 173136.98793
## 22 1.229358 669.224516 479.05939
## 23 1.284195 312.733858 597.19210
## 24 1.589951 150.283043 4088.18082
## 25 1.354508 77.757036 1612.14660
## 26 2.576271 312.064712 48156.00861
## 27 1.493182 84.008151 2196.55620
## 28 1.379949 121.392964 3570.01835
## 29 1.820896 438.782587 974.04535
## 30 1.234015 211.414777 32891.11806
## 31 2.905612 46.277975 365.45868
## 32 1.518170 235.379948 36610.37536
## 33 1.833625 35.995132 705.86100
## 34 1.294225 512.739677 118.41246
## 35 1.201410 668.211189 270470.12017
## 36 1.244541 541.216157 2031.51346
## 37 1.447164 132.863012 6197.50084
## 38 1.946827 196.402411 22897.20482
## 39 1.384530 146.785476 24249.38542
## 40 1.369967 88.254596 789.17964
## 41 1.371027 286.566286 84986.79909
## 42 1.522358 306.904099 66081.54387
## 43 1.735990 318.674157 42711.62166
## 44 1.531532 1532.284999 1541.01538
## 45 2.042350 166.498553 20854.65895
## 46 1.095845 145.481852 167.04513
## 47 1.661972 3260.253018 1999.53119
## 48 1.229572 4696.464332 335.42315
## 49 1.347727 2533.674598 256.34266
## 50 1.332553 610.506245 148118.35660
## 51 1.414520 300.472443 17868.31742
## 52 1.859375 91.803375 1858.65525
## 53 1.460123 142.548844 4743.32447
## 54 1.854973 33.255076 180.48941
## 55 1.372327 358.691195 96263.40458
## 56 1.526902 199.408265 11998.55768
## 57 2.166202 43.956838 585.51610
## 58 2.250000 1202.368750 182.03259
## 59 1.453704 191.859506 8134.33359
## 60 1.441989 427.046238 224467.25217
## 61 1.234234 448.664916 109973.14894
## 62 2.027316 42.291602 273.38723
## 63 1.724138 211.132670 10022.76740
## 64 1.315141 653.880428 128635.55856
## 65 1.275000 3092.275000 5744.98750
## 66 1.670924 20.792058 100.36633
## 67 1.259259 1297.351852 3548.47454
## 68 1.620047 200.073926 9056.71304
## 69 1.873563 304.477251 60196.43978
## 70 1.557461 213.386980 7131.08241
## 71 1.000000 6172.000000 11527.00000
## 72 1.435045 357.037958 15005.12345
## 73 1.137130 16.831567 47.34374
## 74 1.654515 77.955176 1434.26738
## 75 1.509485 182.104502 14212.10274
## 76 1.234973 120.243716 16426.82451