We are interested in the relation between the load of the fileserver and the perturbation of write requests (which might come from priority users of the cluster).
For instance, in the past we defined the reference value for the CiGri controllers based on an empirical value of the fileserver load. We now want to pick this reference value based on the maximum processing time that a write request can experience (which represents the disturbance on the jobs).
We are using two machines from the Grisou Cluster in Grid5000.
One of the machines represents the fileserver and hosts a Network File System (NFS) server. The other machine represents the OAR nodes and mounts this NFS server.
We are only interested in the increase in processing time. Thus we do not consider the usual computing part of the jobs.
Hence the OAR node submits a given number of write requests to the NFS server every 30 seconds (the same period as a CiGri cycle). All those requests have the same size.
We then measure the time elapsed between the start and the end of each write request:
start_time="$(date +%s.%N)"
dd if=/dev/zero of=/mnt/nfs/foo bs=<FILE_SIZE>M count=1 oflag=direct
end_time="$(date +%s.%N)"
In the following, we consider 4 file sizes: 10M, 25M, 50M and 75M.
For each of these file sizes, we use 6 different submission sizes: 1, 10, 20, 30, 40 and 50 jobs.
For each submission size, we repeat the submission 100 times.
Throughout the experiment, we also log the load of the machine hosting the NFS server.
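The protocol above can be sketched as a small shell function. This is only an illustration under assumptions: `run_submission`, `target_dir` and `dd_flags` are hypothetical names that do not come from the experiment scripts, and GNU `date` is assumed for the `%N` format. It launches all the writes of one submission concurrently and prints one processing time (in seconds) per job.

```shell
# Sketch only: run_submission, target_dir and dd_flags are hypothetical names.
# Launch n_jobs concurrent writes of size_mb megabytes each and print one
# processing time (in seconds) per job.
run_submission() {
    n_jobs="$1"; size_mb="$2"; target_dir="$3"; dd_flags="${4:-}"
    i=1
    while [ "$i" -le "$n_jobs" ]; do
        (
            start_time="$(date +%s.%N)"
            # $dd_flags is left unquoted on purpose so that it can be empty.
            dd if=/dev/zero of="$target_dir/foo_$i" bs="${size_mb}M" count=1 $dd_flags 2>/dev/null
            end_time="$(date +%s.%N)"
            awk -v s="$start_time" -v e="$end_time" 'BEGIN { printf "%.6f\n", e - s }'
        ) &
        i=$((i + 1))
    done
    wait   # block until every write of this submission has finished
}
```

On the experimental setup, one cycle would presumably look like `run_submission 10 25 /mnt/nfs oflag=direct`, repeated 100 times with a 30 second period for each (file size, submission size) pair.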
We start by plotting the processing time for each submission.
We can see that the maximum processing time of a job depends on both the size of the file and the size of its submission.
We thus compute a model linking the maximum processing time to the pair (file size, submission size):
df_limits <- df %>%
  group_by(file_size, sub_size) %>%
  summarise(
    start = min(start),
    min_processing_time = min(time),
    max_processing_time = max(time),
    max_overhead = max(time) - min(time),
    percentage_overhead = (max(time) - min(time)) * 100 / min(time)
  )
## `summarise()` regrouping output by 'file_size' (override with `.groups` argument)
model_max_time <- lm(data = df_limits, max_processing_time ~ sub_size:file_size)
summary(model_max_time)
##
## Call:
## lm(formula = max_processing_time ~ sub_size:file_size, data = df_limits)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39132 -0.14780 -0.03857 0.15126 0.53803
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.497e-01 7.598e-02 1.971 0.0615 .
## sub_size:file_size 7.191e-03 5.335e-05 134.783 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2633 on 22 degrees of freedom
## Multiple R-squared: 0.9988, Adjusted R-squared: 0.9987
## F-statistic: 1.817e+04 on 1 and 22 DF, p-value: < 2.2e-16
We plot the actual processing times against the prediction based only on the number of jobs sent (submission size) and the size of the file.
We see that we are able to predict the maximum processing time for a job given its size and the number of jobs in its submission.
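To make the fitted model concrete, here it is written out with the coefficients from the summary above, rounded. We assume that times are in seconds (as returned by `date +%s.%N`) and file sizes in M:
\[ max\_processing\_time \simeq 0.150 + 0.00719 \times sub\_size \times file\_size \]
For instance, a submission of 50 jobs writing 75M each gives \(0.150 + 0.00719 \times 50 \times 75 \simeq 27.1\) seconds, while a single job writing 10M gives \(0.150 + 0.00719 \times 10 \simeq 0.22\) seconds.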
We start by filtering out some of the noise in the load. To do so, we apply a rolling (or moving) average to the data, with a window of 50 data points.
We can see plateaus where the load is roughly constant. They seem to correspond to the periods during which we send the same number of jobs of a given file size.
We can try to fit a model of the load as a function of the file size and the submission size.
model_load <- lm(data = df_load, ma ~ sub_size * file_size, na.action = na.omit)
summary(model_load)
##
## Call:
## lm(formula = ma ~ sub_size * file_size, data = df_load, na.action = na.omit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41775 -0.12606 0.01233 0.15070 1.07435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.275e-02 8.171e-03 6.455 1.12e-10 ***
## sub_size -9.347e-04 2.675e-04 -3.494 0.000477 ***
## file_size -2.839e-03 1.737e-04 -16.348 < 2e-16 ***
## sub_size:file_size 1.848e-03 5.684e-06 325.145 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2812 on 14335 degrees of freedom
## Multiple R-squared: 0.9765, Adjusted R-squared: 0.9765
## F-statistic: 1.984e+05 on 3 and 14335 DF, p-value: < 2.2e-16
Let us plot the prediction of this model onto the data.
We see that knowing the file size of the jobs and the number of jobs in the submission, we are able to get a good approximation of the load.
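Written out with the coefficients from the summary above (rounded, file sizes in M), the load model reads:
\[ load \simeq 0.0528 - 0.000935 \times sub\_size - 0.00284 \times file\_size + 0.001848 \times sub\_size \times file\_size \]
As a worked example, 50 jobs of 75M give \(0.0528 - 0.0467 - 0.213 + 6.93 \simeq 6.72\). The interaction term dominates the other terms by far for the larger submissions.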
What we want to know is by how much a write request can be expected to be delayed when the load has a given value.
We are thus looking for a model of the form:
\[ f(load) = max\_processing\_time \]
First, we use the model seen in the previous section to estimate the load based on the file size and submission size.
df$prediction_load <- predict(model_load, df)
We then use the prediction from the model for the max processing time seen in Section 3 to find a new model linking the load of the fileserver and the max processing time.
model <- lm(data = df, prediction_processing_time ~ prediction_load)
summary(model)
##
## Call:
## lm(formula = prediction_processing_time ~ prediction_load, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13384 -0.08101 -0.04983 0.04865 0.59235
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1870788 0.0008516 219.7 <2e-16 ***
## prediction_load 4.0183863 0.0002627 15295.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1251 on 60398 degrees of freedom
## Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
## F-statistic: 2.34e+08 on 1 and 60398 DF, p-value: < 2.2e-16
\[ \operatorname{prediction\_processing\_time} = 0.19 + 4.02(\operatorname{prediction\_load}) + \epsilon \]
It seems that we could simplify this model into a “rule of thumb”:
\[ max\_processing\_time \simeq 4 \times load \]
Let us now use this model to predict the max processing time only from the values of the load:
We can see that the prediction matches the max processing time quite well.
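The same relation can be read the other way round, which is what we need in order to pick a reference value for the controller from a maximum tolerated processing time. Below is a minimal sketch, assuming times in seconds. `reference_load` is a hypothetical helper name, and the coefficients are copied from the fit above.

```shell
# Sketch only: reference_load is a hypothetical helper name.
# Invert max_processing_time = intercept + slope * load to get the load
# at which a write request is expected to take at most $1 seconds.
reference_load() {
    awk -v t="$1" 'BEGIN {
        intercept = 0.1870788   # fitted intercept (seconds)
        slope     = 4.0183863   # fitted slope (seconds per unit of load)
        printf "%.2f\n", (t - intercept) / slope
    }'
}

reference_load 10   # prints 2.44
```

With the rule of thumb, the same tolerated time of 10 seconds gives a reference load of 10 / 4 = 2.5, which is close to the value from the full model.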
We have seen that we are able to predict the load of the fileserver knowing only the number of jobs in the submission and the size of the file to write.
In the case of CiGri we do not know the size of the file at the time of the campaign submission.
However, we can measure it during the first executions.
As we are using an NFS server, we have access to some statistics of the NFS client.
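In particular, there is a small hidden sensor that we use to count the number of Mb of data written from the NFS client since the last reboot. Note that, as far as we can tell, the field read below (the third counter of the proc4 line) counts NFSv4 write operations rather than megabytes, so it only approximates the volume in Mb when each operation carries about 1 Mb: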
quentin $ cat /proc/net/rpc/nfs
net 0 0 0 0
rpc 1424 0 1424
proc3 22 0 619 5 102 77 10 11 2 1 0 0 0 0 0 1 0 0 55 6 16 8 0
proc4 60 0 0 342 21 21 0 0 0 21 0 3 0 0 0 0 0 0 22 34 3 1 1 0 0 0 0 2 0 0 2 5 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
We can then read this counter before and after the execution of the CiGri jobs to measure the size of the file written, as well as the total execution time of the job:
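count_Mb_before="$(grep proc4 /proc/net/rpc/nfs | cut -f 5 -d ' ')"
start_time="$(date +%s.%N)"
# execute CiGri job
end_time="$(date +%s.%N)"
count_Mb_after="$(grep proc4 /proc/net/rpc/nfs | cut -f 5 -d ' ')"
# amount written by the job and its execution time (illustrative variable names)
written_Mb=$((count_Mb_after - count_Mb_before))
execution_time="$(echo "$end_time - $start_time" | bc)"
Our relation will then look something like: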
\[ load(t_{n + 1}) = \alpha \times load(t_n) + \beta \times sub\_size(t_{n - d}) \]
where \(d\) is the delay, i.e. the execution time of the job in CiGri cycles.
One issue to investigate is that the load is defined recursively as follows:
\[ load(t_{n + 1}) = load(t_n) e ^{-T} + N_n \left(1 - e^{-T}\right) \]
At first sight, this recursion in the definition might pose a problem for identifying the coefficients \(\alpha\) and \(\beta\), and thus for designing the controller.
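One possible way to look at this is the following. It relies on assumptions that the text above does not state: \(N_n\) is taken to be the number of requests active on the fileserver during cycle \(n\), and \(T\) the sampling period divided by the time constant of the load average. Matching the two recursions term by term gives:
\[ \alpha = e^{-T}, \qquad \beta \times sub\_size(t_{n - d}) = N_n \left(1 - e^{-T}\right) \]
If, in addition, \(N_n\) is proportional to \(sub\_size(t_{n - d})\) with some factor \(\gamma\) (hypothetical), then \(\beta = \gamma \left(1 - e^{-T}\right)\). For a constant \(N_n = N\), unrolling the recursion gives:
\[ load(t_n) = load(t_0) e^{-nT} + N \left(1 - e^{-nT}\right) \]
so the load converges to \(N\), which would be consistent with the plateaus observed on the load. Under these assumptions, the recursion is a first-order linear relation of the same form as the one we want to identify. Whether the assumptions hold remains to be checked.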
It seems fair to assume that the NFS works in the following way:
it takes a request from the queue
executes it
and takes some time to prepare the next request
The time to execute the request varies (slightly) with the size of the file to write.
But the idle time between requests should be roughly constant, or does it depend on the file size as well?
model_idle_nfs_time <- lm(data = df_nfs, mean_idle_time ~ file_size)
summary(model_idle_nfs_time)
##
## Call:
## lm(formula = mean_idle_time ~ file_size, data = df_nfs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.011895 -0.008571 -0.005117 0.003454 0.022129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0132849 0.0059030 2.251 0.0372 *
## file_size 0.0613062 0.0001255 488.508 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01389 on 18 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.386e+05 on 1 and 18 DF, p-value: < 2.2e-16
It seems like this idle time is a function of the file size…
Investigation to be continued …
We have seen that, in the case of CiGri, we are able to:
predict the maximum processing time for a request knowing the size of the file and the size of the submission it is part of
predict the load of the fileserver knowing the size of the file and the size of the submission it is part of
predict the maximum processing time for a request knowing the load of the fileserver
We also give a rule of thumb for the relation between the load of the fileserver and the maximum processing time.