Introduction

We have installed a Cleversafe storage system in Frederick. Cleversafe is an object storage system. It uses erasure coding to provide high availability and data security.

The current infrastructure includes four storage accessors, each of which can access the same set of objects. Large files are transferred to the object store by splitting them into pieces and transferring each piece independently. Once all the pieces arrive at the Cleversafe accessor, they are reassembled into a single object for storage.
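To make the split-and-reassemble idea concrete, here is a minimal Python sketch. It works on bytes in memory and is only an illustration: the function names are hypothetical, and nothing here reflects how the Cleversafe accessor or the transfer client actually implements this.

```python
from typing import Iterator, List


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive pieces of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def reassemble(chunks: List[bytes]) -> bytes:
    """Join the pieces, in their original order, back into one object."""
    return b"".join(chunks)


# Illustrative payload only (2560 bytes), not data from the experiment.
payload = bytes(range(256)) * 10
pieces = list(split_into_chunks(payload, 1000))
restored = reassemble(pieces)
```

The last piece is usually shorter than the others, which is why chunk size is described as a maximum.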

The experiment

The main question of interest in this experiment is:

What is the transfer speed of a single large file transfer?

Given the system at hand, a few variables might impact transfer speeds.

  1. File size
  2. Transfer concurrency: the number of parallel streams used to transfer the file chunks (values between 1 and 9 were used).
  3. Transfer chunk size: the maximum size of each chunk (values between 10 and 100 MB were used).
  4. The accessor used: though unlikely to affect the results, this was captured in the logs.
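File size and chunk size together determine how many pieces a transfer is split into. The sketch below shows that arithmetic in Python. The 2,500 MB file size is a made-up example, the helper name is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the logs' unit convention is not stated here.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def chunk_count(file_size_bytes: int, chunk_size_bytes: int) -> int:
    """Number of chunks needed to cover the file (the last chunk may be partial)."""
    if chunk_size_bytes <= 0:
        raise ValueError("chunk_size_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large files.
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


# Hypothetical 2,500 MB file at the two ends of the chunk size range used.
example_size = 2500 * MB
chunks_at_100mb = chunk_count(example_size, 100 * MB)
chunks_at_10mb = chunk_count(example_size, 10 * MB)
```

At the small end of the chunk size range, the same file produces ten times as many pieces, so any per-chunk overhead is paid ten times as often.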

What did I do?

I set up a transfer of gzipped fastq files from the Meltzer lab. In all, I captured the results of 710 transfers from our local storage server (1 Gb connection, otherwise unused) during an overnight transfer window.

Analysis

Now, we can investigate the relationships between variables. All transfers are noted in the table below.

The distribution of file sizes is given in the next plot.

Effect of accessor

Because of a coding error, I used only three of the four accessors. The next plot shows that the three accessors seem to have similar transfer performance.

Effect of file size

The plot below shows transfer speed as a function of file size (in MB). There is no strong relationship between file size and transfer speed.

Effect of chunk size

The following plot shows the effect of chunk size on transfer speed. Again, there does not appear to be a strong trend.
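One way to check a trend like this numerically, alongside the plot, is to compute the median transfer speed for each value of the variable. The Python sketch below does that. The record layout (`size_mb`, `seconds`, `chunk_mb`, `concurrency`) is an assumed format, not the actual log schema, and the records are synthetic placeholders rather than measurements from this experiment.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def speed_mb_per_s(size_mb: float, seconds: float) -> float:
    """Transfer speed as file size divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


def median_speed_by(records: List[dict], key: str) -> Dict[object, float]:
    """Group records by `key` and return the median speed per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(speed_mb_per_s(rec["size_mb"], rec["seconds"]))
    return {value: median(speeds) for value, speeds in sorted(groups.items())}


# Synthetic placeholder records, NOT results from the 710 transfers.
records = [
    {"size_mb": 1000, "seconds": 20, "chunk_mb": 10, "concurrency": 1},
    {"size_mb": 2000, "seconds": 50, "chunk_mb": 10, "concurrency": 1},
    {"size_mb": 1000, "seconds": 25, "chunk_mb": 100, "concurrency": 1},
    {"size_mb": 3000, "seconds": 60, "chunk_mb": 100, "concurrency": 1},
]
by_chunk = median_speed_by(records, "chunk_mb")
```

The same function can be called with `"concurrency"` as the key to summarize the concurrency comparison.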

Effect of concurrency

Recall that large files are broken into pieces and transferred. If concurrency is greater than 1, these transfers occur in parallel. Many factors could affect efficiency here, but in this test we simply investigate how concurrency affects file transfer speed.
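As a rough illustration of what the concurrency setting controls, the Python sketch below sends chunks through a thread pool whose size is the concurrency value. The `send_chunk` function is a hypothetical stand-in that only counts bytes; a real client would make a network call to an accessor there, and the actual transfer tool may be structured quite differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def send_chunk(chunk: bytes) -> int:
    """Hypothetical stand-in for an upload call; returns the bytes 'sent'."""
    return len(chunk)


def transfer(chunks: List[bytes], concurrency: int,
             send: Callable[[bytes], int] = send_chunk) -> int:
    """Send all chunks using `concurrency` parallel streams; return total bytes."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # With concurrency == 1 the chunks go out one after another;
    # with higher values up to `concurrency` chunks are in flight at once.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(send, chunks))


# Illustrative chunks only: five full pieces and one partial piece.
chunks = [b"x" * 1000] * 5 + [b"x" * 200]
sent_serial = transfer(chunks, concurrency=1)
sent_parallel = transfer(chunks, concurrency=4)
```

Both calls move the same bytes; only the number of simultaneous streams differs, which is the quantity varied between 1 and 9 in this experiment.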

Conclusions

There appears to be one large determinant of file transfer speeds: concurrency, which in this system is best set to 1. This may change once a load balancer is in place and in a system that has higher network connectivity, but for the simple setup of transferring a large file from a single location to one accessor, the conclusion is that the main setting to tinker with is the concurrency.