Overview

This report evaluates the BBCP utility as an option for fast file transfer on A*CRC HPC systems. In particular, we compare the performance of BBCP with that of SCP and tune the parameters used for file transfer.

BBCP is a fast, free, and easy-to-use tool for moving large data sets from node to node. It can split a transfer into multiple simultaneous streams.

More information on BBCP can be found on the BBCP homepage.

Figure 1: Running BBCP across two compute nodes of Fuji (Fuji380 and Fuji382)

BBCP vs SCP

Here we compare the performance of BBCP vs SCP. Node-to-node tests were conducted for different settings (Figure 2):

(1) 1GE w FW AF: 1Gb Ethernet with firewall (Aurora - Fuji)
(2) 1GE w FW AA: 1Gb Ethernet with firewall (Aurora - Axle)
(3) 1GE w/o FW: 1Gb Ethernet without firewall (Fuji internal)
(4) 10GigE (IB QDR): 10Gb Ethernet without firewall (Fuji internal)

Note that in (1), (2), and (3) BBCP and SCP ran over TCP on Ethernet, whereas in (4) they ran via IPoIB (IP over InfiniBand).

In all settings, a TCP window size of 8MB and 2 parallel streams were used to transfer a 2GB dummy file.
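The report does not say how the dummy file was created. One common approach is sketched below, assuming GNU dd on Linux. The helper name make_dummy_file is hypothetical; the file name in the usage comment is taken from the commands in the Appendix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a zero-filled test file of a given size.
# $1 = output path, $2 = size in MiB (assumes GNU dd, which accepts bs=1M)
make_dummy_file() {
    dd if=/dev/zero of="$1" bs=1M count="$2" 2>/dev/null
}

# Usage (not executed here): a 2GB file in shared memory, so that disk
# reads do not limit the transfer.
#   make_dummy_file /dev/shm/banana.caramel 2048
```

Placing the file in /dev/shm, as the Appendix commands do, keeps the source in memory during the test.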

Figure 2: Comparison of the performance of BBCP against SCP for four different settings.

As expected, performance without a firewall is higher for both BBCP and SCP. However, BBCP appears to make better use of the link rate when a firewall is present. Moreover, there are more hops on the [Aurora - Axle] route than on [Aurora - Fuji], which explains the better performance of the latter (see the traceroute output in the Appendix).

Most importantly, on IB QDR, BBCP transferred the file much faster than SCP, at a rate of 1300 MB/s * 8 = 10400 Mb/s (or ~10Gb/s). This is the pure transfer rate, not the effective rate, so overhead due to disk I/O is excluded. As we shall see later, BBCP can be made faster still by tuning its parameters. Note that IB QDR has a link rate of 4 * 10Gb/s = 40Gb/s.
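The unit conversion above can be checked with a short shell sketch. The function name mbytes_to_mbits is illustrative and not part of BBCP; the figures are the ones quoted in this report.

```shell
#!/usr/bin/env bash
# Convert a rate in megabytes per second to megabits per second (1 byte = 8 bits).
mbytes_to_mbits() {
    echo $(( $1 * 8 ))
}

rate_mbits=$(mbytes_to_mbits 1300)
echo "1300 MB/s = ${rate_mbits} Mb/s"

# Share of the 40 Gb/s (40000 Mb/s) IB QDR link rate, as an integer percentage.
echo "Share of IB QDR link rate: $(( rate_mbits * 100 / 40000 ))%"
```

With two streams, the measured rate is therefore only about a quarter of the nominal IB QDR link rate, which leaves room for the tuning described in the next section.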

Parameter Tuning

Two important parameters to tune are:

TCP window size
Number of parallel streams

We tested five different TCP window sizes, from 2MB up to 32MB. For each window size, we also tested five different numbers of streams, from 2 up to 32. Each measurement is the average of ten trials, to reduce the effect of external factors (e.g. intermittent network conditions). The full results can be found in Figure 3 below. The best performance was attained with 4 streams and a 4MB window size.
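A sweep of this kind can be sketched as a dry run that prints one bbcp command per combination. The exact values are an assumption: the report only states that five window sizes and five stream counts between 2 and 32 were tested, and doubling (2, 4, 8, 16, 32) is assumed here. The -w and -s options are the ones used in the Appendix commands; the function name print_sweep and the destination desthost:/dev/shm are placeholders.

```shell
#!/usr/bin/env bash
# Assumed test grid: five window sizes and five stream counts, doubling from 2 to 32.
WINDOWS="2M 4M 8M 16M 32M"
STREAMS="2 4 8 16 32"

# Print (do not run) one bbcp command per window/stream combination.
# $1 = source file, $2 = destination (host:path)
print_sweep() {
    local src=$1 dst=$2 w s
    for w in $WINDOWS; do
        for s in $STREAMS; do
            echo "bbcp -w $w -s $s $src $dst"
        done
    done
}

# Placeholder destination; substitute the real target node.
print_sweep banana.caramel desthost:/dev/shm
```

To follow the method described above, each printed command would be run ten times and the measured rates averaged.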

Figure 3: Transfer rate of BBCP for various TCP window sizes and numbers of parallel streams.

With 4 streams and a 4MB window, the transfer rate rose to ~2GB/s (16Gb/s), about 1.5x faster than with 2 streams. This suggests that BBCP was not limited to 10Gb/s and could draw on the full link rate of IB QDR (40Gb/s), even though it was run via IPoIB.
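The speed-up figure can be reproduced with a small sketch. It assumes the 2-stream baseline is the 1300 MB/s measured in the BBCP vs SCP section; the helper name ratio is illustrative.

```shell
#!/usr/bin/env bash
# Print the ratio of two numbers to two decimal places (awk handles the
# floating-point division that plain shell arithmetic cannot).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Speed-up of the tuned run (~2000 MB/s) over the assumed 2-stream baseline (1300 MB/s).
ratio 2000 1300

# Fraction of the 40 Gb/s IB QDR link rate reached at 16 Gb/s.
ratio 16 40
```

The first ratio matches the roughly 1.5x improvement quoted above. The second shows that the tuned transfer exceeds 10Gb/s while remaining well below the nominal 40Gb/s.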

Conclusion

BBCP can transfer large files much faster than SCP. In our tests, a transfer rate of ~16Gb/s was achieved with 4 parallel streams (4MB TCP window size) between two internal compute nodes without a firewall.

Appendix

#Traceroute from Aurora to Fuji
kevins@aurora:/dev/shm> /usr/sbin/traceroute fuji
traceroute to fuji (202.83.248.75), 30 hops max, 40 byte packets using UDP
 1  * fuji.acrc.a-star.edu.sg (202.83.248.75)(H!)  5.210 ms (H!)  6.331 ms
 

#Traceroute from Aurora to Axle
kevins@aurora:/dev/shm> /usr/sbin/traceroute axle
traceroute to axle (123.136.66.111), 30 hops max, 40 byte packets using UDP
 1  202.83.248.1 (202.83.248.1)  0.659 ms   0.720 ms   0.670 ms
 2  10.217.175.227 (10.217.175.227)  0.896 ms   0.820 ms   0.835 ms
 3  10.217.175.242 (10.217.175.242)  1.950 ms   2.325 ms   1.529 ms
 4  10.217.175.210 (10.217.175.210)  3.819 ms   3.381 ms   2.469 ms
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *


#BBCP from Aurora to Axle
for i in `seq 1 10`; do /home01/acrc/kevins/scratch/bbcp/bin/amd64_linux/bbcp -z -V -w 8M -s 2 -T 'ssh -x -a -oFallBackToRsh=no %I -l %U %H /home01/acrc/kevins/scratch/bbcp' banana.caramel axle:/dev/shm &> prog$i.txt; ssh axle 'rm /dev/shm/banana.caramel'; done

#SCP from Aurora to Axle
for i in `seq 1 10`; do scp banana.caramel axle:/dev/shm; done

Fast Data Transfer with BBCP on A*CRC HPC Systems

Kevin Siswandi

Hiew Ngee Heng, Paul

20 July 2015
