Introduction

The purpose of this test report is to investigate the transmit performance of ConnectX-4 when using a single Send Queue. The results can be used to guide software design, for example to evaluate whether a single Send Queue is suitable for a given application.

These results have not yet been reviewed by other parties. They may or may not be consistent with expectations.

Test setup

Fixed factors

  • Snabb “packetblaster” software optimized to prevent CPU bottlenecks (prototype with ConnectX-4 support).
  • Mellanox ConnectX-4 100G single-port Ethernet card (PSID: MT_2180110032).
  • Firmware version 12.16.1020.
  • 1 x Send Queue.
  • Send Work Queue Entries (WQEs) are always 64 bytes:
    • 16B Control Segment.
    • 32B Ethernet Segment (16 payload bytes inline).
    • 16B Send Data Segment (remaining payload bytes on DMA gather).
  • Physical addresses (“rlkey”) used for all Send WQEs.
  • Completion event requested once every 512 packets.
  • Single entry (“collapsed”) completion queue.
  • Packets are transmitted continuously for approximately one second.
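
The WQE layout and completion cadence listed above can be sketched as simple arithmetic. This is only an illustration, not the packetblaster code: the names are hypothetical, it assumes the 16 inline bytes are the first 16 bytes of the packet, and it assumes the completion request falls on the last WQE of each group of 512.

```python
# Segment sizes taken from the "Fixed factors" list.
CONTROL_SEGMENT_BYTES = 16
ETHERNET_SEGMENT_BYTES = 32   # includes 16 inline packet bytes
DATA_SEGMENT_BYTES = 16       # pointer to the rest of the packet (DMA gather)
INLINE_BYTES = 16

WQE_BYTES = CONTROL_SEGMENT_BYTES + ETHERNET_SEGMENT_BYTES + DATA_SEGMENT_BYTES


def split_packet(packet_size):
    """Return (inline_bytes, gather_bytes) for one packet.

    Assumption: the first INLINE_BYTES bytes go inline and the
    remainder is fetched by DMA gather.
    """
    if packet_size < INLINE_BYTES:
        raise ValueError("packet smaller than the inline portion")
    return INLINE_BYTES, packet_size - INLINE_BYTES


def requests_completion(packet_index, interval=512):
    """True if the WQE for this 0-based packet index requests a completion.

    Assumption: the request is placed on the last WQE of each interval.
    """
    return (packet_index + 1) % interval == 0
```

For example, under these assumptions a 60-byte packet would carry 16 bytes inline and 44 bytes via the gather pointer, and the WQE itself stays at 64 bytes regardless of packet size.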

The benchmark result is taken from the hardware (“vport”) counter for sent packets.
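
A packet rate can be derived from two readings of such a counter and the elapsed time between them. The sketch below shows only that arithmetic. How the counter is actually read is not covered in this report, so the function and its parameters are hypothetical.

```python
def packet_rate_mpps(count_before, count_after, elapsed_seconds):
    """Packets per second, in millions, from two counter readings.

    Hypothetical helper: the counter values would come from the
    hardware sent-packets counter, sampled before and after the run.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (count_after - count_before) / elapsed_seconds / 1e6
```

Dividing by the measured elapsed time rather than a nominal one second matters here, since the run length is only approximately one second.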

Variable factors

  • QueueSize (number of Work Queue Entries for the Send Queue).
  • PacketSize (number of bytes of payload per packet).

Results

Packet rate results with Y-scale chosen to “zoom in” on the data:

Now with the Y-scale chosen to show the full theoretical range of 0 to 148.8 Mpps for 100G Ethernet (148.8 Mpps is the line rate for minimum-size 64-byte frames):
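
The 148.8 Mpps figure follows from standard Ethernet framing: each frame on the wire is preceded by 8 bytes of preamble and start-of-frame delimiter and followed by a 12-byte inter-frame gap. A minimal sketch of the calculation, with hypothetical names, taking the frame size including the 4-byte CRC:

```python
LINE_RATE_BPS = 100e9        # 100G Ethernet
PREAMBLE_SFD_BYTES = 8       # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12    # minimum gap between frames


def max_packet_rate_mpps(frame_bytes_with_crc):
    """Theoretical maximum packet rate in Mpps for a given frame size.

    frame_bytes_with_crc is the Ethernet frame size including the CRC,
    e.g. 64 for a minimum-size frame.
    """
    wire_bytes = frame_bytes_with_crc + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
    return LINE_RATE_BPS / (wire_bytes * 8) / 1e6
```

For a 64-byte frame this gives 100e9 / (84 * 8), or about 148.8 Mpps. Larger frames lower the ceiling: a 1518-byte frame yields roughly 8.1 Mpps.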

Data transfer rate in gigabytes per second. The number is based on the amount of data in the Send Work Queue Entries. This includes the Ethernet header and packet payload but excludes the CRC and other Layer-2/Layer-1 overhead.
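
The conversion from packet rate to this throughput figure can be sketched as follows. The function name is hypothetical, and it assumes decimal gigabytes (10^9 bytes), which the report does not state explicitly.

```python
def gigabytes_per_second(packets_per_second, bytes_per_packet):
    """Throughput in GB/s from a packet rate.

    bytes_per_packet counts the Ethernet header and payload only,
    excluding the CRC and Layer-1 overhead, as described above.
    Assumption: 1 GB = 10**9 bytes.
    """
    return packets_per_second * bytes_per_packet / 1e9
```

As a purely illustrative input (not a measured result), 10 million packets per second at 100 bytes each would correspond to 1.0 GB/s.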

Summary

Overall, these single-queue results are far below the maximum capacity of 100G Ethernet. We need to evaluate other benchmark setups, such as using multiple Send Queues.