We’re comparing the performance of the apps.mellanox.connectx driver between two systems: an Intel Xeon Silver 4116 @ 2.10GHz and an AMD EPYC 7443P.

BIOS settings are unknown at this point.

The configuration is one 2x100G port NIC, with both ports wired to each other. Packetblaster transmits packets on one port in both test cases. In the Receive performance test case, packets are received on the other port.

The code under test can be found here: eugeneia/mellanox-benchmark

We run the benchmarks on a matrix of parameters, including the number of workers (cores), the number of queues per worker, and the packet size.

The benchmark is run three times for every parameter configuration and we show the minimum, maximum, and average of the results.
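As a rough sketch of this procedure, the following builds a small parameter matrix, fakes three runs per configuration, and reduces them to minimum, average, and maximum. The parameter values and the `rate` numbers are made up for illustration and are not measurements from these benchmarks. The column names mirror the ones used in the analysis code further down.

```r
library(dplyr)

# Hypothetical parameter matrix (values chosen for illustration only)
params <- expand.grid(workers = c(1, 2), queues = c(1, 4), pktsize = c(64, 1500))

# Three runs per configuration; `rate` is a synthetic placeholder, not measured data
runs <- params[rep(seq_len(nrow(params)), each = 3), ]
runs$run <- rep(1:3, times = nrow(params))
runs$rate <- runs$workers * 10 + runs$run  # fake Mpps values

# Collapse the three runs of each configuration into min/avg/max
summary_tbl <- runs %>%
  group_by(workers, queues, pktsize) %>%
  summarise(min_mpps = min(rate), avg_mpps = mean(rate), max_mpps = max(rate),
            .groups = "drop")

print(summary_tbl)
```

Passing `.groups = "drop"` returns an ungrouped table directly, so no separate `ungroup()` step is needed in this sketch.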

We also overlay 100G linerate (grey dashed line) to put the results into perspective.

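# Convert a packet rate in Mpps to Gbps for a given packet size in bytes.
# Every packet is counted with 12+8 bytes of additional per-packet overhead.
Gbps <- function(mpps, pktsize) {
  mpps*(12+8+pktsize)*8/1000
}

# Packet rate (packets per second) at a link speed of G Gbps for a given packet size.
Linerate <- function(G, pktsize) {
  G*1e9 / ((12+8+pktsize)*8)
}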
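To make the line rate concrete, here is a small worked example that evaluates the two helpers for a few packet sizes. The list of sizes is an arbitrary choice for illustration. The constants 12 and 8 presumably stand for the Ethernet inter-frame gap and the preamble plus start-of-frame delimiter, though the notebook does not say so explicitly. The helpers are repeated so that the snippet runs on its own.

```r
# Same helpers as in the notebook, repeated so this snippet is self-contained
Gbps <- function(mpps, pktsize) mpps * (12 + 8 + pktsize) * 8 / 1000
Linerate <- function(G, pktsize) G * 1e9 / ((12 + 8 + pktsize) * 8)

# Illustrative packet sizes in bytes (not necessarily the sizes benchmarked)
sizes <- c(64, 128, 256, 512, 1024, 1500)

# 100G line rate in Mpps for each size
linerate_mpps <- Linerate(100, sizes) / 1e6
print(data.frame(pktsize = sizes, linerate_mpps = round(linerate_mpps, 2)))

# Round trip: converting the line rate back to Gbps should give 100 for every size
print(Gbps(linerate_mpps, sizes))
```

For 64-byte packets this works out to 100e9 / (84 * 8), or roughly 148.81 Mpps, which is the upper bound the grey dashed line marks at the smallest packet size.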

Packetblaster performance

Packetblaster is an optimized TX routine for our ConnectX driver. It should demonstrate the maximum transmit rate supported by the NIC.

This should reproduce the results in ConnectX: Review N*SQ 64B transmit performance mellanox (Rev 2) from 2016. It doesn’t quite, though, and it is not the same code. We still have to investigate where the difference comes from.

packetblaster <- (mellanox.tx.only.queues.sizes.intel.100e6.coarse %>%
           mutate(system="Intel Xeon Silver 4116 @ 2.10GHz")) %>%
  bind_rows((mellanox.tx.only.queues.sizes.epyc.100e6.coarse %>%
               mutate(system="AMD EPYC 7443P"))) %>%
  mutate(workers=sprintf("%d workers (cores)", workers),
         queues=sprintf("%d queues", queues)) %>%
  group_by(system, workers, queues, pktsize) %>% 
  summarise(min_mpps=min(rate), avg_mpps=mean(rate), max_mpps=max(rate),
            min_loss=(min(drop+error)), avg_loss=(mean(drop+error)), max_loss=(max(drop+error))) %>%
  ungroup() %>%
  mutate(Gbps=Gbps(max_mpps-max_loss, pktsize))
`summarise()` has grouped output by 'system', 'workers', 'queues'. You can override using the `.groups` argument.
ggplot(packetblaster, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) + 
  geom_line(aes(y=max_mpps, linetype="0_tx")) +
  geom_line(aes(y=max_loss, linetype="1_loss")) + 
  geom_point(aes(y=avg_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(packetblaster$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: Packetblaster rate MPPS (TX only)")

Receive performance

Packetblaster transmits on one port, and packets are received on the other port. Each side has a dedicated CPU core for each worker. That is, “6 workers” means six cores are used for transmit and six distinct cores are used for receive.

txrx <- (mellanox.tx.rx.queues.sizes.intel.100e6.coarse %>%
          mutate(system="Intel Xeon Silver 4116 @ 2.10GHz")) %>%
  bind_rows((mellanox.tx.rx.queues.sizes.epyc.100e6.coarse %>%
               mutate(system="AMD EPYC 7443P") %>% na.omit())) %>%
  mutate(workers=sprintf("%d workers (cores)", workers),
         queues=sprintf("%d queues", queues)) %>%
  mutate(rx_mpps=rxrate-(rxdrop+rxerror)) %>%
  group_by(system, workers, queues, pktsize) %>% 
  summarise(min_mpps=min(txrate),
            avg_mpps=mean(txrate),
            max_mpps=max(txrate),
            min_rx_mpps=(min(rx_mpps)),
            avg_rx_mpps=(mean(rx_mpps)),
            max_rx_mpps=(max(rx_mpps))) %>%
  ungroup() %>%
  mutate(rxGbps=Gbps(max_rx_mpps, pktsize), Gbps=Gbps(max_mpps, pktsize))
`summarise()` has grouped output by 'system', 'workers', 'queues'. You can override using the `.groups` argument.
ggplot(txrx, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) +
  geom_line(aes(y=max_rx_mpps, linetype="0_rx")) +
  geom_line(aes(y=max_mpps, linetype="1_tx")) + 
  geom_point(aes(y=avg_rx_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_rx_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_rx_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(txrx$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: RX rate of combined receive queues in MPPS")

Zoom into results on EPYC

Packetblaster

packetblaster_epyc <- filter(packetblaster, system=="AMD EPYC 7443P")
ggplot(packetblaster_epyc, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) + 
  geom_line(aes(y=max_mpps, linetype="0_tx")) +
  geom_line(aes(y=max_loss, linetype="1_loss")) + 
  geom_point(aes(y=avg_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(packetblaster_epyc$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: Packetblaster rate MPPS (TX only)")

Receive

txrx_epyc <- filter(txrx, system=="AMD EPYC 7443P")
ggplot(txrx_epyc, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) +
  geom_line(aes(y=max_rx_mpps, linetype="0_rx")) +
  geom_line(aes(y=max_mpps, linetype="1_tx")) + 
  geom_point(aes(y=avg_rx_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_rx_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_rx_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(txrx_epyc$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: RX rate of combined receive queues in MPPS")

RX Drops

ggplot(txrx, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) +
  geom_line(aes(y=max_mpps-max_rx_mpps, linetype="0_drop")) +
  geom_line(aes(y=max_mpps, linetype="1_tx")) + 
  geom_point(aes(y=avg_mpps-avg_rx_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_mpps-max_rx_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_mpps-min_rx_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(txrx$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: Loss rate of combined receive queues in MPPS")
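The loss curves are the difference between the transmit rate and the effective receive rate, where the effective receive rate is `rxrate` minus dropped and errored packets. A minimal sketch of that arithmetic for a single run follows. The counter values are invented for illustration, and `loss_mpps` and `loss_pct` are names introduced here, not columns of the benchmark data.

```r
# Hypothetical counters for one run, all in Mpps (made-up numbers)
run <- data.frame(txrate = 40, rxrate = 38, rxdrop = 1.5, rxerror = 0.5)

# Effective receive rate, following the rx_mpps definition in the analysis code
run$rx_mpps <- run$rxrate - (run$rxdrop + run$rxerror)

# Packets transmitted but not successfully received
run$loss_mpps <- run$txrate - run$rx_mpps

# The same loss expressed as a percentage of the transmit rate
run$loss_pct <- 100 * run$loss_mpps / run$txrate

print(run)
```

Expressing loss as a percentage of the transmit rate can make panels with very different transmit rates easier to compare than the absolute Mpps difference.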


