library(ggplot2)
library(dplyr)
We compare the performance of the apps.mellanox.connectx driver on two systems:
- AMD EPYC 7443P, Mellanox Technologies MT28800 Family [ConnectX-5 Ex] (PCIe Gen4, Speed 16GT/s, Width x16)
- Intel Xeon Silver 4116 @ 2.10GHz, Mellanox Technologies MT27800 Family [ConnectX-5] (PCIe Gen3, Speed 8GT/s, Width x16)
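For context, the raw per-direction bandwidth of the two PCIe links can be worked out from the listed speed and width. This is a sketch that assumes 128b/130b line encoding (used by PCIe Gen3 and Gen4) and ignores transaction- and data-link-layer protocol overhead, so the usable bandwidth is lower than these figures. The helper name `pcie_raw_gbps()` is hypothetical.

```r
# Hypothetical helper: raw per-direction PCIe bandwidth in Gbit/s.
# Only 128b/130b line encoding is accounted for; TLP/DLLP protocol
# overhead is ignored, so usable bandwidth is lower than this.
pcie_raw_gbps <- function(gt_per_s, lanes, encoding = 128 / 130) {
  gt_per_s * lanes * encoding
}

gen3_x16 <- pcie_raw_gbps(8, 16)   # Intel Xeon Silver 4116 system
gen4_x16 <- pcie_raw_gbps(16, 16)  # AMD EPYC 7443P system

round(c(gen3 = gen3_x16, gen4 = gen4_x16), 1)  # roughly 126 and 252 Gbit/s
```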
On the AMD EPYC 7443P, where the NIC is on IO bus 81, we also compare two BIOS settings:
- Preferred IO Bus->81
- Preferred IO->Auto
See https://www.supermicro.com/support/faqs/faq.cfm?faq=33731 (Preferred IO Device). The first setting is configured as follows:
- Advanced->NB Configuration->Preferred IO->Manual
- Advanced->NB Configuration->Preferred IO Bus->81
The setup is a single 2x100G NIC with its two ports cabled to each other. In both test cases, Packetblaster transmits packets on one port. In the receive performance test, packets are received on the other port.
The code under test is in the eugeneia/mellanox-benchmark repository.
We run the benchmarks over a matrix of parameters, including:
- packet size (including 4-byte CRC)
- number of workers (CPU cores) provided to the application
- number of hardware send/receive queues per worker
Each parameter configuration is run three times, and we show the minimum, maximum, and average of the results.
We also overlay the 100G line rate (grey dashed line) to put the results into perspective.
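The reduction of three runs to minimum, average, and maximum follows the `group_by()`/`summarise()` pattern used in the chunks below. A small self-contained sketch, using made-up rates for a single hypothetical configuration (these are not measured data):

```r
library(dplyr)

# Hypothetical toy data: three runs of one configuration with made-up
# TX rates in Mpps, used only to illustrate the aggregation.
runs <- data.frame(
  workers = 2L, queues = 1L, pktsize = 64L,
  run = 1:3,
  txrate = c(10, 12, 14)
)

# One row per configuration: min, mean, and max over the runs.
summary_runs <- runs %>%
  group_by(workers, queues, pktsize) %>%
  summarise(min_mpps = min(txrate),
            avg_mpps = mean(txrate),
            max_mpps = max(txrate),
            .groups = "drop")

summary_runs
```

Passing `.groups = "drop"` returns an ungrouped result directly, which avoids the need for a separate `ungroup()`.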
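Gbps <- function(mpps, pktsize) {
  # Convert a packet rate in Mpps to Gbps on the wire. Besides pktsize
  # (which includes the CRC), each packet occupies 20 more bytes: a 12-byte
  # inter-frame gap plus an 8-byte preamble/start-of-frame delimiter.
  mpps*(12+8+pktsize)*8/1000
}
Linerate <- function(G, pktsize) {
  # Inverse of Gbps(): packets per second at G Gbps for the given packet size.
  G*1e9 / ((12+8+pktsize)*8)
}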
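As a sanity check of the two helpers, the 100G line rate works out to about 148.81 Mpps for 64-byte packets, and converting a line-rate packet rate back with `Gbps()` yields 100. The chunk repeats the helper definitions so it runs on its own.

```r
# Same helpers as above: 20 bytes of per-packet wire overhead
# (12-byte inter-frame gap + 8-byte preamble/SFD).
Gbps <- function(mpps, pktsize) mpps * (12 + 8 + pktsize) * 8 / 1000
Linerate <- function(G, pktsize) G * 1e9 / ((12 + 8 + pktsize) * 8)

sizes <- c(64, 256, 512, 1024)
mpps_at_100g <- Linerate(100, sizes) / 1e6
round(mpps_at_100g, 2)  # approx. 148.81, 45.29, 23.50, 11.97 Mpps

# Round trip: line rate in Mpps converts back to 100 Gbps for every size.
Gbps(mpps_at_100g, sizes)
```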
Receive performance
Packetblaster transmits on one port, and packets are received on the other port. Each side has a dedicated CPU core for each worker. For example, “6 workers” means six cores are used for transmit and six distinct cores are used for receive.
txrx <- (mellanox.tx.rx.queues.sizes.intel.100e6.coarse %>%
    mutate(system="Intel Xeon Silver 4116 @ 2.10GHz")) %>%
  # AMD EPYC 7443P with Preferred IO->Auto
  bind_rows((mellanox.tx.rx.queues.sizes.epyc.100e6.coarse %>%
    mutate(system="AMD EPYC 7443P") %>% na.omit())) %>%
  # AMD EPYC 7443P with Preferred IO Bus->81
  bind_rows((mellanox.tx.rx.queues.sizes.epyc.100e6.coarse.bios2 %>%
    mutate(system="AMD EPYC 7443P (Pref. IO bus 81)") %>% na.omit())) %>%
  # Human-readable facet labels
  mutate(workers=sprintf("%d workers (cores)", workers),
         queues=sprintf("%d queues", queues)) %>%
  # Net receive rate: rxrate minus drop and error rates
  mutate(rx_mpps=rxrate-(rxdrop+rxerror)) %>%
  # Aggregate the repeated runs of each configuration
  group_by(system, workers, queues, pktsize) %>%
  summarise(min_mpps=min(txrate),
            avg_mpps=mean(txrate),
            max_mpps=max(txrate),
            min_rx_mpps=min(rx_mpps),
            avg_rx_mpps=mean(rx_mpps),
            max_rx_mpps=max(rx_mpps),
            .groups="drop") %>%
  # Throughput of the best run in Gbps
  mutate(rxGbps=Gbps(max_rx_mpps, pktsize), Gbps=Gbps(max_mpps, pktsize))
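The delivered rate can also be expressed as a percentage of line rate. Because `Gbps()` already accounts for the per-packet wire overhead, this is simply `100 * Gbps(mpps, pktsize) / G`. A sketch with a hypothetical helper `pct_of_linerate()` and made-up counter values (not measurements):

```r
# Same conversion as Gbps() above (20 bytes of per-packet wire overhead).
Gbps <- function(mpps, pktsize) mpps * (12 + 8 + pktsize) * 8 / 1000

# Hypothetical helper, not part of the benchmark code: a packet rate in
# Mpps as a percentage of the G Gbps line rate for the given packet size.
pct_of_linerate <- function(mpps, pktsize, G = 100) {
  100 * Gbps(mpps, pktsize) / G
}

# Made-up values, combined the same way as the rx_mpps column.
rxrate <- 50; rxdrop <- 5; rxerror <- 0
rx_mpps <- rxrate - (rxdrop + rxerror)

pct_of_linerate(rx_mpps, 256)  # 45 Mpps at 256 bytes is about 99.36%
```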
ggplot(txrx, aes(x=pktsize, color=queues)) +
facet_grid(system ~ workers) +
geom_line(aes(y=max_rx_mpps, linetype="0_rx")) +
geom_line(aes(y=max_mpps, linetype="1_tx")) +
geom_point(aes(y=avg_rx_mpps, shape="avg"), alpha=0.5) +
geom_point(aes(y=max_rx_mpps, shape="max"), alpha=0.5) +
geom_point(aes(y=min_rx_mpps, shape="min"), alpha=0.5) +
geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
coord_cartesian(ylim=c(NA, max(txrx$max_mpps))) +
scale_x_continuous(breaks=c(64,256,512,512+256,1024)) +
scale_y_continuous(n.breaks = 10) +
ggtitle("Multi core performance by number of queues per worker and packet size",
subtitle="apps.mellanox.connectx: RX rate of combined receive queues in MPPS")

ggplot(txrx, aes(x=pktsize, color=queues)) +
facet_grid(system ~ workers) +
geom_line(aes(y=rxGbps, linetype="0_rx")) +
geom_line(aes(y=Gbps, linetype="1_tx")) +
geom_point(aes(y=rxGbps, shape="rx"), alpha=0.5) +
geom_point(aes(y=Gbps, shape="tx"), alpha=0.5) +
geom_line(aes(y=100, linetype="2_linerate"), color='grey') +
coord_cartesian(ylim=c(NA, max(txrx$rxGbps))) +
scale_x_continuous(breaks=c(64,256,512,512+256,1024)) +
scale_y_continuous(n.breaks = 10) +
ggtitle("Multi core performance by number of queues per worker and packet size",
subtitle="apps.mellanox.connectx: RX throughput of combined receive queues in Gbps")

Forwarding performance
Single node
Packetblaster transmits on one port; packets are received on the other port and forwarded back to the Packetblaster port with source and destination MAC addresses swapped. Each side has a dedicated CPU core for each worker. For example, “6 workers” means six cores are used for transmit and six distinct cores are used for receive/forward.
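The forwarding step swaps the source and destination MAC addresses before sending each packet back. A minimal sketch of that swap on a single frame, assuming the frame is held in an R raw vector; an Ethernet frame starts with the 6-byte destination MAC followed by the 6-byte source MAC. The function name `swap_macs()` and the addresses are hypothetical, and this only illustrates in R what the forwarding application does per packet.

```r
# Hypothetical helper: swap destination (bytes 1-6) and source (bytes 7-12)
# MAC addresses of an Ethernet frame stored as a raw vector.
swap_macs <- function(frame) {
  stopifnot(is.raw(frame), length(frame) >= 14)
  dst <- frame[1:6]
  frame[1:6] <- frame[7:12]
  frame[7:12] <- dst
  frame
}

# Hypothetical addresses, for illustration only.
frame <- as.raw(c(0x02, 0, 0, 0, 0, 0x01,   # destination MAC
                  0x02, 0, 0, 0, 0, 0x02,   # source MAC
                  0x08, 0x00))              # EtherType (IPv4)
swapped <- swap_macs(frame)
```

Applying the swap twice restores the original frame, which is why a forwarded packet returned to the sender carries the sender's own MAC as destination.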
txfwd <- (mellanox.tx.fwd.queues.sizes.intel.100e6 %>%
    mutate(system="Intel Xeon Silver 4116 @ 2.10GHz")) %>%
  # AMD EPYC 7443P with Preferred IO Bus->81
  bind_rows((mellanox.tx.fwd.queues.sizes.epyc.100e6.coarse.bios2 %>%
    mutate(system="AMD EPYC 7443P (Pref. IO bus 81)") %>% na.omit())) %>%
  # Human-readable facet labels
  mutate(workers=sprintf("%d workers (cores)", workers),
         queues=sprintf("%d queues", queues)) %>%
  # Net forwarding rate: fwrate minus drop and error rates
  mutate(fwd_mpps=fwrate-(fwdrop+fwerror)) %>%
  # Aggregate the repeated runs of each configuration
  group_by(system, workers, queues, pktsize) %>%
  summarise(min_mpps=min(txrate),
            avg_mpps=mean(txrate),
            max_mpps=max(txrate),
            min_fwd_mpps=min(fwd_mpps),
            avg_fwd_mpps=mean(fwd_mpps),
            max_fwd_mpps=max(fwd_mpps),
            .groups="drop") %>%
  # Throughput of the best run in Gbps
  mutate(fwdGbps=Gbps(max_fwd_mpps, pktsize), Gbps=Gbps(max_mpps, pktsize))
ggplot(txfwd, aes(x=pktsize, color=queues)) +
facet_grid(system ~ workers) +
geom_line(aes(y=max_fwd_mpps, linetype="0_fwd")) +
geom_line(aes(y=max_mpps, linetype="1_tx")) +
geom_point(aes(y=avg_fwd_mpps, shape="avg"), alpha=0.5) +
geom_point(aes(y=max_fwd_mpps, shape="max"), alpha=0.5) +
geom_point(aes(y=min_fwd_mpps, shape="min"), alpha=0.5) +
geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
coord_cartesian(ylim=c(NA, max(txfwd$max_fwd_mpps))) +
scale_x_continuous(breaks=c(64,256,512,512+256,1024)) +
scale_y_continuous(n.breaks = 10) +
ggtitle("Multi core performance by number of queues per worker and packet size",
subtitle="apps.mellanox.connectx: Forwarding rate of combined receive queues in MPPS")

ggplot(txfwd, aes(x=pktsize, color=queues)) +
facet_grid(system ~ workers) +
geom_line(aes(y=fwdGbps, linetype="0_fwd")) +
geom_line(aes(y=Gbps, linetype="1_tx")) +
geom_point(aes(y=fwdGbps, shape="fwd"), alpha=0.5) +
geom_point(aes(y=Gbps, shape="tx"), alpha=0.5) +
geom_line(aes(y=100, linetype="2_linerate"), color='grey') +
scale_x_continuous(breaks=c(64,256,512,512+256,1024)) +
scale_y_continuous(n.breaks = 10) +
ggtitle("Multi core performance by number of queues per worker and packet size",
subtitle="apps.mellanox.connectx: Forwarding throughput of combined receive queues in Gbps")

Forwarding between systems
Here we test forwarding performance between our two systems. Each system uses one port of its 2x100G ConnectX-5 card.
Note that the test traffic is generated by the Intel Xeon Silver 4116 @ 2.10GHz system (Mellanox Technologies MT27800 Family [ConnectX-5], PCIe Gen3, Speed 8GT/s, Width x16).
As such, the traffic cannot exceed the TX rate measured for that system in “Packet sizes”, which is overlaid as a dashed grey line (“txlimit”) in the plots below.
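Taken together with the 100G line rate, the achievable forwarding rate for each packet size is bounded by whichever of the two limits is lower. A sketch of that ceiling using `pmin()`; the `txlimit_mpps` values are hypothetical placeholders, not the measurements from “Packet sizes”.

```r
Linerate <- function(G, pktsize) G * 1e9 / ((12 + 8 + pktsize) * 8)

pktsize <- c(64, 512, 1024)
txlimit_mpps <- c(60, 20, 12.5)           # hypothetical generator TX limits
linerate_mpps <- Linerate(100, pktsize) / 1e6

# Element-wise upper bound on the forwarding rate in Mpps.
ceiling_mpps <- pmin(linerate_mpps, txlimit_mpps)
```

With these placeholder limits, the generator is the bottleneck at 64 and 512 bytes, while the line rate (about 11.97 Mpps) is the bottleneck at 1024 bytes.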
The traffic generator uses one worker/core with 16 transmit queues.
The EPYC system receives the generated test traffic and forwards it back to the traffic generator over the same port, using N workers (cores) with one queue pair each.
The test traffic is split across two MAC address pairs and two VLANs.
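For reference, each VLAN-tagged frame carries a 4-byte 802.1Q tag after the source MAC: the TPID 0x8100 followed by a 16-bit TCI made of a 3-bit priority (PCP), a 1-bit DEI, and a 12-bit VLAN ID. A sketch that builds the tag bytes; the helper `vlan_tag()` and the VLAN IDs 100 and 200 are hypothetical, since the IDs used in the benchmark are not stated here.

```r
# Hypothetical helper: 802.1Q tag bytes (TPID 0x8100 + 16-bit TCI),
# TCI = PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits).
vlan_tag <- function(vid, pcp = 0L, dei = 0L) {
  stopifnot(vid >= 0, vid <= 4095, pcp >= 0, pcp <= 7, dei %in% c(0, 1))
  tci <- bitwOr(bitwOr(bitwShiftL(pcp, 13L), bitwShiftL(dei, 12L)), vid)
  as.raw(c(0x81, 0x00, bitwShiftR(tci, 8L), bitwAnd(tci, 0xFFL)))
}

# Hypothetical VLAN IDs for the two VLANs.
tags <- lapply(c(100L, 200L), vlan_tag)
```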
fwd_b2b_macvlan <- tx.fwd.b2b.coarse.nofc.macvlan.fine.n3 %>%
  mutate(workers=as.factor(workers),
         queues=as.factor(queues)) %>%
  # Maximum forwarding rate over the repeated runs of each configuration
  group_by(pktsize, workers, queues) %>%
  summarise(fwrate=max(fwrate), .groups="drop") %>%
  # Attach the traffic generator's measured TX limit (maxrate) per packet size
  left_join(filter(packetblaster_sizes, system=="Intel Xeon Silver 4116 (pcie gen3)"),
            by="pktsize") %>%
  mutate(Gbps=Gbps(fwrate, pktsize), MaxGbps=Gbps(maxrate, pktsize))
ggplot(fwd_b2b_macvlan, aes(x=pktsize, color=workers)) +
geom_line(aes(y=fwrate)) +
geom_point(aes(y=fwrate)) +
geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="linerate"), color='grey') +
geom_line(aes(y=maxrate, linetype="txlimit"), color='grey') +
coord_cartesian(ylim=c(NA, max(fwd_b2b_macvlan$fwrate))) +
ggtitle("Forwarding performance between two servers",
subtitle="two macs, two vlans, rate in Mpps")

ggplot(fwd_b2b_macvlan, aes(x=pktsize, color=workers)) +
geom_line(aes(y=Gbps)) +
geom_point(aes(y=Gbps)) +
geom_line(aes(y=MaxGbps, linetype="txlimit"), color='grey') +
geom_line(aes(y=100, linetype="linerate"), color='grey') +
coord_cartesian(ylim=c(NA, max(fwd_b2b_macvlan$Gbps))) +
ggtitle("Forwarding performance between two servers",
subtitle="two macs, two vlans, rate in Gbps")
