---
title: "apps.mellanox.connectx: Packetblaster and Receive performance"
output: html_notebook
---

We’re comparing performance of the `apps.mellanox.connectx` driver between two
systems:

 - `AMD EPYC 7443P`, `Mellanox Technologies MT28800 Family [ConnectX-5 Ex]`
 - `Intel Xeon Silver 4116 @ 2.10GHz`, `Mellanox Technologies MT27800 Family [ConnectX-5]`

BIOS settings are unknown at this point.

The configuration is a single NIC with two 100G ports, with both ports wired to
each other. Packetblaster transmits packets on one port in both test cases.
In the *Receive performance* test case, packets are received on the other port.

The code under test can be found here:
[eugeneia/mellanox-benchmark](https://github.com/eugeneia/snabb/commits/mellanox-benchmark)

We run the benchmarks over a matrix of parameters, including:

- packet size (including 4-byte CRC)
- number of workers (CPU cores) provided to the application
- number of hardware send/receive queues per worker

The benchmark is run three times for every parameter configuration
and we show the minimum, maximum, and average of the results.
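
To make the shape of that matrix concrete, here is a minimal sketch that builds
one with `expand.grid`. The parameter values and the names `example_grid`,
`n_configs`, and `n_runs` are hypothetical and chosen only for illustration;
the values actually used in the benchmark runs are not listed in this notebook.

```{r}
# Hypothetical parameter values, for illustration only.
example_grid <- expand.grid(
  pktsize = c(64, 128, 256, 512, 1024, 1500),  # bytes, including 4-byte CRC
  workers = c(1, 2, 4, 6),                     # CPU cores given to the application
  queues  = c(1, 2, 4),                        # hardware queues per worker
  run     = 1:3                                # every configuration is run three times
)

# Count the distinct configurations and the total number of benchmark runs.
config_cols <- c("pktsize", "workers", "queues")
n_configs <- nrow(unique(example_grid[, config_cols]))
n_runs    <- nrow(example_grid)
```

With these made-up values there are 72 configurations and 216 runs. The
minimum, maximum, and average are then taken over the three runs of each
configuration.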

We also overlay the 100G line rate (grey dashed line) to put the results into
perspective.

```{r}
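library(dplyr)    # %>%, mutate, bind_rows, group_by, summarise, filter
library(ggplot2)  # plotting

# Convert a packet rate in Mpps to Gbps on the wire: each packet occupies
# pktsize bytes plus 12 (Ethernet inter-frame gap) and 8 (preamble/SFD).
Gbps <- function(mpps, pktsize) {
  mpps * (12 + 8 + pktsize) * 8 / 1000
}

# Packets per second at line rate on a link of G Gbps.
Linerate <- function(G, pktsize) {
  G * 1e9 / ((12 + 8 + pktsize) * 8)
}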
```
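
As a sanity check on the two helpers above, the following sketch works through
the arithmetic for a 64-byte packet on a 100G link. It assumes that the `12`
and `8` in the formulas are the Ethernet inter-frame gap and the preamble plus
start-of-frame delimiter, in bytes. The names `wire_bits`, `example_pktsize`,
`linerate_mpps`, and `linerate_gbps` are hypothetical and not part of the
benchmark code.

```{r}
# Hypothetical helper: bits occupied on the wire by one packet of `pktsize`
# bytes (CRC included), plus 12 bytes inter-frame gap and 8 bytes preamble/SFD.
wire_bits <- function(pktsize) (12 + 8 + pktsize) * 8

example_pktsize <- 64

# Same arithmetic as Linerate(100, 64) / 1e6: packets per second, scaled to Mpps.
linerate_mpps <- 100 * 1e9 / wire_bits(example_pktsize) / 1e6

# Same arithmetic as Gbps(linerate_mpps, 64): converting back gives the link speed.
linerate_gbps <- linerate_mpps * wire_bits(example_pktsize) / 1000

round(linerate_mpps, 2)  # approximately 148.81 Mpps
linerate_gbps            # 100, up to floating-point rounding
```

A 64-byte packet occupies 84 bytes (672 bits) on the wire, so small packets
need a far higher packet rate to reach line rate than large ones. This is why
the grey line falls steeply as packet size increases.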

# Packetblaster performance

Packetblaster is an optimized TX routine for our ConnectX driver.
It should demonstrate the maximum transmit rate supported by the NIC.

This should reproduce the results in [ConnectX: Review N*SQ 64B transmit performance mellanox (Rev 2)](https://github.com/snabbco/snabb/issues/1007)
from 2016. It doesn’t quite, though, and the code is not the same. We still have
to investigate what accounts for the difference.

```{r}
packetblaster <- (mellanox.tx.only.queues.sizes.intel.100e6.coarse %>%
           mutate(system="Intel Xeon Silver 4116 @ 2.10GHz")) %>%
  bind_rows((mellanox.tx.only.queues.sizes.epyc.100e6.coarse %>%
               mutate(system="AMD EPYC 7443P"))) %>%
  mutate(workers=sprintf("%d workers (cores)", workers),
         queues=sprintf("%d queues", queues)) %>%
  group_by(system, workers, queues, pktsize) %>% 
  summarise(min_mpps=min(rate), avg_mpps=mean(rate), max_mpps=max(rate),
            min_loss=min(drop+error), avg_loss=mean(drop+error), max_loss=max(drop+error)) %>%
  ungroup() %>%
  mutate(Gbps=Gbps(max_mpps-max_loss, pktsize))
```

```{r fig.height=5, fig.width=10}
ggplot(packetblaster, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) + 
  geom_line(aes(y=max_mpps, linetype="0_tx")) +
  geom_line(aes(y=max_loss, linetype="1_loss")) + 
  geom_point(aes(y=avg_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(packetblaster$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: Packetblaster rate MPPS (TX only)")
```

# Receive performance

Packetblaster transmits on one port, and packets are received on the other port.
Each side has a dedicated CPU core for each worker.
That is, "6 workers" means six cores are used for transmit
and six distinct cores are used for receive.

```{r}
txrx <- (mellanox.tx.rx.queues.sizes.intel.100e6.coarse %>%
          mutate(system="Intel Xeon Silver 4116 @ 2.10GHz")) %>%
  bind_rows((mellanox.tx.rx.queues.sizes.epyc.100e6.coarse %>%
               mutate(system="AMD EPYC 7443P") %>% na.omit())) %>%
  mutate(workers=sprintf("%d workers (cores)", workers),
         queues=sprintf("%d queues", queues)) %>%
  mutate(rx_mpps=rxrate-(rxdrop+rxerror)) %>%
  group_by(system, workers, queues, pktsize) %>% 
  summarise(min_mpps=min(txrate),
            avg_mpps=mean(txrate),
            max_mpps=max(txrate),
            min_rx_mpps=(min(rx_mpps)),
            avg_rx_mpps=(mean(rx_mpps)),
            max_rx_mpps=(max(rx_mpps))) %>%
  ungroup() %>%
  mutate(rxGbps=Gbps(max_rx_mpps, pktsize), Gbps=Gbps(max_mpps, pktsize))
```


```{r fig.height=5, fig.width=10}
ggplot(txrx, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) +
  geom_line(aes(y=max_rx_mpps, linetype="0_rx")) +
  geom_line(aes(y=max_mpps, linetype="1_tx")) + 
  geom_point(aes(y=avg_rx_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_rx_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_rx_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(txrx$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: RX rate of combined receive queues in MPPS")
```


# Zoom into results on EPYC

## Packetblaster

```{r fig.height=5, fig.width=10}
packetblaster_epyc <- filter(packetblaster, system=="AMD EPYC 7443P")
ggplot(packetblaster_epyc, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) + 
  geom_line(aes(y=max_mpps, linetype="0_tx")) +
  geom_line(aes(y=max_loss, linetype="1_loss")) + 
  geom_point(aes(y=avg_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(packetblaster_epyc$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: Packetblaster rate MPPS (TX only)")
```

## Receive

```{r fig.height=5, fig.width=10}
txrx_epyc <- filter(txrx, system=="AMD EPYC 7443P")
ggplot(txrx_epyc, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) +
  geom_line(aes(y=max_rx_mpps, linetype="0_rx")) +
  geom_line(aes(y=max_mpps, linetype="1_tx")) + 
  geom_point(aes(y=avg_rx_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_rx_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_rx_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(txrx_epyc$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: RX rate of combined receive queues in MPPS")

```


# RX Drops

```{r fig.height=5, fig.width=10}
ggplot(txrx, aes(x=pktsize, color=queues)) +
  facet_grid(system ~ workers) +
  geom_line(aes(y=max_mpps-max_rx_mpps, linetype="0_drop")) +
  geom_line(aes(y=max_mpps, linetype="1_tx")) + 
  geom_point(aes(y=avg_mpps-avg_rx_mpps, shape="avg"), alpha=0.5) +
  geom_point(aes(y=max_mpps-max_rx_mpps, shape="max"), alpha=0.5) +
  geom_point(aes(y=min_mpps-min_rx_mpps, shape="min"), alpha=0.5) +
  geom_line(aes(y=Linerate(100, pktsize)/1e6, linetype="2_linerate"), color='grey') +
  coord_cartesian(ylim=c(NA, max(txrx$max_mpps))) +
  ggtitle("Multi core performance by number of queues per worker and packet size",
          subtitle="apps.mellanox.connectx: Loss rate of combined receive queues in MPPS")
```
