Intro

We need passwordless ssh set up between hosts and devices.
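A quick way to verify that the passwordless setup really works is to attempt a non-interactive login to every address. This is only a sketch: the function name check_passwordless is made up for these notes and is not part of any tool.

```shell
#!/usr/bin/env bash
# check_passwordless (hypothetical helper): try a non-interactive ssh login
# to each address given as an argument and report the ones that fail.
check_passwordless() {
    local target failed=0
    for target in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true </dev/null; then
            echo "ok     $target"
        else
            echo "FAILED $target"
            failed=1
        fi
    done
    return "$failed"
}
```

It would be called with the addresses of interest, for example check_passwordless 192.168.101.1 192.168.101.2, and should be run from both a host and a device since logins are needed in both directions.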

Host IPs end in an odd number (192.168.101.1, 192.168.101.3 and so on). SmartNIC IPs end in an even number (192.168.101.2, 192.168.101.4 and so forth).
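The odd/even convention can be checked in a script by looking at the last octet. A minimal sketch, where role_of_ip is a name invented for illustration:

```shell
#!/usr/bin/env bash
# role_of_ip (hypothetical helper): print "host" for an odd last octet
# and "device" for an even one, following the addressing scheme above.
role_of_ip() {
    local last_octet="${1##*.}"   # strip everything up to the last dot
    if [ $((last_octet % 2)) -eq 1 ]; then
        echo host
    else
        echo device
    fi
}
```

For example, role_of_ip 192.168.101.3 prints host and role_of_ip 192.168.101.4 prints device.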

The default modules / installations of OpenMPI don’t work for us, presumably because the versions do not match exactly.

Installing OpenMPI

We use spack. We had problems using the same spack installation on a shared filesystem, so we check out spack separately on the host and on the device.

Host

git clone https://github.com/spack/spack.git spackhost
. spackhost/share/spack/setup-env.sh
spack install openmpi@4.1.1

We need to note the install directory of openmpi, which is printed at the very end of the spack installation process. Let’s say it is MPIPATHHOST. An example would be

/cosma5/data/durham/hschulz/spack/opt/spack/linux-centos7-zen2/gcc-9.3.0/openmpi-4.1.1-r2o654xbr5nhygk4dposzq4hi6fhztil

SmartNIC

git clone https://github.com/spack/spack.git spackdevice
. spackdevice/share/spack/setup-env.sh
spack install openmpi@4.1.1

We need to note the install directory of openmpi, which is printed at the very end of the spack installation process. Let’s say it is MPIPATHDEVICE. An example would be

/cosma/home/durham/hschulz/smartnic/spack/opt/spack/linux-centos7-graviton/gcc-4.8.5/openmpi-4.1.1-rielorb4zmxbme2r4yvrijvysfhws4ug
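If the install directory was not written down, spack can be asked for it later with spack location -i. The wrapper below is only a sketch: mpi_prefix is a hypothetical name, and it assumes the setup-env.sh of the matching checkout (spackhost or spackdevice) has been sourced in the current shell.

```shell
#!/usr/bin/env bash
# mpi_prefix (hypothetical helper): print the install prefix of the
# spack-built OpenMPI. Assumes the matching setup-env.sh was sourced.
mpi_prefix() {
    if ! command -v spack >/dev/null 2>&1; then
        echo "spack not found; source share/spack/setup-env.sh first" >&2
        return 1
    fi
    # -i prints the install prefix of the given spec.
    spack location -i openmpi@4.1.1
}
```

The printed path is what goes in place of MPIPATHHOST on the host and MPIPATHDEVICE on the device.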

Setup OpenMPI in bashrc

We need to explicitly set PATH and LD_LIBRARY_PATH to the respective directories on host and device. This works:

case "$HOSTNAME" in
    # DEVICE
    "bluefield101.pri.cosma7.alces.network" | "bluefield102.pri.cosma7.alces.network")
        export PATH=MPIPATHDEVICE/bin:$PATH
        export LD_LIBRARY_PATH=MPIPATHDEVICE/lib:$LD_LIBRARY_PATH
        ;;
    # HOST
    "b101.pri.cosma7.alces.network" | "b102.pri.cosma7.alces.network")
        export PATH=MPIPATHHOST/bin:$PATH
        export LD_LIBRARY_PATH=MPIPATHHOST/lib:$LD_LIBRARY_PATH
        ;;
esac

Any hint on how to do the hostname matching a bit more elegantly is highly welcome. For the time being we will have to live with an extension of the above that lists 16 hostnames each for device and host.
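One possible way to avoid listing all hostnames is to use glob patterns in the case statement. This is a sketch under the assumption that all nodes follow the naming seen in the two examples (bluefieldNNN for devices, bNNN for hosts); node_kind is a made-up helper name, and MPIPATHDEVICE / MPIPATHHOST are the same placeholders as above.

```shell
#!/usr/bin/env bash
# node_kind (hypothetical helper): classify a hostname as device or host.
node_kind() {
    case "$1" in
        # DEVICE: "bluefield" followed by a number
        bluefield[0-9]*.pri.cosma7.alces.network) echo device ;;
        # HOST: "b" followed directly by a digit, so bluefield never matches
        b[0-9]*.pri.cosma7.alces.network) echo host ;;
        *) echo unknown ;;
    esac
}

case "$(node_kind "${HOSTNAME:-}")" in
    device)
        export PATH="MPIPATHDEVICE/bin:$PATH"
        export LD_LIBRARY_PATH="MPIPATHDEVICE/lib:${LD_LIBRARY_PATH:-}"
        ;;
    host)
        export PATH="MPIPATHHOST/bin:$PATH"
        export LD_LIBRARY_PATH="MPIPATHHOST/lib:${LD_LIBRARY_PATH:-}"
        ;;
esac
```

Note that these patterns match every node with such a name, not only the 16 we care about, so they would need tightening if other b* machines require a different OpenMPI.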

Test application

We use this MPI hello world example, saved as mpi_hello_world.c.

Host

 mpic++ mpi_hello_world.c -IMPIPATHHOST/include -o hwhost

Device

 mpic++ mpi_hello_world.c -o hwdevice

Run with appfile

cat appfile
-host 192.168.101.1:40,192.168.101.3:40,192.168.101.5:40 -np 80 ./hwhost
-host 192.168.101.2:16,192.168.101.4:16,192.168.101.6:16 -np 32 ./hwdevice

mpirun --app appfile
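For more nodes the appfile gets tedious to write by hand. The sketch below generates it from the odd/even IP scheme; host_list and make_appfile are hypothetical names, the binaries are the ones built by the compile commands above, and the process counts are passed in explicitly because we have not settled on a rule for them.

```shell
#!/usr/bin/env bash
# host_list (hypothetical helper): print "ip:slots,ip:slots,..." where
#   $1 = last octet of the first address (1 for hosts, 2 for devices)
#   $2 = number of nodes
#   $3 = slots per node
host_list() {
    local octet="$1" count="$2" slots="$3" list="" i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}192.168.101.${octet}:${slots}"
        octet=$((octet + 2))   # hosts stay odd, devices stay even
        i=$((i + 1))
    done
    echo "$list"
}

# make_appfile (hypothetical helper): print an appfile for
#   $1 = number of host/device pairs
#   $2 = total ranks on hosts, $3 = total ranks on devices
make_appfile() {
    echo "-host $(host_list 1 "$1" 40) -np $2 ./hwhost"
    echo "-host $(host_list 2 "$1" 16) -np $3 ./hwdevice"
}
```

Usage would be something like make_appfile 3 80 32 > appfile, followed by mpirun --app appfile.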

Embedded mode or separated host mode

mlxconfig -d /dev/mst/mt41686_pciconf0 q |grep CPU

mst needs to be started with sudo first!

Embedded mode: the smartNIC sees the traffic from the host; this is meant for security.

We need this from Alastair:

sudo mst start 
mlxconfig -d /dev/mst/mt41686_pciconf0 q |grep CPU 

Expected output:

[root@thor031 ~]# mlxconfig -d /dev/mst/mt41686_pciconf0 q |grep CPU
INTERNAL_CPU_MODEL EMBEDDED_CPU(1)
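To use the mode in a script, the value can be pulled out of the query output. A small sketch, assuming the field name and its value are the first two whitespace-separated columns as in the expected output above; internal_cpu_model is a name invented here.

```shell
#!/usr/bin/env bash
# internal_cpu_model (hypothetical helper): read `mlxconfig ... q` output
# on stdin and print the value of the INTERNAL_CPU_MODEL field.
internal_cpu_model() {
    awk '$1 == "INTERNAL_CPU_MODEL" { print $2 }'
}
```

It would be used as mlxconfig -d /dev/mst/mt41686_pciconf0 q | internal_cpu_model, after sudo mst start.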