We need passwordless ssh set up between hosts and devices.
Host IPs end in an odd number: 192.168.101.1, 192.168.101.3 and so on. The smartNIC IPs end in an even number: 192.168.101.2, 192.168.101.4 and so forth.
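A quick way to check the passwordless ssh setup is to loop over the addresses and try a non-interactive login on each. The following is only a sketch: the function names node_ips, node_kind_of_ip and check_ssh are made up for illustration, and it assumes the nodes are numbered consecutively starting at 192.168.101.1.

```shell
# Sketch only: node_ips, node_kind_of_ip and check_ssh are hypothetical names.
# Assumes consecutive addresses 192.168.101.1 .. 192.168.101.N,
# odd last octet = host, even last octet = smartNIC.

# Print the first N addresses, one per line.
node_ips() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "192.168.101.$i"
        i=$((i + 1))
    done
}

# Report whether an address belongs to a host or a smartNIC,
# based on the parity of the last octet.
node_kind_of_ip() {
    last=${1##*.}
    if [ $((last % 2)) -eq 1 ]; then echo host; else echo device; fi
}

# Try a non-interactive login on every address. BatchMode makes ssh
# fail instead of prompting for a password.
check_ssh() {
    for ip in $(node_ips "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$ip" true 2>/dev/null; then
            echo "$ip ($(node_kind_of_ip "$ip")): ok"
        else
            echo "$ip ($(node_kind_of_ip "$ip")): FAILED"
        fi
    done
}
```

For example, check_ssh 6 tests 192.168.101.1 to 192.168.101.6 from the node it is run on. It has to be run on every node that should be able to reach the others.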
The default modules / installations of OpenMPI don’t work for us, presumably because the versions on host and device do not match exactly.
We use spack instead. We had problems using the same spack in a shared filesystem, so we check out spack separately on the host and on the device. On the host:
git clone https://github.com/spack/spack.git spackhost
. spackhost/share/spack/setup-env.sh
spack install openmpi@4.1.1
We need to note the install directory of openmpi, which is printed at the very end of the spack installation process. Let’s say it is MPIPATHHOST. An example would be
/cosma5/data/durham/hschulz/spack/opt/spack/linux-centos7-zen2/gcc-9.3.0/openmpi-4.1.1-r2o654xbr5nhygk4dposzq4hi6fhztil
On the device:
git clone https://github.com/spack/spack.git spackdevice
. spackdevice/share/spack/setup-env.sh
spack install openmpi@4.1.1
Again, we need to note the install directory of openmpi, which is printed at the very end of the spack installation process. Let’s say it is MPIPATHDEVICE. An example would be
/cosma/home/durham/hschulz/smartnic/spack/opt/spack/linux-centos7-graviton/gcc-4.8.5/openmpi-4.1.1-rielorb4zmxbme2r4yvrijvysfhws4ug
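Instead of copying the path from the end of the install log, spack can be asked for it afterwards. A sketch, assuming the respective setup-env.sh has been sourced and that exactly one openmpi@4.1.1 is installed in that spack instance (otherwise a more specific spec is needed). The function name mpi_prefix is made up here.

```shell
# Sketch only: mpi_prefix is a hypothetical helper name.
# Prints the install prefix of the given spec (default: openmpi@4.1.1)
# using "spack location -i". Requires spack's setup-env.sh to be sourced.
mpi_prefix() {
    spack location -i "${1:-openmpi@4.1.1}"
}
```

On the host, MPIPATHHOST would then be the output of mpi_prefix, and on the device MPIPATHDEVICE likewise.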
We need to explicitly set PATH and LD_LIBRARY_PATH to the respective directories on host and device. This works:
case "$HOSTNAME" in
# DEVICE
"bluefield101.pri.cosma7.alces.network" | "bluefield102.pri.cosma7.alces.network")
export PATH=MPIPATHDEVICE/bin:$PATH
export LD_LIBRARY_PATH=MPIPATHDEVICE/lib:$LD_LIBRARY_PATH
;;
# HOST
"b101.pri.cosma7.alces.network" | "b102.pri.cosma7.alces.network")
export PATH=MPIPATHHOST/bin:$PATH
export LD_LIBRARY_PATH=MPIPATHHOST/lib:$LD_LIBRARY_PATH
;;
esac
Any hint on how to do the hostname matching a bit more elegantly is highly welcome. For the time being we will have to live with an extension of the above that lists 16 hostnames each for device and host.
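One possible way to shorten this is to use shell glob patterns in the case statement rather than listing every hostname. A sketch, assuming all devices are named bluefield<number> and all hosts b<number> in the pri.cosma7.alces.network domain, and that HOSTNAME is set as in the snippet above. node_kind is a made-up helper name, and MPIPATHHOST / MPIPATHDEVICE are the placeholders from above.

```shell
# Sketch only: node_kind is a hypothetical helper name.
# Classify a hostname by pattern instead of listing all names.
node_kind() {
    case "$1" in
        bluefield[0-9]*.pri.cosma7.alces.network) echo device ;;
        b[0-9]*.pri.cosma7.alces.network)         echo host ;;
        *)                                        echo unknown ;;
    esac
}

# Replace MPIPATHDEVICE and MPIPATHHOST with the noted install directories.
case "$(node_kind "${HOSTNAME:-}")" in
    device) MPIPATH=MPIPATHDEVICE ;;
    host)   MPIPATH=MPIPATHHOST ;;
    *)      MPIPATH= ;;
esac

# Only touch the environment on nodes that matched one of the patterns.
if [ -n "$MPIPATH" ]; then
    export PATH="$MPIPATH/bin:$PATH"
    export LD_LIBRARY_PATH="$MPIPATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
```

The pattern b[0-9]* does not match the bluefield names, because their second character is a letter, so the order of the two branches does not matter here. On any other machine (login nodes, for instance) the environment is left untouched.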
We use an MPI hello world program (mpi_hello_world.c), built once for the host and once for the device:
mpic++ mpi_hello_world.c -IMPIPATHHOST/include -o hwhost
mpic++ mpi_hello_world.c -o hwdevice
cat appfile
-host 192.168.101.1:40,192.168.101.3:40,192.168.101.5:40 -np 80 ./hwhost
-host 192.168.101.2:16,192.168.101.4:16,192.168.101.6:16 -np 32 ./hwdevice
mpirun --app appfile
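With more nodes, the appfile can be generated from the odd/even address scheme rather than typed by hand. A sketch: host_list and make_appfile are made-up names, and the slot counts (40 per host, 16 per smartNIC) are simply taken from the example above.

```shell
# Sketch only: host_list and make_appfile are hypothetical helper names.

# host_list FIRST COUNT SLOTS prints COUNT addresses starting at
# 192.168.101.FIRST in steps of 2, each with ":SLOTS", comma-separated.
host_list() {
    first=$1 count=$2 slots=$3
    list=""
    n=0
    while [ "$n" -lt "$count" ]; do
        list="${list:+$list,}192.168.101.$((first + 2 * n)):$slots"
        n=$((n + 1))
    done
    echo "$list"
}

# make_appfile NODES NP_HOST NP_DEV prints an appfile for NODES
# host/smartNIC pairs: hosts start at .1, smartNICs at .2.
make_appfile() {
    nodes=$1 np_host=$2 np_dev=$3
    echo "-host $(host_list 1 "$nodes" 40) -np $np_host ./hwhost"
    echo "-host $(host_list 2 "$nodes" 16) -np $np_dev ./hwdevice"
}
```

For example, make_appfile 3 80 32 > appfile gives the same host lists and process counts as the appfile above.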
To check which mode the smartNIC is in, we query its internal CPU model with mlxconfig. In embedded mode the smartNIC sees the traffic from the host; this mode is meant for security.
mst needs to be started first, which requires sudo. We need this from Alastair:
sudo mst start
mlxconfig -d /dev/mst/mt41686_pciconf0 q |grep CPU
Expected output:
[root@thor031 ~]# mlxconfig -d /dev/mst/mt41686_pciconf0 q |grep CPU
INTERNAL_CPU_MODEL EMBEDDED_CPU(1)
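This check can be scripted, for instance before launching jobs. A sketch that only looks for the line shown in the expected output. is_embedded_mode is a made-up name, and the device path in the usage line is the one used above.

```shell
# Sketch only: is_embedded_mode is a hypothetical helper name.
# Reads the output of "mlxconfig ... q" on stdin and succeeds if the
# INTERNAL_CPU_MODEL line reports EMBEDDED_CPU(1).
is_embedded_mode() {
    grep -q 'INTERNAL_CPU_MODEL.*EMBEDDED_CPU(1)'
}
```

Usage, after mst has been started: mlxconfig -d /dev/mst/mt41686_pciconf0 q | is_embedded_mode && echo "embedded mode"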