This document describes how to access the NCC GPU cluster and how to compile LLVM for OpenMP and SYCL offloading to the NVIDIA (CUDA) devices present.
To get access, send an email to Rob Powell. The cluster is only accessible from within the university network:
ssh CISUSERNAME@ncc1.clients.dur.ac.uk
When working from home, I tend to first ssh into Hamilton and then into NCC.
The login node allows you to ssh into the compute nodes:
Clearly, cpu6 is the machine to compile on.
The GPU nodes are:
The compute capabilities are important when offloading with OpenMP: the -march=sm_XX value passed to the compiler must match the device. The TITAN RTX cards seen below, for example, have compute capability 7.5, i.e. sm_75.
Here I describe how to compile LLVM to run SYCL or OpenMP offload applications with support for CUDA devices.
We will need GCC >= 8.2, ninja, cmake, and CUDA.
Instead of the upstream LLVM repository, we need to pull from Intel's llvm repository like so:
git clone https://github.com/intel/llvm -b sycl --single-branch --depth 1
This branch comes with a helper script (llvm/buildbot/configure.py) that makes life quite a bit easier:
CUDA_LIB_PATH=/apps/cuda/cuda-11.4/lib64/stubs CC=gcc-10 CXX=g++-10 python llvm/buildbot/configure.py --cuda --cmake-opt="-DCUDA_TOOLKIT_ROOT_DIR=/apps/cuda/cuda-11.4"
Note that CC= and CXX= select specific GNU compilers available on the system; the remaining variables are needed so that CMake correctly locates the CUDA installation. This script will run to completion on, e.g., cpu6. To compile, we can use:
CUDA_LIB_PATH=/apps/cuda/cuda-11.4/lib64/stubs CC=gcc-10 CXX=g++-10 python llvm/buildbot/compile.py
This will compile LLVM/Clang with SYCL+CUDA support. The libraries and binaries end up in llvm/build/lib and llvm/build/bin, respectively.
Note that a subset is also copied into llvm/build/install, but that is not the right path to use when compiling OpenMP.
To make use of the new compiler, the following may be used:
export PATH=BASEDIR/llvm/build/bin:${PATH}
export LD_LIBRARY_PATH=BASEDIR/llvm/build/lib:${LD_LIBRARY_PATH}
where BASEDIR is the directory from which the git clone command was issued.
To check that things are OK, try, e.g.:
ssh gpu7
export PATH=BASEDIR/llvm/build/bin:${PATH}
export LD_LIBRARY_PATH=BASEDIR/llvm/build/lib:${LD_LIBRARY_PATH}
sycl-ls
The output should look similar to this:
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA TITAN RTX 0.0 [CUDA 11.6]
[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, NVIDIA TITAN RTX 0.0 [CUDA 11.6]
[ext_oneapi_cuda:gpu:2] NVIDIA CUDA BACKEND, NVIDIA TITAN RTX 0.0 [CUDA 11.6]
[ext_oneapi_cuda:gpu:3] NVIDIA CUDA BACKEND, NVIDIA TITAN RTX 0.0 [CUDA 11.6]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
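As a further check, you can already try compiling a pure SYCL program with the freshly built compiler. A sketch (note that the exact -fsycl-targets triple spelling varies between intel/llvm revisions; the combined command near the end of these notes uses nvptx64-cuda):
clang++ -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda test.cxx -o test
where test.cxx is the test program listed at the end of this document.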
For the following to work, it is essential that llvm-lit be present in PATH; the PATH export below takes care of that, since llvm-lit lives in llvm/build/bin. The steps to build OpenMP with CUDA offloading are:
ssh cpu6
export PATH=BASEDIR/llvm/build/bin:${PATH}
export LD_LIBRARY_PATH=BASEDIR/llvm/build/lib:${LD_LIBRARY_PATH}
cd llvm
mkdir buildopenmp
cd buildopenmp
CUDA_LIB_PATH=/apps/cuda/cuda-11.4/lib64/stubs cmake \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="61;70;75;80;86" \
  -DCUDA_TOOLKIT_ROOT_DIR=/apps/cuda/cuda-11.4 \
  -DLIBOMPTARGET_ENABLE_DEBUG=YES \
  -DCMAKE_C_COMPILER=BASEDIR/llvm/build/install/bin/clang \
  -DCMAKE_CXX_COMPILER=BASEDIR/llvm/build/install/bin/clang++ \
  -DCMAKE_INSTALL_PREFIX=$PWD/local \
  -GNinja ../openmp
ninja
ninja install
(-DLIBOMPTARGET_ENABLE_DEBUG=YES builds the debug variant of the offload runtime, so LIBOMPTARGET_DEBUG=1 can later be set at run time to get verbose output from libomptarget.)
The crucial option is the list of compute capabilities.
Note: there used to be a cmake flag to set the default capability (-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61) but that is no longer available.
The default is sm_35, which will not work with any of the NCC devices. But that's OK: explicitly specifying the desired capability is not hard and probably less error prone.
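If you are unsure of a device's compute capability, recent NVIDIA drivers can report it directly (assuming the nvidia-smi on the node is new enough to support this query field):
nvidia-smi --query-gpu=compute_cap --format=csv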
You can check that device bitcode libraries were built for all the specified architectures:
ls libomptarget/ | grep nvptx
libomptarget-nvptx-sm_61.bc
libomptarget-nvptx-sm_70.bc
libomptarget-nvptx-sm_75.bc
libomptarget-nvptx-sm_80.bc
libomptarget-nvptx-sm_86.bc
To make use of OpenMP, add local/lib to LD_LIBRARY_PATH and compile along these lines:
export LD_LIBRARY_PATH=BASEDIR/llvm/buildopenmp/local/lib:${LD_LIBRARY_PATH}
clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_75 \
  --cuda-path=/apps/cuda/cuda-11.4 \
  --libomptarget-nvptx-bc-path=BASEDIR/llvm/buildopenmp/local/lib \
  -IBASEDIR/llvm/buildopenmp/local/include \
  -LBASEDIR/llvm/buildopenmp/local/lib -lomptarget \
  test_omp.cxx
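test_omp.cxx itself is not reproduced in these notes; a minimal offload test could look like the following sketch (my stand-in, not necessarily the original file):

#include <cstdio>
#include <omp.h>

int main()
{
  int a = 1, b = 2, c = 0;
  int on_host = 1;
  // Offload one addition to the default device; the map clauses
  // copy a and b to the device and c / on_host back to the host.
  #pragma omp target map(to: a, b) map(from: c, on_host)
  {
    c = a + b;
    on_host = omp_is_initial_device();
  }
  std::printf("Result: %d, executed on the %s\n", c, on_host ? "host" : "device");
  return 0;
}

Running the binary with OMP_TARGET_OFFLOAD=MANDATORY set makes the runtime abort instead of silently falling back to the host when offloading fails.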
In short: this setup only works with named SYCL kernels for some reason; see https://github.com/intel/llvm/issues/5804
The working compile invocation looks like this (the paths are specific to my setup; substitute your own BASEDIR):
clang++ -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda \
  -fopenmp -L/home2/gcgt96/sycl_workspace/buildopenmp/lib -lomp -lomptarget \
  -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 \
  --libomptarget-nvptx-bc-path=/home2/gcgt96/sycl_workspace/buildopenmp/lib \
  -I/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl \
  -I/home2/gcgt96/sycl_workspace/syclcudagcc/include
Here is the test program:
#include <iostream>
#include <CL/sycl.hpp>

using namespace sycl;

static queue Q(default_selector{});

int main()
{
  std::cout << " Using SYCL device: "
            << Q.get_device().get_info<sycl::info::device::name>() << std::endl;

  // Shared USM allocation, visible to both host and device.
  int *data = malloc_shared<int>(3, Q);
  data[0] = 1;
  data[1] = 2;
  data[2] = 0;

  Q.submit([&](handler &cgh)
  {
#ifdef NONAME
    // Unnamed kernel: triggers the issue linked above.
    cgh.single_task( [=](){ data[2] = data[0] + data[1]; } );
#else
    // Named kernel: works.
    cgh.single_task<class simple_sum>( [=](){ data[2] = data[0] + data[1]; } );
#endif
  }).wait();

  std::cout << "Result: " << data[2] << "\n";

  free(data, Q);  // release the USM allocation
  return 0;
}
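Building this once as-is (named kernel) and once with -DNONAME added to the compile line should reproduce the behaviour described in the linked issue: the named-kernel build prints Result: 3, while the unnamed-kernel build fails.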