29/10/2021

Outline

  • Hardware
  • Parallel computing with shared memory
  • OpenMP


Evolution of CPU clock speeds

  • A lot of heat is generated when switching transistors on and off in a small volume
  • Higher clock speed means higher voltage \(\to\) even more power consumption and heat generation
  • Transistors get faster the smaller their gates are
  • Since around 2007: key transistor dimensions are close to the atomic scale (the gate insulator is only a few atoms thick)
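A standard first-order model, not taken from the slides, makes the voltage argument quantitative. The dynamic (switching) power of CMOS logic is approximately \(P_{\mathrm{dyn}} \approx \alpha C V^2 f\), with switching activity \(\alpha\), switched capacitance \(C\), supply voltage \(V\) and clock frequency \(f\). Assuming, as a simplification, that \(V\) must rise in proportion to \(f\), and ignoring static leakage power, with \(P_0\) the power of one core running at frequency \(f\):

```latex
\begin{aligned}
P_{\mathrm{dyn}} &\approx \alpha\, C\, V^{2} f,
  \qquad V \propto f \;\Longrightarrow\; P_{\mathrm{dyn}} \propto f^{3},\\
\text{one core at } 2f:\quad P &\approx 2^{3} P_0 = 8\,P_0,\\
\text{two cores at } f:\quad P &\approx 2\,P_0.
\end{aligned}
```

For a perfectly parallel workload both options at best double the throughput, but in this idealised model the second costs about a quarter of the power of the first.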

Designing faster chips is really difficult due to the laws of physics: thermodynamics but also quantum effects.

Parallelism

Chip manufacturers' answer to these technical challenges: hardware that allows for parallelism.

The point of parallel computing is to get work done faster.

Examples:

  • Out-of-order execution and branch prediction (instruction-level parallelism)
  • Vector units (SIMD)
  • Accelerators
  • Multiple CPU cores in one chip

Simplistic view

  • Many cores connected to memory.
  • All cores can read and write from/to memory.

More realistic view

  • Hierarchical memory structure (performance implications).
  • Main memory split in modules associated with sockets.
  • Still: all cores can read and write from/to memory.

Hamilton processor (a single socket)

Hamilton node

A processor contains multiple cores and is connected to (its own) memory.

A node consists of multiple processors.

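Cores of processor A can access memory of processor B (and vice versa), but accessing the other processor's memory is slower than accessing local memory (non-uniform memory access, NUMA).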

Hamilton system

Topology discovery

lscpu
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
Model name:            Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
CPU MHz:               1204.394
CPU max MHz:           3300.0000
CPU min MHz:           1200.0000
L2 cache:              256K
L3 cache:              35840K

How many physical cores does this machine have?

Same question for a second machine (POWER9):

CPU(s):                160
Thread(s) per core:    4
Core(s) per socket:    20
Socket(s):             2
Model name:            POWER9, altivec supported
CPU max MHz:           3800.0000
CPU min MHz:           2300.0000
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              10240K

Shared memory programming

Model to program hardware where multiple cores are connected to memory.

Central concept: thread

A thread is a pure software concept (similar to a process): a series of instructions executed consecutively.

A thread can share memory with the other threads of its process (separate processes do not share memory by default).

A thread also has private memory (not shared with other threads).

All threads can access shared data.

Only the owning thread can access its private data.

Private and shared data are software concepts!

Parallel computation with threads

A program can have many threads.

Most common (for performance): one thread per physical core.

In general, however, the operating system can schedule many threads per core.

All threads run the same executable but can take different paths!

In other words: each thread has its own notion of what to do next.

There is no concept of messaging between threads.

All data exchange happens by accessing the shared memory \(\to\) synchronisation is very important.

Single address space

  • All cores refer to the shared variable “a” via the same address.

Thread communication

Recap: sequential C/C++ program

#include <cstdio>                  // import some library

int main()                         // entry point for executable
{
  for (int i=0;i<100;i++)          // Iteration
  {
     if (i==10)                    // if statement
     {
         printf("Hello World\n");  // print to terminal
     }
  }
  return 0;                        // return code 0 means normal exit
}

A compiler is software that translates source code into a binary (an executable).

g++ prog.cxx -o prog

OpenMP (www.openmp.org)

OpenMP is a standard and a programming model for shared memory parallelism.

It is specified for C, C++ and Fortran (non-standard implementations also exist for other languages, e.g. Java).

Compiler developers look at the (evolving) standard and implement the programming model.

\(\to\) compilers “know” how to compile a program that uses OpenMP.

Principal components are: runtime, library functions, environment variables and pragmas.