29/10/2021

Outline

  • Race condition
  • Synchronisation

Recap of key concepts

  • Many cores connected to memory.
  • Single address space.

Recap of key concepts

  • Private memory only accessible by the owning thread.
  • No notion of messages between threads.
  • All communication happens via shared variables.

Recap of key concepts

#pragma omp parallel default(none) private(mya) shared(a)
{
  // Each thread gets its own copy of mya; all threads share a.
  #pragma omp for   // distribute the loop iterations across the threads
  for (int i=0;i<1000;i++) { ... ; }
}

g++ -fopenmp prog.cxx -o myprogram    # enable OpenMP when compiling
OMP_NUM_THREADS=4 ./myprogram         # run with 4 threads

Parallel region

[Figure: parallel region]

Race condition

#include <cstdio>

int main() {
    int a=10;
    #pragma omp parallel default(none) shared(a)
    {
        a = a+1;
    }
    printf("Final result a=%d\n", a);
    return 0;
}
  • Initialise a=10; each thread adds 1 to a.
  • Expect final \(a=10+N_{threads}\).
  • For \(N_{threads}>1\) there is no guarantee of a correct result.

Compile and run with:

g++ -fopenmp rcdemo.cxx -o rcdemo
./rcdemo

Race condition

One possible interleaving of two threads:

  1. Initialise a=10 in shared memory.
  2. Thread 1 loads 10 into a CPU register.
  3. Thread 1 adds 1 to 10 in the CPU register.
  4. Thread 2 loads 10 into a CPU register.
  5. Thread 2 adds 1 to 10 in the CPU register.
  6. Thread 1 stores 11 in shared memory a.
  7. Thread 2 stores 11 in shared memory a.

Both threads incremented a, yet the final value is 11 rather than 12: one update has been lost.

Race condition

Symptom: non-deterministic behaviour of the code from run to run.

Solution: if multiple threads need to update a shared variable, the update must be protected (the OpenMP standard provides several ways to do that).

Race conditions can be a source of bugs that are hard to find and fix, especially in larger programs.

Avoid them by sticking with default(none) and thinking carefully about which variables are shared and which are private.
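
As a preview of the constructs introduced below, a minimal sketch of a protected version of the first example, using OpenMP's atomic directive:

#include <cstdio>

int main() {
    int a=10;
    #pragma omp parallel default(none) shared(a)
    {
        // Protected update: load, add and store happen as one
        // indivisible operation, so no increment can be lost.
        #pragma omp atomic
        a = a+1;
    }
    printf("Final result a=%d\n", a);
    return 0;
}

With this change the program prints \(a=10+N_{threads}\) for any number of threads.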

Code example with race condition

Task: compute \(\sum\limits_{n=1}^{100} n\)

#include <cstdio>

int main() {
    int sum=0;
    const int N=100;

    #pragma omp parallel for
    for (int n=1;n<N+1;n++)
    {
      sum +=n;         // Race condition here
    }
    printf("Result is %d. It should be %d\n", sum, N*(N+1)/2);
    return 0;
}

pragma omp atomic

We can safely update a single shared variable using:

#pragma omp atomic
    statement;

  • atomic protects a single address in shared memory.
  • atomic applies to a single code statement only.
  • atomic applies only to basic types.

The statement must be of a simple form, such as:

x+=1;
x++;
x--;
x-=2;
x*=2;

Code example fixed

Task: compute \(\sum\limits_{n=1}^{100} n\)

#include <cstdio>

int main() {
    int sum=0;
    const int N=100;

    #pragma omp parallel for
    for (int n=1;n<N+1;n++)
    { 
      #pragma omp atomic
      sum +=n;         // Race condition eliminated
    }
    printf("Result is %d. It should be %d\n", sum, N*(N+1)/2);
    return 0;
}

pragma omp critical

For things not covered by atomic, use #pragma omp critical.

critical ensures that only one thread at a time executes a block of code.

critical is typically more expensive than atomic.

#pragma omp critical
{
    statement1;
    statement2;
    ...
}
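
A minimal sketch (names and data are made up for illustration): finding the maximum value and its index requires updating two shared variables together, which atomic cannot express but critical can:

#include <cstdio>

int main() {
    const int N = 1000;
    int data[N];
    for (int i = 0; i < N; i++) data[i] = (i * 37) % 1000;   // some test data

    int best = -1;
    int best_idx = -1;
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp critical
        {
            // Two shared variables change together, so the
            // whole block must be protected as a unit.
            if (data[i] > best) { best = data[i]; best_idx = i; }
        }
    }
    printf("Maximum %d found at index %d\n", best, best_idx);
    return 0;
}

Note that entering the critical section in every iteration serialises the loop; it is shown here only to illustrate the construct.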

Reductions

A reduction combines many values into a single value using an operation such as addition, multiplication, min, max, and, or.

Each thread reduces into its own private copy; the private copies are then combined to give the final result.

Syntax is similar to shared/private, but you also need to give the operator:

int mysum = 0;
#pragma omp parallel for reduction(+:mysum)
for (int i=0;i<100;i++) mysum += i;   // partial sums are combined at the end
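
The same pattern works with the other operators; a minimal sketch (hypothetical example) that computes \(10!\) with a multiplication reduction:

#include <cstdio>

int main() {
    long long fact = 1;
    // Each thread multiplies into its own private copy of fact;
    // the per-thread partial products are multiplied together at the end.
    #pragma omp parallel for reduction(*:fact)
    for (int n = 1; n <= 10; n++) {
        fact *= n;
    }
    printf("10! = %lld\n", fact);
    return 0;
}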

Reductions (cont)