19/11/2021

Outline

  • Amdahl’s law

Recap of key concepts

  • Many cores connected to memory.
  • Single address space.

Recap of key concepts

  • Private memory is accessible only by the owning thread
  • No notion of messages between threads
  • All communication via shared variables

Recap of key concepts

#pragma omp parallel default(none) private(mya) shared(a)
{
  #pragma omp for
  for (int i = 0; i < 1000; i++) { ... ; }
}

g++ -fopenmp prog.cxx -o myprogram
OMP_NUM_THREADS=4 ./myprogram
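
A complete, compilable version of this sketch might look as follows (the loop body and the type of a are assumptions; the slide elides them):

#include <omp.h>
#include <cstdio>

int main() {
  double a[1000];
  double mya;
  // Each thread gets its own copy of mya; all threads share a.
  #pragma omp parallel default(none) private(mya) shared(a)
  {
    mya = omp_get_thread_num();  // thread-local value
    #pragma omp for
    for (int i = 0; i < 1000; i++) { a[i] = mya; }  // iterations split across threads
  }
  printf("a[0] = %f, a[999] = %f\n", a[0], a[999]);
  return 0;
}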

Parallel region

Race condition

Symptom: non-deterministic execution behaviour of code.
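
A minimal sketch of such a race (an illustrative example, not from the slides): many threads increment a shared counter without protection, so updates get lost.

#include <cstdio>

int main() {
  long counter = 0;
  // Each thread performs unsynchronised read-modify-write updates
  // on the shared variable counter, so increments can be lost.
  #pragma omp parallel for default(none) shared(counter)
  for (int i = 0; i < 1000000; i++) {
    counter++;  // race condition: result is usually below 1000000
  }
  printf("counter = %ld (expected 1000000)\n", counter);
  return 0;
}

Run it a few times with OMP_NUM_THREADS=4: the result varies from run to run, which is exactly the non-deterministic behaviour described above.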

Solution: if multiple threads need to update a shared variable, the update must be protected, e.g. with one of the constructs below.

#pragma omp atomic
STATEMENT

#pragma omp critical
{
  STATEMENT1
  STATEMENT2
  ...
}

#pragma omp parallel for reduction(+:SHAREDVARIABLE)
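
As a concrete illustration of the reduction (a sketch, not from the slides), summing an array without racing on the shared accumulator:

#include <cstdio>

int main() {
  double a[1000];
  for (int i = 0; i < 1000; i++) { a[i] = 1.0; }

  double sum = 0.0;
  // Each thread accumulates into its own private copy of sum;
  // OpenMP combines the copies safely at the end of the loop.
  #pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < 1000; i++) {
    sum += a[i];
  }
  printf("sum = %f\n", sum);  // always 1000.0, race-free
  return 0;
}

The same update protected with #pragma omp atomic would also be correct, but the reduction avoids serialising every single addition and is typically faster.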

Speedup

Let \(T_1\) be the total run time of a program when using exactly one thread.

Let \(T_n\) be the total run time of the same program when using \(n\) threads.

We define the speedup when using \(n\) threads as \[speedup(n)=\frac{T_1}{T_n}.\]

We define the parallelisable fraction of a program’s run time as \(p\), with \(p\in[0,1)\).
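
In practice \(T_1\) and \(T_n\) can be measured directly, e.g. with omp_get_wtime() (a minimal sketch; the workload is an arbitrary stand-in):

#include <omp.h>
#include <cstdio>

int main() {
  const int N = 100000000;
  double sum = 0.0;

  double t0 = omp_get_wtime();
  #pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < N; i++) {
    sum += 1.0 / (1.0 + i);  // stand-in workload
  }
  double t1 = omp_get_wtime();

  // Run once with OMP_NUM_THREADS=1 (gives T_1) and once with
  // OMP_NUM_THREADS=n (gives T_n); speedup(n) = T_1 / T_n.
  printf("run time: %f s (sum = %f)\n", t1 - t0, sum);
  return 0;
}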

Amdahl’s law

Not all work in a program can be parallelised.

Everything that is not parallel, we call serial.

The serial component is a limit on the parallel performance.

Parallelism can never make a program run faster than its serial component.

\[T_\mathrm{total} = T_\mathrm{serial} + \frac{T_\mathrm{parallelisable}}{N_\mathrm{threads}}\]

Dividing \(T_1\) by \(T_n = T_1\left((1-p) + \frac{p}{n}\right)\), where \(p = T_\mathrm{parallelisable}/T_1\), gives \[speedup(n, p) = \frac{1}{\frac{p}{n} + (1-p)}\]

\[\lim_{n\to\infty} speedup(n,p)=\frac{1}{1-p}\]
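
For example, with \(p = 0.95\) (95% of the run time parallelisable), no number of threads can push the speedup beyond \[\lim_{n\to\infty} speedup(n, 0.95) = \frac{1}{1-0.95} = 20.\]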

Gene Amdahl (1922 – 2015)


Parallel efficiency

We define the parallel efficiency as \(efficiency(n) = \frac{speedup(n)}{n}\).

It measures the scalability of a parallel program; higher is better.

Even a 1% serial phase drastically limits the effectiveness of parallelism.
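
For example, a 1% serial phase (\(p = 0.99\)) on \(n = 100\) threads gives \[speedup(100, 0.99) = \frac{1}{\frac{0.99}{100} + 0.01} \approx 50.3, \qquad efficiency \approx 0.50.\] Half of the machine is effectively idle.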

It can get worse

Parallelism also introduces overhead, e.g. thread creation and orchestration, modelled here as a constant fraction \(c\) of the run time: \[speedup(n,p) = \frac{1}{\frac{p}{n} + (1-p) + c}\]

It can get even worse

The overhead might increase with the number of threads. \[speedup(n,p) = \frac{1}{\frac{p}{n} + (1-p) + f(n)}\]
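
A small calculation, assuming the overhead grows linearly as \(f(n) = \alpha n\): the denominator \(\frac{p}{n} + (1-p) + \alpha n\) is minimised at \[n^* = \sqrt{\frac{p}{\alpha}},\] so beyond \(n^*\) threads the speedup actually decreases again.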

Remarks

Amdahl’s law is valid only for fixed-size problems.

I.e. when the goal is to solve a problem of fixed size in the shortest possible time.

Amdahl’s law also omits effects such as the growing amount of (fast) cache memory that becomes available when using more processors.

Modern-day high-performance computing is often concerned with growing the problem size instead.

E.g. higher-resolution simulations (weather, climate, viruses, galaxies, …).