Parallel Programming Course Summer 2012

Center for Computing and Communication, Extension Building

RWTH Aachen University

Kopernikusstraße 6, 52074 Aachen
Seminar Room 4


The Center for Computing and Communication of RWTH Aachen University offers weekly courses on parallel programming from 01.08.2012 to 12.09.2012. The course series covers the basics of GPGPU programming, shared-memory programming with OpenMP, and message passing with MPI, as well as tools for correctness checking and performance analysis.




Introduction to GPGPU programming


Basic message passing with MPI


Advanced MPI, profiling and debugging of MPI applications


Introduction to OpenMP Programming


Advanced OpenMP Programming


Correctness Tools


Performance Tools



Attendees should be comfortable with C/C++ or Fortran programming and interested in learning more about parallel programming. The presentations will be given in English. The Linux Cluster of the Center for Computing and Communication will be used for the exercises.


There are no fees for the courses, but since we can only offer a limited number of seats, please register for every course you would like to attend. Registration links are provided below, after the detailed descriptions of the course topics.

Details and Registration

  • Introduction to GPGPU programming

01.08.2012,  9:00 – 12:00

In this course, we will present the basic concepts of programming general-purpose graphics processing units (GPGPUs). We will explain the major differences between multicore architectures and GPUs and briefly introduce NVIDIA's GPU architectures (Fermi, Kepler). One of the currently dominant GPU programming models is CUDA. We will go into CUDA's C and Fortran extensions and show the basic techniques for offloading computations to GPUs. We will also give an outlook on OpenACC, which provides a directive-based approach (similar to OpenMP) for accelerating compute-intensive loops. A full workshop on OpenACC will follow in October. Finally, we will introduce the RWTH GPU-Cluster and its usage.

After the presentation, there will be time for a hands-on session. You may work on some CUDA programming exercises and try out a CUDA debugger. If desired, we can also look at some simple tuning concepts during the lab time.

  • Basic message passing with MPI

08.08.2012, 9:00 – 12:00

The Message Passing Interface (MPI) is the de-facto standard for programming large distributed-memory HPC systems. This half-day course will introduce the basic concepts of the Single Program Multiple Data (SPMD) parallel programming model, realized with message passing in MPI. It will introduce the MPI standard, its runtime environment, point-to-point communication, and the most important collective communication operations available in MPI. Attendees are expected to have at least an intermediate understanding of either C or Fortran.

  • Advanced MPI, profiling and debugging of MPI applications

15.08.2012, 9:00 – 12:00

This half-day course will build on the topics presented in the basic MPI course. It will cover basic usage of the most popular parallel debuggers and performance tools (TotalView and Vampir in particular), specifically in the context of MPI. The course will also teach basic hybrid parallelization, i.e. the combination of message passing and shared-memory programming. Hybrid parallelization is gaining popularity as the number of cores per cluster node grows.

  • Introduction to OpenMP Programming

22.08.2012, 9:00 – 12:00

OpenMP is a widely used approach to programming shared-memory architectures and is supported by most compilers nowadays. This half-day course will give a comprehensive introduction to shared-memory parallel programming in general and with OpenMP in particular. It will cover the majority of OpenMP's language elements, focusing on worksharing and tasking to speed up program execution. It will also touch on basic aspects of performance optimization, such as load balancing and dealing with NUMA architectures.

  • Advanced OpenMP Programming

29.08.2012, 9:00 – 12:00

This half-day course assumes that attendees understand basic parallelization concepts and have already gained first experience with OpenMP, e.g. from an introductory course. It focuses on performance aspects, such as data and thread locality on NUMA architectures, false sharing, and private versus shared data. It will discuss language features in depth and explain the performance implications of different implementation alternatives. Finally, it will also present various tools and how they can be used in the OpenMP parallelization cycle.

  • Correctness Tools

05.09.2012, 9:00 – 12:00

Searching for bugs in a program is a tedious task even for serial programs, and for parallel programs it becomes harder still. In this course we will give an introduction to the TotalView debugger, a tool designed to help programmers find bugs in serial, OpenMP, and MPI parallel programs. We will also present Intel Inspector XE, a tool used to find data races and deadlocks in OpenMP programs. Exercises will be provided for both tools so that you can gain first-hand experience with them.

  • Performance Tools

12.09.2012, 9:00 – 12:00

In this course we will give an introduction to performance analysis. Intel VTune Amplifier XE, a tool for investigating the performance of serial and OpenMP programs, will be presented, as well as Vampir, a tracing tool for MPI and hybrid OpenMP/MPI applications. We will also prepare exercises for both tools so that you can gain first experience during a hands-on session.


