A major part of the RWTH Compute Cluster consists of more than 1500 nodes equipped with the latest Intel® Xeon™ processors. To exploit this powerful distributed-memory resource, one typically parallelizes with MPI. However, with the different levels of parallelism involved, namely shared memory within a node and distributed memory across nodes, good performance becomes increasingly difficult to obtain. This difficulty originates in the need to adapt and tune the application on multiple levels: the MPI parallelization, the MPI library, multi-threading, and job placement.
This workshop addresses these issues:
- On the first day you will learn about the many tuning opportunities of the Intel® MPI Library and the potential for improving performance without any code changes.
- The second day covers the Intel® Trace Analyzer and Collector, which helps you spot performance issues within your code.
- On the third day we will introduce an additional level of parallelism to leverage today's multi-core CPUs: OpenMP. With this hybrid approach you may be able to improve the performance of your code by adding shared-memory parallelization, avoiding NUMA issues and preventing the additional communication overhead caused by scheduling multiple MPI processes per node.
Intel® MPI, MPI Autotuning, Intel® Math Kernel Library
Intel Trace Analyzer and Collector
Attendees are kindly requested to prepare and bring their own code, if applicable. It is recommended to have good knowledge of MPI and of the programming language (C/C++/Fortran) of the code. It is also advised to have a basic understanding of OpenMP, as part of the workshop will cover adding and tuning OpenMP in the MPI context.
The presentations will be given in English.
This workshop addresses developers using Linux or Windows.
The agenda is available here as a PDF.
Cluster Introduction (Christian Iwainsky, Center for Computing and Communication)
Introduction (Michael Klemm, Intel)
01 MPI Tuning (Michael Klemm, Intel)
02 Intel MKL (Michael Klemm, Intel)
03 MPI Tuning with ITAC I (Michael Klemm, Intel)
04 MPI Tuning with ITAC II (Michael Klemm, Intel)
05 Intel VTune Amplifier XE and MPI (Michael Klemm, Intel)
06 OpenMP MPI Hybrid Programming (Michael Klemm, Intel)
07 NUMA Optimization (Michael Klemm, Intel)
2011-10-13 - 01 OpenMP in 20 Minutes (Christian Terboven, Center for Computing and Communication)
2011-10-13 - 02 OpenMP and the Hardware Architecture (Christian Terboven, Center for Computing and Communication)