Parallel Programming in Computational Engineering and Science 2019

HPC Seminar and Workshop
March 11 - 15, 2019
IT Center, RWTH Aachen University
Kopernikusstraße 6
Seminar Room 3 + 4
                   






Please find information about the preceding
Introduction to HPC on Feb 25, 2019 >>>

Please provide your feedback to PPCES 2019 here >>>

(Click on "Respond to this Survey/Auf die Umfrage antworten")

 



 

About PPCES

This event continues the tradition of the annual week-long events that have taken place in Aachen every spring since 2001.

Throughout the week we will cover parallel programming using OpenMP and MPI in Fortran and C/C++, as well as performance tuning. Furthermore, we will introduce the participants to GPGPU programming with OpenACC. Hands-on exercises will be provided for each topic, though you are also welcome to work on your own code during the exercise sessions.

The contents of the courses are generally applicable but will be tailored to CLAIX, the compute cluster recently installed at RWTH's IT Center.
CLAIX replaces the outdated BULL cluster, which was installed in 2011 and will be decommissioned by April 2019 at the latest.

The topics will be presented in a modular way, so that you can pick specific topics and register only for the corresponding days, letting you invest your time as efficiently as possible. Each major topic is split into two parts: Part I with basic information and Part II with more advanced information. Part II always relies on the contents of Part I.
Please register separately for each part under the individual parts below!

OpenMP is a widely used approach for programming shared memory architectures and is supported by most compilers nowadays. We will cover the basics of the programming paradigm as well as some advanced topics such as programming NUMA machines, and we will present a selection of performance and verification tools for OpenMP. The nodes of the RWTH Compute Cluster contain an increasing number of cores, so we consider shared memory programming a vital alternative for applications that cannot easily be parallelized with MPI. We also expect a growing number of application codes to combine MPI and OpenMP for clusters of nodes with many cores per node.

The Message Passing Interface (MPI) is the de-facto standard for programming large HPC systems. We will introduce the basic concepts and give an overview of some advanced features. Also covered is hybrid parallelization, i.e. the combination of MPI and shared memory programming, which is gaining popularity as the number of cores per cluster node grows. Furthermore, we will introduce a selection of performance and correctness-checking tools.

OpenACC is a directive-based programming model for accelerators, which delegates the responsibility for low-level programming tasks (e.g. in CUDA or OpenCL) to the compiler. Using the OpenACC industry standard, the programmer can offload compute-intensive loops to an attached accelerator with little effort. In this course, we will focus on using OpenACC on NVIDIA GPUs.

Prerequisites

Attendees should be comfortable with C/C++ or Fortran programming in a Linux environment and interested in learning more about the technical details of application tuning and parallelization.
Basic information about parallel computing architectures and about using the modules environment and the Slurm batch scheduler will be presented in the Introduction to HPC on Feb 25, 2019. Please check out the slides of that introduction if you need a quick start.
The presentations will be given in English.

Agenda

Please find the agenda here >>>

Registration

The first two days will focus on performance programming of single compute nodes mainly using OpenMP. All courses will be accompanied by practical exercises.

  • Shared Memory Programming - Part I: Basic OpenMP Programming -  Monday, March 11, 9:00 - 17:30

    This course provides an introduction to parallel programming with OpenMP. It covers in detail the concepts of parallel regions, worksharing, tasking, and synchronization, and it emphasizes how to write correct parallel programs. The lectures are supported by many examples and hands-on exercises.
    Prerequisites: nothing specific

  • Shared Memory Programming - Part II : Advanced OpenMP Topics - Tuesday, March 12, 9:00 - 17:30

    This course puts the focus on improving the performance of OpenMP parallel programs. It covers the interaction of OpenMP programs with the hardware and possible issues, such as false sharing, remote memory accesses on NUMA architectures, and vectorization for SIMD microarchitectures. It also presents recipes and case studies for improving the scalability and performance of OpenMP programs.

    Prerequisites (if not attending Basic OpenMP Programming): experience with parallel programming in OpenMP and an overview of the most commonly used OpenMP constructs and clauses

Days 3 and 4 will focus on performance programming of multiple compute nodes using message passing with MPI.

  • Message Passing - Part I: Basic MPI Programming – Wednesday, March 13, 9:00 - 17:30
    It only takes a few concepts of MPI to get the first parallel programs up and running: communicators and point-to-point versus collective communication. Practical exercises are important to get started with MPI, and the first steps in using performance analysis tools help to build a better understanding of the work distribution of MPI programs in execution.
    Furthermore, you will learn how to debug MPI programs and how to submit MPI jobs on CLAIX using the Slurm batch system.
  • Message Passing - Part II: Further MPI Concepts - Thursday, March 14, 9:00 - 17:30
    The MPI interface actually provides a large number of calls, many of which you will probably never need, but some of the concepts may be important for your application. It is therefore useful to get a rough overview of what is available and to dive into some of the details at an intermediate level. The topology of an MPI application and its mapping onto a given cluster may be important for performance reasons, so you need to know how to distribute and place your MPI processes across the nodes, sockets, and cores of the available compute nodes, and how to specify this with the Slurm batch system. In order to run large MPI applications you also need to know how to organise your parallel IO. With a growing number of cores per compute node, hybrid parallelization - a combination of MPI and OpenMP - is gaining importance. Things do not get easier when combining both parallelization approaches (or MPI and multi-threading in general): how do you run (submit using Slurm), debug, and analyse such a parallel application?
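As an illustration of hybrid process placement (a sketch only; the job name, resource counts, and application name are assumptions, so please check the CLAIX documentation for the exact flags), a Slurm batch script might distribute MPI ranks across nodes while each rank spawns OpenMP threads on its share of the cores:

```shell
#!/usr/bin/env zsh
#SBATCH --job-name=hybrid-demo   # hypothetical job name
#SBATCH --nodes=2                # two compute nodes
#SBATCH --ntasks-per-node=2      # 2 MPI ranks per node, e.g. one per socket
#SBATCH --cpus-per-task=12       # 12 cores reserved per rank for OpenMP
#SBATCH --time=00:10:00

# Give each MPI rank as many OpenMP threads as cores reserved for it.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch 4 ranks in total (2 nodes x 2 ranks), placed and pinned by Slurm.
srun ./hybrid_app
```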

The last day will focus on accelerator programming, particularly on programming NVIDIA GPGPUs.

  • GPGPU Programming - Part I: Basic GPGPU Programming Principles with OpenACC - Friday, March 15, 9:00 - 12:30
    The number of GPU-based systems in the HPC community is increasing. However, GPGPU-based hardware architectures differ in design and programming from traditional CPU-based systems. Thus, understanding the basic GPGPU principles is essential to get good performance on such hardware. In this session, we give an overview of the GPU architecture and its differences from CPUs. We introduce the basic concepts for leveraging GPUs, i.e., offloading, creating parallelism, and memory management.
    OpenACC is a directive-based programming model that can be used to parallelize scientific applications for GPUs. Step by step, we will introduce how to apply the basic GPU concepts with OpenACC to GPUs. This includes offloading, managed memory and data transfers. We will run through prepared examples and evaluate performance with the help of NVIDIA's Visual Profiler.
    Hands-on sessions are done on RWTH's CLAIX cluster with NVIDIA Pascal GPUs using PGI's OpenACC implementation.
     
  • GPGPU Programming - Part II:  Advanced Concepts and Tuning of GPGPUs with OpenACC - Friday, March 15, 14:00 - 17:30
    Understanding the mode of operation of GPUs enables developers to achieve good performance for their GPU programs. Therefore, this session introduces advanced concepts and tuning methods for GPU performance analysis and improvement. This includes the concepts of latency hiding and occupancy, with hints on appropriate launch configurations and loop schedules. Furthermore, we will cover asynchronous operations, which also enable true heterogeneous computing on CPU and GPU. To showcase and practice these concepts, we will use the directive-based programming model OpenACC. Hands-on sessions are done on RWTH's CLAIX cluster with NVIDIA Pascal GPUs using PGI's OpenACC implementation.
    Prerequisites (if not attending GPGPU Programming Part I): basic knowledge of GPU architectures (cores, streaming multiprocessors, global memory, shared memory), basic knowledge of GPU programming concepts (the need for offloading; threads, blocks, grids), and knowledge of basic OpenACC directives (acc kernels, acc parallel, acc loop, acc data)
    Recommended (if not attending GPGPU Programming Part I): basic knowledge of using NVIDIA's Visual Profiler and of how to submit batch jobs for CLAIX
Course Materials

OpenMP + Serial Performance

01_IntroductionToOpenMP.pdf

Exercises_OMP_2019.pdf

serial_tuning_basics.pdf

02_OpenMPTaskingInDepth.pdf

ppces.2019.numa.ruud slides.pdf

03_OpenMPNumaSimd.pdf

04_OpenMPSummary.pdf


MPI Basic:

01_PPCES2019_MPI_Basic.pdf

PPCES2019_MPI_Lab.pdf

PPCES2019_MPI_Lab.tar.bz2

TotalView.pdf

Performance Engineering.pdf


MPI Advanced:

02_PPCES2019_MPI_Advanced.pdf

03_PPCES2019_Parallel_IO.pdf

Score-P_basic.pdf

PPCES_Correctness_Tools.pdf


GPGPU Programming

PPCES2019_OpenACC-Basics.pdf

PPCES2019_OpenACC-Advanced.pdf

PPCES2019_OpenACC_ProgrammingLab.pdf

PPCES2019_OpenACC-Lab.tar.gz


Further course materials will be published here as they become available.

Travel Information

Please make your own hotel reservation. You may find a list of hotels in Aachen on the web pages of the Aachen Tourist Service. We recommend that you try to book a room at the Novotel Aachen City, Mercure am Graben, or Aachen Best Western Regence hotels. These are nice hotels at reasonable prices within walking distance (20-30 minutes, see city map) of the IT Center through the old city of Aachen. An alternative is the IBIS Aachen Marschiertor hotel, located close to the main station, which is convenient if you are traveling by train and also prefer to commute to the IT Center by train (4 trains per hour, 2 stops).

Please, download a sketch of the city (pdf, 415 KB) with some points of interest marked.
You may find a description of how to reach us by plane, train or car here.
Bus lines 33 and 73 connect the city (central bus station) and the Mies-van-der-Rohe-Straße bus stop 6 times per hour.
Most trains between Aachen and Düsseldorf stop at station Aachen West, which is a 10-minute walk from the IT Center.
From the bus stop and the train station just walk up Seffenter Weg. The first building on the left side at the junction with Kopernikusstraße is the IT Center of RWTH Aachen University. The event will take place in the extension building located at Kopernikusstraße 6.
The weather in Aachen is usually unpredictable, so it is always a good idea to carry an umbrella. If you bring one, it might be sunny.
 
Contact

Paul Kapinos
Tel.: +49 (241) 80-24915
Fax/UMS: +49 (241) 80-624915
E-mail: hpcevent@itc.rwth-aachen.de