Seminar: Current Topics in High-Performance Computing (HPC)

Content

High-performance computing (HPC) is used to speed up long-running scientific applications, for instance the simulation of computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different flavors: from clusters through (large) shared-memory systems to accelerators (e.g., GPUs). Leveraging these systems requires parallel programming with, e.g., MPI, OpenMP, or CUDA.

This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics may cover, e.g., parallel computer architectures (multicore systems, Xeon Phis, GPUs, etc.), parallel programming models, performance analysis and correctness checking of parallel programs, or performance modeling.

Schedule

The topics are assigned at the beginning of the lecture period. The students then work on their topics over the course of the semester. The corresponding presentations take place as a block course at the end of the lecture period or at the beginning of the examination period.
More information can be found in L²P: https://www3.elearning.rwth-aachen.de/ws14/14ws-29794

Prerequisites

Attendance of the lecture "Introduction to High-Performance Computing" (Müller) is helpful, but not required.

Language

We prefer and encourage students to write the report and give the presentation in English; however, German is also possible.

Topics

Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors
In many areas of scientific computing and engineering, solving partial differential equations is of great significance. Since many of these problems lead to sparse systems of linear equations, the corresponding sparse matrix-vector multiplication (SpMV) is a kernel of singular importance. In order to achieve good performance on throughput-oriented processors like GPGPUs, one has to consider the memory access pattern and the execution paths. The seminar article and talk are expected to describe which possibilities the underlying matrix storage format provides to achieve this (for a flavor of one common format, see the sketch below).
Supervisor: Tim Cramer
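
To make the storage-format question concrete, here is a minimal sequential C++ sketch of an SpMV kernel for the widely used compressed sparse row (CSR) format; the structure and names are our own illustration, not code from the seminar literature.

    #include <vector>

    // Compressed Sparse Row (CSR): the matrix is stored as three arrays.
    // row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i in val/col_idx.
    struct CsrMatrix {
        int rows;
        std::vector<int> row_ptr;   // size rows + 1
        std::vector<int> col_idx;   // column index of each nonzero
        std::vector<double> val;    // value of each nonzero
    };

    // y = A * x. On a GPU, one would map rows (or groups of nonzeros) to
    // threads; the irregular accesses to x via col_idx are what makes the
    // choice of storage format performance-critical.
    void spmv_csr(const CsrMatrix& A, const std::vector<double>& x,
                  std::vector<double>& y) {
        for (int i = 0; i < A.rows; ++i) {
            double sum = 0.0;
            for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
                sum += A.val[k] * x[A.col_idx[k]];
            y[i] = sum;
        }
    }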

Scheduling strategies in task-centric programming models using the example of OpenMP
Task-level parallelization is an abstraction offered by several programming models, which allows solving, e.g., load balancing problems in a convenient way. The specification of OpenMP, the de-facto standard for shared-memory parallel programming, leaves a high degree of freedom for the implementation of this model. The seminar article and talk are expected to describe which task scheduling strategies exist for achieving good performance (the sketch below shows the programming model in question).
Supervisor: Tim Cramer
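
As a minimal illustration of the tasking constructs whose scheduling the topic investigates, a standard recursive Fibonacci sketch in C++ with OpenMP (a textbook-style example, not taken from the seminar material):

    #include <cstdio>

    // Each recursive call spawns a task; when and where these tasks run is
    // left to the OpenMP runtime's scheduling strategy (work stealing,
    // breadth-first execution, cut-off heuristics, ...), which is exactly
    // the degree of freedom this topic investigates.
    static long fib(long n) {
        if (n < 2) return n;
        long x, y;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }

    int main() {
        long result;
        #pragma omp parallel
        #pragma omp single
        result = fib(30);
        std::printf("fib(30) = %ld\n", result);
        return 0;
    }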

Efficient MPI collective communication on hierarchical systems
High-performance computing systems are becoming increasingly hierarchical. For example, a typical cluster node today has two or more multi-core processor packages, which present at least one level of NUMA. The addition of co-processors (such as the Intel Xeon Phi) further increases the number of levels in the hierarchy. This creates a challenge for MPI implementers when it comes to providing efficient collective operations such as barrier synchronisation, broadcasts, and global reductions.
The seminar article and talk are expected to provide an overview of recent developments in algorithms for the efficient implementation of collective communication on hierarchical computer systems, as well as the state of their adoption by the main MPI implementations (a common building block is sketched below).
Supervisor: Hristo Iliev
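
One recurring building block of such hierarchical algorithms is splitting the world communicator along node boundaries and reducing in two stages; a sketch with standard MPI-3 calls (the two-stage structure is a generic illustration, not a specific published algorithm):

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        double local = 1.0, node_sum = 0.0, global_sum = 0.0;

        // Split MPI_COMM_WORLD into one communicator per shared-memory node.
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);

        // Step 1: reduce within the node (fast, via shared memory).
        MPI_Reduce(&local, &node_sum, 1, MPI_DOUBLE, MPI_SUM, 0, node_comm);

        // Step 2: reduce across nodes, using one "leader" rank per node.
        int node_rank;
        MPI_Comm_rank(node_comm, &node_rank);
        MPI_Comm leader_comm;
        MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                       0, &leader_comm);
        if (node_rank == 0)
            MPI_Reduce(&node_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                       leader_comm);

        MPI_Finalize();
        return 0;
    }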

Portable parallel performance with Chapel
While traditional and well-established programming models like MPI are sufficiently abstract to map onto many kinds of parallel computing architectures and provide source-level portability, the same cannot be said for the performance of those models. In order to get the best performance out of a heterogeneous computing system, one usually has to resort to a combination of different programming models (hybrid programming), which leads to complex and hard-to-maintain source code.
Chapel is a relatively young high-level general-purpose programming language that supports both data and task parallelism, and allows for nested parallelism. It also provides constructs for multi-resolution control of work distribution, communication, and data locality, which allows the programmer to take a top-down approach in achieving the best performance on different systems with minimum program modifications.
The seminar article and talk are expected to deliver a high-level overview of the Chapel language and to present use case(s) of demonstrable performance portability on heterogeneous systems, e.g. a multicore host with GPGPUs.
Supervisor: Hristo Iliev

Charm++ - Parallel programming with migratable objects
As HPC approaches the exascale era, new challenges arise in parallel programming. Process-oriented parallelization may not be able to sufficiently solve the difficulties of load balancing over millions of nodes or of establishing reliable fault tolerance in scientific codes. Charm++, an extension to the C++ programming language, addresses these issues by design. This seminar talk is expected to explain the key characteristics of the Charm++ language (see the sketch below for its flavor) and its possible benefits in the exascale era.
Supervisor: Felix Münchhalfen
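
For a first impression of the model, a minimal "hello world" in Charm++'s C++ dialect, following the canonical tutorial structure (treat the details as indicative; the interface file shown in the comment is translated into the .decl.h/.def.h headers by the charmc tool):

    // Charm++ pairs C++ code with a .ci interface file, e.g.:
    //
    //   mainmodule hello {
    //     mainchare Main {
    //       entry Main(CkArgMsg *m);
    //     };
    //   };

    #include "hello.decl.h"

    class Main : public CBase_Main {
    public:
        Main(CkArgMsg* m) {
            // Chares are location-independent objects; the runtime may
            // migrate them between processors for load balancing.
            CkPrintf("Hello from the main chare on %d PEs\n", CkNumPes());
            CkExit();
        }
    };

    #include "hello.def.h"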

Statistical Techniques in Computer Science Research
Publications in computer science regularly suffer from shortcomings in the statistical techniques applied. This topic covers basic experiment design, as well as the implementation and execution of experiments, and finally the evaluation and presentation of the resulting data.
The seminar article and talk are expected to discuss the (non-)application of these techniques with reference to selected papers (a small example of such a technique follows below).
Supervisor: Felix Münchhalfen
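
As a small, hypothetical example of the kind of technique in question: reporting repeated runtime measurements as a mean with a confidence interval rather than as a single number (the 1.96 quantile assumes approximately normal data and a reasonably large sample; a t-distribution quantile would be more appropriate for few repetitions):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical repeated runtime measurements of a benchmark, in seconds.
        std::vector<double> t = {1.02, 0.98, 1.05, 1.01, 0.99, 1.03};
        const double n = static_cast<double>(t.size());

        double mean = 0.0;
        for (double x : t) mean += x;
        mean /= n;

        double var = 0.0;
        for (double x : t) var += (x - mean) * (x - mean);
        var /= (n - 1.0);                      // sample variance

        // Approximate 95% confidence interval for the mean.
        double half = 1.96 * std::sqrt(var / n);
        std::printf("runtime: %.3f +/- %.3f s (95%% CI)\n", mean, half);
        return 0;
    }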

Data race detection in practice - ThreadSanitizer
Data races are among the most common issues found in parallel codes, and they are both hard to find and hard to reproduce. The automatic detection of data races is challenging and usually requires a lot of host or device memory. This seminar talk should give an overview of the novel approach to data race detection implemented in ThreadSanitizer (the sketch below shows the kind of bug it targets).
Supervisor: Joachim Protze
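
A minimal C++ program containing a textbook data race; compiling it with Clang or GCC using the -fsanitize=thread flag and running it makes ThreadSanitizer report the unsynchronized accesses (the program itself is our own illustration):

    #include <thread>

    int counter = 0;  // shared, unsynchronized

    void work() {
        for (int i = 0; i < 100000; ++i)
            ++counter;  // racy read-modify-write
    }

    int main() {
        // Two threads increment the same variable without any lock or
        // atomic: a classic data race that TSan detects at runtime.
        std::thread t1(work), t2(work);
        t1.join();
        t2.join();
        return 0;
    }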

Python in HPC
Python has gained great popularity in recent years. This also holds for the high-performance and scientific computing community. HPC/SC-specific modules include numpy, scipy, and mpi4py. Several projects developed with Python have been presented at various workshops in the past (including PyHPC'14). The discussion should include the potential trade-off between performance and development time.
Supervisor: Joachim Protze

Transactional Memory in BG/Q systems
The idea of transactional memory (TM) was proposed 20 years ago to allow atomic operations of arbitrary length. The latest BlueGene/Q systems are among the first with hardware support for TM. In this seminar talk and article, TM shall be introduced, the implementation in the BG/Q hardware shall be explained, and a comparison to software-based TM and lock-based approaches shall be given (the sketch below illustrates the programming model).
Supervisor: Dirk Schmidl
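
To make the programming model concrete, a sketch using the transactional language constructs that GCC supports with -fgnu-tm; note that this is a generic illustration, and the IBM XL compilers on BG/Q expose similar but not identical constructs:

    // Compile, e.g., with: g++ -fgnu-tm tm_example.cpp
    #include <cstdio>

    long account_a = 100, account_b = 0;

    void transfer(long amount) {
        // The whole block executes atomically: either both updates become
        // visible or neither does, without an explicit lock. Hardware TM
        // (as on BG/Q) runs such regions speculatively and rolls back on
        // conflicts; software TM instruments the memory accesses instead.
        __transaction_atomic {
            account_a -= amount;
            account_b += amount;
        }
    }

    int main() {
        transfer(42);
        std::printf("a = %ld, b = %ld\n", account_a, account_b);
        return 0;
    }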

Performance Modeling
Modeling the performance of an application helps in many circumstances, e.g., during application tuning or system procurement. However, modeling performance in a way that allows predicting the performance on new systems or configurations is a hard task. In this seminar talk and article, the student shall provide an overview of performance modeling for HPC applications and outline how performance models can be generated (a very simple model is sketched below).
Supervisor: Dirk Schmidl
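
One simple and widely used example of such a model is the roofline model, which bounds a kernel's attainable performance by the peak compute rate and the product of memory bandwidth and arithmetic intensity; a small sketch with made-up machine numbers:

    #include <algorithm>
    #include <cstdio>

    int main() {
        // Hypothetical machine parameters.
        const double peak_gflops = 500.0;   // peak floating-point rate
        const double bandwidth_gbs = 50.0;  // sustained memory bandwidth

        // Arithmetic intensity of the kernel: flops per byte moved.
        // E.g., a daxpy (y = a*x + y) performs 2 flops per 24 bytes.
        const double intensity = 2.0 / 24.0;

        // Roofline: attainable GFlop/s = min(peak, bandwidth * intensity).
        double attainable =
            std::min(peak_gflops, bandwidth_gbs * intensity);
        std::printf("attainable: %.2f GFlop/s\n", attainable);
        return 0;
    }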

ParalleX and HPX - candidates for Exascale computing?
The upcoming exascale era will bring millions of execution units that applications have to handle. State-of-the-art parallelization techniques, such as MPI and OpenMP, are not guaranteed to meet this challenge. New parallelization paradigms promise to efficiently hide latencies and to contain starvation and contention. ParalleX is such a new (and still experimental) parallel execution model, with HPX (High Performance ParalleX) as its first implementation. The seminar article and talk are expected to describe the exascale programming challenges and how ParalleX and HPX address them, giving an overview of this new paradigm and briefly describing an application example (the sketch below shows HPX's basic style).
Supervisor: Christian Terboven
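
A minimal sketch of HPX's future-based programming style, with API names as documented for recent HPX releases (header names have varied across versions, so treat the details as indicative):

    #include <hpx/hpx_main.hpp>   // wraps main() in an HPX runtime
    #include <hpx/future.hpp>
    #include <iostream>

    int square(int x) { return x * x; }

    int main() {
        // hpx::async returns a future immediately; the runtime schedules
        // the work on a lightweight HPX thread, hiding latency instead of
        // blocking a heavyweight OS thread.
        hpx::future<int> f = hpx::async(square, 7);
        std::cout << "result: " << f.get() << std::endl;
        return 0;
    }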

The Status of Software Distributed Shared Memory (DSM) Systems
Distributed Shared Memory (DSM) systems provide a common logical address space across multiple compute nodes that do not necessarily have direct access to all physical memory. Although these systems never entered widespread use, they have proven successful for certain applications. The seminar article and talk are expected to summarize the different approaches of selected software DSM systems and to argue for their relevance in the multi-core era (for contrast, a loosely related one-sided-communication sketch follows below).
Supervisor: Christian Terboven
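
A software DSM hides remote communication behind ordinary loads and stores. For contrast, here is a sketch of MPI-3 one-sided communication, which offers a similar "logically shared, physically distributed" view but still requires explicit API calls instead of plain memory accesses; it is explicitly not a DSM system itself:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        // Run with at least two ranks.
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Each rank exposes one double in a window: together the windows
        // form a logically shared array distributed over the processes.
        double slot = 0.0;
        MPI_Win win;
        MPI_Win_create(&slot, sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            double value = 3.14;
            MPI_Put(&value, 1, MPI_DOUBLE, /*target rank*/ 1,
                    /*target displacement*/ 0, 1, MPI_DOUBLE, win);
        }
        MPI_Win_fence(0, win);

        if (rank == 1) std::printf("rank 1 received %f\n", slot);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }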

Challenges in Measuring Programming Productivity in HPC
The performance of an application code running on an HPC cluster is evaluated in virtually every work. However, it depends not only on the hardware, but also on the programming model used. Additionally, the programming model largely determines the development and maintenance effort. Often, programming paradigms that provide many capabilities for tuning or scaling an application are complex to program, therefore more error-prone, and their code may be harder to maintain. Some researchers try to measure and quantify programming productivity and effort. However, hard metrics (such as GFlop/s for performance) are difficult to obtain.
In the context of this seminar report and presentation, the challenges, advantages, and disadvantages of commonly used metrics for measuring programming productivity shall be described, and possible solutions from the literature shall be presented. These characteristics shall be illustrated by recent research results.
Supervisor: Sandra Wienke

Assessing the benefit of miniapps in HPC
With the upcoming exascale era, more complex computer architectures are likely to emerge. Thus, a dramatic re-design of software will be needed to suit these novel architectures. The "co-design" approach taken by the software and hardware communities advances the development of mini-applications (miniapps) that are meant to represent the key performance aspects of important software projects.
The seminar deals with assessing the benefits, challenges, and drawbacks of miniapps in high-performance computing. The seminar thesis shall cover performance and programming aspects and shall delineate the concept from common application benchmarks.
Supervisor: Sandra Wienke

Instructors

Prof. Matthias S. Müller
Tim Cramer
Hristo Iliev
Felix Münchhalfen
Joachim Protze
Dirk Schmidl
Christian Terboven
Sandra Wienke
