Current Topics in High-Performance Computing (HPC)


High-performance computing (HPC) is applied to speed up long-running scientific applications, for instance the simulation of computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different facets: from clusters over (large) shared-memory systems to accelerators (e.g., GPUs). To leverage these systems, parallel programming with, e.g., MPI, OpenMP or CUDA must be applied, while meeting constraints on power consumption. This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics may cover, e.g., parallel computer architectures (multicore systems, Xeon Phis, GPUs, etc.), parallel programming models, performance analysis & correctness checking of parallel programs, performance modeling, or energy efficiency of HPC systems.


This seminar belongs to the area of applied computer science. The topics are assigned at the beginning of the lecture period (April 18th, 2018, 14:00 - 15:45 h). The students then work out their topics over the course of the semester. The corresponding presentations take place as a block course on one day at the end of the lecture period or at the beginning of the exam period. Attendance is compulsory for the introductory event and the presentation block.

 More information can be found in L²P.


Registration/ Application

Seats for this seminar are distributed exclusively via the global registration process of the computer science department. The registration phase typically takes place in January.
We appreciate it if you state your interest in or prior knowledge of HPC in the remarks section during registration.


The goals of a seminar series are described in the corresponding Bachelor and Master modules. In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 1-3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing the title of each presentation and its authors, keeping track of the speaking time, and leading a short discussion after the presentation. Further instructions will be given during the seminar.
Attending the lecture "Introduction to High-Performance Computing" (Müller) is helpful, but not required.
We prefer and encourage students to write the report and give the presentation in English, but German is also possible.


Anticipating Quantum Supremacy: Current State of Quantum Computing *Update 19.3.2018*

Quantum computers are expected to revolutionize areas such as cryptography, artificial intelligence, financial investment, and materials science. The first promising algorithms appeared more than two decades ago, and market leaders such as IBM, Intel, Alibaba and Google are now racing to deliver ever larger quantum chips. Working prototypes with up to 20 qubits have been reported, while HPC simulators make it possible to model future chips with about 50 qubits.

The seminar article and talk should give an overview of the current state of these developments.

Supervisor: Uliana Alekseeva


Elimination of Unnecessary Data Transfers by Translating OpenMP Target Constructs into OpenCL

The introduction of heterogeneous programming models such as Nvidia CUDA or OpenCL is driven by the current development of heterogeneous hardware. Since applying these programming paradigms is typically difficult and complicated, other approaches such as OpenMP 4.x or OpenACC have been introduced in order to lower the programming complexity. Unfortunately, none of these approaches has an implementation that supports all kinds of accelerators. Furthermore, the lower expressiveness of less complex paradigms might reduce the opportunities for performance optimization. In particular, it might be non-trivial to avoid unnecessary data transfers between a host and an accelerator. In order to overcome these issues, solutions for the source-to-source translation from one paradigm into another have emerged.

The seminar article and talk are expected to give a detailed overview and discussion of such a framework translating OpenMP target constructs into OpenCL. Furthermore, a discussion of the potential for automatic performance optimization is expected.

Supervisor: Tim Cramer


Improving OpenMP Runtime Implementations by the Use of Lightweight Thread Approaches

In high-performance computing applications, OpenMP is the de-facto standard for on-node parallelism. The most popular runtime implementations rely on POSIX threads (pthreads). However, as fine-grained constructs like tasking or dynamic scheduling become more popular, the performance of such an OpenMP runtime implementation might benefit from approaches using lightweight threads (LWT). One recent implementation is GLTO (Generic Lightweight Thread OpenMP), which is available as an open-source library.

The seminar article and talk are expected to give a detailed overview and discussion of the potential performance benefits of LWT in general, and GLTO in particular. Optionally, an investigation of the open-source implementation using the EPCC microbenchmark suite on the RWTH Compute Cluster can be done.

Supervisor: Tim Cramer


Callback-based Tool Interfaces *Update 3.4.2018*

In high-performance computing, several approaches exist for obtaining measurement data on the behavior of parallel applications. Direct instrumentation only provides measurement data outside the runtime system of parallel applications. Sampling may provide measurement data from within the runtime system, yet often low-level expert knowledge is needed to interpret the data. Callback-driven tool interfaces enable tools to observe specific, well-defined events from within a runtime system, yet they rely on the runtime system itself supporting such an interface.

In this seminar topic, the student shall look into different callback-driven tool interfaces in HPC, the similarities and differences in their design, as well as their spectrum of use.

Supervisor: Marc-André Hermanns


Identifying Performance Issues in Parallel Applications using Machine Learning Techniques

Nowadays, commodity computer hardware is not only used in everyday life but is also combined to build state-of-the-art supercomputers. These computers typically have multiple cores. To exploit the full potential and performance of such multicore systems, parallel programming becomes more and more important. Due to the increasing complexity of computer architectures, networks, and programming languages for parallel execution, multiple issues can arise that vastly impede performance, such as false sharing, contention, remote memory accesses on NUMA systems, or bad memory access patterns. Although such programs still produce correct results, it is hard to identify whether parallel code has these performance problems, and current performance analysis tools only provide limited information about them. Thus, new ideas have come up, e.g., using integrated performance counters and machine-learning techniques to identify such performance issues.

The focus of this seminar topic is to give an overview of the techniques currently under investigation and how they work.

Supervisor: Jannis Klinkenberg


Techniques and Approaches for Job Placement on Supercomputers to Improve Performance and Reduce Runtime Variability

Due to the increasing utilization of today's large supercomputers, job placement plays a significant role in application performance. Especially the performance of big jobs that require a large portion of the cluster's nodes/resources might suffer from being split into multiple spatially separated fragments. Knowledge about the network topology of the cluster and the communication pattern of the application can also be crucial to ensure well-placed jobs without impairing the utilization of the cluster.

This seminar topic focuses on describing existing approaches to improve performance and reduce network latency and runtime variability.

Supervisor: Jannis Klinkenberg


DataRaceBench: A Benchmark Suite for Systematic Evaluation of Data Race Detection Tools

Data races in multi-threaded parallel applications are notoriously damaging and extremely difficult to detect. Many tools have been developed to help programmers find data races. However, there has been no dedicated OpenMP benchmark suite to systematically evaluate data race detection tools for their strengths and limitations. The paper presents DataRaceBench, an open-source benchmark suite designed to systematically and quantitatively evaluate the effectiveness of data race detection tools. The authors focus on data race detection in programs written in OpenMP, the popular parallel programming model for multi-threaded applications. In particular, DataRaceBench includes a set of microbenchmark programs with and without data races. These microbenchmarks are either written manually, extracted from real scientific applications, or generated automatically as optimization variants.

This seminar topic focuses on code patterns with data races in OpenMP applications. What patterns are missing to cover a broader range of OpenMP applications?

Supervisor: Joachim Protze


An Operational Semantic Basis for OpenMP Race Analysis

OpenMP is the de facto standard to exploit the on-node parallelism in new generation supercomputers. Despite its overall ease of use, even expert users are known to create OpenMP programs that harbor concurrency errors, among which data races are one of the most insidious. OpenMP is also a rapidly evolving standard, which means that future data races may be introduced in unfamiliar contexts. A simple and rigorous operational semantics for OpenMP can help build reliable race checkers and ward off future errors through programmer education and better tooling. The paper's key contribution is a simple operational semantics for OpenMP, with primitive events matching those generated by today's popular OpenMP runtimes and tracing methods such as OMPT. This makes the presented operational semantics more than a theoretical document for intellectual edification; it can serve as a blueprint for OpenMP event capture and tool building.

The focus of this seminar topic is to provide an overview of the presented operational semantics and how this work can be used.

Supervisor: Joachim Protze


Using the Cloud as OpenMP Offloading Device *Update 21.2.2018*

OpenMP is a popular directive-based annotation standard used for shared-memory parallel programming on a single node. Running an OpenMP program in a large cluster environment typically requires additional libraries like MPI, which tends to be error-prone. In recent versions, OpenMP supports computation offloading, where selected program fragments are executed on a dedicated device such as a GPU or an FPGA. The OmpCloud project aims to integrate the cloud as a new kind of offloading device: a developer can declare program fragments in OpenMP that should be executed on a large cluster in the cloud. The usual OpenMP syntax is used, as for any other offloading device, which keeps the integration simple.

The seminar article and talk are expected to give an overview of the OmpCloud project. Furthermore, the opportunities and limitations of the approach should be discussed. As an optional task, the performance might be evaluated by running a few simple experiments.

Supervisor: Simon Schwitanski


MapReduce over MPI for Supercomputers *Update 21.2.2018*

The MapReduce programming model is predominantly used in big data analytics. The Apache Hadoop ecosystem contains the most prominent implementation of MapReduce targeting big data clusters. However, there are crucial differences between a big data cluster and a typical supercomputer, e.g., in the file systems, network interconnects and software stacks. The Message Passing Interface (MPI) is well-suited for an implementation of MapReduce. Mimir is such an implementation of MapReduce over MPI trying to exploit the facilities of supercomputers and achieve high scalability.

The seminar article and talk are expected to present the approaches and architecture of Mimir. Furthermore, a comparison to other MapReduce implementations over MPI should be included. As an optional task, the performance of Mimir might be evaluated on the RWTH compute cluster.

Supervisor: Simon Schwitanski


Dynamic Power Allocation on a Power-Constrained Cluster

Future HPC clusters are expected to be power-constrained due to the increasing power draw of growing clusters and limited infrastructure. Instead of the traditional question of how to use hardware in a performance-efficient way, we need to think about how to compute in a more power-efficient way, with the aim of maximizing cluster-wide performance. Two main challenges arise for such a cluster: power-draw monitoring and power budgeting.

This seminar work will investigate different ways to address these challenges. In particular, it will focus on solving the problem at runtime (dynamically) due to its simplicity and effectiveness.

Supervisor: Bo Wang


Hardware-Aware Power Budgeting

Variability in the manufacturing process introduces performance and power-consumption variation in HPC components, such as CPUs. This kind of variation makes it difficult to optimize application performance, especially for an application whose power budget on a power-constrained cluster is limited to a value lower than its peak consumption.

This seminar work will explore hardware variations, especially in terms of power. Building on these hardware-specific variations, maximizing application performance is the focus of this work.

Supervisor: Bo Wang



Matthias S. Müller

Uliana Alekseeva

Tim Cramer

Marc-André Hermanns

Jannis Klinkenberg  

Joachim Protze

Simon Schwitanski

Bo Wang

Sandra Wienke


