Current Topics in High-Performance Computing (HPC)

Bachelor/Master


Students in Bachelor and Master programs attend the (compulsory) seminar events together. This enables mutual learning, with the seminar presentations offering insights into a wide range of HPC-related topics. Bachelor and Master students have to follow slightly different rules (see below). All students are individually supervised.


High-performance computing (HPC) is applied to speed up long-running scientific applications, for instance the simulation of computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different flavors: ranging from clusters and (large) shared-memory systems to accelerators (e.g., GPUs). To leverage these systems, parallel programming with, e.g., MPI, OpenMP, or CUDA must be applied, while meeting constraints on power consumption.
This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics might cover, e.g., parallel computer architectures (multicore systems, GPUs, etc.), parallel programming models, performance analysis & correctness checking of parallel programs, performance modeling or energy efficiency of HPC systems.


This seminar belongs to the area of applied computer science. The topics are assigned in the introductory event at the beginning of the lecture period (October 10th, 2019, 10:00am - 12:00, IT Center, Kopernikusstr. 6). The students then work out their topics over the course of the semester. The corresponding presentations take place as a block course on one day (or two days) at the end of the lecture period or in the exam period. Attendance is compulsory for the introductory event and the presentation block. Furthermore, students will have to attend an additional course on "Scientific Writing in Computer Science" of 1.5 hours. This course extends the official introductory course of the computer science department with practical tips and guidelines on how to write a scientific thesis. The course will take place on October 11th, 2019, 12:30pm - 2:00pm (seminar room 4, Kopernikusstr. 6); participation is compulsory for Bachelor students (and strongly recommended for Master students).

More information in RWTHmoodle: coming soon.

Registration/Application

Seats for this seminar are distributed by the global registration process of the computer science department only. We appreciate it if you state your interest in HPC, as well as your prior knowledge of HPC (e.g., relevant lectures, software labs, and seminars that you have passed), in the corresponding section of the registration process.


The goals of a seminar series are described in the corresponding Bachelor and Master modules. In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing each presentation's title and authors, keeping track of the speaker's time, and leading a short discussion after each presentation. Further instructions will be given during the seminar.


Attendance of the lecture "Introduction to High-Performance Computing" (Prof. Müller) is helpful, but not required.


We prefer and encourage students to write the report and give the presentation in English; however, German is also possible.




Investigating the Performance of Hadoop MapReduce on Modern HPC Clusters

The MapReduce programming model has been of great interest in the past years for solving data analytics problems in a scalable manner. Apache Hadoop is one of the most prominent implementations; it heavily exploits data locality to achieve good scalability, even on huge amounts of data. The architecture of a Hadoop cluster is significantly different from that of an HPC cluster: In a Hadoop cluster, each node has multiple local disks (typically HDDs) installed, such that data locality has a big impact on performance. In a modern HPC cluster, however, computation nodes are physically separate from the storage system, i.e., storage accesses in an HPC cluster always lead to network I/O between computation and storage nodes. Thus, network performance may limit the scalability of typical MapReduce computations on HPC clusters.

The thesis should present and discuss the key architectural differences of an HPC cluster compared to a big data cluster and how this impacts performance. In a second step, techniques proposed in the literature that improve performance of Hadoop MapReduce on HPC clusters should be investigated, presented, and compared.

Supervisor: Simon Schwitanski

Static Data Race Detection for OpenMP Programs (*Update 7.10.2019*) 
Data races are a common problem in parallel programming: they can result in serious faults during program execution and are difficult to detect. This also holds for OpenMP, a directive-based annotation standard for shared-memory parallel programming. Different techniques for data race detection in OpenMP programs have been developed in recent years. Most of them use dynamic analysis, i.e., the program is executed and the collected runtime information is analyzed on-the-fly or post-mortem. Apart from that, there are static analysis techniques that detect data races by analyzing the source code alone. In contrast to dynamic analyses, static analyses can capture all possible execution paths of an application. However, state explosion and imprecision are drawbacks compared to dynamic analyses.

The seminar thesis should survey the different static analysis techniques for OpenMP programs. Further, current challenges of these approaches should be discussed and compared to dynamic analysis techniques.

Supervisor: Simon Schwitanski

CIVL: Formal Verification of Parallel Programs

Verification of parallel programs via static analysis is a challenging task: Besides the state explosion due to the large number of different execution paths and schedules, another problem is the variety of "dialects" used to write parallel programs: MPI for distributed memory, OpenMP for shared memory, CUDA for GPUs, etc. This requires adapting verification algorithms to the syntax and semantics of each parallel programming model; it gets even more complex if a combination of parallel programming models ("hybrid programming") is used in a single program. CIVL-C ("Concurrency Intermediate Verification Language") tackles these problems: It is a C-based language enriched with generic concurrency constructs. Programs written in any supported concurrency dialect (MPI, OpenMP, CUDA) can be translated to CIVL-C, and verification algorithms or tools based on CIVL-C then handle any of these dialects by operating on the translated program. In other words, verifying a new concurrency dialect only requires writing a translator to the CIVL-C language.

The thesis should give an overview of the CIVL-C approach and highlight its strengths and weaknesses. Optionally, own experiments regarding precision of the approach can be performed.

Supervisor: Simon Schwitanski

Performance and energy modeling: the keys to energy-efficient computing

The energy costs of high-performance computing clusters have been rising for decades due to growing energy consumption and rising electricity prices. To reduce these costs and to compute in an environmentally friendly way, the energy efficiency of an HPC cluster needs to be improved. This requires a compromise between computing performance and energy consumption, which in turn requires accurate models to predict proper runtime settings, such as the clock rate and the number of parallel instances employed.

The focus of this thesis is a comparison of several such models: Which models are accurate and promising? Which are hardware-specific, and which are applicable to the general case?

Supervisor: Bo Wang

Challenges of graph processing for HPC hardware

Many real-world use cases, like social network analysis and road navigation, are solved through graph processing. A graph to be processed may possess millions to billions of vertices and edges; thus, real-time graph processing poses substantial challenges to the underlying hardware. On the other hand, high-performance computing (HPC) clusters offer the most powerful computing components, designed for scientific simulations. Since graph processing is not a conventional HPC workload, it has to be investigated whether available HPC hardware is suitable for graph problems.

In this thesis, you will learn how a graph can be processed and investigate how its operations are characterized at the hardware level. You will then draw conclusions about how the hardware needs to be improved to accelerate graph processing.

Supervisor: Bo Wang

Processor variation: hardware failure or software mistake?

In a forest, no two leaves are alike. Similarly, no two processors of an HPC cluster are alike, since modern processors possess a huge number of complex features. Processors differ from each other in many ways, e.g., regarding performance and power draw. Once variation becomes noticeable, in particular if some processors perform worse than others, the causes have to be clarified in order to tackle it.
In this thesis, you will examine variations from different aspects and analyze their causes. More importantly, you should develop a systematic process to handle possible variation issues in the future.

Supervisor: Bo Wang

Quantifying and Analyzing Performance Portability

With increasingly diverse hardware architectures in current and future HPC systems, it becomes more important to develop HPC environments and applications that are performance portable across different platforms. However, one challenge in pursuing this goal is the quantification of performance portability.

In this thesis, you should investigate metrics to quantify performance portability. Furthermore, you should present results from different studies that use the performance portability metric by Pennycook et al. and analyze its tradeoffs.

Supervisor: Sandra Wienke

XcalableACC - Performance and Productivity (*Update 6.9.2019*)

XcalableACC is a parallel programming model that combines XcalableMP (XMP) and OpenACC directives for programming accelerator clusters. An implementation is currently available in the Omni compiler.

In this seminar, an overview of XcalableACC shall be given. Furthermore, an evaluation of its performance and productivity shall be conducted. Optionally, benchmarks can be run on the IT Center's GPU compute nodes to include your own experience.

Supervisor: Sandra Wienke

Effects of Spectre and Meltdown Patches on HPC Applications

Spectre and Meltdown exploit speculative execution on hardware architectures to bypass certain security restrictions. The corresponding patches for HPC systems, however, have an impact on performance.

In this thesis, you should give insights into how Spectre and Meltdown work and what their impact is when the corresponding patches are applied to an HPC system. You should cover benchmark results as well as results from HPC applications.

Supervisor: Sandra Wienke

Methods and best practices for user journey mapping within distributed services [This thesis must be written in English] (*Update 8.7.2019*; This topic is not available for assignment!)

In academia, researchers go through many research phases in which they interact with big sets of research data across many distributed services. On the one hand, researchers are often unaware of available relevant data that could be reused to increase their performance and assist them with knowledge discovery. On the other hand, users leave traces of their interactions while employing parallel systems to fulfill their requirements and achieve their goals. The aim is to identify suitable algorithms and methods that can handle concurrency in a user journey within distributed systems.

In this thesis, you need to conduct a systematic literature review and investigate how to extract a process model from the user journey within distributed services, and which methods are suitable to identify users' intentions in this context.

Supervisor: Amin Yazdi

Ideal process abstraction methods for heaps of user activities in distributed services [This thesis must be written in English] (*Update 8.7.2019*)

Daily, there are many user interactions with heterogeneous and distributed services that are interconnected with one another. In such a corporate setting, users often employ multiple services to accomplish their tasks. When collecting information about user activities, one immediately encounters a vast amount of data that also includes noise and outliers. This prevents scientists from running analyses to obtain the necessary conclusions and create a sound process model. Hence, besides purifying the data, it is essential to abstract the gathered data. At the same time, process abstraction should not neglect infrequent but important activities in the discovered model.

In this thesis, you need to conduct a systematic literature review to identify the different process abstraction methods, techniques, and algorithms along with their characteristics. Further, you should suggest, with reasoning, the most suitable process abstraction technique for distributed systems.

Supervisor: Amin Yazdi

Network-induced Performance Variability of Multi-node HPC Jobs

HPC applications with network utilization increasingly show performance variability between runs. While potentially caused by a variety of factors, this poses significant challenges for job schedulers, application developers, and their respective users.
The focus of this seminar thesis is to give an overview of approaches for characterizing and measuring this variability, of different proposed solutions, and of their respective success.

Supervisor: Daniel Schürhoff


Predicting Runtime and IO of HPC jobs

Current HPC schedulers only take into account user-specified job characteristics like estimated runtime, memory, and core requirements.
This neglects shared resources like network file systems and communication networks.
The focus of this seminar thesis is to give an overview of methods for predicting a job's I/O behaviour, of how schedulers can benefit from this information, and of how accurate these predictions have to be in order to achieve measurable benefits.

Supervisor: Daniel Schürhoff


Automatic fault detection in HPC systems (Resilience in HPC applications)

"Long-running exascale computations would be severely affected by a variety of failures, which could occur as often as every few minutes.
Therefore, detecting and classifying faults in HPC systems as they occur and initiating corrective actions through appropriate resiliency techniques before they can transform into failures will be essential for operating them."
This thesis shall give an overview of the field.

Supervisor: Daniel Schürhoff


Daniel Schürhoff
Simon Schwitanski
Bo Wang
Sandra Wienke
Amin Yazdi


