Current Topics in High-Performance Computing (HPC)

Content

High-performance computing (HPC) is applied to speed up long-running scientific applications, for instance the simulation of computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different facets: ranging from clusters and (large) shared-memory systems to accelerators (e.g., GPUs). To leverage these systems, parallel programming models such as MPI, OpenMP or CUDA must be applied.

This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics may cover, e.g., parallel computer architectures (multicore systems, Xeon Phis, GPUs, etc.), parallel programming models, performance analysis and correctness checking of parallel programs, or performance modeling.

Schedule

The topics are assigned at the beginning of the lecture period (April 25th, 2016, 9-10.30am). The students then work on their topics over the course of the semester. The corresponding presentations take place as a block course on one day at the end of the lecture period or at the beginning of the exam period. Attendance is compulsory.
More information is available in L²P: https://www3.elearning.rwth-aachen.de/ss16/16ss-22769

Requisites

The goals of a seminar series are described in the corresponding Bachelor and Master modules.
In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 1-3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing the title of the presentation and its authors, keeping track of the speaking time, and leading a short discussion after the presentation. Further instructions will be given during the seminar.
Prerequisites
Attendance of the lecture "Introduction to High-Performance Computing" (Müller/Bientinesi) is helpful, but not required.
Language
We prefer and encourage students to write the report and give the presentation in English, but German is also possible.
Topics
Some topics are already described below to give an idea of the range of topics. A comprehensive description of all topics is coming soon. For more information on typical topics, have a look at the seminar topics of the last semesters.

On-the-fly Data Race Detection in OpenMP Programs

OpenMP is the de facto standard for parallel shared-memory programming. It provides a portable way to develop scalable applications using nested, task, or simple loop-level parallelism. However, incorrect data scoping in OpenMP programs easily leads to one of the most notorious classes of concurrency bugs: data races. In principle, on-the-fly data race detection can be done with a happens-before analysis, a lockset analysis, or a hybrid approach. Unfortunately, the happens-before analysis in particular is quite difficult to implement efficiently, and all methods might produce false positive or false negative reports. The seminar article and talk are expected to provide a general overview and discussion of the existing techniques for data race detection, with a focus on OpenMP programs. Furthermore, they are expected to describe potential improvements in terms of accuracy, efficiency, and implementation.
Tim Cramer

Prescriptive Debugging of Parallel Programs at a Large Scale

Parallel programming is known to be error-prone. While tools for traditional interactive debugging exist, this approach lacks scalability on today's large supercomputers due to its centralized nature. The proposed alternative of lightweight debugging scales well, but can only be applied to a subset of bug classes. To fill the gap between these two approaches, a more user-guided model can be used in the future. This "prescriptive" approach allows programmers to express and test their intuition and thereby reduce the error search space. The seminar article and talk are expected to provide a general overview of the prescriptive parallel debugging model and to discuss the usability of such an approach.
Tim Cramer

Measurement of Energy Consumption in High Performance Computing

As machines get bigger, energy consumption and climate impact are becoming more important. During the last decade, Green Computing has become a major topic in HPC. Different approaches and technologies can be used to measure the energy consumed by the execution of an application, and multiple metrics exist to quantify energy efficiency. The seminar article and talk are expected to describe different metrics related to energy consumption and efficiency. Additionally, the software and hardware products used for the measurements should be presented.
Thomas Dondorf

Optimization of Energy Consumption of HPC Applications

In most cases, the execution of HPC applications is optimized for speed. Approaches to improve energy efficiency often concentrate only on the hardware. However, not only the hardware can be optimized, but also the software. Most MPI implementations optimize for speed, while energy efficiency is neglected. Using different implementations can lead to higher energy efficiency with only minimal speed penalties. The seminar article and talk are expected to describe different approaches to optimizing the energy consumption of HPC applications at the software level.
Thomas Dondorf

An overview of in-situ data analysis techniques (English-speaking supervisor)

Traditionally, scientific simulations generate large amounts of output data that are read and analyzed later by a different application. However, large-scale simulations can produce terabytes to petabytes of output data, straining the I/O and storage subsystems of the HPC system. To mitigate the I/O bottleneck, leadership-class simulations and systems have begun to use in-situ data analysis, in which the output data is analyzed “live”, as it is produced by the simulation. The analysis software can either reside on the same HPC system and use the same resources as the application, or be co-scheduled on a separate analysis-oriented system. The seminar will look into state-of-the-art practices in in-situ data analysis, presenting an overview of the employed techniques and comparing them.
Aamer Shah

Emerging tools for on-node memory analysis (English-speaking supervisor)

Modern HPC systems employ tens of cores on a single compute node. Such configurations allow increased parallelism in applications by employing more threads. However, they also lead to a complex memory architecture and complicated memory sharing on a single node. Memory bottlenecks can frequently arise on such fat nodes, impeding the scalability of an application. Recent generations of processors provide performance monitoring units (PMUs) that can collect statistics about program execution to identify such bottlenecks. However, PMUs provide low-level information and are not sufficient for analysis by themselves. The HPC community has developed several tools and techniques that utilize the PMUs, along with other contextual information, to better analyze on-node memory performance. The seminar will look at these emerging on-node memory analysis tools, providing an overview and an assessment.
Aamer Shah

Computing under a Power Cap

As supercomputers (clusters) get bigger, more power is needed. At some point, the infrastructure cannot supply as much power as needed, so the whole cluster cannot run at its peak performance, i.e., a power cap is imposed on an overprovisioned cluster. Under such a cap, a core concern is how to maximize the throughput of the cluster. To this end, technologies to cap the power as well as strategies to maximize the throughput should be investigated in depth. Some work in this direction has already been done.
Bo Wang

Hardware Technologies to Make Processors Energy Efficient
The processor, as the central processing unit of a computer, not only performs most of the operations but also consumes the most energy, often more than 50% of the overall consumption. Improving the processor's energy efficiency will therefore reduce the energy cost enormously. Besides general hardware improvements from generation to generation, special technologies have been developed to make processors more energy efficient, such as DVFS, dynamic cycle modulation, and power capping. However, each technology needs different inputs and has different side effects. Knowledge of these technologies is the basis for utilizing them reasonably.
Bo Wang


DASH: Data Structures and Algorithms with Support for Hierarchical Locality

Data locality is important when parallelizing applications. The C++ standard library containers do not provide mechanisms to respect data locality and to distribute data according to the thread placement. DASH provides a template-based abstraction that enables data locality even on distributed-memory systems, based on a PGAS approach. This seminar work will give an overview of the DASH approach and its development over the last years. Experiments with this paradigm are welcome.
Joachim Protze

An Efficient Algorithm for On-the-Fly Data Race Detection Using an Epoch-Based Technique

Data races represent the most notorious class of concurrency bugs in multithreaded programs. To detect data races precisely and efficiently during the execution of multithreaded programs, the epoch-based FastTrack technique has been employed. However, FastTrack has time and space complexities that depend on the maximum parallelism of the program in order to partially maintain expensive data structures, such as vector clocks. The underlying paper presents an efficient algorithm, called iFT, that uses only the epochs of the access histories. Unlike FastTrack, the algorithm requires O(1) operations to maintain an access history and locate data races, without any switching between epochs and vector clocks. The authors implement this algorithm on top of the Pin binary instrumentation framework and compare it with other on-the-fly detection algorithms, including FastTrack, which uses a state-of-the-art happens-before analysis algorithm. Empirical results using the PARSEC benchmark show that iFT reduces the average runtime and memory overhead to 84% and 37%, respectively, of those of FastTrack.
Joachim Protze


Methods and concepts for parallelization and modernization of legacy Fortran applications

Fortran is one of the most widely used languages in the field of HPC, and legacy Fortran applications still serve a useful purpose. To keep these applications effective on modern multicore processors and manycore accelerators, they need to be parallelized. Moreover, it is important to improve the reusability, maintainability, and extensibility of legacy applications. Some strategies to refactor and parallelize serial Fortran 77 applications use the object-oriented and coarray features of Fortran 2003 and 2008. Another way to improve the power of Fortran applications is to extend them with meta-programming, allowing developers to manipulate their programs more easily. The seminar article and talk are expected to describe these strategies in detail and to show the effort involved in modernizing legacy Fortran.
Alesja Dammer 


Benchmarking of Parallel Systems for Scientific Computing

Performance measurements and reports for parallel computing systems are important in HPC, especially to demonstrate the quality of research and improvements. The goal of this seminar topic is to summarize best practices and guidelines for analyzing and presenting scientific results. Related topics include experimental design, statistical relevance, and reproducibility of results.
Pablo Reble 

Instructors

Prof. Matthias S. Müller
Tim Cramer
Alesja Dammer
Thomas Dondorf
Joachim Protze
Pablo Reble
Aamer Shah
Bo Wang

Contact: contact@hpc.rwth-aachen.de