Current Topics in High-Performance Computing (HPC)

Content

High-performance computing (HPC) is applied to speed up long-running scientific applications, for instance simulations in computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different forms: from clusters and (large) shared-memory systems to accelerators (e.g. GPUs). To leverage these systems, parallel programming with e.g. MPI, OpenMP or CUDA must be applied, while meeting constraints on power consumption.
This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics might cover e.g. parallel computer architectures (multicore systems, Xeon Phis, GPUs etc.), parallel programming models, performance analysis & correctness checking of parallel programs, performance modeling or energy efficiency of HPC systems.

Schedule

This seminar belongs to the area of applied computer science. The topics are assigned at the beginning of the lecture period (18th October 2017, 9am - 10.45am). The students then work on their topics over the course of the semester. The corresponding presentations take place as a one-day block course at the end of the lecture period or at the beginning of the exam period. Attendance is compulsory for the introductory event and the presentation block. More information is available in L²P: https://www3.elearning.rwth-aachen.de/ws17/17ws-29794

Requisites

The goals of a seminar series are described in the corresponding Bachelor and Master modules. In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 1-3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing the title of the presentation and its authors, keeping track of the speaker time and leading a short discussion after the presentation. Further instructions will be given during the seminar.

Prerequisites

Attending the lecture "Introduction to High-Performance Computing" (Müller) is helpful, but not required.

Language

We prefer and encourage students to write the report and give the presentation in English. German is also possible.

Topics

Analysis of Cost-Effective Network Topologies

Today's supercomputers should not only deliver high performance, but should also be energy- and cost-efficient. Improving the networks of HPC systems is one way to improve cost-efficiency. That is why HPC network topology designs are currently shifting from high-performance, higher-cost Fat-Trees to more cost-effective architectures, e.g., the three diameter-two topology designs Slim Fly, Multi-Layer Full-Mesh, and Two-Level Orthogonal Fat-Tree.
This seminar article is expected to describe the cost-effective topologies, to discuss different routing algorithms and to compare the topologies using representative workloads.

Supervisor: Alesja Dammer

 

GASPI Communication Layer for Fault Tolerant Applications

Limitations of programming with MPI motivate HPC developers to look at other programming models to design codes that can run and scale on systems with hundreds of thousands of cores. GASPI, a Global Address Space Programming Interface, provides a partitioned global address space (PGAS) API. It focuses on three key objectives: scalability, flexibility and fault tolerance. Highly parallel software on exascale computers will encounter many more runtime failures. Applications should survive a failure and/or be able to recover at minimal cost. GASPI provides the missing features to allow the design of fault-tolerant applications.
This seminar article is expected to give a general overview of the GASPI API with its advantages and highlights, to compare GASPI to MPI and to explain how a fault detector can be built using GASPI.

Supervisor: Alesja Dammer


An Automatic Classification of Parallel Kernels

Frequently used computational kernels are specified in the literature by parallel patterns and dwarfs. Identifying these patterns can increase the productivity of programmers by leveraging well-established and highly optimized reference implementations. However, finding such patterns often requires a lot of experience in the HPC domain. For this reason, the automatic classification of parallel patterns shall be investigated in this seminar thesis. In particular, existing approaches shall be compared and their accuracy discussed.

Supervisor: Julian Miller


Evaluating HPC Code Quality

With the increasing heterogeneity of modern HPC systems, the software engineering (SE) process on such clusters tends to become more complex. Tools to measure the quality of a code are used in traditional SE to aid the developers but are yet to be adapted or developed for HPC setups. Such tools could be used early in the development cycle to find flaws in the code base regarding its correctness, performance, maintainability, robustness, and similar criteria. This seminar thesis shall investigate the applicability of such tools to the HPC domain.

Supervisor: Julian Miller


Modelling memory performance on Intel’s KNL Processor

Intel’s Knights Landing (KNL) processor is a manycore chip with up to 72 processor cores. All cores are coupled cache-coherently in a 2D grid topology. Furthermore, KNL is the first commercially available stand-alone processor with on-chip and off-chip main memory. Understanding the implications of such complicated design changes is a challenging task for programmers.
In this seminar thesis, you should describe the high-level design of the KNL, present benchmark results that characterize the basic performance of the architecture, and derive implications for the performance of applications from these benchmark results by showing a simple performance model.

Supervisor: Dirk Schmidl


Using data dependencies in OpenMP to improve NUMA-aware task scheduling

When OpenMP tasking is used, the task scheduler inside the OpenMP runtime has to decide where to execute which task. Typically, schedulers aim for good load balancing between all threads to reach a uniform resource utilization and avoid idle threads. On non-uniform memory access (NUMA) architectures, the locality of data also has a high impact on the achieved performance of an application. The scheduler often ignores this. Since version 4.0 of OpenMP, a programmer can specify which data is read or written in a task with the depend clause. Although this feature was added to describe data dependencies between tasks for more light-weight scheduling, the information can also be used by a scheduler to improve the data locality of tasks.
In this thesis, you should describe the problem of task scheduling on NUMA architectures and elaborate on the idea of using the depend clause to improve task scheduling. At the end, you may give your own opinion on whether this is a suitable approach.

Supervisor: Dirk Schmidl


Performance Engineering with Hardware Metrics on Modern Multicore Processors

Understanding the performance characteristics of applications on ccNUMA systems is key to writing and improving code that effectively utilizes the capabilities of modern multicore processors. One way of doing this is to employ hardware counters to gain insight into the program at runtime. By choosing the correct metrics for the right purpose and interpreting measurement data carefully, one can gain information about the concrete performance pattern of an application.
The focus of this seminar topic is to give an overview of the techniques that are currently used, the common performance patterns and their characteristics, and the corresponding knowledge gain and optimization potential.

Supervisor: Daniel Schürhoff


Utilizing Accelerators for Deep Neural Networks

Deep neural networks touch everything from image recognition to understanding and translating language. They are generally employed for tasks that are easy for humans to solve, but hard to write as an imperative program. One of the main concerns with complex networks is the inference performance of the trained networks. By utilizing the massively parallel capabilities of e.g. GPGPUs, one can achieve significant speedup and better efficiency.
The focus of this seminar topic is to give an overview of the computational challenges, how they benefit from accelerators and which combinations of hardware exploit this efficiently.

Supervisor: Daniel Schürhoff


Are ARM Processors Ready for HPC?

Moving towards the exascale era, reducing the power consumption of HPC systems becomes more important. Recent investigations on improving energy efficiency cover the usage of clusters of low-power, low-budget ARM processors in the domain of HPC.
In this seminar thesis, it shall be examined whether ARM processors are ready for HPC. The thesis shall look at recent developments in research projects and technologies. Further, it shall give an overview of different aspects like tradeoffs between performance, energy, costs and usability, as well as a critical outlook on what must be improved in the future.

Supervisor: Sandra Wienke


Overview of PEZY-SC for High-Performance Computing

PEZY-SC is an accelerator-type processor with a low power envelope. The Omni OpenACC compiler supports OpenACC on this architecture.
In this seminar thesis, an overview of the PEZY-SC processor shall be given. It shall be evaluated how suitable it is for running real-world applications in terms of programmability/usability, performance, energy and costs. The first aspect includes an evaluation of the Omni OpenACC compiler and other choices of parallel programming models. An outlook on the follow-up PEZY-SC2 processor shall also be given with respect to its appearance in the Top500 and Green500 lists and future development.

Supervisor: Sandra Wienke

 

Instructors

Alesja Dammer
Julian Miller
Dirk Schmidl
Daniel Schürhoff
Sandra Wienke

Contact: contact@hpc.rwth-aachen.de