HPC December Workshop Part I: HPC Tuning Workshop
Monday, Dec 6 - Wednesday, Dec 8 2010
Center for Computing and Communication
09:00 - 18:00
Presentations, hands-on sessions, bring in your own code!
On Windows & Linux systems.
Participation upon individual consultation only!
There will be a social dinner on Wednesday, December 08, at 7 p.m. at Restaurant Elisenbrunnen.
A major part of the RWTH Compute Cluster consists of 192 nodes equipped with the latest Intel processors ("Nehalem-EP"). The Intel Xeon X5570 processors (codename “Nehalem”) are quad-core processors in which each core can run two hardware threads (Hyper-Threading). Each processor has its own memory controller and is connected to a local part of the main memory. The processors can access remote memory via Intel's new interconnect, the “QuickPath Interconnect”. These machines are therefore the first Intel machines with a ccNUMA (cache-coherent non-uniform memory access) architecture. This processor type will be the mainstay of our cluster for the foreseeable future, and it has many new features to take advantage of.
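On a ccNUMA node, where data ends up in memory can matter as much as how it is computed. The following is a minimal sketch in C with OpenMP, not a recipe taken from the workshop material. It assumes that the operating system places each memory page on the NUMA node of the thread that first writes to it (the default "first touch" behaviour on Linux) and that threads are bound to cores. The function names `alloc_first_touch` and `sum_local` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel.
   Assumption: under a first-touch policy, each page is placed in the
   memory local to the thread that writes it first. */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Static schedule: thread t initializes one contiguous chunk. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = value;

    return a;
}

/* Sum the array with the same static schedule, so that each thread
   mostly reads the chunk it initialized (local memory accesses). */
double sum_local(const double *a, size_t n)
{
    assert(a != NULL);
    double s = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += a[i];

    return s;
}
```

If the array were initialized by a single thread instead, all pages would likely end up on one processor's memory, and threads on the other processor would have to fetch their data over the interconnect. Without OpenMP enabled the pragmas are ignored and the code runs serially.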
This Tuning Workshop consists of presentations and hands-on sessions:
A series of presentations will teach you more about tuning, especially on Nehalem EP and EX processors.
During extended hands-on sessions we want to give a selected number of projects an opportunity to improve the performance of their codes. Experts from Intel and from the HPC Team of the RZ will be there to assist you. We will reserve a few Nehalem-based machines running Linux and Windows for your experiments.
Performance tuning is still often a matter of experimentation, but we can give you advice on a best-effort basis. Hopefully this will lead to a noticeable performance improvement, but no guarantees can be given.
When parallelizing an application, it is important to tune single-processor ("serial") performance first. Otherwise, one can more quickly run into scalability problems. Therefore most of the focus will be on serial performance, but we will also consider shared-memory parallelization with OpenMP where relevant and desired.
To maximize the efficiency of the workshop, we would like to ask you to prepare a test case that reflects a typical production run, but does not take too long to execute. In the ideal case, a run should not take more than 5 to 10 minutes to finish.
It is also important to have an easy way of verifying that the results of this test run are correct.
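Tuning steps such as different compiler options, vectorization, or parallel reductions can change floating-point rounding, so a bitwise comparison with a reference run is often too strict. One possible way to check results is a comparison with tolerances, sketched below in C. The function name `count_mismatches` and both tolerance parameters are assumptions made for this example; suitable tolerance values depend on your application.

```c
#include <assert.h>
#include <stddef.h>

/* Count entries of result[] that differ from reference[] by more than
   abs_tol + rel_tol * |reference[i]|. A return value of 0 means the
   test run agrees with the reference within the given tolerances. */
size_t count_mismatches(const double *reference, const double *result,
                        size_t n, double rel_tol, double abs_tol)
{
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        double diff = result[i] - reference[i];
        if (diff < 0.0)
            diff = -diff;
        double scale = reference[i] < 0.0 ? -reference[i] : reference[i];

        /* The negated comparison also counts NaN values as mismatches. */
        if (!(diff <= abs_tol + rel_tol * scale))
            bad++;
    }
    return bad;
}
```

A test case could store the output of the untuned program once as the reference and run this check after every tuning step.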
Attendees are kindly requested to prepare and bring in their own code. It is recommended to have good knowledge of the programming language of the code (C/C++/Fortran) and basic knowledge of multi-threading parallelization paradigms, especially OpenMP, and if necessary MPI. The presentations will be given in English. Windows as well as Linux systems will be used during the hands-on sessions. Assistance in porting your application to the Nehalem cluster prior to the event is available on request.
- Introduction to Nehalem microprocessor architecture, covering memory subsystem, caches, latencies of caches and memory
- Strategies for code tuning (single core)
- Performance analyzer tools (single core) like VTune and/or PTU
- Optimization examples (cache optimization, vectorization)
- Optimization of compiler settings
- Parallel efficiency analyzer tools like the Intel Trace Analyzer and Collector
- Scalability improvement examples
- Intel MPI runtime environment
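To give a flavour of the "cache optimization, vectorization" topic, here is a small, generic sketch in C; it is not taken from the workshop slides. It assumes square matrices stored in row-major order, and the function name `matmul_ikj` is invented for this example. Ordering the loops as i, k, j makes the innermost loop run over contiguous memory in both `b` and `c`, which is friendlier to the caches and typically easier for a compiler to vectorize than the textbook i, j, k order.

```c
#include <assert.h>
#include <stddef.h>

/* Compute c = a * b for n x n matrices stored in row-major order.
   The restrict qualifiers promise the compiler that the three arrays
   do not overlap, which helps automatic vectorization. */
void matmul_ikj(size_t n, const double *restrict a,
                const double *restrict b, double *restrict c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            /* Unit-stride inner loop over row k of b and row i of c. */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Whether this reordering pays off for a given code and problem size is something to measure with a profiler rather than assume.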
Agenda
Monday, December 06
09:00 - 10:00 CPU Architecture Refresher (generic)
Wednesday, December 08
- Christopher Dahnken (Intel): Processor Design (PowerPoint)
- Christopher Dahnken (Intel): PTU Guide (PDF)
- Christopher Dahnken (Intel): Thread Profiler (PDF)
Be sure to also consider Part II of our December workshop: The Array Building Blocks Tutorial (Intel Ct)