Writing Efficient Programs in C++ - Tutorial and Workshop

Time: Tutorial: Mon, Sept 23, 14:00 - 17:30, and Tue, Sept 24 to Thu, Sept 26, 9:00 - 17:30 each day
      Tuning Workshop: Fri, Sept 27, 9:00 - 12:30 (limited number of participants)
Location: 
Speakers: Ruud van der Pas (Sun Microsystems, Application Performance Specialist HPC)
          Jörg Striegnitz (Research Center Jülich, ZAM)

 


Tutorial

This tutorial intends to make C++ programmers aware of the strengths and weaknesses of the C++ programming language in the field of scientific computing.

Building on last year's good experience, we are extending this event to 3 1/2 days and adding a half-day tuning workshop at the end.
The tutorial will begin with a short introduction to basic performance tuning and the Sun programming environment (compilers and performance analyzer) and will then focus on C++-specific programming techniques.

Classes, inheritance, operator overloading, and polymorphism are powerful tools for programming at a high level of abstraction. Unfortunately, the use of exactly these concepts often conflicts with the expectation of high performance. Recently, new techniques have been developed that help reconcile a high level of abstraction with high performance. Among them are template metaprogramming, expression templates, traits classes, and partial and lazy evaluation.

During this course the cost of C++ abstractions will be investigated and thoroughly explained; several solutions to overcome the abstraction penalty will be presented and applied during the exercises.

Issues specific to the Sun C++ compiler will be covered in more detail.

The tutorial is open to Sun customers, partners and employees.

Attendees should already have some experience with C++.

Tuning Workshop

In this tuning workshop we are particularly interested in helping users of our Sun Fire SMP Cluster to improve the efficiency of their applications.

You will have the opportunity to ask the experts for advice and help on tuning your application. As we plan to accommodate several representatives from the user community, we will have to time-slice between the participants. Where needed, we will give advice and then ask you to try it while we work with other attendees.
Performance tuning is still often a matter of experimentation, but we can give you advice on a best-effort basis. Hopefully this will lead to a noticeable performance improvement, but guarantees cannot be given.
When parallelizing an application, it is important to have tuned for single processor ("serial") performance first. Otherwise, one can more quickly run into scalability problems. Therefore most of the focus will be on serial performance, but we will also consider shared memory parallelization with OpenMP where relevant and desired.
To maximize the efficiency of the workshop, we would like to ask you to prepare a test case that reflects a typical production run, but does not take too long to execute. In the ideal case, a run should not take more than 5 to 10 minutes to finish.
It is also important to have an easy way of verifying that the results of this test run are correct.
Use of a makefile to (re)build the application is highly recommended. If you need help with this setup, please contact us.

Cost:

The seminar is organized jointly by the Aachen University of Technology (RWTH), the Research Center Jülich, and Sun Microsystems. There is no seminar fee. All other costs (e.g. travel, hotel, and meals) are at your own expense.

Registration:

Registration for the Tutorial is mandatory and must be completed by Sept 15.
We allocated additional places for the labs, so that we were able to increase the number of participants.
Please note in your remarks if you rely on the talks being given in English.
Please fill out the registration form carefully, as we will generate certificates of attendance automatically from these data.

There is no open registration for the Tuning Workshop. Participation is by personal arrangement only.

Agenda:

Monday, Sept 23      14:00 - 17:30   Tutorial part I          Ruud van der Pas
Tuesday, Sept 24     09:00 - 12:30   Tutorial part II         Jörg Striegnitz
                     14:00 - 17:30   Lab exercises part I     Jörg Striegnitz, Ruud van der Pas
Wednesday, Sept 25   09:00 - 12:30   Tutorial part III        Jörg Striegnitz
                     14:00 - 17:30   Lab exercises part II    Jörg Striegnitz
Thursday, Sept 26    09:00 - 12:30   Tutorial part IV         Jörg Striegnitz
                     14:00 - 17:30   Lab exercises part III   Jörg Striegnitz
Friday, Sept 27      09:00 - 12:30   Tuning Workshop          Jörg Striegnitz, Ruud van der Pas, Dieter an Mey

Accommodation and general visitor information for Aachen:

Please make your own hotel reservation.
You may find a list of hotels on the web pages of the Aachen Tourist Service.
A few remarks about some of these hotels are collected here.

 

Getting to Aachen:

The web pages of the Aachen Tourist Service explain how to get to Aachen.
A detailed description of the location of the Computing Center is also available, as is a picture showing how to get to the Computing Center by car.
You may also download a sketch of the city with some points of interest marked.

Further Information:

Videos (Jörg):

 

Additional Course Material, Solutions to the Lab Exercises:

 

 

Exercise 1 focuses on the hidden use of temporaries when overloaded operators are employed. An array class with a type and a size parameter is defined and the operation

Array<double,SIZE> res, a, b, c, d;
res = a * b - c + d;

is measured.
Various methods for avoiding temporaries were discussed during the workshop, and we compared the timings of some of the program versions developed during the lab sessions. We also used different C++ compilers installed on the Sun Fire 6800 system and included C and Fortran codes for comparison.
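To make the hidden temporaries concrete, here is a minimal sketch of a naive array class with overloaded operators. It is not the workshop code: the counter `temporaries`, the helper `combine`, and the function `count_temporaries_naive` are names assumed for illustration. The counter only records result objects that the operators create explicitly; additional copies that a compiler may make when returning by value are not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the exercise's array class (not the workshop code).
template <class T, std::size_t N>
class Array {
public:
    static int temporaries;  // result objects created inside the operators

    Array() : data_() {}
    explicit Array(T value) {
        for (std::size_t i = 0; i < N; ++i) data_[i] = value;
    }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    T data_[N];
};

template <class T, std::size_t N>
int Array<T, N>::temporaries = 0;

// Every binary operator allocates a full-size result and runs its own loop.
template <class T, std::size_t N, class Op>
Array<T, N> combine(const Array<T, N>& x, const Array<T, N>& y, Op op) {
    Array<T, N> tmp;  // the hidden temporary
    ++Array<T, N>::temporaries;
    for (std::size_t i = 0; i < N; ++i) tmp[i] = op(x[i], y[i]);
    return tmp;
}

template <class T, std::size_t N>
Array<T, N> operator+(const Array<T, N>& x, const Array<T, N>& y) {
    return combine(x, y, std::plus<T>());
}
template <class T, std::size_t N>
Array<T, N> operator-(const Array<T, N>& x, const Array<T, N>& y) {
    return combine(x, y, std::minus<T>());
}
template <class T, std::size_t N>
Array<T, N> operator*(const Array<T, N>& x, const Array<T, N>& y) {
    return combine(x, y, std::multiplies<T>());
}

// Evaluates res = a * b - c + d and reports how many result objects
// the three operators created along the way.
inline int count_temporaries_naive() {
    const std::size_t SIZE = 500;
    typedef Array<double, SIZE> Vec;
    Vec a(2.0), b(3.0), c(1.0), d(4.0), res;
    Vec::temporaries = 0;
    res = a * b - c + d;
    assert(res[0] == 9.0);  // 2*3 - 1 + 4
    return Vec::temporaries;
}
```

In this sketch each of the three operators creates one result object and makes one pass over the data, so the single statement costs three temporaries and three loops, plus the final assignment.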
The following table contains the number of generated temporaries and the runtime in machine cycles per loop step. The array size was set to 500.

compiler:       CC (Sun)       KCC (KAI)      g++ (GNU)      g++ (GNU)      g++ (GNU)
version:        7.0            4.0            2.95.2         3.0.1          3.2
                temps cycles   temps cycles   temps cycles   temps cycles   temps cycles
1b              6     28.3     3     12.4     6     64.0     6     110.7    3     38.7
1b_gnu_return   -     -        -     -        3     40.7     3     82.2     3     22.2
1c              3     18.4     3     12.3     3     38.7     3     40.4     3     38.2
1d              6     39.6     6     39.4     6     92.4     6     125.5    6     64.7
1d_2            1     26.3     1     25.1     1     49.5     1     55.8     1     31.6
pete            0     15.1     0     6.1      0     30.0     0     215.5    0     22.1


The following program versions were measured:

1b              template array class with operator overloading and local temporary
1b_gnu_return   template array class with GNU NRV (named return value) optimization
1c              template array class with computational constructors
1d              template array class with reuse of arithmetic assignment operators
1d_2            explicit usage of arithmetic assignment operators
pete            expression templates using the PETE toolset
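The idea behind the expression template version can be sketched by hand. The following is not the PETE toolset and not the lab solution; it is a minimal illustration restricted to `double` elements, and all names (`Expr`, `BinExpr`, `Vec`, `constructed`, `count_temporaries_et`) are assumptions made for this example. The operators build a lightweight expression object instead of an array, and the assignment evaluates the whole expression in a single loop.

```cpp
#include <cassert>
#include <cstddef>

// CRTP base class: tags a type as an expression and recovers its real type.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Add { static double apply(double x, double y) { return x + y; } };
struct Sub { static double apply(double x, double y) { return x - y; } };
struct Mul { static double apply(double x, double y) { return x * y; } };

// Node of the expression tree. It stores references only, so it must not
// outlive the statement in which it was built.
template <class L, class Op, class R>
class BinExpr : public Expr<BinExpr<L, Op, R> > {
public:
    BinExpr(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return Op::apply(l_[i], r_[i]); }
private:
    const L& l_;
    const R& r_;
};

template <std::size_t N>
class Vec : public Expr<Vec<N> > {
public:
    static int constructed;  // counts every Vec object that gets created

    Vec() : data_() { ++constructed; }
    explicit Vec(double v) {
        ++constructed;
        for (std::size_t i = 0; i < N; ++i) data_[i] = v;
    }
    Vec(const Vec& o) {
        ++constructed;
        for (std::size_t i = 0; i < N; ++i) data_[i] = o.data_[i];
    }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // The only loop: evaluates the whole expression element by element.
    template <class E>
    Vec& operator=(const Expr<E>& e) {
        const E& x = e.self();
        for (std::size_t i = 0; i < N; ++i) data_[i] = x[i];
        return *this;
    }
private:
    double data_[N];
};

template <std::size_t N>
int Vec<N>::constructed = 0;

// The operators do no arithmetic; they only record what has to be computed.
template <class L, class R>
BinExpr<L, Add, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinExpr<L, Add, R>(l.self(), r.self());
}
template <class L, class R>
BinExpr<L, Sub, R> operator-(const Expr<L>& l, const Expr<R>& r) {
    return BinExpr<L, Sub, R>(l.self(), r.self());
}
template <class L, class R>
BinExpr<L, Mul, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinExpr<L, Mul, R>(l.self(), r.self());
}

// Evaluates res = a * b - c + d and reports how many arrays were created.
inline int count_temporaries_et() {
    const std::size_t SIZE = 500;
    Vec<SIZE> a(2.0), b(3.0), c(1.0), d(4.0), res;
    Vec<SIZE>::constructed = 0;
    res = a * b - c + d;
    assert(res[0] == 9.0 && res[SIZE - 1] == 9.0);  // 2*3 - 1 + 4
    return Vec<SIZE>::constructed;
}
```

In this sketch no array object is created while the statement is evaluated, which corresponds to the 0 in the temps column of the pete row. Whether the expression objects themselves are optimized away completely depends on how well the compiler inlines.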


Measurements of similar C, Fortran77 and Fortran90 programs are included for comparison. It can be seen that with a good C++ compiler (here KCC in combination with the native Sun C compiler) and the use of expression templates (here using the PETE toolset), the same performance as with C or Fortran can be achieved. Whereas in C and in Fortran77 the array operations have to be coded with loops, Fortran90 offers array syntax as an intrinsic language element. Because the UltraSPARC-III processor can issue at most one memory operation per cycle, and each loop step needs four loads and one store, the minimum number of cycles per loop step is 5 in this example. So we are close to the optimum. If the data is not in the L1 cache, the number of cycles per loop step will rise.
In some cases KCC seems to offer the best high-level optimizations. The Sun compiler generates much better code than the freely available g++ compiler. In particular, the newer g++ version 3.0.1 performs worse than the older 2.95.2.
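For reference, the hand-coded loop that the C version amounts to can be sketched as follows. This is an illustration, not the measured benchmark code; the function names are assumptions. The comment in the loop body spells out the memory operations behind the lower bound of 5 cycles per loop step.

```cpp
#include <cassert>
#include <cstddef>

// Hand-coded equivalent of res = a * b - c + d: one loop, no temporaries.
inline void fused_loop(double* res, const double* a, const double* b,
                       const double* c, const double* d, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // per step: 4 loads (a[i], b[i], c[i], d[i]) and 1 store (res[i])
        res[i] = a[i] * b[i] - c[i] + d[i];
    }
}

// Memory operations per loop step, as counted in the comment above.
inline int memory_ops_per_step() {
    const int loads = 4;
    const int stores = 1;
    return loads + stores;
}

// Hypothetical self-check on a small input.
inline bool fused_loop_demo() {
    const std::size_t n = 4;
    double a[n] = {1, 2, 3, 4}, b[n] = {2, 2, 2, 2};
    double c[n] = {1, 1, 1, 1}, d[n] = {0, 1, 2, 3};
    double res[n];
    fused_loop(res, a, b, c, d, n);
    // expected: 1*2-1+0 = 1, 2*2-1+1 = 4, 3*2-1+2 = 7, 4*2-1+3 = 10
    return res[0] == 1 && res[1] == 4 && res[2] == 7 && res[3] == 10;
}
```

With at most one memory operation issued per cycle, five memory operations per step give the bound of 5 cycles, provided all operands are already in the L1 cache.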

compiler:   cc (Sun)   f77 (Sun)   f90 (Sun)   g++ (GNU)
version:    6.2        6.2         6.2         2.95.2
            cycles     cycles      cycles      cycles
c           6          -           -           24.2
f77         -          7.3         -           -
f90         -          -           6.6         -


The following compiler options were used:

Compiler    version   options
CC (Sun)    7.0       -fast -xarch=v8plusb -xchip=ultra3
KCC (KAI)   4.0       +K3 --backend -fast --backend -xarch=v8plusb --backend -xchip=ultra3
g++ (GNU)   2.95.2    -O6 -mv8
g++ (GNU)   3.0.1     -O6 -mv8plus

See this shar file for all the timing examples and the makefile.

Contact

 

Dieter an Mey

Tel.: +49 241 80 24377


 

 

Last updated: 27.05.03
