Writing Efficient Programs in C++


Time:        Presentations: Mon Oct 1, and Tue Oct 2 9:00 - 17:00
                 Practical Exercises: Thu, Oct 4 or Mon, Oct 8 or Tue, Oct 9, 9:00 - 17:00
Location:  Aachen University of Technology (RWTH),
                 Lecture Hall of the Center for Computing and Communication

Speakers: Jörg Striegnitz (Research Center Jülich, ZAM)
                  Lawrence Crowl (Sun Microsystems, C++ compiler group)

Because of the recent terror attack, senior management at Sun has decided to suspend air travel for employees. As a result, Lawrence Crowl will be unable to attend the workshop.


This course intends to make C++ programmers aware of the strengths and weaknesses of the C++ programming language in the field of scientific computing.

Classes, inheritance, operator overloading, and polymorphism are powerful tools for programming at a high level of abstraction. Unfortunately, using these features often conflicts with the expectation of high performance. Recently, new techniques have been developed that help to reconcile a high level of abstraction with high performance. These include template metaprogramming, expression templates, traits classes, and partial and lazy evaluation.

During this course the cost of C++ abstractions will be investigated and thoroughly explained; several solutions to overcome the abstraction penalty will be presented and applied during the exercises.

Issues specific to the Sun C++ compiler will be covered in more detail.

Attendees should already have some experience with C++.


The seminar is organized in cooperation with the Aachen University of Technology (RWTH) and the Research Center Jülich and is sponsored by Sun Microsystems. There is no seminar fee. All other costs (e.g. travel, hotel, and meals) are at your own expense.


Please register separately for the presentation part and the practical part!
Of course the presentation part is a prerequisite for the practical exercises.
We are currently offering two dates for the lab. The number of participants is limited to 32 for each.

We will put all the practical exercises and solutions on the web, and Jörg has promised to offer assistance by e-mail, so that external participants have the opportunity to do the exercises remotely.

Please keep in mind that Oct 3 is a German holiday.
If you encounter problems accessing the database server, please send an email

Seminar Times:


  •  Presentations:
     Mon Oct 1    9:00 - 17:00
     Tue Oct 2    9:00 - 17:00
  •  Practical Exercises
     Thu Oct 4    9:00 - 17:00 or alternatively
     Mon Oct 8    9:00 - 17:00 or alternatively
     Tue Oct 9    9:00 - 17:00

Accommodation and general visitor information for Aachen:

Please make your own hotel reservation.
You may find a list of hotels on the web pages of the Aachen Tourist Service.
A few remarks about some of these hotels are collected here.


Getting to Aachen:

The web pages of the Aachen Tourist Service explain how to get to Aachen.
A detailed description of the location of the Computing Center is also available,
as is a picture showing how to get to the Computing Center by car.
You can also download a sketch of the city with some points of interest marked.

Further Information:



Additional Course Material, Solution of the Lab exercises:

Exercise 1 focuses on the hidden use of temporaries when overloaded operators are employed. An array class with a type parameter and a size parameter is defined, and the operation

Array<double,SIZE> res, a, b, c, d;
res = a * b - c + d;

is measured.
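
To make the source of the temporaries concrete, here is a minimal sketch of such an array class with straightforward operator overloading. It is not the workshop code: the class internals and the helper names `operator_temporaries` and `evaluate_naive` are assumptions made for illustration. The counter only tracks the result array created inside each operator; a compiler that does not elide the copy on return would create further temporaries on top of these.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the exercise's array class (not the workshop code).
template <typename T, std::size_t N>
class Array {
public:
    Array() : data_() {}
    explicit Array(T v) { for (std::size_t i = 0; i < N; ++i) data_[i] = v; }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }
private:
    T data_[N];
};

// Counts the result arrays created inside the overloaded operators.
inline std::size_t& operator_temporaries() {
    static std::size_t count = 0;
    return count;
}

template <typename T, std::size_t N>
Array<T, N> operator*(const Array<T, N>& x, const Array<T, N>& y) {
    Array<T, N> tmp;                 // full-size intermediate result
    ++operator_temporaries();
    for (std::size_t i = 0; i < N; ++i) tmp[i] = x[i] * y[i];
    return tmp;                      // may cost another copy if not elided
}

template <typename T, std::size_t N>
Array<T, N> operator-(const Array<T, N>& x, const Array<T, N>& y) {
    Array<T, N> tmp;
    ++operator_temporaries();
    for (std::size_t i = 0; i < N; ++i) tmp[i] = x[i] - y[i];
    return tmp;
}

template <typename T, std::size_t N>
Array<T, N> operator+(const Array<T, N>& x, const Array<T, N>& y) {
    Array<T, N> tmp;
    ++operator_temporaries();
    for (std::size_t i = 0; i < N; ++i) tmp[i] = x[i] + y[i];
    return tmp;
}

// Evaluates the measured expression once and returns the first element.
inline double evaluate_naive() {
    const std::size_t SIZE = 500;
    Array<double, SIZE> res, a(2.0), b(3.0), c(1.0), d(4.0);
    res = a * b - c + d;             // three operator calls, three loops
    return res[0];
}
```

Each operator call runs its own loop over all elements and materializes a complete intermediate array, so the single statement above traverses memory three times before the final assignment copies the result.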
Various methods for avoiding temporaries were discussed during the workshop, and we compared the timings of some of the program versions developed during the lab sessions. We also used different C++ compilers installed on the SunFire 6800 system and included C and Fortran codes for comparison.
The following table contains the number of generated temporaries and the runtime in machine cycles per loop step. The array size was set to 500.

compiler:       CC (Sun)        KCC (KAI)       g++ (GNU)       g++ (GNU)
version:        6.2             4.0             2.95.2          3.0.1
                temps  cycles   temps  cycles   temps  cycles   temps  cycles
1b              6      28.3     3      12.4     6      64.0     6      110.7
1b_gnu_return   -      -        -      -        3      40.7     3      82.2
1c              3      18.4     3      12.3     3      38.7     3      40.4
1d              6      39.6     6      39.4     6      92.4     6      125.5
1d_2            1      26.3     1      25.1     1      49.5     1      55.8
pete            0      15.1     0      6.1      0      30.0     0      215.5

The following program versions were measured:

1b              template array class with operator overloading and local temporary
1b_gnu_return   template array class with GNU NRV (named return value) optimization
1c              template array class with computational constructors
1d              template array class with reuse of arithmetic assignment operators
1d_2            explicit usage of arithmetic assignment operators
pete            expression templates using the PETE toolset
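
As an illustration of the idea behind the `pete` version, the following is a hand-written sketch of expression templates. It does not use PETE and does not reproduce its interface; `Vec`, `Expr`, `BinaryExpr`, and `evaluate_fused` are hypothetical names, and the element type is fixed to `double` to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

// Element-wise operation tags.
struct Add { static double apply(double x, double y) { return x + y; } };
struct Sub { static double apply(double x, double y) { return x - y; } };
struct Mul { static double apply(double x, double y) { return x * y; } };

// CRTP base class that marks a type as an expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node of the expression tree: stores references only, computes on demand.
template <typename L, typename Op, typename R>
class BinaryExpr : public Expr<BinaryExpr<L, Op, R>> {
public:
    BinaryExpr(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return Op::apply(l_[i], r_[i]); }
private:
    const L& l_;
    const R& r_;
};

template <std::size_t N>
class Vec : public Expr<Vec<N>> {
public:
    explicit Vec(double v = 0.0) { for (std::size_t i = 0; i < N; ++i) data_[i] = v; }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // Assignment from any expression: one loop, no intermediate arrays.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < N; ++i) data_[i] = expr[i];
        return *this;
    }
private:
    double data_[N];
};

// The operators build small tree nodes instead of computing arrays.
template <typename L, typename R>
BinaryExpr<L, Mul, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, Mul, R>(l.self(), r.self());
}
template <typename L, typename R>
BinaryExpr<L, Sub, R> operator-(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, Sub, R>(l.self(), r.self());
}
template <typename L, typename R>
BinaryExpr<L, Add, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, Add, R>(l.self(), r.self());
}

// Evaluates the measured expression once and returns the last element.
inline double evaluate_fused() {
    const std::size_t SIZE = 500;
    Vec<SIZE> res, a(2.0), b(3.0), c(1.0), d(4.0);
    res = a * b - c + d;   // evaluated element by element in Vec::operator=
    return res[SIZE - 1];
}
```

The right-hand side only builds a nested `BinaryExpr` object holding references; the arithmetic happens inside the single loop of `Vec::operator=`. Whether the result matches a hand-written loop depends on how well the compiler inlines the nested `operator[]` calls.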

Measurements of similar C, Fortran 77, and Fortran 90 programs are included for comparison. They show that a good C++ compiler (here KCC in combination with the native Sun C compiler) together with expression templates (here using the PETE toolset) can achieve the same performance as C or Fortran. Whereas in C and Fortran 77 the array operations have to be coded as loops, Fortran 90 offers array syntax as an intrinsic language element. Because the UltraSPARC-III processor can issue at most one memory operation per cycle, and each loop step needs four loads and one store, the minimum number of cycles per loop step is 5 in this example. So we are close to the optimum. If the data is not in the L1 cache, the number of cycles per loop step will rise.
In some cases KCC seems to offer the best high-level optimizations. The Sun compiler generates much better code than the freely available g++ compiler. In particular, the new g++ version 3.0.1 performs worse than the older 2.95.2. Unfortunately, the Sun compiler did not compile the code generated by PETE.
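
For reference, the hand-coded loop that a C or Fortran 77 version has to spell out can be sketched as follows, together with the arithmetic behind the lower bound of 5 cycles. The function names `fused_loop` and `min_cycles_per_step` are illustrative assumptions; the issue rate of one memory operation per cycle is the figure stated above.

```cpp
#include <cassert>
#include <cstddef>

// Hand-coded version of res = a * b - c + d.
// Per loop step: loads of a[i], b[i], c[i], d[i] and one store to res[i],
// i.e. five memory operations.
inline void fused_loop(double* res, const double* a, const double* b,
                       const double* c, const double* d, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        res[i] = a[i] * b[i] - c[i] + d[i];
}

// Lower bound on cycles per loop step when at most `mem_ops_per_cycle`
// memory operations can be issued per cycle (rounded up).
inline unsigned min_cycles_per_step(unsigned loads, unsigned stores,
                                    unsigned mem_ops_per_cycle) {
    return (loads + stores + mem_ops_per_cycle - 1) / mem_ops_per_cycle;
}
```

With four loads, one store, and one memory operation per cycle, `min_cycles_per_step(4, 1, 1)` gives the bound of 5 used in the comparison.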

compiler:       cc (Sun)   f77 (Sun)  f90 (Sun)  g++ (GNU)
version:        6.2        6.2        6.2        2.95.2
                cycles     cycles     cycles     cycles
c               6          -          -          24.2
f77             -          7.3        -          -
f90             -          -          6.6        -

The following compiler options have been used.

Compiler    version   options
CC (Sun)    6.2       -fast -xarch=v8plusb -xchip=ultra3
KCC (KAI)   4.0       +K3 --backend -fast --backend -xarch=v8plusb --backend -xchip=ultra3
g++ (GNU)   2.95.2    -O6 -mv8
g++ (GNU)   3.0.1     -O6 -mv8plus

See this shar file for all the timing examples and the makefile.


Dieter an Mey
Email: anmey@rz.rwth-aachen.de
Phone: +49 241 804377

