The IT Center operates high-performance computers to support institutions and employees in education and research.
All machines are integrated into one “RWTH Compute Cluster” running under the Linux operating system.
General information about using the RWTH Compute Cluster is provided in this area, whereas information about programming the high-performance computers is provided in RWTH Compute Cluster - Parallel Programming.
All members of RWTH Aachen University have free access to the RWTH Compute Cluster, but the amount of resources they can use is limited.
Above a certain threshold, applications for more resources must be submitted and are then reviewed. This application process is also open to external German scientists in institutions related to education and research. Please find related information here.
Please find information about how to get access to the system here.
You can get information about using and programming the RWTH Compute Cluster online on this website or during our HPC-related events. For many of these events, particularly tutorials, we also collect related material on our website - see here. In addition, the Chair for HPC offers regular lectures, exercises and software labs covering related topics.
Dear user of the RWTH Compute Cluster,
- The old BULL-Cluster from 2011 will be decommissioned on April 30, 2019.
- All users working on BULL machines, and all user and project accounts using BULL machines, have to migrate to the new CLAIX-2018 cluster, which uses SLURM instead of LSF and an updated software environment.
- Four well-known front ends will cease their work and go offline.
- The LSF batch system will be discontinued and replaced by SLURM.
- Projects allocated on CLAIX-2016 will remain on this partition but need to use SLURM instead of LSF, together with the updated software environment.
What does it mean to migrate to CLAIX-2018?
You simply log in to one of the other front-end nodes that are part of CLAIX-2018. On these nodes, LSF commands are no longer available, and you have to use SLURM commands to submit and control your batch jobs. You will need to adapt your batch jobs from LSF to SLURM.
On these nodes an updated software environment is rolled out. It is very likely that you need to recompile/relink your applications with new compiler and MPI library versions that are loaded into your environment by the familiar module system. (By the way, recompilation is also highly recommended for performance reasons.)
From LSF to SLURM.
The LSF workload management system (batch system) will be decommissioned on April 30, 2019. Therefore, starting on April 15, the execution time of LSF jobs will be limited to a maximum of 24 hours. Jobs that have not finished by April 30, 7:00 am, will be stopped and deleted.
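The deadline arithmetic can be sketched as follows. This is a minimal illustration, not an official tool: it assumes the cutoff is in local time, and the function names are made up for this example. Note that it is the time at which a job starts running that matters, not the time it is submitted, because a job may wait in the queue.

```python
from datetime import datetime, timedelta

# Dates taken from the announcement above; local time is assumed.
LSF_CUTOFF = datetime(2019, 4, 30, 7, 0)
MAX_RUNTIME = timedelta(hours=24)  # limit in effect from April 15


def latest_start(runtime: timedelta) -> datetime:
    """Latest start time at which a job of this length still ends by the cutoff."""
    return LSF_CUTOFF - runtime


def finishes_before_cutoff(start: datetime, runtime: timedelta) -> bool:
    """Return True if a job starting at `start` ends no later than the cutoff."""
    if runtime > MAX_RUNTIME:
        raise ValueError("runtime exceeds the 24-hour LSF limit")
    return start + runtime <= LSF_CUTOFF


if __name__ == "__main__":
    # A full 24-hour job has to start running by April 29, 7:00 am.
    print(latest_start(MAX_RUNTIME))
```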
LSF will be replaced by the new batch system called “SLURM”. Thus, you need to adapt all of your batch job scripts and use a new set of commands to submit and control your batch jobs.
SLURM is different from LSF. Consequently, not all features you may be used to are available with SLURM. For example, requesting "candidate hosts" for a job is no longer possible.
On the other hand, SLURM provides additional features. You can now log in via SSH directly to the node on which your own job is running.
You can find detailed information about SLURM at https://doc.itc.rwth-aachen.de/x/U4AtAg .
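To give a rough idea of what adapting a batch script involves, the following sketch rewrites a few common `#BSUB` directive lines as `#SBATCH` lines. The helper `translate_directive` and its option table are hypothetical and deliberately incomplete; the mapping is an assumption based on the general LSF and SLURM option sets, not an official conversion tool, so please check every translated line against the SLURM documentation linked above.

```python
import shlex

# Hypothetical, incomplete mapping of common LSF options to SLURM options.
OPTION_MAP = {
    "-J": "--job-name",
    "-o": "--output",
    "-e": "--error",
    "-n": "--ntasks",
}
# Options without a SLURM counterpart, e.g. "-m" (candidate hosts).
UNSUPPORTED = {"-m"}


def translate_directive(line: str) -> str:
    """Translate one '#BSUB' line; all other lines are returned unchanged."""
    stripped = line.strip()
    if not stripped.startswith("#BSUB"):
        return line
    parts = shlex.split(stripped[len("#BSUB"):])
    if not parts:
        return line
    option, args = parts[0], parts[1:]
    if option in UNSUPPORTED:
        return f"# TODO (no SLURM equivalent): {stripped}"
    if option == "-W" and args:
        # LSF wall time is [hours:]minutes; emit hours:minutes:seconds.
        hours, _, minutes = args[0].rpartition(":")
        total = int(hours or 0) * 60 + int(minutes)
        return f"#SBATCH --time={total // 60:02d}:{total % 60:02d}:00"
    if option in OPTION_MAP and args:
        # LSF uses %J for the job ID in file names, SLURM uses %j.
        value = args[0].replace("%J", "%j")
        return f"#SBATCH {OPTION_MAP[option]}={value}"
    return f"# TODO (translate manually): {stripped}"


if __name__ == "__main__":
    script = '#BSUB -J myjob\n#BSUB -W 1:30\n#BSUB -m "hostA hostB"\n./a.out'
    for original in script.splitlines():
        print(translate_directive(original))
```

Anything the sketch does not recognize is turned into a `# TODO` comment rather than guessed, so that unsupported requests such as candidate hosts are reviewed by hand.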
Currently, the script r_batch_usage does not yet reflect the resources consumed under SLURM. We are still in the process of adapting the script to SLURM.
New Front-End Nodes.
The BULL front-end nodes cluster, cluster-linux, cluster-x and cluster-x2 will be decommissioned by April 30, 2019.
New front-end nodes as part of the CLAIX-2016 and CLAIX-2018 clusters have replaced them.
Their names begin with
- login, login2, login-g, copy (CLAIX-2016)
- login18-1, login18-2, login18-3, login18-4 (CLAIX-2018)
- login18-x-1, login18-x-2 (CLAIX-2018 with GUI)
- login18-g-1, login18-g-2 (CLAIX-2018 with GPUs)
- copy18-1, copy18-2 (CLAIX-2018 data movement)
and end with the domain *.hpc.itc.rwth-aachen.de
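As a small illustration of how the short names and the domain combine into full host names, consider the sketch below. The node names and the domain are taken from the list above; the helper functions and the user ID `ab123456` are hypothetical placeholders.

```python
DOMAIN = "hpc.itc.rwth-aachen.de"

# Short names of the front-end nodes as listed in this announcement.
FRONT_ENDS = {
    "CLAIX-2016": ["login", "login2", "login-g", "copy"],
    "CLAIX-2018": ["login18-1", "login18-2", "login18-3", "login18-4"],
    "CLAIX-2018 with GUI": ["login18-x-1", "login18-x-2"],
    "CLAIX-2018 with GPUs": ["login18-g-1", "login18-g-2"],
    "CLAIX-2018 data movement": ["copy18-1", "copy18-2"],
}


def fqdn(short_name: str) -> str:
    """Build the fully qualified host name from a short node name."""
    return f"{short_name}.{DOMAIN}"


def ssh_command(user: str, short_name: str) -> str:
    """Return the ssh command line for a known front-end node."""
    known = {name for names in FRONT_ENDS.values() for name in names}
    if short_name not in known:
        raise ValueError(f"unknown front-end node: {short_name}")
    return f"ssh {user}@{fqdn(short_name)}"


if __name__ == "__main__":
    # "ab123456" is a placeholder, not a real account.
    print(ssh_command("ab123456", "login18-1"))
```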
You can find an overview of all dialog and compute nodes at
More information on CLAIX-2018
- Each cluster node provides 48 cores and 192 GB of main memory. If your program does not profit from that many cores, it may be advisable to run your batch job in shared mode. In that case, do not turn on exclusive mode explicitly.
- The CLAIX-2018 fabric for MPI communications is Omni-Path, as it is for CLAIX-2016.
- We recommend recompiling your application in order to profit from the new hardware features of the Skylake processors. We expect that some applications will not run without recompilation or at least relinking.
- We recommend using the new versions of the Intel compilers and MPI library (e.g. openmpi/1.10.4 → intelmpi/2018 and intel/16.0 → intel/19.0). These are used automatically when you apply our default modules and environment variables.
They have been adapted to the new hardware.
- Many software packages have been updated. They will be loaded automatically with our module system.
- HOME, WORK and HPCWORK file systems are accessible as usual.
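To see why shared mode can be worthwhile for small jobs, the following sketch estimates how many identical jobs fit on one CLAIX-2018 node with 48 cores and 192 GB of main memory, which works out to 4 GB per core. The function `jobs_per_node` is a hypothetical helper, and the result is only an upper bound: it assumes the full 192 GB is available to jobs, which may not hold in practice.

```python
CORES_PER_NODE = 48
MEMORY_PER_NODE_GB = 192


def jobs_per_node(cores: int, memory_gb: float) -> int:
    """Upper bound on identical jobs of this size sharing one node."""
    if cores <= 0 or memory_gb <= 0:
        raise ValueError("cores and memory_gb must be positive")
    if cores > CORES_PER_NODE or memory_gb > MEMORY_PER_NODE_GB:
        return 0  # the job does not fit on a single node
    by_cores = CORES_PER_NODE // cores
    by_memory = int(MEMORY_PER_NODE_GB // memory_gb)
    # Whichever resource runs out first limits the number of jobs.
    return min(by_cores, by_memory)


if __name__ == "__main__":
    print(MEMORY_PER_NODE_GB / CORES_PER_NODE)  # memory per core in GB
    print(jobs_per_node(cores=4, memory_gb=16))
```

A 4-core job that requests exclusive mode would leave the remaining 44 cores of its node idle, whereas in shared mode other jobs can use them.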
Please carefully document any issues you encounter and report them to our ServiceDesk (mailto:firstname.lastname@example.org).
Previous blog post: Status RWTH Compute Cluster 2019-03-28