Service Description


The IT Center operates high-performance computers to support institutions and employees in education and research.

All machines are integrated into one “RWTH Compute Cluster” running under the Linux operating system.

General information about using the RWTH Compute Cluster is described in this area, whereas information about programming the high-performance computers is described under RWTH Compute Cluster - Parallel Programming.

All members of RWTH Aachen University have free access to the RWTH Compute Cluster, but the amount of resources they can use is limited.

Above a certain threshold, an application for additional resources has to be submitted, which is then reviewed. This application process is also open to external German scientists at institutions related to education and research. Please find related information here.

Please find information about how to get access to the system here.

You can get information about using and programming the RWTH Compute Cluster online on this website or during our HPC-related events. For many of these events, particularly tutorials, we also collect related material on our website (see here). In addition, the Chair for HPC offers regular lectures, exercises and software labs covering related topics.

Users of the RWTH Compute Cluster are continuously informed through the HPC mailing list (registration, archive).

Maintenance Information


RWTH incident reports
Incident reports for services of RWTH Aachen
Important news - RWTHCC/CLAIX2016
Notice from Thursday 11.04.2019 11:15 until Saturday 04.05.2019 12:15 - Please note the following information on major changes to the high-performance computing service: https://doc.itc.rwth-aachen.de/display/CC/2019/04/11/IMPORTANT+NEWS+for+users+of+the+RWTH+Compute+Cluster%3A+Major+Operational+Changes+on+May+1%2C+2019
Migration LSF -> SLURM, maximum runtime 24 hours
Change from Monday 15.04.2019 00:00 until Tuesday 30.04.2019 07:00 - In order to migrate the cluster away from LSF as smoothly as possible, from this point on only jobs with a maximum runtime of 24 hours can be submitted.
Partial outage Lustre18 - Lustre18
Partial outage from Monday 22.04.2019 16:20 until unknown - Due to problems with two OSS servers of the new Lustre file system of CLAIX-2018, file access problems may occur. The issue is already being worked on.
Migration LSF -> SLURM
Change from Tuesday 30.04.2019 07:00 until Thursday 02.05.2019 16:00 - LSF will be switched off. All jobs still in the queues will be deleted without replacement. The BULL cluster and the front ends of the BULL cluster will be switched off. CLAIX-2016 will be migrated to SLURM. All remaining projects will be migrated to SLURM.

News


Dear user of the RWTH Compute Cluster,

  • The old BULL-Cluster from 2011 will be decommissioned on April 30, 2019.
    • All users and all user and project accounts working on BULL machines have to migrate to the new CLAIX-2018 cluster, which uses SLURM instead of LSF and provides an updated software environment.
    • The four well-known front-end nodes will be taken offline.
    • The LSF batch system will be discontinued and replaced by SLURM.
      • Projects allocated on CLAIX-2016 will remain on this partition, but need to employ SLURM instead of LSF, and use the updated software environment

 

What does it mean to migrate to CLAIX-2018?

You just log in to one of the front-end nodes that are part of CLAIX-2018. On these nodes, LSF commands are no longer available, and you have to use SLURM commands to submit and control your batch jobs. You will need to adapt your batch job scripts from LSF to SLURM.
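For orientation, here is a rough sketch of common command equivalents (the mapping is approximate and the option syntax differs; please consult the SLURM documentation linked further below):

    # Rough LSF-to-SLURM command equivalents (a sketch, not a complete mapping)
    sbatch job.sh        # submit a batch script           (LSF: bsub < job.sh)
    squeue -u $USER      # list your own jobs              (LSF: bjobs)
    scancel 123456       # cancel the job with ID 123456   (LSF: bkill 123456)
    sinfo                # show partition and node status  (LSF: bhosts)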

On these nodes, an updated software environment has been rolled out. It is very likely that you will need to recompile/relink your applications with the new compiler and MPI library versions that are loaded into your environment by the familiar module system. (Recompilation is also highly recommended for performance reasons.)
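As a sketch, rebuilding against the updated toolchain could look like the following; the module names follow the example versions mentioned further below and may differ, so please verify them with module avail:

    # Sketch: switch to the updated compiler and MPI modules, then rebuild
    module list                          # show the currently loaded modules
    module switch intel intel/19.0       # select the newer Intel compiler (example version)
    module switch openmpi intelmpi/2018  # replace Open MPI with Intel MPI (example version)
    make clean && make                   # recompile/relink the application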

 

From LSF to SLURM.

The LSF workload management system (batch system) will be decommissioned on April 30, 2019. Therefore, starting on April 15, the execution time of LSF jobs is limited to a maximum of 24 hours. Jobs that have not finished by April 30, 7:00 am will be stopped and deleted.

LSF will be replaced by the new batch system called “SLURM”. Thus, you need to adapt all of your batch job scripts and use a new set of commands to submit and control your batch jobs.
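A minimal job script sketch is shown below; the #SBATCH directives take the place of the former #BSUB lines, and all values (job name, runtime, task count, memory) are placeholders that you need to adapt:

    #!/usr/bin/env bash
    # Minimal SLURM job script sketch (placeholder values)
    #SBATCH --job-name=myjob         # job name                 (LSF: #BSUB -J myjob)
    #SBATCH --output=myjob.%J.log    # output file              (LSF: #BSUB -o)
    #SBATCH --time=01:00:00          # wall-clock time limit    (LSF: #BSUB -W 60)
    #SBATCH --ntasks=4               # number of MPI processes  (LSF: #BSUB -n 4)
    #SBATCH --mem-per-cpu=3900M      # memory per core          (LSF: #BSUB -M)

    module load intelmpi             # load the MPI module provided by the module system
    srun ./a.out                     # start the MPI program under SLURM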

SLURM is different from LSF. Consequently, not all features you may be used to are available with SLURM. For example, requesting "candidate hosts" for a job is no longer possible.

On the other hand, SLURM provides additional features. For example, you can now log in directly via SSH to the node on which your own job is running.
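For example (a sketch; the node name in the second command is whatever squeue reports for your own running job):

    squeue -u $USER -o "%i %N"   # list your job IDs together with the allocated node(s)
    ssh <nodename>               # log in to one of those nodes while the job is running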

You can find detailed information about SLURM at https://doc.itc.rwth-aachen.de/x/U4AtAg .

Currently, the script r_batch_usage does not yet reflect the resources that have been consumed under SLURM. We are still in the process of adapting the script to SLURM.

 

New Front-End Nodes.

The BULL front-end nodes cluster, cluster-linux, cluster-x, and cluster-x2 will be decommissioned on April 30, 2019.

New front-end nodes as part of the CLAIX-2016 and CLAIX-2018 clusters have replaced them.

Their names begin with

  • login, login2, login-g, copy (CLAIX-2016)
  • login18-1, login18-2, login18-3, login18-4 (CLAIX-2018)
  • login18-x-1, login18-x-2 (CLAIX-2018 with GUI)
  • login18-g-1, login18-g-2 (CLAIX-2018 with GPUs)
  • copy18-1, copy18-2 (CLAIX-2018 data movement)

and end with the domain *.hpc.itc.rwth-aachen.de

 

You can find an overview of all dialog and compute nodes at
https://doc.itc.rwth-aachen.de/x/soQp



More information on CLAIX-2018

  • Cluster nodes each provide 48 cores and 192 GB of main memory. If your program does not profit from that many cores, it may be advisable to run your batch job in shared mode; in that case, do not turn on exclusive mode explicitly (see the sketch after this list).
  • The CLAIX-2018 fabric for MPI communications is Omni-Path as it is for CLAIX-2016.
  • We recommend recompiling your application in order to profit from the new hardware features of the Skylake processors. We expect that some applications will not run without recompilation or at least relinking.
  • We recommend using the new versions of the Intel compilers and MPI library, which have been adapted to the new hardware. They are used automatically when you apply our default modules and environment variables (e.g. openmpi/1.10.4 → intelmpi/2018 and intel/16.0 → intel/19.0).
  • Many software packages have been updated. They will be loaded automatically with our module system.
  • HOME, WORK and HPCWORK file systems are accessible as usual.
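Regarding shared mode, a sketch of a batch script header that requests only part of a node might look as follows (the task count and memory value are placeholders; the per-core memory is only a rough estimate of 192 GB divided by 48 cores):

    #!/usr/bin/env bash
    # Sketch: use only part of a 48-core node so that it can be shared with other jobs.
    # Simply do not request exclusive mode; all values below are placeholders.
    #SBATCH --ntasks=8               # only 8 of the 48 cores
    #SBATCH --mem-per-cpu=3900M      # rough per-core share of the 192 GB main memory
    #SBATCH --time=02:00:00          # wall-clock time limit

    srun ./a.out                     # start the program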

Please carefully document any issues you may encounter and report them to our ServiceDesk (mailto:servicedesk@itc.rwth-aachen.de).

 

 

Previous blog post: Status RWTH Compute Cluster 2019-03-28

