Rackham is now available
We are happy to announce that UPPMAX's cluster Rackham is now available!
Rackham is now available for all local projects with names starting with "snic2017", and for all new course projects.
On the first of March we will decommission Tintin and begin moving all Tintin projects to Rackham. This migration will probably be finished within a few days.
Rackham consists of four login nodes and 304 compute nodes. Each compute node contains two 10-core Intel Xeon CPUs together with 128GB ("thin") or 256GB ("fat") of memory. Your project data will be stored on Crex, Rackham's storage system, currently capable of storing 1PB of data. Crex is a high-performance file storage system from DDN that uses the Lustre filesystem.
If you are used to Tintin, we ask you to pay attention to the following:
* More nodes!
Rackham has 304 nodes (with more on the way!), while Tintin in the end had only 150. Do not assume, however, that Rackham's nodes are identical to Tintin's; they are not. You will find that fewer nodes are needed on Rackham to perform the same work, so you will need to adjust your job scripts accordingly.
* More cores!
Rackham has 20 cores per node, a 25% increase from Tintin's 16 cores per node. Remember that when scheduling your node jobs! For the technically interested: each Rackham CPU is a 10-core Intel Xeon E5-2630 v4 running at 2.2 GHz, with 25MB of shared cache and a maximum turbo frequency of 3.1 GHz. If the previous sentence means nothing to you, don't worry: the only thing you need to know is that your core jobs will finish much faster thanks to the newer generation of CPUs.
Note that if you have built your own applications tailored for Tintin's AMD Bulldozer CPUs, you will need to recompile them on Rackham to take advantage of the Intel CPUs. Tip: try compiling with the Intel compilers and tools from "module load intel" and you will likely see a jump in performance. Remember, faster code means less compute time and fewer core hours billed to your project.
* More memory!
Each node comes with 128GB of memory (or 6.4GB per core) vs. Tintin's 64GB (or 4GB per core). For the most memory-intensive applications you may also request up to 32 fat nodes, each containing 256GB of memory.
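As a rough starting point for adjusting node counts in job scripts, the arithmetic above can be sketched in shell. This is only a sketch: the function name rackham_nodes is made up for illustration, and it matches core counts only (16 cores per Tintin node, 20 per Rackham node). Since Rackham's cores are also faster, the number you actually need may be lower.

```shell
#!/bin/sh
# Cores per node, taken from the announcement above.
TINTIN_CORES_PER_NODE=16
RACKHAM_CORES_PER_NODE=20

# rackham_nodes N (hypothetical helper): smallest number of Rackham nodes
# that provides at least as many cores as N Tintin nodes.
rackham_nodes() {
    tintin_nodes=$1
    cores=$(( tintin_nodes * TINTIN_CORES_PER_NODE ))
    # Ceiling division: round up to whole nodes.
    echo $(( (cores + RACKHAM_CORES_PER_NODE - 1) / RACKHAM_CORES_PER_NODE ))
}

rackham_nodes 5    # 80 cores  -> prints 4
rackham_nodes 8    # 128 cores -> prints 7
```

Treat the result as an upper bound and verify with a short test run before submitting large jobs.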
The biggest differences you will find when having your project directory on Crex instead of Pica are:
* No .snapshot directory. If you lose a file, you need to contact firstname.lastname@example.org and we will retrieve it from the backups. The .snapshot directory previously found inside every directory of your project is no longer supported (for your home directory, .snapshot is still available).
* Smaller initial storage for your project data. The default size of the project and nobackup areas will be 128GB in total. You will be able to apply for more storage if needed.
* We no longer support Webexport.
For Fysast1, Milou, and Tintin, UPPMAX provides a webexport service.
The service relies on storage space on Pica, which will not be available on Rackham. Rackham has no space available for the webexport service, so it will not be provided there.
Lastly, how do you get access to Rackham? If you already have a project on Tintin, UPPMAX will migrate it to Rackham at the beginning of March.
If you don't have a Tintin project and are interested in working on Rackham and Crex, you may apply for a SNIC project at https://supr.snic.se/round/2017smalluppmax/.
* A note on Software
For a complete list of currently installed software please run after logging in:
As on Milou, you can search for modules with the "module spider" command:
module spider name-of-software
The list of available software will be updated in the coming weeks. At this time we have most of the compilers (icc, mpicc, gcc, gfortran and javac), interpreters (Python, Perl, R) and applications (MATLAB, GAUSSIAN, COMSOL, RStudio) installed. OpenFOAM, VASP and GROMACS have been scheduled for installation and will soon be available. If you are missing software and are unable to install it yourself, you may ask for support at email@example.com.
We look forward to hearing your thoughts and feedback on Rackham!
Problem with Slurm on Milou -- fixed
Interrupts in Slurm service on Rackham -- fixed
Bianca's storage system Castor has problems -- fixed
Resetting your password from the homepage is not working -- fixed
Resetting your password from this page is currently not working. If you need to reset your password, please contact firstname.lastname@example.org.
Update 2017-04-18: This issue should now be fixed.
Funk-accounts and new certificates
Some of the shared funk-accounts used on Irma and Milou might stop working due to the IP-address change.
Maintenance window Wednesday 2017-04-05 -- finished
Smog will be decommissioned on Wednesday 5th of April
Smog will be decommissioned on Wednesday 5th of April. As previously mentioned the SNIC Cloud Team is currently working on bringing up a new cloud to replace Smog and join the other two regions in the SNIC Science Cloud project.
For questions, please contact email@example.com (and not the UPPMAX support queues).
Rackham2, one of Rackham's login nodes, got into problems -- now fixed
Maintenance window for Bianca Wednesday 2017-03-22 -- finished
Problem with file permissions in certain projects
Poor performance using Intel MPI on Rackham
We have identified performance issues when using Intel MPI on Rackham. In some cases you may see a 10x slowdown (or worse) using Intel MPI compared to Open MPI. We are investigating this issue and hope to have it solved soon. For now, please use Open MPI.
Fixed: "Project p123456 may not run jobs on this cluster (rackham)"
An issue exists on Rackham affecting projects with names of the form "p123456". These projects are not allowed to run jobs because their monthly core allocation is incorrectly set to 0 hours. We are investigating why this happens.
Update 2017-03-10: The issue should now be fixed.
Rackham will soon be open for all users
Many Tintin users have missed that Rackham will replace Tintin. We are currently migrating all projects from Tintin to Rackham, and when this is done all users will get access to Rackham. We will announce this by email and on our homepage.
Maintenance window Wednesday 2017-03-01 -- finished
Today we decommission Tintin
The 1st of March 2017 is the day we decommission Tintin. It will be replaced by our new Rackham cluster, and all projects on Tintin will be moved there.
Poor performance on Milou and Tintin