Maintenance window Wednesday 2017-01-04 -- finished
We start at 0900 hours.
This time we will:
- Upgrade Slurm and other system software on Milou, Fysast1 and Tintin
- Upgrade firmware on Milou and Fysast1
The firmware upgrade requires power cycling so Slurm queues are stopped. Queued jobs will start after the maintenance.
Login nodes on Fysast1, Milou, and Tintin will be rebooted once during the day (we will warn an hour ahead). Slurm commands like sbatch and jobinfo will not be available all the time.
We will not stop Slurm queues on Tintin, Irma and Bianca. Maintenance on Irma and Lupus will be done next week, January 11th.
This page will be updated during the maintenance, to keep you informed about our progress.
We plan to finish before evening (today, Wednesday).
Update at 1120 hours
Maintenance work continues.
Slurm has already been upgraded. Login nodes will probably be restarted at 1300 hours.
Update at 1420 hours
Login nodes are upgraded and have restarted successfully.
Firmware upgrade continues.
Update at 1700 hours
Most nodes on Milou and Fysast1 are successfully upgraded and back in production. The remaining nodes will be released later, when their upgrades are completed.
Update on Thursday at 1020 hours
We need to make a second change to the Slurm installation, meaning that Slurm commands will not be available all the time. Jobs will keep running.
We now expect to finish the maintenance sometime this afternoon.
Update on Thursday at 1530 hours
We are still working on the Slurm upgrade and still expect to finish before evening.
Update on Thursday at 1640 hours
Slurm has been upgraded on Tintin, Milou and Fysast1. Nodes are still rebooting and are expected to be back in production within 2 hours.
The firmware upgrade failed on 4 (out of 26) chassis. This has been reported to the manufacturer for further troubleshooting. As a consequence, 32 Milou nodes are out of production until this is solved. For the remaining chassis, the firmware upgrade was successful.
Problem with Slurm on Milou -- fixed
Interrupts in Slurm service on Rackham -- fixed
Bianca's storage system Castor has problems -- fixed
Resetting your password from the homepage is not working -- fixed
Resetting your password from this page is currently not working. If you need to reset your password, please contact email@example.com.
Update 2017-04-18: This issue should now be fixed.
Funk-accounts and new certificates
Some of the shared funk-accounts used on Irma and Milou might stop working due to the IP-address change.
Maintenance window Wednesday 2017-04-05 -- finished
Smog will be decommissioned on Wednesday 5th of April
Smog will be decommissioned on Wednesday 5th of April. As previously mentioned, the SNIC Cloud Team is currently working on bringing up a new cloud to replace Smog and join the other two regions in the SNIC Science Cloud project.
For questions, please contact firstname.lastname@example.org (and not the UPPMAX support queues).
Rackham2, one of Rackham's login nodes, had problems -- now fixed
Maintenance window for Bianca Wednesday 2017-03-22 -- finished
Problem with file permissions in certain projects
Poor performance using Intel MPI on Rackham
We have identified performance issues when using Intel MPI on Rackham. In some cases you may see a 10x slowdown (or worse) using Intel MPI compared to Open MPI. We are investigating this issue and hope to have it solved soon. For now, please use Open MPI.
Fixed: "Project p123456 may not run jobs on this cluster (rackham)"
An issue exists on Rackham affecting projects of the form "p123456". These projects are not allowed to run jobs because their monthly core allocation is incorrectly set to 0 hours. We are investigating why this happens.
Update 2017-03-10: The issue should now be fixed.
Rackham will soon be open for all users
Many Tintin users have missed that Rackham will replace Tintin. We are currently migrating all projects from Tintin to Rackham, and when this is done, all users will get access to Rackham. We will announce this by email and on our homepage.
Maintenance window Wednesday 2017-03-01 -- finished
Today we decommission Tintin
The 1st of March 2017 is the day we decommission Tintin. It will be replaced by our new Rackham cluster, and all projects on Tintin will be moved there.
Poor performance on Milou and Tintin