Maintenance window Wednesday 2017-06-07 -- FINISHED
Maintenance starts at 0900 hours and will probably last all day long. This time, we will:
Upgrade kernel and other system software on all nodes of Bianca, Fysast1, Grus, Irma, Milou, and Rackham.
Upgrade firmware on two controllers of Crex.
Replace Battery Backup Unit of Lupus.
Reconfigure internal networks of Bianca.
We will restart all login nodes during the day. Otherwise Fysast1, Irma, Milou, and Rackham will remain available, probably with only minor disturbances to the Slurm connection. Jobs on these four UPPMAX clusters will continue to run during maintenance.
Bianca will probably be unavailable all day, with stopped Slurm queues.
Storage system Grus will be unavailable during the upgrade.
This news text will be updated during the maintenance, so here you will be able to see when the maintenance has finished.
Update at 1120 hours
Firmware on storage system Crex has been upgraded.
Storage system Castor has been upgraded.
Login nodes have been upgraded. We are now restarting them.
Update at 1300 hours
Login nodes have been restarted and checked.
Battery Backup Unit of Lupus has been replaced.
Storage system Grus has been upgraded.
Upgrade of Bianca proceeds slowly but without any known problems, so we still think that we will finish today.
Update at 1600 hours
Only the Bianca maintenance is left to do. We have upgraded everything and are now reconfiguring the internal networks. We will also run several functional tests before allowing new logins. We still plan to finish today.
Update at 1720 hours
We have also finished the maintenance of Bianca, and will now close the maintenance window. Please get in touch if something has stopped working.
Next maintenance window is Wednesday July 5th.
Urgent kernel upgrade -- FINISHED
Today we are performing an urgent kernel upgrade on Milou, Fysast1, Rackham, Irma, and Bianca. Login nodes will be restarted during the day. No running jobs or queues are stopped. We will update on the progress here in System News during the day.
UPDATE 16:00 - Update completed.
Intelmpi performance issues
Bianca graphical login now working
Uses ThinLinc Web Access, not X forwarding.
Bianca's storage system Castor has problems -- FIXED
Issues with X11 on milou (X11Forwarding) -- SOLVED
We have observed, and several users have reported, issues with running X11 applications on Milou. We are investigating.
milou2 and milou-b rebooted
The login nodes milou2.uppmax.uu.se and milou-b.uppmax.uu.se were rebooted at 15:00 today (29th of May) due to some issues with the kernel NFS module.
Cooling stop at 17.00 hours the 23rd of May -- CANCELLED
Issues with certain project volumes for milou/pica 20170515 and onwards.
Some project volumes on pica are very heavily loaded and slow, next to unusable for interactive use. We're doing what we can to resolve this but cannot promise any set time for when things will behave as normal again.
UPDATE: We've had some continuing issues with this because some nodes do not notice when the resources start behaving better. We're working on these issues, but they may have caused disturbances such as failed jobs or missing output.
Support may be slow May 11th and 12th due to conference
The UPPMAX system group hosts the spring 'SONC' conference, where administrators from all SNIC centers meet and discuss how to improve our centers. With many UPPMAX administrators out of office during the conference (Thursday 11th and Friday 12th), support will likely be less responsive.
slurm disturbance on milou 2017-05-10
Due to a misconfiguration active on a certain number of nodes around 12AM today, some jobs that were launched on milou could not start.
If you have jobs that were victims of this, they will likely show up as completed although with a very short run time (a few seconds).
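One way to look for such jobs is to list your own jobs for that day with Slurm's `sacct` command and pick out those marked COMPLETED with a run time of only a few seconds. The sketch below is an illustration, not an UPPMAX tool: the helper names, the 10-second threshold, and the chosen output fields are assumptions, and it assumes `sacct` prints elapsed time as `[DD-]HH:MM:SS`.

```python
import subprocess


def parse_elapsed(elapsed):
    """Convert a Slurm elapsed string ([DD-][HH:]MM:SS) to seconds."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:          # pad a missing hours field with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def suspicious_jobs(sacct_output, max_seconds=10):
    """Return (job_id, name, elapsed) for COMPLETED jobs that ran very briefly.

    Expects lines of the form JobID|JobName|State|Elapsed.
    The 10-second default threshold is an assumption.
    """
    hits = []
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        job_id, rest = line.split("|", 1)
        # Split from the right so a '|' inside a job name does not break parsing.
        name, state, elapsed = rest.rsplit("|", 2)
        if state == "COMPLETED" and parse_elapsed(elapsed) <= max_seconds:
            hits.append((job_id, name, elapsed))
    return hits


def query_sacct(start, end):
    """Run sacct for the current user between start and end (needs Slurm installed)."""
    cmd = ["sacct", "-X", "-n", "-P", "-S", start, "-E", end,
           "-o", "JobID,JobName,State,Elapsed"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On a login node, something like `suspicious_jobs(query_sacct("2017-05-10", "2017-05-11"))` would then list candidates to resubmit. A short run time alone does not prove a job was affected, so check the job's output before rerunning it.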
Disturbances in Slurm today Tuesday -- finished
Maintenance window Wednesday 2017-05-03 -- finished
Slurm problems on Rackham -- fixed
Intel license server not responding -- fixed
We have received reports that the Intel license server is not responding. We are investigating. This might show up as hangs or freezes during compilation.
Problem "Invalid account or account/partition..." -- solved
We have identified a problem with the Slurm account database. If you were just added to a project or created a new one, you might get the following message when submitting jobs: "Invalid account or account/partition...". It primarily affects Rackham and Milou.
Problem with Slurm on Milou -- fixed
Interrupts in Slurm service on Rackham -- fixed
Bianca's storage system Castor has problems -- fixed
Resetting your password from the homepage is not working -- fixed
Resetting your password from this page is currently not working. If you need to reset your password please contact firstname.lastname@example.org
Update 2017-04-18: This issue should now be fixed.
Funk-accounts and new certificates
Some of the shared funk-accounts used on Irma and Milou might stop working due to the IP-address change.
Maintenance window Wednesday 2017-04-05 -- finished
Smog will be decommissioned on Wednesday 5th of April
Smog will be decommissioned on Wednesday 5th of April. As previously mentioned, the SNIC Cloud Team is currently working on bringing up a new cloud to replace Smog and join the other two regions in the SNIC Science Cloud project.
For questions, please contact email@example.com (and not the UPPMAX support queues).
Rackham2, one of Rackham's login nodes, ran into problems -- now fixed
Maintenance window for Bianca Wednesday 2017-03-22 -- finished