Uppsala Multidisciplinary Center for Advanced Computational Science

Maintenance window Wednesday 2016-11-02 -- NOW FINISHED

2016-11-02

We start the maintenance at 0900 hours.

We have already finished the upgrade of Fysast1, Milou, Mosler, and Tintin, so we have no remaining maintenance for them today.

We will reimplement the webexport service on a new server, as part of making room for our new cluster Rackham. The service will therefore be unavailable from today until the reimplementation is complete. We will keep you posted about this in System News. We are sorry that we did not inform you about this earlier. We expect the service to be back in the middle of next week.

Hard quota limits will be set on your home directories today. These are higher than the limits reported by the uquota command, meaning that you can temporarily go somewhat over the soft quota limit without any problems. When you hit the hard quota limit, you will get write errors.
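The difference between the soft and the hard limit can be illustrated with a small sketch. This is not how the file system or the uquota command is implemented; the function names and all numbers are hypothetical, chosen only to show the behaviour described above.

```python
from enum import Enum


class QuotaState(Enum):
    OK = "within soft limit"
    OVER_SOFT = "over soft limit, writes still succeed"
    AT_HARD = "hard limit reached, writes fail"


def classify_usage(used, soft_limit, hard_limit):
    """Classify current usage against a soft and a hard limit.

    All values use the same unit (for example GiB). The limits are
    illustrative assumptions, not the real values on any cluster.
    """
    if hard_limit < soft_limit:
        raise ValueError("hard limit must not be lower than soft limit")
    if used >= hard_limit:
        return QuotaState.AT_HARD
    if used > soft_limit:
        return QuotaState.OVER_SOFT
    return QuotaState.OK


def can_write(used, write_size, hard_limit):
    """Return True if a write of write_size fits below the hard limit."""
    return used + write_size <= hard_limit


if __name__ == "__main__":
    soft, hard = 100, 120  # hypothetical limits
    for used in (90, 110, 120):
        print(used, classify_usage(used, soft, hard).value)
```

In this model, usage between the two limits is tolerated, while a write that would push usage past the hard limit is refused, which corresponds to the write errors mentioned above.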

Storage system Lupus, belonging to cluster Irma, is not working correctly, so we plan to service it today. Irma will be completely out of service during this maintenance, which will probably take more than one day. Details are:

  • Replace bad internal memory
  • Upgrade BIOS
  • Run a fsck (file system check) on a bad part of the Lupus storage
  • Service an Ethernet switch that drops too many network packets
  • Cold start all parts of Lupus
  • Set up improved logging for Lupus
  • Benchmark Irma
  • After maintenance, check that Lupus and Irma work correctly before putting them back in service

On cloud system Smog, we plan to:

  • Integrate it with the cloud system of C3SE, a SNIC center in Gothenburg
  • Upgrade kernel and other system software on host machines

Update at 1140 hours

We have started maintenance on Smog, which we think will be finished before tomorrow evening.

For Lupus, we are waiting for our support vendor to appear in person. If they do not show up, we will need to adjust our planning.

Update at 1715 hours

Hard quota limits are now set on home directories of Fysast1, Milou, and Tintin.

Webexport service will not work until next week.

Smog maintenance continues tomorrow.

Only part of the Lupus maintenance is done. We will put Irma back in production today, but Lupus will be as bad as before. Please look carefully for error messages when running jobs. We will try to agree with the support vendor on a day in the near future when we can continue the repair. We are sorry that it could not be solved today.

Update at 1845 hours

Irma is now back in service, but its storage system Lupus is still faulty. (Please read above.)

Update Thursday at 1145 hours

Smog maintenance continues. The integration with C3SE has caused problems that we are trying to solve. If we do not succeed, we will roll back to the previous, non-integrated state. In parallel, we have started the upgrade of the kernel and system software. We expect Smog to be back in service before tomorrow (Friday) evening.

Update Friday at 1720 hours

We were not able to solve the Smog issues regarding integration with C3SE, so we rolled back to the previous, non-integrated state. The kernel and other system software have been upgraded. A lot of other cleanup has also been done. Two machines, sm68 and sm94, are not yet fully upgraded and working. Some security changes have been made.

Webexport service will not work until next week.

We are now closing the maintenance window.

The next maintenance window is the 7th of December.
