How does backup at UPPMAX work?
Backup of data is especially important to data-driven science. This page provides the ins and outs of how backup works on UPPMAX storage systems.
As PI, you are ultimately responsible for your data. We recommend you maintain a primary copy of your data on a system you control, when possible. At the very least, double-check that your collaborators are taking care of your data in a responsible way.
What does "backup" mean for my data?
The type of backup generally available at UPPMAX is incremental backup with 30-day retention. This means that any file deleted more than 30 days ago is irretrievably gone. Changes to a file are kept for 30 days, so you can retrieve an old version up to a month after you edited it.
The backup service tries to back up changes as often as they occur, but rapid changes are ignored: if you create a file and delete it five minutes later, the backup service will probably not have saved it for you. As a rule of thumb, the backup service will save things within a day or three.
Where is backup available?
Backup is done on:
- All of Bianca (projects named sensYYYYXXX, up to 9.5 PB), except in folders named "nobackup"
- SciLifeLab Storage projects (named sllstoreYYYYXXX, 1 PB), except in folders named "nobackup"
- SNIC projects (named snicYYYY-X-ZZZZ, < 200 TB)
Backup is not done on:
- UPPMAX Storage ("Uppstore" projects named uppstoreYYYYXXX, up to 5 PB)
What should I put in directories with backup?
In short, irreplaceable data should be placed there. This includes especially raw sequencing data and any other data that cannot be recreated by any effort. Scripts and other files that are needed to reproduce or repeat the analyses should also be placed on backup.
What should I not put in directories with backup?
Directories where you are actively working, especially if you are creating or modifying many files. The backup mechanisms cannot keep up with very many files changing at a rapid pace.
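The practical pattern that follows from this advice is to do rapidly changing work inside a "nobackup" folder (which the backup service skips, as noted above) and copy only finished results into the backed-up part of the project. A minimal sketch, where a temporary directory stands in for a real project path such as /proj/snic2023-1-123:

```shell
# Sketch only: a temporary directory stands in for a real project path.
PROJ="$(mktemp -d)"

# The backup service skips folders named "nobackup", so rapidly
# changing work belongs there.
mkdir -p "$PROJ/nobackup/scratch" "$PROJ/analysis"

# Churn through intermediate files under nobackup...
echo "final summary" > "$PROJ/nobackup/scratch/summary.tsv"

# ...and copy only the finished, irreplaceable results into the
# backed-up part of the project.
cp "$PROJ/nobackup/scratch/summary.tsv" "$PROJ/analysis/"
```

This keeps the backup service from chasing thousands of short-lived files while still protecting the results you actually need to keep.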
How robust is UPPMAX storage?
All UPPMAX storage systems use RAID technology to make storage more robust through redundancy. This means that two or more disks must fail in the same "RAID volume" before there is a risk of data loss.
However, this technology does not protect against user error (e.g. running "rm -rf *" in your project directory) or a significant disaster (e.g. a fire in the computer hall). Off-site backup is crucial.