Uppsala Multidisciplinary Center for Advanced Computational Science

Disk storage guide

Users have access to shared network storage on various cluster file systems. This means that whether you are logged in to a login server or running on a compute node, you have the same view of the storage.

There are several different classes of disk storage available with different policies for usage, limits and backup:

  1. The user home file system
  2. The Global scratch file system
  3. Local scratch file systems
  4. The global project and nobackup file system
  5. Temporary virtual filesystem

Home directories and some project storage areas are backed up to tape.

If you need more quota ...

If more quota is needed, contact support (support@uppmax.uu.se) for advice. Use the uquota command to check your current disk usage and limits; it becomes available once you load the 'uppmax' module.
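For example, checking your usage from a login node might look like this (the exact output format may vary):

  module load uppmax   # makes the uquota command available
  uquota               # show current disk usage and quota limits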

Environmental variables

We have defined several environment variables to help our users; a small job-script sketch using them follows the list below. They are:

  • $HOME (or $SNIC_BACKUP) is the traditional one, pointing to the user's home directory
  • $TMPDIR (or $SNIC_TMP) points to node-local storage, suitable for temporary files that can be deleted when the job finishes
  • $SNIC_NOBACKUP points to UPPMAX-wide storage suitable for temporary files (not deleted when the job finishes)
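As a minimal sketch (the project name, program and file names are only placeholders), a batch job can stage its data on node-local storage and save what it needs before it finishes:

  #!/bin/bash
  #SBATCH -A myproject        # hypothetical project/account name
  #SBATCH -t 01:00:00

  # Stage input from the backed-up home directory to fast node-local scratch
  cp $HOME/input.dat $SNIC_TMP/
  cd $SNIC_TMP

  # Run the analysis against the local copy
  my_program input.dat > result.dat

  # Everything in $SNIC_TMP disappears when the job ends, so copy the
  # result to UPPMAX-wide temporary storage (or to your project folder)
  cp result.dat $SNIC_NOBACKUP/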

Types of storage

User Home directories

Paths: $HOME or $SNIC_BACKUP

Permanent storage of user files during the lifetime of the account. Shared access on all cluster nodes. Snapshots are normally enabled on this file system, and you can access the snapshots in every directory with 'ls .snapshot' or 'cd .snapshot'. The default quota is 32 GB per user. We provide backup of this volume, and we keep the files on tape for up to 90 days after they are deleted from disk. If you have files you do not want backed up, place them in a folder called 'nobackup'.
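For example, to restore a file from a snapshot and to set up a folder that is excluded from backup (the snapshot and file names are only placeholders):

  cd $HOME
  ls .snapshot                                # list available snapshots
  cp .snapshot/<snapshot-name>/myfile.txt .   # restore an earlier version of a file
  mkdir -p nobackup                           # files placed here are not backed up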

Local Scratch

Paths: $TMPDIR or $SNIC_TMP

Each node has a /scratch volume for local access, providing the most efficient disk storage for temporary files. Users have read/write access to this file system. SLURM defines the environment variable $TMPDIR, which you may use in job scripts. On clusters with SLURM you may use /scratch/$SLURM_JOB_ID. This area is for local access only and is not directly reachable from other nodes or from the front node. There is no backup of the data, and the lifetime of any file created is limited to the current user session or batch job. Files are automatically erased when space is needed by other users.
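A short illustration, assuming a SLURM batch job (the file name is a placeholder):

  # $TMPDIR points to the job's private directory on the local /scratch volume
  echo $TMPDIR                        # e.g. /scratch/$SLURM_JOB_ID
  cp $HOME/large_input.dat $TMPDIR/   # stage input on the fast local disk
  # ... run your computation in $TMPDIR ...
  # Copy anything you want to keep before the job ends; files here are erased
  cp $TMPDIR/output.dat $HOME/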

Projects global storage

Paths: /proj/[proj-id]

The project global storage provides permanent storage of a project's files during the lifetime of the project. The disk quota on this volume is shared by all project members. The default quota allocation is 0.5 TB.

UPPNEX projects, which store and analyze Next Generation Sequencing data, can also request extra storage quota by contacting UPPMAX support.

The files are backed up to tape, and we keep them for 40 days after they are deleted from disk. Keep all your raw data and important scripts in the project folder.

All temporary files, and files that can be regenerated (e.g. data created from your computations), should be moved to the nobackup folder. This folder has a separate quota from the project folder (also 0.5 TB by default), and for UPPNEX projects it is usually easier to get an increase of this quota than of the backed-up area.
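For example, assuming the nobackup folder sits directly under the project directory (the project id and folder name here are placeholders), regenerable data can be moved out of the backed-up area like this:

  mv /proj/myproject/intermediate_results /proj/myproject/nobackup/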

Please also note that the INBOX folder, where UPPNEX projects receive their sequencing data, is not backed up. This means you should move your data to a suitable location as soon as possible.

Temporary virtual filesystem

Paths: /dev/shm/[job-id]

On all our clusters we have a temporary virtual filesystem implemented as a shared memory area. That is, it primarily uses RAM for storage (until it eventually has to swap out to physical disk), and it can be accessed via the path /dev/shm/[job-id].

In some situations this "disk" area can be quicker to read/write to, but depending on the circumstances it can also be slower than local scratch disk. Also note that it is a shared resource among all running jobs on a specific node, so depending on the node and how much memory your job has been allocated, the amount of data you can write will vary.
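A small sketch, assuming the job id is taken from SLURM's $SLURM_JOB_ID and that the directory is created by the job itself (program and file names are placeholders):

  # Create and use a RAM-backed working directory; its contents count
  # against the memory available on the node
  mkdir -p /dev/shm/$SLURM_JOB_ID
  cp $SNIC_TMP/hot_data.dat /dev/shm/$SLURM_JOB_ID/
  my_program /dev/shm/$SLURM_JOB_ID/hot_data.dat
  # Clean up so other jobs on the node get the memory back
  rm -rf /dev/shm/$SLURM_JOB_ID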