Uppsala Multidisciplinary Center for Advanced Computational Science

How to use the node's own hard drive for analysis

Short version: Always copy the files your analysis needs to /scratch as the first step of the job, and store all output there as well. The last thing the job does is copy the files you want to keep back to Pica.

Long version: If many jobs are reading and/or writing heavily to the same place in the file system, access to all files on the same storage volume quickly slows down. Worse, all other volumes controlled by the same physical node as the affected volume will notice slowdowns as well. The 20 storage volumes of Pica are controlled by only 4 physical machines, so a single user running jobs in a bad way can cause slow file access for up to 25% of all projects on Uppmax.

Sometimes it's not just one user causing the problem. If several users who each run their jobs in a semi-bad way happen to run them simultaneously, the same effect can be triggered. This is the tricky part for the users, because they have run their jobs in exactly the same way in the past without any problems. They just never ran them at the same time as the other users, so nobody suspects they are part of the problem. This is also what makes it so hard to find which users are causing the slowdowns. Sometimes even a group of users who all run their jobs in a good way can overload the system, simply because they happen to start jobs at the same time that read/write files in the same place in the file system.

Fortunately, it's easy to avoid this situation by always running your analysis on files located on the compute node's own hard drive. Instead of your jobs hammering Pica with reads and writes, each job only works on the node's own hard drive. That way the load is spread over many more hard drives, and Pica is overloaded much less often.

The hard drive of the node is mounted at /scratch, and each job that runs on a node automatically gets a folder created with the same name as the job id, /scratch/<jobid>. This folder name is also stored in the environment variable $SNIC_TMP for ease of use. The idea is that the first thing the job does is copy all the files it will read from to $SNIC_TMP. You then run your analysis and write all output files to $SNIC_TMP as well. After the analysis is done, you copy the output files you want to keep back to your project's nobackup folder. Everything in /scratch/<jobid> is deleted as soon as the job finishes.
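The life cycle of such a job can be sketched as a small shell script. Note that this is only a runnable toy version of the pattern: temporary folders stand in for the real project directories, and gzip stands in for the actual analysis step.

```shell
#!/bin/bash
# Toy version of the copy-in / analyse / copy-out pattern.
# Outside a real job $SNIC_TMP is not set, so fall back to a temp dir;
# the "project" folders below are also temporary stand-ins.
: "${SNIC_TMP:=$(mktemp -d)}"
proj=$(mktemp -d)
mkdir -p "$proj/rawdata" "$proj/nobackup/results"
echo "some data" > "$proj/rawdata/sample.txt"

# 1. copy the input files to the node-local disk
cp "$proj/rawdata/sample.txt" "$SNIC_TMP"

# 2. work inside $SNIC_TMP so temporary files end up there as well
cd "$SNIC_TMP"
gzip sample.txt              # stand-in for the real analysis step

# 3. copy only the results you want to keep back to the project
cp sample.txt.gz "$proj/nobackup/results/"
```

In a real job the project paths would point into /proj, and the folder behind $SNIC_TMP is created and cleaned up for you by the queue system.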

An example would be a script that runs bwa to align reads. Usually it looks something like this:

#!/bin/bash -l
#SBATCH -A b2017999
#SBATCH -t 01:00:00
#SBATCH -p core
#SBATCH -n 16
 
# load modules
module load bioinfo-tools bwa/0.7.13 samtools/1.3
 
# run the alignment and convert it to bam format directly
bwa mem -t 16 /proj/b2017999/nobackup/ref/hg19.fa /proj/b2017999/rawdata/sample.fq.gz | samtools view -b -o /proj/b2017999/nobackup/results/sample.bam

The only change needed is to first copy the files to $SNIC_TMP, and then copy the results back once the alignment is done.

#!/bin/bash -l
#SBATCH -A b2017999
#SBATCH -t 01:00:00
#SBATCH -p core
#SBATCH -n 16
 
# load modules
module load bioinfo-tools bwa/0.7.13 samtools/1.3
 
# copy the files used in the analysis to $SNIC_TMP
cp /proj/b2017999/nobackup/ref/hg19.fa* /proj/b2017999/rawdata/sample.fq.gz $SNIC_TMP
 
# go to the $SNIC_TMP folder to make sure any temporary files are created there as well
cd $SNIC_TMP
 
# run the alignment using the files in $SNIC_TMP and convert it to bam format directly
bwa mem -t 16 $SNIC_TMP/hg19.fa $SNIC_TMP/sample.fq.gz | samtools view -b -o $SNIC_TMP/sample.bam
 
# copy the results back to the network file system
cp $SNIC_TMP/sample.bam /proj/b2017999/nobackup/results/

It's not harder than that. This way, the files are copied to $SNIC_TMP in one long sequential operation, which is much less straining for the file system than many small random reads/writes. The analysis itself then only uses the node's local hard drive, which keeps the load off Pica. When the alignment is finished, the results are copied back to Pica so they can be used in other analyses.

One problem can arise if your input files and results together are too large for the node's hard drive. The drive is 3 TiB, so if your files are larger than that you will not be able to use this approach.
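If you are unsure whether your data will fit, you can check the free space on the node's disk at the start of the job and abort early instead of failing halfway through. This is a minimal sketch; the size estimate below is a placeholder you would replace with your own.

```shell
# fall back to /tmp when testing outside a job; Slurm sets SNIC_TMP in real jobs
: "${SNIC_TMP:=/tmp}"

# free space on the node-local disk, in kilobytes
avail_kb=$(df -k --output=avail "$SNIC_TMP" | tail -n 1 | tr -d ' ')

# rough estimate of input + output size, in kilobytes (placeholder value)
needed_kb=$((100 * 1024))   # ~100 MiB; replace with your own estimate

if [ "$avail_kb" -lt "$needed_kb" ]; then
    echo "Not enough free space on $SNIC_TMP, aborting" >&2
    exit 1
fi
echo "Enough space available, continuing"
```

Putting a check like this right after the #SBATCH directives means the job fails within seconds instead of after hours of copying.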