Uppsala Multidisciplinary Center for Advanced Computational Science

Blast databases

Many pipelines for annotation or assembly comparison rely on BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Several blast versions are available as modules, e.g.:

  • blast/2.2.31+ (the Blast+ suite, recommended)
  • blast/2.2.26

Use module spider blast (on milou) or module avail blast (on tintin, after module load bioinfo-tools) to see available versions.
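
A typical way to load the recommended version could look like the sketch below. It uses only the module names mentioned above and loads bioinfo-tools first, as described for tintin. The helper name load_blast and the guard around the module command are assumptions added for illustration, so that the snippet fails gracefully on a machine without the module system.

```shell
# Sketch: load the Blast+ module named above.
# load_blast is a hypothetical helper name, not an UPPMAX command.
load_blast() {
    blast_version="${1:-blast/2.2.31+}"   # default: the recommended version listed above
    # The module command only exists where environment modules are installed.
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; is this a cluster login node?" >&2
        return 1
    fi
    module load bioinfo-tools || return 1   # makes the bioinformatics modules visible
    module load "$blast_version"
}
```

After sourcing this function, running load_blast followed by blastp -version should confirm which Blast+ version is active.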

In addition, blast databases (ftp://ftp.ncbi.nih.gov/blast/db/README) are available on UPPMAX at

/sw/data/uppnex/blast_databases

The databases are intended to be updated once a week.

At the moment, the following databases are available in this location:

  • nr
  • nt
  • refseq_rna
  • refseq_protein
  • refseq_genomic
  • human_genomic
  • other_genomic
  • swissprot
  • uniprot_sprot
  • uniprot_trembl
  • uniprot_all
  • env_nr
  • wgs
  • pdbaa
  • UniVec and UniVec_Core (as Fasta files)
  • taxdb
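
To check from a script whether one of the databases listed above is present, something like the following sketch may help. The function name blast_db_present is hypothetical. It assumes the usual BLAST database file extensions: an index file (.pin for protein, .nin for nucleotide) for single-volume databases, or an alias file (.pal or .nal) for multi-volume ones.

```shell
# Sketch: test whether a named BLAST database exists in a directory.
# blast_db_present is a hypothetical helper, not part of BLAST.
blast_db_present() {
    db_dir="$1"
    db_name="$2"
    # Look for an alias file (multi-volume) or an index file (single volume).
    for ext in pal nal pin nin; do
        if [ -e "$db_dir/$db_name.$ext" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, blast_db_present /sw/data/uppnex/blast_databases nr returns success if nr is installed there. The check does not cover UniVec and UniVec_Core, which are provided as Fasta files, or taxdb, which uses other file types.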

To use these databases, you can give their full path in the blast command.

With Blast+:

blastp -db /sw/data/uppnex/blast_databases/nr -query input.fasta

With legacy Blast:

blastall -p blastp -d /sw/data/uppnex/blast_databases/nr -i input.fasta

Alternatively, set the BLASTDB environment variable to the default location of the blast databases, and then give only the name of the database:

export BLASTDB=/sw/data/uppnex/blast_databases
blastp -db nr -query input.fasta

The first line of this snippet can be placed in your ~/.bash_profile or ~/.bashrc file so that you do not have to set it in every session.
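
One way to add the export line to a startup file without duplicating it on repeated runs is sketched below. The function name add_blastdb_export is hypothetical, and the default target ~/.bashrc is an assumption; pass another file name as the first argument if you prefer ~/.bash_profile.

```shell
# Sketch: append the BLASTDB export to a startup file only if it is not there yet.
# add_blastdb_export is a hypothetical helper name.
add_blastdb_export() {
    rc_file="${1:-$HOME/.bashrc}"
    line='export BLASTDB=/sw/data/uppnex/blast_databases'
    # grep -qxF: quiet, match the whole line, treat the pattern as a fixed string.
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

The change takes effect in new sessions, or in the current one after sourcing the startup file.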