Using DIAMOND with GV

Ismail Moghul edited this page Oct 17, 2018 · 2 revisions

DIAMOND is a faster alternative to BLAST, so it can be used to speed up a GeneValidator (GV) analysis.

The example below shows how DIAMOND output files can be passed to GeneValidator:

# Install GV 
sh -c "$(curl -fsSL https://install-genevalidator.wurmlab.com)"

cd genevalidator

# Download SwissProt 
curl -L -o blast_db/uniprot_sprot.fasta.gz ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

gunzip blast_db/uniprot_sprot.fasta.gz

# Requires prior installation of Diamond 
# Make Diamond BLAST database 
diamond makedb --in blast_db/uniprot_sprot.fasta -d blast_db/swissprot

# Run BLASTP with Diamond
diamond blastp --db blast_db/swissprot --outfmt 5 --query exemplar_data/protein_data.fa --out exemplar_data/protein_data.xml
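
Before handing the XML to GV, it can be worth checking that DIAMOND actually reported hits. The following is a minimal sketch and not part of GV or DIAMOND: `count_hit_ids` is a hypothetical helper, and it assumes that each `<Hit_id>` element sits on its own line, as in standard BLAST XML output.

```shell
# Hypothetical helper (not shipped with GV or DIAMOND).
# Counts the lines containing a <Hit_id> element in a BLAST XML file.
# Assumes one <Hit_id> element per line.
count_hit_ids() {
  grep -c '<Hit_id>' "$1"
}
```

For example, `count_hit_ids exemplar_data/protein_data.xml` prints the number of hits across all queries. A result of 0 suggests the search found nothing, which is worth investigating before running GV.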

Once the DIAMOND BLAST XML file has been produced, there are two ways to run GV.

  1. Use samtools faidx to extract the sequence of each hit into a FASTA file, and pass this file to GV.
# Requires prior installation of Samtools
# Index the original FASTA database file
samtools faidx blast_db/uniprot_sprot.fasta

# Extract the FASTA Sequence using the Hit IDs
./lib/ruby/bin.real/nokogiri -e 'puts @doc.css("Hit_id").map(&:content)' exemplar_data/protein_data.xml | xargs samtools faidx blast_db/uniprot_sprot.fasta > exemplar_data/protein_data.raw_sequences.fa

# Run GV with the extracted hit sequences (-r) and the DIAMOND XML (-x)
genevalidator -n 8 -m 4 -r exemplar_data/protein_data.raw_sequences.fa -x exemplar_data/protein_data.xml exemplar_data/protein_data.fa
  2. Create a BLAST database from the FASTA database file (this must be the same file that was used to create the DIAMOND database) and pass it to GV.
# Create BLAST database
makeblastdb -in blast_db/uniprot_sprot.fasta -dbtype prot -parse_seqids

# Run GV with the BLAST database (-d) and the DIAMOND XML (-x)
genevalidator -n 8 -m 4 -d blast_db/uniprot_sprot.fasta -x exemplar_data/protein_data.xml exemplar_data/protein_data.fa
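
For the first option, the extracted FASTA file should contain a sequence for every hit in the XML. The sketch below is one way to check this with standard Unix tools. Both function names are hypothetical, and it assumes that each `<Hit_id>` element sits on its own line and that each FASTA header starts with the hit ID, followed by whitespace or the end of the line.

```shell
# Hypothetical helpers (not shipped with GV, DIAMOND or samtools).

# Print the unique hit IDs found in a BLAST XML file, one per line.
# Assumes one <Hit_id> element per line.
unique_hit_ids() {
  sed -n 's:.*<Hit_id>\(.*\)</Hit_id>.*:\1:p' "$1" | sort -u
}

# Print the hit IDs in the XML ($1) that have no matching header in the FASTA file ($2).
missing_hit_sequences() {
  ids=$(mktemp)
  headers=$(mktemp)
  unique_hit_ids "$1" > "$ids"
  # A FASTA header is ">" followed by the ID, up to the first whitespace.
  sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$2" | sort -u > "$headers"
  # comm -23 keeps the lines that appear only in the first (sorted) file.
  comm -23 "$ids" "$headers"
  rm -f "$ids" "$headers"
}
```

For example, `missing_hit_sequences exemplar_data/protein_data.xml exemplar_data/protein_data.raw_sequences.fa` prints nothing when every hit has a sequence, and otherwise lists the hit IDs that samtools faidx did not extract.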