Using DIAMOND with GV
Ismail Moghul edited this page Oct 17, 2018 · 2 revisions
DIAMOND is a faster alternative to BLAST, so it can be used to speed up a GeneValidator (GV) analysis.
The example below shows how DIAMOND output files can be passed to GeneValidator:
# Install GV
sh -c "$(curl -fsSL https://install-genevalidator.wurmlab.com)"
cd genevalidator
# Download SwissProt
curl -L -o blast_db/uniprot_sprot.fasta.gz ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
gunzip blast_db/uniprot_sprot.fasta.gz
# Requires prior installation of DIAMOND
# Make the DIAMOND database
diamond makedb --in blast_db/uniprot_sprot.fasta -d blast_db/swissprot
# Run BLASTP with DIAMOND, writing BLAST XML output (--outfmt 5)
diamond blastp --db blast_db/swissprot --outfmt 5 --query exemplar_data/protein_data.fa --out exemplar_data/protein_data.xml
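Before passing the XML to GV, it can be worth confirming that DIAMOND actually reported hits. The following is a minimal sketch, assuming the output uses the BLAST XML tag names (such as `Hit_id`, which the extraction step further down also relies on). `count_tag` is a hypothetical helper and is not part of DIAMOND or GV.

```shell
# Hypothetical helper: count occurrences of an opening XML tag in a file.
# Usage: count_tag TAG FILE
count_tag() {
  # grep -o prints each match on its own line; wc -l then counts the lines.
  # "|| true" keeps the function from failing when there are no matches.
  { grep -o "<$1>" "$2" || true; } | wc -l | tr -d ' '
}

# Example (run after the diamond blastp step above):
#   count_tag Hit_id exemplar_data/protein_data.xml
```

A count of 0 suggests the search produced no hits, in which case GV would have nothing to compare the queries against.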
After producing the DIAMOND BLAST XML, there are two options for running GV:
- Use samtools faidx to extract the sequence of every hit into a FASTA file and pass this file to GV.
# Requires prior installation of Samtools
# Index the original FASTA database file
samtools faidx blast_db/uniprot_sprot.fasta
# Extract the FASTA sequences of the hits using their hit IDs
./lib/ruby/bin.real/nokogiri -e 'puts @doc.css("Hit_id").map(&:content)' exemplar_data/protein_data.xml | xargs samtools faidx blast_db/uniprot_sprot.fasta > exemplar_data/protein_data.raw_sequences.fa
# Run GV with the extracted hit sequences (-r) and the DIAMOND XML (-x)
genevalidator -n 8 -m 4 -r exemplar_data/protein_data.raw_sequences.fa -x exemplar_data/protein_data.xml exemplar_data/protein_data.fa
- Create a BLAST database from the FASTA database file (this must be the same file that was used to create the DIAMOND database) and pass it to GV.
# Create BLAST database
makeblastdb -in blast_db/uniprot_sprot.fasta -dbtype prot -parse_seqids
# Run GV with the BLAST database (-d) and the DIAMOND XML (-x)
genevalidator -n 8 -m 4 -d blast_db/uniprot_sprot.fasta -x exemplar_data/protein_data.xml exemplar_data/protein_data.fa
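For the first option, the hit IDs can also be pulled out of the XML with standard tools, for example when the bundled nokogiri binary is not available. The following is a minimal sketch, assuming each `Hit_id` element sits on a single line and contains no XML entities. `extract_hit_ids` is a hypothetical helper, and dropping duplicate IDs with `sort -u` assumes that GV needs each hit sequence only once.

```shell
# Hypothetical helper: print the unique hit IDs found in a BLAST XML file.
# Usage: extract_hit_ids FILE
extract_hit_ids() {
  # Keep only the <Hit_id>...</Hit_id> elements, strip the tags,
  # then sort and remove duplicate IDs.
  { grep -o '<Hit_id>[^<]*</Hit_id>' "$1" || true; } \
    | sed 's/<[^>]*>//g' \
    | sort -u
}

# Example (replaces the nokogiri call in the first option):
#   extract_hit_ids exemplar_data/protein_data.xml \
#     | xargs samtools faidx blast_db/uniprot_sprot.fasta \
#     > exemplar_data/protein_data.raw_sequences.fa
```

Because the same database sequence is often hit by several queries, removing duplicates keeps the resulting FASTA file smaller than one built from the raw list of IDs.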