metamlst
The core of the pipeline: it reconstructs the MLST loci and saves the results in an intermediate file (one file per analysed sample).
usage: metamlst.py [-h] [-o OUTPUT FOLDER] [-d DB PATH]
[--filter species1,species2...] [--penalty PENALTY]
[--minscore MINSCORE] [--max_xM XM] [--min_read_len LENGTH]
[--min_accuracy CONFIDENCE] [--debug] [--presorted]
[--quiet] [--legacy_samtools] [--version] [--nloci NLOCI]
[--log] [-a]
[BAMFILE]
Reconstruct the MLST loci from a BAMFILE aligned to the reference MLST loci
positional arguments:
BAMFILE BowTie2 BAM file containing the alignments (default:
None)
optional arguments:
-h, --help show this help message and exit
-o OUTPUT FOLDER Output Folder (default: ./out) (default: ./out)
-d DB PATH MetaMLST SQLite Database File (created with metaMLST-
index) (default:
/scratchCM/repos/metamlst/metamlstDB_2017.db)
--filter species1,species2...
Filter for specific set of organisms only (METAMLST-
KEYs, comma separated. Use metaMLST-index.py
--listspecies to get MLST keys) (default: None)
--penalty PENALTY MetaMLST penaty for under-represented alleles
(default: 100)
--minscore MINSCORE Minimum alignment score for each alignment to be
considered valid (default: 80)
--max_xM XM Maximum SNPs rate for each alignment to be considered
valid (BowTie2s XM value) (default: 5)
--min_read_len LENGTH
Minimum BowTie2 alignment length (default: 50)
--min_accuracy CONFIDENCE
Minimum threshold on Confidence score (percentage) to
pass the reconstruction step (default: 0.9)
--debug Debug Mode (default: False)
--presorted The input BAM file is sorted and indexed with
samtools. If set, MetaMLST skips this step (default:
False)
--quiet Suppress text output (default: False)
--legacy_samtools Legacy mode (for samtools < 1.3) (default: False)
--version Prints version informations (default: False)
--nloci NLOCI Do not discard samples where at least NLOCI (percent)
are detected. This can lead to imperfect MLST typing
(default: 100)
--log generate logfiles (default: False)
-a Write known sequences (default: False)
Parameters
- -o: Selects the output folder for the intermediate files.
- -d: Specifies the database file (downloaded or created with MetaMLST-index).
- --filter: Filters for a subset of species instead of all the species in the provided database. The species list must be entered as a comma-separated list of MLST keys. To get the full list of species, use the --listkeys option on the database with MetaMLST-index.
- --penalty: Specifies a different penalty for the Reference Allele Selection.
- -a: Writes all the sequences to the intermediate output file. Normally, known MLST sequences (i.e. not new) are not written to the file (a placeholder / identifier is written instead). This option overrides that behaviour and forces all the sequences to be written to the file(s).
- --presorted: Skips the BAM file sorting step. If your input BAM file has already been sorted and indexed with samtools, this option can make MetaMLST run faster.
- --legacy_samtools: Enables backwards compatibility if your system runs samtools 0.x. This option will generate errors if your samtools version is 1.x.
- --quiet: Suppresses all the stdout messages.
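As an illustration of how these options combine on the command line, the following minimal sketch assembles an argument list for metamlst.py. The helper `build_metamlst_command` and its parameter names are hypothetical and not part of MetaMLST; only the flags themselves (-d, -o, --filter, --presorted, --quiet, -a) come from the help text above.

```python
import shlex


def build_metamlst_command(bam_path, db_path, output_folder="./out",
                           species_keys=None, presorted=False,
                           quiet=False, write_all=False):
    """Assemble an argument list for metamlst.py (hypothetical helper)."""
    cmd = ["metamlst.py", "-d", db_path, "-o", output_folder]
    if species_keys:
        # --filter expects the MLST keys as one comma-separated value
        cmd += ["--filter", ",".join(species_keys)]
    if presorted:
        cmd.append("--presorted")  # input BAM is already sorted and indexed
    if quiet:
        cmd.append("--quiet")      # suppress stdout messages
    if write_all:
        cmd.append("-a")           # also write known sequences
    cmd.append(bam_path)           # BAMFILE is the positional argument
    return cmd


cmd = build_metamlst_command(
    "YOUR_ALIGNMENTS.bam",
    "metamlstDB_2015.db",
    species_keys=["sepidermidis", "saureus"],
    presorted=True,
)
# Print the command as a single shell-safe string
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, assuming metamlst.py is on the PATH.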
Thresholds
- --minscore: Specifies a different minimum alignment score. Alignments with an alignment score lower than this threshold will be discarded.
- --max_xM: Specifies a different maximum edit distance between each alignment and the closest allele (Bowtie2 XM value). Alignments with an XM value greater than this threshold will be discarded.
- --min_read_len: Specifies a different minimum alignment length for the Bowtie2 alignments. Alignments with a length lower than this threshold will be discarded.
- --min_accuracy: Specifies a different minimum "MetaMLST-Confidence" score threshold. If at least one MLST locus is reconstructed with a Confidence below this threshold, the sample will be discarded.
- --nloci: Allows MetaMLST to attempt the reconstruction of organisms for which not all loci are detectable in the sample. If the percentage of detected loci over total loci is lower than this threshold, the sample will be discarded. By default it is set to 100% (all loci must be present to type). Note: a locus counts as present if at least one read matches against it.
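The per-alignment thresholds and the --nloci check can be sketched as follows. This is an illustrative sketch, not MetaMLST's actual code: the functions `keep_alignment` and `enough_loci` are hypothetical, and each alignment is assumed to be a dict holding the Bowtie2 `AS` (alignment score) and `XM` values plus an alignment `length`. The defaults mirror those listed in the help text.

```python
def keep_alignment(aln, minscore=80, max_xm=5, min_read_len=50):
    """Return True if an alignment passes all three per-alignment thresholds.

    `aln` is assumed to be a dict with keys "AS", "XM" and "length".
    """
    return (
        aln["AS"] >= minscore               # --minscore
        and aln["XM"] <= max_xm             # --max_xM
        and aln["length"] >= min_read_len   # --min_read_len
    )


def enough_loci(detected_loci, total_loci, nloci=100):
    """Return True if the percentage of detected loci reaches --nloci."""
    return 100.0 * detected_loci / total_loci >= nloci


alignments = [
    {"AS": 95, "XM": 2, "length": 100},  # passes every threshold
    {"AS": 60, "XM": 1, "length": 100},  # score below --minscore
    {"AS": 90, "XM": 8, "length": 100},  # XM above --max_xM
    {"AS": 90, "XM": 0, "length": 40},   # shorter than --min_read_len
]
kept = [a for a in alignments if keep_alignment(a)]
print(len(kept), "of", len(alignments), "alignments kept")

# With the default --nloci of 100, 6 detected loci out of 7 is not enough;
# lowering the threshold to 80 lets the sample through.
print(enough_loci(6, 7), enough_loci(6, 7, nloci=80))
```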
metamlst.py -d metamlstDB_2015.db YOUR_ALIGNMENTS.bam
Here, metamlstDB_2015.db is the MetaMLST database and YOUR_ALIGNMENTS.bam is a Bowtie2-mapped BAM file.
The output of metamlst.py: for each detected organism, the closest reference allele is determined. The pipeline then reconstructs the loci, reporting:
Output | Description |
---|---|
Closest Allele Identification (Step 1) | |
Locus | MLST locus name |
Avg. Coverage | Average coverage of the locus: sum of length of the reads aligning / length of the locus |
Score | MetaMLST-score of the selected Reference Allele |
Hits | Number of reads aligning to the selected Reference Allele |
Reference Allele(s) | Reference alleles |
Consensus Sequence (Step 2) | |
Locus | MLST locus name |
Ref | Reference allele selected |
Length | Length of the locus |
Ns | Number of positions not covered by any read on the locus |
SNPs | Number of differences of the reconstructed locus w.r.t. the reference allele |
Confidence | Confidence score: (1 - Ns / Length) * 100 |
Notes | Specifies whether the allele is a new allele (NEW) or not (--) |
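The two derived columns can be reproduced from their descriptions in the table. The sketch below is a reading of those descriptions under stated assumptions, not MetaMLST's implementation: the function names are hypothetical, and the Confidence formula is interpreted as (1 - Ns / Length) * 100, i.e. the percentage of locus positions covered by at least one read.

```python
def average_coverage(read_lengths, locus_length):
    """Sum of the lengths of the aligning reads divided by the locus length."""
    return sum(read_lengths) / locus_length


def confidence(uncovered_positions, locus_length):
    """Percentage of locus positions covered by at least one read.

    Assumes the table's formula is read as (1 - Ns / Length) * 100.
    """
    return (1 - uncovered_positions / locus_length) * 100


# Hypothetical locus of 450 bp hit by nine 100 bp reads, with 9 uncovered positions
locus_length = 450
coverage = average_coverage([100] * 9, locus_length)
score = confidence(9, locus_length)
print(f"Avg. Coverage: {coverage:.2f}  Confidence: {score:.1f}")
```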
metamlst.py -d metamlstDB_2015.db --max_xM 10 --minscore 40 --min_accuracy 0.5 YOUR_ALIGNMENTS.bam
metamlst.py -d metamlstDB_2015.db --filter sepidermidis,pacnes,saureus YOUR_ALIGNMENTS.bam
Note: you can view the list of available species-keys for filtering using:
metaMLST-index.py --listkeys metamlstDB_2015.db
MetaMLST is a project of the Computational Metagenomics Lab at CIBIO, University of Trento, Italy.
M. Zolfo, A. Tett, O. Jousson, C. Donati and N. Segata - MetaMLST: multi-locus strain-level bacterial typing from metagenomic samples - Nucleic Acids Research, 2016 DOI: 10.1093/nar/gkw837