metamlst

Moreno Zolfo edited this page Feb 13, 2020 · 8 revisions

Profiling one sample with metamlst.py


metamlst.py is the core of the pipeline: it reconstructs the MLST loci and saves the results in an intermediate file (one file per sample analysed).

▸ Usage

usage: metamlst.py [-h] [-o OUTPUT FOLDER] [-d DB PATH]
                   [--filter species1,species2...] [--penalty PENALTY]
                   [--minscore MINSCORE] [--max_xM XM] [--min_read_len LENGTH]
                   [--min_accuracy CONFIDENCE] [--debug] [--presorted]
                   [--quiet] [--legacy_samtools] [--version] [--nloci NLOCI]
                   [--log] [-a]
                   [BAMFILE]

Reconstruct the MLST loci from a BAMFILE aligned to the reference MLST loci

positional arguments:
  BAMFILE               BowTie2 BAM file containing the alignments (default:
                        None)

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT FOLDER      Output Folder (default: ./out) (default: ./out)
  -d DB PATH            MetaMLST SQLite Database File (created with metaMLST-
                        index) (default:
                        /scratchCM/repos/metamlst/metamlstDB_2017.db)
  --filter species1,species2...
                        Filter for specific set of organisms only (METAMLST-
                        KEYs, comma separated. Use metaMLST-index.py
                        --listspecies to get MLST keys) (default: None)
  --penalty PENALTY     MetaMLST penaty for under-represented alleles
                        (default: 100)
  --minscore MINSCORE   Minimum alignment score for each alignment to be
                        considered valid (default: 80)
  --max_xM XM           Maximum SNPs rate for each alignment to be considered
                        valid (BowTie2s XM value) (default: 5)
  --min_read_len LENGTH
                        Minimum BowTie2 alignment length (default: 50)
  --min_accuracy CONFIDENCE
                        Minimum threshold on Confidence score (percentage) to
                        pass the reconstruction step (default: 0.9)
  --debug               Debug Mode (default: False)
  --presorted           The input BAM file is sorted and indexed with
                        samtools. If set, MetaMLST skips this step (default:
                        False)
  --quiet               Suppress text output (default: False)
  --legacy_samtools     Legacy mode (for samtools < 1.3) (default: False)
  --version             Prints version informations (default: False)
  --nloci NLOCI         Do not discard samples where at least NLOCI (percent)
                        are detected. This can lead to imperfect MLST typing
                        (default: 100)
  --log                 generate logfiles (default: False)
  -a                    Write known sequences (default: False)

Parameters

  • -o: Selects the output folder for the intermediate files.
  • -d: Specifies the database file (downloaded or created with MetaMLST-index).
  • --filter: Restricts the analysis to a subset of species instead of all the species in the provided database. The species list must be entered as a comma-separated list of MLST keys. To get the full list of keys, use the --listkeys option of MetaMLST-index on the database.
  • --penalty: Specifies a different penalty for the Reference Allele Selection.
  • -a: Writes all the sequences to the intermediate output file. Normally, known MLST sequences (i.e. not new) are not written to the file (a placeholder / identifier is written instead). This option overrides that behaviour and forces all the sequences to be written to the file(s).
  • --presorted: Skips the BAM file sorting step. If your input BAM file has already been sorted and indexed with samtools, this option can make MetaMLST run faster.
  • --legacy_samtools: Enables backwards compatibility if your system runs samtools 0.x. This option will generate errors if your samtools version is 1.x.
  • --quiet: Suppresses all stdout messages.

Thresholds

  • --minscore: Specifies a different minimum alignment score. Alignments with an alignment score lower than this threshold will be discarded.
  • --max_xM: Specifies a different maximum edit distance from the closest allele from the alignments (Bowtie2 XM parameter). Alignments with an XM value greater than this threshold will be discarded.
  • --min_read_len: Specifies a different minimum alignment length for the Bowtie2 alignments. Alignments with a length lower than this threshold will be discarded.
  • --min_accuracy: Specifies a different minimum "MetaMLST-Confidence" score threshold. If at least one MLST locus is reconstructed with a Confidence below this threshold, the sample will be discarded.
  • --nloci: Allows MetaMLST to attempt the reconstruction of organisms for which not all loci are detectable in the sample. If the percentage of detected loci over total loci is lower than this threshold, the sample will be discarded. By default it is set to 100% (all loci must be present to type). Note: a locus is present if at least one read matches against it.
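The thresholds above can be pictured as two layers of checks: one applied to each alignment, one applied to the sample as a whole. The following is a minimal sketch of that logic, not MetaMLST's actual code. The `Alignment` class and both function names are hypothetical, and the sketch assumes the Confidence score is expressed as a fraction between 0 and 1 so that it can be compared with the 0.9 default of --min_accuracy.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified view of one Bowtie2 alignment."""
    score: int   # alignment score, compared against --minscore
    xm: int      # Bowtie2 XM value, compared against --max_xM
    length: int  # alignment length, compared against --min_read_len


def passes_alignment_filters(aln, minscore=80, max_xm=5, min_read_len=50):
    """Return True if the alignment survives the three per-alignment thresholds."""
    return (
        aln.score >= minscore          # discard scores lower than --minscore
        and aln.xm <= max_xm           # discard XM values greater than --max_xM
        and aln.length >= min_read_len  # discard lengths lower than --min_read_len
    )


def sample_is_kept(confidences, total_loci, min_accuracy=0.9, nloci=100):
    """Decide whether a sample is kept for one organism.

    confidences: dict mapping each detected locus to its Confidence score
                 (assumed here to be a fraction between 0 and 1).
    total_loci:  number of loci in the MLST scheme of the organism.
    """
    detected_percent = 100.0 * len(confidences) / total_loci
    if detected_percent < nloci:
        return False  # too few loci detected (--nloci)
    # Every detected locus must reach the Confidence threshold (--min_accuracy).
    return all(c >= min_accuracy for c in confidences.values())
```

With the defaults, a sample in which only one of two loci is detected is discarded; lowering `nloci` to 50 keeps it, which mirrors the trade-off described for --nloci.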

▸ Example

metamlst.py -d metamlstDB_2015.db YOUR_ALIGNMENTS.bam

Where metamlstDB_2015.db is the MetaMLST database and YOUR_ALIGNMENTS.bam is a Bowtie2-mapped BAM file, as done here.
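Since metamlst.py processes one sample per run, profiling several samples means repeating the command once per BAM file. A small helper like the one below could assemble those command lines. It is only a sketch: the function name `build_metamlst_command` and the example file names are hypothetical, and it uses only options listed in the usage text above (-d, -o, --filter).

```python
def build_metamlst_command(bam_file, db_path, output_folder=None, species_keys=None):
    """Return the metamlst.py command for one sample as a list of arguments."""
    cmd = ["metamlst.py", "-d", str(db_path)]
    if output_folder is not None:
        cmd += ["-o", str(output_folder)]
    if species_keys:
        # --filter expects a comma-separated list of MLST keys
        cmd += ["--filter", ",".join(species_keys)]
    cmd.append(str(bam_file))  # the BAM file is the positional argument
    return cmd


# Hypothetical sample files, one command per sample.
for bam in ["sample_A.bam", "sample_B.bam"]:
    print(" ".join(build_metamlst_command(bam, "metamlstDB_2015.db")))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run, provided metamlst.py is on the PATH.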

▸ Interface

MetaMLST-Output

For each detected organism, metamlst.py determines the closest reference allele. The pipeline then reconstructs the loci, reporting:

Output               Description

Closest Allele Identification (Step 1)
Locus                MLST locus name
Avg. Coverage        Average coverage of the locus: sum of the lengths of the aligning reads / length of the locus
Score                MetaMLST score of the selected Reference Allele
Hits                 Number of reads aligning to the selected Reference Allele
Reference Allele(s)  Reference alleles

Consensus Sequence (Step 2)
Locus                MLST locus name
Ref                  Reference allele selected
Length               Length of the locus
Ns                   Number of positions on the locus not covered by any read
SNPs                 Number of differences between the reconstructed locus and the reference allele
Confidence           Confidence score: (1 - Ns / Length) * 100
Notes                Specifies whether the allele is a new allele (NEW) or not (--)
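The two computed columns, Avg. Coverage and Confidence, follow directly from the definitions in the table. The sketch below works them out for made-up numbers. The function names are hypothetical, and the Confidence formula is read here as one minus the fraction of uncovered positions, multiplied by 100, which is an interpretation of the table rather than a statement about MetaMLST's internals.

```python
def average_coverage(read_lengths, locus_length):
    """Sum of the lengths of the aligning reads divided by the locus length."""
    return sum(read_lengths) / locus_length


def confidence(n_uncovered, locus_length):
    """Percentage of locus positions covered by at least one read.

    n_uncovered is the "Ns" column: positions not covered by any read.
    """
    return (1 - n_uncovered / locus_length) * 100


# Made-up example: a 450 bp locus hit by nine 100 bp reads,
# with 45 positions left uncovered.
cov = average_coverage([100] * 9, 450)
conf = confidence(45, 450)
print(f"Avg. Coverage: {cov:.2f}  Confidence: {conf:.1f}")
```

Under this reading, a locus with no uncovered positions gets a Confidence of 100, and every uncovered position lowers the score proportionally.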

▸ Customizing the parameters

Low stringency thresholds (not recommended)

metamlst.py -d metamlstDB_2015.db --max_xM 10 --minscore 40 --min_accuracy 0.5 YOUR_ALIGNMENTS.bam

Applying a filter for specific organisms

metamlst.py -d metamlstDB_2015.db --filter sepidermidis,pacnes,saureus YOUR_ALIGNMENTS.bam 

Note: you can view the list of available species-keys for filtering using:

metaMLST-index.py --listkeys metamlstDB_2015.db