👥 Authors
- Wend Yam D D Ouedraogo & Aida Ouangraoua, CoBIUS lab, Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Canada
💡 If you are using our algorithm in your research, please cite our recent paper: Upcoming
A progressive supertree construction algorithm that relies on a dynamic programming approach to infer a transcript phylogeny based on precomputed clusters of orthologous transcripts.
python3 (at least python 3.6)
ETE toolkit (ete3)
usage: minevolrec.py [-h] -l LABEL -clus CLUSTERS -nhx NHX -map MAPPINGS -matx MATRIX [-forest FOREST] [-forest_threshold FOREST_THRESHOLD]
[-join JOIN] [-outf OUTPUT] [-outp PREFIX] [-c COMPUTE]
parser program parameters
-h, --help show this help message and exit
-l LABEL, --label LABEL
0 | 1, Boolean variable controlling the utilization of the labeled version of the algorithm (0) or the non-labeled
version (1).
-clus CLUSTERS, --clusters CLUSTERS
FASTA file containing cluster IDs in front of each id_transcript separated by semicolons
-nhx NHX, --nhx NHX txt file containing the gene tree in NHX format
-map MAPPINGS, --mappings MAPPINGS
FASTA file containing transcripts and their corresponding genes separated by semicolons.
-matx MATRIX, --matrix MATRIX
CSV file containing the matrix, with fields separated by ';' (header: True, index: True). Note: Index must correspond to the
-forest FOREST, --forest FOREST
0 | 1, Boolean variable controlling the reconstruction of a rooted binary transcript tree (1) or a transcript forest (0).
-forest_threshold FOREST_THRESHOLD, --forest_threshold FOREST_THRESHOLD
float variable giving the threshold at which to cut the dendrogram (minimum evolution tree)
-join JOIN, --join JOIN
min (1) | mean (0) | max (2). Default: 1 (min)
-outf OUTPUT, --output OUTPUT
output folder
-outp PREFIX, --prefix PREFIX
prefix of output files
-c COMPUTE, --compute COMPUTE
0 | 1, Compute all solutions(By default False(1))
Usage example
python3 scripts/minevolrec.py -l 0 -map example/mappings.maps -clus example/clusters.clus -nhx example/tree.nhx -matx example/matrix.matx
sh ./minevolrec.sh
Output expected
Input files [mandatory parameters]
mappings.maps [-map mappings.maps]
A file that lists each transcript (t) in the observation along with its corresponding gene (g). The row format is as follows: >t:g
clusters.clus [-clus clusters.clus]
A file that lists each transcript (t) in the observation along with the ID of the cluster (IDcluster) that contains it. The row format is as follows: >t:IDcluster
matrix.matx [-matx matrix.matx]
A CSV file that gives the pairwise similarity scores for a set of observed transcripts. The file must contain a header, and fields must be separated by the character ';'. The header should be as follows: ['transcripts', [record]]. An example is given in the folder 'example/input/'.
tree.nhx [-nhx tree.nhx]
An NHX tree describing the evolution of the studied gene family.
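The text formats described above can be checked before launching a run. The following is a minimal sketch, not part of minevolrec: the function names `read_pairs`, `read_matrix` and `check_inputs` are hypothetical, and it assumes that neither transcript IDs nor gene/cluster IDs contain the character ':' and that the first header cell of the matrix labels the index column.

```python
import csv
import io


def read_pairs(lines):
    """Parse rows of the form '>left:right' into a dict {left: right}.

    Works for both the mappings file (>t:g) and the clusters file
    (>t:IDcluster). Assumes IDs do not contain ':'.
    """
    pairs = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if not line.startswith(">"):
            raise ValueError(f"expected a row starting with '>': {line!r}")
        left, sep, right = line[1:].partition(":")
        if not sep or not left or not right:
            raise ValueError(f"expected '>name:value': {line!r}")
        pairs[left] = right
    return pairs


def read_matrix(handle):
    """Read a ';'-separated matrix with a header row and an index column.

    Returns the column names and a nested dict {row: {column: score}}.
    """
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    columns = header[1:]  # the first header cell labels the index column
    matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        matrix[row[0]] = {c: float(v) for c, v in zip(columns, row[1:])}
    return columns, matrix


def check_inputs(mappings, clusters, columns, matrix):
    """Return the sorted transcript IDs that are missing from some input."""
    names = set(mappings)
    mismatched = (
        (names ^ set(clusters)) | (names ^ set(columns)) | (names ^ set(matrix))
    )
    return sorted(mismatched)


if __name__ == "__main__":
    # Hypothetical toy data, used only to illustrate the formats.
    maps = read_pairs([">t1:g1", ">t2:g1", ">t3:g2"])
    clus = read_pairs([">t1:1", ">t2:2", ">t3:1"])
    text = "transcripts;t1;t2;t3\nt1;1.0;0.4;0.8\nt2;0.4;1.0;0.3\nt3;0.8;0.3;1.0\n"
    cols, mat = read_matrix(io.StringIO(text))
    print(check_inputs(maps, clus, cols, mat))
```

An empty list means every transcript appears in the mappings, the clusters, and both axes of the matrix. With real files, pass an open file handle to each reader, for instance `read_pairs(open("example/mappings.maps"))`.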
Outputs folders/files
The folder contains the guide tree in NEWICK.
The folder contains each ortholog tree.
The file contains the reconstructed transcript phylogeny in NHX format.
The file provides a visualization of the transcript tree, as in the following example:
green node 🟢: creation, red node 🟥: gene duplication, blue node 🔵: speciation. Transcripts that belong to the same ortholog tree are displayed in the same color at the leaves, with a distinct color for each ortholog tree. The LCA-reconciliation cost is given at the top of the figure.
The folder data contains the datasets used for the studies, together with the results obtained.
Copyright © 2024 CoBIUS LAB