🌴 Inferring Transcript Phylogenies from Transcript Ortholog Clusters 🌴

👥 Authors

Wend Yam D D Ouedraogo & Aida Ouangraoua, CoBIUS lab, Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Canada

💡 If you are using our algorithm in your research, please cite our recent paper: Upcoming

📧 Contact: wend.yam.donald.davy.ouedraogo@usherbrooke.ca

📖 Table of Contents

➤ About the project
➤ Inferring Transcript Phylogenies from Transcript Ortholog Clusters

📝 About The Project

☁️ Overview

A progressive supertree construction algorithm that relies on a dynamic programming approach to infer a transcript phylogeny based on precomputed clusters of orthologous transcripts.

👨‍💻 Operating System

The program was developed and tested on Ubuntu 22.04.6 LTS and Windows 11.

⚒️ Requirements

python3 (at least python 3.6)
Pandas
Numpy
ETE toolkit (ete3)
PyQt5
PhyloTreeLib

Inferring transcript phylogenies from transcript ortholog clusters

📦 About the package

upcoming

🚀 Getting Started

Command

usage: minevolrec.py [-h] -l LABEL -clus CLUSTERS -nhx NHX -map MAPPINGS -matx MATRIX [-forest FOREST] [-forest_threshold FOREST_THRESHOLD]
                     [-join JOIN] [-outf OUTPUT] [-outp PREFIX] [-c COMPUTE]

parsor program parameter

options:
  -h, --help            show this help message and exit
  -l LABEL, --label LABEL
                        0 | 1, Boolean variable controlling the utilization of the labeled version of the algorithm (0) or the non-labeled
                        version (1).
  -clus CLUSTERS, --clusters CLUSTERS
                        FASTA file containing cluster IDs in front of each id_transcript separated by semicolons
  -nhx NHX, --nhx NHX   txt file containg gene tree format => NHX format
  -map MAPPINGS, --mappings MAPPINGS
                        FASTA file containg transcripts and their corresponding genes separated by semicolons.
  -matx MATRIX, --matrix MATRIX
                        CSV file containing matrix separated by comma ';'(header: True, index: True)Note: Index must correspond to the
                        header.
  -forest FOREST, --forest FOREST
                        0 | 1, boolean variable controls the reconstruction of a rooted binary transcript tree(1) or transcript forest(0).
  -forest_threshold FOREST_THRESHOLD, --forest_threshold FOREST_THRESHOLD
                        float variable represents threshold to cut the dendogram(minimum evolution tree)
  -join JOIN, --join JOIN
                        min(1) | mean(0) | max(2). By default(1)
  -outf OUTPUT, --output OUTPUT
                        output folder
  -outp PREFIX, --prefix PREFIX
                        prefix of output files
  -c COMPUTE, --compute COMPUTE
                        0 | 1, Compute all solutions(By default False(1))

Usage example

python3 scripts/minevolrec.py -l 0 -map example/mappings.maps -clus example/clusters.clus -nhx example/tree.nhx -matx example/matrix.matx

OR

sh ./minevolrec.sh
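The same call can be assembled from Python, which may be convenient when running the tool over several gene families. The following is a minimal sketch, not part of the repository: the helper name `build_command` is hypothetical, and it uses only the mandatory flags listed in the help text above.

```python
def build_command(mappings, clusters, nhx, matrix, label=0,
                  script="scripts/minevolrec.py"):
    """Assemble the argument list for minevolrec.py (hypothetical helper).

    Only the mandatory flags documented in the help text are used.
    """
    return [
        "python3", script,
        "-l", str(label),
        "-map", mappings,
        "-clus", clusters,
        "-nhx", nhx,
        "-matx", matrix,
    ]


cmd = build_command(
    "example/mappings.maps",
    "example/clusters.clus",
    "example/tree.nhx",
    "example/matrix.matx",
)
print(" ".join(cmd))
# To actually launch the program: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues when file paths contain spaces.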

Output expected

📁 Project Files Description

⌨️ Inputs description

Input files [mandatory parameters]

mappings.maps [-map mappings.maps]

A file that lists each transcript (t) in the observation along with its corresponding gene (g). The row format is as follows: >t:g
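To illustrate the row format, here is a minimal sketch of a reader for such a file. It is an assumption-laden example, not the parser used by minevolrec.py: the function name `parse_pairs` is hypothetical, the sample identifiers are made up, and the separator is taken to be the `:` shown in the row format above (it is exposed as a parameter in case a file uses another character).

```python
def parse_pairs(lines, sep=":"):
    """Parse rows of the form '>key<sep>value' into a dict (hypothetical helper)."""
    pairs = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not line.startswith(">"):
            raise ValueError("row does not start with '>': %r" % line)
        # Split on the first separator only.
        key, found, value = line[1:].partition(sep)
        if not found or not key or not value:
            raise ValueError("malformed row: %r" % line)
        pairs[key] = value
    return pairs


# Made-up identifiers, for illustration only.
sample = [">t1:g1", ">t2:g1", ">t3:g2"]
transcript_to_gene = parse_pairs(sample)
print(transcript_to_gene)
```

With a real file, the same function can be called as `parse_pairs(open(path))`.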

clusters.clus [-clus clusters.clus]

A file that lists each transcript (t) in the observation along with the ID of the cluster (IDcluster) that contains it. The row format is as follows: >t:IDcluster
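Since each row assigns one transcript to one cluster, the clusters themselves can be recovered by grouping rows on the cluster ID. The sketch below shows one way to do this. The function name `group_by_cluster` and the sample identifiers are hypothetical, and the `:` separator is assumed from the row format above.

```python
from collections import defaultdict


def group_by_cluster(lines, sep=":"):
    """Group transcripts by cluster ID from '>t<sep>IDcluster' rows (hypothetical helper)."""
    clusters = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line.startswith(">"):
            continue  # ignore blank or unexpected lines
        transcript, _, cluster_id = line[1:].partition(sep)
        clusters[cluster_id].append(transcript)
    return dict(clusters)


# Made-up identifiers, for illustration only.
sample = [">t1:c1", ">t2:c1", ">t3:c2"]
clusters = group_by_cluster(sample)
print(clusters)
```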

matrix.matx [-matx matrix.matx]

A CSV file that describes the pairwise similarity scores for a set of observed transcripts. The file must contain a header, and records must be separated by the character ';'. The header should be as follows: ['transcripts', [record]]. An example is given in the folder 'example/input/'.

‼️ The transcript similarity used for the pre-computation of the matrix file is described in another repository [📖 read the paper]
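As a sanity check before running the program, the matrix layout described above can be verified with a short script. This is a sketch under stated assumptions, using only the standard library: the function name `load_matrix` is hypothetical, the sample values are made up, and "index must correspond to the header" is interpreted here as the row labels matching the header labels in the same order.

```python
import csv
import io


def load_matrix(handle, delimiter=";"):
    """Read a ';'-separated similarity matrix with a header row and row labels."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    columns = header[1:]  # the first cell is the 'transcripts' label
    row_labels = []
    matrix = {}
    for record in reader:
        if not record:
            continue  # skip empty lines
        values = [float(x) for x in record[1:]]
        if len(values) != len(columns):
            raise ValueError("row %r has the wrong number of values" % record[0])
        row_labels.append(record[0])
        matrix[record[0]] = values
    # Assumption: row labels must equal the header labels, in the same order.
    if row_labels != columns:
        raise ValueError("row labels do not match the header")
    return columns, matrix


# Made-up values, for illustration only.
sample = io.StringIO("transcripts;t1;t2\nt1;1.0;0.4\nt2;0.4;1.0\n")
labels, matrix = load_matrix(sample)
print(labels, matrix)
```

With a real file, pass an open file handle instead of the `io.StringIO` sample.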

tree.nhx [-nhx tree.nhx]

An NHX tree describing the evolution of the studied gene family.

💽 Outputs description

Output folders/files

dendogram/

The folder contains the guide tree in Newick format.

ortholog_trees/

The folder contains each ortholog tree.

solution.nhx

The file contains the reconstructed transcript phylogeny in NHX format.

solution.svg

The file provides a visualization of the transcript tree.

Legend: green node 🟢: creation; red node 🟥: gene duplication; blue node 🔵: speciation. Transcripts in the same ortholog tree are displayed with a distinct color at the leaves. The LCA-reconciliation cost is given at the top of the figure.

✔️ Dataset

The data folder contains the datasets used in the study, along with the results obtained.
