
UdeS-CoBIUS/TranscriptPhylogenies


🌴 Inferring Transcript Phylogenies from Transcript Ortholog Clusters 🌴


👥 Authors

  • Wend Yam D D Ouedraogo & Aida Ouangraoua, CoBIUS lab, Department of Computer Science, Faculty of Science, Université de Sherbrooke, Sherbrooke, Canada

💡 If you are using our algorithm in your research, please cite our recent paper: Upcoming

📧 Contact: wend.yam.donald.davy.ouedraogo@usherbrooke.ca

📖 Table of Contents

  1. ➤ About the project
    1. ➤ Overview
    2. ➤ Operating System
    3. ➤ Requirements
  2. ➤ Inferring Transcript Phylogenies from Transcript Ortholog Clusters
    1. ➤ Package Pypi
    2. ➤ Getting Started
    3. ➤ Project files descriptions
      1. ➤ Inputs description
      2. ➤ Outputs description

-----------------------------------------------------

📝 About The Project

☁️ Overview

This project implements a progressive supertree construction algorithm that uses dynamic programming to infer a transcript phylogeny from precomputed clusters of orthologous transcripts.

👨‍💻 Operating System

The program was developed and tested on Ubuntu 22.04.6 LTS and Windows 11.

⚒️ Requirements

  • python3 (at least Python 3.6)
  • Pandas
  • Numpy
  • ETE toolkit (ete3)
  • PyQt5
  • PhyloTreeLib

-----------------------------------------------------

Inferring transcript phylogenies from transcript ortholog clusters

📦 About the package

upcoming

-----------------------------------------------------

🚀 Getting Started

Command

usage: minevolrec.py [-h] -l LABEL -clus CLUSTERS -nhx NHX -map MAPPINGS -matx MATRIX [-forest FOREST] [-forest_threshold FOREST_THRESHOLD]
                     [-join JOIN] [-outf OUTPUT] [-outp PREFIX] [-c COMPUTE]

parsor program parameter

options:
  -h, --help            show this help message and exit
  -l LABEL, --label LABEL
                        0 | 1, Boolean variable controlling the utilization of the labeled version of the algorithm (0) or the non-labeled
                        version (1).
  -clus CLUSTERS, --clusters CLUSTERS
                        FASTA file containing cluster IDs in front of each id_transcript separated by semicolons
  -nhx NHX, --nhx NHX   txt file containg gene tree format => NHX format
  -map MAPPINGS, --mappings MAPPINGS
                        FASTA file containg transcripts and their corresponding genes separated by semicolons.
  -matx MATRIX, --matrix MATRIX
                        CSV file containing matrix separated by comma ';'(header: True, index: True)Note: Index must correspond to the
                        header.
  -forest FOREST, --forest FOREST
                        0 | 1, boolean variable controls the reconstruction of a rooted binary transcript tree(1) or transcript forest(0).
  -forest_threshold FOREST_THRESHOLD, --forest_threshold FOREST_THRESHOLD
                        float variable represents threshold to cut the dendogram(minimum evolution tree)
  -join JOIN, --join JOIN
                        min(1) | mean(0) | max(2). By default(1)
  -outf OUTPUT, --output OUTPUT
                        output folder
  -outp PREFIX, --prefix PREFIX
                        prefix of output files
  -c COMPUTE, --compute COMPUTE
                        0 | 1, Compute all solutions(By default False(1))

Usage example

python3 scripts/minevolrec.py -l 0 -map example/mappings.maps -clus example/clusters.clus -nhx example/tree.nhx -matx example/matrix.matx 

OR

sh ./minevolrec.sh
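
When the tool is driven from another Python script, the command line above can be assembled programmatically. The following is only a minimal sketch: the helper name `build_command` is hypothetical, the flags are those listed in the help message, and the script path `scripts/minevolrec.py` is taken from the usage example.

```python
import shlex


def build_command(mappings, clusters, nhx, matrix, label=0, output=None, prefix=None):
    """Assemble the argument list for minevolrec.py (hypothetical helper)."""
    cmd = [
        "python3", "scripts/minevolrec.py",
        "-l", str(label),
        "-map", mappings,
        "-clus", clusters,
        "-nhx", nhx,
        "-matx", matrix,
    ]
    # Optional flags are only appended when a value is supplied.
    if output is not None:
        cmd += ["-outf", output]
    if prefix is not None:
        cmd += ["-outp", prefix]
    return cmd


cmd = build_command(
    "example/mappings.maps",
    "example/clusters.clus",
    "example/tree.nhx",
    "example/matrix.matx",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the program.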

Output expected

[Image: expected output]

-----------------------------------------------------

📁 Project Files Description

⌨️ Inputs description

Input files [mandatory parameters]

mappings.maps [-map mappings.maps]

A file that lists each transcript (t) in the observation along with its corresponding gene (g). The row format is as follows: >t:g
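
As an illustration of the row format, here is a minimal sketch of a reader for such a file. It is not the parser used by the program: the function name `parse_mappings` and the identifiers `t1`, `g1`, etc. are hypothetical, and the sketch assumes that the first ':' on each row separates the transcript from the gene.

```python
def parse_mappings(lines):
    """Parse '>t:g' rows into a {transcript: gene} dictionary."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if not line.startswith(">"):
            raise ValueError("expected a row starting with '>': %r" % line)
        # Assumption: the first ':' separates the transcript from the gene.
        transcript, sep, gene = line[1:].partition(":")
        if not sep or not transcript or not gene:
            raise ValueError("expected '>t:g', got: %r" % line)
        mapping[transcript] = gene
    return mapping


rows = [">t1:g1", ">t2:g1", ">t3:g2"]
mapping = parse_mappings(rows)
print(mapping)
```

For a file on disk, the open file handle can be passed directly, for example `parse_mappings(open("example/mappings.maps"))`.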

clusters.clus [-clus clusters.clus]

A file that lists each transcript (t) in the observation along with the ID of the cluster (IDcluster) that contains it. The row format is as follows: >t:IDcluster
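
To make the cluster file concrete, the sketch below groups transcripts by cluster ID. It is an assumption-laden illustration rather than the program's own code: `group_by_cluster` and the sample identifiers are hypothetical, and cluster IDs are kept as strings.

```python
from collections import defaultdict


def group_by_cluster(lines):
    """Group transcripts from '>t:IDcluster' rows into {cluster_id: [transcripts]}."""
    clusters = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if not line.startswith(">"):
            raise ValueError("expected a row starting with '>': %r" % line)
        # Assumption: the first ':' separates the transcript from the cluster ID.
        transcript, sep, cluster_id = line[1:].partition(":")
        if not sep or not transcript or not cluster_id:
            raise ValueError("expected '>t:IDcluster', got: %r" % line)
        clusters[cluster_id].append(transcript)
    return dict(clusters)


rows = [">t1:0", ">t2:0", ">t3:1"]
clusters = group_by_cluster(rows)
print(clusters)
```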

matrix.matx [-matx matrix.matx]

A CSV file that gives the pairwise similarity scores for the set of observed transcripts. The file must contain a header and records separated by the character ';'. The header should be as follows: ['transcripts', [record]]. An example is given in the folder 'example/input/'.
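
The layout can be checked before running the program. The following standard-library sketch reads a ';'-separated matrix and verifies that the row index matches the header, as the help message requires. The function name `read_matrix` and the sample scores are hypothetical, and the sketch assumes that every score parses as a float.

```python
import csv
import io


def read_matrix(handle):
    """Read a ';'-separated similarity matrix into nested dictionaries."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    columns = header[1:]  # the first header cell labels the index column
    index = []
    matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, values = row[0], row[1:]
        if len(values) != len(columns):
            raise ValueError("row %r has %d scores, expected %d"
                             % (name, len(values), len(columns)))
        index.append(name)
        matrix[name] = {col: float(v) for col, v in zip(columns, values)}
    # The row index must list the same transcripts, in the same order, as the header.
    if index != columns:
        raise ValueError("row index does not match the header")
    return matrix


sample = io.StringIO(
    "transcripts;t1;t2\n"
    "t1;1.0;0.4\n"
    "t2;0.4;1.0\n"
)
scores = read_matrix(sample)
print(scores["t1"]["t2"])
```

Pandas, which is listed in the requirements, can read the same layout; the plain `csv` version is used here only to keep the sketch dependency-free.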

‼️ The transcript similarity used for the pre-computation of the matrix file is described in another repository [📖 read the paper]

tree.nhx [-nhx tree.nhx]

An NHX tree describing the evolution of the studied gene family.
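
NHX extends Newick by attaching `[&&NHX:key=value:...]` annotation blocks to nodes. The sketch below extracts these blocks with a regular expression so that a tree file can be inspected quickly. It is not a full NHX parser, the function name `nhx_annotations` is hypothetical, and the sample tree with its `S` (species) and `D` (duplication) tags is made up for illustration; which tags the program expects is not stated here.

```python
import re

# Matches one annotation block such as [&&NHX:S=human:D=N].
NHX_BLOCK = re.compile(r"\[&&NHX:([^\]]*)\]")


def nhx_annotations(text):
    """Return one {key: value} dict per NHX block, in order of appearance."""
    annotations = []
    for body in NHX_BLOCK.findall(text):
        tags = {}
        for field in body.split(":"):
            key, sep, value = field.partition("=")
            if sep:  # keep only well-formed key=value fields
                tags[key] = value
        annotations.append(tags)
    return annotations


tree = "((a:0.1[&&NHX:S=human],b:0.2[&&NHX:S=mouse]):0.05[&&NHX:D=Y],c:0.3[&&NHX:S=rat]);"
tags = nhx_annotations(tree)
print(tags)
```

For real analyses, a dedicated tree library such as the ETE toolkit from the requirements list is a better choice than a regular expression.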


💽 Outputs description

Output folders/files

dendogram/

The folder contains the guide tree in Newick format.

ortholog_trees/

The folder contains each ortholog tree.

solution.nhx

The file contains the reconstructed transcript phylogeny in NHX format.

solution.svg

The file provides a visualization of the transcript tree, for instance:

[Image: example rendering of solution.svg]

Legend: green node 🟢: creation; red node 🟥: gene duplication; blue node 🔵: speciation. Transcripts in the same ortholog tree are displayed with a distinct color at the leaves. The LCA-reconciliation cost is given at the top of the figure.

✔️ Dataset

The data folder contains the datasets used in the study and the results obtained.







Copyright © 2024 CoBIUS LAB
