This repository analyzes gene family dynamics in Daphnia species, focusing on gene family expansion and contraction related to ecological adaptation and evolutionary innovation. The pipeline can be applied to any combination of species with well-annotated genomes in the NCBI genome catalog, or to species with custom gene models.
graph TD
AA[Download NCBI Genomes/Proteomes] --> A
AA --> C
A[Extract Longest Transcript] --> B[Annotate GO Terms]
C[Run BUSCO on Proteomes] --> G[Extract/Trim/Align/Estimate BUSCO Gene Trees & Consensus Species-Tree]
A --> I[Run OrthoFinder]
I --> J[Filter Gene Families]
J --> K[Estimate Gene Family Evolution CAFE5]
K --> L[GO Term Enrichment]
I --> M[Run Selection Tests]
G --> P[Estimate Time-Calibrated Tree MCMCtree]
P --> K
- Download the latest genomes and protein-coding GTF annotations from NCBI, then choose the in-group and out-group species appropriate for your biological hypothesis.
- BUSCO assesses the completeness of Daphnia genome assemblies and annotations by searching for Benchmarking Universal Single-Copy Orthologs (conserved genes expected to be present exactly once).
- OrthoFinder detects orthologs within and across Daphnia species to understand evolutionary relationships.
- Phylogenomic analysis infers evolutionary relationships and dynamics within Daphnia species using MCMCtree on BUSCO genes.
- Gene family analysis explores expansion and contraction across Daphnia species, focusing on genes related to spermatogenesis and stress responses, using CAFE5 and clusterProfiler.
- Selection analysis investigates evolutionary pressures on specific gene families, particularly those undergoing expansion, using codon-based models implemented in PAML and HyPhy.
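The gene family filtering step that feeds CAFE5 can be sketched as follows. This is a minimal illustration, not the pipeline's actual filter: it assumes the input is OrthoFinder's `Orthogroups.GeneCount.tsv` (columns `Orthogroup`, one per species, and `Total`), and the thresholds (drop families with 100 or more copies in any species, or present in fewer than two species) are assumptions borrowed from common CAFE practice. The function name `to_cafe_table` is hypothetical.

```python
import csv
import io


def to_cafe_table(genecount_tsv: str, max_copies: int = 100) -> str:
    """Convert OrthoFinder gene counts (TSV text) into a CAFE5-style count table.

    Assumed filters (not taken from this repository):
      * drop families with >= max_copies genes in any single species
      * drop families present in fewer than two species
    """
    reader = csv.DictReader(io.StringIO(genecount_tsv), delimiter="\t")
    # Every column except the family ID and the row total is a species.
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # CAFE5 expects a description column, a family ID column, then species counts.
    writer.writerow(["Desc", "Family ID", *species])

    for row in reader:
        counts = [int(row[s]) for s in species]
        present_in = sum(c > 0 for c in counts)
        if present_in < 2 or max(counts) >= max_copies:
            continue
        writer.writerow(["(null)", row["Orthogroup"], *counts])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "Orthogroup\tD_magna\tD_pulex\tTotal\n"
        "OG0\t120\t3\t123\n"
        "OG1\t2\t4\t6\n"
        "OG2\t5\t0\t5\n"
    )
    print(to_cafe_table(sample), end="")
```

Very large families tend to destabilize CAFE5's likelihood estimation, which is why a size cap like this is often applied; the right cutoff for a given dataset is a judgment call.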
To build the Apptainer image, use the following command:
apptainer build gene_family_evolution.sif definition.def
To run the Nextflow pipeline, use the following command:
nextflow run main.nf -profile standard
Ensure that the `nextflow.config` file is in the same directory as `main.nf`, or specify its path using the `-c` option.
- While I am using `apptainer run latest_image.sif` for most processes, you could modify the code to run `apptainer exec docker://image:latest` so you do not have to pull images. I am currently editing this feature so it is more user-friendly.
- This is currently a work-in-progress project and I am still learning best practices with Nextflow, so any help or tips would be appreciated!
- TimeTree constraints are used to calibrate the phylogenetic tree with divergence times obtained from the TimeTree database. These constraints help in estimating the divergence times between species accurately.
- In the pipeline, the `makeConsensusMCMC` process applies these constraints using the `mcmctree_prep.py` script. This script adds time constraints to the species tree based on known divergence times.
- The divergence time between Daphnia magna and Drosophila melanogaster is constrained between 474.8 and 530 million years.
- This constraint is applied in the `makeConsensusMCMC` process as follows:
python ${params.scripts_dir}/mcmctree_prep.py \
  --left_species magna \
  --right_species melanogaster \
  --lower_bound 474.8 \
  --upper_bound 530 \
  --tree - \
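To illustrate what adding a time constraint to a species tree involves, here is a rough Python sketch. It is not the contents of `mcmctree_prep.py`, which are not shown here. It assumes the species arguments are matched as substrings of leaf names, that the calibration is attached to the most recent common ancestor of the two species, and that it is written in MCMCtree's `'B(lower,upper)'` bound notation in the same time units as the command above. The function names `leaf_names` and `add_bound` are hypothetical.

```python
import re


def leaf_names(clade: str) -> list:
    """Return leaf labels in a Newick fragment (tokens directly after '(' or ',')."""
    return re.findall(r"(?<=[(,])[^(),:;]+", clade)


def add_bound(newick: str, left: str, right: str, lower: float, upper: float) -> str:
    """Label the MRCA of two species with an MCMCtree-style bound calibration.

    Clades close in order from smallest to largest, so the first closed clade
    that contains both species is their most recent common ancestor.
    """
    open_parens = []
    for j, ch in enumerate(newick):
        if ch == "(":
            open_parens.append(j)
        elif ch == ")":
            i = open_parens.pop()
            leaves = leaf_names(newick[i:j + 1])
            has_left = any(left in name for name in leaves)
            has_right = any(right in name for name in leaves)
            if has_left and has_right:
                label = f"'B({lower:g},{upper:g})'"
                # Node labels go immediately after the clade's closing parenthesis.
                return newick[:j + 1] + label + newick[j + 1:]
    raise ValueError(f"no clade contains both {left!r} and {right!r}")


if __name__ == "__main__":
    tree = "((Daphnia_magna,Daphnia_pulex),Drosophila_melanogaster);"
    print(add_bound(tree, "magna", "melanogaster", 474.8, 530))
```

MCMCtree leaves the choice of time unit to the user (units of 100 million years are common), so whether the bounds are rescaled before being written to the tree is something to check against the actual script.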
This project is licensed under the MIT License.