
Commit 023488c

Merge pull request #129 from Plant-Food-Research-Open/dev
Release candidate for 0.6.0
2 parents ee702d7 + d069633 commit 023488c

44 files changed, +2767 -131 lines changed

.github/workflows/branch.yml

+8 -8

@@ -1,15 +1,15 @@
 name: nf-core branch protection
-# This workflow is triggered on PRs to master branch on the repository
-# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
+# This workflow is triggered on PRs to main branch on the repository
+# It fails when someone tries to make a PR against the Plant-Food-Research-Open `main` branch instead of `dev`
 on:
 pull_request_target:
-branches: [master]
+branches: [main]
 
 jobs:
 test:
 runs-on: ubuntu-latest
 steps:
-# PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
+# PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
 - name: Check PRs
 if: github.repository == 'Plant-Food-Research-Open/genepal'
 run: |
@@ -22,7 +22,7 @@ jobs:
 uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
 with:
 message: |
-## This PR is against the `master` branch :x:
+## This PR is against the `main` branch :x:
 
 * Do not close this PR
 * Click _Edit_ and change the `base` to `dev`
@@ -32,9 +32,9 @@ jobs:
 
 Hi @${{ github.event.pull_request.user.login }},
 
-It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
-The `master` branch on nf-core repositories should always contain code from the latest release.
-Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
+It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
+The `main` branch should always contain code from the latest release.
+Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
 
 You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
 Note that even after this, the test will continue to show as failing until you push a new commit.
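
The comments in branch.yml state the branch-protection rule: a PR into `main` is acceptable only if it comes from the upstream repository's `dev` branch or from a `patch` branch. The rule can be sketched as a small predicate. This is a hedged illustration, not the workflow's actual check, because the `run:` body of the "Check PRs" step is hidden in this diff. The function name `pr_to_main_allowed` is hypothetical. Treating only a branch named exactly `patch` as allowed from any fork is also an assumption.

```python
# Hypothetical sketch of the rule described in branch.yml's comments.
UPSTREAM_REPO = "Plant-Food-Research-Open/genepal"


def pr_to_main_allowed(head_repo: str, head_ref: str) -> bool:
    """Return True if a PR targeting `main` should pass the branch check.

    Assumption: `dev` is accepted only from the upstream repository,
    while a branch named `patch` is accepted from any repository.
    """
    from_upstream_dev = head_repo == UPSTREAM_REPO and head_ref == "dev"
    from_patch = head_ref == "patch"
    return from_upstream_dev or from_patch


if __name__ == "__main__":
    print(pr_to_main_allowed("Plant-Food-Research-Open/genepal", "dev"))  # True
    print(pr_to_main_allowed("someone/genepal", "dev"))                   # False
    print(pr_to_main_allowed("someone/genepal", "patch"))                 # True
```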

.github/workflows/ci.yml

+1 -1

@@ -55,7 +55,7 @@ jobs:
 uses: actions/checkout@v4.2.1
 
 - name: Install Nextflow
-uses: nf-core/setup-nextflow@v2
+uses: nf-core/setup-nextflow@v2.0.0
 with:
 version: "${{ matrix.NXF_VER }}"
 
.github/workflows/download_pipeline.yml

+4 -4

@@ -2,7 +2,7 @@ name: Test successful pipeline download with 'nf-core pipelines download'
 
 # Run the workflow when:
 # - dispatched manually
-# - when a PR is opened or reopened to master branch
+# - when a PR is opened or reopened to main branch
 # - the head branch of the pull request is updated, i.e. if fixes for a release are pushed last minute to dev.
 on:
 workflow_dispatch:
@@ -17,10 +17,10 @@ on:
 - edited
 - synchronize
 branches:
-- master
+- main
 pull_request_target:
 branches:
-- master
+- main
 
 env:
 NXF_ANSI_LOG: false
@@ -30,7 +30,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - name: Install Nextflow
-uses: nf-core/setup-nextflow@v2
+uses: nf-core/setup-nextflow@v2.0.0
 
 - name: Disk space cleanup
 uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1

.github/workflows/linting.yml

+1 -1

@@ -34,7 +34,7 @@ jobs:
 uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4
 
 - name: Install Nextflow
-uses: nf-core/setup-nextflow@v2
+uses: nf-core/setup-nextflow@v2.0.0
 
 - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
 with:
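
The ci.yml, download_pipeline.yml and linting.yml changes replace the floating major tag `nf-core/setup-nextflow@v2` with the exact tag `v2.0.0`. The sketch below is a minimal lint that flags `uses:` references still pinned only to a major version. The regular expressions and the function `find_floating_tags` are hypothetical helpers, not part of the pipeline. A full commit SHA, as used for `actions/checkout` in linting.yml, is treated as pinned.

```python
import re
from typing import List

# Matches `uses: owner/repo@ref` in a GitHub Actions workflow (hypothetical lint rule).
USES_RE = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@([\w.-]+)")
# A bare major tag such as "v2" or "v4" is considered floating.
FLOATING_RE = re.compile(r"v\d+")


def find_floating_tags(workflow_text: str) -> List[str]:
    """Return 'owner/repo@ref' strings whose ref is only a major-version tag."""
    hits = []
    for action, ref in USES_RE.findall(workflow_text):
        if FLOATING_RE.fullmatch(ref):
            hits.append(f"{action}@{ref}")
    return hits
```

Refs such as `v2.0.0`, `v4.2.1` or a 40-character SHA do not fully match `v\d+`, so only the bare `v2` form is reported.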

.nf-core.yml

+1 -1

@@ -30,5 +30,5 @@ template:
 outdir: .
 skip_features:
 - igenomes
-version: 0.5.0
+version: 0.6.0
 update: null

CHANGELOG.md

+26

@@ -3,6 +3,32 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## v0.6.0 - [20-Dec-2024]
+
+### 'Added'
+
+1. Added cDNA and CDS outputs to <OUTPUT_DIR>/annotations/<SAMPLE> directory [#118](https://github.com/Plant-Food-Research-Open/genepal/issues/118)
+2. Added parameter `add_attrs_to_proteins_cds_fastas`
+3. Added parameter `filter_genes_by_aa_length` with default set to `24` which allows removal of genes with ORFs shorter than 24 [#125](https://github.com/Plant-Food-Research-Open/genepal/issues/125)
+
+### `Fixed`
+
+1. Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein coding genes [#121](https://github.com/Plant-Food-Research-Open/genepal/issues/121)
+2. Switched branch name from `master` to `main` in the GHA CIs
+3. Fixed an issue in `genepal_report.Rmd` which caused the pangene matrix plot to fail when the number of clusters exceeded 65536 [#124](https://github.com/Plant-Food-Research-Open/genepal/issues/124)
+4. Fixed an issue where `GENEPALREPORT` process failed due to OOM kill signal from SLURM [#123](https://github.com/Plant-Food-Research-Open/genepal/issues/123)
+5. Fixed an issue where Gff merge after liftoff failed when one of the Gff files did not contain any genes
+6. Fixed an issue where `gxf_fasta_agat_spaddintrons_spextractsequences` crashed due to short introns [#89](https://github.com/Plant-Food-Research-Open/genepal/issues/89)
+
+### `Dependencies`
+
+1. Nextflow!>=24.04.2
+2. nf-schema@2.1.1
+
+### `Deprecated`
+
+1. Removed parameter `add_attrs_to_proteins_fasta`
+
 ## v0.5.0 - [21-Nov-2024]
 
 ### `Added`

CITATION.cff

+1 -1

@@ -31,7 +31,7 @@ authors:
 - family-names: "Thomson"
 given-names: "Susan"
 title: "genepal: A Nextflow pipeline for genome and pan-genome annotation"
-version: 0.5.0
+version: 0.6.0
 date-released: 2024-11-21
 url: "https://github.com/Plant-Food-Research-Open/genepal"
 doi: 10.5281/zenodo.14195006

README.md

+10 -3

@@ -35,14 +35,16 @@
 - Merge multi-reference liftoffs
 - Remove liftoff transcripts marked by _valid_ORF=False_
 - Remove liftoff genes with any intron shorter than 10 bp
-- Remove rRNA and tRNA from liftoff
+- Remove rRNA, tRNA and other non-protein coding models from liftoff
 - Optionally, allow or remove iso-forms
 - Remove BRAKER models from Liftoff loci
 - Merge Liftoff and BRAKER models
 - Optionally, remove models without any EggNOG-mapper hits
 - [EggNOG-mapper](https://github.com/eggnogdb/eggnog-mapper): Add functional annotation to gff
 - [GenomeTools](https://github.com/genometools/genometools): GFF format validation
-- [GffRead](https://github.com/gpertea/gffread): Extraction of protein sequences
+- [GffRead](https://github.com/gpertea/gffread)
+- Extraction of protein sequences
+- Optionally, remove models with ORFs shorter than `N` amino acids
 - [OrthoFinder](https://github.com/davidemms/OrthoFinder): Perform phylogenetic orthology inference across genomes
 - [GffCompare](https://github.com/gpertea/gffcompare): Compare and benchmark against an existing annotation
 - [BUSCO](https://gitlab.com/ezlab/busco): Completeness statistics for genome and annotation through proteins
@@ -97,7 +99,7 @@ sbatch ./pfr_genepal
 
 plant-food-research-open/genepal workflows were originally scripted by Jason Shiller ([@jasonshiller](https://github.com/jasonshiller)). Usman Rashid ([@gallvp](https://github.com/gallvp)) wrote the Nextflow pipeline.
 
-We thank the following people for their extensive assistance in the development of this pipeline:
+We thank the following people for extensive assistance in the development of the pipeline,
 
 - Cecilia Deng [@CeciliaDeng](https://github.com/CeciliaDeng)
 - Charles David [@charlesdavid](https://github.com/charlesdavid)
@@ -107,6 +109,10 @@ We thank the following people for their extensive assistance in the development
 - Susan Thomson [@cflsjt](https://github.com/cflsjt)
 - Ting-Hsuan Chen [@ting-hsuan-chen](https://github.com/ting-hsuan-chen)
 
+and for contributions to the codebase,
+
+- Liam Le Lievre [@liamlelievre](https://github.com/liamlelievre)
+
 The pipeline uses nf-core modules contributed by following authors:
 
 <a href="https://github.com/gallvp"><img src="https://github.com/gallvp.png" width="50" height="50"></a>
@@ -139,6 +145,7 @@ The pipeline uses nf-core modules contributed by following authors:
 <a href="https://github.com/charles-plessy"><img src="https://github.com/charles-plessy.png" width="50" height="50"></a>
 <a href="https://github.com/bunop"><img src="https://github.com/bunop.png" width="50" height="50"></a>
 <a href="https://github.com/abhi18av"><img src="https://github.com/abhi18av.png" width="50" height="50"></a>
+<a href="https://github.com/liamlelievre"><img src="https://github.com/liamlelievre.png" width="50" height="50"></a>
 
 ## Contributions and Support
 
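The README now says that rRNA, tRNA and other non-protein-coding models are removed from the Liftoff output. In conf/modules.config this is done by adding gffread's `-C` ("coding only") option, which keeps only transcripts that carry CDS features. The sketch below reproduces that idea on simplified, tab-separated GFF3 lines in plain Python so that the effect is concrete. It is not gffread's implementation. The function names `parse_attributes` and `coding_transcript_ids`, and the set of transcript-like feature types, are assumptions.

```python
from typing import Dict, List, Set

# Assumed set of transcript-level feature types; real annotations may use others.
TRANSCRIPT_TYPES = {"mRNA", "transcript", "rRNA", "tRNA", "ncRNA"}


def parse_attributes(col9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column such as 'ID=t1;Parent=g1'."""
    pairs = (item.split("=", 1) for item in col9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def coding_transcript_ids(gff3_lines: List[str]) -> Set[str]:
    """Return IDs of transcripts that have at least one CDS child.

    Simplified sketch: assumes CDS features point at their transcript via Parent=.
    """
    transcripts: Set[str] = set()
    with_cds: Set[str] = set()
    for line in gff3_lines:
        if not line or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        feature_type, attrs = cols[2], parse_attributes(cols[8])
        if feature_type in TRANSCRIPT_TYPES:
            transcripts.add(attrs.get("ID", ""))
        elif feature_type == "CDS":
            # A CDS may list several parents separated by commas.
            with_cds.update(attrs.get("Parent", "").split(","))
    return transcripts & with_cds
```

In this model, a lifted-over tRNA or rRNA has no CDS child, so it drops out of the returned set together with its parent gene.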
assets/multiqc_config.yml

+1 -1

@@ -1,7 +1,7 @@
 report_comment: >
 This report has been generated by the <a href="https://github.com/plant-food-research-open/genepal" target="_blank">plant-food-research-open/genepal</a>
 analysis pipeline. For information about how to interpret these results, please see the
-<a href="https://github.com/plant-food-research-open/genepal/blob/0.5.0/docs/usage.md" target="_blank">documentation</a>.
+<a href="https://github.com/plant-food-research-open/genepal/blob/0.6.0/docs/usage.md" target="_blank">documentation</a>.
 
 report_section_order:
 "plant-food-research-open-genepal-methods-description":

bin/genepal_report.Rmd

+25 -4

@@ -190,22 +190,43 @@ cat("<br>")
 
 
 ```{r pheatmap, eval=(exists("n0_df") && !is.null(n0_df$heatmap)), results='hide', fig.align='center', fig.cap="Heatmap showing number of proteins present in each orthocluster (clusters where all individuals have 1 copy are excluded). Columns = Orthologue cluster, Row = Individual", fig.width=7, fig.height=7, dpi=150, warning=FALSE}
-pheatmap(n0_df$heatmap,
+
+# Max 65536 allowed
+# https://github.com/Plant-Food-Research-Open/genepal/issues/124
+
+n_cols <- ncol(n0_df$heatmap)
+max_cols_allowed <- min(n_cols, 5000)
+
+# Approach 1: Random selection of columns
+# selected_cols <- sample(n_cols, max_cols_allowed)
+
+# Approach 2: First N largest clusters
+selected_cols <- order(colSums(n0_df$heatmap), decreasing = TRUE)[seq(1, max_cols_allowed)]
+
+prefix_text <- ""
+
+if ( n_cols != max_cols_allowed ) {
+prefix_text <- paste0("Top ", max_cols_allowed, " ")
+}
+
+pheatmap(n0_df$heatmap[, selected_cols],
 show_colnames = FALSE,
-main = "Orthologue clusters containing accessory proteins",
+main = paste0(prefix_text, "Orthologue clusters"),
 legend = TRUE,
 legend_labels = TRUE,
 border_color = "white"
 )
 
-pheatmap(n0_df$heatmap,
+pheatmap(n0_df$heatmap[, selected_cols],
 filename = file.path(outputs_folder, "pangene.matrix.heatmap.pdf"),
 show_colnames = FALSE,
-main = "Orthologue clusters containing accessory proteins",
+main = paste0(prefix_text, "Orthologue clusters"),
 legend = TRUE,
 legend_labels = TRUE,
 border_color = "white"
 )
+
+write.csv(x = transform_hogs(n0o), file = file.path(outputs_folder, "pangenome.matrix.csv"), row.names = FALSE)
 ```
 
 
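The report now caps the heatmap at 5000 columns. It keeps the largest orthologue clusters, ranked by column sum (Approach 2 above), and prefixes the title with "Top N " only when columns were dropped. The sketch below shows the same selection in Python on a plain list-of-rows matrix. The function names `top_columns_by_sum` and `heatmap_title` are illustrative. Indices are 0-based, unlike R. Ties are assumed to keep their original column order, which Python's stable `sorted()` guarantees even with `reverse=True`.

```python
from typing import List


def top_columns_by_sum(matrix: List[List[int]], max_cols: int = 5000) -> List[int]:
    """Return 0-based indices of the `max_cols` columns with the largest sums.

    Mirrors order(colSums(m), decreasing = TRUE)[seq(1, n)] from the report.
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: col_sums[j], reverse=True)
    return ranked[: min(n_cols, max_cols)]


def heatmap_title(n_cols: int, max_cols: int = 5000) -> str:
    """Build the title, adding a 'Top N ' prefix only when columns were dropped."""
    shown = min(n_cols, max_cols)
    prefix = f"Top {shown} " if shown != n_cols else ""
    return prefix + "Orthologue clusters"
```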
conf/base.config

+3

@@ -74,4 +74,7 @@ process {
 cpus = { 8 * task.attempt }
 time = { 7.days * task.attempt }
 }
+withName:GENEPALREPORT {
+memory = { 20.GB * task.attempt }
+}
 }
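
`GENEPALREPORT` now requests memory that grows linearly with `task.attempt`, so each retry after an out-of-memory kill asks for another 20 GB. This assumes the process is actually retried; the `errorStrategy` and `maxRetries` settings live elsewhere in the config and are not shown in this diff. Under that assumption, the requests can be tabulated with the short sketch below. `memory_for_attempt` is a hypothetical helper, and it does not model any resource cap the pipeline may apply.

```python
def memory_for_attempt(attempt: int, base_gb: int = 20) -> int:
    """Memory request in GB for a 1-based attempt, like `20.GB * task.attempt`."""
    if attempt < 1:
        raise ValueError("task.attempt starts at 1")
    return base_gb * attempt


if __name__ == "__main__":
    for attempt in range(1, 4):
        print(f"attempt {attempt}: {memory_for_attempt(attempt)} GB")
    # attempt 1: 20 GB / attempt 2: 40 GB / attempt 3: 60 GB
```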

conf/modules.config

+28 -3

@@ -199,7 +199,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF
 }
 
 withName: '.*:FASTA_LIFTOFF:GFFREAD_BEFORE_LIFTOFF' {
-ext.args = '--no-pseudo --keep-genes'
+ext.args = '--no-pseudo --keep-genes -C'
 }
 
 withName: '.*:FASTA_LIFTOFF:MERGE_LIFTOFF_ANNOTATIONS' {
@@ -212,7 +212,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF
 
 withName: '.*:FASTA_LIFTOFF:GFFREAD_AFTER_LIFTOFF' {
 ext.prefix = { "${meta.id}.liftoff" }
-ext.args = '--keep-genes'
+ext.args = '--no-pseudo --keep-genes -C'
 }
 
 withName: '.*:FASTA_LIFTOFF:GFF_TSEBRA_SPFILTERFEATUREFROMKILLLIST:AGAT_CONVERTSPGFF2GTF' {
@@ -240,6 +240,10 @@ process { // SUBWORKFLOW: GFF_MERGE_CLEANUP
 ext.prefix = { "${meta.id}.liftoff.braker" }
 }
 
+withName: '.*:GFF_MERGE_CLEANUP:FILTER_BY_ORF_SIZE' {
+ext.args = params.filter_genes_by_aa_length ? "--no-pseudo --keep-genes -C -l ${ ( params.filter_genes_by_aa_length + 1 ) * 3 }" : ''
+}
+
 withName: '.*:GFF_MERGE_CLEANUP:GT_GFF3' {
 ext.args = '-tidy -retainids -sort'
 }
@@ -286,7 +290,7 @@ process { // SUBWORKFLOW: GFF_STORE
 }
 
 withName: '.*:GFF_STORE:EXTRACT_PROTEINS' {
-ext.args = params.add_attrs_to_proteins_fasta ? '-F -D -y' : '-y'
+ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -y' : '-y'
 ext.prefix = { "${meta.id}.pep" }
 
 publishDir = [
@@ -295,6 +299,27 @@ process { // SUBWORKFLOW: GFF_STORE
 saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
 ]
 }
+
+withName: '.*:GFF_STORE:EXTRACT_CDS' {
+ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -x' : '-x'
+ext.prefix = { "${meta.id}.cds" }
+
+publishDir = [
+path: { "${params.outdir}/annotations/$meta.id" },
+mode: params.publish_dir_mode,
+saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+]
+}
+withName: '.*:GFF_STORE:EXTRACT_CDNA' {
+ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -w' : '-w'
+ext.prefix = { "${meta.id}.cdna" }
+
+publishDir = [
+path: { "${params.outdir}/annotations/$meta.id" },
+mode: params.publish_dir_mode,
+saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+]
+}
 }
 
 process { // SUBWORKFLOW: FASTA_ORTHOFINDER
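
`FILTER_BY_ORF_SIZE` converts `filter_genes_by_aa_length`, given in amino acids excluding the stop codon per docs/parameters.md, into the nucleotide value passed with `-l`. The formula is (N + 1) × 3: one extra codon for the stop, three nucleotides per codon. With the default of 24 this gives (24 + 1) × 3 = 75. The sketch below reproduces that arithmetic together with the Groovy-truthiness guard, so `null`, and also `0`, produces empty arguments and skips the filter. `orf_filter_args` is a hypothetical name used only for illustration.

```python
from typing import Optional


def orf_filter_args(filter_genes_by_aa_length: Optional[int]) -> str:
    """Build FILTER_BY_ORF_SIZE ext.args the way conf/modules.config does.

    Groovy treats null and 0 as false, so both disable the filter here as well.
    """
    if not filter_genes_by_aa_length:
        return ""
    # +1 codon for the stop codon (the parameter excludes it), 3 nucleotides per codon.
    min_nt = (filter_genes_by_aa_length + 1) * 3
    return f"--no-pseudo --keep-genes -C -l {min_nt}"
```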

docs/output.md

+2

@@ -169,6 +169,8 @@ If more than one genome is included in the pipeline, [ORTHOFINDER](https://githu
 - `Y/`
 - `Y.gt.gff3`: Final annotation file for genome `Y` which contains gene models and their functional annotations
 - `Y.pep.fasta`: Protein sequences for the gene models
+- `Y.cdna.fasta`: cDNA sequences for the gene models
+- `Y.cds.fasta`: Coding sequences for the gene models
 
 </details>
 
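Each genome now gets three sequence files, written by gffread processes configured in conf/modules.config: `-y` for `Y.pep.fasta`, `-x` for `Y.cds.fasta` and `-w` for `Y.cdna.fasta`. `-F -D` is prepended when `add_attrs_to_proteins_cds_fastas` is set. The sketch below shows that mapping in Python. The dictionary layout and the function names `gffread_args` and `output_name` are illustrative. The `.fasta` suffix is taken from docs/output.md; the config itself only sets the `.pep`, `.cds` and `.cdna` prefixes.

```python
from typing import Dict

# Prefix suffix -> gffread sequence flag, as configured for
# EXTRACT_PROTEINS, EXTRACT_CDS and EXTRACT_CDNA in conf/modules.config.
SEQUENCE_FLAGS: Dict[str, str] = {"pep": "-y", "cds": "-x", "cdna": "-w"}


def gffread_args(kind: str, add_attrs: bool = False) -> str:
    """Return the ext.args string for one sequence type."""
    flag = SEQUENCE_FLAGS[kind]
    return f"-F -D {flag}" if add_attrs else flag


def output_name(sample: str, kind: str) -> str:
    """File name as listed in docs/output.md, e.g. 'Y.cds.fasta'."""
    return f"{sample}.{kind}.fasta"
```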
docs/parameters.md

+11 -10

@@ -59,19 +59,20 @@ A Nextflow pipeline for consensus, phased and pan-genome annotation.
 
 ## Post-annotation filtering options
 
-| Parameter | Description | Type | Default | Required | Hidden |
-| ----------------------------- | ----------------------------------------------------------------- | --------- | ------- | -------- | ------ |
-| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
-| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
-| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
-| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
+| Parameter | Description | Type | Default | Required | Hidden |
+| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
+| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
+| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
+| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
+| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
+| `filter_genes_by_aa_length` | Filter genes with open reading frames shorter than the specified number of amino acids excluding the stop codon. If set to `null`, this filter step is skipped. | `integer` | 24 | | |
 
 ## Annotation output options
 
-| Parameter | Description | Type | Default | Required | Hidden |
-| ----------------------------- | ------------------------------------ | --------- | ------- | -------- | ------ |
-| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
-| `add_attrs_to_proteins_fasta` | Add gff attributes to proteins fasta | `boolean` | | | |
+| Parameter | Description | Type | Default | Required | Hidden |
+| ---------------------------------- | --------------------------------------------- | --------- | ------- | -------- | ------ |
+| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
+| `add_attrs_to_proteins_cds_fastas` | Add gff attributes to proteins/cDNA/CDS fasta | `boolean` | | | |
 
 ## Evaluation options
 
modules.json

+1 -1

@@ -111,7 +111,7 @@
 },
 "gxf_fasta_agat_spaddintrons_spextractsequences": {
 "branch": "main",
-"git_sha": "7bf6fbca23edc94490ffa6709f52b2f71c6fb130",
+"git_sha": "ed4146008dbdcfd4823252b456de32059e2d07f4",
 "installed_by": ["subworkflows"]
 }
 }
