Commit 7294f50

Updated output doc
1 parent e6151d9 commit 7294f50

4 files changed: +69 -14 lines changed
docs/images/mqc_fastqc_adapter.png

-22.9 KB
Binary file not shown.

docs/images/mqc_fastqc_counts.png

-33.1 KB
Binary file not shown.

docs/images/mqc_fastqc_quality.png

-54.5 KB
Binary file not shown.

docs/output.md

+69 -14
@@ -2,42 +2,97 @@

 ## Introduction

-This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
-
-The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
-
-<!-- TODO nf-core: Write this documentation describing your workflow's output -->
+This document describes the output produced by the pipeline. The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

 ## Pipeline overview

 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

-- [FastQC](#fastqc) - Raw read QC
+- [BAM to FastQ](#bam-to-fastq) - Convert input Bam files to FastQ files
+- [FastQC Raw](#fastqc-raw) - Raw read QC statistics
+- [FASTP](#fastp) - Adapter trimming
+- [FastQC Trimmed](#fastqc-trimmed) - QC statistics for the reads trimmed by FASTP
+- [CAT](#cat) - Concatenate reads by group
+- [Seqkit/fq2fa](#seqkitfq2fa) - Convert trimmed reads to Fasta
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

-### FastQC
+### BAM to FastQ
+
+The conversion is performed using the `bam2fastq` command of the [PBTK](https://github.com/PacificBiosciences/pbtk) toolkit from Pacific Biosciences.
+
+### FastQC Raw

 <details markdown="1">
 <summary>Output files</summary>

-- `fastqc/`
+- `fastqc_raw/`
   - `*_fastqc.html`: FastQC report containing quality metrics.
   - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.

 </details>

 [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

-![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
+### FASTP
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fastp/`
+  - `fail`
+    - `*.fail.fastq.gz`: Reads which failed to pass the filter
+  - `html`
+    - `*.fastp.html`: Sample wise HTML report
+  - `json`
+    - `*.fastp.json`: Sample wise JSON report
+  - `log`
+    - `*.fastp.log`: Sample wise log file
+  - `pass`
+    - `*.fastp.fastq.gz`: Reads which passed the filter
+
+</details>
+
+[FASTP](https://github.com/OpenGene/fastp) is an ultra-fast all-in-one FASTQ preprocessor to perform QC, adapter trimming, filtering, splitting and merging.
+
+### FastQC Trimmed
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fastqc_trim/`
+  - `*_fastqc.html`: FastQC report containing quality metrics.
+  - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
+
+</details>

-![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
+FastQC is applied to the trimmed reads from FASTP `*.fastp.fastq.gz`.

-![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
+### CAT
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `groups/`
+  - `fastq`
+    - `*.merged.fastq.gz`: Concatenated fastq file
+
+</details>
+
+Samples with the same `group` column in the `samplesheet.csv` are concatenated together. For single-end samples, a single `*.merged.fastq.gz` file is created. For paired-end samples, two separate files for `reads_1` and `reads_2` are saved. The concatenation is performed with the [cat](https://www.linfo.org/cat.html) command.
+
+### Seqkit/fq2fa
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `groups/`
+  - `fasta`
+    - `*.fa.gz`: Concatenated fasta file
+
+</details>

-:::note
-The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
-:::
+Concatenated FastQ files `*.merged.fastq.gz` are converted into Fasta files with `fq2fa` command of the [Seqkit](https://bioinf.shenwei.me/seqkit/) toolkit.

 ### MultiQC
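
The CAT and Seqkit/fq2fa sections added in this diff describe two mechanical steps: concatenating reads that share a `group` value, then converting the merged FastQ to Fasta. The following is a minimal Python sketch of what those steps amount to, not the pipeline's implementation (which, per the diff, uses `cat` and `seqkit fq2fa`). It assumes single-end input, 4-line FastQ records, and a samplesheet with a hypothetical `fastq` column holding the file path; only the `group` column is named in the documentation. The function names are illustrative.

```python
import csv
import gzip
import shutil
from collections import defaultdict
from pathlib import Path


def concatenate_by_group(samplesheet, outdir):
    """Concatenate gzipped FastQ files that share a `group` value.

    Assumes a hypothetical `fastq` column containing each file's path.
    """
    groups = defaultdict(list)
    with open(samplesheet, newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row["group"]].append(Path(row["fastq"]))

    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for group, fastqs in groups.items():
        target = outdir / f"{group}.merged.fastq.gz"
        with open(target, "wb") as out:
            for fastq in fastqs:
                # Appending gzip members byte-for-byte yields a valid gzip
                # file, which is why plain `cat` works on *.fastq.gz inputs.
                with open(fastq, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[group] = target
    return merged


def fastq_to_fasta(fastq_gz, fasta_gz):
    """Keep the header and sequence of each 4-line FastQ record as Fasta."""
    count = 0
    with gzip.open(fastq_gz, "rt") as src, gzip.open(fasta_gz, "wt") as out:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break
            sequence = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, discarded
            src.readline()  # quality line, discarded
            out.write(f">{header[1:]}\n{sequence}\n")
            count += 1
    return count
```

A call such as `concatenate_by_group("samplesheet.csv", "groups/fastq")` returns a mapping from group name to merged file, and each merged file can then be passed to `fastq_to_fasta`. Paired-end data would need the same grouping applied separately to `reads_1` and `reads_2`, as the CAT section notes.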