|
2 | 2 |
|
3 | 3 | ## Introduction
|
4 | 4 |
|
5 |
| -This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. |
6 |
| - |
7 |
| -The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. |
8 |
| - |
9 |
| -<!-- TODO nf-core: Write this documentation describing your workflow's output --> |
| 5 | +This document describes the output produced by the pipeline. The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. |
10 | 6 |
|
11 | 7 | ## Pipeline overview
|
12 | 8 |
|
13 | 9 | The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
|
14 | 10 |
|
15 |
| -- [FastQC](#fastqc) - Raw read QC |
| 11 | +- [BAM to FastQ](#bam-to-fastq) - Convert input BAM files to FastQ files |
| 12 | +- [FastQC Raw](#fastqc-raw) - Raw read QC statistics |
| 13 | +- [FASTP](#fastp) - Adapter trimming |
| 14 | +- [FastQC Trimmed](#fastqc-trimmed) - QC statistics for the reads trimmed by FASTP |
| 15 | +- [CAT](#cat) - Concatenate reads by group |
| 16 | +- [Seqkit/fq2fa](#seqkitfq2fa) - Convert concatenated reads to Fasta |
16 | 17 | - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
|
17 | 18 | - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
|
18 | 19 |
|
19 |
| -### FastQC |
| 20 | +### BAM to FastQ |
| 21 | + |
| 22 | +The conversion is performed using the `bam2fastq` command of the [PBTK](https://github.com/PacificBiosciences/pbtk) toolkit from Pacific Biosciences. |
| 23 | + |
| 24 | +### FastQC Raw |
20 | 25 |
|
21 | 26 | <details markdown="1">
|
22 | 27 | <summary>Output files</summary>
|
23 | 28 |
|
24 |
| -- `fastqc/` |
| 29 | +- `fastqc_raw/` |
25 | 30 | - `*_fastqc.html`: FastQC report containing quality metrics.
|
26 | 31 | - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
|
27 | 32 |
|
28 | 33 | </details>
|
29 | 34 |
|
30 | 35 | [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
|
31 | 36 |
|
32 |
| - |
| 37 | +### FASTP |
| 38 | + |
| 39 | +<details markdown="1"> |
| 40 | +<summary>Output files</summary> |
| 41 | + |
| 42 | +- `fastp/` |
| 43 | + - `fail` |
| 44 | + - `*.fail.fastq.gz`: Reads which failed to pass the filter |
| 45 | + - `html` |
| 46 | + - `*.fastp.html`: Sample-wise HTML report |
| 47 | + - `json` |
| 48 | + - `*.fastp.json`: Sample-wise JSON report |
| 49 | + - `log` |
| 50 | + - `*.fastp.log`: Sample-wise log file |
| 51 | + - `pass` |
| 52 | + - `*.fastp.fastq.gz`: Reads which passed the filter |
| 53 | + |
| 54 | +</details> |
| 55 | + |
| 56 | +[FASTP](https://github.com/OpenGene/fastp) is an ultra-fast all-in-one FASTQ preprocessor that performs QC, adapter trimming, filtering, splitting and merging. |
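
As a rough illustration of how a sample-wise `*.fastp.json` report could be inspected after the run, the following sketch reads one report and computes the fraction of reads kept. The function name `summarise_fastp_report` is hypothetical, and the `summary`, `before_filtering`, `after_filtering` and `total_reads` keys are assumed to match the layout of the fastp JSON report; check them against a real report before relying on this.

```python
import json


def summarise_fastp_report(json_path):
    """Return read counts before/after filtering from one fastp JSON report.

    Assumption: the report contains summary.before_filtering.total_reads
    and summary.after_filtering.total_reads.
    """
    with open(json_path) as handle:
        report = json.load(handle)

    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]

    return {
        "reads_before": before,
        "reads_after": after,
        # Guard against empty inputs to avoid a division by zero.
        "fraction_kept": after / before if before else 0.0,
    }
```

For example, `summarise_fastp_report("fastp/json/sample1.fastp.json")` would return a small dictionary for that sample, where `sample1` is a placeholder name.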
| 57 | + |
| 58 | +### FastQC Trimmed |
| 59 | + |
| 60 | +<details markdown="1"> |
| 61 | +<summary>Output files</summary> |
| 62 | + |
| 63 | +- `fastqc_trim/` |
| 64 | + - `*_fastqc.html`: FastQC report containing quality metrics. |
| 65 | + - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. |
| 66 | + |
| 67 | +</details> |
33 | 68 |
|
34 |
| - |
| 69 | +FastQC is run on the reads trimmed by FASTP (`*.fastp.fastq.gz`). |
35 | 70 |
|
36 |
| - |
| 71 | +### CAT |
| 72 | + |
| 73 | +<details markdown="1"> |
| 74 | +<summary>Output files</summary> |
| 75 | + |
| 76 | +- `groups/` |
| 77 | + - `fastq` |
| 78 | + - `*.merged.fastq.gz`: Concatenated fastq file |
| 79 | + |
| 80 | +</details> |
| 81 | + |
| 82 | +Samples that share the same value in the `group` column of `samplesheet.csv` are concatenated. For single-end samples, a single `*.merged.fastq.gz` file is created. For paired-end samples, two separate files for `reads_1` and `reads_2` are saved. The concatenation is performed with the [cat](https://www.linfo.org/cat.html) command. |
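
The grouping logic described above can be sketched in Python as follows. This is a minimal illustration for single-end reads, not the pipeline's actual implementation (which uses `cat`); the function name `concatenate_by_group` and the `(group, path)` input shape are assumptions made for the example. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file, which is also why plain `cat` works on `.fastq.gz` inputs.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def concatenate_by_group(samples, out_dir):
    """Concatenate gzipped FastQ files that share a group label.

    samples: iterable of (group, path) pairs (hypothetical input shape).
    Returns a dict mapping each group to its merged output path.
    """
    grouped = defaultdict(list)
    for group, path in samples:
        grouped[group].append(Path(path))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    merged = {}
    for group, paths in grouped.items():
        target = out_dir / f"{group}.merged.fastq.gz"
        with open(target, "wb") as dst:
            for path in paths:
                # Raw byte copy, equivalent to `cat a.gz b.gz > merged.gz`.
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, dst)
        merged[group] = target
    return merged
```

Files are appended in the order they are listed, so the read order within a merged file follows the order of the input pairs.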
| 83 | + |
| 84 | +### Seqkit/fq2fa |
| 85 | + |
| 86 | +<details markdown="1"> |
| 87 | +<summary>Output files</summary> |
| 88 | + |
| 89 | +- `groups/` |
| 90 | + - `fasta` |
| 91 | + - `*.fa.gz`: Concatenated fasta file |
| 92 | + |
| 93 | +</details> |
37 | 94 |
|
38 |
| -:::note |
39 |
| -The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. |
40 |
| -::: |
| 95 | +Concatenated FastQ files (`*.merged.fastq.gz`) are converted into Fasta files with the `fq2fa` command of the [Seqkit](https://bioinf.shenwei.me/seqkit/) toolkit. |
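
To make the conversion concrete, here is a minimal Python sketch of what a FastQ to Fasta conversion does: it keeps the read name and sequence of each record and drops the quality line. It is not how Seqkit is implemented and does not reproduce Seqkit's output formatting; the function name `fastq_to_fasta` is hypothetical, and the sketch assumes well-formed four-line FastQ records.

```python
import gzip


def fastq_to_fasta(fastq_gz, fasta_gz):
    """Convert a gzipped four-line FastQ file to gzipped Fasta.

    Returns the number of records written.
    """
    count = 0
    with gzip.open(fastq_gz, "rt") as src, gzip.open(fasta_gz, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines between records
            sequence = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, ignored
            src.readline()  # quality line, dropped in Fasta
            # Replace the leading '@' of the FastQ header with '>'.
            dst.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
            count += 1
    return count
```

Because gzip readers handle multi-member files, this also works on files produced by byte-level concatenation of several `.fastq.gz` inputs.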
41 | 96 |
|
42 | 97 | ### MultiQC
|
43 | 98 |
|
|