Sequence_Trimming

Paul Hoffman edited this page Sep 26, 2019 · 3 revisions

Basic Usage

The Sequence_Trimming handler trims adapter sequences and performs quality-based trimming on a set of FASTQ files. It uses Trimmomatic to perform the trimming and accepts both uncompressed and gzipped FASTQ files as input.

To run Sequence_Trimming, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Sequence_Trimming can be submitted to a job scheduler with the following command (assuming that you are in the directory containing RNApipeline):

./main.sh Sequence_Trimming proj.conf

Handler-Specific Variables

The following is a list of variables that must be defined within the configuration file. In addition to these handler-specific variables, all common variables must be defined.

Variable Function
ST_QSUB QSub settings for batch submission
FORWARD_NAMING Shared suffix for forward reads. Example: If your files are named sample1_R1.fastq and sample2_R1.fastq, then FORWARD_NAMING=_R1.fastq
REVERSE_NAMING Shared suffix for reverse reads. Example: If your files are named sample1_R2.fastq and sample2_R2.fastq, then REVERSE_NAMING=_R2.fastq
ADAPTERS A plain text or FASTA file with the adapter sequences. These sequences will depend on the technology and platform used for sequencing, but most common adapters for various platforms can be found online
PHRED64 Use the phred64 scale instead of the phred33 quality scale
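
As a rough illustration of the variables above, a configuration fragment might look like the following sketch. Only the variable names and the two suffix values come from this page; the scheduler resource string, the adapter file path, and the true/false form of PHRED64 are assumptions made for the example and should be checked against the template configuration file shipped with RNApipeline.

```shell
# Hypothetical values for the Sequence_Trimming section of proj.conf.

# Scheduler settings; this PBS-style resource string is an assumption.
ST_QSUB="mem=12gb,nodes=1:ppn=4,walltime=24:00:00"

# Suffixes shared by all forward and all reverse read files.
FORWARD_NAMING="_R1.fastq"
REVERSE_NAMING="_R2.fastq"

# Adapter sequences in plain text or FASTA; the path is a placeholder.
ADAPTERS="/path/to/adapters.fa"

# Assumed to be a true/false switch: false keeps phred33, true selects phred64.
PHRED64=false
```

If the input files are gzipped, the suffixes would presumably need to include the extra extension (for example _R1.fastq.gz) so that they still match the ends of the file names.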

Note: If you have single-end samples, set FORWARD_NAMING and REVERSE_NAMING to values that do not match any of your samples. If no sample matches either the forward or the reverse naming suffix, Sequence_Trimming automatically assumes that the samples are single-end.
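
The suffix matching described in the note can be pictured with the short sketch below. It is not the handler's actual code; classify_read is a hypothetical helper written for this page, and it simply checks whether a file name ends in either suffix.

```shell
#!/bin/bash
# Illustrative sketch only; classify_read is a hypothetical helper,
# not a function provided by RNApipeline.
FORWARD_NAMING="_R1.fastq"
REVERSE_NAMING="_R2.fastq"

classify_read() {
    local file="$1"
    case "$file" in
        # Quoting the variable makes the suffix match literally.
        *"$FORWARD_NAMING") echo "forward" ;;
        *"$REVERSE_NAMING") echo "reverse" ;;
        # Neither suffix matched: treat the file as single-end.
        *) echo "single" ;;
    esac
}
```

Calling classify_read sample1_R1.fastq would report a forward read, while a name such as sample3.fastq matches neither suffix and falls through to the single-end case. This is why suffix values that match none of your files lead to single-end handling.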

Output

Sequence_Trimming will output a trimmed, gzipped FASTQ file for each input file. If you have paired-end data, the output file names end in _forward_paired.fastq.gz for forward reads and _reverse_paired.fastq.gz for reverse reads; for single-end data, the output file names end in _trimmed.fastq.gz.

In addition, a list of all trimmed FASTQ files will be generated for use with other handlers. The full path to this list is ${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt.
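
Before passing the list to another handler, it can be useful to confirm that every file it names actually exists. The sketch below assumes the list holds one file path per line, which this page does not state explicitly; count_missing is a hypothetical helper, not part of RNApipeline.

```shell
#!/bin/bash
# Illustrative sketch only; assumes one FASTQ path per line in the list.
count_missing() {
    local list="$1" missing=0 fastq
    # The extra -n test keeps a final line that lacks a trailing newline.
    while IFS= read -r fastq || [ -n "$fastq" ]; do
        # Skip blank lines.
        [ -z "$fastq" ] && continue
        # -s is true only for files that exist and are not empty.
        if [ ! -s "$fastq" ]; then
            echo "Missing or empty: $fastq" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    # Print the number of problem files on standard output.
    echo "$missing"
}
```

With OUT_DIR and PROJECT set as in the configuration file, this could be run as count_missing "${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt". A result of 0 means every listed file is present and non-empty.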

Dependencies

The Sequence_Trimming handler depends on: