Sequence_Trimming
The Sequence_Trimming handler uses Trimmomatic to trim adapter sequences and perform quality-based trimming on a set of FASTQ files. It accepts both uncompressed and gzipped FASTQ files as input.
To run Sequence_Trimming, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Sequence_Trimming can be submitted to a job scheduler with the following command (assuming that you are in the directory containing RNApipeline):
./main.sh Sequence_Trimming proj.conf
The following is a list of the handler-specific variables that must be defined in the configuration file. All common variables must be defined as well.
Variable | Function |
---|---|
ST_QSUB | QSub settings for batch submission |
FORWARD_NAMING | Shared suffix for forward reads. Example: if your files are named sample1_R1.fastq and sample2_R1.fastq, then FORWARD_NAMING=_R1.fastq |
REVERSE_NAMING | Shared suffix for reverse reads. Example: if your files are named sample1_R2.fastq and sample2_R2.fastq, then REVERSE_NAMING=_R2.fastq |
ADAPTERS | A plain text or FASTA file with the adapter sequences. These sequences depend on the technology and platform used for sequencing, but the most common adapters for various platforms can be found online |
PHRED64 | Use the phred64 quality scale instead of the phred33 quality scale |
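As an illustration, the handler-specific part of a configuration file might look like the fragment below. This is only a sketch: every value is a placeholder, and the exact formats expected for ST_QSUB and PHRED64 are assumptions that should be checked against the template configuration file shipped with the pipeline.

```shell
# Hypothetical handler-specific settings for Sequence_Trimming.
# All values are placeholders, not defaults taken from the pipeline.

# Scheduler options for batch submission (format assumed here)
ST_QSUB="mem=16gb,nodes=1:ppn=8,walltime=12:00:00"

# Shared suffixes that identify forward and reverse read files
FORWARD_NAMING="_R1.fastq"
REVERSE_NAMING="_R2.fastq"

# Adapter sequences as plain text or FASTA (placeholder path)
ADAPTERS="/path/to/adapters.fa"

# Quality scale switch (true/false is an assumed convention)
PHRED64=false
```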
Note: If you have single-end samples, leave FORWARD_NAMING and REVERSE_NAMING filled with values that do not match your samples. If none of your samples match the forward or reverse naming suffixes, Sequence_Trimming will automatically assume that the samples are single-end.
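The suffix matching described in the note can be pictured with a small sketch. The function name `classify_read` and the sample file names are hypothetical, and this is not the handler's actual code; it only shows how a file name that matches neither suffix would be treated as single-end.

```shell
#!/bin/bash
# Illustrative sketch only: classify a FASTQ file name by its suffix.
FORWARD_NAMING="_R1.fastq"
REVERSE_NAMING="_R2.fastq"

classify_read() {
    local name="$1"
    case "$name" in
        # The quoted variable is matched literally; the leading * matches any prefix
        *"$FORWARD_NAMING") echo "forward" ;;
        *"$REVERSE_NAMING") echo "reverse" ;;
        # No suffix matched, so the file is treated as single-end
        *) echo "single" ;;
    esac
}

classify_read sample1_R1.fastq
classify_read sample1_R2.fastq
classify_read sample3.fastq
```

Under these assumed suffixes, the three calls report a forward read, a reverse read, and a single-end read respectively.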
Sequence_Trimming will output a trimmed, gzipped FASTQ file for each sample. For paired-end data, the output file names end in _forward_paired.fastq.gz for forward reads and _reverse_paired.fastq.gz for reverse reads; for single-end data, they end in _trimmed.fastq.gz.
In addition, a list of all trimmed FASTQ files will be generated for use with other handlers.
The full file path to this list will be ${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt
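Before passing this list to another handler, it can be useful to confirm that every file it names exists. The sketch below assumes the list holds one file path per line, which the text above does not state explicitly; `check_list` is a hypothetical helper, not part of the pipeline.

```shell
#!/bin/bash
# Illustrative sketch only: count entries in a sample list that are missing on disk.
# Assumes one FASTQ path per line.
check_list() {
    local list="$1" missing=0 fastq
    # The "|| [ -n ... ]" clause handles a final line without a trailing newline
    while IFS= read -r fastq || [ -n "$fastq" ]; do
        [ -z "$fastq" ] && continue        # skip blank lines
        if [ ! -f "$fastq" ]; then
            echo "missing: $fastq" >&2     # report to stderr
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing"                        # number of missing files on stdout
}

# Example usage, with OUT_DIR and PROJECT set as in the configuration file:
# check_list "${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt"
```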
The Sequence_Trimming handler depends on: