Quantify_Summarize

Basic Usage

The Quantify_Summarize handler quantifies gene expression and generates population summary statistics from processed BAM files. It uses featureCounts for gene quantification and custom Bash functions for the summary statistics.
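To illustrate the quantification step, the sketch below assembles a featureCounts command line from a text file listing one BAM path per line. It is a dry run: it only prints the command, and the helper name `build_featurecounts_cmd` is hypothetical. It is not taken from RNApipeline, and the handler's actual featureCounts options may differ. `-a`, `-o`, `-T`, `-t`, and `-g` are real featureCounts options, and `-t exon -g gene_id` are its GTF defaults. A GFF3 annotation would likely need different `-t`/`-g` values, and paired-end data would typically also need `-p`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RNApipeline): print a featureCounts command
# for every BAM listed in a file. Dry run only; featureCounts is not executed.
build_featurecounts_cmd() {
    local annotation="$1"    # GTF/GFF annotation, e.g. the value of REF_ANN
    local bam_list="$2"      # text file with one BAM path per line, e.g. BAM_LIST
    local out_file="$3"      # output count matrix
    local threads="${4:-1}"  # number of threads passed to -T
    local -a bams=()
    local line

    # Collect BAM paths, skipping blank lines and tolerating a missing final newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        [[ -n "$line" ]] && bams+=("$line")
    done < "$bam_list"

    if (( ${#bams[@]} == 0 )); then
        echo "no BAM files listed in ${bam_list}" >&2
        return 1
    fi

    # Count reads on 'exon' features, summarised per 'gene_id' (featureCounts GTF defaults)
    local -a cmd=(featureCounts -T "$threads" -t exon -g gene_id
                  -a "$annotation" -o "$out_file" "${bams[@]}")
    echo "${cmd[*]}"
}
```

For example, `build_featurecounts_cmd ref.gtf bams.txt counts.txt 4` prints one command that counts all listed samples together, so featureCounts writes a single gene-by-sample matrix.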

To run Quantify_Summarize, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quantify_Summarize can be submitted to a job scheduler with the following command (assuming that you are in the directory containing RNApipeline):

./main.sh Quantify_Summarize proj.conf

Handler-Specific Variables

The following variables must be defined in the configuration file. All common variables must be defined as well.

Variable	Function
`QS_QSUB`	QSub settings for batch submission
`BAM_LIST`	Path to a file listing the full paths of the finished BAM files. If SAM_Processing was used, this is `${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt`
`REF_ANN`	Annotations for the reference genome; must be in GTF or GFF3 format
`STRUCTURAL`	Annotations denoting regions of structural RNA; must be in GTF, GFF3, or BED format
`DSTAT_EXPR`	Reference expression table for DSTAT calculations
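A minimal sketch of the handler-specific part of a configuration file follows, assuming `proj.conf` uses Bash variable syntax. Every value is a placeholder. In particular, the `QS_QSUB` string is only an illustrative PBS-style resource request, and the exact format the scheduler wrapper expects is an assumption. `PROJECT` and `OUT_DIR` stand in for the common variables. The `check_qs_vars` function is a hypothetical pre-flight check, not part of RNApipeline. It reports any handler-specific variable that is unset or empty.

```shell
#!/usr/bin/env bash
# --- Hypothetical proj.conf excerpt: all values are placeholders ---
PROJECT="my_project"                      # common variable
OUT_DIR="/path/to/output"                 # common variable
QS_QSUB="-l mem=16gb,nodes=1:ppn=8,walltime=12:00:00"   # assumed PBS-style format
BAM_LIST="${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt"
REF_ANN="/path/to/reference/annotation.gtf"
STRUCTURAL="/path/to/reference/structural_rna.bed"
DSTAT_EXPR="/path/to/reference/dstat_expression.txt"

# Hypothetical check (not part of RNApipeline): return non-zero and print a
# message for each Quantify_Summarize variable that is unset or empty.
check_qs_vars() {
    local missing=0 var
    for var in QS_QSUB BAM_LIST REF_ANN STRUCTURAL DSTAT_EXPR; do
        # ${!var:-} expands the variable whose name is stored in $var
        if [[ -z "${!var:-}" ]]; then
            echo "Quantify_Summarize: variable ${var} is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```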

Output

Quantify_Summarize generates an expression matrix quantifying the expression of each gene in each sample provided. In addition, it generates a file called ${PROJECT}_stats.tsv with the following information:

fraction of reads that are duplicates (p_dup)
number of genes expressed (expr_div)
dstat scores for each population present in the reference expression table (dstat_*)
fraction of reads aligning to structural RNA regions (p_struct)
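One way to compute a duplicate fraction like p_dup is sketched below with `samtools view -c`, which counts the alignments that pass the given flag filters. This is an assumption, not the handler's documented method. It also assumes duplicates were already marked upstream (SAM flag 0x400, e.g. by Picard MarkDuplicates or samtools markdup) and that samtools is on `PATH`. The function name `dup_fraction` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fraction of primary reads flagged as duplicates (0x400).
# Assumes duplicate marking was done upstream and samtools is installed.
dup_fraction() {
    local bam="$1"
    local total dups
    # -F 0x900 drops secondary (0x100) and supplementary (0x800) alignments,
    # so each read is counted once via its primary record
    total=$(samtools view -c -F 0x900 "$bam") || return 1
    # -f 0x400 keeps only records whose duplicate flag is set
    dups=$(samtools view -c -f 0x400 -F 0x900 "$bam") || return 1
    if (( total == 0 )); then
        echo "NA"
        return 0
    fi
    awk -v d="$dups" -v t="$total" 'BEGIN { printf "%.4f\n", d / t }'
}
```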
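A gene-diversity count like expr_div can be read from a featureCounts matrix. That file starts with a `#` comment line and a header, and sample counts begin in column 7. The sketch below counts, per sample, the genes with at least a minimum number of reads. The helper name `count_expressed_genes` and the default threshold of 1 read are assumptions. The handler's actual definition of "expressed" is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-sample number of genes with >= min_count reads in a
# featureCounts matrix (columns 1-6 are Geneid, Chr, Start, End, Strand, Length).
count_expressed_genes() {
    local matrix="$1"
    local min_count="${2:-1}"   # assumed expression threshold (raw counts)
    awk -F'\t' -v min="$min_count" '
        /^#/ { next }                         # skip the featureCounts command comment
        !header_seen {                        # first non-comment line is the header
            for (i = 7; i <= NF; i++) name[i] = $i
            last = NF; header_seen = 1; next
        }
        {
            for (i = 7; i <= last; i++) if ($i + 0 >= min + 0) n[i]++
        }
        END {                                 # one "sample<TAB>gene count" line per sample
            for (i = 7; i <= last; i++) printf "%s\t%d\n", name[i], n[i] + 0
        }
    ' "$matrix"
}
```

Because the threshold is applied to raw counts, samples sequenced to very different depths may not be directly comparable under this sketch.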
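When `STRUCTURAL` is a BED file, one possible route to p_struct is to convert it to featureCounts' SAF format and count reads against it with `featureCounts -F SAF -a structural.saf ...`. The fraction can then be taken from the resulting `.summary` file. This route is an assumption, not the handler's documented method. The sketch shows both pieces with hypothetical names `bed_to_saf` and `assigned_fraction`. BED starts are 0-based and SAF starts are 1-based, hence the `+ 1`. The fraction is approximated as `Assigned` divided by the sum of all status rows.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: BED (0-based, half-open) -> SAF (1-based, inclusive).
# Unnamed regions get a chr:start-end ID; missing strands default to "+",
# which is ignored by featureCounts' default unstranded counting (-s 0).
bed_to_saf() {
    awk -F'\t' 'BEGIN { OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
        /^(#|track|browser)/ { next }
        NF >= 3 {
            id = (NF >= 4 && $4 != "") ? $4 : ($1 ":" $2 "-" $3)
            strand = (NF >= 6 && ($6 == "+" || $6 == "-")) ? $6 : "+"
            print id, $1, $2 + 1, $3, strand
        }' "$1"
}

# Fraction of reads reported as "Assigned" in one sample column (default: 2)
# of a featureCounts .summary file, relative to the sum of all status rows.
assigned_fraction() {
    local summary="$1" col="${2:-2}"
    awk -F'\t' -v c="$col" '
        NR > 1 { total += $c; if ($1 == "Assigned") a = $c }
        END { if (total > 0) printf "%.4f\n", a / total; else print "NA" }' "$summary"
}
```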

Dependencies

The Quantify_Summarize handler depends on:
