Quantify_Summarize
The Quantify_Summarize handler quantifies expression and generates some population summary statistics using processed BAM files. This script utilizes featureCounts for the gene quantification, and custom BASH functions for summary information.
To run Quantify_Summarize, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quantify_Summarize can be submitted to a job scheduler with the following command (assuming that you are in the directory containing RNApipeline):
./main.sh Quantify_Summarize proj.conf
The following handler-specific variables must be defined within the configuration file, in addition to all common variables.
Variable | Function |
---|---|
`QS_QSUB` | QSub settings for batch submission |
`BAM_LIST` | A list of full file paths to the finished BAM files. This will be `${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt` if using SAM_Processing |
`REF_ANN` | Annotations for reference genome; must be in GTF or GFF3 format |
`STRUCTURAL` | Annotations denoting regions of structural RNA; must be in GTF, GFF3, or BED format |
`DSTAT_EXPR` | Reference expression table for DSTAT calculations |
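
As a rough illustration of the variables above, the handler-specific part of a configuration file might look like the following sketch. Every value here is a hypothetical placeholder: the paths, the project name, and the PBS-style resource string given to `QS_QSUB` are assumptions for illustration, not defaults taken from the pipeline. `OUT_DIR` and `PROJECT` are common variables and are set here only so the fragment stands on its own.

```shell
# Common variables (normally defined elsewhere in the config; values are hypothetical)
OUT_DIR="/scratch/user/rnaseq_out"
PROJECT="my_project"

# Handler-specific variables for Quantify_Summarize (all values are hypothetical)
# Scheduler resource request; the exact format depends on your cluster
QS_QSUB="mem=16gb,nodes=1:ppn=8,walltime=12:00:00"

# List of finished BAM files; this path follows the SAM_Processing convention
BAM_LIST="${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt"

# Reference genome annotation (GTF or GFF3)
REF_ANN="/path/to/reference/annotation.gtf"

# Structural RNA regions (GTF, GFF3, or BED)
STRUCTURAL="/path/to/reference/structural_rna.bed"

# Reference expression table for DSTAT calculations
DSTAT_EXPR="/path/to/reference/dstat_expression.tsv"
```

If the BAM files were not produced by SAM_Processing, `BAM_LIST` should instead point to a text file listing the full path of each finished BAM file.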
Quantify_Summarize will generate an expression matrix quantifying the expression of each gene for each sample provided.
In addition, a file called `${PROJECT}_stats.tsv` will be generated with the following information:
- fraction of reads that are duplicates (`p_dup`)
- number of genes expressed (`expr_div`)
- dstat scores for each population present in the reference expression table (`dstat_*`)
- fraction of reads aligning to structural RNA regions (`p_struct`)
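
To make the "number of genes expressed" statistic more concrete, here is a minimal sketch of how such a count could be derived from an expression matrix. It is not the pipeline's own implementation. The helper `count_expressed`, the toy file `toy_matrix.tsv`, the assumed layout (tab-separated, header row, gene IDs in the first column, one count column per sample), and the rule that a gene counts as expressed when its count is greater than zero are all assumptions made for illustration.

```shell
# Hypothetical helper: for each sample column, count genes with a nonzero count.
# Assumes a tab-separated matrix with a header row, gene IDs in column 1,
# and one count column per sample from column 2 onward.
count_expressed() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
        { for (i = 2; i <= n; i++) if ($i > 0) expressed[i]++ }
        END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], expressed[i] + 0 }
    ' "$1"
}

# Toy matrix with made-up values, for illustration only
printf 'gene\tsampleA\tsampleB\n' > toy_matrix.tsv
printf 'g1\t10\t0\n' >> toy_matrix.tsv
printf 'g2\t0\t0\n' >> toy_matrix.tsv
printf 'g3\t5\t7\n' >> toy_matrix.tsv

# Prints one line per sample: sample name, then number of expressed genes
count_expressed toy_matrix.tsv
```

A real featureCounts table carries additional annotation columns before the per-sample counts, so the starting column would need adjusting before applying something like this to actual output.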
The Quantify_Summarize handler depends on: