Quantify_Summarize

Paul Hoffman edited this page Sep 26, 2019 · 3 revisions

Basic Usage

The Quantify_Summarize handler quantifies gene expression and generates population-level summary statistics from processed BAM files. It uses featureCounts for gene quantification and custom Bash functions for the summary information.

To run Quantify_Summarize, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quantify_Summarize can be submitted to a job scheduler with the following command (assuming that you are in the directory containing RNApipeline):

./main.sh Quantify_Summarize proj.conf
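Because the handler expects every variable to be defined before submission, a quick pre-flight check can catch an incomplete configuration before the job reaches the scheduler. The sketch below is not part of RNApipeline; `check_config` is a hypothetical helper that assumes the configuration file is a shell script that can be sourced, and it checks only the handler-specific variables listed on this page plus the existence of each path in `BAM_LIST`.

```bash
# Hypothetical pre-flight helper; not part of RNApipeline itself.
# The function body runs in a subshell, so sourcing the config
# does not alter the caller's environment.
check_config() (
    conf="$1"
    # Treat bare filenames as relative to the current directory
    case "$conf" in */*) ;; *) conf="./$conf" ;; esac
    [ -r "$conf" ] || { echo "Cannot read config: $conf" >&2; exit 1; }
    . "$conf" || exit 1

    status=0
    # Handler-specific variables documented for Quantify_Summarize
    for var in QS_QSUB BAM_LIST REF_ANN STRUCTURAL DSTAT_EXPR; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "Variable not set: $var" >&2
            status=1
        fi
    done

    # Each non-empty line of BAM_LIST should name an existing file
    if [ -n "${BAM_LIST:-}" ]; then
        if [ -r "$BAM_LIST" ]; then
            while IFS= read -r bam || [ -n "$bam" ]; do
                [ -z "$bam" ] && continue
                [ -f "$bam" ] || { echo "BAM file not found: $bam" >&2; status=1; }
            done < "$BAM_LIST"
        else
            echo "Cannot read BAM list: $BAM_LIST" >&2
            status=1
        fi
    fi
    exit "$status"
)
```

With the function defined in the current shell, `check_config proj.conf && ./main.sh Quantify_Summarize proj.conf` would submit the handler only when the check passes.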

Handler-Specific Variables

The following is a list of variables that need to be defined within the configuration file. In addition to the handler-specific variables, all common variables must be defined.

Variable      Function
QS_QSUB       QSub settings for batch submission
BAM_LIST      A list of full file paths to the finished BAM files. This will be ${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt if using SAM_Processing
REF_ANN       Annotations for reference genome; must be in GTF or GFF3 format
STRUCTURAL    Annotations denoting regions of structural RNA; must be in GTF, GFF3, or BED format
DSTAT_EXPR    Reference expression table for DSTAT calculations
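As an illustration, the handler-specific part of a configuration file might look like the fragment below. Every value is a placeholder: the scheduler settings string and all file paths are assumptions made for this example, not defaults shipped with RNApipeline. `OUT_DIR` and `PROJECT` are common variables and are assumed to be defined earlier in the same file.

```bash
# Example handler-specific settings for Quantify_Summarize.
# All values are hypothetical placeholders; replace them with your own.

# Scheduler settings string (the exact format depends on your cluster)
QS_QSUB="mem=16gb,nodes=1:ppn=8,walltime=12:00:00"

# List of finished BAM files; this is the path written by SAM_Processing
BAM_LIST="${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt"

# Reference annotations (GTF or GFF3)
REF_ANN="/path/to/reference/annotations.gtf"

# Structural RNA regions (GTF, GFF3, or BED)
STRUCTURAL="/path/to/reference/structural_rna.bed"

# Reference expression table for the DSTAT calculations
DSTAT_EXPR="/path/to/reference/expression_table.tsv"
```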

Output

Quantify_Summarize will generate an expression matrix quantifying the expression of each gene for each sample provided. In addition, a file called ${PROJECT}_stats.tsv will be generated with the following information:

  • fraction of reads that are duplicates (p_dup)
  • number of genes expressed (expr_div)
  • dstat scores for each population present in the reference expression table (dstat_*)
  • fraction of reads aligning to structural RNA regions (p_struct)
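To look at a single statistic across samples, a small helper along the following lines may be convenient. It is a sketch under an assumption this page does not confirm: that the stats file is tab-separated with a header row naming each statistic and one row per sample, with the sample identifier in the first column. `stats_column` is a hypothetical name, not a function provided by the pipeline.

```bash
# Hypothetical helper: print the first column plus one named column
# from a tab-separated file that has a header row.
# Usage: stats_column FILE COLUMN_NAME
stats_column() {
    awk -F '\t' -v name="$2" '
        NR == 1 {
            # Locate the requested column in the header row
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "Column not found: " name > "/dev/stderr"; exit 1 }
            next
        }
        { print $1 "\t" $col }
    ' "$1"
}
```

For example, `stats_column "${PROJECT}_stats.tsv" p_dup` would list the duplicate fraction next to each row's first field, provided the file follows the assumed layout.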

Dependencies

The Quantify_Summarize handler depends on: