28 Nov 11:54

sof202

v1.3.0 Latest

Features

Added usage to main script that shows if called incorrectly
Added a supplementary script to recreate transition matrices without the main diagonal, which improves interpretability (#36)
Made a colour-blind-friendly (and more easily interpretable) colour scheme for heritability heatmaps (#38)
Added an option to customise the ldsc window size, which helps with model misspecification (#39)
Added a check to see if all states in smaller models are indeed in the 'optimum model' for further validation (#40)
Added a check that warns the user if the optimum model has a larger BIC than a smaller model (unlikely to occur in my experience)
Added another point to the redundant states criterion. States are expected to have a certain level of 'stability', which is now also measured by calculating the expected number of contiguous state assignments for each state. Low contiguity is associated with instability (#41)
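
As a rough illustration of the contiguity measure described in the last point, the sketch below computes the mean length of uninterrupted runs for each state along a sequence of bins. This is only one plausible reading of 'expected number of contiguous state assignments'. The function name and example data are hypothetical and are not taken from ChromOptimise.

```python
from collections import defaultdict
from itertools import groupby


def mean_run_lengths(assignments):
    """Return the mean length of uninterrupted runs for each state."""
    runs = defaultdict(list)
    # groupby yields one group per maximal run of identical consecutive values.
    for state, group in groupby(assignments):
        runs[state].append(sum(1 for _ in group))
    return {state: sum(lengths) / len(lengths) for state, lengths in runs.items()}


# Hypothetical state assignments for consecutive genomic bins.
example = [1, 1, 1, 2, 1, 1, 3, 3, 3, 3, 2]
result = mean_run_lengths(example)
```

Under this reading, a state whose mean run length is close to 1 (state 2 in the example) rarely persists across neighbouring bins and would be a candidate for the 'unstable' label.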

Build

Revamp the way software is used in ChromOptimise
- Conda is now used to obtain R, Java openjdk and bedtools (#44)
- renv is now used to obtain the necessary R packages/libraries (#46)

Bug fixes

Now selects a single state assignments file if multiple cell types were used in training
Fixed various X11 issues on remote servers
Warning message for heritability plot is now written to the correct output directory
Force use of Cairo for image generation
Correctly output the number of states in optimum model for the rest of the pipeline to use in the case where the largest model is considered optimal.
Add execution permissions to all Rscripts to avoid file permission errors with SLURM on some systems

Refactors

Use [[""]] notation for extraction in Rscripts
Use seq_along and seq_len over 1:nrow() where possible
Favour the usage of dplyr package where possible (also results in performance increases in some areas)
Removal of setwd()
Use of file.path for more portable file paths (though it is still expected that scripts are run on Linux systems)
Prefer the usage of .tsv files over .csv files.
Remove filenames from scripts in favour of basename $0, which reduces the risk of misleading or incorrect file names (#42)

Documentation

Added redirection to wiki page to give suggestions on max wall times
Updated software requirements and dependencies (for individual scripts)
Fixed some broken mathematical equations (not rendering correctly)
Added explanation behind colour scheme seen in heritability heatmap plots (#38)
Removed duplication of information in preambles (usage has same info)
Fix the ordering of sidebars to reflect importance and ordering of pipeline scripts

08 Aug 15:44

sof202

v1.2.1

Features

The heatmap for partitioned heritability will now convert all negative enrichments into NA values
- Such enrichments are coloured pink in the heatmap as before; however, you can now identify which enrichments are nonsensical at a quick glance
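
The masking step can be sketched as follows. This is a minimal Python illustration rather than the plotting code used in the pipeline; NaN stands in for R's NA, and the function name and example values are hypothetical.

```python
import math


def mask_negative_enrichments(enrichments):
    """Replace negative enrichment values with NaN, leaving others unchanged."""
    return [
        [math.nan if value < 0 else value for value in row]
        for row in enrichments
    ]


# Hypothetical enrichment values: rows are annotations, columns are traits.
example = [[1.8, -0.4], [0.0, 2.3]]
masked = mask_negative_enrichments(example)
```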

08 Aug 15:12

sof202

v1.2.0

Features

Heatmap for partitioned heritability enrichment is now less crowded and instead shows ** or * for significance
Allow user to binarize bed and bam files (not just bam)
Added ability for user to specify the maximum number of iterations ChromHMM should use whilst learning models
Add an install packages script [for R]
Added warning message if the model with the optimum number of states has a higher BIC than the next (less complex) model
The model with the optimum number of states will now be moved to the optimum states directory (so one only has to keep a single file)
Added a plot for isolation scores which can be used to help users decide what the threshold should be in Config.R
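
The BIC warning listed above can be sketched as below, assuming the standard definition BIC = k ln(n) - 2 ln(L). How ChromOptimise counts parameters and observations is not stated in these notes, so the function names, arguments and message text here are hypothetical.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Standard Bayesian information criterion (lower is better)."""
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood


def bic_warning(bic_by_states, optimum_states):
    """Return a warning string if the optimum model has a higher BIC than
    the next smaller model, otherwise None."""
    smaller = [states for states in bic_by_states if states < optimum_states]
    if not smaller:
        return None  # No smaller model to compare against.
    next_smaller = max(smaller)
    if bic_by_states[optimum_states] > bic_by_states[next_smaller]:
        return (
            f"Optimum model ({optimum_states} states) has a higher BIC than "
            f"the {next_smaller}-state model."
        )
    return None
```

For example, `bic_warning({2: 100.0, 3: 105.0}, 3)` returns a warning string, while `bic_warning({2: 100.0, 3: 95.0}, 3)` returns `None`.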

Refactors

Overhauled build process (config files)
Removed subsampling [and earlier] scripts (not really required and added a lot of complexity)
Removed preambles from top of shell scripts and updated usage to reflect this.
Removed 'checkpoint script' (no longer required due to subsampling stage being removed)
Output of RedundantStateChecker.R is more human readable now

Bug fixes

Fixed BIC and Likelihood plots not being created (bad file formatting)

Docs

Made necessary changes to reflect refactors and features
Removed config files from website (now outlined in README)

05 Apr 16:01

sof202

v1.1.0

New features

Uses baseline categories recommended by the creators of ldsc. This should lead to more accurate enrichment values and also better p-values.
- baseline categories add a lot of noise to the enrichment heatmaps, so separate heatmaps are made solely for the annotations that the user can control (state assignments etc.)
Bedtools is now used to find the windows that a SNP falls into. This is technically faster, but the gain is largely cancelled out by the excessive I/O usage. Speed-ups will be significant if a high number of states/marks are used in analysis.
Epigenetic marks (from ChromHMM binary files) are now factored into the annot files and therefore partitioned heritability
Due to the high number of traits and annotations being considered, two metrics have been added to p-value bar plots:
- Bonferroni significance (this doesn't take into account factors out of your control, mainly the number of annotations in the baseline annot files)
- FDR correction (uses the BH method)
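
For reference, the two corrections can be sketched in a few lines. This is a generic implementation of the Bonferroni threshold and the Benjamini-Hochberg adjustment, not the code used by the pipeline; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four annotations.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

An annotation would then be flagged as FDR-significant when its adjusted p-value falls below the chosen level, and as Bonferroni-significant when its raw p-value falls below `bonferroni_threshold(alpha, n_tests)`.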

Other changes

Split script 7 into two scripts (one for reference LDSCores, the other for heritability)
- This was to avoid memory issues that came from including the baseline annotations
Removed the --gwas (-g) flag from the ldsc scripts, as it caused lots of problems

Bug fixes

Fix multiple bugs that could cause partitioned heritability to break (python related and glob related bugs)
Updated the logic

18 Mar 12:06

sof202

v1.0.0

Initial release of ChromOptimise

Pipeline tested (possibly more bugs to iron out).

Main features

Processing pipeline from bam files to binary files
Generating ChromHMM models of varying sizes
Optimising the number of states under statistical measures
Ability to view biological relevance via LDSC
