Releases: sof202/ChromOptimise
v1.3.0
Features
- Added a usage message to the main script that is shown if it is called incorrectly
- Added a supplementary script to recreate transition matrices without the main diagonal, increasing interpretability (#36)
- Made a colour blind friendly (and more easily interpretable) colour scheme for heritability heatmaps (#38)
- Added an option to customise the ldsc window size, which helps with model misspecification (#39)
- Added a check that all states in smaller models are present in the 'optimum model', for further validation (#40)
- Added a check that warns the user if the optimum model has a larger BIC than a smaller model (unlikely to occur in my experience)
- Added another component to the redundant states criterion. States are expected to have a certain level of 'stability', which is now also measured by calculating the expected number of contiguous state assignments for each state; low contiguity is associated with instability (#41)
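Both the diagonal-free transition matrices (#36) and the contiguity measure (#41) can be derived from a model's transition matrix or its state assignments. The pure-Python sketch below is illustrative only and rests on assumptions: transition matrices are row-stochastic lists of lists, and "expected number of contiguous assignments" is read either as the expected run length 1 / (1 - p_ii) of a first-order Markov chain or as the observed mean run length in a genome-ordered assignment list. All function names are hypothetical; ChromOptimise's own Rscripts may compute these quantities differently.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List

Matrix = List[List[float]]  # assumed row-stochastic: each row sums to 1


def drop_main_diagonal(transitions: Matrix) -> Matrix:
    """Zero the self-transition probabilities and renormalise each row (hypothetical helper)."""
    result: Matrix = []
    for i, row in enumerate(transitions):
        off_diagonal = [0.0 if j == i else p for j, p in enumerate(row)]
        total = sum(off_diagonal)
        # A state that never leaves itself has no off-diagonal mass to share out.
        result.append([p / total if total > 0 else 0.0 for p in off_diagonal])
    return result


def expected_run_lengths(transitions: Matrix) -> List[float]:
    """Expected number of consecutive bins assigned to each state.

    In a first-order Markov chain the run length of state i is geometric with
    continuation probability p_ii, so its expected value is 1 / (1 - p_ii).
    """
    lengths: List[float] = []
    for i, row in enumerate(transitions):
        stay = row[i]
        lengths.append(float("inf") if stay >= 1.0 else 1.0 / (1.0 - stay))
    return lengths


def observed_mean_run_lengths(assignments: List[str]) -> Dict[str, float]:
    """Mean run length of each state, assuming one contiguous, genome-ordered sequence."""
    runs: Dict[str, List[int]] = defaultdict(list)
    for state, group in groupby(assignments):
        runs[state].append(sum(1 for _ in group))
    return {state: sum(lengths) / len(lengths) for state, lengths in runs.items()}
```

Under this reading, a state with a self-transition probability of 0.9 is expected to persist for about 10 bins, while one with 0.5 persists for about 2, so the second would be flagged as less stable. Removing the diagonal makes the (usually much smaller) transitions between different states visible on the same colour scale.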
Build
- Revamp the way software is used in ChromOptimise
Bug fixes
- Now selects a single state assignments file if multiple cell types were used in training
- Various X11 issues on remote servers
- Warning message for heritability plot is written to correct output directory
- Force use of Cairo for image generation
- Correctly output the number of states in optimum model for the rest of the pipeline to use in the case where the largest model is considered optimal.
- Add execution permissions to all Rscripts to avoid file permission errors with SLURM on some systems
Refactors
- Use `[[""]]` notation for extraction in Rscripts
- Use `seq_along` and `seq_len` over `1:nrow()` where possible
- Favour the `dplyr` package where possible (this also improves performance in some areas)
- Removal of `setwd()`
- Use `file.path` for more portable file paths (though scripts are still expected to be run on Linux systems)
- Prefer `.tsv` files over `.csv` files
- Remove hardcoded filenames from scripts in favour of `basename $0`, which avoids misleading, incorrect file names (#42)
Documentation
- Added a link to the wiki page that gives suggestions on maximum wall times
- Updated software requirements and dependencies (for individual scripts)
- Fixed some broken mathematical equations that were not rendering correctly
- Added explanation behind colour scheme seen in heritability heatmap plots (#38)
- Removed duplicated information from preambles (the usage message contains the same information)
- Fix the ordering of sidebars to reflect importance and ordering of pipeline scripts
v1.2.1
v1.2.0
Features
- Heatmap for partitioned heritability enrichment is now less crowded and instead shows ** or * for significance
- Allow user to binarize bed and bam files (not just bam)
- Added ability for user to specify the maximum number of iterations ChromHMM should use whilst learning models
- Add an install packages script [for R]
- Added a warning message if the model with the optimum number of states has a higher BIC than the next (less complex) model
- The model with the optimum number of states will now be moved to the optimum states directory (so one only has to keep a single file)
- Added a plot for isolation scores which can be used to help users decide what the threshold should be in `Config.R`
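The BIC warnings in these release notes compare the chosen optimum against smaller models. The sketch below shows that comparison using the standard definition BIC = k ln(n) - 2 ln(L), where lower is better. The free-parameter count for a binary ChromHMM-style model (emissions, transitions and initial probabilities) is an assumption made for illustration, and every function name here is hypothetical rather than taken from ChromOptimise.

```python
import math
from typing import Dict, List


def chromhmm_parameter_count(states: int, marks: int) -> int:
    """Assumed free-parameter count for a binary ChromHMM-style model.

    emissions: states * marks Bernoulli probabilities
    transitions: states * (states - 1) free entries (each row sums to 1)
    initial distribution: states - 1 free entries
    """
    return states * marks + states * (states - 1) + (states - 1)


def bic(log_likelihood: float, parameters: int, observations: int) -> float:
    """Bayesian information criterion, k ln(n) - 2 ln(L); lower is better."""
    return parameters * math.log(observations) - 2.0 * log_likelihood


def smaller_models_with_lower_bic(bics: Dict[int, float], optimum_states: int) -> List[int]:
    """Sizes of smaller models whose BIC beats the chosen optimum (warning candidates)."""
    optimum_bic = bics[optimum_states]
    return sorted(
        states
        for states, value in bics.items()
        if states < optimum_states and value < optimum_bic
    )
```

For example, with BIC values `{4: 1050.0, 6: 1010.0, 8: 1020.0}` and an 8-state optimum, `smaller_models_with_lower_bic` returns `[6]`, which is the situation such a warning is meant to surface. An empty list means no smaller model has a lower BIC.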
Refactors
- Overhauled build process (config files)
- Removed subsampling [and earlier] scripts (not really required and added a lot of complexity)
- Removed preambles from top of shell scripts and updated usage to reflect this.
- Removed 'checkpoint script' (no longer required due to subsampling stage being removed)
- Output of RedundantStateChecker.R is more human readable now
Bug fixes
- Fixed BIC and Likelihood plots not being created (bad file formatting)
Docs
- Made necessary changes to reflect refactors and features
- Removed config files from website (now outlined in README)
v1.1.0
New features
- Uses baseline categories recommended by the creators of ldsc. This should lead to more accurate enrichment values and also better p-values.
- Baseline categories add a lot of noise to the enrichment heatmaps, so separate heatmaps are made solely for the annotations that the user can control (state assignments etc.)
- Bedtools is now used to find the windows that a SNP falls into. This is technically faster, but the speed-up is largely cancelled out by the excessive I/O usage. Speed-ups will be significant if a high number of states/marks are used in the analysis.
- Epigenetic marks (from ChromHMM binary files) are now factored into the annot files and therefore partitioned heritability
- Due to the high number of traits and annotations being considered, two metrics have been added to p-value bar plots:
  - Bonferroni significance (this doesn't take into account factors outside your control, mainly the number of annotations in the baseline annot files)
  - FDR correction (uses the BH method)
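The two significance metrics on the p-value bar plots can be illustrated with a short pure-Python sketch: a Bonferroni cut-off of alpha divided by the number of tests, and Benjamini-Hochberg (BH) adjusted p-values. The function names are hypothetical, and which tests ChromOptimise counts towards the Bonferroni denominator is not specified here beyond the note above.

```python
from typing import List


def bonferroni_threshold(alpha: float, tests: int) -> float:
    """P-value cut-off controlling the family-wise error rate at alpha."""
    return alpha / tests


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """BH-adjusted p-values (controlling the FDR), returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values `[0.01, 0.04, 0.03, 0.005]` this gives adjusted values of approximately `[0.02, 0.04, 0.04, 0.02]`, matching what R's `p.adjust(p, method = "BH")` produces. BH is less conservative than Bonferroni, which is why it remains informative when many annotations and traits are tested.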
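The pipeline uses bedtools to find which window each SNP falls into; these notes do not say which subcommand is used. As a rough stand-in for that lookup, the sketch below assigns SNP positions to windows with a binary search, assuming sorted, non-overlapping, BED-style half-open windows on a single chromosome (for example 200 bp ChromHMM bins). The function name is hypothetical and this is not how bedtools is implemented internally.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # half-open [start, end), 0-based, as in BED files


def assign_snps_to_windows(windows: List[Window], positions: List[int]) -> List[Optional[int]]:
    """Index of the window containing each SNP position, or None if it falls in a gap."""
    starts = [start for start, _ in windows]
    assignments: List[Optional[int]] = []
    for position in positions:
        # Rightmost window whose start is <= position.
        i = bisect_right(starts, position) - 1
        if i >= 0 and position < windows[i][1]:
            assignments.append(i)
        else:
            assignments.append(None)
    return assignments
```

With windows `[(0, 200), (200, 400), (600, 800)]`, positions `[0, 199, 200, 450, 799, 800]` map to `[0, 0, 1, None, 2, None]`: a SNP at the end coordinate of a window belongs to the next window (or to none), following the half-open BED convention.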
Other changes
- Split script 7 into two scripts (one for reference LD scores, the other for heritability)
  - This was to avoid memory issues that came from including the baseline annotations
- Removed the --gwas (-g) flag from the ldsc scripts, as it caused lots of problems
Bug fixes
- Fix multiple bugs that could cause partitioned heritability to break (Python-related and glob-related bugs)
- Updated the logic
v1.0.0
Initial release of ChromOptimise
Pipeline tested (possibly more bugs to iron out).
Main features
- Processing pipeline from bam files to binary files
- Generating ChromHMM models of varying sizes
- Optimising the number of states under statistical measures
- Ability to view biological relevance via LDSC