Releases: sof202/ChromOptimise

v1.3.0

28 Nov 11:54
3438644

Features

  • Added a usage message to the main script, shown when it is called incorrectly
  • Added a supplementary script to recreate transition matrices without the main diagonal, which makes them easier to interpret (#36)
  • Added a colour-blind-friendly (and more easily interpretable) colour scheme for heritability heatmaps (#38)
  • Added an option to customise the ldsc window size, which helps with model misspecification (#39)
  • Added a check to see if all states in smaller models are indeed in the 'optimum model' for further validation (#40)
  • Added a check that warns the user if the optimum model has a larger BIC than a smaller model (unlikely to occur in my experience)
  • Added another point to the redundant states criterion. States are expected to have a certain level of 'stability', which is now also measured by calculating the expected number of contiguous state assignments for each state; low contiguity is associated with instability (#41)
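
The contiguity measure in the last item can be illustrated with a minimal sketch. It assumes the measure is the empirical mean length of uninterrupted runs of each state along consecutive genomic bins; the function name and the example data are hypothetical, and the pipeline's own R implementation may compute the expectation differently (for example, from the model's transition probabilities).

```python
from collections import defaultdict
from itertools import groupby


def mean_run_lengths(assignments):
    """Return the mean length of contiguous runs for each state.

    `assignments` is a sequence of state labels, one per consecutive bin.
    This is an illustrative helper, not part of ChromOptimise.
    """
    runs = defaultdict(list)
    # groupby collapses consecutive equal labels into one group per run.
    for state, group in groupby(assignments):
        runs[state].append(sum(1 for _ in group))
    return {state: sum(lengths) / len(lengths) for state, lengths in runs.items()}


# Hypothetical state assignments for eleven consecutive bins.
example = [1, 1, 1, 2, 1, 1, 3, 3, 3, 3, 2]
print(mean_run_lengths(example))
```

In this toy example, state 2 only ever appears in runs of length one, so under the assumption above it would be the least stable state.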

Build

  • Revamp the way software is used in ChromOptimise
    • Conda is now used to obtain R, Java (OpenJDK) and bedtools (#44)
    • renv is now used to obtain the necessary R packages/libraries (#46)

Bug fixes

  • Now selects a single state assignments file if multiple cell types were used in training
  • Various X11 issues on remote servers
  • Warning message for heritability plot is written to correct output directory
  • Force use of Cairo for image generation
  • Correctly output the number of states in optimum model for the rest of the pipeline to use in the case where the largest model is considered optimal.
  • Add execution permissions to all Rscripts to avoid file permission errors with SLURM on some systems

Refactors

  • Use [[""]] notation for extraction in Rscripts
  • Use seq_along and seq_len over 1:nrow() where possible
  • Favour the usage of dplyr package where possible (also results in performance increases in some areas)
  • Removal of setwd()
  • Use of file.path for more portable file paths (though it is still expected that scripts are run on Linux systems)
  • Prefer the usage of .tsv files over .csv files.
  • Remove filenames from scripts in favour of basename $0; this avoids misleading or incorrect file names (#42)

Documentation

  • Added a redirection to the wiki page that gives suggestions on maximum wall times
  • Updated software requirements and dependencies (for individual scripts)
  • Fixed some broken mathematical equations (not rendering correctly)
  • Added explanation behind colour scheme seen in heritability heatmap plots (#38)
  • Removed duplication of information in preambles (usage has same info)
  • Fix the ordering of sidebars to reflect importance and ordering of pipeline scripts

v1.2.1

08 Aug 15:44
18611c5

Features

  • The heatmap for partitioned heritability will now convert all negative enrichments into NA values
    • Such enrichments are coloured pink in the heatmap as before; however, you can now identify nonsensical enrichments at a glance

v1.2.0

08 Aug 15:12
780bd0f

Features

  • Heatmap for partitioned heritability enrichment is now less crowded and instead shows ** or * for significance
  • Allow user to binarize bed and bam files (not just bam)
  • Added ability for user to specify the maximum number of iterations ChromHMM should use whilst learning models
  • Add an install packages script [for R]
  • Added a warning message if the model with the optimum number of states has a higher BIC than the next (less complex) model
  • The model with the optimum number of states will now be moved to the optimum states directory (so one only has to keep a single file)
  • Added a plot for isolation scores which can be used to help users decide what the threshold should be in Config.R
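
The BIC warning in the list above can be sketched as follows. This is a minimal illustration that assumes the standard definition BIC = k ln(n) - 2 ln(L); the function names and the BIC values are hypothetical, and how the pipeline counts parameters and observations is not specified here.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def smaller_models_with_lower_bic(bic_by_states, optimum_states):
    """Return the state counts of smaller models whose BIC beats the optimum's.

    `bic_by_states` maps number of states -> BIC (hypothetical input format).
    A non-empty result is the situation that would trigger a warning.
    """
    optimum_bic = bic_by_states[optimum_states]
    return sorted(
        n for n, value in bic_by_states.items()
        if n < optimum_states and value < optimum_bic
    )


# Hypothetical BIC values for models with 4, 5 and 6 states.
bics = {4: 1050.0, 5: 1010.0, 6: 1025.0}
offenders = smaller_models_with_lower_bic(bics, optimum_states=6)
if offenders:
    print(f"Warning: smaller models with lower BIC than the optimum: {offenders}")
```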

Refactors

  • Overhauled build process (config files)
  • Removed subsampling [and earlier] scripts (not really required and added a lot of complexity)
  • Removed preambles from top of shell scripts and updated usage to reflect this.
  • Removed 'checkpoint script' (no longer required due to subsampling stage being removed)
  • Output of RedundantStateChecker.R is more human readable now

Bug fixes

  • Fixed BIC and Likelihood plots not being created (bad file formatting)

Docs

  • Made necessary changes to reflect refactors and features
  • Removed config files from website (now outlined in README)

v1.1.0

05 Apr 16:01

New features

  • Uses baseline categories recommended by the creators of ldsc. This should lead to more accurate enrichment values and also better p-values.
    • Baseline categories add a lot of noise to the enrichment heatmaps, so separate heatmaps are made solely for the annotations that the user can control (state assignments etc.)
  • Bedtools is now used to find the windows that a SNP falls into. This is technically faster, but the speed gain is largely cancelled out by the excessive I/O usage. Speed-ups will be significant if a high number of states/marks are used in the analysis.
  • Epigenetic marks (from ChromHMM binary files) are now factored into the annot files and therefore partitioned heritability
  • Due to the high number of traits and annotations being considered, two metrics have been added to the p-value bar plots
    • Bonferroni significance (this doesn't take into account factors out of your control, mainly the number of annotations in the baseline annot files)
    • FDR correction (uses the BH method)
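
For readers unfamiliar with the two metrics above, here is a minimal sketch of a Bonferroni threshold and the Benjamini-Hochberg (BH) adjustment. The function names and p-values are hypothetical; the pipeline itself is written in R, where `p.adjust(p, method = "BH")` performs the same adjustment.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for four annotations.
p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni_threshold(0.05, len(p_values)))
print(benjamini_hochberg(p_values))
```

Bonferroni controls the family-wise error rate and becomes stricter as the number of annotations grows, whereas BH controls the false discovery rate and is usually less conservative.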

Other changes

  • Split script 7 into two scripts (one for reference LDSCores, the other for heritability)
    • This was to avoid memory issues that came from including the baseline annotations
  • Removed the --gwas (-g) flag from the ldsc scripts, as it caused lots of problems

Bug fixes

  • Fix multiple bugs that could cause partitioned heritability to break (python related and glob related bugs)
  • Updated the logic

v1.0.0

18 Mar 12:06
ec758be

Initial release of ChromOptimise

Pipeline tested (possibly more bugs to iron out).

Main features

  • Processing pipeline from bam files to binary files
  • Generating ChromHMM models of varying sizes
  • Optimising the number of states under statistical measures
  • Ability to view biological relevance via LDSC