Clodius crashes when converting bedgraph into hitile files #133

Open
MartinPippel opened this issue Nov 7, 2024 · 0 comments

Describe the bug
command:

sort -k1,1V -k2,2n -k3,3n  hifiasm-scaffolded-default_telomer.bedgraph | \
clodius aggregate bedgraph --no-header --chromsizes-filename test.sizes -o hifiasm-scaffolded-default_telomer.hitile -

crash report

Traceback (most recent call last):
  File "/opt/conda/bin/clodius", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/clodius/cli/aggregate.py", line 1408, in bedgraph
    _bedgraph(
  File "/opt/conda/lib/python3.12/site-packages/clodius/cli/aggregate.py", line 869, in _bedgraph
    d.attrs["chrom-names"] = chrom_order
    ~~~~~~~^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/conda/lib/python3.12/site-packages/h5py/_hl/attrs.py", line 104, in __setitem__
    self.create(name, data=value)
  File "/opt/conda/lib/python3.12/site-packages/h5py/_hl/attrs.py", line 202, in create
    attr = h5a.create(self._id, name, htype, space)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5a.pyx", line 50, in h5py.h5a.create
OSError: Unable to synchronously create attribute (object header message is too large)
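
For context on the error above: HDF5 keeps attributes in the object header, and a single header message is limited to roughly 64 KiB, so a long enough list of contig names cannot be written as one attribute. The snippet below is a rough pre-flight sketch, not part of clodius or h5py. The helper names (`estimate_attr_bytes`, `read_chrom_names`, `fits_in_attribute`) are hypothetical, and the size estimate assumes the names end up as a fixed-width byte-string array, which may not match how clodius actually stores `chrom_order`.

```python
# Hypothetical pre-flight check; none of these helpers exist in clodius or h5py.
# Assumption: the names are stored as a fixed-width byte-string array, so the
# payload is roughly (number of names) x (length of the longest name).
HDF5_HEADER_MESSAGE_LIMIT = 64 * 1024  # bytes, approximate ceiling per attribute


def estimate_attr_bytes(names):
    """Estimate the attribute payload in bytes for a list of contig names."""
    if not names:
        return 0
    widest = max(len(name.encode("utf-8")) for name in names)
    return widest * len(names)


def read_chrom_names(chromsizes_path):
    """Read the first whitespace-separated column of a chromsizes file."""
    with open(chromsizes_path) as handle:
        return [line.split()[0] for line in handle if line.strip()]


def fits_in_attribute(names, limit=HDF5_HEADER_MESSAGE_LIMIT):
    """Return True if the estimated payload stays below the limit."""
    return estimate_attr_bytes(names) < limit


if __name__ == "__main__":
    # Synthetic contig names, only to illustrate the scale of the problem.
    names = [f"scaffold_{i}" for i in range(10_000)]
    print(estimate_attr_bytes(names), fits_in_attribute(names))
```

Running `read_chrom_names("test.sizes")` through `fits_in_attribute` before calling `clodius aggregate bedgraph` would give an early hint that the contig list is too long. If this reading is right, the usual HDF5 advice is to keep large metadata in a dataset rather than an attribute, but that would need a change inside clodius itself.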

To Reproduce
Steps to reproduce the behavior:
The crash above seems to happen when the number of contigs is very large (>>1000 contigs). With a reduced set of contigs a different error occurs, and it appears more or less at random:

Traceback (most recent call last):
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/bin/clodius", line 11, in <module>
    sys.exit(cli())
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/clodius/cli/aggregate.py", line 1400, in bedgraph
    zoom_step,
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/clodius/cli/aggregate.py", line 1023, in _bedgraph
    dsets[curr_zoom][curr_pos : curr_pos + chunk_size] = curr_chunk
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 980, in __setitem__
    mspace = h5s.create_simple(selection.expand_shape(mshape))
  File "/cfs/klemming/scratch/p/pippel/prog/conda_envs/clodius/lib/python3.7/site-packages/h5py/_hl/selections.py", line 264, in expand_shape
    raise TypeError("Can't broadcast %s -> %s" % (source_shape, self.array_shape))  # array shape
TypeError: Can't broadcast (9852211,) -> (8672655,)
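
One plausible reading of this second error (an assumption, not confirmed against the clodius source): `curr_pos + chunk_size` runs past the end of the dataset, the slice is silently clipped to the remaining 8672655 elements, and the full 9852211-element chunk no longer fits into it. The sketch below reproduces that arithmetic with plain Python slices. `selected_length` and `clip_chunk` are hypothetical helpers for illustration only.

```python
# Hypothetical helpers illustrating slice clipping; not taken from clodius.


def selected_length(dataset_len, start, chunk_size):
    """Number of elements that [start : start + chunk_size] selects in a
    1-D container of length dataset_len (slices clip at the end)."""
    return len(range(dataset_len)[start:start + chunk_size])


def clip_chunk(chunk, dataset_len, start):
    """Trim chunk so that it matches the clipped slice."""
    return chunk[:selected_length(dataset_len, start, len(chunk))]


if __name__ == "__main__":
    # Shapes taken from the traceback; the start offset of 0 is an assumption.
    chunk_size = 9852211
    remaining = 8672655
    fits = selected_length(remaining, 0, chunk_size)
    print("slice selects", fits, "elements; overrun is", chunk_size - fits)
```

If that is what happens, trimming the last chunk to the selected length (or sizing the dataset from the same chromsizes that the input was sorted against) would avoid the mismatch. Whether the input order or the chromsizes file causes the overrun here is not clear from the traceback alone.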

Solution

MartinPippel added a commit to MartinPippel/Earth-Biogenome-Project-pilot that referenced this issue Nov 7, 2024
mahesh-panchal added a commit that referenced this issue Nov 25, 2024
* Add Pixi dev env

* Update/install YAHS and BWAMEM2 modules

* Add Scaffold skeleton

* Add scaffold subworkflow to main.nf

* add pairtools and pairix modules for HiC pipeline

* add more modules for scaffolding workflow

* add .DS_Store to .gitignore

* add pairtools to scaffolding subworkflow

* add initial yahs scaffolding subworkflow

* fix file names and process argument

* add version file collection

* remove hard-coded ncbi.taxdb path from test profile

* fix typo ch_versions

* increase all cpu and memory default values for scaffolding subworkflow

* increase time limits for scaffolding processes

* add publishDir for yahs process

* fix stage name for yahs publishDir

* publish hic alignment statistics and hard code the threads for compression and decompression tasks

* bump up pairtools version to 1.1.0

* add tiny read set: PacBio HiFi and HiC, species: drosophila, 2Mb

* Add testing (#126)

* Initialise nf-test

* Remove defunct tests

* Add test data provenance README

* Update docker settings

* Docker as default profile

* Add Tiny CI test yaml

* Update test config

* Update pairtools patch

* Update gitignore

* Comment out cpus

* Add Resume to test config

* Fix intermittent null object issue with LazyMap

* Move around resource configurations

* Improve process tags

* Clean up resource allocation

* Add config README

* Fix pairtools parse cpu allocation

* Update tests/data/tiny/README.md

---------

Co-authored-by: Martin Pippel <martin.pippel@gmail.com>

* Fix Scaffolding to work in Multiple HiC

* Assembly report (#127)

* Update pairtools patch

* Move and clean up resource configurations

* Sketch out prototype

* Move Groovy things inside workflow

* Add mamba for conda development when container doesn't exist

* Add debug profile

* Add GenomeScope2 plots

* Update assets/notebooks/assembly_report.qmd

Co-authored-by: Martin Pippel <martin.pippel@gmail.com>

* Add Pixi tasks to toml

* Add container to Quarto notebook

* Update report qmd

* Add dev task

* Add pandas to quarto env

* Change debug flag name to diagnostics

* Update report

* Transpose DToL table

* Add more tables and plots

* Add KatComp

* Add HiFiasm Kmer graph

* Add Quast

* Add Merqury{,FK} figures and tables

* Add Purgedups plot

* Add Pairtools plots

---------

Co-authored-by: Martin Pippel <martin.pippel@gmail.com>

* Remove custom purging cutoff

* copy scaffolding subworkflow to new curation subworkflow

* add initial curation subworkflow

* cleanup view calls and add tool versions of curation subworkflow

* bugfix renamed process name aliases: SAMTOOLS_MERGE_HIC and SAMTOOLS_MERGE_HIFI

* scaffolding: add dedup stats file to output directory

* write final higlass and pretext results to output dir

* switch from combine to join channels

* cleanup code

* add default runtime limits: cpus,memory,time for some processes

* cite sanger and arima

* Update configs/modules.config

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>

* Update configs/modules.config

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>

* remove hardcoded ncbi.taxdb path

* add curly brackets

* add resourceLimits to test.config and cleanup

* Update modules/local/hic_curation/bam2bed_sort.nf

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>

* Update modules/local/hic_curation/bam2coverageTracks.nf

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>

* cite Sanger and DAMARVEL repo for BAM2BED_SORT module

* replace container = <CONT> with container <CONT>

* Update modules/local/hic_curation/create_telomer_hitile_track.nf

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>

* Update modules/local/hic_curation/pretext_tracks_ingestion.nf

Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>

* add comment to transpose(by:1)

* add comments about joining

* merge conflict

* bugfix add missing mapQV value to two_read_combiner

* bugfix - groupTuples on bam files

* Add test resource limits (#132)

* Update test profile limits

* Add resource limits to Gitpod

* small bugfixes

* bugfix convert string to integer

* increase memory for FASTK

* bugfix FASTK resources: issue compressed reads are written out uncompressed to disk

* reduce cpus for meryl count process

* reduce memory limit for bwa index process

* fix read variable and reduce memory in meryl count module

* fix MemoryUnit multiplication

* fix MemoryUnit multiplication - again

* create bigwig files instead of hitile file - solve issue #133

* cleanup

* add Mahesh's Quarto fix from #aa25974

* add exception for FCSGX resource to all profiles

* bugfix: add commas to configs

* bugfix missing closing bracket

* another bugfix trial

* increase memory limit for MINIMAP2_ALIGN_ASSEMBLY_PRIMARY

* bugfix FCSGX_RUNGX runtime limits

* skip hitile creation, use bigwig instead

* cleanup uppmax.config, force FCSGX_RUNGX jobs to run on node partition on rackham

* Follow dardel config update from nf-core/configs#803

* Move task.cpus from config to process

* Remove extra resourceLimits

* Move params.hifi_coverage_cap to process input

* Move params.hic_map_sort_by to process input:

* Fix report

* Fix decontaminate subworkflow

* cleanup, remove -5 bwa default argument, remove samtools fixmate step, add ugly grep hack to avoid samtools header issue

* bugfix args3

---------

Co-authored-by: MartinPippel <martin.pippel@nbis.se>
Co-authored-by: Martin Pippel <martin.pippel@gmail.com>