Re-implement spooker in python #71
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
+ Coverage   73.79%   75.16%   +1.37%
==========================================
  Files          22       25       +3
  Lines        1595     1784     +189
==========================================
+ Hits         1177     1341     +164
- Misses        418      443      +25
…to spooker-cli
no separate files for tree, jobby, etc
fbaec15 to 3698bca
Just use tree to determine the number of files and the directory size instead of du.
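A minimal sketch of that idea, assuming the tree was captured with sizes included (for example `tree -J -s`) and has the usual shape of a list of nodes plus a trailing `report` entry. The function name `summarize_tree` and the sample data are hypothetical, not taken from this PR.

```python
import json


def summarize_tree(tree_json):
    """Return (n_files, total_bytes) from parsed `tree -J` output.

    Assumes each file node carries a "size" key, which is only
    present when tree was asked to report sizes.
    """
    n_files = 0
    total_bytes = 0
    # Skip the trailing summary entry; walk everything else.
    stack = [node for node in tree_json if node.get("type") != "report"]
    while stack:
        node = stack.pop()
        if node.get("type") == "file":
            n_files += 1
            total_bytes += node.get("size", 0)
        stack.extend(node.get("contents", []))
    return n_files, total_bytes


# Hypothetical sample in the assumed shape.
sample_tree = json.loads("""
[
  {"type": "directory", "name": "out", "contents": [
    {"type": "file", "name": "a.txt", "size": 10},
    {"type": "directory", "name": "logs", "contents": [
      {"type": "file", "name": "b.log", "size": 32}
    ]}
  ]},
  {"type": "report", "directories": 2, "files": 2}
]
""")

n_files, total_bytes = summarize_tree(sample_tree)
```

Walking the nodes rather than trusting only the `report` entry keeps the count usable even if the report fields differ between tree versions.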
For all master and Slurm jobs that did not complete, get the log .out/.err files and include their text in the JSON.
Actually, if you get the Slurm job IDs with status != COMPLETED, we can get the paths of the .err or .out files from the tree itself; no need to glob.
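One way this lookup could work, sketched under an assumption: the log file names contain the Slurm job ID (e.g. `slurm-12345.out`). That naming, the helper names `iter_files` and `find_job_logs`, and the sample tree are all hypothetical; Nextflow work directories use `.command.out`/`.command.err` and would need a different match.

```python
import posixpath
import re


def iter_files(nodes, parent=""):
    """Yield the full path of every file node in parsed `tree -J` output."""
    for node in nodes:
        node_type = node.get("type")
        if node_type == "report":
            continue
        path = posixpath.join(parent, node["name"]) if parent else node["name"]
        if node_type == "directory":
            yield from iter_files(node.get("contents", []), path)
        elif node_type == "file":
            yield path


def find_job_logs(tree_json, job_states):
    """Map each non-COMPLETED job ID to its .out/.err paths found in the tree."""
    failed = {jid for jid, state in job_states.items() if state != "COMPLETED"}
    logs = {jid: {} for jid in failed}
    for path in iter_files(tree_json):
        base = posixpath.basename(path)
        ext = posixpath.splitext(base)[1]
        if ext not in (".out", ".err"):
            continue
        for jid in failed:
            # Match the ID as a whole number, so 12345 does not hit 112345.
            if re.search(rf"(?<!\d){re.escape(jid)}(?!\d)", base):
                logs[jid][ext.lstrip(".")] = path
    return logs


# Hypothetical inputs in the assumed shapes.
sample_tree = [
    {"type": "directory", "name": "/data/out", "contents": [
        {"type": "directory", "name": "logs", "contents": [
            {"type": "file", "name": "slurm-12345.out"},
            {"type": "file", "name": "slurm-12345.err"},
            {"type": "file", "name": "slurm-12346.out"},
            {"type": "file", "name": "slurm-112345.out"},
        ]},
    ]},
    {"type": "report", "directories": 2, "files": 4},
]
job_states = {"12345": "FAILED", "12346": "COMPLETED"}
job_logs = find_job_logs(sample_tree, job_states)
```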
@kelly-sovacool The output should be a nested JSON file:
{
"pipeline_metadata": {
"pipeline_name": "XYZ (parsed as input)",
"pipeline_path": "/path/to/pipeline (how are we getting this?)",
"pipeline_outdir": "/path/to/output (parsed as input)",
"pipeline_outdir_size": "(from tree JSON; look for type:report)",
"pipeline_version": "1.0.0 (parsed as input)",
"user": "user_name (from os.environ['USER'])",
"groups": "group1 group2 (from `groups` command)",
"date": "2025-05-12T15:37:48 (ISO 8601 format)",
"nsamples": "(lookup via pipeline_regex.JSON and apply regex)"
},
"jobby": {
"example_key": "example_value (output from `jobby --json`)"
},
"outdir_tree": {
"example_tree": "output of `tree -J` on the output directory"
},
"master_job_log": {
"txt": ""
},
"failed_jobs": {
"12345": {
"logfilepath": "/path/to/logfile (derived from tree)",
"logfiletxt": "Content of log file here",
"errfilepath": "/path/to/errfile (derived from tree)",
"errfiletxt": "Content of error file here"
}
}
}
We can then gzip this JSON and move it to the user-staging folder under
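A rough sketch of assembling part of the metadata and writing the gzipped JSON, using only the standard library. The function names are hypothetical, the groups are passed in as a string rather than read from the `groups` command, and the destination path is left to the caller.

```python
import gzip
import json
import os
from datetime import datetime


def build_metadata(pipeline_name, pipeline_outdir, pipeline_version, groups=""):
    """Collect a subset of the pipeline_metadata fields described above."""
    return {
        "pipeline_name": pipeline_name,
        "pipeline_outdir": pipeline_outdir,
        "pipeline_version": pipeline_version,
        # Falls back to "unknown" when USER is not set in the environment.
        "user": os.environ.get("USER", "unknown"),
        "groups": groups,
        # ISO 8601 timestamp, second resolution.
        "date": datetime.now().isoformat(timespec="seconds"),
    }


def write_gzipped_json(record, path):
    """Serialize record as JSON and gzip-compress it to path."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)


def read_gzipped_json(path):
    """Load a record previously written by write_gzipped_json."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Usage would look like `write_gzipped_json({"pipeline_metadata": build_metadata("XYZ", "/path/to/output", "1.0.0"), "failed_jobs": {}}, "spooker.json.gz")`, where the output file name is only an example.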
@kelly-sovacool add |
…to spooker-cli
The current version is working. I need to test it on a run that had some jobs fail, and also on Nextflow.
Example output JSON (with tree, jobby, and master job log omitted for brevity):
{
"outdir_tree": "...",
"pipeline_metadata": {
"pipeline_name": "RENEE",
"pipeline_path": "/data/CCBR_Pipeliner/Pipelines/RENEE/renee-dev-sovacool",
"pipeline_outdir": "/data/sovacoolkl/renee_test_hg38_48",
"pipeline_outdir_size": 2840578746,
"pipeline_version": "v2.6.7-dev",
"ccbrpipeliner_module": "unknown",
"user": "sovacoolkl",
"uid": "60731",
"groups": "CCBR CCBR_Pipeliner SCLCgenomics Ziegelbauer_lab sovacoolkl NCI-workbench-users SCLC_scRNA",
"date": "2025-05-14T17-52-37",
"nsamples": 4
},
"jobby": "...",
"master_job_log": {
"txt": "..."
},
"failed_jobs": {}
}
For CHAMPAGNE, I added the job name, state, and exit code to the failed jobs dict for better context:
{
"outdir_tree": "...",
"pipeline_metadata": {
"pipeline_name": "CHAMPAGNE",
"pipeline_path": "/data/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool",
"pipeline_outdir": "/data/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool",
"pipeline_outdir_size": 370213210424,
"pipeline_version": "v0.4.1-dev",
"ccbrpipeliner_module": "unknown",
"user": "sovacoolkl",
"uid": "60731",
"groups": "CCBR CCBR_Pipeliner SCLCgenomics Ziegelbauer_lab sovacoolkl NCI-workbench-users SCLC_scRNA",
"date": "2025-05-15T09-33-30",
"nsamples": 0
},
"jobby": "...",
"master_job_log": {
"txt": "..."
},
"failed_jobs": {
"57140712": {
"JobName": "nf-CHIPSEQ_QC_PRESEQ_(SPT5_T0_1)",
"JobState": "FAILED",
"ExitCode": 11,
"log_out_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/0f/2dec15cccbbe8eb2982f720e37cda5/.command.out",
"log_out_txt": "",
"log_err_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/0f/2dec15cccbbe8eb2982f720e37cda5/.command.err",
"log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\n/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/0f/2dec15cccbbe8eb2982f720e37cda5/.command.sh: line 3: 3058827 Segmentation fault (core dumped) preseq lc_extrap -B -D -o SPT5_T0_1.lc_extrap.txt SPT5_T0_1.filtered.bam -seed 12345 -v -l 100000000000 2> SPT5_T0_1.preseq.log\n"
},
"57140974": {
"JobName": "nf-CHIPSEQ_QC_PRESEQ_(SPT5_T0_2)",
"JobState": "FAILED",
"ExitCode": 11,
"log_out_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/3a/2913ebefce8ee7fe2c953d7c8b4f3e/.command.out",
"log_out_txt": "",
"log_err_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/3a/2913ebefce8ee7fe2c953d7c8b4f3e/.command.err",
"log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\n/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/3a/2913ebefce8ee7fe2c953d7c8b4f3e/.command.sh: line 3: 1580954 Segmentation fault (core dumped) preseq lc_extrap -B -D -o SPT5_T0_2.lc_extrap.txt SPT5_T0_2.filtered.bam -seed 12345 -v -l 100000000000 2> SPT5_T0_2.preseq.log\n"
},
"57141181": {
"JobName": "nf-CHIPSEQ_QC_PRESEQ_(SPT5_INPUT)",
"JobState": "FAILED",
"ExitCode": 1,
"log_out_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/f7/824c11f58831bea0eea62183897593/.command.out",
"log_out_txt": "",
"log_err_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/f7/824c11f58831bea0eea62183897593/.command.err",
"log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\n"
},
"57142656": {
"JobName": "nf-CHIPSEQ_PHANTOM_PEAKS_(SPT5_INPUT)",
"JobState": "FAILED",
"ExitCode": 1,
"log_out_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/8f/e3f6d397128b65cb34864742b24737/.command.out",
"log_out_txt": "################\nChIP data: SPT5_INPUT.filtered.dedup.sort.f66.bam \nControl data: NA \nstrandshift(min): -500 \nstrandshift(step): 5 \nstrandshift(max) 1500 \nuser-defined peak shift NA \nexclusion(min): 10 \nexclusion(max): NaN \nnum parallel nodes: NA \nFDR threshold: 0.01 \nNumPeaks Threshold: NA \nOutput Directory: . \nnarrowPeak output file name: NA \nregionPeak output file name: NA \nRdata filename: NA \nplot pdf filename: SPT5_INPUT.ppqt.pdf \nresult filename: SPT5_INPUT.spp.out \nOverwrite files?: FALSE\n\n[1] TRUE\nReading ChIP tagAlign/BAM file SPT5_INPUT.filtered.dedup.sort.f66.bam \nopened /tmp/RtmpCMj6B7/SPT5_INPUT.filtered.dedup.sort.f66.tagAlign1b57d369129cea\ndone. read 0 fragments\n",
"log_err_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/8f/e3f6d397128b65cb34864742b24737/.command.err",
"log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\nLoading required package: Rcpp\nError in read.table(bam2align.filename, nrows = 500) : \n no lines available in input\nCalls: read.align -> read.table\nExecution halted\n"
},
"57143272": {
"JobName": "nf-CHIPSEQ_CALL_PEAKS_GEM_(SPT5_T0_1)",
"JobState": "FAILED",
"ExitCode": 1,
"log_out_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/46/de0f89421c920f106e09570fbe4b65/.command.out",
"log_out_txt": "\nGEM (version 3.4)!\n\nPlease cite: \nYuchun Guo, Shaun Mahony, David K. Gifford (2012) PLoS Computational Biology 8(8): e1002638. \nHigh Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. \ndoi:10.1371/journal.pcbi.1002638\n\nGifford Laboratory at MIT (http://cgs.csail.mit.edu/gem/).\n\n----------------------------------\n\nStart time: 2025/05/14 18:16:14\n\nLoading data...\n Loading reads from: SPT5_T0_1.filtered.dedup.sort.bam ... Loaded\n Loading reads from: SPT5_INPUT.filtered.dedup.sort.bam ... Loaded\n",
"log_err_path": "/gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CHAMPAGNE/champagne-dev-sovacool/work/46/de0f89421c920f106e09570fbe4b65/.command.err",
"log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\nException in thread \"main\" java.lang.NullPointerException\n\tat edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.populateArrays(AlignmentFileReader.java:260)\n\tat edu.mit.csail.cgs.deepseq.utilities.SAMReader.countReads(SAMReader.java:86)\n\tat edu.mit.csail.cgs.deepseq.utilities.AlignmentFileReader.getTotalHits(AlignmentFileReader.java:310)\n\tat edu.mit.csail.cgs.deepseq.utilities.FileReadLoader.<init>(FileReadLoader.java:147)\n\tat edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:84)\n\tat edu.mit.csail.cgs.deepseq.DeepSeqExpt.<init>(DeepSeqExpt.java:78)\n\tat edu.mit.csail.cgs.deepseq.discovery.GEM.<init>(GEM.java:127)\n\tat edu.mit.csail.cgs.deepseq.discovery.GEM.main(GEM.java:343)\n"
}
}
}
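To illustrate how a consumer might use the `failed_jobs` dict shown above, here is a small sketch that pulls out each job's name, exit code, and last non-empty stderr line. The function name `summarize_failed_jobs` is hypothetical, and the sample record is abridged from the CHAMPAGNE output above.

```python
def summarize_failed_jobs(record):
    """Return one summary row per failed job, sorted by job ID."""
    rows = []
    for job_id, job in sorted(record.get("failed_jobs", {}).items()):
        # Keep only non-blank stderr lines; the last one is often the most telling.
        err_lines = [ln for ln in job.get("log_err_txt", "").splitlines() if ln.strip()]
        rows.append({
            "job_id": job_id,
            "name": job.get("JobName", ""),
            "exit_code": job.get("ExitCode"),
            "last_err": err_lines[-1] if err_lines else "",
        })
    return rows


# Abridged from the example output above.
sample_record = {
    "failed_jobs": {
        "57142656": {
            "JobName": "nf-CHIPSEQ_PHANTOM_PEAKS_(SPT5_INPUT)",
            "JobState": "FAILED",
            "ExitCode": 1,
            "log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\n"
                           "Loading required package: Rcpp\n"
                           "Execution halted\n",
        },
        "57141181": {
            "JobName": "nf-CHIPSEQ_QC_PRESEQ_(SPT5_INPUT)",
            "JobState": "FAILED",
            "ExitCode": 1,
            "log_err_txt": "WARNING: Not virtualizing pid namespace by configuration\n",
        },
    }
}

rows = summarize_failed_jobs(sample_record)
```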
…to spooker-cli
Changes
--outerr and --include-completed
Issues
PR Checklist
(Strikethrough any points that are not applicable.)
Update CHANGELOG.md with a short description of any user-facing changes and reference the PR number. Guidelines: https://keepachangelog.com/en/1.1.0/