Segmentation fault in sambamba depth base #522

Open
emollier opened this issue Feb 7, 2025 · 0 comments

emollier commented Feb 7, 2025

Greetings,

I think I hit a difficult issue with sambamba, and I wondered whether it would be of interest, as it is not a hard blocker. When in doubt, report it, so here it is.

Describe the bug

I observe occasional segmentation faults when running sambamba depth base with many threads:

$ sambamba depth base --min-coverage=0 reads_sort.bam -L target.bed --nthreads=12 2>/dev/null
REF     POS     COV     A       C       G       T       DEL     REFSKIP SAMPLE
ref     6       1       0       0       0       1       0       0       *
ref     7       1       0       0       0       1       0       0       *
ref     8       3       3       0       0       0       0       0       *
ref     9       3       0       0       3       0       0       0       *
[…]
ref2    32      3       3       0       0       0       0       0       *
ref2    33      3       0       3       0       0       0       0       *
ref2    34      2       0       0       0       2       0       0       *
ref2    35      1       1       0       0       0       0       0       *
Segmentation fault

The backtrace doesn't look very usable, as this is D code, but I include it for reference, especially as I had difficulty capturing it:

#0  0x00007ffff7a9abe0 in object.ModuleInfo.tlsctor() const () from /lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.106
#1  0x00007ffff7aa97d1 in ?? () from /lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.106
#2  0x00007ffff7aaa759 in rt.sections_elf_shared.DSO.opApply(scope int(ref rt.sections_elf_shared.DSO) delegate) () from /lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.106
#3  0x00007ffff7a93320 in thread_entryPoint () from /lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.106
#4  0x00007ffff773d083 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#5  0x00007ffff77bb7b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
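Since the crash is intermittent, capturing a backtrace like the one above usually means re-running under a debugger until a run fails. The following is only a sketch of one way to do that, not necessarily how this backtrace was obtained: `capture_segv_backtrace` is a hypothetical helper name, and it assumes `gdb` is installed and prints `SIGSEGV` in its output when the program crashes.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command under gdb until one run dies with
# SIGSEGV, keeping the gdb log of that run.
# Usage: capture_segv_backtrace MAX_RUNS LOGFILE COMMAND [ARGS...]
capture_segv_backtrace() {
    max_runs=$1
    logfile=$2
    shift 2
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        # -batch: quit once the -ex commands are done.
        # --args: everything after it is the program to debug and its arguments.
        gdb -batch -ex run -ex "thread apply all bt" --args "$@" >"$logfile" 2>&1
        if grep -q "SIGSEGV" "$logfile"; then
            return 0    # crash captured; the backtrace is in $logfile
        fi
        i=$((i + 1))
    done
    return 1            # no crash within max_runs attempts
}
```

With the command line from this report, a call could look like `capture_segv_backtrace 100 bt.log sambamba depth base --min-coverage=0 reads_sort.bam -L target.bed --nthreads=12`. Runs that stop for any other reason are simply retried, because only a log mentioning `SIGSEGV` ends the loop.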

I'm unsure how often the problem occurs. I started with about 1 failure per 5 or 6 runs, but further tests showed far fewer occurrences. I never encountered the problem on single-threaded runs, which suggests a race condition.
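To put rough numbers on the failure rate and on the single-threaded versus multi-threaded difference, the runs can be counted per thread count. This is a minimal sketch under assumptions: `segv_by_threads` is a hypothetical helper name, it appends `--nthreads=N` to whatever command it is given, and it relies on the shell convention of reporting exit status 128 + signal number (139 for SIGSEGV on Linux).

```shell
#!/bin/sh
# Hypothetical helper: count SIGSEGV deaths for each thread count.
# Usage: segv_by_threads RUNS "THREAD_COUNTS" COMMAND [ARGS...]
# "--nthreads=N" is appended to the command for each N in THREAD_COUNTS.
segv_by_threads() {
    runs=$1
    thread_counts=$2
    shift 2
    for n in $thread_counts; do
        crashes=0
        i=0
        while [ "$i" -lt "$runs" ]; do
            # Discard the program's output; only the exit status matters.
            "$@" --nthreads="$n" >/dev/null 2>&1
            # 128 + 11 (SIGSEGV on Linux) = 139.
            if [ "$?" -eq 139 ]; then
                crashes=$((crashes + 1))
            fi
            i=$((i + 1))
        done
        echo "nthreads=$n crashes=$crashes/$runs"
    done
}
```

For example, `segv_by_threads 100 "1 2 4 12" sambamba depth base --min-coverage=0 reads_sort.bam -L target.bed` prints one line per thread count. A zero count for `nthreads=1` alongside nonzero counts elsewhere would be consistent with a race, though a small number of runs cannot rule out a rare single-threaded failure.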

Software versions were:

  • sambamba 1.0.1
  • ldc 1.40.0

To Reproduce

Steps to reproduce the behavior:

  1. Run sambamba depth base at least a dozen times with a high thread count:
$ sambamba depth base --min-coverage=0 reads_sort.bam -L target.bed --nthreads=12 2>/dev/null
  2. See the Segmentation fault in some of the outputs.
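The two steps above can be automated with a small loop that stops at the first crash. This is a sketch only: `run_until_segfault` is a hypothetical helper name, and it assumes the shell reports a SIGSEGV death as exit status 139 (128 + 11 on Linux).

```shell
#!/bin/sh
# Hypothetical helper: repeat a command until it dies with SIGSEGV.
# Prints the 1-based index of the crashing run and returns 0,
# or returns 1 if no crash happened within MAX_RUNS attempts.
# Usage: run_until_segfault MAX_RUNS COMMAND [ARGS...]
run_until_segfault() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Discard the program's output; only the exit status matters.
        "$@" >/dev/null 2>&1
        if [ "$?" -eq 139 ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

Usage with the reproducer command: `run_until_segfault 100 sambamba depth base --min-coverage=0 reads_sort.bam -L target.bed --nthreads=12`.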

Expected behavior

I would expect the program to always exit successfully once it has output its results.

Additional context

The test data was initially preprocessed using routines from the autopkgtest of the Debian package nanosv. The raw data is the file toy.sam from the samtools examples:

@SQ	SN:ref	LN:45
@SQ	SN:ref2	LN:40
r001	163	ref	7	30	8M4I4M1D3M	=	37	39	TTAGATAAAGAGGATACTG	*	XX:B:S,12561,2,20,112
r002	0	ref	9	30	1S2I6M1P1I1P1I4M2I	*	0	0	AAAAGATAAGGGATAAA	*
r003	0	ref	9	30	5H6M	*	0	0	AGCTAA	*
r004	0	ref	16	30	6M14N1I5M	*	0	0	ATAGCTCTCAGC	*
r003	16	ref	29	30	6H5M	*	0	0	TAGGC	*
r001	83	ref	37	30	9M	=	7	-39	CAGCGCCAT	*
x1	0	ref2	1	30	20M	*	0	0	aggttttataaaacaaataa	????????????????????
x2	0	ref2	2	30	21M	*	0	0	ggttttataaaacaaataatt	?????????????????????
x3	0	ref2	6	30	9M4I13M	*	0	0	ttataaaacAAATaattaagtctaca	??????????????????????????
x4	0	ref2	10	30	25M	*	0	0	CaaaTaattaagtctacagagcaac	?????????????????????????
x5	0	ref2	12	30	24M	*	0	0	aaTaattaagtctacagagcaact	????????????????????????
x6	0	ref2	14	30	23M	*	0	0	Taattaagtctacagagcaacta	???????????????????????

toy.sam is then processed as follows to obtain the reads_sort.bam and target.bed files used by the reproducer:

cat read.sam | samtools view -Sb > reads.bam
samtools sort reads.bam > reads_sort.bam
samtools index reads_sort.bam reads_sort.bai
bedtools bamtobed -i reads_sort.bam > target.bed

The issue was initially described in detail in Debian bug #1095434. I'm unsure whether this problem stems from sambamba or from Debian's D compiler. In any case, I thought you might like to be aware of the issue.

For information,
Étienne.

@emollier emollier added the bug label Feb 7, 2025