You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am running into an error that I initially wanted some help in resolving, but now, I just want to notify you about its existence. The error involves an issue in CombineGVCFs that prevents processing Multi Nucleotide Polymorphisms.
I am running a small set of 10 samples on a slurm-managed cluster. Previously, eight of these samples (Drosophila simulans collected in 2023) ran successfully. When we added two more samples (D. simulans from 1984 and 1976), the workflow failed. These two samples have been previously mapped by bwa-mem for coverage calculations, so I had no reason to think the data are bad.... now I know that there might be an issue with that 1976 sample (e.g., two flies in the gDNA).
The workflow runs smoothly until the DB2vcf step, when GenotypeGVCFs is run, then it errors out. The log and error files are attached.
GATK throws exception:
"org.broadinstitute.hellbender.exceptions.GATKException: Error creating iterator over file gendb://results/Dsim_wRi_wMel_genome/genomics_db_import/DB_L0071"
"Caused by: java.io.IOException: GenomicsDB JNI Error: VariantQueryProcessorException : Unhandled overlapping variants at columns 127228851 and 127228852 for row 0"
I found the reference position corresponding to columns 127228851 and 127228852 by looking at the offset column of the results/Dsim_wRi_wMel_genome/data/genome/Dsim_wRi_wMel_genome.fna.fai file:
NW_025416844.1 38922 127196239 80 81
NW_025416845.1 31969 127235789 80 81
127228851-127196239=> at position 32612
So, the potential MNP is in the back ~1/6 of NW_025416844.1.... I made some subsetted vcfs of the variants on this scaffold. There were only ~100-500 per sample. See the attached files.
I don't see any variants at position 32612 (the max is 28557), but there are MNPs in the w501 sample. Position 9446 is a MNP in itself, with the call: GTTT,GTTTTT,<NON_REF>,G,GTTTTTTT. There are different SNP vs deletion calls between samples at position 9490.
Honestly, sample w501 looks problematic. Scaffold NW_025416844.1 is unplaced, but the rest of the genome looks enriched in "ExcessHet=3.0103" calls. I'm going to get rid of this sample for now - it might consist of two flies and I need to track this down.
I still think this error is of concern to you because someone will come across MNPs sometime, especially in polyploid systems. If I end up sequencing enough w501 flies to recover these MNPs among diploid individuals, it will become a problem for me again.
The text was updated successfully, but these errors were encountered:
I am running into an error that I initially wanted some help in resolving, but now, I just want to notify you about its existence. The error involves an issue in CombineGVCFs that prevents processing Multi Nucleotide Polymorphisms.
I am running a small set of 10 samples on a slurm-managed cluster. Previously, eight of these samples (Drosophila simulans collected in 2023) ran successfully. When we added two more samples (D. simulans from 1984 and 1976), the workflow failed. These two samples have been previously mapped by bwa-mem for coverage calculations, so I had no reason to think the data are bad.... now I know that there might be an issue with that 1976 sample (e.g., two flies in the gDNA).
The workflow runs smoothly until the DB2vcf step, when GenotypeGVCFs is run, then it errors out. The log and error files are attached.
GATK throws exception:
"org.broadinstitute.hellbender.exceptions.GATKException: Error creating iterator over file gendb://results/Dsim_wRi_wMel_genome/genomics_db_import/DB_L0071"
"Caused by: java.io.IOException: GenomicsDB JNI Error: VariantQueryProcessorException : Unhandled overlapping variants at columns 127228851 and 127228852 for row 0"
0071.txt
5019711.log
I did a bit of research on this error and discovered that:
"CombineGVCFs will not work if your input file contains MNPs which would result in this type of error"
I found the reference position corresponding to columns 127228851 and 127228852 by looking at the offset column of the results/Dsim_wRi_wMel_genome/data/genome/Dsim_wRi_wMel_genome.fna.fai file:
NW_025416844.1 38922 127196239 80 81
NW_025416845.1 31969 127235789 80 81
127228851-127196239=> at position 32612
So, the potential MNP is in the back ~1/6 of NW_025416844.1.... I made some subsetted vcfs of the variants on this scaffold. There were only ~100-500 per sample. See the attached files.
Dsimulans_Mer23_wRi_line8A.NW_025416844.1.partial.vcf.txt
Dsimulans_w501.NW_025416844.1.partial.vcf.txt
Dsimulans_Riv84_wRi-3.NW_025416844.1.partial.vcf.txt
I don't see any variants at position 32612 (the max is 28557), but there are MNPs in the w501 sample. Position 9446 is a MNP in itself, with the call: GTTT,GTTTTT,<NON_REF>,G,GTTTTTTT. There are different SNP vs deletion calls between samples at position 9490.
Honestly, sample w501 looks problematic. Scaffold NW_025416844.1 is unplaced, but the rest of the genome looks enriched in "ExcessHet=3.0103" calls. I'm going to get rid of this sample for now - it might consist of two flies and I need to track this down.
I still think this error is of concern to you because someone will come across MNPs sometime, especially in polyploid systems. If I end up sequencing enough w501 flies to recover these MNPs among diploid individuals, it will become a problem for me again.
The text was updated successfully, but these errors were encountered: