UnifiedGenotyper error: Somehow the requested coordinate is not covered by the read.

jklejnot (San Diego; Posts: 1; Member)

Dear GATK Team,

I am receiving the following error while running GATK 1.6. Unfortunately, for project consistency I cannot update to a more recent version of GATK, but I would at least like to understand the source of the error. I ran ValidateSamFile on the input BAM files and they appear to be OK.

Any insight or advice would be greatly appreciated:

`##### ERROR ------------------------------------------------------------------------------------------

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Too many deletions?
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:425)
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:374)
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:370)
    at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinates(ReadClipper.java:445)
    at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinatesRightTail(ReadClipper.java:176)
    at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:196)
    at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:212)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:235)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:164)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:302)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:115)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:63)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:248)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 1.6-22-g3ec78bd):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
ERROR
ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Too many deletions?
ERROR ------------------------------------------------------------------------------------------`

Abbreviated command line used:

GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -et NO_ET \
-R "Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/WholeGenomeFasta/genome.fa" \
-dcov 5000 -I "someFile.bam" --output_mode EMIT_ALL_SITES -gvcf -l OFF \
-stand_call_conf 1 -L chrIV:1-1531919

Answers

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi there,

    I think this type of error typically happens with soft-clipped reads, but it's difficult to pinpoint the error. You could try using -l DEBUG to get a sense of where the issue is occurring and try to exclude that region from your analysis.

    Depending on how far along you are in your project I wonder if you'd consider updating and re-calling the samples that have already been processed, perhaps?

    Geraldine Van der Auwera, PhD

  • davidbryantlowry (University of Texas at Austin; Posts: 1; Member)

    I am getting a very similar error running the UnifiedGenotyper for a second time after recalibration with GATK version 2.7-2. The UnifiedGenotyper worked fine before recalibration, but here is an example of the error I get after recalibration.

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 2957177 | 8M1D87M55S
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:574)
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:438)
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:429)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinates(ReadClipper.java:529)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinatesRightTail(ReadClipper.java:196)
        at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeGeneralReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:259)
        at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeDiploidReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:215)
        at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:150)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:331)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:232)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 2957177 | 8M1D87M55S
    ERROR ------------------------------------------------------------------------------------------

    I also updated to version 2.8-1 and get the following new error: "ERROR MESSAGE: Null alleles are not supported." I see that this is also an issue that is being resolved.

    Do you have any advice on how to proceed?

    Thank you,

    David

  • kmhernan (Posts: 10; Member)

    Hello Geraldine et al... I am an old colleague of David's. He shared some of his problematic files and I figured out how to get past this problem. It seems to be caused by some strange CIGAR strings. If you use these filters:

    -rf DuplicateRead -rf FailsVendorQualityCheck -rf NotPrimaryAlignment \
    -rf BadMate -rf MappingQualityUnavailable -rf UnmappedRead -rf BadCigar

    it doesn't throw the exception. What is strange is that this wasn't thrown on the first pass of the UnifiedGenotyper. Also, you can't use the BadCigar filter with the MalformedRead filter? I received an exception about redundant options, namely --filter_mismatching_base_and_quals

    Best, Kyle

  • kmhernan (Posts: 10; Member)

    ****UPDATE**** I only tried this with a subset of the BAM files David was using. They did give this error, and using the above filters helped; however, when David tried running on all samples, the same error came up. I will see if I can figure out something before everyone returns from holiday. Kyle

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi Kyle,

    Can you clarify whether the error persists when you turn on the filters on the full set of BAMs? Can you post the command line and exact error message? And please try this with version 2.8 in case the issue is resolved by changes in the latest release.

    "What is strange is that this wasn't thrown on the first pass of the UnifiedGenotyper"

    This may be just random luck: in the first pass, the bad reads may have been downsampled out and therefore never encountered during analysis.

    "Also, you can't use the BadCigar filter with the MalformedRead filter? I received an exception about redundant options, namely --filter_mismatching_base_and_quals"

    There's a minor conflict between the filters due to how they are handled internally. I'll see if we can handle this more gracefully.

    Geraldine Van der Auwera, PhD

  • kmhernan (Posts: 10; Member)

    Hello Geraldine,

    So I have tried lots of things. I have run the samples individually and combined. When running individually, I do get this same seemingly random error, but I can simply run again and eventually get all of the samples to run (probably the random chance you mentioned with downsampling). When running multi-sample, I get errors on a subset of scaffold groups. I have tried running these failing scaffold groups again with no luck.

    I am now running using 2.8.

    Here is the command:

    java -Xms2G -Xmx24G -Djava.io.tmpdir=/scratch/01832/kmhernan/chk_david/temporary/contig-bv \
    -jar /home1/01832/kmhernan/bin/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar \
    -T UnifiedGenotyper -l DEBUG \
    -L /work/01832/kmhernan/fh-reseq-jobs/gatk/UGT-Second/intervals/contig-bv.intervals \
    -I files.list \
    -R /scratch/01832/kmhernan/References/Phallii/assembly_v_0_5/Panicum_hallii.main_genome.scaffolds.fasta \
    -rf DuplicateRead -rf FailsVendorQualityCheck -rf NotPrimaryAlignment \
    -rf BadMate -rf MappingQualityUnavailable -rf UnmappedRead -rf BadCigar \
    -glm BOTH -gt_mode DISCOVERY -nt 4 -nct 4 -maxAltAlleles 6 \
    -stand_call_conf 20 -stand_emit_conf 20 \
    -o /scratch/01832/kmhernan/chk_david/data/vcf_gatk/test_contig-bv.vcf \
    >& /scratch/01832/kmhernan/chk_david/logs/contig-bv.log

    Here is the error I see:

    `##### ERROR ------------------------------------------------------------------------------------------

    ERROR stack trace

    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Too many deletions?
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:489)
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:438)
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:429)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinates(ReadClipper.java:524)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinatesLeftTail(ReadClipper.java:179)
    DEBUG 11:55:48,447 SAMDataSource$SAMReaders - Processing file (27 of 41) /scratch/01832/kmhernan/chk_david/data/bam/PCR61_recal.bam... va:323)
        at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeDiploidReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:251)
        at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:149)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:331)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:232)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Too many deletions?
    ERROR ------------------------------------------------------------------------------------------`

    This is one of 110 scaffold groups I'm running... about 1/2 of them are fine and complete without error... I have also tried to look at potential problem regions in IGV but didn't see anything useful. I was able to run the same BAM files through mpileup with no errors.

    A different error from another group of scaffolds from the same batch job: `##### ERROR ------------------------------------------------------------------------------------------

    ERROR stack trace

    java.lang.NegativeArraySizeException
        at org.broadinstitute.sting.utils.clipping.ClippingOp.hardClip(ClippingOp.java:377)
        at org.broadinstitute.sting.utils.clipping.ClippingOp.apply(ClippingOp.java:122)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.clipRead(ReadClipper.java:156)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReadCoordinates(ReadClipper.java:213)
        at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReadCoordinates(ReadClipper.java:216)
        at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeGeneralReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:319)
        at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeDiploidReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:251)
        at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:149)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:331)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:232)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------`

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @kmhernan,

    Hmm, this looks like a bug that we might already have fixed internally in our development version. Could you please try running with the latest nightly build version (see downloads page) and confirm whether the error persists? If it does we'll need a snippet of your data to debug locally. Sorry for the inconvenience.

    Geraldine Van der Auwera, PhD

  • kmhernan (Posts: 10; Member)

    Hello @Geraldine_VdAuwera

    I just tried with this build: GenomeAnalysisTK-nightly-2014-01-06-g8829a0e

    I received the same errors. Shall I pull out the regions covered by my contig-bv.intervals file for all 41 samples? Let me know what would work best and how to upload it to your server.

    Kyle

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    The best thing to do is to isolate the issue to a small region and produce a snippet file with just that data, if you can. Full instructions for uploading are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • kmhernan (Posts: 10; Member)

    @Geraldine_VdAuwera

    I was having trouble reproducing the error, as it worked sometimes and would sometimes give me different errors. I found one small region that repeatedly gave me the same "Null alleles not allowed" exception and uploaded the snippet et al. to the public server. The tar file is kmhernan_null_allele_error.tar.gz

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Thanks for the bug report; I was able to reproduce your bug, and found that it occurs specifically when calling indels. I've turned this over to the devs to debug in more detail; we'll let you know in this thread when we know more about what's going on.

    Geraldine Van der Auwera, PhD

  • kmhernan (Posts: 10; Member)

    Thank you Geraldine!! I appreciate it.

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @kmhernan,

    The fix is in and will be available in the next nightly build. Let us know if this doesn't fix your problem.

    Geraldine Van der Auwera, PhD

  • kmhernan (Posts: 10; Member)

    Hi @Geraldine_VdAuwera

    Thank you.

    I tested with this version: nightly-2014-01-20-g48f90ff

    It seemed to work better, but still not 100 percent.

    Here are the two kinds of error messages I get:

    Somehow the requested coordinate is not covered by the read. Alignment 552532 | 34S49M1D2M2I46M1I16M

    and

    Reference coordinate corresponds to a non-existent base in the read. This should never happen -- check read with alignment start: 821052 and cigar: 11S89M7S

    The first one being much more common than the second.

    I can try to find a small region that will replicate the first error, but the error seems to go away when I subset, or to happen at random. Thoughts?

    Kyle

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    It's possible that the problem reads are getting downsampled out by chance when you subset if there's a lot of coverage at the sites where the problems occur. Try to find exactly which reads it's complaining about (e.g. by grepping for the alignment start and cigar in the error message) then subset a few of them to individual files. You can use PrintReads with the ReadName filter to isolate individual reads.

    Geraldine Van der Auwera, PhD
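The "grep for the alignment start and CIGAR" step suggested above can be sketched in a few lines of Python. This is only an illustration, not a GATK utility: the function names `parse_error` and `matching_reads` are hypothetical, the input is assumed to be SAM-formatted text (for example the output of `samtools view` on the region), and it assumes the number in the error message is the read's alignment start (SAM `POS`). The `window` parameter is there in case the reported coordinate turns out to be offset from `POS`.

```python
import re

# The two message shapes quoted in this thread.
PATTERNS = [
    re.compile(r"Alignment (\d+) \| (\S+)"),
    re.compile(r"alignment start: (\d+) and cigar: (\S+)"),
]


def parse_error(message):
    """Return (position, cigar) extracted from a GATK error message."""
    for pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return int(match.group(1)), match.group(2)
    raise ValueError("no alignment start / CIGAR found in message")


def matching_reads(sam_lines, pos, cigar, window=0):
    """Yield names of reads whose CIGAR matches and whose POS is within `window` of `pos`."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        read_pos, read_cigar = int(fields[3]), fields[5]
        if read_cigar == cigar and abs(read_pos - pos) <= window:
            yield fields[0]
```

The read names this yields could then be used to pull the individual reads into a small test file, as described in the reply above.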

  • ryanabashbash (Texas A&M University; Posts: 9; Member)

    Hi @Geraldine_VdAuwera and @kmhernan,

    I'm running into a similar problem, although I'm having a difficult time reproducing it (version 2.8-1-g932cd3a).

    ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 56831916 | 30S21M1D3M1D35M1D

    UG frequently chokes with this error when processing the whole genome, but similar to @kmhernan, I can't reproduce the error with a subset of the region. There is a high-depth region within 60bp of the problem coordinate, but the problem coordinate itself only has a depth of 6. The CIGAR string, 30S21M1D3M1D35M1D, is associated with reads from the high-depth region that is 60bp away. This is RAD-seq data so the alignment of nearly every read in the high-depth region has the same alignment start point, 56831976, and has the same CIGAR string, 30S21M1D3M1D35M1D.

    Lastly, UG didn't choke on this dataset prior to BQSR (4 successes out of 4 tries). After BQSR, UG threw this error on 4 out of 5 tries on the whole dataset and succeeded on 4 out of 4 tries on a subset surrounding the region.

    I just wanted to add my observations since I can't make a reproducible test case.
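To make the CIGAR strings in these messages easier to reason about, here is a minimal Python sketch that computes the reference span, the soft-clip-adjusted start, and the read length from a CIGAR and an alignment start. It follows the SAM convention that M, D, N, = and X consume the reference while M, I, S, = and X consume the read. The function name `summarize` and the returned keys are hypothetical. It also flags CIGARs that start or end with a deletion; I believe that is among the patterns the BadCigar filter mentioned earlier in the thread targets, but treat that as an assumption rather than a statement about GATK internals.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")   # operations that consume reference bases
READ_OPS = set("MIS=X")  # operations that consume read bases


def summarize(cigar, pos):
    """Summarize a CIGAR string for a read aligned at 1-based position `pos`."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError("malformed CIGAR: " + cigar)
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    ref_len = sum(n for n, op in ops if op in REF_OPS)
    read_len = sum(n for n, op in ops if op in READ_OPS)

    # Soft-clipped bases before the first aligned operation.
    lead_clip = 0
    for n, op in ops:
        if op == "H":
            continue
        if op != "S":
            break
        lead_clip += n

    core = [op for _, op in ops if op not in "SH"]  # operations excluding clips
    return {
        "ref_start": pos,
        "ref_end": pos + ref_len - 1,
        "unclipped_start": pos - lead_clip,
        "read_length": read_len,
        "starts_with_deletion": bool(core) and core[0] == "D",
        "ends_with_deletion": bool(core) and core[-1] == "D",
    }
```

Applied to the CIGAR reported above (`30S21M1D3M1D35M1D` at alignment start 56831976), this sketch reports a trailing deletion, which may be worth checking in the aligner output.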

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Unfortunately we can't fix it without a test case...

    @ryanabashbash, are you saying that it works fine when you run UG on this data without doing BQSR, or just that GATK tools before BQSR work fine, then UG chokes on it?

    Geraldine Van der Auwera, PhD

  • ryanabashbash (Texas A&M University; Posts: 9; Member)

    Hi @Geraldine_VdAuwera,

    I was referring to the former, that UG never issued any errors on the data prior to performing BQSR on them. That is, the non-recalibrated .bam file made it through the UG multiple times (just playing with UG parameters, etc.), whereas the recalibrated .bam file causes the UG to crash with the aforementioned error around 80% of the time (although not always in the same region). I'll report back if I can produce something more substantial than an anecdote.

  • mike_boursnell (Posts: 72; Member)

    Hi,

    I am using version 2.8-1-g932cd3a and I'm getting this ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 23882786 | 46S231M1I5M1D18M

    What is the latest update on this please?

    Thanks,

    Mike

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @mike_boursnell, we currently haven't received a test case that reproduces this bug. If you can produce one we'd be happy to take a look.

    Geraldine Van der Auwera, PhD

  • mike_boursnell (Posts: 72; Member)

    What would you need? A small portion of a BAM file that does it? I'll upload one.

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Yep, that'll do it, thanks. I believe you know the procedure...

    Geraldine Van der Auwera, PhD

  • Nicolas86 (New York City; Posts: 3; Member)

    Hi everybody,

    This comment is a bit out of the blue, but what about setting -nt and -nct to 1 instead of 4?

    All my jobs used to run well, and they started to crash randomly (at a rate of roughly 20-30%) after I increased -nt and -nct. I set them back to 1 and the issue doesn't seem to happen anymore. Also, my jobs never crashed at the same alignment.

    I'm not sure what to think: these parameters are not in the command line of the very first post, although I noticed Kyle's command line has them.

    This is obviously not a fix, and multithreading doesn't seem to be the underlying reason these jobs fail from time to time either, but the workaround works for me and allows me to "safely" run my jobs for now.

    I'm using GATK versions 2.3-9-gdcdccbb and 2.7-4-g6f46d11.

    Nicolas

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @Nicolas86,

    Some people definitely experience non-deterministic issues with the multithreading modes; the issues can be due to hardware or operating system quirks, so unfortunately we have no way to test for that systematically. Ultimately if running single-threaded works for you, go for it.

    Geraldine Van der Auwera, PhD

  • asteele2 (Posts: 1; Member)

    I have been encountering this error almost every time I use the UnifiedGenotyper (v2.8-1), and needless to say I have been monitoring this thread almost daily. If you are in need of additional test cases for debugging I would be more than happy to provide some.

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @asteele2,

    Let me check if we got mike's data; I'll let you know if additional data would help. FYI you can enable email notifications for new posts if you don't want to have to monitor the thread directly. There's a FAQ on how to do it here.

    Geraldine Van der Auwera, PhD

  • ryanabashbash (Texas A&M University; Posts: 9; Member)

    I take back my previous anecdote that it could be related to BQSR since I have non-BQSR'ed datasets that throw the error relatively frequently (although rarely at the same locus). Maybe it's associated with bams that have been merged? I previously merged bams for multiple samples with Picard and passed one large bam to the UG, and this frequently threw the error. Not merging the bams and simultaneously passing all of the individual bams to the UG has yet to throw the error.

    Looking at the reads from the loci that throw the error in the merged bam file doesn't show any obvious problems though. Is anyone else who is experiencing the error using bam files produced by merging 2 or more files?

  • mike_boursnell (Posts: 72; Member)

    Hi. I tried to make a smaller BAM file but it wouldn't reproduce the error. We get it on BAM files direct from bwa (mem)

    Is there anywhere I can put a 20GB BAM file for you to check?

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @mike_boursnell, unfortunately we can't deal with a 20Gb file right now.

    To be frank we are swamped with work for the immediate future and we can't make this heisenbug a priority. So unless one of you can provide us with an easily reproducible test case I'm afraid we're going to have to leave this in the fridge for now. Personally I'm hopeful that the new single-sample calling pipeline for joint analysis coming in version 3.0 is going to bypass the underlying issue.

    Geraldine Van der Auwera, PhD

  • mike_boursnell (Posts: 72; Member)

    Hi Geraldine - what's the largest file you can deal with? Small snippets don't seem to reproduce the error.

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    @mike_boursnell, does the issue only happen with UnifiedGenotyper run on multiple samples? Because if so, I'm pretty sure the devs are going to decline to even try to fix it, at this point. We are actively phasing out UG and multisample calling in favor of the new HC reference model (GVCF) pipeline, so to us there is little point in investing effort into fixing issues in older tools that are going to be deprecated in the near future. It's a question of priorities --and some pressure to deliver on the commitments we made to the mothership and our funding overlords. I'm sorry if this is inconvenient but I really encourage you to try out the new pipeline. It is truly a game changer.

    Geraldine Van der Auwera, PhD

  • mike_boursnell (Posts: 72; Member)

    OK. Thanks. Can you point me to the latest instructions for how to do this new method please?

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

  • KStamm (Posts: 26; Member)

    Another vote for UG failing on soft-clipped reads. (version nightly-2014-03-18-g8d1a043)

      ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 207602637 | 3S98M
    

    I'm doing multi-sample WGS from bwa-mem with -nct 4, so there are a dozen places the bug could come from. I've tried each suggestion in this thread and several different command lines, each one pushing the error out farther into the process. Now I can get to chr1:200MB before crashing out. At this point, with UG being deprecated, it sounds like the best course is variant calling with some other tool.

    I guess I'll start samtools mpileup, then start reading the HaplotypeCaller howto docs, and see which of us finishes first.

  • Geraldine_VdAuwera (Posts: 6,462; Administrator, GATK Developer)

    Hi @KStamm, definitely try the new HC GVCF pipeline (as documented here). It's really easy and works wonders.

    Geraldine Van der Auwera, PhD

  • bioSG (Posts: 18; Member)

    I'm using GATK UnifiedGenotyper for multiple Ion Torrent samples. It seems to be running very slowly even with 8 threads. I was getting a "Somehow the requested coordinate is not covered by the read." error on chromosome 1. I've updated my GATK version to the latest stable release (3.1.1) and now I'm getting this error on chromosome 2.

    To try to debug this I'm analyzing samples separately with UnifiedGenotyper. But even with one single sample I'm getting this kind of error.

    INFO 05:01:12,454 ProgressMeter - chr2:38903048 2.23e+07 13.7 h 36.8 m 11.4% 5.0 d 4.4 d
    INFO 05:01:23,947 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 38972119 | 70S7M1I14M1D5M1D118M
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:573)
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:429)
        at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(ReadUtils.java:425)
        at org.broadinstitute.sting.gatk.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:76)
        at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:200)
        at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:179)
        at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.annotate(RankSumTest.java:102)
        at org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:192)
        at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:557)
        at **...

    Thanks.
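
For readers trying to interpret the message above, here is a minimal sketch of how the reference span of that read can be worked out from its CIGAR string. It assumes the number printed after "Alignment" is the read's 1-based alignment start, which the thread does not confirm; `cigar_spans` is a hypothetical helper name, and the sets of reference- and read-consuming operations follow the SAM specification.

```python
import re

# CIGAR operations that consume reference bases and read bases (per the SAM spec).
REF_CONSUMING = set("MDN=X")
READ_CONSUMING = set("MIS=X")


def cigar_spans(cigar):
    """Return (reference_span, read_length) for a CIGAR string."""
    ref_span = 0
    read_len = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in REF_CONSUMING:
            ref_span += int(length)
        if op in READ_CONSUMING:
            read_len += int(length)
    return ref_span, read_len


# Assumption: 38972119 is the 1-based alignment start reported in the error.
start = 38972119
ref_span, read_len = cigar_spans("70S7M1I14M1D5M1D118M")
end = start + ref_span - 1  # last reference position covered by aligned bases

print(f"reference span: {ref_span} bp ({start}-{end}), read length: {read_len} bases")
```

Under that assumption, the aligned part of the read covers 146 reference bases, while 70 of its 215 bases are soft-clipped at the start. A requested coordinate outside that span, or one falling inside a deletion, may be the kind of position the exception is complaining about.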

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,462Administrator, GATK Developer admin

    @bioSG, if you can narrow down the issue to a small snippet of data and upload a bug report (see instructions in the FAQs n.1894) we can try to debug it, but I make no promises as this is not a priority area (sorry!).

    Geraldine Van der Auwera, PhD

  • KronholmKronholm EdinburghPosts: 1Member

    Hi,

    I thought that I'd post my own experience as well. I've had this same error occur when running UnifiedGenotyper (v3.1-1-g07a4bf8) on a large BAM file with multiple individuals merged. As noted above by Nicolas, in some cases this issue seems to be related to multithreading. I managed to work around this problem by disabling it (i.e. setting the -nt and -nct flags to 1).

    To speed things up, a colleague of mine suggested a kind of hack parallelization using GNU parallel. I use the -L flag in GATK to split the analysis by chromosome and then run these jobs on separate cores with GNU parallel. This produces no errors and is reasonably fast.
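
The per-chromosome approach described above can be sketched roughly as follows. This is not the poster's actual setup: it swaps GNU parallel for a Python thread pool, and the file names (`merged.bam`, `reference.fasta`, `GenomeAnalysisTK.jar`), the output naming scheme, and the contig list are placeholders. The GATK arguments used (`-T`, `-R`, `-I`, `-L`, `-nt`, `-nct`, `-o`) are the GATK 3 ones mentioned in this thread or standard for that version.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ug_command(contig, bam="merged.bam", ref="reference.fasta",
               jar="GenomeAnalysisTK.jar"):
    """Build one single-threaded UnifiedGenotyper command limited to one contig."""
    return [
        "java", "-jar", jar,
        "-T", "UnifiedGenotyper",
        "-R", ref,
        "-I", bam,
        "-L", contig,               # restrict this job to a single chromosome
        "-nt", "1", "-nct", "1",    # GATK multithreading disabled, as in the post
        "-o", f"calls.{contig}.vcf",
    ]


def run_all(contigs, jobs=4, dry_run=True):
    """Print the commands (dry run) or run up to `jobs` of them concurrently."""
    commands = [ug_command(contig) for contig in contigs]
    if dry_run:
        for cmd in commands:
            print(shlex.join(cmd))
        return commands
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Each worker thread just waits on one external java process.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands


# Placeholder contig names; a real run would list every contig in the reference.
run_all(["chr1", "chr2", "chr3"], dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before launching anything. The per-contig VCFs would still need to be merged afterwards.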

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,462Administrator, GATK Developer admin

    @Kronholm That is a good approach. You can also use Queue (the companion program to GATK) to scatter-gather UG jobs on a cluster easily and efficiently. I'm not familiar with GNU parallel so I can't comment on how they compare though.

    Geraldine Van der Auwera, PhD

  • Nicolas86Nicolas86 New York CityPosts: 3Member

    I also scatter-gather with GNU parallel and it works pretty well (Queue is not an option for me right now). The -L parameter is the way to go if you're only doing this for your GATK steps. If you're adding other tools to your pipeline that don't have such an option, you might need to literally split your file. There is a balance to find, since some steps require the whole BAM file and splitting/merging (or 'cating') also takes a bit of time.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,462Administrator, GATK Developer admin

    Yep, just keep in mind that even some of the GATK tools won't appreciate data splitting -- some for methodological reasons (e.g. BaseRecalibrator needs to see all the data to be properly empowered) and others for technical reasons (e.g. HaplotypeCaller may have edge issues if you split naively within contigs without interval padding).

    Geraldine Van der Auwera, PhD
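
To illustrate the interval-padding point, here is a small sketch of splitting one contig into chunks that each carry some flanking sequence. The chunk size, padding, and contig length are made-up numbers, and `padded_chunks` is a hypothetical helper, not part of GATK. The idea is that each job sees reads slightly beyond its core region, and only calls inside the core region are kept when results are merged.

```python
def padded_chunks(contig, contig_length, chunk_size, padding):
    """Split a contig into core chunks plus padded versions (1-based, inclusive)."""
    chunks = []
    for core_start in range(1, contig_length + 1, chunk_size):
        core_end = min(core_start + chunk_size - 1, contig_length)
        # Extend each side by `padding`, clamped to the contig boundaries.
        pad_start = max(1, core_start - padding)
        pad_end = min(contig_length, core_end + padding)
        chunks.append({
            "core": f"{contig}:{core_start}-{core_end}",
            "padded": f"{contig}:{pad_start}-{pad_end}",
        })
    return chunks


# Toy numbers only; real contigs and paddings would be far larger.
for chunk in padded_chunks("chr2", contig_length=1000, chunk_size=400, padding=50):
    print(f"run on {chunk['padded']}, keep calls in {chunk['core']}")
```

The padded interval string is what would be passed to -L, while the core interval defines which calls to retain so that overlapping regions are not reported twice.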

  • Nicolas86Nicolas86 New York CityPosts: 3Member

    Yes! I run BaseRecalibrator on the whole BAM, and when I split within chromosomes, I do it in the middle of gap regions. I also keep the smaller contigs whole; the amount of data there is not worth the job overhead on the cluster, at least not for my needs. Thanks for the advice!

  • sabrinaesabrinae USAPosts: 1Member

    Hi! I am having the same problem. Is there any solution for the problem "Somehow the requested coordinate is not covered by the read"?

    thanks

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,462Administrator, GATK Developer admin

    @sabrinae, are you running the latest version (3.1)?

    Geraldine Van der Auwera, PhD

  • cbackescbackes GermanyPosts: 1Member

    Hi, I am also having the same problem. I'm running the latest GATK version (3.1). I would also try switching to the new HC, but the problem is that I have haploid bacterial samples, and as I understand it, only UG can call SNPs in genomes of different ploidy. Is there an alternative for haploid genomes, or will you integrate this feature into HC in the near future? Thanks a lot.

  • SheilaSheila Broad InstitutePosts: 561Member, GATK Developer, Broadie, Moderator admin

    @cbackes

    Hi,

    We are working on enabling HaplotypeCaller to accept different ploidies. UnifiedGenotyper is your best bet for now.

    -Sheila
