Error with the HaplotypeCaller when genotyping given alleles

LaurentLaurent Posts: 38Member, GSA Collaborator
edited March 2013 in Ask the GATK team

Hi,

I have a number of indels that were called using different methods (GATK UG and PINDEL mainly). I am now attempting to genotype them using the GATK HaplotypeCaller and am running into the following error for some variants, e.g.:

1 768116 . AGTTTTGTTTTGTTTTGTTTT AGTTTTGTTTTGTTTTGTTTTGTTTT,A

The command line I use is:

java -Xmx2g -jar ~/tools/GenomeAnalysisTK-2.4-7-g5e89f01/GenomeAnalysisTK.jar  \
-T HaplotypeCaller \
--genotyping_mode GENOTYPE_GIVEN_ALLELES \
-R /target/gpfs2/gcc/resources/hg19/indices/human_g1k_v37.fa \
-minPruning 5 \
-I /target/gpfs2/gcc/groups/gonl/projects/trio-analysis/intermediate/BQSR2/A102.human_g1k_v37.trio_realigned.recal.bam \
-L 1:1-10000000 \
-L ~/jobs/trio-analysis/ug/comp/hc/test.mu.vcf \
-o ~/jobs/trio-analysis/ug/comp/hc/test.out.vcf \
-isr INTERSECTION \
-alleles ~/results/trio-analysis/ug/gonl.merged.ug_pindel_asm_clever.left.biallelic.20bp.N2.vcf 

Any help on how to overcome this would be much appreciated!

Below is the stack trace:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalStateException: BUG: GenomeLoc 1:768116-768116 has a size == 1 but the variation reference allele has length 21 this = [VC UG_call @ 1:768116 Q136.86 of type=INDEL alleles=[AGTTTTGTTTTGTTTTGTTTT*, A, AGTTTTGTTTTGTTTTGTTTTGTTTT] attr={} GT=[[A102a AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 6 PL 116,6,0,137,21,881],[A102b AGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTT GQ 0 PL 0,0,0,3,3,165],[A102c AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 3 PL 45,3,0,60,15,885]]
at org.broadinstitute.variant.variantcontext.VariantContext.validateStop(VariantContext.java:1192)
at org.broadinstitute.variant.variantcontext.VariantContext.validate(VariantContext.java:1159)
at org.broadinstitute.variant.variantcontext.VariantContext.(VariantContext.java:333)
at org.broadinstitute.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:478)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:479)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:366)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:356)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.GenotypingEngine.assignGenotypeLikelihoods(GenotypingEngine.java:235)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:500)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:132)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:552)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:512)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:244)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:69)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.4-3-g2a7af43):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: BUG: GenomeLoc 1:768116-768116 has a size == 1 but the variation reference allele has length 21 this = [VC UG_call @ 1:768116 Q136.86 of type=INDEL alleles=[AGTTTTGTTTTGTTTTGTTTT*, A, AGTTTTGTTTTGTTTTGTTTTGTTTT] attr={} GT=[[A102a AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 6 PL 116,6,0,137,21,881],[A102b AGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTT GQ 0 PL 0,0,0,3,3,165],[A102c AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 3 PL 45,3,0,60,15,885]]
ERROR ------------------------------------------------------------------------------------------

Edit: Excluding all multi-allelic sites, it seems like I am not getting this type of errors any more. However I am now getting another error on a bi-allelic site. I haven't checked which site it is as the error doesn't specify it but could look into it if you think it is valuable.

Cheers,
Laurent

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalStateException: Bad likelihoods detected: 0.009656527453500985
at org.broadinstitute.sting.utils.pairhmm.PairHMM.computeReadLikelihoodGivenHaplotypeLog10(PairHMM.java:130)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:153)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.LikelihoodCalculationEngine.computeReadLikelihoods(LikelihoodCalculationEngine.java:118)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:493)
at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:132)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:552)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:512)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:244)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:69)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.4-7-g5e89f01):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Bad likelihoods detected: 0.009656527453500985
ERROR ------------------------------------------------------------------------------------------
Post edited by Laurent on
Tagged:

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,365Administrator, GATK Developer admin

    Hi Laurent,

    Could you upload a bam snippet so we can reproduce the error locally? Instructions here:
    http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • LaurentLaurent Posts: 38Member, GSA Collaborator

    Hi Geraldine,

    Thanks for the quick answer. I've uploaded the 2 following files:
    2013_03_08.BUG_GenomeLoc_HC.tar.gz
    2013_03_08.ERROR_BadLikelihoods_HC.tar.gz

    They correspond to the 2 errors posted in this thread. Note that I managed to reproduce the first one on a bi-allelic site after all, so I thought I'd post it too.

    The first file uses 1 bam file containing 1 trio, the second one 1 bam file containing 248 families (trios,quartets) as I could not isolate the problematic one so far but could attempt to if necessary.

    Another thing I should have mentioned earlier is the pipeline these bam files were generated with as it is fairly unusual:

    1. BWA aln [per lane]
    2. BWA sampe [per lane]
    3. Picard MarkDuplicates [per lane]
    4. GATK Indel realignment using known indels using 1KG pilot [per sample]
    5. GATK BQSR v1 [per sample]
    6. GATK Indel realignment using known indels from 1KG Phase1 + from the reads [per family]
    7. GATK BQSR v2 [per family]

    Maybe the pipeline above is to blame for the errors I am encountering.

    Thanks a lot!
    Laurent

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,365Administrator, GATK Developer admin

    Hi Laurent, thanks for the files. Others have encountered the "Bad likelihoods" issue so it's probably not related to your pipeline. Not sure about the other one -- we'll take a look. FYI we're focusing on the "bad likelihoods" bug first since it seems to affect more people. I'll let you know when we have a fix.

    Geraldine Van der Auwera, PhD

  • LaurentLaurent Posts: 38Member, GSA Collaborator

    Sounds great, thanks a lot!

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,365Administrator, GATK Developer admin

    Hi Laurent,

    FYI we have a theoretical fix for the "Bad likelihoods detected error" and are now working on implementing it.

    Geraldine Van der Auwera, PhD

  • LaurentLaurent Posts: 38Member, GSA Collaborator

    Great! Thanks for the update :)

  • LaurentLaurent Posts: 38Member, GSA Collaborator

    Regarding the BUG: GenomeLoc issue, I found out the problem: I had the "END" annotation with the wrong coordinates in my VCF for some positions. I am not sure where it happened in the pipeline but simply removing the annotation got rid of the problem. Maybe this will also be useful for others...

    Cheers,
    Laurent

  • bwubbbwubb Posts: 51Member

    Can you elaborate a bit more? You are saying that your end position number was incorrect?
    You had to manually change the position?

    I am running into a similar error when trying to merge GATK compatible (supposidly) vcf files. Thank you.

Sign In or Register to comment.