Holiday Notice:
The Frontline Support team will be offline February 18 for President's Day but will be back February 19th. Thank you for your patience as we get to all of your questions!

I do not seem to be able to run the HaplotypeCaller with the EMIT_ALL_CONFIDENT_SITES option.

I am using the HaplotypeCaller with GATK version 3.2-2. While it runs with the default EMIT_VARIANTS_ONLY option, it crashes when I try using the EMIT_ALL_CONFIDENT_SITES option. The stack trace is below.

Thanks

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 96
at org.broadinstitute.gatk.tools.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:76)
at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:200)
at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:179)
at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.annotate(RankSumTest.java:102)
at org.broadinstitute.gatk.tools.walkers.annotator.interfaces.InfoFieldAnnotation.annotate(InfoFieldAnnotation.java:49)
at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContextForActiveRegion(VariantAnnotatorEngine.java:217)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:265)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:941)
at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:218)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:273)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
Tagged:

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there,

    Can you please post the command line you used? Also, have you validated your data using Picard's ValidateSAMFile?

  • kanchonkanchon Member

    Hi Geraldine,

    Thank you for your previous speedy response, and apologies for my delay in getting back to you.

    I am still having problems with using the HaplotypeCaller with GATK version 3.2-2. While it runs with the default EMIT_VARIANTS_ONLY option, it crashes when I try using the EMIT_ALL_CONFIDENT_SITES option. Using the UnifiedGenotyper with EMIT_ALL_CONFIDENT_SITES is also fine. Please note that this error is different from the one I previously experienced.

    Thanks,
    Kanchon

    I have validated the bam file with the following command and there are no errors

    java -jar -Xmx2g ~/applications/picard-tools-1.89/ValidateSamFile.jar INPUT=realigned09-273_Hel_hec_Hel_9_CGATGT.bam OUTPUT=realigned09-273_Hel_hec_Hel_9_CGATGT.bam.vaidatesamfile MAX_OUTPUT=100000 > realigned09-273_Hel_hec_Hel_9_CGATGT.bam.fixmate.bam.vaidatesamfile.log

    UnifiedGenotyper with EMIT_ALL_CONFIDENT_SITES is fine:
    java -Xmx4g -jar ~/applications/GenomeAnalysisTK-3.2-2.jar -T UnifiedGenotyper -nt 30 -R /data/genome_references/Hmel1-1_primaryScaffolds_mtDNA.fasta -I realigned09-273_Hel_hec_Hel_9_CGATGT.bam -hets 0.01 -out_mode EMIT_ALL_CONFIDENT_SITES -o 09-273_hecale_UG.vcf > 09-273_hecale_UG.log 2>&1

    HaplotypeCaller with EMIT_VARIANTS_ONLY is fine:
    java -Xmx4g -jar ~/applications/GenomeAnalysisTK-3.2-2.jar -T HaplotypeCaller -R /data/genome_references/Hmel1-1_primaryScaffolds_mtDNA.fasta -nct 40 -I realigned09-273_Hel_hec_Hel_9_CGATGT.bam -hets 0.01 -indelHeterozygosity 0.01 -out_mode EMIT_VARIANTS_ONLY -o realigned09-273_Hel_hec_Hel_9_CGATGT.bam_HC.vcf > realigned09-273_Hel_hec_Hel_9_CGATGT.bam_HC.log 2>&1

    HaplotypeCaller with EMIT_ALL_CONFIDENT_SITES fails (see stack trace below):
    java -Xmx4g -jar ~/applications/GenomeAnalysisTK-3.2-2.jar -T HaplotypeCaller -R /data/genome_references/Hmel1-1_primaryScaffolds_mtDNA.fasta -I 09-273_Hel_hec_Hel_9_CGATGT.sorted.rmdup.fixmate.bam -hets 0.01 -indelHeterozygosity 0.01 -out_mode EMIT_ALL_CONFIDENT_SITES -o 09-273_Hel_hec_Hel_9_CGATGT.sorted.rmdup.fixmate.bam_HC.vcf > 09-273_Hel_hec_Hel_9_CGATGT.sorted.rmdup.fixmate.bam_HC.log 2>&1

    ERROR stack trace

    org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Somehow the requested coordinate is not covered by the read. Alignment 8063 | 54M1I6M11I1M28H
    at org.broadinstitute.gatk.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:573)
    at org.broadinstitute.gatk.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:429)
    at org.broadinstitute.gatk.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(ReadUtils.java:425)
    at org.broadinstitute.gatk.tools.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:76)
    at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:200)
    at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:179)
    at org.broadinstitute.gatk.tools.walkers.annotator.RankSumTest.annotate(RankSumTest.java:102)
    at org.broadinstitute.gatk.tools.walkers.annotator.interfaces.InfoFieldAnnotation.annotate(InfoFieldAnnotation.java:49)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContextForActiveRegion(VariantAnnotatorEngine.java:217)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:265)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:941)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:218)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:273)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin
    edited September 2014

    @kanchon‌

    Hi Kanchon,

    Can you provide us with a snippet of the region that is causing the error? We would like to replicate it locally.

    Instructions on how to upload your files are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • kanchonkanchon Member

    I have uploaded the necessary file (kanchon.tar.gz). Thanks for your help.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @kanchon‌

    Hi Kanchon,

    I think this may just be an incompatibility of Haplotype Caller with EMIT_ALL_CONFIDENT_SITES and EMIT_ALL_SITES. I will have to confirm this with the developers.

    You can use either GVCF mode or BP_RESOLUTION mode to get all the sites. GVCF mode and BP_RESOLUTION mode are part of our reference confidence model which you can read about here: http://gatkforums.broadinstitute.org/discussion/4042/how-the-haplotypecallers-reference-confidence-model-works
    With these, you will be able to tell how confident the homozygous reference sites are by looking at the GQ and the PL fields.

    I hope this helps.

    -Sheila

  • kanchonkanchon Member

    Hi Sheila,

    Thank you for your response. I have now tried using GVCF mode with EMIT_ALL_CONFIDENT_SITES in the HaplotypeCaller and it looks like it works, which is great. However, the emitRefConfidence argument can only be used with single samples.

    My understanding is that the GATK recommendation is to run variant calling with multiple samples rather than on single samples. So my question now is whether the following is a valid procedure:
    1) run the HaplotypeCaller in EMIT_VARIANTS_ONLY mode with multiple samples.
    2) run the HaplotypeCaller with EMIT_ALL_CONFIDENT_SITES using the GVCF mode on each sample separately to obtain confidence values for the homozygous reference bases.
    3) combine the variant genotypes obtained with multiple samples from step 1 with the reference bases obtained from single samples in step 2.

    Thanks,
    Kanchon

Sign In or Register to comment.