Another GATK Unified Genotyper "Null alleles are not supported" error

Hi,
I saw a bug report in the forum related to the "Null alleles are not supported" error message, which was solved on Jan 2.
I just ran into this issue today, running GATK 2.8.1 on a 2x250bp read set. I then tried a more recent nightly build of GATK (version nightly-2014-01-13-ga785cf2), but the error persists. The error is as follows:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: Null alleles are not supported
at org.broadinstitute.variant.variantcontext.Allele.<init>(Allele.java:120)
at org.broadinstitute.sting.utils.haplotype.Haplotype.<init>(Haplotype.java:61)
at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.trimHaplotypes(PairHMMIndelErrorModel.java:235)
at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeGeneralReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:438)
at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeDiploidReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:251)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:149)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:331)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:232)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version nightly-2014-01-13-ga785cf2):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Null alleles are not supported
ERROR ------------------------------------------------------------------------------------------

I'm attaching a set of files to reproduce the error (command line, error log report and input files).
Any feedback would be appreciated.
Many thanks
Severine

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Severine,

    If I recall correctly we put in a fix for a second "Null alleles are not supported" bug on Jan 15, so you may have used a nightly build that was too old by a day or two. Can you please retry with the very latest and let me know if it still errors out?
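A rough way to check a nightly build against the date mentioned above is to read the date embedded in the build identifier. The following is only a sketch: it assumes the fix date is 2014-01-15 (the post gives "Jan 15" from memory), and the function names are made up for illustration. A nightly built on or after that date is likely, but not guaranteed, to contain the fix.

```python
import re
from datetime import date

# Assumption: the fix landed on 2014-01-15 (the post says "Jan 15", from memory).
ASSUMED_FIX_DATE = date(2014, 1, 15)


def nightly_build_date(version: str) -> date:
    """Extract the build date from a string such as 'nightly-2014-01-13-ga785cf2'."""
    match = re.search(r"nightly-(\d{4})-(\d{2})-(\d{2})", version)
    if match is None:
        raise ValueError(f"not a nightly build identifier: {version!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def may_contain_fix(version: str, fix_date: date = ASSUMED_FIX_DATE) -> bool:
    """Heuristic: True if the nightly was built on or after the assumed fix date."""
    return nightly_build_date(version) >= fix_date


for build in ("nightly-2014-01-13-ga785cf2", "nightly-2014-01-21-g48f90ff"):
    print(build, may_contain_fix(build))
```

With the two builds named in this thread, the first is older than the assumed fix date and the second is newer.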

  • severinec (Member)

    Hi Geraldine, thanks for your quick answer. OK, I will try it today and will let you know. Thank you.

  • severinec (Member)

    Hi Geraldine,
    I used the latest nightly build, GenomeAnalysisTK-nightly-2014-01-21-g48f90ff, and it ran to completion without the error. The content of the results is surprising, however, but that may well have a different root cause, which I will now investigate.
    many thanks
    Severine

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    OK, good to hear. Let me know if you have any reason to suspect that your "surprising results" are due to a program error. Good luck with your work!

  • severinec (Member)

    Hi Geraldine,
    I know it's been a few days, but I still wanted to give you feedback on the root cause of the "surprising" results. They were due to a cause independent of the GATK nightly build, so I can now report that the GATK nightly build (nightly-2014-01-21-g48f90ff) completed successfully with good results (after I fixed the independent issue).
    many thanks
    Severine

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Fantastic, I'm glad to hear that! Thanks for reporting back, much appreciated.

  • ahwanpandey (Member)

    Hi Geraldine and severinec.

    Is there any way to get the build that fixed the problem? I am still using GATK 2.8.1 and am not fully ready to migrate to GATK3. But I am having a hard time finding the fixed GATK2.8.1 version that severinec is talking about. Please help!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    @ahwanpandey, we don't keep the binaries of the nightly builds beyond two weeks or so, sorry. Perhaps @severinec can send you her copy.

  • ahwanpandey (Member)

    Thanks for the prompt response, @Geraldine_VdAuwera!

    Is it possible to get the source so I can build from it? @severinec, I would be really grateful if you could send me your binary (Dropbox or email, if possible?). My e-mail is [email protected]

    Thanks again.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    The development source corresponding to the nightly builds is not public, but if you're feeling adventurous you can check out the source for 3.0, identify the commit that fixed the issue in 2.8, and patch your local copy.
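One possible way to hunt for the relevant commit in a local clone of the public source is to search the commit messages. The sketch below only wraps `git log`; the clone directory name and the search pattern are assumptions, and there is no guarantee that the fixing commit mentions the error text in its message.

```python
import subprocess
from typing import List


def git_log_grep_command(repo_dir: str, pattern: str) -> List[str]:
    """Build a 'git log' command that lists commits whose message matches pattern."""
    # -C runs git inside repo_dir; -i makes the --grep match case-insensitive.
    return ["git", "-C", repo_dir, "log", "--oneline", "-i", f"--grep={pattern}"]


def find_candidate_commits(repo_dir: str, pattern: str) -> List[str]:
    """Run the search and return one 'hash subject' string per matching commit."""
    result = subprocess.run(
        git_log_grep_command(repo_dir, pattern),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Hypothetical usage, assuming a clone in ./gatk-protected:
# for line in find_candidate_commits("gatk-protected", "null allele"):
#     print(line)
```

Any commit found this way can be inspected with `git show <hash>` before deciding whether it applies cleanly to a 2.8 source tree.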

  • ahwanpandey (Member)

    Cloning and compiling this version fixed the issue: (https://github.com/broadgsa/gatk-protected/commits/master?page=10). Thanks for pointing us in the right direction.

  • severinec (Member)

    Hi,
    Sorry for the delay. I checked my files. I no longer have the nightly build.
    The fix is included in the latest release version, right?
    Severine

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    The latest release includes the fix, yes. The problem was that @ahwanpandey did not want to migrate to the latest version yet. But it sounds like they were able to patch their version with the fix code, so it's all good now.

  • ahwanpandey (Member)

    Hi @Geraldine_VdAuwera. I must have spoken too soon. I am getting a new error now. Is there a fix for this? I'm sure we will be migrating to GATK3 in the near future, but if we have no choice we might have to get to it now!

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 101677352 | 21S79M
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:582)
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:438)
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(ReadUtils.java:434)
    at org.broadinstitute.sting.gatk.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:76)
    at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:200)
    at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:179)
    at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.annotate(RankSumTest.java:102)
    at org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:192)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:560)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:234)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.8-79-ge2c2aa7):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 101677352 | 21S79M
    ERROR ------------------------------------------------------------------------------------------

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Ah, we've seen this happen with edge cases, but we're probably not going to do any more work on UnifiedGenotyper. So if this is a blocking error, I would really recommend you just migrate and switch to HaplotypeCaller. It'll be worth your while.
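For readers trying to interpret the message above: in the SAM format, soft-clipped bases (`S`) do not consume reference positions, so a read with CIGAR `21S79M` spans only 79 reference bases. The sketch below computes that span. It assumes the number in the message is the read's alignment start, and it does not reproduce GATK's own logic; the function names are made up for illustration.

```python
import re
from typing import Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operators that consume reference positions, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(alignment_start: int, cigar: str) -> Tuple[int, int]:
    """Return the 1-based inclusive reference interval spanned by the alignment."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return alignment_start, alignment_start + ref_len - 1


def covers(alignment_start: int, cigar: str, ref_pos: int) -> bool:
    """True if ref_pos falls inside the span; soft-clipped bases are not counted."""
    start, end = reference_span(alignment_start, cigar)
    return start <= ref_pos <= end


print(reference_span(101677352, "21S79M"))
print(covers(101677352, "21S79M", 101677351))  # position just before the alignment start
```

Under these assumptions, a locus that overlaps only the 21 soft-clipped bases lies outside the span, which is one situation where a coordinate lookup on the read could fail.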

  • ahwanpandey (Member)

    I have a quick question for you, @Geraldine_VdAuwera.

    Can I still use reduced bams with the new GATK3? I am actually running the new GATK3.1 on my cohort, which has all been reduced, and it doesn't seem to be complaining. I was under the impression that GATK3 does not support reduced bams? Or is it just that GATK3 does not reduce bams but is compatible with the reduced bams of 2.8?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Last I heard we were going to put in a failsafe to make GATK 3.x error out on any reduced bams, so it sounds like that's not working properly. I'll ask @ebanks to confirm. Can you tell me what command you're running on those bams? But in any case you should not use HC in GVCF mode on your reduced bams.

  • ahwanpandey (Member)

    The command I am using is:

    UnifiedGenotyper:

    java -Dhttp.proxyHost=proxy.swmed.edu -Dhttp.proxyPort=3128 $GATK_JAVA_OPTS -jar $GATK_HOME/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 8 -nct 3 -R ~/reference/Homo_sapiens/broadinstitute/2.8/b37/human_g1k_v37.fasta $SAMPLES -dcov 600 --dbsnp ~/reference/Homo_sapiens/broadinstitute/2.8/b37/dbsnp_138.b37.vcf -A VariantType -A Coverage -A AlleleBalance -A FisherStrand --genotype_likelihoods_model BOTH -stand_call_conf 30 -stand_emit_conf 10 --intervals ../regions.bed --interval_padding 100 -o CG.UNI.raw.snp_and_indels.vcf

    HaplotypeCaller:

    java -Dhttp.proxyHost=proxy.swmed.edu -Dhttp.proxyPort=3128 $GATK_JAVA_OPTS -jar $GATK_HOME/GenomeAnalysisTK.jar -T HaplotypeCaller -nct 4 -R ~/reference/Homo_sapiens/broadinstitute/2.8/b37/human_g1k_v37.fasta $SAMPLES -dcov 600 --dbsnp ~/reference/Homo_sapiens/broadinstitute/2.8/b37/dbsnp_138.b37.vcf -A VariantType -A Coverage -A AlleleBalance -A FisherStrand -stand_call_conf 30 -stand_emit_conf 10 -minPruning 3 --intervals ../regions.bed --interval_padding 100 -o CG.HC.raw.snp_and_indels.vcf

    The $SAMPLES have been set as:
    -I sample1.gatk_ready.realigned.recalced.bam -I sample2.gatk_ready.realigned.recalced.bam -I sample3.gatk_ready.realigned.recalced.bam .....

    And by the way, even GATK 3.1 (The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21) gave me this error when using UnifiedGenotyper on reduced bam files (the reduced files were created using 2.8):

    INFO 15:59:18,526 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 15:59:19,510 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.98
    INFO 15:59:25,364 ProgressMeter - 10:13563373 2.39e+08 75.6 m 18.0 s 55.8% 2.3 h 59.7 m

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 13517089 | 10S90M
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:573)
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:429)
    at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinateUpToEndOfRead(ReadUtils.java:425)
    at org.broadinstitute.sting.gatk.walkers.annotator.BaseQualityRankSumTest.getElementForRead(BaseQualityRankSumTest.java:76)
    at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.getElementForRead(RankSumTest.java:200)
    at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.fillQualsFromLikelihoodMap(RankSumTest.java:179)
    at org.broadinstitute.sting.gatk.walkers.annotator.RankSumTest.annotate(RankSumTest.java:102)
    at org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:192)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateGenotypes(UnifiedGenotyperEngine.java:557)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:234)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:367)
    at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:143)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 13517089 | 10S90M
    ERROR ------------------------------------------------------------------------------------------

    GATK2.8 with the patch errored out on:

    INFO 12:38:58,396 ProgressMeter - 15:101760922 3.52e+08 2.7 h 28.0 s 81.8% 3.4 h 36.5 m

  • ahwanpandey (Member)

    GATK3.1 is also letting me create gVCFs from reduced files:

    java -Dhttp.proxyHost=proxy.swmed.edu -Dhttp.proxyPort=3128 -Xmx14G -Djava.io.tmpdir=/scratch -jar $GATK_HOME/GenomeAnalysisTK.jar -T HaplotypeCaller -nct 4 -R ~/reference/Homo_sapiens/broadinstitute/2.8/b37/human_g1k_v37.fasta -I /share/solid0_seqcore/GATK_TRAINING_RESOURCE/Homo_sapiens/b37/variant_bams/1000G/reduced/"$i".gatk_ready.realigned.recalced.reduced.bam -dcov 600 --dbsnp ~/reference/Homo_sapiens/broadinstitute/2.8/b37/dbsnp_138.b37.vcf -A VariantType -A Coverage -A AlleleBalance -A FisherStrand -stand_call_conf 30 -stand_emit_conf 10 -minPruning 3 --intervals ~/reference/Homo_sapiens/broadinstitute/2.8/b37/beds/CCDS_Ensemble_GencodeV19_RefSeq_UCSC_Truseq_SureselectV4_Nextera.bed --interval_padding 10 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o "$i".raw.snp_and_indels.vcf

  • ebanks, Broad Institute (Member, Broadie, Dev)

    First off, please do not use those reduced bams anymore. You will get much better results running the new pipeline off of original bams.
    Second, do you know which version you used to create the reduced bams? The GATK should be checking for the @PG tag of ReduceReads in the header of the bam and failing if it sees it. Do your bams have such an entry? (You can check with 'samtools view -H my.bam')
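The manual header check suggested above can be scripted. This is a minimal sketch, not GATK's own check: it scans header text (for example the output of `samtools view -H my.bam`) for `@PG` lines that mention ReduceReads. The function names and the sample header are illustrative assumptions.

```python
import subprocess
from typing import List


def reduce_reads_pg_lines(header_text: str) -> List[str]:
    """Return the @PG header lines that mention ReduceReads."""
    return [
        line
        for line in header_text.splitlines()
        if line.startswith("@PG") and "ReduceReads" in line
    ]


def read_bam_header(bam_path: str) -> str:
    """Fetch the header with 'samtools view -H' (requires samtools on the PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Illustrative header text; a real header would come from read_bam_header("my.bam").
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@PG\tID:GATK ReduceReads\tVN:2.8-1-g932cd3a\n"
)
print(bool(reduce_reads_pg_lines(example_header)))
```

A non-empty result suggests the bam was produced by ReduceReads, unless the tool was run with `no_pg_tag=true`, in which case the header would not show it.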

  • ahwanpandey (Member)

    Thank you @ebanks for your reply. And by original bams, you mean the realigned recalibrated bams, right?

    Here is the header:

    @PG ID:GATK ReduceReads VN:2.8-1-g932cd3a CL:context_size=10 minimum_mapping_quality=20 minimum_base_quality_to_consider=15 minimum_tail_qualities=2 known_sites_for_polyploid_reduction=[] dont_simplify_reads=false dont_hardclip_adaptor_sequences=false dont_hardclip_low_qual_tails=false dont_use_softclipped_bases=false dont_compress_read_names=false hard_clip_to_interval=false minimum_alt_proportion_to_trigger_variant=0.05 minimum_alt_pvalue_to_trigger_variant=0.01 minimum_del_proportion_to_trigger_variant=0.05 downsample_coverage=250 cancer_mode=false nwayout=false debuglevel=0 debugread= downsample_strategy=Normal no_pg_tag=false

    It also works on reduced bams created with 2.7:

    @PG ID:GATK ReduceReads VN:2.7-2-g6bda569 CL:context_size=10 minimum_mapping_quality=20 minimum_base_quality_to_consider=15 minimum_tail_qualities=2 known_sites_for_polyploid_reduction=[] dont_simplify_reads=false dont_hardclip_adaptor_sequences=false dont_hardclip_low_qual_tails=false dont_use_softclipped_bases=false dont_compress_read_names=false hard_clip_to_interval=false minimum_alt_proportion_to_trigger_variant=0.05 minimum_alt_pvalue_to_trigger_variant=0.01 minimum_del_proportion_to_trigger_variant=0.05 downsample_coverage=250 cancer_mode=false nwayout=false debuglevel=0 debugread= downsample_strategy=Normal no_pg_tag=false


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    > by original bams, you mean the realigned recalibrated bams right?

    Yes that's correct.

  • ahwanpandey (Member)

    I tested a few scenarios using GATK3.1 and GATK2.7 (both running HaplotypeCaller) on a couple of true positive variants (verified by Sanger sequencing):

    1. GATK2.7 + reduced reads called the variants correctly
    2. GATK3.1 + reduced reads missed the variants
    3. GATK3.1 + realigned recalibrated reads called the variants

    So obviously GATK3.1 seems to be having trouble with reduced reads during variant calling.
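A comparison like the one above can be automated by checking each call set for the validated sites. The following is a simplified sketch: it matches on chromosome, position, REF and ALT exactly, ignores genotypes and indel normalization, and uses made-up records rather than any data from this thread.

```python
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def called_sites(vcf_lines: Iterable[str]) -> Set[Site]:
    """Collect (chrom, pos, ref, alt) for every ALT allele in the VCF records."""
    sites: Set[Site] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            sites.add((chrom, pos, ref, alt))
    return sites


def missed_sites(truth: Iterable[Site], vcf_lines: Iterable[str]) -> List[Site]:
    """Return the validated sites that are absent from the call set."""
    return sorted(set(truth) - called_sites(vcf_lines))


# Hypothetical validated variants and a hypothetical call set.
truth = [("1", 1000, "A", "G"), ("2", 2000, "C", "T")]
calls = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\t.\t.",
]
print(missed_sites(truth, calls))
```

Running this once per scenario (for instance 2.7 with reduced reads, 3.1 with reduced reads, 3.1 with full bams) would list which validated variants each run missed.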

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    > So obviously GATK3.1 seems to be having trouble with reduced reads during variant calling.

    Well yes, we know this will happen because HaplotypeCaller in GATK 3.x is not capable of handling reduced reads correctly. Doing this is absolutely not supported. We will make this more obvious in the documentation.

  • ahwanpandey (Member)

    Hi @Geraldine_VdAuwera. Sorry if I sounded overconfident, which was not my intention at all. I just verified things for my own sake and possibly chose the wrong combination of words to express the results. I really do appreciate all the help from the team and look forward to making new discoveries using this incredible tool!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    No worries, sorry if I jumped on you a little brusquely. We just really want to make it crystal clear that people shouldn't do this because the results would be bad. Checking things for yourself is always a good thing to do :)

  • ebanks, Broad Institute (Member, Broadie, Dev)

    Thanks for reporting this. I am patching the GATK now to always fail on reduced bams.
