Will you make the allele-specific HC annotations adhere to the VCF standard?
I am having an issue related to the allele-specific annotations added during the cohort germline calling pipeline (HaplotypeCaller -> CombineGVCFs -> GenotypeGVCFs -> VQSR) with the option "-G AS_StandardAnnotation".
Specifically, I have to remove variants detected in too few samples (according to some arbitrary threshold) from the CombineGVCFs results before passing them on to GenotypeGVCFs. At some multiallelic sites only one of the alternate alleles needs to be removed, so I go through the INFO and FORMAT tags whose header declares Number=A, R or G and remove the values specific to the targeted alternate allele. The issue is that, as far as I understand the VCF standard, the allele-specific annotations added by "-G AS_StandardAnnotation" (prefixed "AS") do not adhere to it. They clearly contain multiple values even though they are declared with "Number=1" in the file header, and they seem to use '|' as a separator in addition to ','. This makes it impossible for me to trim these annotations correctly when removing variants, and leaving them untrimmed seems to make the following GenotypeGVCFs step crash with a failure in one of the allele-specific annotations ("AS_StrandBiasTest"). The relevant partial log output is pasted below.
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
    at org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.AS_StrandBiasTest.parseRawDataString(AS_StrandBiasTest.java:183)
    at org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.AS_StrandBiasTest.combineRawData(AS_StrandBiasTest.java:122)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.combineAnnotations(VariantAnnotatorEngine.java:304)
    at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.mergeAttributes(ReferenceConfidenceVariantContextMerger.java:267)
    at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:101)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:200)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:159)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:201)
    at org.broadinstitute.hellbender.Main.main(Main.java:287)
So I am wondering whether you plan to change the formatting of the "AS" tags or, failing that, whether you could tell me how to interpret them correctly for the purpose of trimming away allele-specific data. This would be greatly appreciated.
GATK version 220.127.116.11.
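To make the question concrete, here is a minimal Python sketch of the trimming I have in mind. The Number=A, R and G cases follow the VCF specification (for G, diploid genotype j/k with j <= k sits at index k*(k+1)/2 + j). The last function is only a guess at how the "AS" values might be handled: it assumes each '|'-separated block belongs to one allele, with REF first and the ALT alleles following in order, which is exactly the part I would like confirmed. All function names and the example values are hypothetical and not part of GATK.

```python
def trim_number_a(values, alt_index):
    """Number=A: one value per ALT allele. alt_index is 0-based among ALTs."""
    return [v for i, v in enumerate(values) if i != alt_index]


def trim_number_r(values, alt_index):
    """Number=R: REF value first, then one value per ALT allele."""
    return [v for i, v in enumerate(values) if i != alt_index + 1]


def trim_number_g(values, alt_index, n_alleles):
    """Number=G, diploid only: drop every genotype involving the removed allele.

    n_alleles counts REF plus all ALT alleles before trimming.
    """
    drop = alt_index + 1  # allele number of the ALT being removed (REF is 0)
    kept = []
    for k in range(n_alleles):
        for j in range(k + 1):
            if drop not in (j, k):
                kept.append(values[k * (k + 1) // 2 + j])
    return kept


def trim_as_raw(value, alt_index):
    """ASSUMPTION: '|' separates per-allele blocks, REF first, then ALTs in order.

    str.split keeps empty blocks, so block positions are preserved.
    """
    blocks = value.split("|")
    del blocks[alt_index + 1]
    return "|".join(blocks)


if __name__ == "__main__":
    # Made-up values for a site with REF and two ALT alleles; remove the first ALT.
    print(trim_number_a([7, 9], 0))
    print(trim_number_r([20, 7, 9], 0))
    print(trim_number_g([0, 10, 20, 30, 40, 50], 0, 3))
    print(trim_as_raw("10,5|3,2|0,1", 0))
```

In a gVCF the symbolic <NON_REF> allele would count as one of the ALT alleles here, so it would keep its own value or block unless it is the allele being removed.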