VCF: invalid GT allele index

Hi GATK team,
I tried to subset a VCF file using SelectVariants, but I got the error: The following invalid GT allele index was encountered in the file: "0. The subsetted VCF contains only the header lines and the column header line, with no variant records. I then ran ValidateVariants on the VCF, and it did not report any problems. I am struggling to understand what is wrong with my VCF. Can you please help? :)

Answers

  • fengtao Member
    > @bhanuGandham said:
    > Hi @fengtao
    >
    > Please post the exact command you are using, the entire error log, and the version of GATK you are using.
    > Thank you.
    >
    > Regards
    > Bhanu

    Hi Bhanu,

    I am using gatk-4.0.8.1.

    My command is:
    gatk SelectVariants -R ~/hulianlian_2018.12/RcScaffold28543.fa --variant ~/hulianlian_2018.12/final.pass.SNP28543.t0304_26283to33795.vcf -O ~/hulianlian_2018.12/genotype_vcf/masaimala -sn masaimala

    The error message is:

    htsjdk.tribble.TribbleException$InternalCodecException: The following invalid GT allele index was encountered in the file: "0
    at htsjdk.variant.vcf.AbstractVCFCodec.oneAllele(AbstractVCFCodec.java:476)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseGenotypeAlleles(AbstractVCFCodec.java:500)
    at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:743)
    at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:132)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
    at htsjdk.variant.variantcontext.GenotypesContext.iterator(GenotypesContext.java:465)
    at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.initalizeAlleleAnyploidIndicesCache(SelectVariants.java:624)
    at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.apply(SelectVariants.java:563)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:149)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:979)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:201)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Thanks,
    Tao
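The stack trace above shows the failure coming from htsjdk's GT parser, which rejected the allele token `"0`. The following is a minimal sketch, not part of GATK, for locating such genotypes before running SelectVariants. It assumes an uncompressed, tab-delimited VCF; the function name `find_bad_gt` is hypothetical, and the check (each allele token must be a non-negative integer or `.`) is a simplified reading of the VCF specification.

```python
import re

# Assumption: a valid GT allele token is a non-negative integer or "." (missing).
VALID_ALLELE = re.compile(r"\d+|\.")


def find_bad_gt(vcf_lines):
    """Yield (line_number, sample_name, raw_gt) for each GT with an invalid allele token."""
    samples = []
    for line_no, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # record has no genotype columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_pos = format_keys.index("GT")
        for sample, column in zip(samples, fields[9:]):
            parts = column.split(":")
            if gt_pos >= len(parts):
                continue  # trailing FORMAT fields were dropped for this sample
            gt = parts[gt_pos]
            alleles = re.split(r"[/|]", gt)  # unphased "/" or phased "|"
            if not all(VALID_ALLELE.fullmatch(a) for a in alleles):
                yield line_no, sample, gt
```

To use it on a file, pass an open file handle, for example `for hit in find_bad_gt(open(path)): print(hit)`, where `path` is a placeholder for your own VCF. A stray quote character in the genotype column would show up here with its line number and sample name.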
  • fengtao Member
    Hi Bhanu,

    I finally figured out the problem with my VCF: its header lines were incomplete.

    Another question:

    I want to generate FASTA sequences for all genotypes from a VCF file that contains more than 400 samples. I noticed a post on Biostars that does this, but it requires running the process once per individual genotype, which means 400 times in my case. Is there a way to do this for all samples at once?

    Best regards,
    Tao
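The post above does not say which header lines were missing. As one possible check, here is a rough sketch that reports INFO, FORMAT, and FILTER keys that appear in records without a matching `##INFO`, `##FORMAT`, or `##FILTER` declaration. It assumes an uncompressed, tab-delimited VCF; the function name `undeclared_keys` is hypothetical, and this is only one way a header can be incomplete.

```python
import re

# Matches declarations such as ##FORMAT=<ID=GT,Number=1,...>
HEADER_ID = re.compile(r"^##(INFO|FORMAT|FILTER)=<ID=([^,>]+)")


def undeclared_keys(vcf_lines):
    """Return {kind: sorted keys} for keys used in records but never declared in the header."""
    declared = {"INFO": set(), "FORMAT": set(), "FILTER": set()}
    missing = {kind: set() for kind in declared}
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if line.startswith("##"):
            match = HEADER_ID.match(line)
            if match:
                declared[match.group(1)].add(match.group(2))
            continue
        if not line or line.startswith("#"):
            continue  # blank line or the #CHROM column header
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete record
        used = {
            # PASS and "." are allowed in FILTER without a declaration
            "FILTER": {f for f in fields[6].split(";") if f not in ("", ".", "PASS")},
            # INFO entries are KEY=VALUE pairs or bare flags
            "INFO": {e.split("=", 1)[0] for e in fields[7].split(";") if e not in ("", ".")},
            "FORMAT": set(fields[8].split(":")) if len(fields) > 8 else set(),
        }
        for kind, keys in used.items():
            missing[kind] |= keys - declared[kind]
    return {kind: sorted(keys) for kind, keys in missing.items() if keys}
```

An empty dictionary means every key used in the records has a declaration; it does not prove the header is otherwise complete.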
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @fengtao

    Another user had a similar question. Please follow this thread for the suggested solution.
    https://gatkforums.broadinstitute.org/gatk/discussion/8035/vcf-to-fasta
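For readers who want to avoid 400 separate runs, here is a rough sketch of a single-pass alternative. It is not the solution from the linked thread and not a GATK tool. It makes strong simplifying assumptions: one contig whose reference sequence fits in memory as a string, SNPs only (indels and symbolic alleles are skipped), and the first allele of each genotype is the one written, which is an arbitrary choice for heterozygous calls. The function name `consensus_per_sample` is hypothetical.

```python
def consensus_per_sample(reference, vcf_lines, contig):
    """Return {sample: sequence} with each sample's SNP alleles substituted into `reference`."""
    samples, seqs = [], {}
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            # One mutable copy of the reference per sample
            seqs = {s: list(reference) for s in samples}
            continue
        if len(fields) < 10 or fields[0] != contig:
            continue
        pos, ref = int(fields[1]), fields[3]
        if len(ref) != 1 or not 0 < pos <= len(reference):
            continue  # assumption: only single-base REF alleles inside the contig
        alleles = [ref] + fields[4].split(",")
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_pos = keys.index("GT")
        for sample, column in zip(samples, fields[9:]):
            parts = column.split(":")
            if gt_pos >= len(parts):
                continue
            # Assumption: take the first allele of the genotype (arbitrary for hets)
            first = parts[gt_pos].replace("|", "/").split("/")[0]
            if not first.isdigit() or int(first) >= len(alleles):
                continue  # missing or malformed genotype: keep the reference base
            allele = alleles[int(first)].upper()
            if len(allele) == 1 and allele in "ACGTN":
                seqs[sample][pos - 1] = allele  # VCF positions are 1-based
    return {s: "".join(seq) for s, seq in seqs.items()}
```

Each returned sequence can then be written out as a FASTA record with the sample name as its header line. Memory use grows with the number of samples times the contig length, so this approach suits a single scaffold better than a whole genome.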
