vcf invalid GT allele index

Hi GATK team,
I tried to subset a VCF file using SelectVariants, but I got this error: The following invalid GT allele index was encountered in the file: "0. The subsetted VCF contains only the header lines and the column header line, with no variant site records. I then ran ValidateVariants on the VCF file, and everything seemed OK. I am struggling to understand what is wrong with my VCF. Can you please help? :)

Answers

  • fengtao Member
    > @bhanuGandham said:
    > Hi @fengtao
    >
    > Please post the exact command, the entire error log, and the version of GATK you are using.
    > Thank you.
    >
    > Regards
    > Bhanu

    Hi Bhanu,

    I am using gatk-4.0.8.1.

    My command is:
    gatk SelectVariants -R ~/hulianlian_2018.12/RcScaffold28543.fa --variant ~/hulianlian_2018.12/final.pass.SNP28543.t0304_26283to33795.vcf -O ~/hulianlian_2018.12/genotype_vcf/masaimala -sn masaimala

    The error message is:

    htsjdk.tribble.TribbleException$InternalCodecException: The following invalid GT allele index was encountered in the file: "0
    at htsjdk.variant.vcf.AbstractVCFCodec.oneAllele(AbstractVCFCodec.java:476)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseGenotypeAlleles(AbstractVCFCodec.java:500)
    at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:743)
    at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:132)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
    at htsjdk.variant.variantcontext.GenotypesContext.iterator(GenotypesContext.java:465)
    at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.initalizeAlleleAnyploidIndicesCache(SelectVariants.java:624)
    at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.apply(SelectVariants.java:563)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:149)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:979)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:201)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Thanks,
    Tao
  • fengtao Member
    Hi Bhanu,

    I finally figured out the problem with my VCF: its header lines were incomplete.

    Another question:

    I want to generate FASTA sequences for all genotypes from a VCF file that contains more than 400 samples. I found a post on Biostars that does this, but it requires running the process once per individual genotype, which would be 400 times in my case. Is there a way to do this for all samples at once?

    Best regards,
    Tao
  • bhanuGandham Member, Administrator, Broadie, Moderator

    Hi @fengtao

    Another user had a similar question. Please follow this thread for the suggested solution.
    https://gatkforums.broadinstitute.org/gatk/discussion/8035/vcf-to-fasta
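
For anyone hitting the same "invalid GT allele index" error from this thread, one way to locate the offending records is to scan the genotype columns directly. The sketch below is not part of GATK or htsjdk; `find_invalid_gt` and `undeclared_format_keys` are hypothetical helper names, and it assumes an uncompressed, tab-delimited VCF. It flags GT values whose allele tokens are not `.` or an integer within the record's allele count, and lists FORMAT keys that appear in records but are never declared in a `##FORMAT` header line. Tao does not say which header lines were incomplete, so the header check is only one plausible thing to look at.

```python
import re


def find_invalid_gt(lines):
    """Return (line_number, sample_name, gt_string) for each suspicious GT value."""
    problems = []
    samples = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # record has no genotype columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        # REF counts as allele 0; each comma-separated ALT adds one more.
        n_alleles = 1 + (0 if fields[4] == "." else len(fields[4].split(",")))
        for i, sample_field in enumerate(fields[9:]):
            values = sample_field.split(":")
            gt = values[gt_index] if gt_index < len(values) else ""
            tokens = re.split(r"[/|]", gt)
            ok = all(
                re.fullmatch(r"\.|[0-9]+", t) and (t == "." or int(t) < n_alleles)
                for t in tokens
            )
            if not ok:
                name = samples[i] if i < len(samples) else f"column {i + 10}"
                problems.append((line_number, name, gt))
    return problems


def undeclared_format_keys(lines):
    """Return FORMAT keys used in records but missing from the ##FORMAT header lines."""
    declared, used = set(), set()
    for line in lines:
        line = line.rstrip("\r\n")
        match = re.match(r"##FORMAT=<ID=([^,>]+)", line)
        if match:
            declared.add(match.group(1))
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) > 8:
                used.update(fields[8].split(":"))
    return sorted(used - declared)
```

Both functions accept any iterable of lines, so reading the file once with `lines = open(path).read().splitlines()` and passing `lines` to each is enough. A reported GT string that starts with a stray character, such as a quote, would point to a malformed record rather than a problem in SelectVariants itself.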

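On the question of producing FASTA sequences for all 400+ samples without one run per sample: a single pass over the VCF can build every sample's sequence at once. The following is a minimal sketch and not the solution from the linked thread. It assumes a SNP-only, uncompressed VCF, a single contig whose reference sequence is already loaded as a string, and that unphased heterozygous calls should be written as IUPAC ambiguity codes and missing calls as `N`. The names `per_sample_sequences` and `to_fasta` are hypothetical.

```python
import re

# IUPAC ambiguity codes for two-base combinations.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
BASES = set("ACGTN")


def per_sample_sequences(ref_seq, vcf_lines, contig):
    """Return {sample: sequence} with each sample's SNP calls applied to ref_seq."""
    samples, seqs = [], {}
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            # One mutable copy of the reference per sample.
            seqs = {s: list(ref_seq) for s in samples}
            continue
        if fields[0] != contig or len(fields) < 10:
            continue
        pos = int(fields[1])
        alleles = [a.upper() for a in [fields[3]] + fields[4].split(",")]
        if not all(a in BASES for a in alleles):
            continue  # skip indels, '*', and records without a usable ALT
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample, sample_field in zip(samples, fields[9:]):
            values = sample_field.split(":")
            gt = values[gt_index] if gt_index < len(values) else "."
            tokens = re.split(r"[/|]", gt)
            if any(not t.isdecimal() or int(t) >= len(alleles) for t in tokens):
                base = "N"  # missing or unparseable genotype
            else:
                called = frozenset(alleles[int(t)] for t in tokens)
                if len(called) == 1:
                    base = next(iter(called))
                else:
                    base = IUPAC.get(called, "N")
            seqs[sample][pos - 1] = base  # VCF positions are 1-based
    return {s: "".join(chars) for s, chars in seqs.items()}


def to_fasta(sequences, width=60):
    """Format {name: sequence} as FASTA text wrapped at `width` columns."""
    out = []
    for name, seq in sequences.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

This keeps one full copy of the contig per sample in memory, which is reasonable for a single scaffold but would need rethinking for a whole genome. It also does not verify that each record's REF matches the reference string, so a check for that is worth adding before trusting the output.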