GenomeLoc 11:69653434-69653483 has a size == 50 but the variation reference allele has length 51

talwar_j Member
edited January 25 in Ask the GATK team

Hello,

I am using GATK version 3.8.1 (3.8-1-0-gf15c1c3ef) and want to merge four VCF files with the CombineVariants tool.
This is the command I am using:

java -jar GenomeAnalysisTK.jar \
-T CombineVariants -R genome.fa \
-nt 20 \
--variant a.vcf \
--variant b.vcf \
--variant c.vcf \
--variant d.vcf \
-o Combined.vcf \
-genotypeMergeOptions UNIQUIFY

I am using GRCh38 as a reference genome.

However, after running for a while, I get this error:

##### ERROR --
##### ERROR stack trace 
java.lang.IllegalStateException: BUG: GenomeLoc 11:69653434-69653483 has a size == 50 but the variation reference allele has length 51 this = [VC variant @ 11:69653434-69653483 Q. of type=MNP alleles=[ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG*, TTGGGTTAATTTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG, TTTGTCTCAATTTTGACTTTATTCTTTTACCGCTCTTTTCCAAAAAGGGTA] attr={AC=0, AF=0.0, AN=2, HOMLEN=0, SVTYPE=RPL, set=variant} GT=[[TUMOR_pindel.variant ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG*/ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG*]]
        at htsjdk.variant.variantcontext.VariantContext.validateStop(VariantContext.java:1327)
        at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1294)
        at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:401)
        at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:494)
        at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:488)
        at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:1363)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:361)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:143)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: BUG: GenomeLoc 11:69653434-69653483 has a size == 50 but the variation reference allele has length 51 this = [VC variant @ 11:69653434-69653483 Q. of type=MNP alleles=[ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG*, TTGGGTTAATTTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG, TTTGTCTCAATTTTGACTTTATTCTTTTACCGCTCTTTTCCAAAAAGGGTA] attr={AC=0, AF=0.0, AN=2, HOMLEN=0, SVTYPE=RPL, set=variant} GT=[[TUMOR_pindel.variant ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG*/ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG*]]
##### ERROR ------------------------------------------------------------------------------------------

When I look at a.vcf, I see this line:

11  69653434    .   ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG TTTGTCTCAATTTTGACTTTATTCTTTTACCGCTCTTTTCCAAAAAGGGTA .   PASS    END=69653483;HOMLEN=0;SVLEN=-51;SVTYPE=RPL;NTLEN=51 GT:AD   0/0:141,1

But I do not see anything wrong with it. Could you please help me here?
This is my a.vcf file:

##fileformat=VCFv4.0
##fileDate=20190101
##source=sampleA
##reference=GRCh38
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=PF,Number=1,Type=Integer,Description="The number of samples carry the variant">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=NTLEN,Number=.,Type=Integer,Description="Number of bases inserted in place of deleted code">
##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Reference depth, how many reads support the reference">
##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Allele depth, how many reads support this allele">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sampleA
10  43100402    .   TGTCCTTGAAGAAGCCTTATTCTCACCATCCCTCACTCACTTCCCTACTTCCCA  TTGGCCTTGAAGAAGCCTTTTTCTCACCACCCCTCACTCACTTTCCTTCTTTCCC .   PASS    END=43100455;HOMLEN=0;SVLEN=-53;SVTYPE=RPL;NTLEN=54 GT:AD   0/0:18,1
10  43129258    .   TCAACTTA    CAACCTAC    .   PASS    END=43129264;HOMLEN=0;SVLEN=-8;SVTYPE=RPL;NTLEN=8   GT:AD   0/0:41,2
10  87958071    .   AAGCTATATTTTATTTTATGACATGTA AGCCTTTTTTTTATTTTATGACAGTT  .   PASS    END=87958097;HOMLEN=0;SVLEN=-26;SVTYPE=RPL;NTLEN=25 GT:AD   0/0:73,1
11  69653433    .   TA  T   .   PASS    END=69653434;HOMLEN=0;SVLEN=-1;SVTYPE=DEL   GT:AD   0/0:96,1
11  69653434    .   ATGTGATCAA  TTGGGTTAAT  .   PASS    END=69653442;HOMLEN=0;SVLEN=-10;SVTYPE=RPL;NTLEN=10 GT:AD   0/0:106,2
11  69653434    .   ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG TTTGTCTCAATTTTGACTTTATTCTTTTACCGCTCTTTTCCAAAAAGGGTA .   PASS    END=69653483;HOMLEN=0;SVLEN=-51;SVTYPE=RPL;NTLEN=51 GT:AD   0/0:141,1
11  69653438    .   GA  G   .   PASS    END=69653439;HOMLEN=0;SVLEN=-1;SVTYPE=DEL   GT:AD   0/0:102,1
11  69653550    .   T   TGGCGGGCAGACACGCGGGCGCGATCCCACACAGGCTGGCGGGGGGCGGGCCCCCGGGCGCC  .   PASS    END=69653550;HOMLEN=44;HOMSEQ=GGCGGGCAGACACGCGGGCGCGATCCCACACAGGCTGGCGGGGG;SVLEN=61;SVTYPE=INS  GT:AD   0/0:110,1
11  69653562    .   A   ACGCGGGCGCGATCCCACACAGGCTGGCGGGGGGCGGGGCCCCCGGCCC   .   PASS    END=69653562;HOMLEN=32;HOMSEQ=CGCGGGCGCGATCCCACACAGGCTGGCGGGGG;SVLEN=48;SVTYPE=INS  GT:AD   0/0:100,1
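
One thing that can be checked directly against the listing above is whether each record's END agrees with its POS and REF length, since the error message compares a 50-base interval (69653434-69653483) with a 51-base reference allele. The Python sketch below does that arithmetic. The helper name `span_mismatches` is made up for illustration, and the code assumes whitespace-separated data lines with an END key in INFO. Whether every such mismatch triggers the CombineVariants error is not established here.

```python
def span_mismatches(lines):
    """Yield (chrom, pos, end, ref_len) for records where END - POS + 1 != len(REF)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
        # Parse key=value INFO entries; flag-style entries without "=" are ignored.
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if "END" not in info_map:
            continue
        end = int(info_map["END"])
        if end - pos + 1 != len(ref):
            yield chrom, pos, end, len(ref)


# The record named in the error message, copied from a.vcf.
ref = "ATGTGATCAATTTTGACTTAATGTGATTACTGCTCTATTCCAAAAAGGTTG"
alt = "TTTGTCTCAATTTTGACTTTATTCTTTTACCGCTCTTTTCCAAAAAGGGTA"
record = "\t".join([
    "11", "69653434", ".", ref, alt, ".", "PASS",
    "END=69653483;HOMLEN=0;SVLEN=-51;SVTYPE=RPL;NTLEN=51",
    "GT:AD", "0/0:141,1",
])

print(list(span_mismatches([record])))
# prints [('11', 69653434, 69653483, 51)]: the END span is 50 bases, REF is 51
```

Passing an open file handle instead of the one-element list would apply the same check to every record in a file.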

Any help will be appreciated. Thank you.

Answers

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @talwar_j

    Nothing jumps out at me as an obvious error, but here are a few thoughts.

    1.) How were the VCF files generated? Were all four generated in the same way?
    2.) Check the VCF version. I am not sure it would cause a problem in GATK 3.8, but are all of your VCF files the same version?
    3.) To validate the VCF files, you can use the GATK ValidateVariants tool (see its tool documentation).

    There are some recommendations on how to merge VCF files at the bottom of this page.
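
    For the version question, a small Python sketch like the one below can read the `##fileformat` header from each input. The function name `vcf_version` is hypothetical, and the code assumes the header line sits in the leading `##` block, as it does in the a.vcf shown in the question.

```python
def vcf_version(lines):
    """Return the value of the ##fileformat header line, or None if absent."""
    for line in lines:
        if line.startswith("##fileformat="):
            return line.strip().split("=", 1)[1]
        if not line.startswith("##"):
            break  # past the meta-information block; stop looking
    return None


# In-memory demo using the first header lines of a.vcf from the question.
header = ["##fileformat=VCFv4.0\n", "##fileDate=20190101\n", "##source=sampleA\n"]
print(vcf_version(header))  # prints VCFv4.0
```

    To compare the four inputs, open each file and pass the handle, for example `with open("a.vcf") as fh: print(vcf_version(fh))`.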

  • talwar_j Member

    @AdelaideR

    1. The VCF files were generated by different sources, so they are not the same.
    2. Three of the files are VCF version 4.2 and one (the one causing the problem) is v4.0. Could that be having an effect?
    3. I have not tried that yet.

    However, when I delete the line that causes the error from the VCF file, the error disappears completely and the merge works fine.
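
    The manual deletion can also be scripted so that the original file stays untouched. This is only a sketch of that workaround, not a fix for the underlying record. The helper name `drop_record` is made up for illustration. It matches on CHROM, POS and REF together because a.vcf has two records at 11:69653434 that differ only in their alleles.

```python
def drop_record(lines, chrom, pos, ref):
    """Return lines with any data record matching CHROM, POS and REF removed."""
    kept = []
    for line in lines:
        if not line.startswith("#"):
            fields = line.split()
            if len(fields) >= 4 and (fields[0], fields[1], fields[3]) == (chrom, str(pos), ref):
                continue  # skip the offending record
        kept.append(line)
    return kept


# In-memory demo with shortened, illustrative records (not the full a.vcf lines).
demo = [
    "##fileformat=VCFv4.0\n",
    "11\t69653434\t.\tATGTGATCAA\tTTGGGTTAAT\t.\tPASS\tEND=69653442\n",
    "11\t69653434\t.\tATGTG\tTTTGT\t.\tPASS\tEND=69653438\n",
]
filtered = drop_record(demo, "11", 69653434, "ATGTG")
print(len(demo), len(filtered))  # prints 3 2
```

    Reading the lines from a.vcf and writing the result to a new file would give a filtered copy to pass to CombineVariants.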

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @talwar_j

    I am glad that simple fix helped.

    If you have access to the unmapped BAM files, regenerating the VCF files so that they are fully consistent may yield better results downstream. In general, it will provide more robust variant calls across the sample set.
