We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

LiftOverVariants throws error on END tag from initial build

nroaknroak HoustonMember ✭✭
edited April 2016 in Ask the GATK team

I have Build36 GVCF that is generated by command:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -L $interval_build36 -R $build36--genotyping_mode DISCOVERY -ERC GVCF --some_options $sample.recal_reads.bam -o $sample.raw_variants.b36.g.vcf

and I'm trying to do LiftOverVariants to Build37 using command:
java -jar GenomeAnalysisTK.jar -T LiftoverVariants -R $build36 -V $sample.raw_variants.b36.g.vcf -chain b36tob37.chain -dict human_g1k_v37.dict -o $sample.raw_variants.b37.g.vcf

I'm getting an ERROR:

ERROR MESSAGE: Badly formed variant context at location 1:15870; getEnd() was 15904 but this VariantContext contains an END key with value 5767

The Variant giving this error is:
1 5733 . C <NON_REF> . . END=5739 GT:DP:GQ:MIN_DP:PL 0/0:2:3:2:0,3,45

When I checked on UCSC Browser LiftOver tool, I found out that error position 1:15870 is from build37.
And I thought this was counter-intuitive that END tag includes position from build36 but LiftOverVariants throws an error trying to match the build37 position.
Can you please help me out with a workaround?

I only found the post below related to this issue, but it wasn't very helpful

Best Answer


  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭


    Can you try using Picard's LiftoverVcf?


  • nroaknroak HoustonMember ✭✭

    I got the same error even with Picard tool. Sorry, I should have posted this in the original post.
    The Picard Command is:
    java -jar picard.jar LiftoverVcf R=$build37 I=$sample.raw_variants.b36.g.vcf O=$sample.raw_variants.b37.g.vcf CHAIN=b36tob37.chain REJECT=$sample.REJECT.vcf

    The Error:
    Exception in thread "main" htsjdk.tribble.TribbleException: Badly formed variant context at location 1:15870; getEnd() was 15870 but this VariantContext contains an END key with value 5733
    at htsjdk.variant.variantcontext.VariantContext.validateStop(VariantContext.java:1325)
    at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1298)
    at htsjdk.variant.variantcontext.VariantContext.(VariantContext.java:413)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:496)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:490)
    at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:276)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

  • nroaknroak HoustonMember ✭✭

    I see. I would like to explain my analysis before asking a way-out. I have 90% of my samples aligned to b37 while last 10% samples, obtained from public data submission, were previously aligned to b36. So I performed GATK Best Practices workflow on b37 samples right till the VQSR step. What is the best step to add b36 samples? I have generated GVCFs for b36 samples, and then tried to do a liftover. Since, you recommend against doing that, I went ahead and performed CombineGVCF on these 20 samples. This VCF file is in b36. I am able to do a liftover at this stage to b37, but then VQSR resources from hapmap and Omni (b37) are not compatible with contig lengths of this liftover vcf. Should I just perform VQSR with b36 versions of hapmap and Omni as well and then do a liftover to b37?

    ** I do realize that best case scenario would be to realign bams to b37, but I do not know how other 90% of my samples were aligned to b37 thus introducing inconsistency in parameters used.
    I sincerely apologize that you have to tackle answers against your recommendations.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Oh, hmm, I see. Frankly the only correct thing to do is reprocess the b36 samples from scratch, ie remap the original data (or data reverted from the bams you have if the fastqs are not available) to b37. Otherwise you risk the worst inconsistencies. The bwa command line you need should be in the header of the b37 bam files if they were processed with civilized tools that retained that information. If not you should just reprocess everything because otherwise you have no way of knowing what was done to the data, which is a bad situation to be in.

    By the way, the output of CombineGVCFs is still a GVCF, so technically it's also not supported. Not sure why it's working in one case but not the other unless you meant to say you ran GenotypeGVCFs to obtain a VCF. In which case I can't stress enough how important it is that you be able to describe your analysis workflow accurately. How you process your data determines what information you can derive from it and what limitations you should acknowledge in your downstream work.

  • nroaknroak HoustonMember ✭✭

    Apologies, I did mean GenotypeGVCF was used to obtain a VCF file. I wasn't aware that the bwa command is present in the bam header, and I will look into the downloaded bams to find that information. Meanwhile, I performed GenotypeGVCF followed by VQSR workflow to get high quality calls from build36 samples. Only then after a few initial annotations, I did a LiftOver. Given the different processing of these samples and your advice, I won't be merging these samples with my other b37 set.

Sign In or Register to comment.