VCF liftover question

Dear GATK team,

My VCF was generated using GATK v3 and the hg19 reference. I'd like to compare it to the latest 1000G data, e.g. "1000G_phase3_v4_20130502.sites.vcf" from the b37 folder in the GATK ref bundle.
(Is this the right phase 3 data to use or do I need to download the original from the 1000G ftp site?)

According to this: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_LiftoverVariants.php
I'd like to use the LiftoverVariants tool to lift over my VCF to the b37 reference. Is this the right thing to do? If so, can you please tell me where I can find the required chain file "liftover_hg19_to_b37.txt"?

If not, could you please recommend the right tool to lift over a VCF? It looks like there are also Picard's LiftoverVcf and GATK's liftOverVCF.pl.

Many thanks! Look forward to hearing from you.

Jin

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @JinSzat
    Hi Jin,

    You can use either GATK's or Picard's tool for lifting over a VCF. The chain files are available in the bundle. https://www.broadinstitute.org/gatk/guide/article.php?id=1213
    https://www.broadinstitute.org/gatk/guide/article?id=1215

    -Sheila

  • JinSzat (Member)

    Hi Sheila,

    Thank you. I made some progress and now have additional questions. Here is what I did:

    I did not see chain files within the bundle folders, but I found them here: ftp://ftp.broadinstitute.org/Liftover_Chain_Files/
    I ran the LiftoverVariants tool using hg19tob37.chain and my job completed with no error.

    Program Args: -T LiftoverVariants -R gatk_bundle/2.8/hg19//ucsc.hg19.fasta -V merged.dedup.realn.recal.rsid.PASS.vcf.gz -chain Liftover_Chain_Files/hg19tob37.chain -dict gatk_bundle/2.8/b37/human_g1k_v37.dict.gz -o merged.dedup.realn.recal.rsid.PASS.b37.vcf

    I compared the hg19 and lifted-over b37 versions of the VCF and found that (1) the header of the lifted-over file still lists the hg19 contig names (is this correct?); (2) the records were lifted over fine (e.g. the contig name changed from chr1 to 1):

    chr1 98000073 rs190136297
    1 98000073 rs190136297
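
The two observations above (hg19 names in the header, b37 names in the records) can be checked with a short script. This is only a minimal sketch in Python, not part of GATK or Picard: the function name `vcf_contigs` is hypothetical, and the REF/ALT alleles in the example record are placeholders, since the post shows only the first three columns. For a `.vcf.gz` file, the text would first have to be read with `gzip.open(path, "rt")`.

```python
def vcf_contigs(vcf_text):
    """Return (header_contigs, record_contigs) in order of first appearance."""
    header, records = [], []
    for line in vcf_text.splitlines():
        if line.startswith("##contig=<"):
            # Header contig lines look like ##contig=<ID=chr1,length=...>
            for field in line[len("##contig=<"):].rstrip(">").split(","):
                if field.startswith("ID=") and field[3:] not in header:
                    header.append(field[3:])
        elif line and not line.startswith("#"):
            # Data lines: the first tab-separated column is the contig name
            name = line.split("\t", 1)[0]
            if name not in records:
                records.append(name)
    return header, records


# Tiny stand-in for the lifted-over file: hg19-style header, b37-style record.
# The alleles (A/G) are placeholders, not taken from the original post.
example = "\n".join([
    "##fileformat=VCFv4.1",
    "##contig=<ID=chr1,length=249250621>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t98000073\trs190136297\tA\tG\t.\tPASS\t.",
])

header, records = vcf_contigs(example)
# Contigs used by records but never declared in the header
undeclared = [c for c in records if c not in header]
print("header contigs:", header)
print("record contigs:", records)
print("undeclared in header:", undeclared)
```

A non-empty `undeclared` list means the header and the records disagree about contig naming, which is the situation described in point (1).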

    When I proceeded to run FilterLiftedVariants, it failed with the error messages as shown below. Perhaps the problem is that the header in the lifted-over vcf still has hg19 contig names? How do I fix this error and complete liftover? Thanks so much. Look forward to hearing from you!

    INFO 11:36:33,060 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 11:36:33,065 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 11:36:33,065 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 11:36:33,065 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 11:36:33,068 HelpFormatter - Program Args: -T FilterLiftedVariants -R gatk_bundle/2.8/hg19//ucsc.hg19.fasta -V merged.dedup.realn.recal.rsid.PASS.b37.vcf -o merged.dedup.realn.recal.rsid.PASS.b37.liftoverfiltered.vcf
    INFO 11:36:33,089 HelpFormatter - Executing as [email protected] on Linux 2.6.18-238.12.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_11-b12.
    INFO 11:36:33,089 HelpFormatter - Date/Time: 2015/09/08 11:36:33
    INFO 11:36:33,089 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 11:36:33,089 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 11:36:33,582 GenomeAnalysisEngine - Strictness is SILENT
    INFO 11:36:33,745 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 11:36:35,780 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.4-46-gbc02625):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: Input files merged.dedup.realn.recal.rsid.PASS.b37.vcf and reference have incompatible contigs: No overlapping contigs found.
    ERROR merged.dedup.realn.recal.rsid.PASS.b37.vcf contigs = [1]
    ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

    ERROR ------------------------------------------------------------------------------------------
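
The error message compares contig names: the lifted-over VCF contains contig `1`, while the reference passed with `-R` in the FilterLiftedVariants command is the hg19 FASTA, whose contigs are named `chrM`, `chr1`, and so on. That comparison can be reproduced outside GATK with a rough sketch like the one below. The helper names are hypothetical, the two dictionary strings are hand-trimmed stand-ins for the `.dict` files of the two references, and whether pointing `-R` at the b37 reference resolves the failure is not confirmed anywhere in this thread.

```python
def dict_contigs(dict_text):
    """Extract contig names (SN: fields) from the @SQ lines of a sequence dictionary."""
    names = []
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def overlapping(vcf_names, ref_names):
    """Return the VCF contig names that also exist in the reference."""
    ref = set(ref_names)
    return [name for name in vcf_names if name in ref]


# Hand-trimmed stand-ins for the hg19 and b37 sequence dictionaries
hg19_dict = "@HD\tVN:1.4\tSO:unsorted\n@SQ\tSN:chrM\tLN:16571\n@SQ\tSN:chr1\tLN:249250621\n"
b37_dict = "@HD\tVN:1.4\tSO:unsorted\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n"

vcf_names = ["1"]  # contigs reported for the lifted-over VCF in the error above

against_hg19 = overlapping(vcf_names, dict_contigs(hg19_dict))
against_b37 = overlapping(vcf_names, dict_contigs(b37_dict))
print("overlap with hg19 reference:", against_hg19)
print("overlap with b37 reference:", against_b37)
```

An empty overlap list corresponds to the "No overlapping contigs found" condition in the log.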
  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @JinSzat
    Hi Jin,

    Can you try with Picard's LiftoverVcf? http://broadinstitute.github.io/picard/command-line-overview.html#LiftoverVcf
    That should work. I think the error you are receiving is a known bug in GATK's tool.

    -Sheila
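
For readers following this suggestion, a Picard LiftoverVcf run might be assembled roughly as sketched below. The option names (`I`, `O`, `CHAIN`, `REJECT`, `R`) follow Picard's legacy `KEY=value` syntax, with `R` being the FASTA of the target build. Everything else is an assumption for illustration: the helper name, the `picard.jar` location, the reject file name, and the `human_g1k_v37.fasta` file name are not taken from the thread. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def liftover_vcf_command(picard_jar, vcf_in, vcf_out, chain, reject, target_ref):
    """Build the argument list for a Picard LiftoverVcf run (legacy KEY=value syntax)."""
    return [
        "java", "-jar", picard_jar, "LiftoverVcf",
        f"I={vcf_in}",       # VCF on the source build (hg19 here)
        f"O={vcf_out}",      # lifted-over VCF on the target build
        f"CHAIN={chain}",    # chain file mapping source to target coordinates
        f"REJECT={reject}",  # records that could not be lifted over
        f"R={target_ref}",   # FASTA of the TARGET build (b37 here)
    ]


cmd = liftover_vcf_command(
    picard_jar="picard.jar",  # assumed location of the Picard jar
    vcf_in="merged.dedup.realn.recal.rsid.PASS.vcf.gz",
    vcf_out="merged.dedup.realn.recal.rsid.PASS.b37.vcf",
    chain="Liftover_Chain_Files/hg19tob37.chain",
    reject="rejected_variants.vcf",  # hypothetical file name
    target_ref="gatk_bundle/2.8/b37/human_g1k_v37.fasta",  # assumed file name
)

# Print a shell-safe version of the command for inspection
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided Java and the Picard jar are available at the assumed paths.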
