converting hg19 annotations to b37 coordinates

laosaal Posts: 1 Member
edited January 2013 in Ask the GATK team

We have some annotation files, for example a GTF file of UCSC's "Known Genes" in hg19 coordinates, and we'd like to convert them to b37 coordinates. What's the best way to go about this? Any assistance would be appreciated!
Thanks in advance,


  • ebanks (Broad Institute) Posts: 698 Member, Administrator, Broadie, Moderator, Dev admin

    If you can convert those files to VCF, then you can use our liftover script (described on this forum). Otherwise, you won't be able to do this with the GATK.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
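The answer above depends on first getting the annotations into VCF. Below is a minimal, hypothetical Python sketch of one way to do that for a tab-delimited GTF; it is not part of the GATK, and the function name `gtf_to_vcf_lines` and the INFO keys FEAT, SIDE, TYPE, STRAND and GENE are made up for illustration. Because a VCF record holds a single position, each feature is written as two records (its start and its end) so that both boundaries get lifted. REF is a placeholder `N` and ALT is `.`; tools that compare REF against the reference may flag or filter such records. The sketch also assumes the GTF already lists contigs in the same order as the reference's sequence dictionary, and it deliberately avoids an INFO key named "ID".

```python
import re
import sys

# Matches GTF attribute pairs such as: gene_id "uc001aaa.3";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def gtf_to_vcf_lines(gtf_lines):
    """Turn GTF features into minimal VCF 4.1 lines (illustrative sketch only).

    Each feature yields two records, one at its start and one at its end,
    linked by a hypothetical FEAT index so they can be re-paired after liftover.
    """
    records = []       # (contig rank, position, VCF data line)
    contig_rank = {}   # contigs in order of first appearance in the GTF
    feature_no = 0
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF line
        chrom, feature, strand, attrs = fields[0], fields[2], fields[6], fields[8]
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        gene = dict(ATTR_RE.findall(attrs)).get("gene_id", ".")
        rank = contig_rank.setdefault(chrom, len(contig_rank))
        feature_no += 1
        for side, pos in (("START", start), ("END", end)):
            info = f"FEAT={feature_no};SIDE={side};TYPE={feature};STRAND={strand};GENE={gene}"
            # Placeholder REF 'N' and ALT '.': no real allele is implied.
            records.append((rank, pos, f"{chrom}\t{pos}\t.\tN\t.\t.\t.\t{info}"))
    # Sort by contig (first-appearance order) and then by position.
    records.sort(key=lambda r: (r[0], r[1]))
    header = [
        "##fileformat=VCFv4.1",
        '##INFO=<ID=FEAT,Number=1,Type=Integer,Description="Feature index in the source GTF">',
        '##INFO=<ID=SIDE,Number=1,Type=String,Description="START or END boundary of the feature">',
        '##INFO=<ID=TYPE,Number=1,Type=String,Description="GTF feature type">',
        '##INFO=<ID=STRAND,Number=1,Type=String,Description="GTF strand">',
        '##INFO=<ID=GENE,Number=1,Type=String,Description="GTF gene_id attribute">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    return header + [r[2] for r in records]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gtf_to_vcf.py knownGene.hg19.gtf > knownGene.hg19.vcf
    with open(sys.argv[1]) as fh:
        sys.stdout.write("\n".join(gtf_to_vcf_lines(fh)) + "\n")
```

After liftover, the START and END records that share a FEAT value would still need to be paired back up into GTF lines; that step is not shown here.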

  • sej1985 Posts: 2 Member
    edited November 2012

    I am trying to convert a VCF file from the b36 build to hg19.
    Here are the first few lines of my VCF file:

    ##INFO=<ID=Database,Number=1,Type=String,Description="Database identifier">
    ##INFO=<ID=Dbxref,Number=.,Type=String,Description="Database reference">
    ##INFO=<ID=dbID,Number=1,Type=String,Description="Database identifier">
    ##INFO=<ID=ID,Number=.,Type=String,Description="Chromosome or contig">
    ##INFO=<ID=Alias,Number=.,Type=String,Description="Mostly novel variant">
    ##INFO=<ID=Variant_seq,Number=.,Type=String,Description="Alternate Allele">
    ##INFO=<ID=Genotype,Number=.,Type=String,Description="Homozyguous or Heterozyguous">
    ##INFO=<ID=Variant_reads,Number=.,Type=Integer,Description="Number of reads where variant present">
    ##INFO=<ID=Total_reads,Number=.,Type=Integer,Description="Total number of reads">
    ##INFO=<ID=Reference_seq,Number=1,Type=String,Description="Ancestral allele">
    chr1    4793    .   A   G   25  .   ID=chr1:SoapSNP:SNV:4793;Alias=YHSNP0128643;Variant_seq=A,G;Reference_seq=A;Variant_reads=48,26;Total_reads=74;Genotype=heterozygous
    chr1    6434    .   G   A   48  .   ID=chr1:SoapSNP:SNV:6434;Alias=YHSNP0128644;Variant_seq=A,G;Reference_seq=G;Variant_reads=10,11;Total_reads=21;Genotype=heterozygous
    chr1    93896   rs4287120   T   C   51  .   ID=chr1:SoapSNP:SNV:93896;Dbxref=dbSNP:rs4287120;Variant_seq=C,T;Reference_seq=T;Variant_reads=5,4;Total_reads=9;Genotype=heterozygous
    chr1    225707  rs6603780   C   G   43  .   ID=chr1:SoapSNP:SNV:225707;Dbxref=dbSNP:rs6603780;Variant_seq=C,G;Reference_seq=C;Variant_reads=23,12;Total_reads=35;Genotype=heterozygous
    chr1    225839  rs6422503   C   A   31  .   ID=chr1:SoapSNP:SNV:225839;Dbxref=dbSNP:rs6422503;Variant_seq=A,C;Reference_seq=C;Variant_reads=13,5;Total_reads=18;Genotype=heterozygous
    chr1    526849  .   G   T   76  .   ID=chr1:SoapSNP:SNV:526849;Alias=YHSNP0128645;Variant_seq=G,T;Reference_seq=G;Variant_reads=14,12;Total_reads=26;Genotype=heterozygous
    chr1    554731  rs1832728   T   C   30  .   ID=chr1:SoapSNP:SNV:554731;Dbxref=dbSNP:rs1832728;Variant_seq=C,T;Reference_seq=T;Variant_reads=37,12;Total_reads=49;Genotype=heterozygous
    chr1    555353  rs7349153   T   C   28  .   ID=chr1:SoapSNP:SNV:555353;Dbxref=dbSNP:rs7349153;Variant_seq=C,T;Reference_seq=T;Variant_reads=37,9;Total_reads=46;Genotype=heterozygous
    chr1    555371  rs9283150   G   A   22  .   

    I validated the VCF file with the vcftools vcf-validator, but when I run the LiftOverVariants tool, I get this error:
    The provided VCF file is malformed at approximately line number 13: Trying to create a VariantContext with a ID key. Please use provided constructor argument ID.

    Can someone please tell me how to fix this?

  • Mark_DePristo Posts: 153 Administrator, Dev admin

    You cannot have an ID key in the INFO field.

    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard

  • ebanks (Broad Institute) Posts: 698 Member, Administrator, Broadie, Moderator, Dev admin

    @sej1985 - Mark is correct that in GATK 2.2, "ID" is an invalid key for the INFO field of a VCF. However, this restriction will be lifted in our 2.3 release, whenever that comes out. Thanks for posting this.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
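Until a release without this restriction is available, one workaround consistent with the answers above is to rename the offending INFO key before running the liftover. The Python sketch below is an illustration, not a GATK utility: the replacement key `SNV_ID` and the function name `rename_info_key` are hypothetical, and it assumes a tab-delimited VCF. It rewrites the `##INFO=<ID=ID,...>` header line and every `ID=` entry in the INFO column, matching the key exactly so that keys such as `dbID` are left alone.

```python
import sys

OLD_KEY = "ID"
NEW_KEY = "SNV_ID"  # hypothetical replacement name; any key the GATK accepts would do


def rename_info_key(line, old=OLD_KEY, new=NEW_KEY):
    """Rename one INFO key in a single VCF line (header or data record)."""
    old_prefix = "##INFO=<ID=" + old + ","
    if line.startswith(old_prefix):
        # Header definition of the key: swap just the ID value.
        return line.replace(old_prefix, "##INFO=<ID=" + new + ",", 1)
    if line.startswith("#"):
        return line  # other meta lines and the #CHROM line are unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a complete record; leave it alone
    entries = fields[7].split(";")
    for i, entry in enumerate(entries):
        key, sep, value = entry.partition("=")
        if key == old:  # exact match, so dbID=, Dbxref= etc. are untouched
            entries[i] = new + sep + value
    fields[7] = ";".join(entries)
    return "\t".join(fields) + ("\n" if line.endswith("\n") else "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rename_info_key.py input.vcf > fixed.vcf
    with open(sys.argv[1]) as fh:
        for line in fh:
            sys.stdout.write(rename_info_key(line))
```

Applied to the file above, the `##INFO=<ID=ID,...>` line becomes `##INFO=<ID=SNV_ID,...>` and entries such as `ID=chr1:SoapSNP:SNV:4793` become `SNV_ID=chr1:SoapSNP:SNV:4793`, while every other INFO key keeps its name.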
