
converting hg19 annotations to b37 coordinates

laosaal Posts: 1 Member
edited January 2013 in Ask the team

Hi, We have some annotation files, for example a GTF file of UCSC's "Known Genes" in hg19 coordinates. We'd like to convert this to b37 coordinates. What's the best way to go about doing this? Assistance would be appreciated! Thanks in advance, Lao


Answers

  • ebanks Posts: 671 GSA Member mod

    If you can convert those files to VCF then you can use our liftover script (described on this forum). Otherwise, you won't be able to do this through the GATK.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • sej1985 Posts: 2 Member
    edited November 2012

    I am trying to convert a VCF file from the b36 build to hg19. Here are the first few lines of my VCF file:

    ##fileformat=VCFv4.0
    ##INFO=<ID=Database,Number=1,Type=String,Description="Database identifier">
    ##INFO=<ID=Dbxref,Number=.,Type=String,Description="Database reference">
    ##INFO=<ID=dbID,Number=1,Type=String,Description="Database identifier">
    ##INFO=<ID=ID,Number=.,Type=String,Description="Chromosome or contig">
    ##INFO=<ID=Alias,Number=.,Type=String,Description="Mostly novel variant">
    ##INFO=<ID=Variant_seq,Number=.,Type=String,Description="Alternate Allele">
    ##INFO=<ID=Genotype,Number=.,Type=String,Description="Homozyguous or Heterozyguous">
    ##INFO=<ID=Variant_reads,Number=.,Type=Integer,Description="Number of reads where variant present">
    ##INFO=<ID=Total_reads,Number=.,Type=Integer,Description="Total number of reads">
    ##INFO=<ID=Reference_seq,Number=1,Type=String,Description="Ancestral allele">
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO
    chr1    4793    .   A   G   25  .   ID=chr1:SoapSNP:SNV:4793;Alias=YHSNP0128643;Variant_seq=A,G;Reference_seq=A;Variant_reads=48,26;Total_reads=74;Genotype=heterozygous
    chr1    6434    .   G   A   48  .   ID=chr1:SoapSNP:SNV:6434;Alias=YHSNP0128644;Variant_seq=A,G;Reference_seq=G;Variant_reads=10,11;Total_reads=21;Genotype=heterozygous
    chr1    93896   rs4287120   T   C   51  .   ID=chr1:SoapSNP:SNV:93896;Dbxref=dbSNP:rs4287120;Variant_seq=C,T;Reference_seq=T;Variant_reads=5,4;Total_reads=9;Genotype=heterozygous
    chr1    225707  rs6603780   C   G   43  .   ID=chr1:SoapSNP:SNV:225707;Dbxref=dbSNP:rs6603780;Variant_seq=C,G;Reference_seq=C;Variant_reads=23,12;Total_reads=35;Genotype=heterozygous
    chr1    225839  rs6422503   C   A   31  .   ID=chr1:SoapSNP:SNV:225839;Dbxref=dbSNP:rs6422503;Variant_seq=A,C;Reference_seq=C;Variant_reads=13,5;Total_reads=18;Genotype=heterozygous
    chr1    526849  .   G   T   76  .   ID=chr1:SoapSNP:SNV:526849;Alias=YHSNP0128645;Variant_seq=G,T;Reference_seq=G;Variant_reads=14,12;Total_reads=26;Genotype=heterozygous
    chr1    554731  rs1832728   T   C   30  .   ID=chr1:SoapSNP:SNV:554731;Dbxref=dbSNP:rs1832728;Variant_seq=C,T;Reference_seq=T;Variant_reads=37,12;Total_reads=49;Genotype=heterozygous
    chr1    555353  rs7349153   T   C   28  .   ID=chr1:SoapSNP:SNV:555353;Dbxref=dbSNP:rs7349153;Variant_seq=C,T;Reference_seq=T;Variant_reads=37,9;Total_reads=46;Genotype=heterozygous
    chr1    555371  rs9283150   G   A   22  .   
    

    I validated the VCF file with the vcftools vcf-validator. But when I use the LiftOverVariants tool, it gives me the error: The provided VCF file is malformed at approximately line number 13: Trying to create a VariantContext with a ID key. Please use provided constructor argument ID.

    Please can someone tell me how to fix this? Thanks

  • Mark_DePristo Posts: 153 Administrator, GSA Member admin

    You cannot have an ID key in the INFO field.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • ebanks Posts: 671 GSA Member mod

    @sej1985 - Mark is correct that in GATK 2.2 "ID" is an invalid key for the INFO field of the VCF. However, this restriction will be lifted in our 2.3 release, whenever that comes out. Thanks for posting this.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
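
For anyone stuck on a GATK version that still rejects an "ID" key in the INFO field, one possible workaround is to rename that key before running the liftover. The snippet below is only a minimal sketch and not part of the GATK: the replacement name SourceID, the function names, and the file paths are all hypothetical, and it assumes an uncompressed, tab-delimited VCF like the one posted above.

```python
# Hypothetical replacement name; any INFO key other than "ID" would do.
NEW_KEY = "SourceID"


def rename_info_id(line: str, new_key: str = NEW_KEY) -> str:
    """Rename the INFO key 'ID' in a single VCF line (header or record)."""
    # Header declaration of the offending INFO key.
    if line.startswith("##INFO=<ID=ID,"):
        return line.replace("##INFO=<ID=ID,", f"##INFO=<ID={new_key},", 1)
    # Other header lines and blank lines pass through unchanged.
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # INFO is the 8th column; leave short or truncated records untouched.
    if len(fields) < 8:
        return line
    entries = fields[7].split(";")
    # Only the exact key "ID" is renamed; keys such as "dbID" are kept.
    fields[7] = ";".join(
        new_key + entry[2:] if entry == "ID" or entry.startswith("ID=") else entry
        for entry in entries
    )
    return "\t".join(fields) + newline


def rename_info_id_in_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, renaming the INFO 'ID' key on every line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_info_id(line))
```

Calling rename_info_id_in_file("input.vcf", "renamed.vcf") (placeholder paths) would write a copy in which ID=chr1:SoapSNP:SNV:4793 becomes SourceID=chr1:SoapSNP:SNV:4793, while the ID column and keys such as dbID are left alone. Splitting INFO on ";" rather than doing a plain text substitution is what keeps dbID= from being rewritten by accident.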
