The current GATK version is 3.4-0

# VariantsToVCF

edited July 2012

A new tool has been released!

Check out the documentation at VariantsToVCF.

• Posts: 21Member
edited August 2012

--variant / -V ( required RodBinding[Feature] )
Input variant file. Variants from this input file are used by this tool as input. --variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, BEAGLE, BED, BEDTABLE, EXAMPLEBINARY, GELITEXT, OLDDBSNP, RAWHAPMAP, REFSEQ, SAMPILEUP, SAMREAD, TABLE, VCF, VCF3

Going by the parameter description above, can I convert all of these formats (BCF2, BEAGLE, BED, BEDTABLE, EXAMPLEBINARY, GELITEXT, OLDDBSNP, RAWHAPMAP, REFSEQ, SAMPILEUP, SAMREAD, TABLE) to VCF?

I have a SAM file with a list of variants and I want to convert it into VCF format. I used the following command:

java -jar GenomeAnalysisTK-latest/dist/GenomeAnalysisTK.jar -R indels_analysis/Mycobacterium_tuberculosis_H37Rv.fasta -T VariantsToVCF -o variantstovcf.vcf --variant:SAMREAD indels_analysis/tbaligned.sam


and received this error message:

ERROR MESSAGE: We saw a record with a start of gi|57116681|ref|NC_000962.2|:4407222 after a record with a start of  gi|57116681|ref|NC_000962.2|:4410190, for input source: /home/ashu/puneet/Tuberculosis/indels_analysis/tbaligned.sam
##### ERROR


What does this error mean?

• Posts: 21Member

Forget the error. Can someone just tell me whether my GATK command and approach are right? Can a SAM-format variant file be converted into VCF using this command?

Hi there,

What is a "Sam format variant file"? The SAMREAD codec is used to convert SAM/BAM records - but there are no variants associated with a SAM record. I think perhaps you will need to convert this file of yours to VCF manually.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 37Member

Hi, I'd like to convert dbSNP build 137 (GRCm38/mm10) for use within GATK, using the VariantsToVCF tool. I tried this command line:
java -Xmx2g -jar GenomeAnalysisTK.jar -R gatk.ucsc.mm10.fa -T VariantsToVCF -o mm10snp137.vcf --variant:OLDDBSNP snp137.txt
which gave me an empty file, so I'm guessing OLDDBSNP wasn't the right option to choose. The BED format looks like a straightforward alternative. I tried to follow the documentation links to see what further information is needed to go from .bed to .vcf, but the links are broken (e.g. http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_dbsnp_OldDbSNPCodec.html returns a 404 error).
With regards.

Hi @Lavinia,

FYI, if you want to use dbsnp 137, you can download it in vcf format from our resource bundle.

As for OldDbSNPCodec, it is no longer documented on our website because we are no longer responsible for its development.

Geraldine Van der Auwera, PhD

• Posts: 37Member

Hi Geraldine,
Thanks for that, I hadn't realised that it was available there, thanks for your help.

• Posts: 37Member

Hi Geraldine, I don't think this is exactly what I am after. I'd like the mouse dbSNP data, but on your FTP site under bundle/2.3 the only options are hg18, hg19, b36 and b37, none of which contain (AFAIK) mouse data. Can you help? Thanks.

Oh, sorry about that, I didn't realize you wanted mouse dbsnp. We only have human resources. In that case you'll have to convert your own, unless someone in the community volunteers info about where to find a mouse dbsnp that is ready to go. It must exist somewhere, since I know we have users working on mouse genomes. But if no one pipes up I will look up the usage you need to apply.

Geraldine Van der Auwera, PhD

• Posts: 37Member

Thanks Geraldine. I've got the 137 file from UCSC, but there were some errors converting it using vcfutils, so just looking at those now. I'm also downloading mgp.v2.snps.annot.reformat.vcf.gz from the Keane/Sanger Nature paper, so will look at using that. Is there anywhere within the GATK site/forum where I could post these resources for others to use? Thanks.

For now, if you could just post the links to where you obtained those files that would be very helpful. At some point we'll probably set up some articles that summarize where to get key resources for various non-human organisms. Thanks for your contribution!

Geraldine Van der Auwera, PhD

• Posts: 37Member

I'm using the mouse vcf data from ftp://ftp-mouse.sanger.ac.uk/current_snps/, which has VCF files for both SNPs and indels (last updated 5/2/2013), from this paper, PMID: 21921910 (with thanks to postings from PeteHaitch). Needs a bit of minor editing from 1,2,3 to chr1,chr2,chr3.
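
The contig renaming mentioned above (1, 2, 3 to chr1, chr2, chr3) could be sketched in Python roughly as follows. This is a minimal sketch, not a GATK feature: the function names `add_chr_prefix` and `rename_vcf` are made up for illustration, the input is assumed to be an uncompressed, tab-separated VCF, and special cases such as the mitochondrial contig (often MT in one naming scheme and chrM in the other) are not handled. It only changes names; it does not convert coordinates between genome builds.

```python
def add_chr_prefix(line: str) -> str:
    """Return one VCF line with 'chr' prepended to its contig name.

    Hypothetical helper for illustration; assumes plain-text VCF lines.
    """
    if line.startswith("##contig=<ID="):
        # Header contig declarations must match the renamed records.
        if line.startswith("##contig=<ID=chr"):
            return line
        return line.replace("##contig=<ID=", "##contig=<ID=chr", 1)
    # Leave other header lines, blank lines and already-prefixed records alone.
    if line.startswith("#") or not line.strip() or line.startswith("chr"):
        return line
    return "chr" + line


def rename_vcf(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path line by line, so millions of records are fine."""
    # For a .vcf.gz input, gzip.open(src_path, "rt") could be used instead of open().
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(add_chr_prefix(line))
```

A call such as `rename_vcf("in.vcf", "out.vcf")` (both file names are placeholders) would write the renamed copy and leave the original untouched.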

• HoustonPosts: 18Member

Hi Lavinia, could you suggest how to replace the leading 1, 2, 3 on each variant line with chr1, chr2, chr3? There are millions of lines starting with 1, 2, 3, and I am not very good with Linux.

@rzeng, be careful when you change the contig names in your file. The differences between genome builds are not limited to contig names -- sometimes there are differences in contig length and what the bases are in some places in the reference genome. It is not trivial to liftover your files to a different build of a genome. We have tools to do the liftover process safely but you will need to find the appropriate chain files. Have a look at our documentation here: http://www.broadinstitute.org/gatk/guide/article?id=63

Geraldine Van der Auwera, PhD

• Posts: 5Member

Hi,

I'm trying to reformat a raw HapMap file to VCF with GATK's "VariantsToVCF", but I get a user error saying

"##### ERROR MESSAGE: Permitted to write any record upstream of position 12267983, but a record at 1:12190744 was just added"

The command I used is as follows:

java -Xmx2g -jar GenomeAnalysisTK.jar
-T VariantsToVCF
-R human_g1k_v37.fasta (<- GRCH37 reference genome)
-o out.vcf
--variant:RawHapMap genotypes_chr22_CEU_r28_nr.b36_fwd.txt (<- hapmap raw data with "chr" removed from the chromosome column)
--dbsnp dbsnp_137.b37.vcf (<- from broad ftpsite)

where I stripped the string "chr" from the raw hapmap file to make it compatible with GRCH37 (human_g1k_v37.fasta).

thanks.

Hi @gensdei,

This sounds like the program is complaining that the variants are out of order. Have you tried with the hapmap file we provide in our bundle?

Geraldine Van der Auwera, PhD
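
One way to confirm an "out of order" diagnosis before rerunning the tool is to scan the input for the first record that starts before its predecessor on the same contig. The sketch below is an assumption-laden illustration, not part of GATK: `first_out_of_order` and `read_positions` are hypothetical names, and the column indices depend on the file format (for example, contig in column 0 and position in column 1 for VCF). It only checks position order within a contig; GATK also expects contigs to appear in the same order as in the reference, which this does not verify.

```python
def first_out_of_order(records):
    """Return (index, previous, current) for the first record whose position
    is lower than its predecessor's on the same contig, or None if sorted.

    `records` is an iterable of (contig, position) pairs in file order.
    """
    prev = None
    for i, rec in enumerate(records):
        if prev is not None and rec[0] == prev[0] and rec[1] < prev[1]:
            return i, prev, rec
        prev = rec
    return None


def read_positions(path, contig_col=0, pos_col=1):
    """Yield (contig, position) from a whitespace-delimited text file.

    Column indices are assumptions to adjust per format. Lines starting
    with '#' or '@' and lines without a numeric position are skipped.
    """
    with open(path) as fh:
        for line in fh:
            if line.startswith(("#", "@")):
                continue
            fields = line.split()
            if len(fields) <= max(contig_col, pos_col):
                continue
            if not fields[pos_col].isdigit():
                continue  # e.g. a column-name header row
            yield fields[contig_col], int(fields[pos_col])
```

For example, `first_out_of_order(read_positions("input.txt"))` (placeholder file name) returns `None` when every contig's records are in ascending position order.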

• Posts: 5Member

Geraldine,

I've checked out "hapmap_3.3.b37.vcf" in the bundle.

Is it possible to extract only the variants for NA12891 by matching their rsIDs with those in the original hapmap raw file?

thanks.

• Posts: 5Member

Geraldine

please ignore the previous post of mine,

I want to make a vcf file only for variants from NA12891. That's why I don't use the hapmap vcf file in the bundle.

thanks.
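
Restricting a multi-sample VCF to a single sample such as NA12891 could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `subset_sample` is a hypothetical function, the input must actually contain a genotype column named after the sample, and GT is assumed to be the first FORMAT key. INFO annotations such as allele counts are copied unchanged and not recalculated. GATK's own SelectVariants tool is intended for this kind of subsetting and is likely the safer route.

```python
def subset_sample(lines, sample):
    """Yield VCF lines restricted to one sample, keeping only sites where
    that sample carries at least one non-reference allele.

    Hypothetical helper; assumes tab-separated lines with GT first in FORMAT.
    """
    col = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            header = line.split("\t")
            col = header.index(sample)  # raises ValueError if the sample is absent
            yield "\t".join(header[:9] + [sample])
        elif line:
            fields = line.split("\t")
            genotype = fields[col].split(":")[0]
            alleles = genotype.replace("|", "/").split("/")
            # Skip hom-ref ("0") and missing (".") calls for this sample.
            if any(a not in ("0", ".") for a in alleles):
                yield "\t".join(fields[:9] + [fields[col]])
```

Passing an open file handle as `lines` and writing each yielded line followed by a newline would produce a single-sample VCF.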

• Posts: 12Member

For those looking for a toolkit to manipulate VCFs (e.g. to rename chromosomes, sort, etc.), have a look at jvarkit (https://github.com/lindenb/jvarkit). It's excellent.

Cheers,