# Empty file with VariantsToVCF

Posts: 33Member
edited April 2013

Hi,

I am trying to convert a dbSNP text file into a VCF file with the following command:
java -jar GenomeAnalysisTK.jar -R canFam2_dog_genome.fa -T VariantsToVCF --variant:OLDDBSNP snp131.txt -o canFam2.vcf

Below is the message:

INFO 13:27:28,085 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:27:28,088 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-9-g532efad, Compiled 2013/03/19 07:35:36
INFO 13:27:28,088 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 13:27:28,095 HelpFormatter - Program Args: -R canFam2_dog_genome.fa -T VariantsToVCF --variant:OLDDBSNP snp131.txt -o canFam2.vcf
INFO 13:27:28,095 HelpFormatter - Date/Time: 2013/04/25 13:27:28
INFO 13:27:28,095 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:27:28,095 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:27:29,123 GenomeAnalysisEngine - Strictness is SILENT
INFO 13:27:29,292 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 13:27:29,315 RMDTrackBuilder - Creating Tribble index in memory for file snp131.txt
INFO 13:28:05,113 IndexDictionaryUtils - Track /csc/lohi/dog_tools/GenomeAnalysisTK-2.4-9-g532efad/../../canFam3_ref_dogData/snp131.txt doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 13:28:05,125 RMDTrackBuilder - Writing Tribble index to disk for file GenomeAnalysisTK-2.4-9-g532efadsnp131.txt.idx
INFO 13:28:05,348 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files
INFO 13:28:05,365 GenomeAnalysisEngine - Done creating shard strategy
INFO 13:28:05,365 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 13:28:05,367 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 13:28:05,875 ProgressMeter - done 0.00e+00 0.0 s 5.9 d 100.0% 0.0 s 0.0 s
INFO 13:28:05,876 ProgressMeter - Total runtime 0.51 secs, 0.01 min, 0.00 hours
INFO 13:28:07,459 GATKRunReport - Uploaded run statistics report to AWS S3

Everything looks normal and no error message is shown. Could anyone help me fix this?

Thanks

I'm sorry, what is the problem exactly? Is the program not generating the output file?

Geraldine Van der Auwera, PhD

• Posts: 33Member

Yes, the output is an empty file.

Oh, just realized the problem is in the title. Then it seems that the program is not recognizing the data in your input file. Are you sure it's properly formatted? Where did that file originate from?

Geraldine Van der Auwera, PhD

• Posts: 33Member

The file is a text file and it is properly formatted.

I see. Can you try again with -l DEBUG? I'm not sure the program is actually loading in the file.

Geraldine Van der Auwera, PhD

• Posts: 33Member

I tried with the debug option and it gave a very long standard output message. How do I send it to you if required?

You can either copy/paste it into a comment, or attach it as a text file using the "attach a file" function below the comment box.

Geraldine Van der Auwera, PhD

• Posts: 33Member

• Posts: 678GATK Developer mod

The problem is that you are telling the GATK to parse your file assuming it's formatted in the OLDDBSNP format, but your file is just a list of rsIDs and positions (which is not at all a format that dbSNP uses). Looks like you'll unfortunately need to write your own converter from this file to VCF.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

Just to make sure we're talking about the same file, @mehar, does yours start with the line

607 chr1    3004580 3004581 rs8419377   0   +   A   A   -/A genomic deletion    unknown 0   0   unknown exact   3       1   TIGR,   0               unknown


or with something else?

If it is something different, then my colleague @ebanks is correct. If it is the same, then the file is OK and the tool is not working properly.
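For anyone wanting to make that comparison without eyeballing the file, here is a minimal Python sketch. It assumes the real file is tab-separated (the line quoted above shows spaces, which may just be how the forum rendered it) and that the columns are laid out as in that line: a numeric bin, chromosome, start, end, rsID, score, strand, and so on. The function name `looks_like_ucsc_snp_row` is made up for illustration and is not part of GATK.

```python
def looks_like_ucsc_snp_row(line):
    """Rough check that a line resembles the snp131-style row quoted above.

    Assumption: fields are tab-separated and ordered as
    bin, chrom, start, end, rsID, score, strand, ... (at least 12 columns).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return False
    bin_id, start, end, name, strand = (
        fields[0], fields[2], fields[3], fields[4], fields[6]
    )
    return (
        bin_id.isdigit()
        and start.isdigit()
        and end.isdigit()
        and name.startswith("rs")
        and strand in ("+", "-")
    )
```

Calling it on the first line of the file, for example `looks_like_ucsc_snp_row(open("snp131.txt").readline())`, gives a quick yes/no. A file that is only a list of rsIDs and positions would fail the column-count test.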

Unfortunately, in the latter case we will not be able to help you, because we cannot devote resources to fixing this old, rarely used tool. We may have to retire it entirely. The best thing to do would be to ask the people who provide these legacy formats to update their data to the current standard, which is VCF. We cannot keep supporting legacy formats indefinitely; it puts too much of a support burden on us and takes time away from developing new tools.

Geraldine Van der Auwera, PhD

• Posts: 33Member

Okay. My file starts with the above line, so I will have to find some other way to get through.
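
One possible "other way", following the earlier suggestion to write a custom converter, is sketched below in Python. This is not a GATK tool and has not been checked against canFam2 data. It assumes tab-separated rows in the column order of the line quoted above, with a 0-based start coordinate, the reference base in the ninth column, the observed alleles in the tenth (given relative to the strand column), and the variant class in the twelfth. It only handles single-base substitutions; insertions and deletions such as the quoted `-/A` row are skipped, because a VCF indel record needs the preceding reference base, which would have to be looked up in the FASTA. The names `row_to_vcf` and `convert` are hypothetical.

```python
VCF_HEADER = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
BASES = {"A", "C", "G", "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def row_to_vcf(line):
    """Turn one row into a VCF data line, or return None if the row is skipped."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None
    chrom, rsid, strand = fields[1], fields[4], fields[6]
    start, end = int(fields[2]), int(fields[3])
    ref, observed, snp_class = fields[8].upper(), fields[9].upper(), fields[11]
    # Only simple one-base substitutions are handled in this sketch.
    if snp_class != "single" or end - start != 1 or ref not in BASES:
        return None
    alleles = observed.split("/")
    # Assumption: observed alleles are reported on the row's strand,
    # so minus-strand rows are complemented to match the reference.
    if strand == "-":
        alleles = [a.translate(COMPLEMENT) for a in alleles]
    alts = [a for a in alleles if a in BASES and a != ref]
    if not alts:
        return None
    pos = start + 1  # 0-based start -> 1-based VCF position
    return "\t".join([chrom, str(pos), rsid, ref, ",".join(alts), ".", ".", "."])


def convert(src, dst):
    """Write a minimal VCF to dst from the rows in src; return the record count."""
    dst.write(VCF_HEADER)
    written = 0
    for line in src:
        record = row_to_vcf(line)
        if record is not None:
            dst.write(record + "\n")
            written += 1
    return written
```

It could be run as `convert(open("snp131.txt"), open("canFam2.vcf", "w"))`. The output would probably still need to be sorted in reference order, and the REF bases checked against the FASTA, before GATK will accept it.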