Empty file with VariantsToVCF

mehar Posts: 81 Member
edited April 2013 in Ask the GATK team

Hi,

I am trying to convert a dbSNP text file into a vcf file with the following command:
java -jar GenomeAnalysisTK.jar -R canFam2_dog_genome.fa -T VariantsToVCF --variant:OLDDBSNP snp131.txt -o canFam2.vcf

Below is the message:

INFO 13:27:28,085 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:27:28,088 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-9-g532efad, Compiled 2013/03/19 07:35:36
INFO 13:27:28,088 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 13:27:28,088 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 13:27:28,095 HelpFormatter - Program Args: -R canFam2_dog_genome.fa -T VariantsToVCF --variant:OLDDBSNP snp131.txt -o canFam2.vcf
INFO 13:27:28,095 HelpFormatter - Date/Time: 2013/04/25 13:27:28
INFO 13:27:28,095 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:27:28,095 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:27:29,123 GenomeAnalysisEngine - Strictness is SILENT
INFO 13:27:29,292 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 13:27:29,315 RMDTrackBuilder - Creating Tribble index in memory for file snp131.txt
INFO 13:28:05,113 IndexDictionaryUtils - Track /csc/lohi/dog_tools/GenomeAnalysisTK-2.4-9-g532efad/../../canFam3_ref_dogData/snp131.txt doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 13:28:05,125 RMDTrackBuilder - Writing Tribble index to disk for file GenomeAnalysisTK-2.4-9-g532efadsnp131.txt.idx
INFO 13:28:05,348 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files
INFO 13:28:05,365 GenomeAnalysisEngine - Done creating shard strategy
INFO 13:28:05,365 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 13:28:05,367 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 13:28:05,875 ProgressMeter - done 0.00e+00 0.0 s 5.9 d 100.0% 0.0 s 0.0 s
INFO 13:28:05,876 ProgressMeter - Total runtime 0.51 secs, 0.01 min, 0.00 hours
INFO 13:28:07,459 GATKRunReport - Uploaded run statistics report to AWS S3

Everything looks normal and no error message is shown. Could anyone help to fix this?

Thanks

Answers

  • Geraldine_VdAuwera Posts: 8,161 Administrator, GATK Dev

    I'm sorry, what is the problem exactly? Is the program not generating the output file?

    Geraldine Van der Auwera, PhD

  • mehar Posts: 81 Member

    Yes, the output is an empty file.

  • Geraldine_VdAuwera Posts: 8,161 Administrator, GATK Dev

    Oh, just realized the problem is in the title. Then it seems that the program is not recognizing the data in your input file. Are you sure it's properly formatted? Where did that file originate from?

    Geraldine Van der Auwera, PhD
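One quick way to see whether a text file is consistently formatted is to count the tab-separated fields on each line. The following is only a sketch: it assumes the input is tab-delimited, and the function name `field_count_summary` and the 1000-line limit are hypothetical choices for illustration, not part of the GATK.

```python
from collections import Counter
from itertools import islice


def field_count_summary(lines, limit=1000):
    """Count how many lines have each number of tab-separated fields.

    Only the first `limit` lines are inspected; blank lines and lines
    starting with '#' (treated here as header/comment lines) are ignored.
    """
    counts = Counter()
    for line in islice(lines, limit):
        if not line.strip() or line.startswith("#"):
            continue
        counts[len(line.rstrip("\n").split("\t"))] += 1
    return dict(counts)
```

Calling it as `field_count_summary(open("snp131.txt"))` returns a dictionary mapping field count to number of lines. More than one key in the result suggests that the rows do not all share the same layout.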

  • mehar Posts: 81 Member

    The file is a text file and it is properly formatted.

  • Geraldine_VdAuwera Posts: 8,161 Administrator, GATK Dev

    I see. Can you try again with -l DEBUG? I'm not sure the program is actually loading in the file.

    Geraldine Van der Auwera, PhD

  • mehar Posts: 81 Member

    I tried with the debug option and it produced a very long standard output message. How do I send it to you if required?

  • Geraldine_VdAuwera Posts: 8,161 Administrator, GATK Dev

    You can either copy/paste it into a comment, or attach it as a text file using the "attach a file" function below the comment box.

    Geraldine Van der Auwera, PhD

  • mehar Posts: 81 Member

    Please find the attachment.

    gatk.txt (259K)

  • ebanks Broad Institute Posts: 689 Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member

    The problem is that you are telling the GATK to parse your file assuming it's formatted in the OLDDBSNP format, but your file is just a list of rsIDs and positions (which is not at all a format that dbSNP uses). Looks like you'll unfortunately need to write your own converter from this file to VCF.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
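As a starting point for such a converter, here is a minimal sketch in Python. It rests on several assumptions that are not confirmed in this thread: that the input is tab-separated with columns in the order of the UCSC `snp` table (bin, chrom, chromStart, chromEnd, name, score, strand, refNCBI, refUCSC, observed, molType, class, ...), that chromStart is 0-based, and that the observed alleles are reported on the strand given in the strand column. The function names are hypothetical. Only rows whose class is `single` are converted; insertions and deletions are skipped, because VCF represents them with an anchoring reference base that would have to be looked up in the reference genome.

```python
# Complement table used to flip minus-strand alleles onto the plus strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snp_row_to_vcf(line):
    """Return a VCF data line for a single-base SNP row, or None to skip it.

    Assumes the UCSC snp-table column order (an assumption, see above).
    """
    if line.startswith("#"):
        return None  # header/comment line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None  # not enough columns to interpret
    chrom, start, name, strand = fields[1], fields[2], fields[4], fields[6]
    ref, observed, var_class = fields[8].upper(), fields[9].upper(), fields[11]
    if var_class != "single" or ref not in COMPLEMENT or not start.isdigit():
        return None  # indels and anything unusual are not handled here
    alleles = observed.split("/")
    if any(a not in COMPLEMENT for a in alleles):
        return None
    if strand == "-":
        alleles = [COMPLEMENT[a] for a in alleles]
    alts = [a for a in alleles if a != ref]
    if not alts:
        return None  # no alternate allele differs from the reference
    pos = int(start) + 1  # 0-based start -> 1-based VCF position
    return "\t".join([chrom, str(pos), name, ref, ",".join(alts), ".", ".", "."])


def convert(lines, out):
    """Write a minimal sites-only VCF for every convertible row in `lines`."""
    out.write("##fileformat=VCFv4.1\n")
    out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for line in lines:
        record = snp_row_to_vcf(line)
        if record is not None:
            out.write(record + "\n")
```

It could be run as `convert(open("snp131.txt"), open("canFam2.vcf", "w"))`. The output keeps the input order and carries no contig header lines, so it would probably still need sorting to match the reference before other tools accept it.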

  • Geraldine_VdAuwera Posts: 8,161 Administrator, GATK Dev

    Just to make sure we're talking about the same file, @mehar, does yours start with the line

    607 chr1    3004580 3004581 rs8419377   0   +   A   A   -/A genomic deletion    unknown 0   0   unknown exact   3       1   TIGR,   0               unknown
    

    or with something else?

    If it is something different, then my colleague @ebanks is correct. If it is the same, then the file is OK and the tool is not working properly.

    Unfortunately, in the latter case we are not going to be able to help you, because we cannot devote resources to fixing this old tool, which is rarely used. We may have to retire the tool entirely. The best thing to do would be to tell the people who are providing these legacy formats that they should update their data to the current standard, which is VCF. We cannot continue supporting those old legacy formats indefinitely; it puts too much support burden on us and takes time away from development of new tools.

    Geraldine Van der Auwera, PhD

  • mehar Posts: 81 Member

    Okay. My file starts with the above line, so I have to find some other way to get through.

  • Geraldine_VdAuwera Posts: 8,161 Administrator, GATK Dev

    Good luck, sorry we couldn't be of more help.

    Geraldine Van der Auwera, PhD
