CombineVariants REQUIRE_UNIQUE broke in GATK 2.3?

Hi team,

Things that worked for me in the past (GATK 2.2) now cause errors in v2.3. Specifically, using CombineVariants to merge two VCF files (each containing a single sample, and the two samples are distinct) now produces the error "ERROR MESSAGE: REQUIRE_UNIQUE sample names is true but duplicate names were discovered". Looking through the input files by hand, I see no reason why CombineVariants should think there are duplicate sample names; there simply aren't any.
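For anyone who wants to repeat the by-hand check programmatically, here is a minimal sketch in Python. It only reads the `#CHROM` header line of each VCF and reports sample names that occur more than once across the inputs. The function names (`sample_names`, `duplicate_samples`) are made up for illustration, and this is not how CombineVariants itself performs the check.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def sample_names(path):
    """Return the sample names listed on the #CHROM header line."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed (CHROM..FORMAT); samples follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a #CHROM line
    return []


def duplicate_samples(paths):
    """Return the sorted sample names that occur more than once across all files."""
    counts = Counter()
    for path in paths:
        counts.update(sample_names(path))
    return sorted(name for name, n in counts.items() if n > 1)
```

Calling `duplicate_samples([...])` with the two `--variant` paths should return an empty list if the sample names really are distinct. Printing `repr()` of each name can also reveal stray whitespace that is easy to miss by eye.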

The merge worked fine for me in 2.2-16, and seems to have broken in 2.3-0 as well as the latest 2.3-9. Here is an example that works fine:

java -Xmx2g -Djava.io.tmpdir=/broad/hptmp/dpark -jar /humgen/gsa-hpprojects/GATK/bin/GenomeAnalysisTK-2.2-16-g9f648cb/GenomeAnalysisTK.jar -T CombineVariants -R /idi/sabeti-scratch/genomes/malaria/plasmodb/9.0/genome.fasta --variant /seq/plasmodium/dpark/pf-broad/vcf-indiv/9.0/MLW_2786.vcf.gz --variant /seq/plasmodium/dpark/pf-broad/vcf-indiv/9.0/MLW_2788.vcf.gz -o /broad/hptmp/dpark/fail-2.2-16.vcf --genotypemergeoption REQUIRE_UNIQUE

Here is an example that fails quickly with the above error (it's the same input files and arguments, just a different jar file):

java -Xmx2g -Djava.io.tmpdir=/broad/hptmp/dpark -jar /humgen/gsa-hpprojects/GATK/bin/GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T CombineVariants -R /idi/sabeti-scratch/genomes/malaria/plasmodb/9.0/genome.fasta --variant /seq/plasmodium/dpark/pf-broad/vcf-indiv/9.0/MLW_2786.vcf.gz --variant /seq/plasmodium/dpark/pf-broad/vcf-indiv/9.0/MLW_2788.vcf.gz -o /broad/hptmp/dpark/fail-2.3-9.vcf --genotypemergeoption REQUIRE_UNIQUE

This seems like a bug to me, but let me know if you think otherwise.

Danny

Answers

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi Danny, we did make some changes to the genotype merging code, so this could be a side-effect bug. However, the developer who made the changes said he can't replicate your bug with his own test files. Could you please upload snippets that would allow him to reproduce and track down the bug?

    Instructions here if needed: http://www.broadinstitute.org/gatk/guide/article?id=1894
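One way to cut a small snippet from a VCF is sketched below in Python. It is not taken from the linked article, which may prescribe a different procedure. It assumes that the full header plus the first few records is enough to reproduce the problem, and the function name `write_vcf_snippet` and its `max_records` parameter are hypothetical.

```python
import gzip


def write_vcf_snippet(src, dst, max_records=100):
    """Copy all header lines and at most max_records records from src to dst.

    src may be plain text or gzip/bgzip-compressed; dst is written as plain text.
    Returns the number of records written.
    """
    opener = gzip.open if str(src).endswith(".gz") else open
    written = 0
    with opener(src, "rt") as fin, open(dst, "wt") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)  # keep the header intact, including #CHROM
            elif written < max_records:
                fout.write(line)
                written += 1
            else:
                break  # enough records copied
    return written
```

Since the error in this thread is raised while the tool initializes, a snippet with very few records may already be sufficient, but that is a guess rather than something confirmed here.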

  • dpark Member

    Hi Geraldine,
    Your developer should be able to access all the files in the example command lines I gave above. In fact, you should be able to just run the example command lines verbatim. If not, let me know and I can put them somewhere else.
    Danny

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Oh right, I didn't realize you were at Broad. Now that I look at your command lines it is pretty obvious, of course. I'll blame the late hour...

    Anyway, I'll check back with him in the morning and we'll try to sort this out for you asap.

  • Tiphaine Member

    Do you have a solution for this issue?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    I believe this has been fixed as of 2.4.

  • Tiphaine Member

    Thanks, we are going to try that.

  • emoss Member

    I'm still getting this bug in 2.4-9. Using 2.2-9 seems to avoid the problem.

  • graef Member

    Is there any news on this issue?

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalStateException: REQUIRE_UNIQUE sample names is true but duplicate names were discovered.
    at org.broadinstitute.sting.gatk.walkers.variantutils.CombineVariants.initialize(CombineVariants.java:215)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:84)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.4-9-g532efad):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: REQUIRE_UNIQUE sample names is true but duplicate names were discovered.
    ERROR ------------------------------------------------------------------------------------------
    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: REQUIRE_UNIQUE sample names is true but duplicate names were discovered B940001
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi @graef, could you please upload snippet files that reproduce the error? Detailed instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894
