
CombineVariants doesn't work with GATK version 2.6.4 but works with GATK version 2.2.9

Hi,

I don't understand why, but when I run CombineVariants with GATK version 2.3.9 or 2.6.4 I get an error message, which I don't get when I run it with GATK 2.2.9.
Before running CombineVariants on all the sub-VCF files, I split my VCF by region and run ReadBackedPhasing on each sub-VCF file, so every sub-VCF file has the same number of samples, with the same names.

This is my error message:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalStateException: REQUIRE_UNIQUE sample names is true but duplicate names were discovered.
at org.broadinstitute.sting.gatk.walkers.variantutils.CombineVariants.initialize(CombineVariants.java:215)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.6-4-g3e5ff60):
ERROR
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: REQUIRE_UNIQUE sample names is true but duplicate names were discovered.
ERROR ------------------------------------------------------------------------------------------

Could you help me?

Thanks,

Tiphaine

Answers

  • Nick_Beeson (Member)

    You have the CombineVariants argument --genotypemergeoption set to REQUIRE_UNIQUE, which produces the result you posted. Here is the documentation.

    --genotypemergeoption / -genotypeMergeOptions ( GenotypeMergeType )

    Determines how we should merge genotype records for samples shared across the ROD files.
    The --genotypemergeoption argument is an enumerated type (GenotypeMergeType), which can have one of the following values:

    UNIQUIFY
    Make all sample genotypes unique by file. Each sample shared across RODs gets named sample.ROD.
    PRIORITIZE
    Take genotypes in priority order (see the priority argument).
    UNSORTED
    Take the genotypes in any order.
    REQUIRE_UNIQUE
    Require that all samples/genotypes be unique between all inputs.
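The four values above differ mainly in what happens to a sample name that occurs in more than one input. As a rough illustration, here is a toy Python model of the documented sample-column naming only. It is not GATK's implementation and it does not model which genotype is kept under PRIORITIZE or UNSORTED; the function name and the ROD names "variant" and "variant2" are hypothetical.

```python
from collections import Counter

MERGE_OPTIONS = {"UNIQUIFY", "PRIORITIZE", "UNSORTED", "REQUIRE_UNIQUE"}


def combined_sample_columns(inputs, merge_option):
    """Toy model (hypothetical helper) of the sample columns in a combined VCF.

    inputs: list of (rod_name, sample_names) pairs, one per input file.
    Only column naming is modeled; genotype selection is not.
    """
    if merge_option not in MERGE_OPTIONS:
        raise ValueError(f"unknown merge option: {merge_option}")

    # Number of input files in which each sample name occurs.
    counts = Counter(s for _, samples in inputs for s in set(samples))
    shared = {s for s, n in counts.items() if n > 1}

    if merge_option == "REQUIRE_UNIQUE" and shared:
        # Same message as the GATK error; the offending names are appended here.
        raise ValueError(
            "REQUIRE_UNIQUE sample names is true but duplicate names were "
            "discovered: " + ", ".join(sorted(shared))
        )

    columns, seen = [], set()
    for rod, samples in inputs:
        for s in samples:
            # UNIQUIFY: a shared sample becomes sample.ROD, one column per file.
            if merge_option == "UNIQUIFY" and s in shared:
                name = f"{s}.{rod}"
            else:
                # PRIORITIZE / UNSORTED: a shared name collapses into one column.
                name = s
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


if __name__ == "__main__":
    # Two hypothetical inputs carrying the same samples.
    files = [("variant", ["AHAO8001", "CS934001"]),
             ("variant2", ["AHAO8001", "CS934001"])]
    print(combined_sample_columns(files, "UNIQUIFY"))
    print(combined_sample_columns(files, "UNSORTED"))
```

In this model, inputs with identical sample lists make REQUIRE_UNIQUE raise, UNIQUIFY produce one sample.ROD column per file, and UNSORTED or PRIORITIZE keep a single column per sample.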

  • Tiphaine (Member)

    So in this case, should I use UNIQUIFY instead of REQUIRE_UNIQUE?
    I am just a little surprised that the same command line doesn't work in two different versions.
    I am going to try this option.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @Tiphaine,

    The merging function was a little buggy in versions older than 2.3, so when we fixed it in 2.3 the behavior changed to work properly. That's why you don't get exactly the same results between versions.

    When you run into problems like that, where you're getting different results (or an error message) between two versions, you should look at the version changes, which are listed in the Guide here: http://www.broadinstitute.org/gatk/guide/version-history

  • Tiphaine (Member)
    edited July 2013

    Hi Geraldine,
    I read the Guide but couldn't find this information.

    So I have one VCF file per chromosome and I would like to combine them into a single VCF.
    If I use the option -genotypeMergeOptions UNIQUIFY (GATK 2.6.4), each sample name is repeated.

    For example, every VCF file contains the genotype of sample AHAO8001, and in the merged VCF file that sample now appears multiple times with a .variantXX suffix, like this: AHAO8001.variant AHAO8001.variant10 AHAO8001.variant11 AHAO8001.variant12 ... I don't want that; I would like only one column per sample.

    But if I use the option -genotypeMergeOptions REQUIRE_UNIQUE (GATK 2.6.4), I get the same error message as in my first post.

    But it works with GATK 2.2.9.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Are you saying that you have the same sample name in multiple files? That is what the error says is happening, and it is not allowed.

    Versions before 2.3 did not apply the priority rules correctly, so it may look like it is working even though it is not doing the right thing.

  • Tiphaine (Member)
    edited July 2013

    When I run zcat readphasedvariants.*gz | grep "^#CHROM" | uniq, I get only one line:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AHAO8001 CMSB32011 CS934001 KH948045 LSB32012 M88BBTPL MHAO8003 MKB32014 MN932002 PNB32015 RMAO8004 SBB32010

    This same line is present in every file.
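The uniq check above shows that every per-chromosome file carries the same #CHROM line, so every sample name occurs in every input. That is exactly the situation REQUIRE_UNIQUE rejects ("Require that all samples/genotypes be unique between all inputs"). Note that uniq only collapses adjacent identical lines, so a single output line here means all the headers are identical. The Python sketch below lists which sample names appear in more than one input file. The function names are hypothetical helpers, not part of GATK, and the sketch assumes tab-delimited #CHROM lines as in the VCF specification.

```python
import glob
import gzip
from collections import defaultdict


def samples_from_header(lines):
    """Return the sample names on the #CHROM line of a VCF header.

    The first nine tab-separated columns (CHROM ... FORMAT) are fixed;
    everything after them is a sample name.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data records without seeing #CHROM
    raise ValueError("no #CHROM header line found")


def read_vcf_samples(path):
    """Read sample names from a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return samples_from_header(fh)


def shared_samples(samples_by_file):
    """Map each sample name found in more than one file to those files."""
    files_per_sample = defaultdict(list)
    for path, samples in samples_by_file.items():
        for s in dict.fromkeys(samples):  # ignore repeats within one header
            files_per_sample[s].append(path)
    return {s: paths for s, paths in files_per_sample.items() if len(paths) > 1}


if __name__ == "__main__":
    # File pattern taken from the zcat command above.
    paths = sorted(glob.glob("readphasedvariants.*gz"))
    dup = shared_samples({p: read_vcf_samples(p) for p in paths})
    for sample, files in sorted(dup.items()):
        print(f"{sample}: present in {len(files)} files")
```

If every sample is reported as present in every file, the inputs share all their samples, which is the condition the error message describes.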
