CombineVariants in PRIORITIZE mode without -priority

TechnicalVault (Sanger, Cambridge, UK) | Posts: 68 | Member
edited November 2012 in Ask the GATK team

Hi,

I'm reverse engineering a colleague's script and I've noticed they're using CombineVariants in PRIORITIZE mode but without a -priority argument. I've looked at the documentation and I can't see what the defined behaviour would be in this situation. Would the default priority follow the order of the arguments supplied, the reverse order, or a random order?

Thanks, Martin

Edit: Never mind. From what I can see in the source, it should be erroring out if -priority is not supplied. I must have missed something in the pipeline script.

Edit 2: No, wait. This check

    if ( genotypeMergeOption == VariantContextUtils.GenotypeMergeType.PRIORITIZE && PRIORITY_STRING == null )
        throw new UserException.MissingArgument("rod_priority_list", "Priority string must be provided if you want to prioritize genotypes");

is pointless, because this code runs first in initialize:

    if ( PRIORITY_STRING == null ) {
        PRIORITY_STRING = Utils.join(",", vcfRods.keySet());
        logger.info("Priority string not provided, using arbitrary genotyping order: " + PRIORITY_STRING);
    }

This should follow the input order, yes? Unless vcfRods.keySet() is sorted?
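
To make the ordering problem concrete, here is a minimal, self-contained sketch of the two checks in the order described above. It is not GATK code: the class name `PriorityCheckDemo`, the `MergeType` enum, and the use of `IllegalArgumentException` are hypothetical stand-ins, and only the order of the two `if` blocks mirrors the quoted snippets.

```java
import java.util.Arrays;
import java.util.List;

public class PriorityCheckDemo {
    // Hypothetical stand-in for VariantContextUtils.GenotypeMergeType.
    public enum MergeType { UNSORTED, PRIORITIZE }

    // Mirrors the order described in the post: apply the default first, then validate.
    public static String initialize(MergeType mergeOption, String priorityString, List<String> rodNames) {
        if (priorityString == null) {
            // The default is filled in before the validation below runs.
            priorityString = String.join(",", rodNames);
        }
        if (mergeOption == MergeType.PRIORITIZE && priorityString == null) {
            // Never reached: priorityString cannot be null at this point.
            throw new IllegalArgumentException(
                "Priority string must be provided if you want to prioritize genotypes");
        }
        return priorityString;
    }

    public static void main(String[] args) {
        // PRIORITIZE with no priority string: no exception, the default is used instead.
        System.out.println(initialize(MergeType.PRIORITIZE, null, Arrays.asList("a", "b"))); // prints a,b
    }
}
```

Because the default is assigned first, the second condition can never be true, which is why the error is never seen.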

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

Best Answers

  • Geraldine_VdAuwera | Posts: 5,984 | admin
    Answer ✓

    I see, thanks for the clarification -- I had indeed misunderstood your post. Sure, getting someone else to fix things for us is always good :)

    I'll ask the appropriate developer to get in touch with you to determine which way to go.

    Geraldine Van der Auwera, PhD

  • ami | Posts: 35 | mod
    Answer ✓

    Hi Martin,

    I fixed this issue and the fix will be part of the new version (2.3), probably next week. If you try to use PRIORITIZE mode and -priority is not specified, GATK now emits the proper error message. (I also changed some of the related code to make sure that we sort by priority only when it is necessary, and that we take the priority list into account when it is provided, even if the mode is not PRIORITIZE.)
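
    The behaviour described above could look roughly like the following sketch. This is not the actual GATK 2.3 change: the class `PriorityFixSketch`, the `MergeType` enum, and the exception type are hypothetical stand-ins, and the fallback to the given rod order when no list is supplied is an assumption. The point is only that validation runs before any default is applied.

    ```java
    import java.util.Arrays;
    import java.util.List;

    public class PriorityFixSketch {
        // Hypothetical stand-in for VariantContextUtils.GenotypeMergeType.
        public enum MergeType { UNSORTED, PRIORITIZE }

        public static String resolvePriority(MergeType mergeOption, String priorityString, List<String> rodNames) {
            // Validate first, so PRIORITIZE without an explicit priority list is rejected.
            if (mergeOption == MergeType.PRIORITIZE && priorityString == null) {
                throw new IllegalArgumentException(
                    "Priority string must be provided if you want to prioritize genotypes");
            }
            // An explicit list is honoured in any mode; otherwise fall back to the given order (assumption).
            return priorityString != null ? priorityString : String.join(",", rodNames);
        }

        public static void main(String[] args) {
            List<String> rods = Arrays.asList("a", "b");
            System.out.println(resolvePriority(MergeType.UNSORTED, null, rods));  // prints a,b
            System.out.println(resolvePriority(MergeType.UNSORTED, "b,a", rods)); // prints b,a
            try {
                resolvePriority(MergeType.PRIORITIZE, null, rods);
            } catch (IllegalArgumentException e) {
                System.out.println("error: " + e.getMessage());
            }
        }
    }
    ```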

    Thanks for pointing out this problem. Please let us know if you think the problem is still not solved in the upcoming version (GATK 2.3).

Answers

  • TechnicalVault (Sanger, Cambridge, UK) | Posts: 68 | Member
    edited November 2012

    Hmm, if I read the source code in your vcfutils class right, then vcfRods is a HashMap and order is not guaranteed... From the Java docs: "This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." Thus the order depends on what you named your input rods and may change depending on your Java implementation? Am I correct in my reading of this?
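
    As a small illustration of the HashMap point, assuming nothing about GATK beyond what is quoted above, the sketch below inserts the same hypothetical rod names into a `HashMap` and a `LinkedHashMap`. Only the `LinkedHashMap` is documented to iterate in insertion order; the `HashMap` order depends on the keys' hash codes and on the implementation, so no particular output is claimed for it.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class RodOrderDemo {
        // Inserts the names in the given order and returns the order in which the map iterates its keys.
        public static List<String> keyOrder(Map<String, Integer> map, String... names) {
            for (int i = 0; i < names.length; i++) {
                map.put(names[i], i);
            }
            return new ArrayList<>(map.keySet());
        }

        public static void main(String[] args) {
            String[] rods = {"variant3", "variant", "variant2"}; // hypothetical rod names
            // HashMap: iteration order is unspecified and may differ from the insertion order.
            System.out.println("HashMap:       " + keyOrder(new HashMap<>(), rods));
            // LinkedHashMap: iteration order is the insertion order.
            System.out.println("LinkedHashMap: " + keyOrder(new LinkedHashMap<>(), rods));
        }
    }
    ```

    If input order needs to be preserved, an insertion-ordered map such as `LinkedHashMap` is one common choice.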

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Geraldine_VdAuwera | Posts: 5,984 | Administrator, GATK Developer

    Hi Martin, unfortunately we don't have the resources right now to provide support for code interpretation and development, sorry!

    Geraldine Van der Auwera, PhD

  • TechnicalVault (Sanger, Cambridge, UK) | Posts: 68 | Member
    edited November 2012

    Hi Geraldine, I think you've misunderstood. When I first asked the question, I was asking what would happen, as it was potentially undefined and undocumented behaviour in GATK.

    Then I realised (thus the edits) that this is a bug in GATK. If PRIORITIZE mode is set and -priority is not specified, GATK should emit the error "Priority string must be provided if you want to prioritize genotypes"; it fails to do so because the arbitrary genotyping order code kicks in first. So the answer should be either:

    - Existing behaviour will continue and the MissingArgument error code will be deleted.

    - Behaviour will be corrected and the arbitrary genotyping order code will be deleted.

    Of course, if you don't have the manpower to fix it, just say which way you want it fixed and I can supply the appropriate patch as a pull request. :)

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • TechnicalVault (Sanger, Cambridge, UK) | Posts: 68 | Member

    Thank you

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute
