How to set the HaplotypeCaller ploidy argument, especially for pooled samples?

If the species is not diploid, do I need to set the ploidy argument myself?
What does the following mean?
For pooled data, set to (Number of samples in each pool * Sample Ploidy).

In GVCF mode we call one sample at a time, so when do we need to set it to (Number of samples in each pool * Sample Ploidy)?


  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @manba I am not sure which tool you are referring to. Please try to be more specific on the GATK forum so we can get you an answer in a timely manner. A good rule of thumb is to include the version of GATK, the specific tool, an example of the command that was run, a record of the errors or a screenshot of the error log, and any outputs pertaining to your question.

    In this case, I am interpreting your question as being about the GenotypeGVCFs tool. According to its documentation:

    > Special note on ploidy: This tool is able to handle any ploidy (or mix of ploidies) intelligently; there is no need to specify ploidy for non-diploid organisms.

    If this is not the tool you were using, please read the first paragraph and include all of the relevant information in any follow-up question.

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Oops, I misread the top header! @manba if you are referring to HaplotypeCaller in GVCF mode (-ERC GVCF), then for pooled data you set the ploidy to (Number of samples in each pool * Sample Ploidy).

    Could you please provide more information about what you are trying to do? What analysis are you running and what are your samples?

  • manba Member ✭✭
    edited January 5

    Thanks @AdelaideR.
    To be clearer, I will describe my questions step by step.

    Q1: Why do GenotypeGVCFs and HaplotypeCaller both have the -ploidy argument?

    Q2: GATK recommends joint calling: HaplotypeCaller produces a GVCF for each sample, and GenotypeGVCFs performs the multi-sample joint aggregation step and merges the records together in a sophisticated manner: at each position of the input gVCFs, this tool will combine all spanning records, produce correct genotype likelihoods, re-genotype the newly merged record, and then re-annotate it.
    When merging GVCFs, would you ever merge samples from different species (for example, 5 samples from a non-diploid animal and 6 samples from human)? If not, why do we need to use Number of samples in each pool * Sample Ploidy?

    Q3: If I analyze a species that is not human and not diploid, how should I set the value of the -ploidy argument in both GenotypeGVCFs and HaplotypeCaller?

    Q4: In human germline variant calling, do you think joint calling can be applied? Joint calling can rescue sites that are uncertain in some samples.

    Q5: For 12 pooled WGS samples, what does "pooled" mean?
    Thanks a lot.

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @manba The answers to these questions can be found in the documentation. If you have a specific problem or a specific task that you are trying to accomplish, it would be possible to help you on the forums. As @bhanuGandham stated in another post:

    > All these topics are well explained in the documents you are referencing. These documents are written by subject matter experts. We try to help users with usability issues with GATK tools, for example, errors or bugs users face while using GATK tools. So if you have any questions regarding that, then please reach out to us, but only after you have read all the tool documents well.
    We have been receiving a large volume of questions from you. Please try to ask fewer and more relevant questions. When posting questions, please follow the posting guidelines. We ask this because we have multiple users to cater to and we do not want to spend so much time on one user that the other users feel neglected.
    Given your many questions, I think you may benefit from attending a GATK bootcamp. Our bootcamp/workshop schedule is posted online. You can also ask your institute to schedule a GATK workshop or try to attend a GATK event at an ASHG or AGBT conference. In lieu of attending a workshop, I'd like to point you to GATK workshop materials under the Slides and workshop tutorial bundles section on the Presentations page. On this page you will also find links to YouTube videos that explain GATK tools and background context, including the entirety of workshop presentations. I think you will find these helpful.
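
The pooled-ploidy rule quoted in this thread, (Number of samples in each pool * Sample Ploidy), can be made concrete with a small sketch. It is only an illustration and not an official GATK recipe. The function names and the file names (ref.fasta, pool1.bam, pool1.g.vcf.gz) are hypothetical, and it assumes the GATK4 flag spellings -ERC GVCF and --sample-ploidy. The script only builds and prints the command; it does not run GATK.

```python
import shlex


def pooled_ploidy(samples_per_pool: int, organism_ploidy: int) -> int:
    """Ploidy for one library: individuals in the pool times ploidy of each."""
    if samples_per_pool < 1 or organism_ploidy < 1:
        raise ValueError("both values must be positive integers")
    return samples_per_pool * organism_ploidy


def haplotypecaller_gvcf_cmd(reference: str, bam: str, out_gvcf: str, ploidy: int) -> list:
    """Build (but do not run) a HaplotypeCaller GVCF-mode command.

    Assumes GATK4-style flags; the paths passed in are placeholders.
    """
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_gvcf,
        "-ERC", "GVCF",
        "--sample-ploidy", str(ploidy),
    ]


# A single, unpooled tetraploid individual: 1 * 4
single_tetraploid = pooled_ploidy(1, 4)

# One library made by pooling 12 diploid individuals: 12 * 2
pool_of_12_diploids = pooled_ploidy(12, 2)

cmd = haplotypecaller_gvcf_cmd(
    "ref.fasta", "pool1.bam", "pool1.g.vcf.gz", pool_of_12_diploids
)
print(shlex.join(cmd))
```

Under these assumptions, an unpooled sample uses the organism's own ploidy (pool size 1), while a pooled library is treated as one "sample" whose ploidy is the total number of chromosome copies in the pool. Per the documentation note quoted earlier in the thread, GenotypeGVCFs then needs no ploidy setting of its own.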