
Unified Genotyper joint calling on Pool-seq data: variable ploidy

Roh · Canberra · Member


I know Unified Genotyper has been superseded by HaplotypeCaller, but due to time constraints (among other reasons) I am committed to using UG. I have 14 population pools, each with a different number of individuals (16-38 genomes; average 23), to use in my joint SNP call. I have passed the BAMs as a .list on the command line, and for -ploidy I have used the average number of genomes (i.e. 23).

What does -ploidy actually do in UG? If it matters in my situation, can I specify each pool's genome number when joint calling? Alternatively, can I correct any resulting bias during the filtering step after SNP calling?




  • Sheila · Broad Institute · Member, Broadie ✭✭✭✭✭


    The -ploidy argument should be set to the total number of chromosomes in your pooled sample. For example, if you have 5 humans pooled together, you would set -ploidy 10. This tells the caller how many allele copies make up the genotype of each sample at a site. Have a look at this article for more information.
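As a quick illustration of the rule above, here is a minimal Python sketch that turns a pool size into a -ploidy value. It assumes every individual in the pool carries the same number of chromosome sets (two for a diploid organism). The function name `pool_ploidy` is hypothetical and not part of GATK.

```python
def pool_ploidy(n_individuals: int, organism_ploidy: int = 2) -> int:
    """Total chromosome copies in a pool: individuals x per-individual ploidy.

    Assumes all individuals share the same ploidy (2 = diploid by default).
    """
    if n_individuals < 1 or organism_ploidy < 1:
        raise ValueError("counts must be positive integers")
    return n_individuals * organism_ploidy


# The example from the answer: 5 pooled (diploid) humans -> -ploidy 10
print(pool_ploidy(5))
# A pool of 23 diploid individuals -> -ploidy 46
print(pool_ploidy(23))
```

Read this way, a pool described as 23 diploid individuals corresponds to 46 chromosomes, not 23.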

    You cannot specify -ploidy more than once per run. If you want to use the exact ploidy for each pool, you will need to run UnifiedGenotyper separately on each of your BAM files, each time with that pool's own -ploidy value.
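One way to follow this advice for many pools is to generate one UnifiedGenotyper command per BAM. The sketch below only builds and prints GATK3-style command lines (`-T UnifiedGenotyper`, `-R`, `-I`, `-ploidy`, `-o`); it does not run anything. The jar path, reference file, BAM names, pool sizes and the helper name `ug_command` are hypothetical placeholders, and a diploid organism is assumed.

```python
import shlex
from pathlib import Path

# Hypothetical inputs: BAM path -> number of diploid individuals in that pool.
POOLS = {
    "pool01.bam": 16,
    "pool02.bam": 23,
    "pool03.bam": 38,
}


def ug_command(bam: str, n_individuals: int,
               reference: str = "reference.fasta",
               jar: str = "GenomeAnalysisTK.jar") -> list[str]:
    """Build one UnifiedGenotyper call with a pool-specific -ploidy.

    Assumes a diploid organism, so ploidy = 2 * individuals in the pool.
    Reference and jar paths are placeholders.
    """
    ploidy = 2 * n_individuals
    out_vcf = Path(bam).with_suffix(".vcf").name  # pool01.bam -> pool01.vcf
    return ["java", "-jar", jar,
            "-T", "UnifiedGenotyper",
            "-R", reference,
            "-I", bam,
            "-ploidy", str(ploidy),
            "-o", out_vcf]


# Print one shell-ready command per pool.
for bam, n in POOLS.items():
    print(shlex.join(ug_command(bam, n)))
```

Each run writes its own VCF, so the pools are no longer called jointly; that is the trade-off of using an exact per-pool ploidy.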

    Have a look at this thread and this thread for more help.


  • Sheila · Broad Institute · Member, Broadie ✭✭✭✭✭

    Hi again,

    I should point out that the ploidy you specify changes the underlying mathematical model the tool uses to determine genotypes. Have a look at section 2, "Calculating genotype likelihoods using Bayes' Theorem", in this document for more information. The example there assumes a diploid sample.
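To make the point about the model concrete: in the standard diploid model, each read's likelihood weights each of the genotype's two alleles by 1/2. A common generalization to a sample of ploidy m (assumed here; this is a simplified sketch, not GATK's actual implementation) enumerates genotypes by the number k of alternate-allele copies, k = 0..m. For each read D_j it then uses P(D_j | G_k) = ((m - k)/m) * P(D_j | ref) + (k/m) * P(D_j | alt). The per-base error model (1 - e for a matching base, e/3 otherwise, with e taken from the Phred base quality) is also an assumption, as are the function names.

```python
import math


def base_prob(read_base: str, allele: str, qual: int) -> float:
    """P(observed base | true allele) under an assumed simple Phred model:
    1 - e if the base matches, otherwise e/3 (error spread over 3 bases)."""
    e = 10 ** (-qual / 10)
    return 1 - e if read_base == allele else e / 3


def genotype_log10_likelihoods(bases, quals, ref, alt, ploidy):
    """log10 P(reads | G_k) for k = 0..ploidy copies of the alt allele.

    Each read is assumed to come from one of the `ploidy` chromosomes,
    chosen uniformly, so its likelihood is a mixture with weight
    (ploidy - k)/ploidy on ref and k/ploidy on alt.
    """
    result = []
    for k in range(ploidy + 1):
        w_alt = k / ploidy
        total = 0.0
        for b, q in zip(bases, quals):
            p = (1 - w_alt) * base_prob(b, ref, q) + w_alt * base_prob(b, alt, q)
            total += math.log10(p)  # reads treated as independent
        result.append(total)
    return result


# Four ref (A) and four alt (C) reads at Q30, tetraploid sample
ll = genotype_log10_likelihoods("AAAACCCC", [30] * 8, "A", "C", 4)
best_k = max(range(len(ll)), key=ll.__getitem__)
print(best_k)
```

With m = 2 and k = 1 the weights are 1/2 and 1/2, which is the heterozygous case of the diploid model. Changing m changes both the number of candidate genotypes (m + 1) and the mixing weights, which is why the ploidy setting matters for pooled data.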


  • cjaln1994 · Melbourne · Member

    Hi @Sheila,
    I am new to GATK and plan to call variants from RNA-seq data in 4 different population pools (Pool-seq).
    One pool has a different ploidy from the others, so I'll run HC on each population separately.
    I am wondering whether there is a GATK tool downstream of variant calling that would let me compare allele frequencies (FST) between my four pooled samples, similar to PoPoolation2. I know PoPoolation2 expects pileup input, so I thought I would check whether GATK has any alternative tools for this.
    Hopefully this makes sense, thanks!

  • Sheila · Broad Institute · Member, Broadie ✭✭✭✭✭

    Hi Chris,

    Yes, you can use SelectVariants to compare your different populations. Have a look at this article as well.
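SelectVariants subsets a VCF (for example, by sample name) rather than computing FST, so the statistic itself would be calculated afterwards. As a rough sketch, assume per-pool reference and alternate read counts have already been pulled from the AD field for a biallelic SNP. The classical estimator FST = (H_T - H_S) / H_T can then be computed from the pooled allele frequencies. This naive version ignores the extra sampling noise of pooled sequencing, which Pool-seq-specific methods aim to account for, so treat its values as approximate. The function names are hypothetical.

```python
def pool_allele_freq(ref_count: int, alt_count: int) -> float:
    """Alt allele frequency estimated naively as alt reads / total reads."""
    depth = ref_count + alt_count
    if depth == 0:
        raise ValueError("no reads at this site")
    return alt_count / depth


def fst_classical(counts):
    """Naive per-SNP FST = (H_T - H_S) / H_T from pooled allele frequencies.

    counts: list of (ref_reads, alt_reads) tuples, one per pool.
    Pools are weighted equally. Returns None when the site is
    monomorphic across all pools (H_T == 0). Pool-seq sampling noise
    is ignored, so estimates are biased at low coverage.
    """
    freqs = [pool_allele_freq(r, a) for r, a in counts]
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the total
    if h_t == 0:
        return None
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pool
    return (h_t - h_s) / h_t


# Pairwise comparison of two pools: alt frequency 0.25 vs 0.75
print(fst_classical([(30, 10), (10, 30)]))
```

For the four-pool case, calling `fst_classical` on each pair of pools gives a pairwise FST matrix per SNP.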

