Unified Genotyper joint calling on poolseq data: variable ploidy

Roh · Canberra · Member

Hi,

I know UnifiedGenotyper has been superseded by HaplotypeCaller, but due to time constraints (among other reasons) I am committed to using UG. I have 14 population pools with a variable number of individuals in each (16-38 genomes; average 23) to use in my joint SNP call. I have supplied the BAMs as a .list file on the command line, but for -ploidy I have used the average number of genomes (i.e. 23).

What does -ploidy actually do in UG? If it matters in my specific circumstance, can I specify each pool's genome number when joint calling? Can I correct any bias during the filtering step that follows the SNP call?

Thanks

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @Roh
    Hi,

    The -ploidy argument should be set to the total number of chromosomes in your pooled sample. For example, if you have 5 humans pooled together, you would set -ploidy 10. This tells the caller how many alleles to look for at each site. Have a look at this article for more information.

    You cannot specify -ploidy more than once in a single run. If you want to use the exact ploidy of each pool, you will need to run UnifiedGenotyper separately for each of your BAM files, each time with that pool's ploidy.

    Have a look at this thread and this thread for more help.

    -Sheila
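
As a rough illustration of the advice above, here is a minimal Python sketch that derives a per-pool -ploidy value and prints one UnifiedGenotyper command per pool. The pool names, individual counts, file names, and the diploid assumption are all hypothetical placeholders, and the command line uses only the basic GATK 3 arguments (-T, -R, -I, -ploidy, -o); check it against the documentation for your GATK version before running anything.

```python
import shlex

# Hypothetical pool names and individual counts; substitute your own.
POOLS = {"pool_A": 16, "pool_B": 23, "pool_C": 38}
CHROMOSOMES_PER_INDIVIDUAL = 2  # assumption: diploid organisms


def pool_ploidy(n_individuals, copies=CHROMOSOMES_PER_INDIVIDUAL):
    """Total chromosome copies in a pool, i.e. the value to pass as -ploidy."""
    return n_individuals * copies


def ug_command(pool, n_individuals, reference="reference.fasta"):
    """Build one UnifiedGenotyper command line for a single pool's BAM."""
    args = [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "UnifiedGenotyper",
        "-R", reference,              # placeholder reference path
        "-I", f"{pool}.bam",          # placeholder BAM name
        "-ploidy", str(pool_ploidy(n_individuals)),
        "-o", f"{pool}.vcf",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    for pool, n in POOLS.items():
        print(ug_command(pool, n))
```

If the counts you have are already haploid genome copies rather than individuals, set CHROMOSOMES_PER_INDIVIDUAL to 1 so the value is not doubled.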

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @Roh
    Hi again,

    I should point out that the ploidy you specify changes the underlying mathematical model the tool uses to determine the genotype. Have a look at this document, under "2. Calculating genotype likelihoods using Bayes' Theorem", for more information. The specific example there assumes that the sample is diploid.

    -Sheila
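
One way to see why ploidy changes the model: the number of distinct genotypes the caller has to weigh at a site grows with ploidy. The sketch below is plain combinatorics, not GATK's implementation. It counts the unordered ways to assign a given number of alleles to the chromosome copies in a sample, and the function name is made up for illustration.

```python
from math import comb


def genotype_count(ploidy, n_alleles=2):
    """Number of unordered genotypes for `ploidy` chromosome copies.

    This is the count of multisets of size `ploidy` drawn from
    `n_alleles` alleles: C(ploidy + n_alleles - 1, n_alleles - 1).
    """
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


if __name__ == "__main__":
    # Diploid, biallelic site: AA, AB, BB
    print(genotype_count(2))       # 3
    # Five pooled diploid individuals (ploidy 10), biallelic site
    print(genotype_count(10))      # 11
    # Same pool with three alleles segregating
    print(genotype_count(10, 3))   # 66
```

A diploid sample at a biallelic site has only 3 candidate genotypes, while a pool with ploidy 10 has 11, so setting -ploidy to a value that does not match the pool changes which genotypes the model can even consider.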

  • cjaln1994 · Melbourne · Member

    Hi @Sheila,
    I am new to GATK and planning to call variants from RNAseq in 4 different population pools (Pool Seq).
    One population pool has a different ploidy argument to the others, so I'll run HC on each pop separately.
    I am just wondering: is there a GATK tool downstream of variant calling that will allow me to compare allele frequencies (FST) between my four pooled samples, similar to a tool like Popoolation2? I know that Popoolation2 requires pileup-format input, so I thought I would check whether GATK has an alternative tool for my aims...
    Hopefully this makes sense, thanks!
    Chris

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @cjaln1994
    Hi Chris,

    Yes, you can use SelectVariants to do comparisons of your different populations. Have a look at this article as well.

    -Sheila
