What if ploidy is set to 2 for a pooled DNA sequencing experiment?

sudarshan (Princeton University; Member)

Hi,
I performed a pooled sequencing experiment with 60 individuals and am trying to call variants using HaplotypeCaller. The ploidy should technically be set to 120, but there are known issues with high ploidy and HC is not designed to handle it. Since this is a pooled experiment, I can only get allele frequencies and not individual-level variants anyway. Furthermore, I am not interested in rare alleles.
So what if I set the ploidy to 2 (and maxAltAlleles to 3 or 4)? Would the program then treat the reads as if they came from a single diploid individual sequenced at high depth?

Please let me know.

Answers

  • sudarshan (Princeton University; Member)

    Great! Thanks a tonne! Do you have any idea if reducing the ploidy to 2 will negatively influence the allele frequency estimation?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @sudarshan
    Hi again,

    After talking to my colleagues, I should state explicitly that you will only get the alleles that are quite common in your samples (ploidy 2 assumes a variant allele is present at about 50% frequency). However, you may want to look into MuTect for your variant calling, as it can detect lower-frequency alleles. Have a look at this thread. Note: the latest version of MuTect calls both SNPs and indels.

    -Sheila
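To make the ploidy-2 caveat concrete, here is a deliberately simplified sketch. It is a toy binomial model, not HaplotypeCaller's actual likelihood calculation (which works from per-read haplotype likelihoods and base qualities); the flat `error_rate` and the read counts are hypothetical. For each possible alt-allele dosage g out of `ploidy` chromosomes, it scores how well the expected alt fraction g/ploidy explains the observed alt read count and returns the best-scoring dosage.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k alt reads out of n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def best_dosage(alt_reads: int, depth: int, ploidy: int, error_rate: float = 0.01) -> int:
    """Return the alt-allele dosage g (0..ploidy) that best explains the read counts.

    Toy model (an assumption): a read shows the alt allele with probability
    (g/ploidy)*(1 - error_rate) + (1 - g/ploidy)*error_rate.
    """
    def score(g: int) -> float:
        frac = g / ploidy
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        return log_binom_pmf(alt_reads, depth, p_alt)

    return max(range(ploidy + 1), key=score)


# Hypothetical site: 20 of 200 reads carry the alt allele (~10% of the pool).
print(best_dosage(20, 200, ploidy=2))    # → 0 (called hom-ref; allele missed)
print(best_dosage(20, 200, ploidy=120))  # → 11 (about 9% of 120 chromosomes)
```

In this toy model the diploid setting can only choose between 0%, 50% and 100% alt fractions, so an allele at ~10% in the pool is explained better as sequencing error than as a heterozygote, which is the loss of less common alleles described above.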

  • sudarshan (Princeton University; Member)

    Thanks a tonne for this response. I was planning to run UnifiedGenotyper (UG) with ploidy 120 and compare it with HC at ploidies of 2 and 3 (although I am not sure that would solve the problem).
    MuTect looks very interesting. A few questions about MuTect, without meaning to sidetrack this thread:
    1) The documentation says it can only compare one normal to one tumor. Has the latest version changed this to allow comparing multiple samples?
    2) The ploidy of a typical tumor is probably still lower than the total ploidy of a typical pooled experiment in Drosophila. I assume this could run into the same high-ploidy problem (the ploidy for my samples is 120).
    3) Does it support non-mammalian reference builds?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    MuTect2 does not support calling multiple samples yet, only tumor-normal pairs. Tumor-only calling is technically possible but not recommended. It does support any organism genome you want to throw at it as long as it's formatted correctly.

    Not sure what you mean by your second question about ploidy, can you clarify?

  • sudarshan (Princeton University; Member)

    Thanks for the response. I just meant to ask whether, even with MuTect2, a high ploidy like 120 would be computationally very intensive and run into the same problems as HC.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @sudarshan
    Hi,

    Can you try HaplotypeCaller with the latest version (3.5)? It includes some updates that might help.

    Thanks,
    Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Also, regarding MuTect, it's important to understand that MuTect uses a genotyping model and pre-filtering rules that make it largely unsuitable for germline variant calling. I would not consider it an appropriate replacement for HC in this case.

  • sudarshan (Princeton University; Member)

    Hi,
    Thanks a tonne for your responses. I have tried running HC with GATK v3.5. For files of ~50 MB, it finishes with ploidy 120 and maxAltAlleles 3 (128 GB of memory, -Xmx100g), and it is also considerably faster. But for larger files (~1 GB) with the same settings, it runs into memory problems: either it reports insufficient memory, or it simply stalls with no progress after a few hours. It also works if I use ploidy 2 or 3 with the default maxAltAlleles setting.
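One way to see why ploidy 120 is so much heavier than ploidy 2 or 3 is that the number of possible genotypes at a site grows combinatorially with ploidy. The sketch below counts unordered genotypes, i.e. multisets of `ploidy` alleles drawn from `n_alleles` alleles (the reference plus the alternates), which is C(ploidy + n_alleles - 1, n_alleles - 1). Treating this count as a rough proxy for per-site work and memory is an assumption; it is not GATK's actual memory model.

```python
from math import comb


def n_genotypes(ploidy: int, n_alleles: int) -> int:
    """Number of unordered genotypes: multisets of size `ploidy` over `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


# Reference allele plus 3 alternate alleles (as with maxAltAlleles 3).
for ploidy in (2, 3, 120):
    print(ploidy, n_genotypes(ploidy, 4))
# 2 10
# 3 20
# 120 302621
```

With 3 alternate alleles, ploidy 120 gives roughly 30,000 times as many candidate genotypes per site as ploidy 2, so raising maxAltAlleles at that ploidy makes the count grow even faster.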

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Hi @sudarshan,

    Thanks for this interesting feedback. Regarding these files of different sizes, do they span different genomic regions, or do they have different coverage depths?

  • sudarshan (Princeton University; Member)

    Apologies for not specifying. The small file is chromosome 4 and the larger file is the 2R arm of Drosophila melanogaster, so I believe the answer is yes to both of your questions. I am now using the -L parameter to further reduce the size of the chunks I run through HC for the larger arm. I assume that at some point the region will be small enough that HC can handle even high coverage within it. I hope this assumption is correct. I will post again if it works.
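A minimal sketch of the chunking approach: it splits a contig into non-overlapping, 1-based inclusive `contig:start-stop` intervals and builds one GATK 3 HaplotypeCaller command per chunk using `-L`, `-ploidy` and `-maxAltAlleles`. The file names, output naming, contig length and chunk size are hypothetical placeholders; substitute your own and check the flags against the GATK 3.5 documentation for your setup. Calls near chunk boundaries may also differ from a whole-arm run.

```python
import shlex
from typing import List


def chunk_intervals(contig: str, length: int, chunk_size: int) -> List[str]:
    """Split a contig into non-overlapping, 1-based inclusive 'contig:start-stop' intervals."""
    intervals = []
    start = 1
    while start <= length:
        stop = min(start + chunk_size - 1, length)
        intervals.append(f"{contig}:{start}-{stop}")
        start = stop + 1
    return intervals


def hc_command(interval: str, ploidy: int, max_alt: int) -> List[str]:
    """Build one GATK 3.x HaplotypeCaller command line for a single interval.

    'reference.fasta', 'pool.bam' and the output naming are hypothetical placeholders.
    """
    out_vcf = "pool." + interval.replace(":", "_") + ".vcf"
    return [
        "java", "-Xmx100g", "-jar", "GenomeAnalysisTK.jar",
        "-T", "HaplotypeCaller",
        "-R", "reference.fasta",
        "-I", "pool.bam",
        "-L", interval,            # restrict calling to this chunk
        "-ploidy", str(ploidy),    # per-sample ploidy (120 for 60 pooled diploids)
        "-maxAltAlleles", str(max_alt),
        "-o", out_vcf,
    ]


if __name__ == "__main__":
    # Hypothetical contig length and chunk size, for illustration only.
    for iv in chunk_intervals("2R", 21_000_000, 5_000_000):
        print(shlex.join(hc_command(iv, ploidy=120, max_alt=3)))
```

Each printed line can be submitted as a separate job, and the chunk size can be lowered until each run fits in memory.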
