ploidy level in HaplotypeCaller and GenotypeGVCFs

Hello,
I am trying to call SNPs from chloroplast DNA reads from 85 samples, using GATK v4.0.
First, I used HaplotypeCaller to produce individual GVCFs with the default ploidy setting, then made the joint call with GenotypeGVCFs, which took only a few minutes. But then I realized the sample ploidy should be set to 1, since I am working with chloroplast data.
I did not change any other settings and only added “-ploidy 1” when running HaplotypeCaller, and it worked. However, when I ran “gatk GenotypeGVCFs” with default settings, the program hung at “WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples” for hours and hours. Then I tried “-ploidy 85” when running GenotypeGVCFs but got the same problem.
I wonder what is wrong with the ploidy setting.
Also, I found that HaplotypeCaller with “-ploidy 1” detected far fewer SNPs than the run with the default setting. I assume this is reasonable, right?

Looking forward to your reply! Thanks a lot in advance!

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    edited February 2018

    @houyan
    Hi,

    Let me make sure I have this right. You have 85 samples, all ploidy 1. You created GVCFs for each of the 85 samples using HaplotypeCaller in GVCF mode with ploidy set to 1.

    When trying to run GenotypeGVCFs with ploidy 1, you got a WARN statement telling you an annotation cannot be calculated, but the tool ran to completion. When running GenotypeGVCFs with ploidy 2 and 85, you also got the WARN statement, but the tool ran to completion.

    If this is the case, you have nothing to worry about :smile: The WARN statement is simply telling you the annotation InbreedingCoeff cannot be calculated. The reason is your GVCFs have genotypes that are haploid, and InbreedingCoeff can only be calculated for diploid genotypes. Have a look at the tool docs for more information.
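For reference, the two steps recapped above could be assembled as in the Python sketch below. It only builds the argument lists and does not run GATK. The file names are hypothetical placeholders, `-ploidy` is the flag quoted in the question, and `-ERC GVCF`, `-R`, `-I`, `-V` and `-O` are assumed to be the usual GATK4 arguments. How the 85 per-sample GVCFs were merged into one input for GenotypeGVCFs is not stated in this thread, so that input is left as a single placeholder path.

```python
from typing import List


def haplotype_caller_cmd(reference: str, bam: str, out_gvcf: str,
                         ploidy: int = 1) -> List[str]:
    """Build a per-sample HaplotypeCaller call in GVCF mode (not executed)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,          # reference FASTA (placeholder path)
        "-I", bam,                # one sample's aligned reads (placeholder)
        "-O", out_gvcf,           # per-sample GVCF to write
        "-ERC", "GVCF",           # emit reference confidence, i.e. GVCF mode
        "-ploidy", str(ploidy),   # per-sample ploidy; 1 for chloroplast data
    ]


def genotype_gvcfs_cmd(reference: str, merged_input: str, out_vcf: str,
                       ploidy: int = 1) -> List[str]:
    """Build the joint-genotyping call (not executed).

    merged_input is an assumption: a single combined GVCF or similar,
    since the thread does not say how the 85 GVCFs were merged.
    """
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", merged_input,
        "-O", out_vcf,
        "-ploidy", str(ploidy),   # still per sample, not 85
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(haplotype_caller_cmd("cp_ref.fasta", "sample01.bam",
                                        "sample01.g.vcf.gz")))
    print(" ".join(genotype_gvcfs_cmd("cp_ref.fasta", "merged.g.vcf.gz",
                                      "joint.vcf.gz")))
```

As far as I understand the option, the ploidy value describes each individual sample, so it would stay at 1 in both steps rather than being multiplied by the number of samples.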

    You are correct that ploidy 1 will give fewer variants than a higher ploidy, because there has to be ~100% evidence to call a variant in haploid data. In diploid data, you need only ~50% of the reads to contain the variant.
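The intuition behind the ~100% versus ~50% figures can be illustrated with a toy calculation. The sketch below is not GATK's actual genotyping model; it assumes a single biallelic site, independent reads, and a fixed per-read error rate (the `error` parameter is a made-up value), and simply picks the genotype with the highest likelihood for each ploidy.

```python
import math
from typing import List


def log10_likelihoods(n_ref: int, n_alt: int, ploidy: int,
                      error: float = 0.01) -> List[float]:
    """Log10 likelihood of the read counts for each possible alt-allele count.

    Index k of the result corresponds to a genotype carrying k alt alleles
    out of `ploidy` copies.
    """
    result = []
    for k in range(ploidy + 1):
        alt_fraction = k / ploidy
        # Chance that one read shows the alt base under this genotype:
        # a true alt copy read correctly, or a ref copy read with an error.
        p_alt = alt_fraction * (1 - error) + (1 - alt_fraction) * error
        result.append(n_alt * math.log10(p_alt)
                      + n_ref * math.log10(1 - p_alt))
    return result


def best_alt_count(n_ref: int, n_alt: int, ploidy: int,
                   error: float = 0.01) -> int:
    """Number of alt alleles in the maximum-likelihood genotype."""
    ll = log10_likelihoods(n_ref, n_alt, ploidy, error)
    return max(range(len(ll)), key=ll.__getitem__)


if __name__ == "__main__":
    # 6 reference reads and 4 alt reads at one site.
    print("diploid:", best_alt_count(6, 4, ploidy=2))  # 1 alt allele (het)
    print("haploid:", best_alt_count(6, 4, ploidy=1))  # 0 alt alleles (ref)
    # Nearly all reads must carry the variant for a haploid alt call.
    print("haploid, 10 alt reads:", best_alt_count(0, 10, ploidy=1))
```

Under these assumptions, a site where fewer than half of the reads carry the variant is best explained as heterozygous in a diploid sample, but as reference in a haploid sample, which is consistent with seeing fewer SNPs at ploidy 1.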

    -Sheila

  • Hi Sheila,
    Thank you very much for your reply! Yes, you got my message correctly.
    Now the GenotypeGVCFs run with ploidy 1 has finished. It took much more time than the run with ploidy 2 and showed many more WARN statements, which worried me. But now the results seem all right.

    Thanks a lot again!

    Yan
