ploidy level in HaplotypeCaller and GenotypeGVCFs

I am trying to call SNPs from chloroplast DNA reads from 85 samples, using GATK v4.0.
First, I used HaplotypeCaller to produce an individual GVCF for each sample with the default ploidy setting. Joint calling with GenotypeGVCFs then took only a few minutes. But then I realized the sample ploidy should be set to 1, since I am working with chloroplast data.
I did not change any other settings, only added “-ploidy 1” when running HaplotypeCaller, and it worked. However, when I ran “gatk GenotypeGVCFs” with default settings, the program hung at “WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples” for hours and hours. Then I tried “-ploidy 85” when running GenotypeGVCFs but got the same problem.
I wonder what is wrong with the ploidy setting.
Also, I found that HaplotypeCaller with “-ploidy 1” detected far fewer SNPs than the run with the default setting. I assume this is reasonable, right?

Looking forward to your reply! Thanks a lot in advance!



  • Sheila, Broad Institute (Member, Broadie, Moderator)
    edited February 1


    Let me make sure I have this right. You have 85 samples, all ploidy 1. You created GVCFs for each of the 85 samples using HaplotypeCaller in GVCF mode with ploidy set to 1.

    When trying to run GenotypeGVCFs with ploidy 1, you got a WARN statement telling you an annotation cannot be calculated, but the tool ran to completion. When running GenotypeGVCFs with ploidy 2 and 85, you also got the WARN statement, but the tool ran to completion.
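
    The workflow described above can be sketched as a small Python helper that only assembles the two command lines and prints them (a dry run, nothing is executed). This is a minimal sketch under stated assumptions: the file names and function names are hypothetical, and the GenotypeGVCFs input is assumed to be a single combined GVCF or workspace produced in a separate step that the thread does not describe.

```python
import shlex


def haplotypecaller_cmd(reference, bam, out_gvcf, ploidy=1):
    """Build a per-sample HaplotypeCaller command in GVCF mode."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,          # reference FASTA (hypothetical path)
        "-I", bam,                # one sample's aligned reads
        "-O", out_gvcf,           # per-sample GVCF output
        "-ERC", "GVCF",           # emit reference confidence (GVCF mode)
        "-ploidy", str(ploidy),   # 1 for haploid chloroplast data
    ]


def genotypegvcfs_cmd(reference, combined_input, out_vcf, ploidy=1):
    """Build the joint-genotyping command with a ploidy matching the GVCFs."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", combined_input,     # assumed: combined GVCF from a prior step
        "-O", out_vcf,
        "-ploidy", str(ploidy),
    ]


def render(cmd):
    """Quote each argument so the printed line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Hypothetical file names, for illustration only.
print(render(haplotypecaller_cmd("cp_ref.fasta", "sample01.bam", "sample01.g.vcf.gz")))
print(render(genotypegvcfs_cmd("cp_ref.fasta", "combined.g.vcf.gz", "joint.vcf.gz")))
```

    The point of the sketch is only that the same ploidy value is passed at both steps; to actually run a command, the list could be handed to `subprocess.run(cmd, check=True)`.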

    If this is the case, you have nothing to worry about :smile: The WARN statement is simply telling you the annotation InbreedingCoeff cannot be calculated. The reason is your GVCFs have genotypes that are haploid, and InbreedingCoeff can only be calculated for diploid genotypes. Have a look at the tool docs for more information.

    You are correct that ploidy 1 will give fewer variants than higher ploidy, because there has to be ~100% evidence to call a variant in haploid data. In diploid data, you need only ~50% of the reads to contain the variant.
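
    To make the ~100% versus ~50% intuition concrete, here is a toy binomial model of genotype likelihoods. It is not GATK's actual genotyping model; the function names, the read counts, and the 1% per-read error rate are all assumptions chosen for illustration.

```python
from math import comb, log10


def genotype_log10_likelihoods(n_alt, n_total, ploidy, error=0.01):
    """Return {alt_copies: log10 likelihood} for alt_copies = 0..ploidy.

    Toy model: each read shows the alt allele with probability equal to the
    genotype's alt fraction, adjusted by a symmetric per-read error rate.
    """
    result = {}
    for alt_copies in range(ploidy + 1):
        frac = alt_copies / ploidy
        p_alt = frac * (1 - error) + (1 - frac) * error
        likelihood = (
            comb(n_total, n_alt)
            * p_alt ** n_alt
            * (1 - p_alt) ** (n_total - n_alt)
        )
        result[alt_copies] = log10(likelihood)
    return result


def best_genotype(n_alt, n_total, ploidy, error=0.01):
    """Number of alt copies in the most likely genotype."""
    lls = genotype_log10_likelihoods(n_alt, n_total, ploidy, error)
    return max(lls, key=lls.get)


# A site where 8 of 20 reads carry the alt allele:
print("diploid:", best_genotype(8, 20, ploidy=2))   # 1 alt copy, a het call
print("haploid:", best_genotype(8, 20, ploidy=1))   # 0 alt copies, no variant
# A site where all 20 reads carry the alt allele:
print("haploid:", best_genotype(20, 20, ploidy=1))  # 1 alt copy, a variant
```

    Under this simplified model, a site with mixed read support is explained well by a heterozygous diploid genotype, but a haploid sample has no such genotype available, so the site is called reference unless nearly all reads support the variant.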


  • Hi Sheila,
    Thank you very much for your reply! Yes, you understood my message correctly.
    Now the GenotypeGVCFs run with ploidy 1 has finished. It took much more time than the run with ploidy 2 and showed many more WARN statements, which worried me. But now the results seem all right.

    Thanks a lot again!


