
Ploidy level in HaplotypeCaller in GATK 4.0

prateekg04prateekg04 IndiaMember
edited January 30 in Ask the GATK team

Hi,

Thanks for the new version of GATK (GATK4.0).

We have a pool of 48 samples from a diploid organism, so we are using a ploidy of 96 (48 x 2 = 96). When I used HaplotypeCaller for variant calling in older versions of GATK, I got a "not enough memory to run this program" error, so I was unable to run it on this data. With GATK 4.0 I no longer get that error, but I do get the warning shown below:

12:40:23.159 WARN HaplotypeCallerGenotypingEngine - Removed alt alleles where ploidy is 96 and original allele count is 3, whereas after trimming the allele count becomes 2. Alleles kept are:[T*, C]

The command line we used is:

java -jar -Xmx64g gatk-package-4.0.0.0-local.jar HaplotypeCaller -R tilling.fa -I C1_S1.sorted.bam -O C1_S1.vcf -stand-call-conf 20.0 -ploidy 96

Can you please help us understand what the warning message means, and whether the command and options I am using are right or whether I need to include more options for efficient variant calling?

Thanks in advance.

Regards,
Prateek


Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator
    edited January 30

    @prateekg04
    Hi Prateek,

    Thank you for posting the edited version. I have edited your original post (I cannot delete the new post, but perhaps you can, so others will find the thread easier to read). In the future, you can simply click the wheel-shaped icon in the top right of the post and select EDIT to edit a post :smile:

    The WARN message is telling you that the number of alleles at the site, combined with the high ploidy, is more than the tool can handle, so it is removing the least common alternate alleles. This reduces the work involved in determining genotypes (the compute required grows with higher ploidy and with more alternate alleles). Have a look at the methods and algorithms section for more information on genotyping. You can change the default limit with --max-genotype-count; however, it may be best to leave the default. 96 is a very high ploidy, and if you are just looking for the most common alleles, 2 alternate alleles should be enough. What is your end goal? If you are looking for all possible alternate alleles at all sites, you can consider lowering the ploidy so more alternate alleles can be considered, or you can increase --max-genotype-count, which will in turn increase compute.

    -Sheila
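The trimming described in the warning can be reproduced with a little arithmetic. The sketch below is not GATK's code; it only counts unordered genotypes as multisets of `ploidy` allele copies, and it assumes a cap of 1024 for --max-genotype-count (the GATK4 default as far as I recall, so check the tool documentation for your version). The function names are made up for illustration.

```python
from math import comb


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Unordered genotypes: multisets of size `ploidy` drawn from `n_alleles` alleles (ref included)."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


def alleles_kept(ploidy: int, n_alleles: int, max_genotype_count: int = 1024) -> int:
    """Largest allele count (ref included) whose genotype count fits under the cap.

    The default cap of 1024 is an assumption about GATK4, not read from the tool.
    A return value of 1 means only the reference allele fits.
    """
    kept = n_alleles
    while kept > 1 and genotype_count(ploidy, kept) > max_genotype_count:
        kept -= 1
    return kept


print(genotype_count(96, 3))  # genotypes for ref + 2 alt alleles at ploidy 96
print(alleles_kept(96, 3))    # alleles that fit under the assumed cap
```

At ploidy 96, three alleles give 4753 possible genotypes, which exceeds the assumed cap, while two alleles give only 97. That is consistent with the warning reporting an allele count of 3 reduced to 2.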

  • prateekg04prateekg04 IndiaMember

    Hi Sheila,

    Thanks for your answer. Our samples are pooled, so we can't reduce the ploidy level. As a first test, I ran the tool with --max-genotype-count 4 to check whether the warning still appears, but it fails with the error below and does not run:

    java.lang.IllegalArgumentException: VariantContext has only a single reference allele, but getLog10PNonRef requires at least one alt allele [VC HC40 @ Psy1:28 Q. of type=NO_VARIATION alleles=[A*] attr={} GT=[[C1_S1 ./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././. PL 0]]

    Can you please help us with this?

    Thanks in advance.

    Regards,
    Prateek

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @prateekg04
    Hi Prateek,

    If you limit --max-genotype-count to 4 with ploidy 96, the tool will crash, because 4 is too low a limit on the number of genotypes. If you would like 1 alternate allele present in the VCF, you need to set --max-genotype-count to at least 97. Have a look at this page for the relationship between the possible number of genotypes and ploidy. Notice that the number of possible genotypes blows up beyond ~3 alleles (2 alternate alleles plus the ref allele). That is why we recommend setting the number of alt alleles or the ploidy to a lower number. Have a look at this thread for more information.

    -Sheila
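The figure of 97 follows from counting unordered genotypes: with ploidy P and A alleles (reference included) there are C(P + A - 1, A - 1) of them. A minimal sketch of that calculation is below; `min_genotype_cap` is a hypothetical helper name, and the result is only the arithmetic lower bound on --max-genotype-count, not a statement about how GATK behaves internally.

```python
from math import comb


def min_genotype_cap(ploidy: int, n_alt: int) -> int:
    """Smallest genotype limit that still accommodates ref + `n_alt` alternate alleles."""
    n_alleles = n_alt + 1  # include the reference allele
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


# How fast the requirement grows at ploidy 96
for n_alt in (1, 2, 3):
    print(n_alt, min_genotype_cap(96, n_alt))
```

For ploidy 96 this gives 97 for one alternate allele, 4753 for two and 156849 for three, which illustrates why a limit of 4 leaves no room for any alternate allele.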

  • kjngokjngo Member
    edited February 13

    Hi Sheila,

    I have an experimental setup similar to Prateek's, but with a ploidy of 128 (2 x 64 individuals). Our goal is to detect rare variants. Which parameters should I adjust in order to have ~4 alleles (3 alternate alleles plus the ref allele) written out to the VCF?

    Thanks in advance.

    Best Regards,
    Kathie

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @kjngo
    Hi Kathie,

    You should try setting --max_alternate_alleles and --max_num_PL_values to higher values.

    I hope that works. The links I provided in the post above should help as well.

    -Sheila
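For the ploidy 128 case, the same genotype-count arithmetic gives a rough idea of how high the limit would have to go. This is a sketch only: it assumes the GATK4 spellings --max-alternate-alleles and --max-genotype-count (the latter is the option mentioned earlier in this thread), and it says nothing about whether a run at this setting is feasible in memory or time.

```python
from math import comb

ploidy = 128   # 2 x 64 pooled diploid individuals
n_alt = 3      # desired alternate alleles in the VCF
n_alleles = n_alt + 1  # plus the reference allele

# Unordered genotypes for `n_alleles` alleles at this ploidy
required_cap = comb(ploidy + n_alleles - 1, n_alleles - 1)

# Extra HaplotypeCaller arguments (flag spellings assumed for GATK4)
extra_args = [
    "-ploidy", str(ploidy),
    "--max-alternate-alleles", str(n_alt),
    "--max-genotype-count", str(required_cap),
]

print(required_cap)
print(" ".join(extra_args))
```

The required limit comes out at 366145 genotypes per site, several hundred times the value needed for a single alternate allele at the same ploidy, so a large increase in compute should be expected.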

  • @Sheila
    Hi Sheila,

    Thank you for the suggestions. For GATK4 HaplotypeCaller, I don't see the option --max_num_PL_values. Is it only available in GATK3?

    Best Regards,
    Kathie
