
HaplotypeCaller Memory 3.4.0

Hi,
I am new to HaplotypeCaller and am having serious problems getting it to run. I have WGS re-sequencing BAM files with ~30-60x coverage (the BAM files are >3GB in size). I am running these in ERC mode as suggested, but within minutes, 3/4 of the jobs are killed by the cluster for exceeding memory. I am using the following command:

java -Xmx32g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I $bamfile -minPruning 4 --min_base_quality_score $min_base_qual --min_mapping_quality_score $min_map_qual -rf DuplicateRead -rf BadMate -rf BadCigar -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -R $ref -o $HCdir"."HC.$bamfile".""."g.vcf -ploidy $cohort1_ploidy -stand_emit_conf $stand_emit -stand_call_conf $stand_call --pcr_indel_model NONE "

I have varied the amount of memory I allocate, up to -Xmx256g, with no improvement, which seems a bit odd to me. Even adding minPruning did not seem to improve the situation. I have looked at previous posts and know that HC appears quite memory greedy, but is it normal to this extent?

Many thanks in advance for any pointers.

Answers

  • Does HaplotypeCaller produce a partial output that could help you pinpoint whether something in your data causes this? I had a similar problem where HaplotypeCaller would run out of memory when there were multiple alternate alleles at a position. Running HaplotypeCaller with --max_alternate_alleles 2 solved it for me. GATK will write a warning to the log whenever it encounters (and skips) any alleles, so you can get a feeling for how much data you lose with this option.

  • SarahM JIC Member

    The output looks normal, up until the point where the job got killed on the HPC for exceeding memory limits. No sign of multiple alleles in the files I looked at, but I will give this a try anyway, thanks! Just out of curiosity, what sort of file size did you have and how much memory did you specify? I cannot imagine the HaplotypeCaller would require more than 300GB, so I feel like there is something going wrong, or one of the options I specified is too memory intensive...

  • SarahM JIC Member

    Hi again! I think (hope) I solved the problem! Apologies, it was actually not directly related to HaplotypeCaller! I am using a cluster to execute these jobs, and while I specified the memory requirements within the GATK command, I did not specify them in the job scheduler. This may have caused the scheduler to place several memory-intensive jobs on the same node, leading to the memory issues I experienced. Having said that, it appears that both max_alternate_alleles and minPruning decrease memory requirements and increase speed. Thanks again!

  • tommycarstensen United Kingdom Member ✭✭✭

    Did you solve the problem? Otherwise, what is the value of $cohort1_ploidy? Thanks.

  • SarahM JIC Member

    Sorry for the late reply! I did indeed solve the problem. As mentioned above, it was my fault for not specifying memory requirements to the job scheduler. :)
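
For anyone landing here with the same symptom, the fix described above (telling the job scheduler how much memory each job needs, not just the JVM) could look roughly like the sketch below. The thread does not say which scheduler was used, so SLURM's `sbatch --mem` is an assumption here, and `run_haplotypecaller.sh`, the helper function names and the 4 GB headroom are hypothetical, illustrative choices rather than GATK recommendations. The only general point is that the scheduler request should be at least the `-Xmx` heap plus some headroom, because the JVM also uses memory outside the heap.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the thread): the scheduler is SLURM,
# and run_haplotypecaller.sh is a hypothetical wrapper script containing
# the java -Xmx<heap>g ... HaplotypeCaller command.

# Print the memory (in GB) to request from the scheduler for a given
# Java heap size. $1 = heap in GB, $2 = headroom in GB (default 4, an
# arbitrary illustrative value for the JVM's non-heap memory).
scheduler_mem_gb() {
    echo $(( $1 + ${2:-4} ))
}

# Print (rather than run) the submission command, so the sketch is safe
# to try on a machine without SLURM installed.
submit_cmd() {
    printf 'sbatch --mem=%sG run_haplotypecaller.sh\n' "$(scheduler_mem_gb "$1")"
}

# Example: a job whose java command uses -Xmx32g
submit_cmd 32
```

With a memory request in place, the scheduler can avoid packing several memory-hungry HaplotypeCaller jobs onto one node, which is what appears to have caused the kills in this thread.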
