Does GATK HaplotypeCaller have a resume feature?

modashti (Kuwait, Member)

Hello there,

I am calling variants on 800 exome samples using HaplotypeCaller, and for some reason the caller stopped at a certain location on chromosome 5 (after 5 weeks, with 10 weeks still to go). The error is below (the BAM index of one of the samples was malformed):
INFO 02:04:51,540 ProgressMeter - chr5:180670788 1.05846818873E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:06:01,736 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:07:01,737 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:08:11,738 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:09:11,739 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:10:11,740 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:11:11,741 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: File /media/daruma/sea/sample483.fastq.gz.mdup.realigned.fixed.recal.bai is malformed: Premature end-of-file while reading BAM index file sample483.fastq.gz.mdup.realigned.fixed.recal.bai. It's likely that this file is truncated or corrupt -- Please try re-indexing the corresponding BAM file.

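The error message suggests re-indexing the BAM before re-running that sample. A minimal sketch of that step, assuming samtools is on the PATH and that the BAM sits next to the broken index with the same name ending in .bam instead of .bai (that BAM name is inferred from the index path in the error, not stated in the thread). The function name reindex_bam is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rebuild a truncated BAM index before re-running one sample.
# Assumes samtools is on PATH and the index lives next to the BAM as
# <stem>.bai (the naming used by the broken file in the error message).
reindex_bam() {
  local bam="$1"
  local bai="${bam%.bam}.bai"

  # Check the BAM itself first: if the BAM is truncated, a new index
  # will not help and the BAM has to be regenerated instead.
  samtools quickcheck -v "$bam" || return 1

  # Delete the corrupt index so it cannot be picked up by mistake,
  # then write a fresh index at the same path.
  rm -f "$bai"
  samtools index "$bam" "$bai"
}
```

For example: reindex_bam /media/daruma/sea/sample483.fastq.gz.mdup.realigned.fixed.recal.bam (the .bam path is an assumption and may differ on your system).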
I will re-analyse that sample and would like to resume the analysis to speed things up, so I was wondering whether HaplotypeCaller has a resume function. If it doesn't, what is the best way to work around the issue?
- Should I modify the exome target file? If yes, should I start from the beginning of chromosome 5 or from the position closest to where the analysis stopped?
- Why not include a resume feature that takes the VCF file written up to the point where the analysis stopped?

I would also like to hear other suggestions not mentioned here.

Thanks in advance :)

Answers

  • Sheila (Broad Institute, Member, Broadie, Moderator, admin)

    @modashti
    Hi,

    So, you are running HaplotypeCaller on all 800 samples at once? That is a lot, and I am not sure HaplotypeCaller can handle all 800 samples in a single run. We have a GVCF workflow that will work better in your case. Have a look here.

    Not only does that ease the compute burden and make it easy to add more samples later, it also lets you re-run only the files that have problems without disturbing all the other runs.

    There is no resume function. The best thing to do is follow the GVCF workflow. Do not try to run from the last position in the progress meter, as that can be incorrect and cause issues downstream.

    -Sheila
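The GVCF workflow recommended above splits calling into a per-sample step and a joint-genotyping step. A minimal sketch in GATK4 command syntax (the first run in this thread used GATK 3.8, where tools are invoked as java -jar GenomeAnalysisTK.jar -T ToolName with different argument names). The function names, file names and the cohort_db workspace path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the GVCF workflow in GATK4 syntax; function names, file names
# and the cohort_db workspace path are hypothetical.

# Step 1: one HaplotypeCaller job per sample, in GVCF mode. If a sample
# fails (e.g. a corrupt BAM index), only that sample's job is re-run.
call_sample_gvcf() {
  local ref="$1" bam="$2" out="$3"
  gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$out" \
    --emit-ref-confidence GVCF
}

# Step 2: import all per-sample GVCFs into a GenomicsDB workspace over the
# target intervals, then genotype the whole cohort jointly.
joint_genotype() {
  local ref="$1" intervals="$2" out="$3"
  shift 3
  local args=()
  local g
  for g in "$@"; do
    args+=(-V "$g")   # one -V per sample GVCF
  done
  gatk GenomicsDBImport "${args[@]}" -L "$intervals" \
    --genomicsdb-workspace-path cohort_db
  gatk GenotypeGVCFs -R "$ref" -V gendb://cohort_db -O "$out"
}
```

With hundreds of samples, GenomicsDBImport can also take a --sample-name-map file and a --batch-size value instead of one -V per GVCF.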

  • modashti (Kuwait, Member)

    Thanks a lot, Sheila.

    I am doing an association study, so I am trying to get SNP and indel calls that are as accurate as possible. I am aware of the GVCF workflow, but how does it compare to calling all samples in one go?

  • Sheila (Broad Institute, Member, Broadie, Moderator, admin)

    @modashti
    Hi,

    They should be functionally equivalent. There were some issues with singletons being dropped, but those should be resolved by the newQual model.

    -Sheila

  • wbsimey (California Academy of Sciences, Member ✭✭)

    I had a similar experience with one of my HaplotypeCaller analyses. Our system ran out of memory and I got the error:

    07:33:52.460 INFO ProgressMeter - scaffold_4:190270572 2212.7 4129370 1866.2
    07:33:53.032 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    07:33:53.032 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    07:33:53.032 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    07:33:53.032 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    07:34:00.277 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    07:34:02.555 INFO ProgressMeter - scaffold_4:190336397 2212.8 4129870 1866.3
    OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007fff00000, 1048576, 0) failed; error='Cannot allocate memory' (errno=12)
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (mmap) failed to map 1048576 bytes for committing reserved memory.
    # An error report file with more information is saved as:
    # /GATK4/bootstrapping_known-sites/hs_err_pid182123.log

    I was trying to run 16 HaplotypeCaller jobs simultaneously. Most of the others completed, but this one is our largest BAM file.

    Is there a way to resume or restart at scaffold_4?
    My command was:

    gatk HaplotypeCaller \
    -R ../../Raw_data/Tse_SBAPGDGG_D.fa \
    -I ../sorted_dedup_rg_bams/TP30046_sorted_dedup_rg.bam \
    --genotyping-mode DISCOVERY \
    --emit-ref-confidence GVCF \
    -O TP30046_sorted_dedup_rg.gVCF

  • bhanuGandham (Cambridge MA, Member, Administrator, Broadie, Moderator, admin)
    edited July 29

    Hi @wbsimey

    Is there a way to resume or restart at "scaffold_4"?

    No, there is no way to resume or restart HaplotypeCaller unless you are running on a cloud platform that has call caching.

  • wbsimey (California Academy of Sciences, Member ✭✭)

    What if I use the -XL option to exclude the scaffolds before scaffold_4 and then merge the GVCFs?

  • bhanuGandham (Cambridge MA, Member, Administrator, Broadie, Moderator, admin)

    @wbsimey

    That should work.
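The -XL approach discussed above can be sketched as follows. This is a minimal sketch under several assumptions: HaplotypeCaller processes contigs in the order of the reference .fai index; the crash happened inside scaffold_4, so scaffold_4 is re-called from its start rather than from the last progress-meter position; and the partial output is plain-text VCF (as with -O TP30046_sorted_dedup_rg.gVCF). The function names and intermediate file names (done.list, todo.list, part1.g.vcf, part2.g.vcf) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: finish an interrupted single-sample GVCF run by re-calling only
# the contigs that were not completed, then merging with the partial output.
# Assumptions: contigs are processed in .fai order, the crash happened
# inside resume_contig, and the partial output is plain-text VCF.

# Split reference contigs into those finished before the crash (done) and
# those still to call (todo); the interrupted contig is re-called in full.
# Returns non-zero if resume_contig is not found in the .fai.
split_contigs() {
  local fai="$1" resume_contig="$2" done_list="$3" todo_list="$4"
  : > "$done_list"
  : > "$todo_list"
  awk -v stop="$resume_contig" -v d="$done_list" -v t="$todo_list" '
    $1 == stop { todo = 1 }
    { print $1 > (todo ? t : d) }
    END { exit !todo }
  ' "$fai"
}

# Keep the VCF header plus complete records (>= 8 tab-separated columns) on
# fully finished contigs; a half-written last line is dropped.
trim_partial_gvcf() {
  local partial="$1" done_list="$2" out="$3"
  awk -F '\t' 'FILENAME == ARGV[1] { keep[$1] = 1; next }
               /^#/ || (NF >= 8 && ($1 in keep))' \
    "$done_list" "$partial" > "$out"
}

resume_gvcf() {
  local ref="$1" bam="$2" partial="$3" resume_contig="$4" out="$5"
  split_contigs "${ref}.fai" "$resume_contig" done.list todo.list || return 1
  trim_partial_gvcf "$partial" done.list part1.g.vcf
  # -L todo.list is equivalent to -XL done.list here.
  gatk HaplotypeCaller -R "$ref" -I "$bam" -L todo.list \
    --emit-ref-confidence GVCF -O part2.g.vcf
  # Both pieces hold the same sample, so concatenate them with MergeVcfs.
  gatk MergeVcfs -I part1.g.vcf -I part2.g.vcf -O "$out"
}
```

For example: resume_gvcf ../../Raw_data/Tse_SBAPGDGG_D.fa ../sorted_dedup_rg_bams/TP30046_sorted_dedup_rg.bam TP30046_sorted_dedup_rg.gVCF scaffold_4 TP30046.g.vcf.gz. Add any other options from the original run (such as --genotyping-mode DISCOVERY) to the HaplotypeCaller line so both pieces are produced with the same settings.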
