Does GATK HaplotypeCaller have a resume-analysis feature?

modashti (Kuwait, Member)

Hello there,

I am calling variants on 800 exome samples using HaplotypeCaller. For some reason, the caller stopped at a certain location on chromosome 5 (after 5 weeks, with 10 weeks still to go). The error is below (one of the samples was malformed):
INFO 02:04:51,540 ProgressMeter - chr5:180670788 1.05846818873E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:06:01,736 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:07:01,737 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:08:11,738 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:09:11,739 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:10:11,740 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w
INFO 02:11:11,741 ProgressMeter - chr5:180687847 1.05853600709E11 4.5 w 25.0 s 31.7% 14.1 w 9.6 w

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: File /media/daruma/sea/sample483.fastq.gz.mdup.realigned.fixed.recal.bai is malformed: Premature end-of-file while reading BAM index file sample483.fastq.gz.mdup.realigned.fixed.recal.bai. It's likely that this file is truncated or corrupt -- Please try re-indexing the corresponding BAM file.
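Because the run died weeks in on a truncated .bai file, a quick pre-flight check of every index before launching a long job can save a lot of time. Below is a minimal sketch (not a GATK or samtools feature) that walks the BAI layout described in the SAM/BAM specification (magic `BAI\1`, per-reference bins, chunks and linear index, optional `n_no_coor` trailer) and reports files that end early. The function names are hypothetical; re-indexing a flagged BAM is still the real fix, as the error message says.

```python
import struct


class TruncatedIndexError(Exception):
    """The .bai file ends before the structure it declares is complete."""


def _read(f, fmt):
    # Read exactly one little-endian record described by `fmt`.
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) < size:
        raise TruncatedIndexError(f"premature end-of-file near byte {f.tell()}")
    return struct.unpack(fmt, data)


def _count(f):
    # Counts in the BAI format are int32; a negative value means corruption.
    (n,) = _read(f, "<i")
    if n < 0:
        raise ValueError(f"negative count {n}")
    return n


def _skip(f, nbytes):
    # Read rather than seek: seek() past EOF succeeds silently.
    if len(f.read(nbytes)) < nbytes:
        raise TruncatedIndexError(f"premature end-of-file near byte {f.tell()}")


def check_bai_stream(f):
    """Walk an open binary .bai stream; return the number of references indexed."""
    magic = f.read(4)
    if magic != b"BAI\x01":
        raise ValueError(f"bad magic {magic!r}, not a BAI file")
    n_ref = _count(f)
    for _ in range(n_ref):
        for _ in range(_count(f)):          # n_bin bins
            _read(f, "<I")                  # bin id (uint32)
            _skip(f, 16 * _count(f))        # n_chunk x (chunk_beg, chunk_end) uint64
        _skip(f, 8 * _count(f))             # linear index: n_intv x uint64
    tail = f.read()                         # optional n_no_coor (uint64)
    if len(tail) not in (0, 8):
        raise TruncatedIndexError(f"unexpected {len(tail)}-byte trailer")
    return n_ref


def find_bad_indexes(paths):
    """Return (path, reason) for every .bai that is missing, malformed or truncated."""
    bad = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                check_bai_stream(f)
        except (OSError, ValueError, TruncatedIndexError) as exc:
            bad.append((path, str(exc)))
    return bad
```

For example, `find_bad_indexes(glob.glob("/media/daruma/sea/*.bai"))` would list indexes to regenerate before starting HaplotypeCaller.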

I will re-process that sample, and to speed things up I would like to resume the analysis rather than start over. Does HaplotypeCaller have a resume function? If it doesn't, what is the best way to work around the issue?
- Should I modify the exome target file? If so, should I start from the beginning of chromosome 5, or from the position closest to where the analysis stopped?

  • Why not include a resume feature that takes the partial VCF from the stopped run as input?
    I would also like to hear any other suggestions not mentioned here.

Thanks in advance :)

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @modashti
    Hi,

    So, you are running HaplotypeCaller on all 800 samples? That is a lot, and I am not sure HaplotypeCaller can handle 800 samples at once. We have a GVCF workflow that will work better in your case. Have a look here.

    Not only will that alleviate compute issues and make adding more samples easier, it also lets you fix the files that have problems without affecting all the other runs.

    There is no resume function. The best thing to do is follow the GVCF workflow. Do not try to run from the last position in the progress meter, as that can be incorrect and cause issues downstream.

    -Sheila
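To make the GVCF workflow concrete, the sketch below just builds the command lines: one HaplotypeCaller job per BAM in GVCF mode, CombineGVCFs over batches, then a single GenotypeGVCFs. The GATK 3.x tool names and arguments (`-T`, `-R`, `-I`, `-L`, `--emitRefConfidence GVCF`, `--variant`, `-o`) are standard, but the jar location, reference and target file names, output naming and the batch size of 200 are assumptions to adapt; check the 3.8 documentation (including the new QUAL model option Sheila mentions) before running anything.

```python
from pathlib import PurePosixPath

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # assumed jar location


def haplotypecaller_cmd(bam, ref, targets):
    """One GVCF per sample, so a bad sample can be re-run on its own."""
    # e.g. sample483.fastq.gz.mdup.realigned.fixed.recal.bam -> sample483.g.vcf
    gvcf = PurePosixPath(bam).name.split(".")[0] + ".g.vcf"
    cmd = GATK + ["-T", "HaplotypeCaller", "-R", ref, "-I", bam, "-L", targets,
                  "--emitRefConfidence", "GVCF", "-o", gvcf]
    return gvcf, cmd


def combine_cmds(gvcfs, ref, batch_size):
    """Merge per-sample GVCFs in batches to keep the final step's input list short."""
    outs, cmds = [], []
    for i in range(0, len(gvcfs), batch_size):
        out = f"batch{i // batch_size + 1}.g.vcf"
        cmd = GATK + ["-T", "CombineGVCFs", "-R", ref, "-o", out]
        for g in gvcfs[i:i + batch_size]:
            cmd += ["--variant", g]
        outs.append(out)
        cmds.append(cmd)
    return outs, cmds


def genotype_cmd(batch_gvcfs, ref, out="cohort.vcf"):
    """Joint genotyping of all batches in one step."""
    cmd = GATK + ["-T", "GenotypeGVCFs", "-R", ref, "-o", out]
    for b in batch_gvcfs:
        cmd += ["--variant", b]
    return cmd


def plan(bams, ref, targets, batch_size=200):
    """Return every command of the workflow, in execution order."""
    gvcfs, steps = [], []
    for bam in bams:
        gvcf, cmd = haplotypecaller_cmd(bam, ref, targets)
        gvcfs.append(gvcf)
        steps.append(cmd)
    batches, combine = combine_cmds(gvcfs, ref, batch_size)
    steps += combine
    steps.append(genotype_cmd(batches, ref))
    return steps
```

With 800 BAMs this yields 800 independent HaplotypeCaller jobs, 4 CombineGVCFs jobs and one GenotypeGVCFs job. If sample483 turns out to be broken, only its HaplotypeCaller job, its batch's CombineGVCFs and the final genotyping need to be redone, instead of restarting a 15-week joint run.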

  • modashti (Kuwait, Member)

    Thanks a lot Sheila,

    I am doing an association study, so I am trying to get calls (indels and SNPs) that are as accurate as possible. I am aware of the GVCF workflow, but how does it compare to calling all samples in one go?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @modashti
    Hi,

    They should be functionally equivalent. There were some issues with singletons being dropped, but those should be resolved with the newQual model.

    -Sheila
