Accelerate HaplotypeCaller step

chabib, France (Member)

Hello everyone,

I am using GATK in a clinical context for NGS diagnostics. The issue is that the HaplotypeCaller step takes too long (about 2 h per patient).
I have tried the following:

  • Reducing the BAM file size by keeping only the genomic regions of my diagnostic genes, but HaplotypeCaller still seems to run over the whole hg19 genome.
  • Requesting "only variants" with the output_mode option, but the output file is exactly the same as the default one.
  • Using several CPU threads: 1 thread = 147 min, 2 threads = 89 min, 3 threads = 80 min. I don't have many CPUs available, so going above 2 threads is not worthwhile, and it is still not fast enough.

I can't use the data thread option right now. Would it save more time than the CPU thread option?
There is also the interval option, but I don't think it would save enough time, since I have genes of interest on almost all chromosomes.

I would appreciate your guidance on this problem. How would you make this HaplotypeCaller step faster?

Many thanks in advance.

Christ

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @chabib
    Hi Christ,

    I think you are not specifying the regions of interest in your command. Simply reducing the BAM file size is not enough. You can specify your regions of interest using the -L argument. Have a look at this article for more information.

    -Sheila
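
To illustrate the -L suggestion above, here is a minimal Python sketch that merges a list of target regions into a BED-style interval list and builds a HaplotypeCaller command restricted to it. Everything specific in it is an assumption, not something from the thread: the region coordinates are placeholders, the file names (ref.fasta, patient.bam, targets.bed, patient.vcf) are hypothetical, and the command assumes a GATK 3-style invocation (GenomeAnalysisTK.jar with -T, -R, -I, -L, -nct, -o). The sketch only prints the command; it does not write files or run GATK.

```python
from typing import List, Tuple

# (chromosome, start, end) using the BED convention: 0-based start, end-exclusive.
Region = Tuple[str, int, int]


def merge_regions(regions: List[Region], padding: int = 0) -> List[Region]:
    """Pad each region, then merge overlapping or touching regions per chromosome."""
    # Sorting is lexicographic on the chromosome name, which is enough for merging.
    padded = sorted((c, max(0, s - padding), e + padding) for c, s, e in regions)
    merged: List[Region] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(regions: List[Region]) -> str:
    """Render regions as tab-separated BED lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions)


def haplotypecaller_command(reference: str, bam: str, intervals: str,
                            vcf: str, cpu_threads: int = 2) -> List[str]:
    """Build a GATK 3-style argument list (assumed layout; adapt to your install)."""
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", intervals,          # restrict calling to the target regions
        "-nct", str(cpu_threads),  # CPU threads, as tried in the question
        "-o", vcf,
    ]


# Placeholder coordinates, NOT real gene positions.
targets = [("chr17", 100, 200), ("chr17", 150, 300), ("chr13", 500, 600)]
merged = merge_regions(targets)
bed_text = to_bed(merged)
cmd = haplotypecaller_command("ref.fasta", "patient.bam", "targets.bed", "patient.vcf")

print(bed_text, end="")
print(" ".join(cmd))
```

The BED text would be saved as targets.bed before running the printed command. Merging the regions first keeps the interval list short even when the genes of interest are spread over almost all chromosomes.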
