
HaplotypeCaller is slow

My variant calling seems very slow. What do you think?

We have 10 BAM files, each about 2.5 GB, covering a targeted region of about 15 Mb.

I am using the HaplotypeCaller with 8 threads (-nct 8) and it is taking 31 hours.

When we start whole genome sequencing this will be impossible!

Any ideas on how to speed things up? Is this a normal speed?


Answers

  • leeyoungwha Posts: 23 Member

    Hello,

    It doesn't sound unusual in my experience. I've run HaplotypeCaller on 94 samples after ReduceReads, with nct 20 and minPruning 2. I parallelized it by running two jobs with nct 20, 1 Mb at a time for each thread. A chromosome in my species is ~14 Mb, and that took about 2-3 days. I would say running a couple of jobs and decreasing the size of the target segment for each job might help, since then variant calling won't stall completely on difficult regions.

    YW
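The splitting idea above can be sketched as a small helper that cuts a region into 1 Mb pieces and shares them between jobs. This is only an illustration under stated assumptions: the contig name `chr1`, the function names, and the round-robin assignment are hypothetical and not taken from the post. The `contig:start-end` strings are written in the form GATK accepts for its `-L` interval argument.

```python
def split_region(contig, start, end, chunk_size=1_000_000):
    """Split a 1-based, inclusive region into chunks of at most chunk_size bases.

    Returns interval strings of the form "contig:start-end".
    """
    intervals = []
    pos = start
    while pos <= end:
        chunk_end = min(pos + chunk_size - 1, end)
        intervals.append(f"{contig}:{pos}-{chunk_end}")
        pos = chunk_end + 1
    return intervals


def assign_to_jobs(intervals, n_jobs):
    """Distribute intervals round-robin across n_jobs lists."""
    return [intervals[i::n_jobs] for i in range(n_jobs)]


if __name__ == "__main__":
    # Hypothetical example: a ~14 Mb chromosome split between two jobs.
    chunks = split_region("chr1", 1, 14_000_000)
    for job_id, job_intervals in enumerate(assign_to_jobs(chunks, 2)):
        print(f"job {job_id}: {len(job_intervals)} intervals, first = {job_intervals[0]}")
```

Each job would then call variants only on its own intervals, so one difficult segment delays a single small piece rather than the whole run. The per-interval outputs still need to be combined afterwards.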

  • Geraldine_VdAuwera Posts: 7,631 Administrator, GATK Developer

    That seems like a lot, although it sounds like that's pretty deep sequencing. Maybe you're running into coverage issues. Are you using ReduceReads to compress your data at all?

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuwera Posts: 7,631 Administrator, GATK Developer

    @leeyoungwha, the runtimes you're getting sound about right, but you're working with 94 samples, whereas Mike is working with only 10, which should be much faster by comparison.

    Geraldine Van der Auwera, PhD

  • mike_boursnell Posts: 85 Member

    Maybe I should stick to UnifiedGenotyper?

  • mike_boursnell Posts: 85 Member

    Sounds good. I'll look forward to it.
