GATK2.0 runtime

delahayem Member
edited October 2012 in Ask the GATK team

Hi,
We've built a pipeline to call SNPs and indels from TCGA BAM files. We were using GATK version 1 and have tried switching to the new version. Our BAM files are 15 GB on average (exome sequencing).

The first steps, dealing with duplicates and local realignment, are fine. But BaseRecalibrator and PrintReads then take almost 15 hours on a 16 GB BAM file (compared to 6-7 hours before). Is that normal? Do you plan to implement the "nt" argument?

Then I tried to launch HaplotypeCaller, but the estimated runtime is 90 hours, so that is not feasible. Should I run HaplotypeCaller on a reduced BAM instead? Would that save time?
Otherwise my only option would be to run UnifiedGenotyper with the "nt" argument, but it would be a pity not to use your new algorithm.

Could anyone advise on how to improve the runtime? Should I split by chromosome?

Thank you very much for your help!
Manon
P.S.: I like the new website!!

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin
    edited October 2012

    Hi Manon,

    The runtimes you observe are not abnormal. We are working on implementing -nt for BQSR, but it's not ready for public use yet, sorry.

    There are some workarounds to speed up the Haplotype Caller (see here for links).

    Ultimately though the best way to speed things up on all steps is to use Queue to parallelize execution using scatter-gather (see here for an overview).

    I hope this helps!

    Good luck,

    Geraldine
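
The split-by-chromosome idea from the question, which is the "scatter" half of the scatter-gather approach mentioned in the answer, can be sketched roughly as follows. This is only an illustration and not how Queue itself works: the helper name `scatter_commands`, the file names, and the output naming scheme are all hypothetical, and the contig names must match those in your reference (for example "1" versus "chr1"). It assumes the GATK 2.x command-line style, with `-T` for the tool, `-R` for the reference, `-I` for the input BAM, `-L` to restrict to an interval, and `-o` for the output. Merging the per-chromosome VCFs (the "gather" step) is left out.

```python
# Minimal sketch: build one HaplotypeCaller command per contig so the jobs
# can be submitted in parallel (e.g. to a cluster scheduler).
# All names here (scatter_commands, sample.bam, calls.<contig>.vcf) are
# hypothetical and chosen for illustration only.

def scatter_commands(contigs, bam, reference, jar="GenomeAnalysisTK.jar"):
    """Return a list of argument lists, one GATK command per contig."""
    commands = []
    for contig in contigs:
        out_vcf = "calls.%s.vcf" % contig  # assumed output naming scheme
        commands.append([
            "java", "-jar", jar,
            "-T", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-L", contig,   # restrict this job to a single contig
            "-o", out_vcf,
        ])
    return commands


if __name__ == "__main__":
    # Assumed contig naming; adjust to match the reference dictionary.
    contigs = [str(i) for i in range(1, 23)] + ["X", "Y"]
    for cmd in scatter_commands(contigs, "sample.bam", "reference.fasta"):
        print(" ".join(cmd))
```

Each printed line is an independent job; the total wall-clock time is then bounded by the slowest contig rather than by the sum of all of them, provided enough machines or cores are available.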
