We've built a pipeline to call SNPs and indels from TCGA BAM files. We were using GATK version 1 and have tried switching to the new version. Our BAM files are 15 GB on average (exome sequencing).
The first steps (duplicate marking and local realignment) run fine. However, BaseRecalibrator and PrintReads take almost 15 hours on a 16 GB BAM file, compared to 6-7 hours before. Is that normal? Do you plan to implement the "nt" argument?
Then I tried to launch HaplotypeCaller, but its estimated runtime is 90 hours, which is not feasible. Should I run HaplotypeCaller on a reduced BAM instead? Would that save time?
Otherwise my only option would be to run UnifiedGenotyper with the "nt" argument, but it would be a shame not to use your new algorithm.
Could anyone help me improve the runtime? Should I split by chromosome?
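To make the last question concrete, here is a rough sketch of the per-chromosome split I have in mind. It only builds one HaplotypeCaller command line per chromosome using the "-L" interval argument, so the jobs could then be submitted in parallel. The file names (sample.bam, reference.fasta, calls.<chrom>.vcf), the jar path, and the chromosome names are placeholders I made up for illustration; the contig names would have to match those in the reference actually used.

```python
# Sketch only: builds per-chromosome HaplotypeCaller command lines.
# File names, jar path, and contig names below are assumptions, not
# values from our real pipeline.

# Assumed contig naming (1..22, X, Y); must match the reference in use.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def haplotype_caller_commands(bam, reference, chromosomes=CHROMOSOMES,
                              jar="GenomeAnalysisTK.jar"):
    """Return one argument list per chromosome, each restricted with -L."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "java", "-jar", jar,
            "-T", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-L", chrom,                   # restrict this job to one contig
            "-o", f"calls.{chrom}.vcf",    # hypothetical per-chromosome output
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed
    # or handed to a job scheduler.
    for cmd in haplotype_caller_commands("sample.bam", "reference.fasta"):
        print(" ".join(cmd))
```

The per-chromosome VCFs would still need to be merged afterwards, and I don't know whether splitting this way affects the calls near interval boundaries.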
Thank you very much for your help!
P.S.: I like the new website!!