GATK v3.1 HaplotypeCaller with GATK 1.6 Realigned & Recalibrated BAMs

dobrien41

With GATK v3.1, I want to use HaplotypeCaller with CombineGVCFs and GenotypeGVCFs on BAMs that were realigned and recalibrated with GATK 1.6.
A) Should I realign and recalibrate my BAMs with GATK v3.1 before running HaplotypeCaller?
B) Could this cause other problems I am not seeing?
C) Should I skip this route and just use UnifiedGenotyper to call variants jointly across all of my samples?
FYI, this is for a cohort of ~1000 germline exome samples.


  Geraldine_VdAuwera

    You should definitely use the new pipeline with HaplotypeCaller for superior results. In principle, running it on your existing BAMs should work fine; however, please be aware that we have fixed a number of bugs in the data processing tools, so your realigned and recalibrated BAMs could potentially be improved by reprocessing them with a newer version. But as far as I know there is no technical reason why you would run into issues (@ebanks will jump in if I'm wrong), and considering the computational cost of reprocessing, I wouldn't push to reprocess.
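    The per-sample gVCF workflow recommended above (HaplotypeCaller in GVCF mode per sample, then CombineGVCFs, then GenotypeGVCFs) can be sketched in Python as a script that only builds GATK 3.x command lines as argument lists and does not run them. This is a minimal sketch, not an official pipeline: the jar path, reference, exome interval file, output names and the CombineGVCFs batch size of 200 are hypothetical placeholders, and the argument names should be checked against the documentation of your exact GATK 3 release.

```python
from pathlib import Path

# Hypothetical placeholders: adjust to your own installation and files.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "reference.fasta"
TARGETS = "exome_targets.interval_list"
BATCH_SIZE = 200  # assumed number of gVCFs merged per CombineGVCFs job


def gatk(tool, *args):
    """Build a GATK 3.x command line as an argument list (not executed here)."""
    return ["java", "-jar", GATK_JAR, "-T", tool, "-R", REFERENCE, *args]


def variant_args(files):
    """Expand file names into repeated --variant arguments."""
    return [arg for f in files for arg in ("--variant", f)]


def joint_calling_plan(bams, out_vcf="cohort.vcf"):
    """Per-sample HaplotypeCaller in GVCF mode, batched merge, joint genotyping."""
    commands, gvcfs = [], []
    for bam in bams:
        gvcf = Path(bam).stem + ".g.vcf"
        commands.append(gatk("HaplotypeCaller", "-I", bam, "-L", TARGETS,
                             "--emitRefConfidence", "GVCF",
                             "--variant_index_type", "LINEAR",
                             "--variant_index_parameter", "128000",
                             "-o", gvcf))
        gvcfs.append(gvcf)

    # Merge the per-sample gVCFs in fixed-size batches.
    batch_files = []
    for start in range(0, len(gvcfs), BATCH_SIZE):
        batch_out = f"batch{start // BATCH_SIZE}.g.vcf"
        commands.append(gatk("CombineGVCFs",
                             *variant_args(gvcfs[start:start + BATCH_SIZE]),
                             "-o", batch_out))
        batch_files.append(batch_out)

    # Genotype all batches together in a single joint-calling step.
    commands.append(gatk("GenotypeGVCFs", *variant_args(batch_files), "-o", out_vcf))
    return commands
```

    For ~1000 exomes, `joint_calling_plan([f"sample{i:04d}.bam" for i in range(1000)])` yields 1000 HaplotypeCaller jobs, 5 CombineGVCFs jobs and a single GenotypeGVCFs job. Batching the merge keeps each CombineGVCFs run to a manageable number of inputs, while the expensive per-sample calling stays fully parallel.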

  • 1.6 is pretty darn old. IIRC, 2.0 had big improvements in the recalibration steps (recalibrating indels as well as single-base mismatches), and I'm pretty sure HC uses those indel qualities. I don't believe realignment has substantially changed.

  • We have a similar question: about a thousand samples were aligned with BWA aln, then realigned and recalibrated with GATK 2.7. Eventually, we would like to make use of the N+1 feature of the new version as more WES samples come in.

    1) Would you recommend redoing all realignment and recalibration with the new GATK v3, or can we simply redo the gVCF calling on the files recalibrated with the older version?

    2) New data is going to be aligned with BWA mem (not aln). Will it make a difference if we simply call the new and old BAM files together, given the slightly different alignment algorithms?


  Geraldine_VdAuwera

    Hi @claratsm,

    1) If your files were processed with 2.7 you should be fine. I wouldn't reprocess them.

    2) That's a little trickier. I would tend to say it's not a big deal, but others may have stronger opinions. @carneiro may want to comment on this.
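    The practical benefit of keeping the existing 2.7-processed files is the N+1 workflow: previously called samples do not go through HaplotypeCaller again, only the new BAMs do, and then the comparatively cheap merge and joint-genotyping steps are rerun. A minimal Python sketch of that update, which only builds GATK 3.x command lines as argument lists; the jar path, reference, interval file and all output names are hypothetical placeholders, and the existing gVCFs are assumed to be batch gVCFs from an earlier run.

```python
from pathlib import Path

# Hypothetical placeholders: adjust to your own installation and files.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "reference.fasta"
TARGETS = "exome_targets.interval_list"


def gatk(tool, *args):
    """Build a GATK 3.x command line as an argument list (not executed here)."""
    return ["java", "-jar", GATK_JAR, "-T", tool, "-R", REFERENCE, *args]


def variant_args(files):
    """Expand file names into repeated --variant arguments."""
    return [arg for f in files for arg in ("--variant", f)]


def add_samples(existing_gvcfs, new_bams, out_vcf="cohort_n_plus_1.vcf"):
    """N+1 update: call only the new BAMs, then redo the joint steps.

    existing_gvcfs are gVCFs already produced from the older BAMs;
    they are reused unchanged and never passed to HaplotypeCaller.
    """
    commands, new_gvcfs = [], []
    for bam in new_bams:
        gvcf = Path(bam).stem + ".g.vcf"
        commands.append(gatk("HaplotypeCaller", "-I", bam, "-L", TARGETS,
                             "--emitRefConfidence", "GVCF",
                             "--variant_index_type", "LINEAR",
                             "--variant_index_parameter", "128000",
                             "-o", gvcf))
        new_gvcfs.append(gvcf)

    # Merge only the new samples into one additional batch gVCF.
    commands.append(gatk("CombineGVCFs", *variant_args(new_gvcfs),
                         "-o", "new_batch.g.vcf"))
    # Jointly genotype the old batches together with the new one.
    all_batches = list(existing_gvcfs) + ["new_batch.g.vcf"]
    commands.append(gatk("GenotypeGVCFs", *variant_args(all_batches), "-o", out_vcf))
    return commands
```

    For example, `add_samples(["batch0.g.vcf", "batch1.g.vcf"], ["new001.bam", "new002.bam"])` produces two HaplotypeCaller commands (one per new BAM), one CombineGVCFs command and one GenotypeGVCFs command over three inputs. Whether mixing BWA aln and BWA mem alignments in one callset matters is a separate question from this bookkeeping.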

  Carneiro

    Differences between BWA aln and BWA mem will be minimal unless you're working with long reads.

  • Ours are just standard TruSeq libraries with ~100 bp reads. Thanks a lot.

