Merging same sample BAMs for HaplotypeCaller

Andrius (Lithuania, Member)
Dear GATKers,

I really do appreciate you all for your hard work.

I work with non-human samples that were genotyped using a RAD-based technique (GBS). Each library was sequenced twice, on distinct flow cell lanes, so I have two RG-tagged BAMs for each sample (the SM tag is the same in both).

From searching the GATK forum, I have learned that BAM files belonging to the same sample should be merged before being passed to HaplotypeCaller.

I have noticed that you recommend merging BAMs at the MarkDuplicates step, the Indel Realignment step, or the BQSR step. The problem is that I have to skip all of these steps, for the following reasons: (i) MarkDuplicates is out of the question because GBS relies heavily on PCR amplification, and (ii) as I work on non-human, non-model organisms, there are no databases of known indels or polymorphic sites, which are required for the Indel Realignment and BQSR steps.

Could you please confirm whether I can simply merge the single-sample BAMs using MergeSamFiles (Picard) and feed the merged file to HaplotypeCaller?
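
For concreteness, the workflow being asked about can be sketched roughly as below. This is only an illustration under stated assumptions, not a recommended pipeline: the file names (`sample1_lane1.bam`, `ref.fasta`, etc.), the jar paths, and the two helper functions are hypothetical. The arguments use Picard's legacy `I=`/`O=` syntax and the GATK 3.x `-T HaplotypeCaller` invocation with `--emitRefConfidence GVCF`. The script only builds and prints the command lines; it does not run them.

```python
import shlex


def merge_command(lane_bams, merged_bam, picard_jar="picard.jar"):
    """Build a Picard MergeSamFiles command (legacy KEY=VALUE syntax).

    One I= argument is added per lane-level BAM; the jar path is an
    assumed placeholder.
    """
    cmd = ["java", "-jar", picard_jar, "MergeSamFiles"]
    cmd += [f"I={bam}" for bam in lane_bams]
    cmd.append(f"O={merged_bam}")
    return cmd


def haplotypecaller_command(merged_bam, reference, out_gvcf,
                            gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3.x HaplotypeCaller command in GVCF mode.

    The jar path and all file names are assumed placeholders.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", merged_bam,
        "--emitRefConfidence", "GVCF",
        "-o", out_gvcf,
    ]


if __name__ == "__main__":
    # Hypothetical file names for one sample sequenced on two lanes.
    lanes = ["sample1_lane1.bam", "sample1_lane2.bam"]
    print(shlex.join(merge_command(lanes, "sample1.merged.bam")))
    print(shlex.join(haplotypecaller_command(
        "sample1.merged.bam", "ref.fasta", "sample1.g.vcf")))
```

Printing the commands first makes it easy to check them by eye before running anything on real data.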

I also came across a post by Sheila stating that ‘some people used to input same sample GVCF files to GenotypeGVCFs with no problem’, while also noting that ‘this is not best practice’. I’ve tried this, and GenotypeGVCFs runs without throwing an error; however, I am concerned about the reliability of the genotyping. Is it a really bad idea to go this way?

Many thanks and have a nice day!

P.S. I am using GATK v. 3.8

