MergeBamAlignment - what are all the exact steps it performs?
Hi, I have a question about the MergeBamAlignment tool.
I read through the documentation and the couple of blog posts I found on the GATK website, but there are still a few things I could use help clearing up.
Basically, I ran the following tests:
1. Starting from an unmapped BAM (uBAM) file with multiple read groups, I ran the GATK data pre-processing Best Practices WDL.
2. Starting from the same uBAM, but with the read group information removed using AddOrReplaceReadGroups, I ran the GATK data pre-processing Best Practices WDL.
3. Starting from the uBAM without read group information, I ran the data pre-processing pipeline with the MergeBamAlignment step removed.
After this, with the resulting BAM files, I ran the GATK generic germline variant calling Best Practices WDL.
Between the first two cases, for the samples I was testing with, I found a 2.1% difference in the variants called.
I understand that MergeBamAlignment (MBA) adds the missing read group information from the uBAM, which in turn is used during the MarkDuplicates, BaseRecalibrator and ApplyBQSR steps. That would lead to a different BAM than when the read group information is missing (test 2), and so the variant calling would also be different.
But between test 2 (uBAM with no read groups, MBA included) and test 3 (uBAM with no read groups, MBA removed), I also noted a difference in the variants called. It was only 0.18%, so it is small, but it still exists.
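For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of one way a "percent difference in variants called" could be computed. It is only an illustration and not necessarily how the 2.1% and 0.18% figures above were obtained. It assumes plain-text (uncompressed) VCF lines, treats a variant as the tuple (CHROM, POS, REF, ALT), ignores genotypes, and defines the difference as the variants found in only one call set divided by the variants found in either. The function names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list ALT alleles comma-separated; split them
        # so that each alternate allele is compared on its own.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def percent_discordant(vcf_a_lines, vcf_b_lines):
    """Percentage of variants present in only one of the two call sets."""
    a = variant_keys(vcf_a_lines)
    b = variant_keys(vcf_b_lines)
    union = a | b
    if not union:
        return 0.0
    # a ^ b is the symmetric difference: calls unique to either set.
    return 100.0 * len(a ^ b) / len(union)
```

To use it on real files, pass open file handles, for example `percent_discordant(open("test2.vcf"), open("test3.vcf"))` (file names are placeholders). Other definitions of the denominator, or comparing genotypes as well as sites, would give different percentages.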
My understanding is that MergeBamAlignment also performs more actions in addition to just merging read groups and read-level tag information.
From one post, I understood that MBA turns reads that were hard-clipped by BWA (usually chimeric reads) back into soft-clipped reads.
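One way to check the clipping point empirically would be to count hard-clipped and soft-clipped records in the BAM before and after the MBA step. Below is a minimal sketch that assumes the BAM has first been dumped to SAM text (for example with `samtools view -h`) and reads the CIGAR string from the sixth column. The function names are hypothetical, and a record containing both H and S operations is counted as hard-clipped.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipping_kind(cigar):
    """Classify a CIGAR string as 'hard', 'soft', or 'none' clipped."""
    ops = {op for _length, op in CIGAR_OP.findall(cigar)}
    if "H" in ops:
        return "hard"
    if "S" in ops:
        return "soft"
    return "none"


def count_clipping(sam_lines):
    """Count aligned records by clipping kind from SAM text lines."""
    counts = {"hard": 0, "soft": 0, "none": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        cigar = line.rstrip("\n").split("\t")[5]  # CIGAR is the 6th SAM column
        if cigar == "*":
            continue  # no alignment, so no CIGAR to classify
        counts[clipping_kind(cigar)] += 1
    return counts
```

If MBA does convert hard clips to soft clips, the "hard" count would be expected to drop in the merged output compared with the raw BWA output.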
Does anyone have more info on this? Or on what exactly MBA does?
Should I expect these small differences, or not?