Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
MergeBamAlignment – Select primary alignment
Hi,
In the current best practices workflow gatk4-data-processing
, you recommend using uBAMs instead of FASTQ files. Great idea! However, when it comes to merging with the BWA alignment BAM, there is something that puzzles me.
Here is an example of a paired-end read mapped by BWA:
XXXXXXXX:412:YYYYYYYYY:1:11101:10001:10497 83 chr16 1229894 0 149M = 1229833 -210 GGGCCGCGTAGGCGCGGCTCGCCAGGACGGGCAGCGCCAGCAGCAGCAGATTCAGCATCTGGGGAGCAAGGAGGAGCATCGTGGGCCTGGCCGGGCCTCACAGGGCAGGGCTGGGGGCTACAGATTGTGGGGTGAAGAATGGAGCTGAG AAAAA/E<EEAA</A/<EA<<EEEEEEEE/EEEAAEEAEE/EAEAAEEEEEEEEEEEAEEAAEEAEAEAAEEEEEEEEEEEEAAEEEEAE6EAEEEEEEEE/EEEEEEE/EE/AEAAEEEEEEEEEAAEEEEEEEEEEEEEEEEAAAAA XA:Z:chr16,+1240848,149M,1;chr16,+1256211,149M,6; MC:Z:150M MD:Z:147G1 RG:Z:NS500158.1 NM:i:1 AS:i:147 XS:i:147 XXXXXXXX:412:YYYYYYYYY:1:11101:10001:10497 163 chr16 1229833 0 150M = 1229894 210 CCAGGCCCTGACCTGTGGAATGTGGTGAGGGGCAGGGTGGACCCCGGCTGGGACTCACCAGGGGCCGCGTAGGCGCGGCTCGCCAGGACGGGCAGCGCCAGCAGCAGCAGATTCAGCATCTGGGGAGCAAGGAGGAGCATCGTGGGCCTG AAAAAEEEEEEEEEEEEAE6EEEAEEEEEEEEEEEEEEAE/EEEEEEEEEEA/AEAEEEEEEEEEAEAE<EEE6A/EEAAAEEEA/EEAAEEAEEE/AAAAEEEEEEEAE/EEEEEEEEEEAEEEEEEAEEEAEE6EAEEAE<</AAA<6 XA:Z:chr16,-1240908,150M,0; MC:Z:149M MD:Z:150 RG:Z:NS500158.1 NM:i:0 AS:i:150 XS:i:150
Note that BWA has suggested an alternative alignment given in the XA
tag. When using MergeBamAlignment
as in the best practices pipeline, the alignment in XA
is chosen. I have tried modifying the --PRIMARY_ALIGNMENT_STRATEGY
parameter, but is doesn't change anything.
In the old days before uMAPs, you worked directly with FASTQ files and hence used the primary alignment selected by BWA. What is the motivation for changing that?
Tagged:
Answers
Hi @micknudsen, thanks for your question. Would you mind providing the output from MergeBamAligment for the problematic inputs, and the aligner's name and version (bwa mem vs bwa), with the full command line?
Hi @akovalsk. I lost track of where I got the example from, but I have found another run. When I run
bwa-mem
directly on the FASTQ files, I get the following:but when I use the
gatk4-data-processing
version withMergeBamAlignment
, the following alignment is chosen:That is, one of the
XA
reads is preferred. I now realize that in both this and the previous example, the read has0
mapping quality, and I have not been able to find examples where this is not the case. Still, I find it a little strange why one would prefer one (ambiguous) mapping over the other.Thanks for sharing this info, @micknudsen
MergeBamAlignment uses BestMapqPrimaryAlignmentSelectionStrategy to choose which alignment to use. For paired-end reads, it chooses the alignment that gives the best sum of mapping qualities. In the event that there is a tie, one alignment is chosen randomly (with a fixed seed)
Hi @akovalsk
Looks like the best practices workflows still use MostDistant approach for selecting primary alignments.
Is this still true or has anything changed at best practices since recently?
Hi @SkyWarrior would you mind sharing the context where exactly that screenshot comes from?
Of course. Sorry my bad
https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels?files=1
Hi @SkyWarrior that repo is indeed out of date, and users should be looking here instead: https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline/tree/master
The default alignment strategy for MergeBamAlignment has been BestMapq for some time.
Hi @akovalsk
In this git repo I see that
https://github.com/gatk-workflows/five-dollar-genome-analysis-pipeline/blob/master/tasks/Alignment.wdl
still uses MostDistant as the strategy?
Can you clarify that?
Also is this BestMapq for hg38 based alignments only or does that cover hg19 as well?