Hi, how can I exclude multi-mapped reads before using UnifiedGenotyper to call SNPs?
You mentioned that Picard could be used to exclude multi-mapped reads. Which Picard tool should I use?
Thanks a lot!
You can use the read filters here. You should be able to use NotPrimaryAlignment and MappingQualityZero read filters to ensure the multi-mapped reads are not used. Also, have a look at this thread for more information.
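To make it concrete what those two filters check, here is a minimal Python sketch that applies the same two conditions to plain-text SAM records: it drops records with the 0x100 (not primary alignment) flag bit set and records with a MAPQ of 0. It only illustrates the logic and is not GATK's implementation; the function names `keep_record` and `filter_sam` and the sample records are made up for the example.

```python
SECONDARY = 0x100  # SAM FLAG bit: not primary alignment


def keep_record(sam_line):
    """Return True if the record is not flagged 0x100 and has MAPQ != 0."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    return (flag & SECONDARY) == 0 and mapq != 0


def filter_sam(lines):
    """Keep header lines (starting with '@') and records passing keep_record."""
    return [ln for ln in lines if ln.startswith("@") or keep_record(ln)]


# Hypothetical records for illustration only.
sample = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",   # primary, MAPQ 60
    "r2\t355\tchr1\t500\t0\t10M\t=\t600\t110\tACGTACGTAC\tIIIIIIIIII",   # secondary (0x100 set)
    "r3\t99\tchr1\t300\t0\t10M\t=\t400\t110\tACGTACGTAC\tIIIIIIIIII",    # primary, MAPQ 0
]

kept = filter_sam(sample)
```

With these sample records, only the header line and `r1` remain in `kept`.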
Thanks for your help! Is it possible to remove both the primary and secondary alignments of multi-mapped reads with these tools? I do not want any ambiguously mapped reads.
I'm not sure why you want to remove primary reads. You can look up the list of read filters in GATK here (click on Read Filters).
@FelixZhang I'm curious as to why you want to remove primary records of multi-mapping reads. I ask because today's longer reads together with BWA-MEM are pretty good at designating primary alignments, such that most if not all non-primary alignments will be supplementary (0x800 flag) rather than secondary (0x100 flag). Only the secondary alignments have truly ambiguous mapping if taken alone, and even these mappings could be less ambiguous when considered with the mapped mate in paired libraries. Mate mapping location, e.g. whether the mate maps to the same chromosome, is a consideration in designating the primary alignment. If you still want to remove sets of records that have a specific flag, then you can follow the directions at the end of this blog post. Note that these instructions remove the entire set of records that share a read name, including the mate.
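As a rough illustration of that last point, the Python sketch below removes every record whose read name has at least one record carrying a given flag bit, so the primary record and the mate go too. It is not the procedure from the blog post, only a sketch of the outcome described above on plain-text SAM records; the function name `drop_read_sets` and the sample records are made up for the example.

```python
SECONDARY = 0x100       # SAM FLAG bit: not primary alignment
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary alignment


def drop_read_sets(lines, flag_mask):
    """Drop all records sharing a read name with any record that has flag_mask set."""
    flagged_names = set()
    for ln in lines:
        if ln.startswith("@"):
            continue  # header line, not a read record
        fields = ln.split("\t")
        if int(fields[1]) & flag_mask:
            flagged_names.add(fields[0])  # column 1: read name (QNAME)
    return [
        ln for ln in lines
        if ln.startswith("@") or ln.split("\t")[0] not in flagged_names
    ]


# Hypothetical records for illustration only.
sample = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",   # r1 first in pair
    "r1\t147\tchr1\t200\t60\t10M\t=\t100\t-110\tACGTACGTAC\tIIIIIIIIII", # r1 mate
    "r2\t99\tchr1\t300\t3\t10M\t=\t400\t110\tACGTACGTAC\tIIIIIIIIII",    # r2 primary
    "r2\t355\tchr1\t700\t0\t10M\t=\t400\t110\tACGTACGTAC\tIIIIIIIIII",   # r2 secondary
    "r2\t147\tchr1\t400\t3\t10M\t=\t300\t-110\tACGTACGTAC\tIIIIIIIIII",  # r2 mate
]

cleaned = drop_read_sets(sample, SECONDARY)
```

Here all three `r2` records are dropped because one of them is secondary, while both `r1` records and the header are kept. Passing `SUPPLEMENTARY` instead would leave this sample unchanged, since no record has the 0x800 bit set.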
Does UnifiedGenotyper keep multi-mapping reads (secondary reads) by default?
Does HaplotypeCaller do the same thing?
Both tools filter out secondary alignments. You can usually see which filters are applied to the tools in the tool docs, but I am not seeing them in GATK4 docs. I am checking with the team on this.