Handling Multimappers

Hi,

I am wondering how HaplotypeCaller in GATK 3.1 handles multimappers? I thought I read that they are passed over for variant calling but stay in the realigned, recalibrated BAM for 'completions sake', like marking duplicates not removing them but cannot find supporting info on the website or from farther afield,

Presumably it is the NotPrimaryAlignmentFilter but there is no info on that posted yet. I know I can output a BAM from HC with haplotype info in there but can I just get reads used in variant calls? Or should I trim the BAM myself to retain reads I want used? I do this for mark duplicates (removed) but for multimappers I would like to know how you define so I can do the same. The reason is for coverage estimates, using bamtobed or such means I take all realigned, recalibrated which is many more lines including multimappers which skews my results.

Thanks,

Bruce.

Best Answer

  • pdexheimerpdexheimer Member
    Accepted Answer

    What you're looking for in the documentation is ReadFilters - each tool has a list of the filters it applies by default, and you can add others on the command line. There are two pieces of data in the bam that give an indication of multi mapped reads. The first is the "Not Primary Alignment" flag, which has it's own filter as you noted. The other is the mapping quality - the recommendation (followed by most aligners I know of) is to assign a mapping quality of 0 to any ambiguous alignments.

    When you look at the documentation for each tool, there's a section on Read Filters used by that tool. If you want to "see" the input for that tool, you could run PrintReads (or CountReads, or Pileup, etc) with those filters explicitly specified on the command line. However, keep in mind that there's a bit more filtering that goes on - UnifiedGenotyper, for instance, imposes a minimum base quality as well as a minimum mapping quality on the reads it considers.

    The simpler answer, though, may be to use DepthOfCoverage or DiagnoseTargets. Those tools are designed to report on the "callable" regions of the input, and so mimic the UG/HC filters fairly closely

Answers

  • pdexheimerpdexheimer Member
    Accepted Answer

    What you're looking for in the documentation is ReadFilters - each tool has a list of the filters it applies by default, and you can add others on the command line. There are two pieces of data in the bam that give an indication of multi mapped reads. The first is the "Not Primary Alignment" flag, which has it's own filter as you noted. The other is the mapping quality - the recommendation (followed by most aligners I know of) is to assign a mapping quality of 0 to any ambiguous alignments.

    When you look at the documentation for each tool, there's a section on Read Filters used by that tool. If you want to "see" the input for that tool, you could run PrintReads (or CountReads, or Pileup, etc) with those filters explicitly specified on the command line. However, keep in mind that there's a bit more filtering that goes on - UnifiedGenotyper, for instance, imposes a minimum base quality as well as a minimum mapping quality on the reads it considers.

    The simpler answer, though, may be to use DepthOfCoverage or DiagnoseTargets. Those tools are designed to report on the "callable" regions of the input, and so mimic the UG/HC filters fairly closely

  • Hi, sorry, had the question as draft and must have clicked post instead of cancel! I have indeed just used DiagnoseTargets walker and called out all my primary alignments by flag to a new BAM, so best to just use those for coverage stats. Will be interesting to see the overlap of the two. It was mostly because the filters are not written up (not blaming anyone there is fantastic work going on by the GATK team!)

    Many thanks for your answer Pdex, probably a good idea to add DiagnoseTargets to the pipeline to quickly scan and see what filters are actually doing!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @bruce01‌

    Hi,

    Thanks for the suggestion. We will make a note of it.

    -Sheila

  • Hi Sheila,

    just to clarify I was saying I should add DiagnoseTargets to my own pipeline, not that you should add to best practices, sorry for any confusion.

    Bruce.

Sign In or Register to comment.