Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

HaplotypeCaller MappingQualityZero always is 0

feilchenfeldtfeilchenfeldt ViennaMember
edited March 2015 in Ask the GATK team

Hi,

I am using the standard HaplotypeCaller/GenotypeGVCFs pipeline with the --includeNonVariantSites option.
I always get MQ0=0 for all SNPs and don't get it reported for non-variants, even if I add "--annotation MappingQualityZero" to both commands.
With UnifiedGenotyper, MQ0 values are reported correctly.

Is this a bug?

Also, with HaplotypeCaller, MQ is only reported for variants, not for non-variants. Is there a way to get this annotation reported for non-variants, such as with UnifiedGenotpyer?

I am using Version 3.3-0, but collegues have a similar problem with version 3.2-2.

Thanks,
Hannes

Post edited by feilchenfeldt on

Issue · Github
by Geraldine_VdAuwera

Issue Number
821
State
closed
Last Updated
Assignee
Array
Closed By
vdauwera

Best Answers

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin
    Accepted Answer
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    Accepted Answer

    To add to Sheila's answer, be sure to pass in the original bam file if you want to run VariantAnnotator to annotate read-related metrics. Note also that the HaplotypeCaller applies a filter to filter out MQ0 reads, so it's expected to always return a zero value since the tool never sees those reads.

  • feilchenfeldtfeilchenfeldt ViennaMember

    Thanks for the answers. That works. Still, I would recommend you to remove MQ0 as a default annotation from HaplotypeCaller since it is very misleading.
    The same is probably true for MQ (RMSMappingQuality). If HaplotypeCaller calculates it after applying the HCMappingQualityFilter, the reported value is misleading for many applications.
    (Think of people who need to apply hard filters because they work in a non-model organism.)

    Thanks,
    Hannes

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi Hannes,

    We are indeed considering removing MQ0 from the default set since it confuses people and does not provide useful information. Regarding MQ, that's a different matter. MQ is informative since it is the value that corresponds to the reads that HC actually used in order to make the call. If we were to report the MQ of all reads including those that were not used, that would be very misleading.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Quick update: we have removed MQ0 from the default HC annotations, so it will no longer be generated by HC unless requested.

  • feilchenfeldtfeilchenfeldt ViennaMember

    Good to know.
    While it might be sensible for many applications, I still find the behaviour of reporting MQ0 and MQ on the reads after filtering not totally straight-forward. For instance, in my understanding UnifiedGenotyper does not use MQ0 reads either, but it counts them in MQ0. I guess the difference is that there is no filter applied before.

    I would suggest to state more clearly in the INFO tag in the header what MQ0 and MQ really is in each case, e.g.,

    INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality of the reads considered by HaplotypeCaller, i.e., after read filters have been applied.">

    This is important, also, because if I use VariantAnnotator afterwards it will use the same tag with the same description but counts all reads, not only the ones considered by HC.

    Thanks,
    Hannes

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @feilchenfeldt The definition of the annotation that is put in the VCF header is and should remain independent of what specific procedures may have been applied prior to the calculation. That's why it's important to understand that different filters may have been applied to the data, depending on the tool and/or what the user specifies in the command line. We can try to make this more obvious in the documentation, but we can't customize the header definition for every possible case.

Sign In or Register to comment.