MQRankSum and ReadPosRankSum for SNPs in a haploid organism?

rorycraig · Edinburgh · Member


Apologies if this has been addressed previously. I'm working with genomic resequencing data for a haploid organism. I created a VCF file with GenotypeGVCFs from 33 gVCFs generated by HaplotypeCaller (following the Best Practices). I did not set the ploidy option when running GenotypeGVCFs, as directed. My final aim is to filter for a subset of high-quality SNPs for a downstream analysis.

As I understand it, the MQRankSum and ReadPosRankSum annotations can only be calculated if at least one individual has a heterozygous genotype (both ref and alt alleles) at that position. Around 15% of my SNPs have values for these annotations. Can anyone explain what this means for a haploid? Are these sites good candidates to filter outright?

An example SNP is below:

chromosome_1 3316 . G A 492.42 . AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;ReadPosRankSum=0.731;SOR=1.308 GT:AD:DP:GQ:PL 0:86,0:86:99:0,1800 0:89,0:89:99:0,1800 0:147,0:147:99:0,1800 0:51,5:56:99:0,1800 1:1,4:5:80:80,0 0:271,0:271:99:0,1800 1:0,8:8:99:247,0 0:21,1:22:99:0,814 .:0,0 1:5,11:16:99:211,0 0:140,72:212:99:0,1800 0:242,13:255:99:0,1800 0:252,0:252:99:0,1800 .:0,0 .:0,0 .:0,0 0:1,0:1:44:0,44 .:0,0 .:0,0 0:3,0:3:99:0,112 0:1,0:1:39:0,39 .:0,0 .:0,0 .:0,0 .:0,0 0:17,0:17:99:0,360 0:1,0:1:37:0,37 0:4,0:4:99:0,135 0:10,0:10:99:0,270 0:3,0:3:99:0,119 0:3,0:3:99:0,111 0:3,0:3:99:0,119 .:0,0
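A quick way to see why rank sum values can appear at a haploid site is to look at each sample's AD (allelic depth) field. The Python sketch below is a hypothetical helper, not a GATK tool. It assumes a single ALT allele, whitespace-separated columns, and an AD key in FORMAT, and it lists the samples whose AD shows reads for both REF and ALT.

```python
# Sketch: list the samples at one VCF site whose AD shows both REF and ALT reads.
# Assumes a single ALT allele and an "AD" key in FORMAT. Samples whose field is
# too short to contain AD are skipped.

RECORD = (
    "chromosome_1 3316 . G A 492.42 . "
    "AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;"
    "FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;"
    "ReadPosRankSum=0.731;SOR=1.308 GT:AD:DP:GQ:PL "
    "0:86,0:86:99:0,1800 0:89,0:89:99:0,1800 0:147,0:147:99:0,1800 "
    "0:51,5:56:99:0,1800 1:1,4:5:80:80,0 0:271,0:271:99:0,1800 "
    "1:0,8:8:99:247,0 0:21,1:22:99:0,814 .:0,0 1:5,11:16:99:211,0 "
    "0:140,72:212:99:0,1800 0:242,13:255:99:0,1800 0:252,0:252:99:0,1800 "
    ".:0,0 .:0,0 .:0,0 0:1,0:1:44:0,44 .:0,0 .:0,0 0:3,0:3:99:0,112 "
    "0:1,0:1:39:0,39 .:0,0 .:0,0 .:0,0 .:0,0 0:17,0:17:99:0,360 "
    "0:1,0:1:37:0,37 0:4,0:4:99:0,135 0:10,0:10:99:0,270 "
    "0:3,0:3:99:0,119 0:3,0:3:99:0,111 0:3,0:3:99:0,119 .:0,0"
)


def mixed_allele_samples(record: str) -> list[tuple[int, int, int]]:
    """Return (1-based sample index, REF reads, ALT reads) for every sample
    whose allelic depths are non-zero for both REF and ALT."""
    fields = record.split()
    ad_idx = fields[8].split(":").index("AD")  # column 9 is FORMAT
    mixed = []
    for i, sample in enumerate(fields[9:], start=1):
        values = sample.split(":")
        if len(values) <= ad_idx:
            continue  # AD not reported for this sample
        ref, alt = (int(x) for x in values[ad_idx].split(","))
        if ref > 0 and alt > 0:
            mixed.append((i, ref, alt))
    return mixed


if __name__ == "__main__":
    for idx, ref, alt in mixed_allele_samples(RECORD):
        print(f"sample {idx}: ref={ref} alt={alt} alt_fraction={alt / (ref + alt):.2f}")
```

On the record above this reports six samples (4, 5, 8, 10, 11 and 12). Sample 11, for instance, is called 0 but has 140 REF and 72 ALT reads. REF/ALT mixtures like this inside nominally haploid samples are a plausible reason the RankSum annotations are populated at all.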



  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    Hi Rory,

    Can you confirm that the RankSum annotations do not appear in the final VCF if you do set ploidy in GenotypeGVCFs? I don't think you should use the RankSum annotations for haploid samples, as these annotations are meant for diploid samples.


  • rorycraig · Edinburgh · Member

    Hi Sheila, sorry for the slow reply. I can confirm that these annotations still appear if ploidy is set to 1 in the GenotypeGVCFs command. Do you have any insight on whether it's best to ignore these annotations or to actively filter on them? Thanks!

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin


    We don't have any recommendations for using or not using rank sum annotations in haploid (or non-diploid) samples. I think the best thing to do is try both ways (filtering with and without the rank sum annotations) and see which works best for your dataset.
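One way to "try both ways" is to run the same sites through two hard-filter sets, one with and one without the rank sum annotations, and compare what each removes. The Python sketch below is illustrative only. Its thresholds are a subset of GATK's generic SNP hard-filtering starting points (QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0), not values tuned for haploids. Missing annotations are treated as passing, which I believe mirrors the default of GATK4 VariantFiltration (see its --missing-values-evaluate-as-failing option). The function names are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Each entry maps annotation -> (comparison meaning "fail", threshold).
# Thresholds are GATK's generic SNP starting points, assumed here; tune for your data.
Filters = Dict[str, Tuple[Callable[[float, float], bool], float]]

BASE_FILTERS: Filters = {
    "QD": (operator.lt, 2.0),
    "FS": (operator.gt, 60.0),
    "MQ": (operator.lt, 40.0),
    "SOR": (operator.gt, 3.0),
}
RANKSUM_FILTERS: Filters = {
    "MQRankSum": (operator.lt, -12.5),
    "ReadPosRankSum": (operator.lt, -8.0),
}


def parse_info(info: str) -> Dict[str, float]:
    """Parse the numeric KEY=VALUE pairs of a VCF INFO string."""
    values: Dict[str, float] = {}
    for item in info.split(";"):
        key, sep, raw = item.partition("=")
        if not sep:
            continue  # flag-type entry with no value
        try:
            values[key] = float(raw)
        except ValueError:
            continue  # non-numeric or multi-valued entry (e.g. "1,2")
    return values


def failed_filters(info: Dict[str, float], filters: Filters) -> List[str]:
    """Names of annotations that trip their threshold; missing ones pass."""
    return [
        name
        for name, (fails, threshold) in filters.items()
        if name in info and fails(info[name], threshold)
    ]


# INFO field of the example SNP from the first post.
EXAMPLE_INFO = (
    "AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;"
    "FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;"
    "ReadPosRankSum=0.731;SOR=1.308"
)

if __name__ == "__main__":
    site = parse_info(EXAMPLE_INFO)
    print("with rank sums:   ", failed_filters(site, {**BASE_FILTERS, **RANKSUM_FILTERS}))
    print("without rank sums:", failed_filters(site, BASE_FILTERS))
```

For the example SNP both sets flag only MQ (31.59 is below 40.0), so at this particular site the decision does not hinge on the rank sums. The comparison becomes informative when run across all sites.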


  • qiangfu · Belgium · Member


    I have a question about ReadPosRankSum in haploids. I read the definition on the GATK documentation page for ReadPosRankSumTest, as well as the explanation of the statistics behind the score on the Rank Sum Test page.

    However, I could not find anything about sample ploidy that would make this score invalid for haploids. I would really like some clarification on why this score is only valid for diploids.

    Many thanks.


  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator, admin

    Hi @qiangfu

    ReadPosRankSumTest is invalid for haploids because it compares the distribution of the relative positions of alt reads with that of ref reads, and a haploid sample should not have both.
    There is this note in the documentation:


    • The read position rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles.
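To see why a mixture of reads is needed, here is a simplified rank sum test in Python: a Mann-Whitney U statistic converted to a z-score with the normal approximation (average ranks for ties, no tie correction to the variance). This is a sketch of the general technique, not GATK's implementation. The input values would be something like each read's position relative to the variant, and the sign convention here (positive when ALT values tend to be larger) is an assumption.

```python
import math
from typing import Optional, Sequence


def rank_sum_z(alt: Sequence[float], ref: Sequence[float]) -> Optional[float]:
    """Mann-Whitney U z-score comparing ALT-read values with REF-read values.

    Returns None when either group is empty: with only REF reads or only ALT
    reads there is no second distribution to compare against.
    """
    n1, n2 = len(alt), len(ref)
    if n1 == 0 or n2 == 0:
        return None
    # Pool both groups, tagging ALT values with 0 and REF values with 1.
    pooled = sorted([(v, 0) for v in alt] + [(v, 1) for v in ref])
    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    # U for the ALT group, then the normal approximation (no tie correction).
    r_alt = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_alt = r_alt - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u_alt - mean) / sd


if __name__ == "__main__":
    # Hypothetical read positions, for illustration only.
    print(rank_sum_z(alt=[2, 3, 5, 8], ref=[20, 35, 41, 60, 72]))
    print(rank_sum_z(alt=[10, 22, 40], ref=[]))  # ALT-only sample -> None
```

With only REF reads or only ALT reads, as in a cleanly haploid sample such as sample 7 in the first post's record (AD 0,8), one group is empty and the function returns None.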

  • qiangfu · Belgium · Member

    Thanks for the clarification.

    Indeed, for haploid species there should theoretically be only ref or only alt reads. I got confused because in viruses, and even in bacteria, a sample sometimes contains more than one population, so ref and alt reads appear at the same position. But those reads do not share the same origin (there are at least two different populations), so the assumptions behind the ReadPos bias test no longer apply.

  • AdelaideR · Unconfirmed, Member, Broadie, Moderator, admin

    Hi @qiangfu, feel free to post a follow-up question.
