Are RGQ values greater than 99 valid?

evanmelstad USA Member
edited October 2017 in Ask the GATK team

I have two questions with regard to RGQ and the --includeNonVariantSites flag in GenotypeGVCFs:

1) I have read in another thread that GQ and RGQ are capped at 99. However, I am seeing values that go higher than this in my VCF. I wanted to check to make sure this was not indicative of a problem with the VCF.

2) I have also noticed that for sites with RGQ annotations (those sites that were determined to be monomorphic), if a genotype is uncalled (has a value of "./."), then there are fewer fields in the sample genotype blocks than there are in the genotype format field (column 9). Is this intentional?

For instance, here are the first 14 columns from one line that illustrates both issues: the second genotype has only three fields, and the third, fourth, and fifth genotypes have RGQ values of 102:

ABCF3   699 .   T   .   20.51   .   AN=436;DP=34798;InbreedingCoeff=-0.1104 GT:AD:DP:RGQ    0/0:176,0:176:0 ./.:195,0:195   0/0:35,0:35:102 0/0:40,0:40:102 0/0:34,0:34:102

Are these things something that I should worry about? I'm using GATK nightly-2017-10-17-g1994025, and the command I used to genotype 283 gVCFs was:

java -Xmx200G -jar ./GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 44 --includeNonVariantSites -R reference.fasta --variant allGVCFfiles.bqsr.list -o samples.bqsr.raw.allSites.vcf > allSites.log 2>&1

Thanks very much for the great work you do!

Comments

  • Sheila Broad Institute Member, Broadie, Moderator

    @evanmelstad
    Hi,

    1) I remember there being some debate about this, and I think the final answer was to uncap the GQ scores, so it is possible to see scores higher than 99. I don't think you should worry about anything. However, can you check whether this also happens in GATK4?

    2) The reason you see fewer annotations for the no-call samples is that there is simply no way to calculate the confidence in the genotype, since there is no genotype. The RGQ and GQ tell you how much confidence the tool has in the genotype, but if there is no called genotype, the tool cannot calculate the confidence in it. You can assume a GQ/RGQ of 0 for ./. genotypes.

    -Sheila
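
For anyone parsing these lines downstream, here is a minimal Python sketch of the advice above, not an official GATK utility. It relies on the VCF specification allowing trailing sample fields to be dropped, so it pads short genotype blocks with "." and treats a missing RGQ as 0. The helper names parse_sample and rgq_or_zero are hypothetical, and the example line is rewritten with tabs on the assumption that the original file is tab-delimited, as VCF requires.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys to one sample's values, padding dropped trailing fields with '.'."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Trailing fields may be omitted from a sample block; fill them with the missing value.
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def rgq_or_zero(sample):
    """Return RGQ as an int, using 0 when it is missing (for example in a ./. no-call)."""
    raw = sample.get("RGQ", ".")
    return 0 if raw == "." else int(raw)


# The example line from the question, with tabs restored (assumption: tab-delimited VCF).
line = (
    "ABCF3\t699\t.\tT\t.\t20.51\t.\tAN=436;DP=34798;InbreedingCoeff=-0.1104\t"
    "GT:AD:DP:RGQ\t0/0:176,0:176:0\t./.:195,0:195\t0/0:35,0:35:102\t"
    "0/0:40,0:40:102\t0/0:34,0:34:102"
)

columns = line.split("\t")
format_field = columns[8]            # column 9 holds the FORMAT keys
samples = [parse_sample(format_field, s) for s in columns[9:]]

rgqs = [rgq_or_zero(s) for s in samples]
above_99 = [i for i, q in enumerate(rgqs) if q > 99]   # sample indices with RGQ > 99

print(rgqs)      # [0, 0, 102, 102, 102]
print(above_99)  # [2, 3, 4]
```

Padding to the FORMAT length rather than indexing by position means a short ./. block no longer shifts or truncates the fields, and values above 99 are simply parsed as integers instead of being treated as errors.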
