Are RGQ values greater than 99 valid?
I have two questions with regard to RGQ and the --includeNonVariantSites flag in GenotypeGVCFs:
1) I have read in another thread that GQ and RGQ are capped at 99. However, I am seeing values higher than this in my VCF, and I want to make sure this does not indicate a problem with the file.
2) I have also noticed that at sites with RGQ annotations (those determined to be monomorphic), if a genotype is uncalled ("./."), its sample column has fewer fields than the FORMAT field (column 9) lists. Is this intentional?
For instance, here are the first 14 columns of one line that illustrates both issues: the second genotype has only three fields, and the third, fourth, and fifth genotypes have RGQ values of 102:
ABCF3 699 . T . 20.51 . AN=436;DP=34798;InbreedingCoeff=-0.1104 GT:AD:DP:RGQ 0/0:176,0:176:0 ./.:195,0:195 0/0:35,0:35:102 0/0:40,0:40:102 0/0:34,0:34:102
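To check how a record like this parses, here is a minimal Python sketch (my own, not GATK code) that maps each sample column onto the FORMAT keys and lists the samples whose RGQ exceeds 99. It assumes that a short sample column simply has its trailing keys dropped (the VCF 4.2 specification appears to allow this, except for GT), so missing keys are filled with None. The helper names parse_samples and rgq_above are hypothetical, and the record is the one pasted above, with spaces standing in for tabs.

```python
from typing import Dict, List, Optional

# The example record from above; spaces stand in for the VCF tab delimiters.
RECORD = ("ABCF3 699 . T . 20.51 . AN=436;DP=34798;InbreedingCoeff=-0.1104 "
          "GT:AD:DP:RGQ 0/0:176,0:176:0 ./.:195,0:195 0/0:35,0:35:102 "
          "0/0:40,0:40:102 0/0:34,0:34:102")


def parse_samples(line: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: map each sample column onto the FORMAT keys.

    Assumes trailing keys missing from a sample column were dropped by the
    writer, and fills them with None.
    """
    # split() on any whitespace works for this pasted example; on a real
    # VCF file, split on "\t" instead.
    cols = line.split()
    keys = cols[8].split(":")  # column 9 is FORMAT
    samples = []
    for col in cols[9:]:
        values: List[Optional[str]] = list(col.split(":"))
        # Pad short columns so every FORMAT key gets an entry.
        values += [None] * (len(keys) - len(values))
        samples.append(dict(zip(keys, values)))
    return samples


def rgq_above(samples: List[Dict[str, Optional[str]]], cap: int = 99) -> List[int]:
    """Hypothetical helper: 0-based indices of samples whose RGQ exceeds cap."""
    return [i for i, s in enumerate(samples)
            if s.get("RGQ") not in (None, ".") and int(s["RGQ"]) > cap]


if __name__ == "__main__":
    parsed = parse_samples(RECORD)
    for i, sample in enumerate(parsed):
        print(i, sample)
    print("samples with RGQ > 99:", rgq_above(parsed))
```

For this record, the second sample comes back with RGQ set to None, and samples 2, 3, and 4 are the ones reported above the cap.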
Should I be worried about either of these? I'm using GATK nightly-2017-10-17-g1994025, and the command I used to genotype 283 gVCFs was:
java -Xmx200G -jar ./GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 44 --includeNonVariantSites -R reference.fasta --variant allGVCFfiles.bqsr.list -o samples.bqsr.raw.allSites.vcf > allSites.log 2>&1
Thanks very much for the great work you do!