
Strange distribution of RGQ values

prepagam, Member
edited January 2018 in Ask the GATK team

I extracted the RGQ values for one sample at 15.5 million non-variant sites on a single chromosome and plotted their distribution. I am trying to work out whether I can set RGQ thresholds for filtering.
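A minimal sketch of the extraction step, using only the Python standard library. It assumes a VCF whose non-variant records carry RGQ in the FORMAT column, as the joint-called output described here does. The helper names `open_vcf` and `rgq_histogram` and the sample name are hypothetical, and this is not how the original values were extracted.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def rgq_histogram(lines, sample):
    """Count RGQ values for one sample across records whose FORMAT has RGQ.

    `lines` is any iterable of VCF text lines. Records without an RGQ key,
    or with a missing value ('.'), are skipped.
    """
    counts = Counter()
    sample_idx = None
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_idx = fields.index(sample)  # column of the chosen sample
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line before data records")
        keys = fields[8].split(":")  # FORMAT column
        if "RGQ" not in keys:
            continue  # e.g. variant records, which report GQ instead
        values = fields[sample_idx].split(":")
        i = keys.index("RGQ")
        # Trailing FORMAT fields may be omitted from a sample column.
        if i >= len(values) or values[i] == ".":
            continue
        counts[int(values[i])] += 1
    return counts
```

Usage: `with open_vcf(path) as fh: hist = rgq_histogram(fh, "SAMPLE")`, then `sorted(hist.items())` gives the (RGQ, count) pairs to plot. Tallying into a `Counter` keeps memory proportional to the number of distinct RGQ values rather than to the number of sites.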

The distribution looks very strange. Certain values are very common, e.g. RGQ=3 (60,929 sites), whereas the adjacent values RGQ=1, 2, 4 and 5 are very rare (fewer than 3,000 sites; see attached plot). The same pattern repeats throughout the distribution: values that are multiples of 3 (3, 6, 9, 12, 15, 18, ...) are the common ones. Is this normal? I have plotted GQ before (a different quantity, of course), and I did not see this pattern there.
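One back-of-envelope way to see how a phred-scaled quality could land on multiples of 3: if every read at a site shows the reference base, each read makes a heterozygous genotype about half as likely as hom-ref, and -10·log10(0.5) ≈ 3.01. If RGQ at such sites were driven mainly by this het-vs-hom-ref ratio (an assumption, not something confirmed here), it would rise in steps of about 3 per read. The sketch below uses a deliberately simplified error model and is not GATK's implementation. `toy_het_pl` and `share_at_multiples_of_3` are hypothetical helpers, the second for checking the pattern in a histogram of observed values.

```python
import math


def toy_het_pl(depth, error=0.0):
    """Phred-scaled het-vs-hom-ref likelihood ratio for `depth` reads that
    all show the reference base (toy model, NOT GATK's actual computation).

    Per read: P(ref | hom-ref) = 1 - e and
              P(ref | het)     = 0.5 * (1 - e) + 0.5 * (e / 3).
    """
    p_homref = 1.0 - error
    p_het = 0.5 * (1.0 - error) + 0.5 * (error / 3.0)
    # Reads are treated as independent, so the log-ratio scales with depth.
    return round(-10.0 * depth * math.log10(p_het / p_homref))


def share_at_multiples_of_3(hist):
    """Fraction of observations whose value is a positive multiple of 3."""
    total = sum(hist.values())
    hits = sum(n for q, n in hist.items() if q > 0 and q % 3 == 0)
    return hits / total if total else 0.0


for d in range(1, 7):
    print(d, toy_het_pl(d))  # prints 1 3, 2 6, 3 9, 4 12, 5 15, 6 18 (one pair per line)
```

Under this toy model the value tracks read depth almost exactly (3.01 per read), so a small base-error rate barely moves it off the multiples of 3.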

I'm using GATK 3.7, and the invariant sites come from the VCF created with this command:

java -Xmx16G -jar ${GATK} \
-T GenotypeGVCFs \
-allSites \
-stand_call_conf 0 \
-L ${Chrom} \
-V [10 gVCF files produced by HaplotypeCaller -ERC BP_RESOLUTION and emit all sites] \
-o ${Outdir}/${Abb}'jointcalled'${Chrom}'.vcf.gz'
