Strange distribution of RGQ values

prepagam Member
January 2018

I extracted the RGQ for one sample from 15.5 million non-variant sites on one chromosome and plotted the distribution (I am trying to work out whether I can set RGQ thresholds for filtering).

I get a very strange distribution of values. Specifically, certain values, e.g. RGQ=3, are very common (60,929), whereas the adjacent values RGQ=1, 2, 4 and 5 are very rare (<3,000); see the attached plot. This pattern repeats throughout the distribution: it appears to be the multiples of 3 that are common (3, 6, 9, 12, 15, 18, ...). I just wondered if this is normal. I've plotted GQ before (which is a different metric, of course), and I didn't see this pattern there.
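
For anyone who wants to check their own data for the same pattern, a tally like the one described above could be sketched in Python roughly as follows. This is only an illustrative sketch, not the script used for the plot: the function names `rgq_counts` and `multiple_of_three_fraction` are hypothetical, and it assumes a tab-delimited VCF in which non-variant sites list RGQ in the FORMAT column.

```python
from collections import Counter


def rgq_counts(vcf_lines, sample_index=0):
    """Tally RGQ values for one sample over an iterable of VCF lines.

    sample_index is the 0-based position of the sample among the
    sample columns (an assumption of this sketch).
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this line
        keys = fields[8].split(":")
        if "RGQ" not in keys:
            continue  # e.g. variant sites, which carry GQ instead
        values = fields[9 + sample_index].split(":")
        pos = keys.index("RGQ")
        # Trailing fields may be dropped, and missing values are ".".
        if pos >= len(values) or values[pos] == ".":
            continue
        counts[int(values[pos])] += 1
    return counts


def multiple_of_three_fraction(counts):
    """Fraction of tallied sites whose RGQ is a positive multiple of 3."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for value, n in counts.items() if value > 0 and value % 3 == 0)
    return hits / total


# Tiny made-up example (not real data) showing the expected input shape.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr1\t100\t.\tA\t.\t.\t.\tDP=12\tGT:AD:DP:RGQ\t0/0:12:12:3",
    "chr1\t101\t.\tC\t.\t.\t.\tDP=9\tGT:AD:DP:RGQ\t0/0:9:9:5",
]
print(sorted(rgq_counts(example).items()))
```

For a bgzipped file, the lines can be streamed with `gzip.open(path, "rt")` and passed straight to `rgq_counts`, so the 15.5 million records never need to be held in memory at once.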

I'm using GATK 3.7, and the VCF containing the invariant sites was created with this command:

java -Xmx16G -jar ${GATK} \
-T GenotypeGVCFs \
-allSites \
-stand_call_conf 0 \
-L ${Chrom} \
-V [10 gvcf files produced from haplotype caller -ERC BP_RESOLUTION and emit all sites]\
-o ${Outdir}/${Abb}'jointcalled'${Chrom}'.vcf.gz'

