Strange distribution of RGQ values

prepagam (Member)
edited January 2018 in Ask the GATK team

I extracted the RGQ values for one sample at 15.5 million non-variant sites on a single chromosome and plotted their distribution (I am trying to work out whether I can set an RGQ threshold for filtering).

The distribution of values looks very strange. Certain values are very common, e.g. RGQ=3 (60,929 sites), whereas the adjacent values RGQ=1, 2, 4 and 5 are very rare (<3,000) (see attached plot). The same pattern runs through the whole distribution: the common values are the multiples of 3 (3, 6, 9, 12, 15, 18, ...). I just wondered whether this is normal. I have plotted GQ before (a different quantity, of course) and did not see this pattern.
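To reproduce a per-value count like the one described above, a minimal shell sketch follows. It assumes the RGQ values have already been extracted to one integer per line (for example with `bcftools query -s SAMPLE -f '[%RGQ]\n' calls.vcf.gz > rgq.txt`, where `SAMPLE`, `calls.vcf.gz` and `rgq.txt` are placeholders). The function names `rgq_histogram` and `rgq_multiples_of_3` are hypothetical helpers, not part of GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "count value" for each distinct RGQ, sorted by RGQ.
# Non-numeric entries (e.g. "." for missing RGQ) are skipped.
rgq_histogram() {
  grep -E '^[0-9]+$' | sort -n | uniq -c
}

# Hypothetical helper: report how many numeric RGQ values are exact multiples of 3.
rgq_multiples_of_3() {
  awk '$1 ~ /^[0-9]+$/ { total++; if ($1 % 3 == 0) m3++ }
       END { printf "%d of %d RGQ values are multiples of 3\n", m3, total }'
}
```

Usage, under the same assumptions: `rgq_histogram < rgq.txt` gives the counts to plot, and `rgq_multiples_of_3 < rgq.txt` gives a single summary line that makes the multiples-of-3 pattern easy to quantify.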

I'm using GATK 3.7, and the VCF containing the invariant sites was created with this command:

java -Xmx16G -jar ${GATK} \
-T GenotypeGVCFs \
-R ${REFERENCE} \
-allSites \
-stand_call_conf 0 \
-L ${Chrom} \
-V [10 GVCF files produced by HaplotypeCaller with -ERC BP_RESOLUTION, emitting all sites] \
-o ${Outdir}/${Abb}'jointcalled'${Chrom}'.vcf.gz'
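For reference, here is one back-of-envelope model that would produce steps of about 3. It is an assumption, not a confirmed account of how GATK 3.7 computes RGQ. Suppose a diploid hom-ref site is covered by $n$ independent reads that all show the reference base, each with a small error probability $\varepsilon_i$ taken from its base quality, and that RGQ tracks the Phred-scaled likelihood gap between the het and hom-ref genotypes:

$$
L(\text{hom-ref}) = \prod_{i=1}^{n} (1-\varepsilon_i), \qquad
L(\text{het}) = \prod_{i=1}^{n} \left[ \tfrac{1}{2}(1-\varepsilon_i) + \tfrac{1}{2}\cdot\tfrac{\varepsilon_i}{3} \right]
$$

$$
\mathrm{PL}_{\text{het}} - \mathrm{PL}_{\text{hom-ref}}
= -10 \log_{10} \frac{L(\text{het})}{L(\text{hom-ref})}
\approx 10\, n \log_{10} 2 \approx 3.01\, n
\qquad (\varepsilon_i \to 0)
$$

After rounding to integers, this gives 3, 6, 9, ... for $n = 1, 2, 3$ reads. Under this model, the common values would simply reflect read depth at the invariant sites. Whether that holds for this call set is something to check, e.g. by comparing RGQ against DP at the same sites.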
