MQRankSum and ReadPosRankSum for SNPs in a haploid organism?
Apologies if this has been addressed previously. I'm working with genomic resequencing data for a haploid organism, and I have created a VCF file by running GenotypeGVCFs on 33 gVCFs produced with HaplotypeCaller (following the Best Practices). As directed, I did not set the ploidy option when running GenotypeGVCFs. My final aim is to select a subset of high-quality SNPs for a downstream analysis.
As I understand it, MQRankSum and ReadPosRankSum can only be calculated if at least one individual has a heterozygous genotype (both ref and alt alleles) at that position. Around 15% of my SNPs have values for these annotations. Can anyone explain what this means for a haploid? Are these sites good candidates to filter out outright?
An example SNP is below:
chromosome_1 3316 . G A 492.42 . AC=3;AF=0.136;AN=22;BaseQRankSum=0.731;ClippingRankSum=1.70;DP=1488;FS=0.000;MLEAC=3;MLEAF=0.136;MQ=31.59;MQRankSum=-5.660e-01;QD=16.98;ReadPosRankSum=0.731;SOR=1.308 GT:AD:DP:GQ:PL 0:86,0:86:99:0,1800 0:89,0:89:99:0,1800 0:147,0:147:99:0,1800 0:51,5:56:99:0,1800 1:1,4:5:80:80,0 0:271,0:271:99:0,1800 1:0,8:8:99:247,0 0:21,1:22:99:0,814 .:0,0 1:5,11:16:99:211,0 0:140,72:212:99:0,1800 0:242,13:255:99:0,1800 0:252,0:252:99:0,1800 .:0,0 .:0,0 .:0,0 0:1,0:1:44:0,44 .:0,0 .:0,0 0:3,0:3:99:0,112 0:1,0:1:39:0,39 .:0,0 .:0,0 .:0,0 .:0,0 0:17,0:17:99:0,360 0:1,0:1:37:0,37 0:4,0:4:99:0,135 0:10,0:10:99:0,270 0:3,0:3:99:0,119 0:3,0:3:99:0,111 0:3,0:3:99:0,119 .:0,0
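For anyone wanting to inspect a record like this, a minimal Python sketch follows. It assumes the rank sum annotations need reads supporting both the ref and the alt allele, so it lists the samples whose AD field shows both, regardless of the haploid GT call. The function name `mixed_support_samples` is hypothetical, and the FORMAT string and sample columns are copied from the record above.

```python
# FORMAT column and the 33 sample columns from the example record.
FORMAT = "GT:AD:DP:GQ:PL"
SAMPLES = (
    "0:86,0:86:99:0,1800 0:89,0:89:99:0,1800 0:147,0:147:99:0,1800 "
    "0:51,5:56:99:0,1800 1:1,4:5:80:80,0 0:271,0:271:99:0,1800 "
    "1:0,8:8:99:247,0 0:21,1:22:99:0,814 .:0,0 1:5,11:16:99:211,0 "
    "0:140,72:212:99:0,1800 0:242,13:255:99:0,1800 0:252,0:252:99:0,1800 "
    ".:0,0 .:0,0 .:0,0 0:1,0:1:44:0,44 .:0,0 .:0,0 0:3,0:3:99:0,112 "
    "0:1,0:1:39:0,39 .:0,0 .:0,0 .:0,0 .:0,0 0:17,0:17:99:0,360 "
    "0:1,0:1:37:0,37 0:4,0:4:99:0,135 0:10,0:10:99:0,270 0:3,0:3:99:0,119 "
    "0:3,0:3:99:0,111 0:3,0:3:99:0,119 .:0,0"
).split()


def mixed_support_samples(format_field, sample_fields):
    """Return (index, GT, ref_reads, alt_reads) for samples whose AD
    shows at least one ref read and at least one alt read.

    Indices are 1-based positions among the sample columns.
    Hypothetical helper; assumes a biallelic site.
    """
    keys = format_field.split(":")
    gt_idx = keys.index("GT")
    ad_idx = keys.index("AD")
    mixed = []
    for i, field in enumerate(sample_fields, start=1):
        values = field.split(":")
        if len(values) <= ad_idx:
            continue  # trailing fields dropped, no AD available
        depths = values[ad_idx].split(",")
        if len(depths) != 2 or not all(d.isdigit() for d in depths):
            continue  # missing AD or not biallelic
        ref_reads, alt_reads = int(depths[0]), int(depths[1])
        if ref_reads > 0 and alt_reads > 0:
            mixed.append((i, values[gt_idx], ref_reads, alt_reads))
    return mixed


if __name__ == "__main__":
    for index, gt, ref_reads, alt_reads in mixed_support_samples(FORMAT, SAMPLES):
        print(f"sample {index}: GT={gt} ref={ref_reads} alt={alt_reads}")
```

For this record the sketch flags samples 4, 5, 8, 10, 11 and 12. Several of them are called 0 or 1 yet carry reads for both alleles (for example 140 ref and 72 alt reads in sample 11), which may be the kind of mixed evidence worth checking before deciding how to filter.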