
VariantFiltration

bram (Belgium, Member)

Hi

I am working with a tetraploid species and have two RNA-seq libraries for two genotypes. I followed the RNAseq Best Practices for GATK v3 and am currently running GATK v3.3.0. I performed a joint analysis using HaplotypeCaller with ploidy set to 4. Everything went well up to the final VariantFiltration step. When I filtered my variants with the suggested hard filters -filterName FS -filter 'FS > 30.0' -filterName QD -filter 'QD < 2.0' -window 35 -cluster 3, I saw some strange results in my VCF file. Records whose annotation values pass the thresholds are being filtered anyway. A few examples:

1) R1 567 . A T 56.41 FS AC=3;AF=0.188;AN=16;BaseQRankSum=-1.325;ClippingRankSum=-0.179;DP=29;FS=2.681;MLEAC=2;MLEAF=0.125;MQ=40.00;MQ0=0;MQRankSum=0.680;QD=4.03;ReadPosRankSum=-0.752;SOR=0.939 GT:AD:DP:GQ:PL 0/0/0/1:8,1:9:11:13,0,11,33,344 0/0/0/0:6,0:6:9:0,9,21,42,295 0/0/1/1:3,2:5:1:75,1,0,6,106 0/0/0/0:9,0:9:11:0,11,27,54,405

FS is 2.681, yet the record is flagged with the FS filter. QD is fine. The variant also does not fall within a 35 nt window of other variants.

2) R2 663 . G C 3104.31 QD AC=16;AF=1.00;AN=16;DP=115;FS=0.000;MLEAC=16;MLEAF=1.00;MQ=40.00;MQ0=0;QD=26.99;SOR=1.377 GT:AD:DP:GQ:PL 1/1/1/1:0,20:20:25:556,120,60,25,0 1/1/1/1:0,27:27:34:736,162,81,34,0 1/1/1/1:0,25:25:31:694,150,75,31,0 1/1/1/1:0,43:43:53:1143,256,129,53,0

QD is 26.99, yet the record is flagged with the QD filter. FS and the window are fine.

3) I also see this behavior for the labels 'FS,QD' and 'QD,FS' in the FILTER column.
4) In some cases I also see the filter label 'snpCluster' combined with the labels 'QD' or 'FS'. However, in other cases variants within a window of 35 are not labeled as a snpCluster. What could be the reason for this? In these cases, too, records with QD > 2.0 are flagged with the QD filter.

R3 1092 . T C 335.73 QD AC=7;AF=0.438;AN=16;BaseQRankSum=-0.741;ClippingRankSum=0.212;DP=16;FS=0.000;MLEAC=8;MLEAF=0.500;MQ=40.00;MQ0=0;MQRankSum=-1.694;QD=22.38;ReadPosRankSum=-2.435;SOR=0.941 GT:AD:DP:GQ:PL 0/0/1/1:2,2:4:2:78,2,0,2,78 0/0/1/1:2,3:5:1:120,6,0,1,75 0/1/1/1:2,4:6:1:163,10,1,0,73 0/0/0/0:1,0:1:1:0,1,3,6,45
R3 1095 . T A 335.73 QD AC=7;AF=0.438;AN=16;BaseQRankSum=-0.635;ClippingRankSum=-0.318;DP=16;FS=0.000;MLEAC=8;MLEAF=0.500;MQ=40.00;MQ0=0;MQRankSum=-0.423;QD=22.38;ReadPosRankSum=-2.435;SOR=0.941 GT:AD:DP:GQ:PL 0/0/1/1:2,2:4:2:78,2,0,2,78 0/0/1/1:2,3:5:1:120,6,0,1,75 0/1/1/1:2,4:6:1:163,10,1,0,73 0/0/0/0:1,0:1:1:0,1,3,6,45
R3 1097 . A G,C 505.11 QD AC=7,3;AF=0.438,0.188;AN=16;BaseQRankSum=-0.404;ClippingRankSum=0.404;DP=16;FS=0.000;MLEAC=8,5;MLEAF=0.500,0.313;MQ=40.00;MQ0=0;MQRankSum=0.807;QD=33.67;ReadPosRankSum=0.135;SOR=1.981 GT:AD:DP:GQ:PL 1/1/2/2:0,2,2:4:2:168,90,84,80,78,90,12,6,2,84,6,0,80,2,78 0/0/1/1:2,3,0:5:1:120,6,0,1,75,122,9,6,79,126,15,84,132,93,210 1/1/1/2:0,4,2:6:1:253,97,85,78,73,175,19,7,0,169,13,1,165,10,163 0/0/0/0:1,0,0:1:1:0,1,3,6,45,1,3,6,45,3,6,45,6,45,45
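
To make clear what I expected, here is a minimal Python sketch of how I read the filter settings. It is not GATK code: the helper names are hypothetical, the thresholds are copied from my command line, and the cluster check is a naive reading of -window/-cluster that may differ from how VariantFiltration actually defines a cluster.

```python
# Thresholds copied from the command line above; helper names are hypothetical.
FS_MAX = 30.0   # -filter 'FS > 30.0'
QD_MIN = 2.0    # -filter 'QD < 2.0'
WINDOW = 35     # -window 35
CLUSTER = 3     # -cluster 3


def parse_info(info):
    """Turn 'AC=3;AF=0.188;...' into a dict of strings (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def expected_hard_filters(info):
    """Return the filter names whose expression is true for this INFO string."""
    fields = parse_info(info)
    failed = []
    if "FS" in fields and float(fields["FS"]) > FS_MAX:
        failed.append("FS")
    if "QD" in fields and float(fields["QD"]) < QD_MIN:
        failed.append("QD")
    return failed


def in_cluster(positions, window=WINDOW, cluster=CLUSTER):
    """Naive check (an assumption, not GATK's definition): do any `cluster`
    consecutive sorted positions span at most `window` bases?"""
    pos = sorted(positions)
    return any(pos[i + cluster - 1] - pos[i] <= window
               for i in range(len(pos) - cluster + 1))


print(expected_hard_filters("FS=2.681;QD=4.03"))   # example 1 (R1:567)
print(expected_hard_filters("FS=0.000;QD=26.99"))  # example 2 (R2:663)
print(in_cluster([1092, 1095, 1097]))              # the three R3 sites
```

Under this reading, neither example 1 nor example 2 should fail FS or QD, and the three R3 sites would count as a cluster, which is why the FILTER values above surprise me.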

First, do the recommended hard-coded window, cluster, FS and QD filters still hold for a polyploid species with multiple samples?
Second, can you comment on the examples above?

Thanks for your answer!

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @bram, those recommendations haven't been tested on non-diploid samples, so I would definitely encourage you to experiment with the values.

    If I recall correctly, when a variant fails multiple filters, the one that gets annotated in the filter field is the one where the value is most distant from the threshold. This is an admittedly clumsy way to estimate which is the worst dimension (clumsy because the scale is not taken into account).
