Too-high QUAL scores in HaplotypeCaller GVCF

redzengenoist Member
edited August 2014 in Ask the GATK team

Hello @Geraldine,

I'm getting some suspiciously high QUAL scores in my VCF; the whole file is full of weird scores in the tens of thousands:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chrM 73 . G A 27120.46 . BaseQRankSum=-8.220e-01;ClippingRankSum=1.02;DP=1063;MLEAC=6;MLEAF=0.600;MQ=59.91;MQ0=0;MQRankSum=0.120;QD=32.10;ReadPosRankSum=-3.840e-01 GT:AD:DP:GQ:PL 1/1:1,332:333:99:10644,962,0 1/1:1,315:316:99:10159,913,0 0/0:118,0:118:99:0,120,1800 1/1:0,196:196:99:6366,586,0 0/0:95,0:95:99:0,120,1800
chrM 146 rs72619361 T C 16226.15 . DB;DP=1315;MLEAC=2;MLEAF=0.200;MQ=57.18;MQ0=0;QD=34.24 GT:AD:DP:GQ:PL 0/0:338,0:338:99:0,120,1800 0/0:324,0:324:99:0,120,1800 0/0:118,0:118:99:0,120,1800 0/0:175,0:175:99:0,120,1800 1/1:0,359:359:99:16268,1102,0
chrM 150 . T C 52405.15 . BaseQRankSum=0.310;ClippingRankSum=0.695;DP=1523;MLEAC=8;MLEAF=0.800;MQ=60.00;MQ0=0;MQRankSum=-4.940e-01;QD=30.63;ReadPosRankSum=-1.364e+00 GT:AD:DP:GQ:PL 1/1:0,421:421:99:14544,1264,0 1/1:0,415:415:99:14206,1244,0 0/0:118,0:118:99:0,120,1800 1/1:1,206:207:99:7027,586,0 1/1:0,356:356:99:16670,1132,0
chrM 152 rs117135796 T C 16299.15 . DB;DP=1495;MLEAC=2;MLEAF=0.200;MQ=57.18;MQ0=0;QD=29.09 GT:AD:DP:GQ:PL 0/0:409,0:409:99:0,120,1800 0/0:411,0:411:99:0,120,1800 0/0:118,0:118:99:0,120,1800 0/0:204,0:204:99:0,120,1800 1/1:0,352:352:99:16341,1102,0
chrM 194 . C T 12039.15 . BaseQRankSum=-1.325e+00;ClippingRankSum=-4.920e-01;DP=1754;MLEAC=2;MLEAF=0.200;MQ=60.00;MQ0=0;MQRankSum=-1.597e+00;QD=32.89;ReadPosRankSum=1.11 GT:AD:DP:GQ:PL 0/0:409,0:409:99:0,120,1800 0/0:411,0:411:99:0,120,1800 1/1:6,360:366:99:12081,883,0 0/0:204,0:204:99:0,120,1800 0/0:361,0:361:99:0,120,1800

It's just a 4x WGS file, nothing fancy.

Any idea why this might be?

Thanks,
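A side note on reading these records: QUAL is a site-level score, whereas each sample's GQ is, per GATK's documentation, the difference between that sample's lowest and second-lowest PL, capped at 99. That is why every sample above shows GQ 99 while its PLs run into the thousands. A minimal sketch (the helper names `gq_from_pl` and `parse_sample` are hypothetical, and the record is chrM 73 trimmed to two samples):

```python
def gq_from_pl(pl_field: str, cap: int = 99) -> int:
    """Gap between the two smallest PL values, capped the way GATK reports GQ."""
    pls = sorted(int(x) for x in pl_field.split(","))
    return min(pls[1] - pls[0], cap)


def parse_sample(format_keys: str, sample: str) -> dict:
    """Pair FORMAT keys (e.g. GT:AD:DP:GQ:PL) with one sample column.

    No-call samples such as './.:0,0:0' carry fewer fields; zip() simply stops early.
    """
    return dict(zip(format_keys.split(":"), sample.split(":")))


# chrM 73 from the question, trimmed to two samples and one INFO key.
RECORD = ("chrM 73 . G A 27120.46 . DP=1063 GT:AD:DP:GQ:PL "
          "1/1:1,332:333:99:10644,962,0 0/0:118,0:118:99:0,120,1800")

fields = RECORD.split()  # VCF is tab-separated; split() also handles the spaces above
for sample in fields[9:]:
    s = parse_sample(fields[8], sample)
    print(s["GT"], "PL =", s["PL"], "GQ =", gq_from_pl(s["PL"]))
```

Lower-depth samples, like the chr1 genotype 0/1:2,4:6:51:120,0,51 posted later in the thread, fall below the cap: 51 - 0 = 51.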

Answers

  • Kurt Member ✭✭✭

    Because you are looking at mitochondrial (chrM) SNPs. MT coverage is going to be much higher than in the rest of the genome (yours is > 1000).
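Kurt's point can be checked against the annotations themselves. QD is QUAL normalized by depth; assuming the depth used is the summed DP of samples with a non-hom-ref call (my reading of the QualByDepth docs, so treat it as an assumption), QUAL is roughly QD times that depth. A QD near 32 over 845 reads then gives a QUAL around 27,000, while a QD near 14 over 16 reads gives about 228. A minimal sketch using two records from this thread with INFO trimmed to DP and QD (helper names are hypothetical):

```python
def parse_info(info: str) -> dict:
    # INFO is ';'-separated KEY=VALUE pairs; flags such as DB have no value.
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value if value else True
    return out


def qd_estimate(vcf_line: str) -> float:
    """QUAL divided by the summed DP of samples with a non-hom-ref genotype call."""
    fields = vcf_line.split()  # works for tab- or space-separated lines
    qual = float(fields[5])
    keys = fields[8].split(":")
    depth = 0
    for sample in fields[9:]:
        s = dict(zip(keys, sample.split(":")))
        gt = s["GT"].replace("|", "/")
        if gt not in ("0/0", "./.") and "DP" in s:
            depth += int(s["DP"])
    return qual / depth  # assumes at least one variant-carrying sample


# Two records from this thread, with INFO trimmed to DP and QD.
CHRM_73 = ("chrM 73 . G A 27120.46 . DP=1063;QD=32.10 GT:AD:DP:GQ:PL "
           "1/1:1,332:333:99:10644,962,0 1/1:1,315:316:99:10159,913,0 "
           "0/0:118,0:118:99:0,120,1800 1/1:0,196:196:99:6366,586,0 "
           "0/0:95,0:95:99:0,120,1800")
CHR1_779322 = ("chr1 779322 rs4040617 A G 227.59 . DB;DP=39;QD=14.22 GT:AD:DP:GQ:PL "
               "0/0:3,0:3:9:0,9,102 0/0:8,0:8:21:0,21,301 0/1:2,4:6:51:120,0,51 "
               "0/1:5,5:10:99:143,0,147 0/0:10,0:10:27:0,27,347")

for line in (CHRM_73, CHR1_779322):
    f = line.split()
    reported = parse_info(f[7])["QD"]
    print(f"{f[0]}:{f[1]} QUAL={f[5]} QD(estimate)={qd_estimate(line):.2f} QD(reported)={reported}")
```

Both records reproduce their reported QD (32.10 and 14.22), so the QUALs differ by depth, not by per-read confidence. The same ratio gives values above 35 for chrM 146, 150 and 152, which don't match their reported QD; I believe GATK 3.x rescales very high QD values, but I haven't verified that here.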

  • redzengenoist Member
    edited August 2014

    @Kurt,

    That's right, the hg19 reference puts chrM first. My chr1 calls, with QUALs in the 100s, look pretty normal; do you agree?

    chr1 779322 rs4040617 A G 227.59 . BaseQRankSum=0.103;ClippingRankSum=1.27;DB;DP=39;MLEAC=2;MLEAF=0.200;MQ=60.00;MQ0=0;MQRankSum=0.103;QD=14.22;ReadPosRankSum=0.720 GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,102 0/0:8,0:8:21:0,21,301 0/1:2,4:6:51:120,0,51 0/1:5,5:10:99:143,0,147 0/0:10,0:10:27:0,27,347
    chr1 780027 rs2977613 G T 321.28 . DB;DP=13;MLEAC=6;MLEAF=1.00;MQ=60.00;MQ0=0;QD=32.13 GT:AD:DP:GQ:PL 1/1:0,2:2:6:71,6,0 1/1:0,3:3:9:101,9,0 ./.:3,0:3 1/1:0,5:5:15:174,15,0 ./.:0,0:0
    chr1 780785 rs2977612 T A 417.07 . DB;DP=23;MLEAC=7;MLEAF=0.700;MQ=56.46;MQ0=0;QD=32.08 GT:AD:DP:GQ:PL 1/1:0,2:2:6:71,6,0 1/1:0,8:8:24:277,24,0 0/0:5,0:5:6:0,6,90 0/0:5,0:5:0:0,0,104 1/1:0,3:3:9:106,9,0
    chr1 783071 rs142849724 C T 70.44 . DB;DP=24;MLEAC=2;MLEAF=0.200;MQ=60.00;MQ0=0;QD=23.48 GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,47 1/1:0,3:3:9:108,9,0 0/0:8,0:8:18:0,18,270 0/0:7,0:7:21:0,21,242 0/0:3,0:3:9:0,9,82
    chr1 783304 rs2980295 T C 790.04 . DB;DP=24;MLEAC=10;MLEAF=1.00;MQ=38.14;MQ0=0;QD=32.92 GT:AD:DP:GQ:PL 1/1:0,6:6:18:204,18,0 1/1:0,2:2:6:71,6,0 1/1:0,7:7:21:231,21,0 1/1:0,3:3:9:102,9,0 1/1:0,6:6:18:207,18,0
    chr1 785050 rs2905062 G A 491.69 . BaseQRankSum=-1.380e+00;ClippingRankSum=0.00;DB;DP=23;MLEAC=8;MLEAF=0.800;MQ=60.00;MQ0=0;MQRankSum=-7.200e-01;QD=27.32;ReadPosRankSum=1.38 GT:AD:DP:GQ:PL 1/1:0,3:3:9:106,9,0 1/1:0,5:5:15:170,15,0 0/1:2,4:6:54:118,0,54 0/0:5,0:5:0:0,0,30 1/1:0,4:4:12:132,12,0
    chr1 785989 rs2980300 T C 664.91 . BaseQRankSum=0.731;ClippingRankSum=-7.310e-01;DB;DP=22;MLEAC=8;MLEAF=0.800;MQ=60.00;MQ0=0;MQRankSum=0.731;QD=31.66;ReadPosRankSum=0.731 GT:AD:DP:GQ:PL 1/1:0,3:3:9:110,9,0 1/1:0,7:7:21:248,21,0 0/1:1,4:5:20:127,0,20 0/0:1,0:1:0:0,0,4 1/1:0,6:6:18:212,18,0

    I ask because I'm doing some detective work on this dataset: the calls seem to have a false-positive bias across the whole SNP set, and I'm trying to pinpoint what might be causing it.

    EDIT: I should qualify that the bias is post-filtration. The actual file looks like this, and I apply the usual best-practices filter:

    chr1 58771 rs140128481 T C 105.47 PASS BaseQRankSum=0.358;ClippingRankSum=0.358;DB;DP=17;MLEAC=3;MLEAF=0.375;MQ=44.36;MQ0=0;MQRankSum=0.358;QD=10.55;ReadPosRankSum=0.358 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:3,2:5:49:49,0,83 0/0:2,0:2:6:0,6,65 0/0:5,0:5:0:0,0,104 0/1:2,3:5:55:86,0,55
    chr1 59193 . T G 76.48 PASS BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=14;MLEAC=3;MLEAF=0.375;MQ=34.42;MQ0=0;MQRankSum=0.736;QD=12.75;ReadPosRankSum=0.736 GT:AD:DP:GQ:PL ./.:0,0:0 0/0:6,0:6:0:0,0,138 0/1:1,2:3:21:56,0,21 0/1:1,2:3:27:50,0,27 0/0:2,0:2:6:0,6,69
    chr1 59941 . A G 39.17 PASS BaseQRankSum=0.937;ClippingRankSum=1.30;DP=12;MLEAC=1;MLEAF=0.125;MQ=25.30;MQ0=0;MQRankSum=-5.500e-01;QD=5.60;ReadPosRankSum=0.937 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:4,3:7:68:68,0,100 0/0:1,0:1:3:0,3,30 0/0:2,0:2:6:0,6,65 0/0:2,0:2:6:0,6,68
    chr1 59981 . G A 96.94 PASS BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=11;MLEAC=3;MLEAF=0.500;MQ=25.00;MQ0=0;MQRankSum=0.358;QD=12.12;ReadPosRankSum=0.736 GT:AD:DP:GQ:PL ./.:0,0:0 0/1:1,2:3:21:50,0,21 0/0:3,0:3:0:0,0,35 0/1:2,3:5:44:74,0,44 ./.:0,0:0

  • Kurt Member ✭✭✭

    Well, I'm certainly going to defer on this one (definitely not the best person to ask), but given 4x coverage across 5 samples and those allele frequencies, QUALs in the 100s seem reasonable to me.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Yeah, looks reasonable to me. You can try plotting the distributions of the various annotations, see if anything looks suspicious. If this data went through VQSR, see if the recalibration plots look reasonable as well.
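A first pass at the annotation distributions doesn't need a plotting library. A minimal sketch, assuming plain-text (uncompressed) VCF lines; in practice you would feed it every line of the file and probably plot with a real plotting library instead of a text histogram. The function names are hypothetical, and the data are the four PASS records posted earlier in the thread with INFO trimmed to MQ and QD:

```python
import statistics
from collections import Counter


def annotation_values(vcf_lines, key):
    """Numeric INFO values for one annotation; records lacking it are skipped."""
    values = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        for item in line.split()[7].split(";"):
            k, _, v = item.partition("=")
            if k == key and v:
                values.append(float(v))
    return values


def text_histogram(values, bin_width):
    """Print one row per bin [lo, lo + bin_width) with one '#' per record."""
    counts = Counter(int(v // bin_width) for v in values)
    for b in sorted(counts):
        print(f"{b * bin_width:>6.1f} {'#' * counts[b]}")


# The four PASS records posted earlier in the thread, INFO trimmed to MQ and QD.
PASS_RECORDS = [
    "chr1 58771 rs140128481 T C 105.47 PASS MQ=44.36;QD=10.55",
    "chr1 59193 . T G 76.48 PASS MQ=34.42;QD=12.75",
    "chr1 59941 . A G 39.17 PASS MQ=25.30;QD=5.60",
    "chr1 59981 . G A 96.94 PASS MQ=25.00;QD=12.12",
]

for key, width in (("MQ", 5.0), ("QD", 2.0)):
    vals = annotation_values(PASS_RECORDS, key)
    print(f"{key}: n={len(vals)} min={min(vals)} "
          f"median={statistics.median(vals):.2f} max={max(vals)}")
    text_histogram(vals, width)
```

Even on these four PASS records, three of the four MQ values (34.42, 25.30, 25.00) sit below the MQ < 40.0 threshold of the best-practices filter, which is the kind of pattern a full distribution plot would surface.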

  • redzengenoist Member
    edited August 2014

    @Geraldine_VdAuwera said:

    Could this VariantFiltration warning be a useful hint? I don't think I have an FS annotation anywhere in the set, and VariantAnnotator doesn't seem to recognize the FS field, even though it's part of the recommended best practices.

    WARN 05:50:43,782 Interpreter - ![12,14]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || HaplotypeScore > 13.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0;' undefined variable FS
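The PASS records posted earlier include MQ values of 34.42, 25.30 and 25.00, which should fail the MQ < 40.0 clause. My understanding, not verified here, is that when JEXL hits an undefined variable such as FS, the whole combined expression fails to evaluate and the record is left unfiltered, which would let those records through. A minimal Python sketch of evaluating each clause separately, so that a missing annotation only skips its own clause (the thresholds mirror the expression in the warning; `HARD_FILTERS`, `numeric_info` and `failed_filters` are hypothetical names):

```python
import operator

# Clauses copied from the combined expression in the warning above:
# (annotation, comparison that marks the record as failing, threshold)
HARD_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("HaplotypeScore", operator.gt, 13.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]


def numeric_info(info):
    """Parse INFO into {key: float}, dropping flags (DB) and non-numeric values."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        try:
            out[key] = float(value)
        except ValueError:
            pass  # flags have an empty value; lists like "1,2" are skipped
    return out


def failed_filters(info):
    """Clauses this record fails; a missing annotation skips only its own clause."""
    values = numeric_info(info)
    return [name for name, fails, threshold in HARD_FILTERS
            if name in values and fails(values[name], threshold)]


# INFO columns of the PASS records posted earlier in the thread.
RECORDS = {
    58771: "BaseQRankSum=0.358;ClippingRankSum=0.358;DB;DP=17;MLEAC=3;MLEAF=0.375;"
           "MQ=44.36;MQ0=0;MQRankSum=0.358;QD=10.55;ReadPosRankSum=0.358",
    59193: "BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=14;MLEAC=3;MLEAF=0.375;"
           "MQ=34.42;MQ0=0;MQRankSum=0.736;QD=12.75;ReadPosRankSum=0.736",
    59941: "BaseQRankSum=0.937;ClippingRankSum=1.30;DP=12;MLEAC=1;MLEAF=0.125;"
           "MQ=25.30;MQ0=0;MQRankSum=-5.500e-01;QD=5.60;ReadPosRankSum=0.937",
    59981: "BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=11;MLEAC=3;MLEAF=0.500;"
           "MQ=25.00;MQ0=0;MQRankSum=0.358;QD=12.12;ReadPosRankSum=0.736",
}

for pos, info in RECORDS.items():
    failed = failed_filters(info)
    print(pos, ";".join(failed) if failed else "PASS")
```

Evaluated this way, only chr1 58771 passes; the other three fail on MQ. In GATK 3 itself, the equivalent is, as far as I know, to give VariantFiltration one --filterExpression/--filterName pair per clause rather than one combined expression.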

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Do you mean VA errors out when you try to get it to annotate FS? If so, that's because you have to give it the full annotation name, i.e. FisherStrand.
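For reference, FisherStrand (FS) is the Phred-scaled p-value of Fisher's exact test on a 2x2 table of reference/alternate reads by forward/reverse strand, so high values flag strand bias. Computing it needs per-strand read counts, which the VCF records in this thread don't carry. A textbook sketch with made-up counts; GATK's own implementation may differ in details, so treat this as illustrating the statistic rather than reproducing GATK's numbers (function names are hypothetical):

```python
import math


def hypergeom_prob(a, row1, col1, n):
    """P(top-left cell == a) for a 2x2 table with fixed margins."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def fisher_two_sided(table):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_prob(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hypergeom_prob(x, row1, col1, n) for x in range(lo, hi + 1)]
    # Sum every table (same margins) that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(pr for pr in probs if pr <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def fisher_strand(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled two-sided Fisher p-value for a ref/alt x forward/reverse table."""
    p = fisher_two_sided([[ref_fwd, ref_rev], [alt_fwd, alt_rev]])
    return max(0.0, -10 * math.log10(p))


# Made-up strand counts, for illustration only.
print(fisher_strand(10, 10, 10, 10))  # balanced strands
print(fisher_strand(10, 0, 0, 10))    # alt reads only on the reverse strand
```

The balanced table gives an FS of essentially 0, and the fully one-sided table gives about 49.7, still below the FS > 60.0 threshold in the filter expression quoted above.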
