Allele Depth = '0,0' and DP = '0' but the mutation was passed by filter

summer_li (China, Member)
edited October 2016 in Ask the GATK team

Can anyone explain to me how this variant call was made, and why it passed the filters?
The variant record in the filtered VCF is:
10 118265356 . G A 152.03 PASS AC=2;AF=1.000;AN=2;DP=0;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.000;MQ=0.00;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,0:0:12:180,12,0
My GATK version is v3.3.
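
For anyone who wants to find other sites like this one, here is a minimal Python sketch that parses the record above and flags calls with no read support in the sample fields. The helper names (`parse_record`, `lacks_read_support`) are made up for illustration and are not part of GATK; the record is pasted with spaces here, whereas a VCF on disk is tab-separated, so the sketch splits on any whitespace.

```python
# Record copied from the post above. Splitting on any whitespace handles both
# the pasted (space-separated) and the on-disk (tab-separated) form.
RECORD = (
    "10 118265356 . G A 152.03 PASS "
    "AC=2;AF=1.000;AN=2;DP=0;ExcessHet=3.0103;FS=0.000;MLEAC=2;"
    "MLEAF=1.000;MQ=0.00;SOR=0.693 "
    "GT:AD:DP:GQ:PL 1/1:0,0:0:12:180,12,0"
)


def parse_record(line):
    """Split a single-sample VCF data line into a small dictionary."""
    chrom, pos, _vid, ref, alt, qual, filt, info, fmt, sample = line.split()[:10]
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        # Flag-type INFO keys (no '=') are stored as True.
        info_map[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "filter": filt,
        "info": info_map,
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }


def lacks_read_support(rec):
    """True when every AD count is 0 and the sample-level DP is 0."""
    allele_depths = [int(x) for x in rec["sample"]["AD"].split(",")]
    return sum(allele_depths) == 0 and int(rec["sample"]["DP"]) == 0


rec = parse_record(RECORD)
print(rec["filter"], lacks_read_support(rec), "QD" in rec["info"])
```

The last check shows that this record carries no QD annotation at all. That may matter if the filters were written against QD, but the filter expressions used are not given in this thread, so this is only an observation.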

  • micknudsen (Denmark, Member)

    I think that the answer to your question is in the last paragraph here:

  • summer_li (China, Member)

    I had already read the paragraph you mentioned. According to it, the AD values can be 0 for both alleles when the reads are “uninformative”, but those reads are still counted in DP. In my case, however, DP is also 0. Is this the same situation?
    Additionally, if no reads significantly support the mutation, can you explain to me how it passed the quality filters?

  • micknudsen (Denmark, Member)

    I completely missed that DP=0 as well (even though it was in the title of your post). I have no clue why it still calls a variant then...

  • valentin (Cambridge, MA; Member, Dev)

    I guess you are using HaplotypeCaller... I think it would be useful to ask HaplotypeCaller for the bamout. This BAM file would include the assembled haplotypes (as special reads) and the input reads realigned to their best (or one of their best) haplotype. The option is -bamout my-debugging.bam. You only need to run it on that region (say -L 10:118265256-118265456).

    One possibility here is that the supporting informative reads don't actually overlap the variant after realignment; I believe genotype annotations use only overlapping ones. Is there a variant nearby with the PLs?

    Now the question would be: how is it that we are calling the variant if no "informative" read overlaps it?

    To answer that we would need to look carefully at the assembly... Sometimes we use reads in the assembly that are later left out of the likelihood calculations because they don't pass some filters. These reads may induce a link between the alleles of adjacent variants, so that reads informative for an adjacent variant become informative for this variant without overlapping it, and so they don't show up in AD or DP.
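
The bamout run suggested above could be put together roughly like this. It is only a sketch: the jar name, `reference.fasta`, `sample.bam` and `debug.vcf` are placeholder assumptions, the 100 bp padding is simply what reproduces the example interval, and the flags are the GATK 3 style ones (`-T`, `-R`, `-I`, `-L`, `-o`, `-bamout`). The script only prints the command; it does not run it.

```python
import shlex


def padded_interval(chrom, pos, pad=100):
    """Return a 'chrom:start-end' interval centred on pos (1-based)."""
    start = max(1, pos - pad)
    return f"{chrom}:{start}-{pos + pad}"


def bamout_command(interval,
                   reference="reference.fasta",    # placeholder path
                   input_bam="sample.bam",         # placeholder path
                   out_vcf="debug.vcf",            # placeholder path
                   bamout="my-debugging.bam",      # name used in the post
                   jar="GenomeAnalysisTK.jar"):    # placeholder path
    """Assemble a GATK 3 style HaplotypeCaller call limited to one interval."""
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", input_bam,
        "-L", interval,
        "-o", out_vcf,
        "-bamout", bamout,
    ]


interval = padded_interval("10", 118265356)
cmd = bamout_command(interval)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting my-debugging.bam can then be loaded in a genome browser next to the original BAM to see which reads and haplotypes cover 10:118265356.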

  • valentin (Cambridge, MA; Member, Dev)

    Also, it might be helpful to update to the latest release (3.6) or the nightly build, although it may not make a difference in this case.

  • summer_li (China, Member)

    There is an adjacent variant, exactly as follows:
    10 118265273 rs2420301 A G 1210.77 . AC=2;AF=1.000;AN=2;DB;DP=38;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.000;MQ=60.00;QD=31.86;SOR=1.493 GT:AD:DP:GQ:PL 1/1:0,38:38:99:1239,114,0

    Thanks a lot for your explanation.
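
To put the two records side by side, here is a small Python sketch that computes how far apart the sites are and re-derives GQ from the PLs. It assumes GQ is the gap between the two smallest PL values, capped at 99, which is consistent with both records in this thread; the helper names are made up for illustration.

```python
# Both records are copied from the posts above (space-separated as pasted).
ZERO_DEPTH = (
    "10 118265356 . G A 152.03 PASS "
    "AC=2;AF=1.000;AN=2;DP=0;ExcessHet=3.0103;FS=0.000;MLEAC=2;"
    "MLEAF=1.000;MQ=0.00;SOR=0.693 "
    "GT:AD:DP:GQ:PL 1/1:0,0:0:12:180,12,0"
)
NEIGHBOUR = (
    "10 118265273 rs2420301 A G 1210.77 . "
    "AC=2;AF=1.000;AN=2;DB;DP=38;ExcessHet=3.0103;FS=0.000;MLEAC=2;"
    "MLEAF=1.000;MQ=60.00;QD=31.86;SOR=1.493 "
    "GT:AD:DP:GQ:PL 1/1:0,38:38:99:1239,114,0"
)


def site_summary(line):
    """Return (position, QUAL, sample-field dict) for a single-sample record."""
    cols = line.split()
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return int(cols[1]), float(cols[5]), sample


def gq_from_pl(pl_string, cap=99):
    """Assumed rule: second-smallest PL minus smallest PL, capped at `cap`."""
    pls = sorted(int(x) for x in pl_string.split(","))
    return min(cap, pls[1] - pls[0])


pos_a, qual_a, sample_a = site_summary(ZERO_DEPTH)
pos_b, qual_b, sample_b = site_summary(NEIGHBOUR)

distance = abs(pos_a - pos_b)
print("distance between sites (bp):", distance)
print("GQ from PL, zero-depth site:", gq_from_pl(sample_a["PL"]), "reported:", sample_a["GQ"])
print("GQ from PL, neighbour site:", gq_from_pl(sample_b["PL"]), "reported:", sample_b["GQ"])
# QD reported for the neighbour equals QUAL divided by its depth of 38.
print("QUAL / DP at neighbour:", round(qual_b / int(sample_b["DP"]), 2))
```

The two sites are 83 bp apart, so a single read or assembled haplotype could plausibly span both, which fits the linkage explanation given above.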

