GATK4 - CreateSomaticPanelOfNormals. What's under the hood?

tony Member
edited January 28 in Ask the GATK team

Hi,

I am using GATK4 (4.0.12) to create a panel of normals to be used with Mutect2.

As suggested, I first ran Mutect2 in tumor-only mode on each of my normal BAMs to create one VCF per normal. I then used CreateSomaticPanelOfNormals to aggregate these VCFs into a PON. The only parameter you can play with at this step is --min-sample-count.

But when I look at one of the single-normal VCFs, I see many positions with very low read depth (many have DP=1, in fact). Here is an example:

chr1    866940  .       G       A       .       .       DP=134;ECNT=2;MBQ=10,33;MFRL=218,201;MMQ=35,49;MPOS=29;POPAF=0.220;TLOD=371.15  GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:2,115:0.991:117:1,54:1,61:0.980,0.980,0.983:0.024,0.029,0.946
chr1    872261  .       C       A       .       .       DP=1;ECNT=1;MBQ=0,34;MFRL=0,209;MMQ=0,60;MPOS=8;POPAF=5.40;TLOD=3.58    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,1:0.667:1:0,1:0,0:0.990,0.990,1.00:0.025,0.028,0.947
chr1    890446  .       C       CA      .       .       DP=2;ECNT=1;MBQ=0,31;MFRL=0,144;MMQ=0,60;MPOS=24;POPAF=0.243;RPA=11,12;RU=A;STR;TLOD=4.88       GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,2:0.750:2:0,0:0,2:0.990,0.990,1.00:0.027,0.027,0.947
chr1    893093  .       T       A       .       .       DP=1;ECNT=1;MBQ=0,37;MFRL=0,178;MMQ=0,60;MPOS=39;POPAF=5.40;TLOD=3.88   GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,1:0.667:1:0,0:0,1:0.990,0.990,1.00:0.025,0.028,0.947
chr1    900096  .       G       A       .       .       DP=1;ECNT=1;MBQ=0,31;MFRL=0,156;MMQ=0,60;MPOS=34;POPAF=5.40;TLOD=3.28   GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,1:0.667:1:0,0:0,1:0.990,0.990,1.00:0.028,0.025,0.947
chr1    916377  .       A       G       .       .       DP=2;ECNT=1;MBQ=0,28;MFRL=0,232;MMQ=0,60;MPOS=38;POPAF=3.926e-03;TLOD=5.98      GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,2:0.750:2:0,1:0,1:0.990,0.990,1.00:0.029,0.025,0.946
chr1    934937  .       G       A       .       .       DP=1;ECNT=1;MBQ=0,34;MFRL=0,153;MMQ=0,60;MPOS=11;POPAF=0.385;TLOD=3.58  GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,1:0.667:1:0,1:0,0:0.990,0.990,1.00:0.028,0.025,0.947
chr1    951408  .       G       A       .       .       DP=5;ECNT=2;MBQ=0,37;MFRL=0,264;MMQ=0,60;MPOS=32;POPAF=0.102;TLOD=20.31 GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:0,5:0.857:5:0,2:0,3:0.990,0.990,1.00:0.025,0.030,0.945
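
To put a number on how many of a normal's calls are low depth, the INFO-level DP can be read straight from the VCF text. The following is a minimal sketch, not part of GATK: the function names `info_depth` and `count_low_depth` are hypothetical, and the records are the ones above trimmed to their first eight columns, with INFO reduced to DP and TLOD.

```python
def info_depth(vcf_line):
    """Return the INFO-level DP of one VCF data line, or None if it has no DP."""
    fields = vcf_line.split()
    info = fields[7]  # INFO is the 8th column of a VCF record
    for entry in info.split(";"):
        if entry.startswith("DP="):
            return int(entry[3:])
    return None


def count_low_depth(vcf_lines, min_dp=10):
    """Return (number of records with DP < min_dp, total number of records)."""
    depths = [
        info_depth(line)
        for line in vcf_lines
        if line.strip() and not line.startswith("#")  # skip blanks and headers
    ]
    low = sum(1 for d in depths if d is not None and d < min_dp)
    return low, len(depths)


# Records from the post, trimmed for brevity (assumption: only DP and TLOD kept).
EXAMPLE = """\
chr1 866940 . G A . . DP=134;TLOD=371.15
chr1 872261 . C A . . DP=1;TLOD=3.58
chr1 890446 . C CA . . DP=2;TLOD=4.88
chr1 893093 . T A . . DP=1;TLOD=3.88
chr1 900096 . G A . . DP=1;TLOD=3.28
chr1 916377 . A G . . DP=2;TLOD=5.98
chr1 934937 . G A . . DP=1;TLOD=3.58
chr1 951408 . G A . . DP=5;TLOD=20.31
"""

low, total = count_low_depth(EXAMPLE.splitlines(), min_dp=10)
print(f"{low} of {total} records have DP < 10")  # prints "7 of 8 records have DP < 10"
```

Running the same count on a full per-normal VCF would show what fraction of the calls feeding the PON sit below a given depth.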

Basically, I was wondering how these positions with DP=1 (or, say, DP<10) are handled in the merge process:

  • Are they removed/filtered out?
  • If two distinct normal samples share the same position, both with DP=1, will it appear in the PON? In other words, are all these positions included in the global count regardless of the DP, TLOD, ... fields?

Best,
Anthony


Best Answer

Answers

  • tony Member

    Hi @bshifaw

    Thank you for your answer. This is clear.

    Would the dev team recommend keeping the low-depth sites (for instance, to remove as many mapping artefacts as possible)?

    Anthony

  • bshifaw Member, Broadie, Moderator admin

    @tony
    Here is their reply:

    It depends on what a user wants to prioritize in their pipeline. Keeping the low-depth sites will, as noted, remove the most mapping artifacts and would be best for precision. On the flip side, it could lead to filtering out true somatic variants and reduce sensitivity. Finding the right balance depends on the particular use case, so we encourage users to try different options and see what works best for their situation. The tool is built in a fairly heuristic fashion (and is still in beta), so we don't have a strong recommendation one way or the other.
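
One way to try different options is to compare which sites would reach a panel with and without a depth cutoff applied to the per-normal calls. The sketch below is only an illustration of that trade-off and does not claim to reproduce the internals of CreateSomaticPanelOfNormals. The functions `build_site_counts` and `panel_sites` and the `min_dp` parameter are hypothetical. `min_sample_count` mirrors the idea behind --min-sample-count. The depths in `normal_b` are made up for the example.

```python
from collections import Counter


def build_site_counts(per_normal_calls, min_dp=0):
    """Count, for each site, the number of normals calling it with DP >= min_dp.

    per_normal_calls: one dict per normal, mapping (chrom, pos, ref, alt) -> DP.
    """
    counts = Counter()
    for calls in per_normal_calls:
        for site, dp in calls.items():
            if dp >= min_dp:
                counts[site] += 1
    return counts


def panel_sites(per_normal_calls, min_sample_count=2, min_dp=0):
    """Return the sorted sites seen in at least min_sample_count normals."""
    counts = build_site_counts(per_normal_calls, min_dp)
    return sorted(site for site, n in counts.items() if n >= min_sample_count)


# normal_a uses two sites and depths from the post; normal_b is hypothetical.
normal_a = {("chr1", 866940, "G", "A"): 134, ("chr1", 872261, "C", "A"): 1}
normal_b = {("chr1", 866940, "G", "A"): 120, ("chr1", 872261, "C", "A"): 1}

# No depth cutoff: the DP=1 site shared by both normals is kept.
keep_all = panel_sites([normal_a, normal_b], min_sample_count=2, min_dp=0)

# Depth cutoff of 10: only the well-supported site remains.
keep_deep = panel_sites([normal_a, normal_b], min_sample_count=2, min_dp=10)

print(len(keep_all), len(keep_deep))  # prints "2 1"
```

Under these assumptions, keeping low-depth sites gives the larger panel (more artifacts removed, at the risk of masking true variants), while the cutoff gives the smaller one.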

  • tony Member

    Many thanks @bshifaw
    I will try different configurations.
