somatic variants

System Administrator admin
This discussion was created from comments split from: what dose the AF value stands for in a combined vcf file?.


  • manba Member ✭✭
    edited December 2018

    @bhanuGandham said:
    Hi @JinboWuGlasgow

    It is difficult to tell why this is without looking at the data. Would you please post example records of the variants. Thank you.

    Hi @bhanuGandham, every time I see your photo, it is really very nice.
    I would like help interpreting GATK4 VCFs, especially after filtering with FilterMutectCalls, FilterByOrientationBias, and a PoN. There can be a lot of filter strings that I do not clearly understand, so I am eager to have a reference for interpreting somatic VCFs.

    For example: "clustered_events;germline_risk;t_lod", "clustered_events;germline_risk;read_position", "clustered_events;germline_risk".

    Why are the strings sometimes different, what do they stand for, and why is there no value attached to them?
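
    For readers with the same question: in the VCF format the FILTER column is a semicolon-separated list of filter names (or PASS), so each string above is several independent labels joined together, and a label is a flag that carries no value of its own. The description of each label normally lives in the `##FILTER=<ID=...,Description=...>` lines of the VCF header. A minimal sketch of splitting the strings; `filter_labels` is a hypothetical helper name, not a GATK function:

```python
def filter_labels(filter_field):
    """Return the individual filter labels in a VCF FILTER value."""
    value = filter_field.strip()
    # "PASS" means no filter was triggered; "." means filters were not applied.
    if value in ("PASS", "."):
        return []
    # Each label is a bare name; there is no key=value payload to parse.
    return [label.strip() for label in value.split(";") if label.strip()]


# The three strings quoted in the post.
examples = [
    "clustered_events;germline_risk;t_lod",
    "clustered_events;germline_risk;read_position",
    "clustered_events;germline_risk",
]
for field in examples:
    print(filter_labels(field))
```

    The strings differ from site to site simply because each site can trip a different subset of filters.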

    Can you also help me understand how the AF is calculated? I do not understand SkyWarrior's answer.
    For example:
    "germline_risk;panel_of_normals DP=395;ECNT=1;IN_PON;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=247.72 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:SA_MAP_AF:SA_POST_PROB 0/1:262,118:0.320:140,63:122,55:26:215,215:39:26:false:false:.:.:50.05:100.00:0.293,0.303,0.311:0.014,7.869e-03,0.979"
    thanks a lot

    AF here seems to be 118/(118+262), but at the following sites, calculating it this way does not give the reported AF:

    chr1 11854457 . G A . germline_risk;panel_of_normals DP=827;ECNT=1;IN_PON;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=3005.46 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:SA_MAP_AF:SA_POST_PROB 0/1:2,805:0.985:1,397:1,408:33:169,203:60:35:false:false:.:.:44.72:100.00:0.990,0.990,0.998:0.031,0.023,0.946
    chr1 11856378 . G A . germline_risk;panel_of_normals DP=934;ECNT=1;IN_PON;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=1311.79 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:SA_MAP_AF:SA_POST_PROB 0/1:429,469:0.521:202,248:227,221:29:211,217:60:37:false:false:.:.:44.72:100.00:0.515,0.515,0.522:0.014,7.984e-03,0.978
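
    The comparison can be checked with a short sketch that computes alt/(ref+alt) from the AD field and prints it next to the reported AF. It assumes biallelic sites (exactly two AD values), uses only the first three FORMAT fields of the three records quoted in this post, and `ad_fraction_and_af` is a hypothetical helper name:

```python
def ad_fraction_and_af(format_keys, sample_values):
    """Return (alt / (ref + alt) from AD, reported AF) for one sample column."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    # Assumes a biallelic site: AD holds exactly "ref,alt".
    ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))
    return alt_depth / (ref_depth + alt_depth), float(fields["AF"])


# GT:AD:AF prefixes copied from the three records in the post.
records = [
    ("GT:AD:AF", "0/1:262,118:0.320"),
    ("GT:AD:AF", "0/1:2,805:0.985"),
    ("GT:AD:AF", "0/1:429,469:0.521"),
]
for keys, values in records:
    ratio, af = ad_fraction_and_af(keys, values)
    print(f"AD ratio = {ratio:.3f}, reported AF = {af:.3f}")
# AD ratio = 0.311, reported AF = 0.320
# AD ratio = 0.998, reported AF = 0.985
# AD ratio = 0.522, reported AF = 0.521
```

    In all three records the raw read ratio is close to, but not equal to, the reported AF, which suggests AF is not a plain AD ratio. The AD ratio does coincide with the last SA_MAP_AF value of each record (0.311, 0.998, 0.522); why that is so is not established in this thread.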

  • manba Member ✭✭
    edited December 2018

    Sometimes I am confused by the word "filter": does it remove the site, or just label it?
    Or can PoN sites only be removed in paired mode?

    And why was base 'A' pruned while 'T' was kept? Is it just because there are fewer A reads? If so, how are "fewer" and "more" defined? Thanks a lot.

  • manba Member ✭✭
    edited December 2018

    The workshop also said that PoN sites are skipped, which I took to mean removed. But why are some sites removed after the PoN step while others are only labeled 'panel_of_normals'? Should I remove the sites labeled 'panel_of_normals'? I also read somewhere that 'clustered_events' indicates a high probability of a false positive.

    Can I use AF to calculate the purity of the sample?

  • manba Member ✭✭
    edited December 2018

    Does this mean that all germline_risk lines should be removed? If so, after FilterMutectCalls, CollectSequencingArtifactMetrics, and FilterByOrientationBias, the number of variants left in every one of my VCFs is 0, which is really surprising.

  • manba Member ✭✭
    edited December 2018

    For GATK 4.0.0.0 you said to filter manually, but in one PDF from your workshop you said "not somatic". If I run grep -v "germline_risk", no sites are left in my VCF. How should I deal with this problem?

    I used CreateSomaticPanelOfNormals to create the PoN, with GATK 4.0.0.0. I am really worried. Thanks a lot.
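
    Before removing lines with grep, it can help to tally which labels actually appear in the FILTER column, since the records in this thread suggest the tools label sites rather than delete them. A minimal sketch under stated assumptions: the FILTER column is the 7th whitespace-separated column, the first two records are shortened copies of the ones posted above, the third record is made up for illustration, and `tally_filters` and `keep_pass` are hypothetical helper names:

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count how often each FILTER label (or PASS) occurs in the records."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        filter_field = line.split(None, 7)[6]  # 7th column is FILTER
        counts.update(filter_field.split(";"))
    return counts


def keep_pass(vcf_lines):
    """Return only the records whose FILTER column is exactly PASS."""
    return [
        line for line in vcf_lines
        if line.strip() and not line.startswith("#")
        and line.split(None, 7)[6] == "PASS"
    ]


lines = [
    "chr1 11854457 . G A . germline_risk;panel_of_normals DP=827",
    "chr1 11856378 . G A . germline_risk;panel_of_normals DP=934",
    "chr1 12345 . C T . PASS DP=100",  # hypothetical record, not from the post
]
print(tally_filters(lines))
print(keep_pass(lines))
```

    If the tally shows that every record carries the same label, that points at one specific filter step or its inputs rather than at the data as a whole.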

  • manba Member ✭✭


    There is one parameter that I left unchanged:
    false Whether to call sites in the PoN even though they will ultimately be filtered.
    But in the end, it does not remove the PoN sites; it just adds the label "panel_of_normals" to that line.

    Heartbroken. There really should be a clear clarification of annotation versus filter when talking about variant sites; it is easy for me to misunderstand. Thanks a lot.
