Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Mutect2: --force-active vs force calling --alleles

yingchen69yingchen69 nanjingMember

Hi,

I have a ctDNA (liquid biopsy) sample that vendor reported EGFR L858R at 0.9%. When we did variant calling using varscan 2.4.4, varscan reported EGFR L858R at 1.01% but varscan fpfilter labeled EGFR L858R as false positive that failed filters VarBaseQual,MMQSdiff.

When we did variant calling using gatk 4.1.3.0 Mutect2 in tumor only mode following best practice, Mutect2 did not detect EGFR L858R although we can see the variant reads with IGV. But when we ran Mutect2 using --force-active or force calling --alleles options, Mutect2 called EGFR L858R both at ~ 0.86% vaf. To our surprise, when we ran FilterMutectCalls on the two vcf files, EGFR L858R passed filtering in both cases. Since Mutect2 did not detect EGFR L858R without --force-active or force calling --alleles options, we thought EGFR L858R would fail in FilterMutectCalls step. How to make sense of these scenarios?

This also lead us to think how should we proceed to analyze clinical samples since we do not know what variants a clinical sample has before analysis.

Any suggestion?

Thanks a lot!

Ying

Answers

  • bshifawbshifaw Member, Broadie, Moderator admin

    Just want to confirm your sample ran through mutect2 in tumor only mode, and the output VCF didn't call EGFR L858R as a variant. Then this VCF was run through FilterMutectalls, which passed EGFR L858R. What do you mean FilterMutectalls passed EGFR L858R if mutect2 didn't call it as a variant?

    You may find the mutect pdf, it describes mutect filtering tools in detail.

  • yingchen69yingchen69 nanjingMember

    Hi,

    Actually I ran Mutect2 in three ways:

    1. Mutect2 tumor only mode following Best Practice
    2. Mutect2 tumor only mode with --force-active true
    3. Mutect2 tumor only mode with --alleles mysites.vcf

    run1 did not call EGFR L858R; run2 and run3 did call EGFR L858R and after FilterMutectalls L858R call from run2 and run3 all have filter value "PASS".

    I also tried HaplotyperCaller in the same fashion, and no L858R was detected in 3 runs. (Best Practice, --force-active true, GGA mode --alleles mysites.vcf). I also tried strelka, sentieon DNAscope, sentieon TNscope, none of them detected L858R.

    We specifically look at L858R because it was provided as a variant by the sequencing vendor. But when we tried to do our own variant calling we get inconsistent results in favor of not calling L858R.

    Best,

    Ying

  • bshifawbshifaw Member, Broadie, Moderator admin

    Got it, thanks.

    I spoke with my team and they asked for a snippet of the bam so that we can investigate further, to provide snippet please follow this article

  • yingchen69yingchen69 nanjingMember

    Hi,

    I have the bamout bam file. How can I upload it? I got the error saying the file type is not allowed.

    Thanks a lot for the help!

    Best,

    Ying

  • yingchen69yingchen69 nanjingMember

    OK, I found the instruction at https://software.broadinstitute.org/gatk/documentation/article?id=11021 and I sent the bam file (FIPM190005784.m2fa.bam) to ftp.broadinstitute.org. This is the --bamout bam file when I ran Mutect2 with --force-active true.

    Thanks,

    Ying

  • xiuczxiucz Member

    @yingchen69
    hi, can you post your total GGA command here, thanks a lot.

  • yingchen69yingchen69 nanjingMember

    @xlucz,

    The bam I sent is not from HaplotypeCaller. It's from mutect2 using --force-active true option together with intervals and padding.

    Best,

    Ying

  • xiuczxiucz Member

    @yingchen69

    GGA mode --alleles in Muect2 did not seem to work using following command when I tried to call every site in force-call-alleles.vcf :

    gatk Mutect2 \
       -R reference.fa \
       -I sample.bam \
       -alleles force-call-alleles.vcf
       -O single_sample.vcf.gz
    

    However, the output vcf seemed to be the callable sites in the sample.bam , not the special sites in force-call-alleles.vcf. How did you use this argument?

    Thank you.

  • bshifawbshifaw Member, Broadie, Moderator admin

    @yingchen69

    I see your bam file, please pass the bam index along too. Thanks

  • yingchen69yingchen69 nanjingMember

    @bshifaw

    The index file FIPM190005784.m2fa.bai is uploaded.

    Thanks,

    Ying

  • yingchen69yingchen69 nanjingMember

    @xiucz

    The only difference is that I used --alleles, not -alleles. It's supposed to call all callable sites, in addition to force genotyping alleles in force-call-alleles.vcf regardless of evidence.

    Best,

    Ying

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @xiucz The --alleles argument force calls alleles in addition to any variants that would be called. Furthermore, until recently you also had to specify -genotyping-mode GENOTYPE_GIVEN_ALLELES, although in recent release this extra argument is gone. Finally, if there is no coverage at all Mutect2 will not force call an allele.

  • bshifawbshifaw Member, Broadie, Moderator admin

    @yingchen69

    It could just be a variant that was on the edge of being called. We may be able to confirm this with the -filteringStats.tsv output of FilterMutectCalls and the VCF line of the variant in question.

  • yingchen69yingchen69 nanjingMember

    Hi @bshifaw,

    The VCF line of the variant (EGFR L858R) in question is:

    chr7    55259515    .   T   G   .   PASS    AC=1;AF=0.500;AN=2;CONTQ=93;ClippingRankSum=0.086;DP=2500;ECNT=2;FS=0.000;GERMQ=93;LikelihoodRankSum=-0.884;MBQ=20,20;MFRL=171,176;MMQ=60,60;MPOS=42;MQ=60.00;MQRankSum=0.000;POPAF=7.30;ROQ=71;ReadPosRankSum=0.352;SEQQ=38;SOR=0.665;STRANDQ=31;TLOD=6.42;UNIQ_ALT_READ_COUNT=27;ANN=G|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.3|protein_coding|21/28|c.2573T>G|p.L858R|2819/5600|2573/3633|858/1210||   GT:AD:AF:DP:F1R2:F2R1:SB    0/1:2281,27:8.581e-03:2308:1074,11:1200,14:1085,1196,13,14
    

    And the filteringStats.tsv output of FilterMutectCalls is:

    #<METADATA>Ln prior of deletion of length 10=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 9=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 8=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 7=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 6=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 5=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 4=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 3=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 2=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 1=-9.918326902732767
    #<METADATA>Ln prior of SNV=-7.615741809738722
    #<METADATA>Ln prior of insertion of length 1=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 2=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 3=-10.611474083292713
    #<METADATA>Ln prior of insertion of length 4=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 5=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 6=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 7=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 8=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 9=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 10=-20.72326583694641
    #<METADATA>High-AF beta-binomial cluster=weight = 0.1154, alpha = 10.01, beta = 0.50
    #<METADATA>Background beta-binomial cluster=weight = 0.1923, alpha = 1.15, beta = 1.42
    #<METADATA>Binomial cluster 1=weight = 0.6286, mean = 0.990
    #<METADATA>Binomial cluster 1=weight = 0.2286, mean = 0.496
    #<METADATA>Binomial cluster 1=weight = 0.1143, mean = 0.012
    #<METADATA>threshold=0.082
    #<METADATA>fdr=0.001
    #<METADATA>sensitivity=0.918
    filter  FP  FDR FN  FNR
    weak_evidence   0.0 0.0 0.0 0.0
    strand_bias 0.0 0.0 0.91    0.04
    orientation 0.0 0.0 0.0 0.0
    slippage    0.0 0.0 0.0 0.0
    haplotype   0.0 0.0 0.0 0.0
    germline    0.03    0.0 1.04    0.04
    

    Thanks,

    Ying

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @yingchen69 The base qualities are on the low side -- MBQ=20 for both ref and alt reads. Since this implies an error rate of 1 in 100 reads, 27 out of 2308 reads is not distinguishable from sequencing error and Mutect2 is not detecting this variant as active.

    The reason FilterMutectCalls passes this variant is because the reasoning I gave you is too conservative -- the error rate for all substitutions may be 1 in 100, but the error rate for T -> G may be only 1 in 300. I think we should experiment with putting that factor of 1/3 in Mutect2's isActive function.

    Leaving that aside, I'm skeptical about these base qualities. Does this sequencing really have a substitution error every 100 bases? If the error rate is actually lower than this then BQSR can improve results if you're not already including it in your pipeline.

  • 29043594952904359495 Member

    here is another question, in his site, MBQ is 20,20.
    here is another site
    chr1 162740327 . T C . PASS DP=349;ECNT=1;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-8.602e+00;TLOD=1153.19 > GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:0,333:0.997:333:0,108:0,225:0,28:0,203:60:33:false:false:.:.:100.00:47.96:0:0.990,0.990,1.00:0.027,0.027,0.946

    its MBQ corrsponding to 0,28, so which is the value of MBQ, since the value is equal to -10*log(error rate), I guess it is 28, so what does 0 here stands for?

    thanks a lot

  • yingchen69yingchen69 nanjingMember

    @davidben ,

    Thanks a lot for the explanation. The original bam file had already been through BQSR step.

    @2904359495 ,

    0 stands for ref MBQ, which means ref error rate of 1 (100%) :(

    Best,

    Ying

  • 29043594952904359495 Member

    @yingchen69 ,thanks, so which value you think is MBQ value

  • yingchen69yingchen69 nanjingMember

    @2904359495 ,

    You bave MBQ as 0,28. 0 is for ref, and 28 is for alt.

    Best,

    Ying

  • 29043594952904359495 Member

    thanks a lot,@yingchen69

    chr1 162740327 . T C . PASS DP=349;ECNT=1;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-8.602e+00;TLOD=1153.19 > GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:0,333:0.997:333:0,108:0,225:0,28:0,203:60:33:false:false:.:.:100.00:47.96:0:0.990,0.990,1.00:0.027,0.027,0.946

    so here you can see the in field AD (0,333) , ref count is 0 and its MBQ is 0(100% error), I think maybe the particular situation is prescribed by the developer

Sign In or Register to comment.