MuTect in High Confidence (HC) mode

Hi,
In the MuTect Nat Biotech paper the authors discuss the STD, HC and HC+PON filtering modes. However, the online manual (http://www.broadinstitute.org/cancer/cga/mutect_run) gives no indication of how to use the three modes. Would it be possible to clarify this point? I am currently using the following command:
java -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence ref.fa --cosmic cosmic.vcf --dbsnp dbsnp.vcf --input_file:normal normal.bam --input_file:tumor tumor.bam --out tumor.mutect
thank you,
Andrea

Answers

  • kcibul, Cambridge, MA (Member, Broadie, Dev)

    Hi -- thanks for your question, I'll try to clarify how to run in the three different modes.

    STD: in standard mode, you simply disable all the filters. Although this can be done by resetting all the thresholds, the easiest way is to pass the flag "--artifact_detection_mode", which disables all the downstream filters. Also, do not specify a panel of normals via the "--normal_panel" option.

    HC: basically, the invocation you have provided is the HC configuration (standard plus filters).

    HC+PON: in addition to the HC setting, supply a VCF containing the sites identified as either germline or noise using the "--normal_panel" option. These sites will be flagged in the output. Of course, you could also do this as a post-processing step, removing HC candidates observed in the panel of normal samples.
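
For reference, the three modes described above could be invoked roughly as follows. This is only a sketch that reuses the file names from the original question: the `--out` file names are placeholders and `pon.vcf` is a hypothetical panel-of-normals VCF. The block prints the commands rather than running them, so they can be checked against your MuTect version first.

```shell
# Shared arguments, taken from the command in the original question.
COMMON="java -jar muTect-1.1.4.jar --analysis_type MuTect --reference_sequence ref.fa --cosmic cosmic.vcf --dbsnp dbsnp.vcf --input_file:normal normal.bam --input_file:tumor tumor.bam"

# STD: disable the downstream filters and give no panel of normals.
STD_CMD="$COMMON --artifact_detection_mode --out tumor.std.mutect"

# HC: the default behaviour, so no extra flags are needed.
HC_CMD="$COMMON --out tumor.hc.mutect"

# HC+PON: HC plus a panel-of-normals VCF (pon.vcf is a placeholder name).
HC_PON_CMD="$COMMON --normal_panel pon.vcf --out tumor.hcpon.mutect"

# Dry run: print each command for inspection instead of executing it.
printf '%s\n' "$STD_CMD" "$HC_CMD" "$HC_PON_CMD"
```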

    Hope that helps!
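
The post-processing alternative mentioned above could look roughly like the following. It assumes that the MuTect output (call stats) file is tab-delimited with `contig` and `position` as its first two columns, and that the panel of normals is a plain, uncompressed VCF. `drop_pon_sites` is an illustrative name, and the match is on position only, not on allele.

```shell
# drop_pon_sites PON_VCF CALL_STATS
# Prints the call stats rows whose contig:position is absent from the panel VCF.
drop_pon_sites() {
    awk -F'\t' '
        FILENAME == ARGV[1] {             # first file: the panel-of-normals VCF
            if ($0 !~ /^#/) seen[$1 ":" $2] = 1
            next
        }
        /^#/ || $1 == "contig" {          # keep comment and header lines
            print
            next
        }
        {
            key = $1 ":" $2
            if (!(key in seen)) print     # keep sites not seen in the panel
        }
    ' "$1" "$2"
}
```

A call such as `drop_pon_sites pon.vcf tumor.mutect > tumor.hcpon.mutect` would then write the reduced table, where all three file names are placeholders.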

  • yl3 (Member)

    Hello, I am running muTect-1.1.4 as well. I ran MuTect with default parameters on a tumor-normal pair and it gave me thousands of mutations flagged as "KEEP". However, many of these mutations have very low frequency, and the low-frequency putative mutations in particular show strong strand bias. I could not figure out what the default parameters in MuTect are. Are the HC settings on by default, and how can I remove the low-frequency variants whose very high strand bias (for example 7+, 0-) is visually obvious in IGV?

  • kcibul, Cambridge, MA (Member, Broadie, Dev)

    Hi

    HC mode is enabled by default. I'm curious about your example: while the strand bias of the alternate allele is strong (7,0), what is the strand bias of the reference allele? It's possible that the test is not powered given your data (for example, if your reference allele strand distribution is also 7+,0-). If that's the case, that's where the panel of normals can be very powerful.
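
On the practical side of the question above, low-frequency calls can also be removed after the fact from the call stats file. The following is only a sketch: it assumes a tab-delimited file whose header row names `tumor_f` (tumor allele fraction), `t_alt_count` (reads supporting the alternate allele) and `judgement` columns, so check the header of your own output first. The function name and any thresholds passed to it are illustrative and are not MuTect defaults, and the filter does nothing about strand bias itself.

```shell
# filter_low_fraction CALL_STATS MIN_FRACTION MIN_ALT_READS
# Prints comment lines, the header, and KEEP rows that meet both thresholds.
filter_low_fraction() {
    awk -F'\t' -v min_f="$2" -v min_alt="$3" '
        /^#/ { print; next }              # pass comment lines through
        !have_header {                    # first non-comment line names the columns
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("tumor_f" in col) || !("t_alt_count" in col) || !("judgement" in col)) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            have_header = 1
            print
            next
        }
        $(col["judgement"]) == "KEEP" &&
        $(col["tumor_f"]) + 0 >= min_f + 0 &&
        $(col["t_alt_count"]) + 0 >= min_alt + 0
    ' "$1"
}
```

For example, `filter_low_fraction tumor.mutect 0.05 3` would keep only KEEP calls with an allele fraction of at least 0.05 and at least 3 supporting reads. Both numbers are arbitrary and should be chosen from your own data.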

  • Hi,
    Is there any other way to filter the MuTect output besides the KEEP and COVERED flags? I am using them for my normal/tumor pair and I found quite a number of variants, over 1800, that are NOVEL. They are not annotated as of now, and these are exome data. I am curious about what kind of filtering I can apply at this point. Can anyone suggest a p-value threshold, or a log-likelihood threshold, that I should consider? Also, if I want to find the most likely high-confidence variants that are damaging, is there any score in the MuTect output that would let me categorize them? I am using MuTect for the first time with GATK.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @vivekdas_1987

    There are two kinds of filtering that you can apply at this point: one is filtering to distinguish real variants from artifacts, and the other is filtering to distinguish mutations that are likely to cause deleterious effects from mutations that have little or no effect. For the first type, you can use the data output by MuTect, or look into the filtering methods provided by GATK. For the second type, you have to do some functional annotation and analysis, with a program like Oncotator, for example.

  • @Geraldine_VdAuwera, thank you very much. I appreciate your quick reply. Yes, I will do both. So, other than functional annotation, no further filtering is needed if I only consider MuTect output that is COVERED, NOVEL and has the judgement KEEP, right? Those should be considered the most high-confidence calls. For the downstream functional effect I can then look at Oncotator. So the other output scores, like LOD, are usually not of much use for further filtering of MuTect output beyond selecting variants flagged COVERED, NOVEL and KEEP?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    That's right to some extent -- the KEEP judgment is the result of applying various internal filters, so those variants should be fairly high-confidence already.

  • @Geraldine_VdAuwera,
    Thanks a lot for the reply. I would like to ask something that might not be entirely relevant, in case you have seen such a scenario before. While looking at the read statistics for my samples with GATK, I found that about 70% of my reads fall in exonic regions, which sounds fair enough. When I call the most probable high-confidence somatic variants using GATK alone, GATK+MuTect and VarScan, I get a fair number of high-confidence variants. But when I annotate these with Oncotator, ANNOVAR or snpEff, I find only 30% of the mutations in exons; the rest are in introns, intergenic regions and other regions. Is this a likely scenario? Is it because the target regions (the target BED file) used for enrichment do not fall entirely on exons but also extend into introns and other regions to some extent, so that reads in those regions pick up more mutations and most of my annotations end up non-exonic? I understand that somatic exonic variants won't be very numerous, but out of 1500 variants I would expect more than 50% to be exonic (nonsynonymous + synonymous), right? Can you shed some light on this matter? Is it a shortcoming of the callers, or is this a general trend in exome data?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @vivekdas_1987

    From a technical standpoint there is no obvious reason why the caller would produce a bias toward calls in non-exonic regions. As to whether this is an expected biological finding, that is outside the scope of support we can provide. I would recommend examining the literature and discussing this on a more general forum (like SeqAnswers) or with colleagues.

  • Hello I have a question related to this thread. If I want to get all variants in my output (standard mode) but be able to filter variants based on HC filters post-hoc, is there a way to do that? In other words, does MuTect in HC mode emit the variants caught by the HC filters? Thanks, Stathis

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Stathis @ekanterakis,

    The program only emits "good" variants in the VCF file, but the callstats file contains all variants, and specifies which are rejected and why.
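
To see what the filters rejected, the call stats file can be tallied directly. A sketch, assuming a tab-delimited file whose header row names `judgement` and `failure_reasons` columns, with multiple reasons separated by commas; verify both assumptions against your own file. `count_failure_reasons` is an illustrative name.

```shell
# count_failure_reasons CALL_STATS
# Prints "count<TAB>reason" for REJECT rows, most frequent reason first.
count_failure_reasons() {
    awk -F'\t' '
        /^#/ { next }                     # skip comment lines
        !have_header {                    # first non-comment line names the columns
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("judgement" in col) || !("failure_reasons" in col)) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            have_header = 1
            next
        }
        $(col["judgement"]) == "REJECT" {
            # A row may list several reasons; count each one separately.
            n = split($(col["failure_reasons"]), reasons, ",")
            for (i = 1; i <= n; i++) count[reasons[i]]++
        }
        END { for (r in count) print count[r] "\t" r }
    ' "$1" | sort -k1,1nr -k2,2
}
```

Running `count_failure_reasons tumor.mutect` (placeholder file name) gives a quick view of which filters remove the most candidates, which helps when deciding whether post-hoc filtering of a standard-mode run is worthwhile.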

  • apallav2 (Member)
    edited January 2015

    Interesting... I still have one question (sorry): can I pass a pool of normal BAM files as the --input_file:normal argument, either as multiple --input_file:normal arguments or as --input_file:normal <.list of Normal Bams>?

    Thanks!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @apallav2 No, at this time MuTect can only take Panel Of Normals data as a VCF of called variants.
