fraction_contamination and unrelated status

darbrob Member

Hello,

I recently used ASCAT to calculate percent contamination in a tumor specimen, using array data. I was also planning to check this with ContEst if I can get my SNP array calls (Illumina HumanCoreExome array) into VCF format. That aside, I was wondering whether the --fraction_contamination input for MuTect is the appropriate place to input my percent contamination from ASCAT. I am confused by the documentation, which states that this input is for an "estimate of fraction (0-1) of physical contamination with other unrelated samples". The contamination I have is not from an unrelated sample; it is basically the "normal" tissue contaminating the "tumor" tissue.

Is --fraction_contamination the appropriate place to input this data or is there another way to better incorporate this information in MuTect calling?

Thank you very much for your consideration.

Best,
Ben

Answers

  • umar Member

    My understanding is that --fraction_contamination is what it says. A second issue arises when the normal tissue is contaminated with tumor, and there is an option for that as well. The tumor tissue always has normal tissue in it, such as the connective tissue the tumor has infiltrated. That will affect sensitivity, and to overcome it we have to sequence deeper.

  • kcibul Cambridge, MA Member, Broadie, Dev ✭✭✭

    Actually, --fraction_contamination is the estimated level of contamination from a different individual, not contamination with normal tissue from the same patient. This allows MuTect to call mutations in samples which have been contaminated with foreign (but human) DNA without having a huge false positive problem due to interpreting private, germline snps in the contaminating individual as somatic variants.

    The recommended approach to setting this flag is to use a method like ContEst (See https://confluence.broadinstitute.org/display/CGATools/ContEst) to first estimate the level of contamination in your data, and use that estimate to set this parameter.

  • kcibul Cambridge, MA Member, Broadie, Dev ✭✭✭

    To answer the other part of your question -- we don't currently incorporate estimates of tumor purity into MuTect, although it's something we are looking into to see if it could help improve performance.

  • darbrob Member

    Thank you for the reply. I really appreciate it. Is there any comment you can make about running MuTect on tumor samples that may have a high-ish level of contamination with normal tissue (such as 10-30% as estimated by ASCAT)?
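
    As a rough illustration of why normal-tissue admixture matters for sensitivity, here is a minimal sketch of the expected variant allele fraction in the tumor sample. It assumes a clonal, heterozygous mutation at a copy-neutral diploid locus, which the thread does not state, and the function name is hypothetical, not part of MuTect.

```python
def expected_tumor_vaf(normal_fraction):
    """Expected variant allele fraction of a clonal, heterozygous somatic
    mutation at a copy-neutral diploid locus, given the fraction of normal
    cells in the tumor sample. Hypothetical helper, not MuTect code."""
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be in [0, 1]")
    purity = 1.0 - normal_fraction
    # Only one of the two alleles in each tumor cell carries the mutation.
    return purity / 2.0


# The 10-30% normal admixture range mentioned above (ASCAT estimates).
for normal_fraction in (0.10, 0.30):
    print(normal_fraction, expected_tumor_vaf(normal_fraction))
```

    Under these assumptions the expected allele fraction drops from 0.5 in a pure tumor to roughly 0.45 and 0.35, so the same number of supporting reads requires somewhat more depth.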

  • darbrob Member

    Great! Thanks for your help! Sounds like a plan.

  • UltimaSeq Member

    Hi,
    Does the parameter "--minimum_mutation_allele_fraction" allow tuning for tumor sample purity (vs. contamination by normal)?
    If it does, what is the default value?
    And what about the default value of the "--minimum_normal_allele_fraction" parameter?
    Is it right to turn down the last one in order to achieve higher sensitivity in the tumor calls in the case of a "tumor-contaminated normal sample"?

    Thanks in advance

  • Wondering the same thing. What's the best way to tune MuTect for tumor samples paired with a "tumor-contaminated normal sample"?

  • To clarify, we have samples with contamination of the tumor in the normal sample. The tumor sample is quite pure, so no issues there.

    The effect of this is that only low-frequency variants pass, since they will have a low frequency in the normal as well. All variants with a high enough frequency to be relevant drivers are filtered out, since they will look like germline heterozygotes in the filtering...

    @kcibul, what's the best way to tune MuTect for this scenario?

    cheers

  • @kcibul, wondering the same as dklevebring.
    As I understand it from the paper, the log-odds model presented is based only on sequence data from the tumor sample. The normal is then evaluated in the HC (high-confidence) set of filters. More specifically, this was presented as "Observed in Control". Would it be possible to set this parameter for the filtering, or how would you go about it if you have an AML case with 15% contaminating tumor reads in the "normal" DNA?
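
    To put a number on the tumor-in-normal scenario, here is a minimal sketch of the allele fraction one would expect to see in the contaminated normal. It assumes the mutation is clonal and heterozygous at a copy-neutral diploid locus and that the contaminating reads come from pure tumor cells; neither assumption comes from the thread, and the function name is hypothetical, not part of MuTect.

```python
def expected_vaf_in_normal(tumor_in_normal_fraction):
    """Expected allele fraction of a clonal, heterozygous somatic mutation
    observed in a normal sample that contains the given fraction of tumor
    cells (copy-neutral diploid locus). Hypothetical helper, not MuTect code."""
    if not 0.0 <= tumor_in_normal_fraction <= 1.0:
        raise ValueError("tumor_in_normal_fraction must be in [0, 1]")
    # Only the contaminating tumor cells carry the variant, on one of two alleles.
    return tumor_in_normal_fraction / 2.0


# The AML case above: 15% tumor reads in the "normal" DNA.
print(expected_vaf_in_normal(0.15))
```

    With 15% tumor in the normal, a clonal heterozygous mutation would be expected at about 7.5% in the normal reads, which is why such variants can be flagged as observed in the control.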

  • Hasani Germany Member

    @kcibul said:
    Actually, --fraction_contamination is the estimated level of contamination from a different individual, not contamination with normal tissue from the same patient. This allows MuTect to call mutations in samples which have been contaminated with foreign (but human) DNA without having a huge false positive problem due to interpreting private, germline snps in the contaminating individual as somatic variants.

    Can MuTect detect mutations in RNA-Seq data?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    In principle, it should work, but we have not validated this, and MuTect does not currently support reads with Ns in their CIGAR string. You'll need to process the data through the GATK pre-processing steps as detailed in the GATK Best Practices for variant discovery in RNAseq. But again, I can't give you any guarantee that MuTect will do the right thing since it was not designed to do this.

  • afadda Member

    Hi, regarding contamination scores: why would they be variant-specific? If it means contamination with other samples, then shouldn't it be the same for all the variants in that sample? How is it calculated? I'm missing something...