Recommended Syntax for unmatched tumor

Hi, I was wondering what the recommended syntax is for standard exome and WGS data. I saw a lot of options but wasn't sure what the general use case is (and what to fiddle with). Sorry if I missed an obvious document.

More specifically, what strategy would you use for calling unmatched tumors?

Best Answer


  • vyellapa Member

    Thank you Kristian,
    So can I use a normal BAM from a different individual as a "contemporary" normal?
    If my samples are cell lines, can I use a patient normal sample as a "contemporary" normal?


  • kcibul Cambridge, MA Member, Broadie, Dev ✭✭✭

    If you have a patient-matched normal (be it a cell line or not), that would be the best thing to use. Absent that, using any normal sequenced with the same technology (e.g. Illumina) and sample prep (e.g. exome capture) would be better than nothing. The main risk you run there is that you throw out something that is truly somatic in your tumor because it's at the site of a germline variant in your "unmatched" normal.

  • vyellapa Member

    For the risk you mentioned, if a normal from the same disease type is used, would the somatic mutations thrown out "rarely" be drivers?
    Or is there a possibility that a particular mutation is a driver in one sample while it's a germline variant in another?

  • vyellapa Member
    edited March 2013

    My other question: if the variant allele is at a low allele frequency (say 10%), potentially due to CNAs or tumor heterogeneity, would MuTect be able to detect such loci? Are there any settings I can play with to get better results here? I am working with non-solid tumor cell lines, which have high tumor purity. When playing with those settings, I would like to assume little heterogeneity, since passaging of the cell lines may have kept only the most selectively advantageous clone. I also have CNA information from aCGH and would like to incorporate it when making calls, if possible.

    Thank you,

  • kcibul Cambridge, MA Member, Broadie, Dev ✭✭✭

    It all comes back to allele fraction (the fraction of DNA that harbors the mutation) and the depth of sequencing. In the publication we explore this relationship, and you can see that we are very sensitive to events such as the ones you describe.
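
To make the allele fraction / depth relationship a bit more concrete, here is a minimal sketch. It is not MuTect's actual statistical model: it simply treats each read as an independent draw, ignores sequencing error, and asks how likely it is to see at least a handful of variant-supporting reads. The function name and the `min_alt_reads` threshold are assumptions made up for illustration.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, allele_fraction, min_alt_reads):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction).

    Simplified illustration only: reads are assumed independent and
    error-free, which real data is not.
    """
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be between 0 and 1")
    # Probability of seeing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # A 10% allele fraction event at a few example depths,
    # requiring at least 3 supporting reads (hypothetical threshold).
    for depth in (10, 30, 100):
        p = prob_at_least_k_alt_reads(depth, 0.10, 3)
        print(f"depth={depth:>3}  P(>=3 alt reads)={p:.3f}")
```

Under these assumptions the chance of collecting enough supporting reads for a 10% event rises steeply with depth, which is the same trade-off described above.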

  • jblachly Member

    When we attempt to call mutations without matched normals, does it make sense to use the LATEST version of COSMIC and an EARLIER version of dbSNP (say, 129 or 132), in order to reduce the chances that a recurrent somatic mutation gets flagged as "dbSNP"?

    I ask because, without a matched normal, my first filtering strategy is to eliminate anything present in dbSNP (unless it is in COSMIC, in which case I keep it).

  • kcibul Cambridge, MA Member, Broadie, Dev ✭✭✭

    It's a tricky problem, calling unmatched tumors. If your tumors are impure (e.g. stromally contaminated), you could think about using the allele fraction of each event to classify it, since heterozygous germline events will be at 50/50 and somatic events will be off that ratio. However, once you add copy number events to the picture it becomes unclear again (e.g. a germline event in a region with a duplication could appear to have f=0.3).
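
As a rough illustration of why copy number muddies the allele-fraction picture, the sketch below computes the expected allele fraction from a simple mixture of tumor and normal cells. The function and parameter names are hypothetical, and the model assumes a single tumor clone, a diploid normal by default, and no sequencing or capture bias.

```python
def expected_allele_fraction(purity, tumor_copies, tumor_alt_copies,
                             normal_alt_copies, normal_copies=2):
    """Expected variant allele fraction in a tumor/normal cell mixture.

    purity            fraction of cells in the sample that are tumor
    tumor_copies      total copy number at the locus in tumor cells
    tumor_alt_copies  copies carrying the variant in tumor cells
    normal_alt_copies copies carrying the variant in normal cells
                      (0 for a somatic event, 1 for a germline het)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    alt = purity * tumor_alt_copies + (1.0 - purity) * normal_alt_copies
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    if total == 0:
        raise ValueError("no DNA at this locus under the given inputs")
    return alt / total


if __name__ == "__main__":
    # Germline het in a pure tumor where the OTHER allele is duplicated.
    print(expected_allele_fraction(1.0, 3, 1, 1))
    # Somatic het in a diploid region of a 60% pure tumor.
    print(expected_allele_fraction(0.6, 2, 1, 0))
```

In this toy model the first case comes out at 1/3 and the second at 0.3, so a germline event in a duplicated region and a somatic event in an impure tumor can look nearly identical by allele fraction alone.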

    From an annotation standpoint, I would argue for using the best available databases: the latest dbSNP and COSMIC. By using an earlier version just to get a smaller filter, you're also excluding the newest variants, which may be very high quality. If anything, perhaps you could filter by dbSNP entries with a certain frequency in the population.
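
A minimal sketch of that kind of annotation filter, assuming each candidate has already been annotated against dbSNP and COSMIC. The dictionary keys (`in_cosmic`, `in_dbsnp`, `pop_freq`), the 1% cutoff, and the choice to drop dbSNP sites with no recorded frequency are all assumptions for illustration, not anything MuTect does for you.

```python
def keep_candidate(variant, max_pop_freq=0.01):
    """Decide whether an unmatched-tumor call survives annotation filtering.

    variant is a dict with hypothetical keys:
      in_cosmic (bool), in_dbsnp (bool), pop_freq (float or None)
    """
    # Anything seen in COSMIC is kept, even if it is also in dbSNP.
    if variant.get("in_cosmic"):
        return True
    # Not in dbSNP at all: no evidence that it is a known germline site.
    if not variant.get("in_dbsnp"):
        return True
    freq = variant.get("pop_freq")
    # In dbSNP without a recorded frequency: conservatively treat it as
    # germline (an assumption; you may prefer to keep these).
    if freq is None:
        return False
    # In dbSNP: keep only if it is rare in the population.
    return freq < max_pop_freq


if __name__ == "__main__":
    calls = [
        {"id": "a", "in_cosmic": True, "in_dbsnp": True, "pop_freq": 0.20},
        {"id": "b", "in_cosmic": False, "in_dbsnp": True, "pop_freq": 0.20},
        {"id": "c", "in_cosmic": False, "in_dbsnp": True, "pop_freq": 0.001},
        {"id": "d", "in_cosmic": False, "in_dbsnp": False, "pop_freq": None},
    ]
    print([c["id"] for c in calls if keep_candidate(c)])
```

The cutoff is the knob to tune: a lower value keeps more rare dbSNP sites as candidate somatic events, at the cost of letting more rare germline variants through.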

    Hope that helps
