Is it advisable to use multiple different types of cancer subtypes for joint variant calling?

smk_84 Member
edited November 2018 in Ask the GATK team


I am running GATK on multiple breast cancer cell line subtypes. I was wondering whether it would be advisable to call variants on them together and also to genotype them jointly. If not, what is the best practice for genotyping individual cell lines?


Best Answer


  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Hello @smk_84

    The recommendations for joint variant calling on somatic variants can be found at this link.

    The variant discovery is done using individual cell lines, but the genotyping is done on cohorts. Please read through the document and let us know if you have follow-up questions.
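
The split described in the reply above (per-sample discovery, then cohort genotyping) can be sketched as follows. This is a minimal sketch that assumes the workflow meant is the GATK4 GVCF workflow (HaplotypeCaller in GVCF mode, CombineGVCFs, GenotypeGVCFs); the link in the reply is not preserved, so that is an assumption. The function name and all file names are hypothetical, and the code only builds the command lines without running them.

```python
from typing import List


def gvcf_workflow_commands(reference: str, bams: List[str], out_vcf: str) -> List[List[str]]:
    """Build (but do not run) GATK4 commands for a GVCF-style workflow."""
    commands: List[List[str]] = []
    gvcfs: List[str] = []

    for bam in bams:
        # Hypothetical naming scheme: sample.bam -> sample.g.vcf.gz
        gvcf = bam.rsplit(".", 1)[0] + ".g.vcf.gz"
        gvcfs.append(gvcf)
        # Step 1: variant discovery on each sample separately, in GVCF mode.
        commands.append(
            ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", gvcf, "-ERC", "GVCF"]
        )

    # Step 2: merge the per-sample GVCFs into one cohort GVCF.
    cohort_gvcf = "cohort.g.vcf.gz"  # hypothetical intermediate file name
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", cohort_gvcf]
    commands.append(combine)

    # Step 3: genotype all samples together from the cohort GVCF.
    commands.append(["gatk", "GenotypeGVCFs", "-R", reference, "-V", cohort_gvcf, "-O", out_vcf])
    return commands


if __name__ == "__main__":
    for cmd in gvcf_workflow_commands("ref.fasta", ["MDAMB436.bam", "SKBR3.bam"], "joint.vcf.gz"):
        print(" ".join(cmd))
```

Building the commands as argument lists keeps them easy to inspect before handing them to a scheduler or `subprocess.run`. Exact flags can differ between GATK releases, so check each tool's `--help` output first.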

  • smk_84 Member

    Thank you for your answer. I don't have cohort data.

    I was wondering how one would do genotyping of a single sample or a single cell line.


  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    @smk_84 I am not sure it is possible to run a genotype analysis on a single sample, because the genotyping relies on a population-wide analysis.

    Here is another article and a related blog post that might provide some more considerations.

    It may be better to analyze your variants, and instead of genotyping, perform a clustering analysis to determine which variants are shared among cell subtypes.

    @shlee do you have any additional thoughts?
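
One way to start on the sharing analysis suggested above is to compare per-sample call sets directly. The sketch below is a minimal illustration, not a GATK feature: it counts how many samples carry each variant and computes pairwise Jaccard similarity, which could then feed a clustering method of your choice. The function names are hypothetical and the variants are made-up toy values, not real calls.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def sharing_counts(calls: Dict[str, Set[Variant]]) -> Dict[Variant, int]:
    """Count how many samples carry each variant."""
    counts: Dict[Variant, int] = {}
    for variants in calls.values():
        for variant in variants:
            counts[variant] = counts.get(variant, 0) + 1
    return counts


def jaccard_similarity(calls: Dict[str, Set[Variant]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard similarity (shared / union) between sample call sets."""
    result: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        result[(a, b)] = len(calls[a] & calls[b]) / len(union) if union else 0.0
    return result


if __name__ == "__main__":
    # Toy call sets with invented coordinates, for illustration only.
    toy_calls = {
        "MDAMB436": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
        "SKBR3": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "T")},
        "ZR751": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr4", 400, "T", "G")},
    }
    shared_by_all = [v for v, n in sharing_counts(toy_calls).items() if n == len(toy_calls)]
    print("shared by all samples:", shared_by_all)
    for pair, score in jaccard_similarity(toy_calls).items():
        print(pair, round(score, 3))
```

Keying variants on (chrom, pos, ref, alt) assumes the call sets were normalized the same way, for example with multiallelic sites split and indels left-aligned; otherwise identical events can fail to match.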

  • smk_84 Member

    Thank you @shlee for your answer; that was very helpful.

    So basically I have three breast cancer cell lines, namely MDAMB436, SKBR3 and ZR751, and I want to find the somatic mutations in them by comparing them with HMEC (Primary Mammary Epithelial Cells) as a control. Is it possible to get the somatic mutations with such an approach? If so, what are possible ways to reduce the number of false positives?


  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @smk_84

    Yes, you can call somatic mutations. You could use Mutect2's tumor with matched normal approach. You can find all the recommended steps regarding contamination estimation and filtering in this document. We also provide a WDL that you can use to launch Mutect2.

    Hope this helps.
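
For orientation, the tumor with matched normal approach mentioned above might be launched along these lines. This is a hedged sketch, not the recommended pipeline from the linked document or the WDL: the function names and all file and sample names are hypothetical, and the code only assembles command lines. The flags follow more recent GATK4 releases as far as I know; the 4.0.x releases current at the time of this thread also required a `-tumor` sample name and, to my knowledge, did not need `-R` for FilterMutectCalls, so check `--help` for your version.

```python
from typing import List, Optional


def mutect2_matched_normal(
    reference: str,
    tumor_bam: str,
    normal_bam: str,
    normal_sample: str,
    out_vcf: str,
    germline_resource: Optional[str] = None,
    panel_of_normals: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a Mutect2 tumor/normal command."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam, "-I", normal_bam]
    # -normal takes the sample name from the normal BAM's read group (SM tag),
    # not a file path.
    cmd += ["-normal", normal_sample]
    if germline_resource:
        cmd += ["--germline-resource", germline_resource]
    if panel_of_normals:
        cmd += ["--panel-of-normals", panel_of_normals]
    cmd += ["-O", out_vcf]
    return cmd


def filter_mutect_calls(
    reference: str,
    unfiltered_vcf: str,
    filtered_vcf: str,
    contamination_table: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a FilterMutectCalls command."""
    cmd = ["gatk", "FilterMutectCalls", "-R", reference, "-V", unfiltered_vcf, "-O", filtered_vcf]
    if contamination_table:
        # Table as produced by CalculateContamination.
        cmd += ["--contamination-table", contamination_table]
    return cmd


if __name__ == "__main__":
    call = mutect2_matched_normal(
        "ref.fasta", "SKBR3.bam", "HMEC.bam", "HMEC", "SKBR3.unfiltered.vcf.gz",
        germline_resource="gnomad.vcf.gz", panel_of_normals="pon.vcf.gz",
    )
    print(" ".join(call))
    print(" ".join(filter_mutect_calls("ref.fasta", "SKBR3.unfiltered.vcf.gz", "SKBR3.filtered.vcf.gz")))
```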


  • smk_84 Member

    @shlee Do you have anything to add? Thanks!

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @smk_84,

    You have three different cell lines with what I think are different origins. Your control, HMEC (Primary Mammary Epithelial Cells), is batch specific. In this case, you do not have a germline matched control for each cell line and therefore your choices are limited in terms of calling somatic mutations.

    You can call somatic CNVs given a CNV panel of normals (PoN). Please see the ModelSegments CNV tutorial (in two parts). You should be aware that the workflow will account for common germline CNVs (gCNVs) represented in the PoN but will likely call rare gCNVs specific to each sample, and these will be indistinguishable from somatic CNV calls.

    As for calling short variants (SNVs and indels) with Mutect2, without a matched normal you will have many germline calls included in your callset even if you utilize a PoN and the gnomAD germline variants resource. Because of the selective bottlenecks that cell lines undergo, e.g. cloning, your somatic mutations will be hard to distinguish from germline variants. One approach that might help is to perform functional annotation with Funcotator on your Mutect2 callset and to then focus on those calls that have a significant impact, e.g. missense mutations. Funcotator is currently in beta and the workflow offers two pre-prepared human annotation resource bundles, one for hg19 (b37) and one for hg38. If you use the former reference assembly, Oncotator is in production status and accepts hg19/b37 data.
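
The "focus on calls with a significant impact" step above could look like the following once the callset has been annotated. This is a minimal sketch that assumes the annotations were written in MAF format with a `Variant_Classification` column holding values such as `Missense_Mutation`; the set of classes kept, the function name, and the toy rows are illustrative assumptions, not GATK defaults or real data.

```python
import csv
from typing import Dict, FrozenSet, Iterable, List

# Assumed set of "significant impact" classes; adjust to your own question.
HIGH_IMPACT: FrozenSet[str] = frozenset(
    {"Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site"}
)


def high_impact_rows(
    maf_lines: Iterable[str], keep: FrozenSet[str] = HIGH_IMPACT
) -> List[Dict[str, str]]:
    """Return MAF rows whose Variant_Classification is in `keep`."""
    # MAF files may start with '#' comment lines before the header row.
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [row for row in reader if row.get("Variant_Classification") in keep]


if __name__ == "__main__":
    # Toy MAF content with placeholder gene names, for illustration only.
    toy_maf = "\n".join(
        [
            "#version 2.4",
            "\t".join(["Hugo_Symbol", "Chromosome", "Start_Position", "Variant_Classification"]),
            "\t".join(["GENE_A", "chr1", "100", "Missense_Mutation"]),
            "\t".join(["GENE_B", "chr2", "200", "Silent"]),
            "\t".join(["GENE_C", "chr3", "300", "Nonsense_Mutation"]),
        ]
    )
    for row in high_impact_rows(toy_maf.splitlines()):
        print(row["Hugo_Symbol"], row["Variant_Classification"])
```

Filtering on functional class narrows the list but does not separate germline from somatic calls, so the caveat above about the missing matched normal still applies to whatever remains.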
