Implementation of GATK4 for variant calling in WES of human cancer samples without reference normals

mbgventer Member
edited April 2018 in Ask the GATK team

Dear GATK community,

I would like to ask a very specific question about applying the GATK toolkit to exome sequencing data. For 3 patients, I have whole exome sequencing data (genomic DNA captured using the Agilent in-solution enrichment methodology; paired-end 75-base massively parallel sequencing on an Illumina HiSeq4000) from CTCs (circulating tumor cells), and also exome sequencing data from biopsies of the same patients. Both the biopsies and the circulating tumor cells were isolated at the same timepoint, at diagnosis, when the tumor had already spread due to its "specific nature", so neither sample is definitely primary tumor. I have both FASTQ files and BAM files for each patient.

The main goal is to identify whether there are any "common mutational patterns" (i.e. SNPs) between the circulating tumor cells and the biopsies of the same patients. This would be vital mainly for validating the CTC isolation protocol (as well as for the crucial time of diagnosis of the specific cancer, the relevant biological mechanisms, etc.). However, a major issue is that there is no reference normal tissue (which probably limits the identification of somatic variants), and the number of patients is small (6 cancer samples in total).

In your opinion, could I still use GATK for germline SNP/indel analysis and try to focus on "rare" germline variants, and perhaps on any common types of these variants in specific genes that are shared by both types of biological material? Any other ideas or suggestions would be gratefully received.
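Once each CTC and biopsy sample has its own VCF, the "shared variants" comparison described above can be sketched roughly as follows. This is a minimal illustration, not a GATK feature: it keys each variant on (chromosome, position, REF, ALT), reads plain uncompressed VCF text, and does not normalize indel representations. The function names and the toy records are invented for the example.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from uncompressed VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        # Skip blank lines and all header lines ("##..." and "#CHROM...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic sites list ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt != ".":
                keys.add((chrom, pos, ref, alt))
    return keys


def shared_variants(ctc_lines: Iterable[str],
                    biopsy_lines: Iterable[str]) -> Set[VariantKey]:
    """Variants present in both call sets (exact allele match)."""
    return variant_keys(ctc_lines) & variant_keys(biopsy_lines)


# Toy records, invented purely for illustration (not real data).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
ctc = [
    header,
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr2\t200\t.\tC\tT,G\t.\tPASS\t.\n",
]
biopsy = [
    header,
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr2\t200\t.\tC\tG\t.\tPASS\t.\n",
    "chr3\t300\t.\tG\tA\t.\tPASS\t.\n",
]

shared = sorted(shared_variants(ctc, biopsy))
print(shared)
```

With real files, the same functions could be fed `open("sample.vcf")` handles (hypothetical file name); gzip-compressed VCFs would need to be opened with `gzip.open(path, "rt")` instead.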

Please excuse any naive questions on this matter, as this is my first time analyzing WES data!

Thank you in advance,


Best Answer


  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    Hi Efstathios-Iason,

    There are a couple of ways you can do this. First, we have a somatic SNVs/Indels workflow.

    Second, you may find these articles helpful:

    I suspect the last article there will be most relevant :smile:


  • mbgventer Member
    edited April 2018

    Dear Sheila,

    Thank you for your answer, and please excuse my delayed reply; unfortunately, I did not receive any notification about the post. Based on the very important posts you mentioned, I would like to ask an important question about implementing Mutect2:


    If I have understood correctly the methodology for creating the panel of normal samples with Mutect2:

    1) As I don't have reference normal samples, could I download some from a relevant database, such as ExAC or the 1000 Genomes Project, if that is feasible?

    2) Then, for each normal sample, I would run the pipeline above using Mutect2 in tumor-only mode, ending with a "universal normal VCF", right?

    3) But for the next step, how do I bring in the actual cancer samples in order to call somatic mutations?

    Thank you for your consideration on this matter !!



  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    Hi Efstathios,

    1) Perhaps this article will help. Also, in the tutorial I linked to, we use 40 samples from the 1000 Genomes Project. If you can find samples that were sequenced in the same way as your tumor samples, that will work.

    2) Yes, the tutorial will help with more details.

    3) I think the tutorial linked to here will help. Also, have a look at the hands-on tutorial and PowerPoint here.


  • mbgventer Member

    Dear Sheila,

    Thank you once more for the updated resources and comments. I have already found the specific tutorial, and I am going to read the relevant PowerPoint slides from the workshop in detail. I would like to ask one last question, because one part of the tutorial confused me a bit:

    If I'm correct, the 40 samples from the 1000 Genomes Project, for example, could be run one by one through Mutect2 in tumor-only mode, to create at the end a "universal VCF" of normal samples, right?

    But afterwards, should Mutect2 be run again for each cancer sample, each time using the collapsed normal VCF file as the reference normal sample, as in T-N pair mode, to finally call somatic variants?
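The three steps discussed in this thread can be sketched as command lines. The sketch below only builds the argument lists and does not run GATK. The flag spellings (`-tumor`, `-vcfs`, `--panel-of-normals`) follow GATK 4.0.x-era usage and are an assumption to check against the documentation of the installed version, since later releases may differ, particularly for the panel-creation step. All file and sample names are hypothetical. Note that in this sketch the panel of normals is passed to Mutect2 as a filtering resource while each cancer sample is still called in tumor-only mode; it is not supplied as a matched normal in a T-N pair.

```python
from typing import List, Sequence


def normal_tumor_only_cmd(ref: str, normal_bam: str,
                          sample_name: str, out_vcf: str) -> List[str]:
    """Step 1: call one normal sample with Mutect2 in tumor-only mode."""
    return ["gatk", "Mutect2",
            "-R", ref,
            "-I", normal_bam,
            "-tumor", sample_name,  # assumed GATK 4.0.x flag spelling
            "-O", out_vcf]


def create_pon_cmd(normal_vcfs: Sequence[str], pon_vcf: str) -> List[str]:
    """Step 2: combine the per-normal VCFs into one panel of normals."""
    cmd = ["gatk", "CreateSomaticPanelOfNormals"]
    for vcf in normal_vcfs:
        cmd += ["-vcfs", vcf]  # assumed GATK 4.0.x flag spelling
    cmd += ["-O", pon_vcf]
    return cmd


def cancer_sample_cmd(ref: str, tumor_bam: str, sample_name: str,
                      pon_vcf: str, out_vcf: str) -> List[str]:
    """Step 3: call a cancer sample in tumor-only mode, with the panel."""
    return ["gatk", "Mutect2",
            "-R", ref,
            "-I", tumor_bam,
            "-tumor", sample_name,
            "--panel-of-normals", pon_vcf,  # a resource, not a matched normal
            "-O", out_vcf]


# Hypothetical file and sample names, for illustration only.
normals = ["normal1", "normal2"]
step1 = [normal_tumor_only_cmd("ref.fasta", f"{n}.bam", n, f"{n}.vcf.gz")
         for n in normals]
step2 = create_pon_cmd([f"{n}.vcf.gz" for n in normals], "pon.vcf.gz")
step3 = cancer_sample_cmd("ref.fasta", "ctc_patient1.bam", "ctc_patient1",
                          "pon.vcf.gz", "ctc_patient1.somatic.vcf.gz")

for cmd in step1 + [step2, step3]:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` once the flags have been verified; the raw calls from step 3 would normally still need filtering afterwards.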



