(How to) Call somatic mutations using GATK4 Mutect2 (Deprecated)



  • davidben Boston Member, Broadie, Dev ✭✭✭

    @rshahi Thanks for letting me know!

  • 2904359495 Member ✭✭

    @ddaneels Hi, where did you find af-only-gnomad.raw.sites.hg19.vcf.gz? Thanks a lot.

  • sinemselvi Member

    I used af-only-gnomad.hg38.vcf.gz, wholegenome.interval_list, and small_exac_common_3.hg38.vcf.gz instead of chr17_af-only-gnomad_grch38.vcf.gz, chr17plus.interval_list, and chr17_small_exac_common_3_grch38.vcf.gz.

    I got the same result: the passing variants are all on chr17. I also want to obtain variants on other chromosomes.

    Why do we get only chr17 variants?

  • syer89 Member
    Hi, I have a question about CreateSomaticPanelOfNormals.

    I see this suggestion above: "One option that is not used here is to include a germline resource with --germline-resource" when processing each normal sample in tumor-only mode.

    But in another post, "tumor-only-mode-should-collectsequencingartifactmetrics-and-filterbyorientationbias-be-used-on-bam", it is suggested not to use --germline-resource with Mutect2 when processing normal samples.

    Can you please clarify this? Also, can you confirm whether "--disable-read-filter" needs to be used when processing normal samples to create the PoN?
  • davidben Boston Member, Broadie, Dev ✭✭✭

    @syer You don't need to use --disable-read-filter. In fact, it would probably end up putting too many sites in the PoN and hurting sensitivity.

    There have been a lot of changes in the last few releases, which is why a blog post and a new tutorial for Mutect2 in GATK 4.1.1 are about to come out. To make a PoN you should not use --germline-resource for Mutect2, but you should use --germline-resource for CreateSomaticPanelOfNormals.
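
    For readers who want to see where the flag goes, here is a rough sketch in Python that only builds (and prints) the two command lines rather than running them. All file names are placeholders, and the `-V gendb://pon_db` input assumes the GenomicsDBImport step that a later post in this thread mentions for newer releases. Flag spellings are as I understand them for the GATK 4.1.1 era, so please check them against `gatk <Tool> --help` for your version.

```python
# Sketch only: builds GATK command lines as argument lists, does not run them.
# File names are placeholders; verify flags with `gatk <Tool> --help`.

def mutect2_normal_cmd(ref, normal_bam, out_vcf):
    # Each normal is called in tumor-only mode, WITHOUT --germline-resource.
    return ["gatk", "Mutect2", "-R", ref, "-I", normal_bam, "-O", out_vcf]

def create_pon_cmd(ref, germline_resource, variants, out_vcf):
    # The germline resource is supplied at the panel-building step instead.
    return ["gatk", "CreateSomaticPanelOfNormals", "-R", ref,
            "--germline-resource", germline_resource,
            "-V", variants, "-O", out_vcf]

normals = ["normal1.bam", "normal2.bam"]
per_normal = [
    mutect2_normal_cmd("ref.fasta", bam, bam.replace(".bam", ".vcf.gz"))
    for bam in normals
]
pon = create_pon_cmd("ref.fasta", "af-only-gnomad.vcf.gz",
                     "gendb://pon_db", "pon.vcf.gz")

for cmd in per_normal + [pon]:
    print(" ".join(cmd))
```

    The point of the sketch is just the asymmetry: the per-normal Mutect2 calls carry no germline resource, while the panel-building call does.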

  • syer89 Member
    edited April 2019
    Hi @davidben, thanks for clarifying. It's very clear now. I assume the above holds for >=v4.0.4.0.

    When using CreateSomaticPanelOfNormals, does it look at the ALT column at all when combining variants from multiple samples?

    I ask because when I tested a small set I got the following:
    sample1 variant
    1 235918688 . CAAAAG CAAAAGAAAAG,C
    sample2 variant
    1 235918688 . CAAAAG C

    Combined result (with the default minimum of 2 samples)
    1 235918688 . CAAAAG CAAAAGAAAAG,C

    I expected only ALT=CAAAAGAAAAG to be present in the final PoN VCF, but it seems to check only whether the site is present in at least 2 samples, without considering the ALTs. Is that how it's supposed to function? Thanks a lot!
  • davidben Boston Member, Broadie, Dev ✭✭✭

    @syer89 CreateSomaticPanelOfNormals ignores the alt column, which is intended. The idea is that we're flagging error-prone sites, where an open-ended set of bad things can occur. This is pretty reasonable as far as modeling STR and mapping errors is concerned, less so for other artifacts.
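
    To make the site-level behavior concrete, here is a toy Python sketch of "keep a site if it appears in at least N samples, regardless of ALT". This is not the GATK implementation, only an illustration of the idea; the function name `panel_sites` and the record layout are made up for the example, and the input is the two-sample case from the question above.

```python
from collections import defaultdict

def panel_sites(samples, min_samples=2):
    """samples: {sample_name: [(chrom, pos, ref, [alts]), ...]}

    Returns {(chrom, pos): [all ALTs seen at that site]} for every site
    present in at least min_samples samples. ALTs are never compared.
    """
    seen = defaultdict(set)      # (chrom, pos) -> names of samples with a call
    alts_at = defaultdict(list)  # (chrom, pos) -> ALT alleles, first-seen order
    for name, records in samples.items():
        for chrom, pos, ref, alts in records:
            key = (chrom, pos)
            seen[key].add(name)
            for alt in alts:
                if alt not in alts_at[key]:
                    alts_at[key].append(alt)
    return {key: alts_at[key]
            for key, names in seen.items() if len(names) >= min_samples}

samples = {
    "sample1": [("1", 235918688, "CAAAAG", ["CAAAAGAAAAG", "C"])],
    "sample2": [("1", 235918688, "CAAAAG", ["C"])],
}
print(panel_sites(samples))
```

    Because only the site is counted, both ALTs from sample1 survive even though sample2 carries just one of them, which matches the combined record reported above.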

  • Begali Germany Member ✭✭

    Hi @Geraldine_VdAuwera @Sheila

    Can I call somatic variants with Mutect2 on a merged.bam (multiple samples from different patients), or should I run it on each sample (individual_file.bam) separately? Also, I have to use tumor-only mode because no normal.bam is available.

    Thanks in advance.

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @Begali as of GATK 4.1 Mutect2 can jointly call an arbitrary number of tumor and normal samples for the same individual, but we have no plans to support joint calling of multiple individuals.

  • sahiilseth Member

    Using the latest, one can no longer use CreateSomaticPanelOfNormals directly.

    We need to create a pon db first using: GenomicsDBImport. However, the creation of db fails with:

    A USER ERROR has occurred: Failed to create reader from file:///pathto/ponm/vcfs/1234N.vcf.gz


    If I switch to a previous version of GATK (, I am able to successfully create the VCFs.

    (Mutect2 from latest GATK[], was used to call variants).

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @sahiilseth What is your command line for GenomicsDBImport?

  • tahuh Member

    Hi, as of v. the command line given here to create a PoN does not seem to work (see the documentation below)


    Could there be any difference in results between using this updated version of GATK( and the version described here?

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @tahuh What error message are you getting?

  • MT_badr Member
    @davidben Hi David, regarding my previous problem of not finding a suitable AF germline resource for mm10: I am trying to create one using the SNP VCF provided on the Sanger FTP site. Is the script you used to create the human germline resource the mutect_resources.wdl script in the GATK GitHub repository?

    Also, is there any further explanation or description of how the AF resource was created? It would be really helpful.
  • davidben Boston Member, Broadie, Dev ✭✭✭

    @MT_badr Unfortunately that wdl is very specific to gnomad. For example, it contains the lines

    # Get all lines in the file except the header:
    # Preserve all fields before INFO, Grab only the AF annotation from the INFO Field
    # replace ID (3rd) and QUAL (6th) columns with '.' (empty):
    grep -v "^#" ${input_vcf} | sed -e 's#\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)\t\(.*\)\t.*;AF=\([0-9]*\.[e0-9+-]*\).*#\1\t\2\t.\t\4\t\5\t.\t\7\tAF=\8#g' > simplified_body &

    which, as you can see, makes all sorts of assumptions about formatting.

    The important thing to understand is that any VCF with an AF INFO field works as a germline resource. The VCF you can download from the gnomAD website would work too; we only need the resource WDL because that VCF is so full of (for our purposes) extraneous INFO fields that it incurs a significant runtime cost. If your SNP VCF from Sanger has AF then you should be able to use it unchanged.
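
    If someone does want to slim a VCF down to AF only without relying on that sed pattern, a less format-dependent approach is to parse the INFO column by key. The following is a hedged Python sketch, not the code from the WDL: the function name `af_only_line` and the example record are made up, it handles one body line at a time, it assumes a sites-only VCF (anything after INFO is dropped), and it returns None for records without AF, which is a choice of this sketch.

```python
def af_only_line(line):
    """Return a VCF body line with INFO reduced to AF, or None if AF is absent.

    ID (3rd column) and QUAL (6th column) are replaced with '.'.
    """
    fields = line.rstrip("\n").split("\t")
    # INFO is ';'-separated; flag entries without '=' are skipped.
    info = dict(item.split("=", 1)
                for item in fields[7].split(";") if "=" in item)
    if "AF" not in info:
        return None
    fields[2] = "."  # ID
    fields[5] = "."  # QUAL
    fields[7] = "AF=" + info["AF"]
    return "\t".join(fields[:8])

# Made-up record, for illustration only.
example = "chr1\t1000\trs1\tA\tG\t50\tPASS\tAC=3;AF=0.001;AN=3000"
print(af_only_line(example))
```

    Header lines (starting with '#') would need to be passed through separately, and the INFO declarations in the header trimmed to match.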

  • vivekruhela Member

    I am trying to get somatic calls from tumor and normal samples using Mutect2 as described in the tutorial above, but I am getting the error "Reference coordinate corresponds to a non-existent base in the read." I generated tumor.bam and normal.bam using the same pipeline [same versions of BWA + Picard + samtools]. I do not understand what the error means. I have attached the error message here [check line 42 in the attached file]. Kindly suggest. Thanks.

  • Hi, could someone tell me where I can download NA19771.bam and HG02759.bam used in this tutorial?
    I can only find HG00190.bam on Google Drive. Thanks a lot.
