GATK4 Mutect2: calling somatic SNVs and indels with a matched tumor-normal sample

Lijia Yu (Beijing, Member)

Hi all,

I am currently testing different variant callers on one matched tumor-normal sample. I have successfully run Strelka, MuTect, and GATK3+MuTect2. However, the GATK4+Mutect2 test has not gone well: I cannot find any mutations with the PASS flag in the output VCF. I suspect I am using the wrong command.

Here is the command line. I copied it from the Mutect2 homepage.

gatk --java-options "-Xmx$MAX_MEM" Mutect2 \
-R $GENOME_REFERENCE \
-I $OUT_DIR/$TUMOR \
-I $OUT_DIR/$NORMAL \
-tumor Illumina_cancer \
-normal Illumina_normal \
--germline-resource $GERMLINE \
--af-of-alleles-not-in-resource 0.0000025 \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
-L $EXON_REGION \
-O $OUT_DIR/$PREFIX.vcf
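The value passed to --af-of-alleles-not-in-resource is the population allele frequency Mutect2 assigns to alleles that are absent from the germline resource. A rough way to see where a number of this size comes from, assuming (this thread does not say so) a rule of thumb of a single allele copy among all alleles in a resource of about N = 200,000 diploid samples:

$$
f_{\text{absent}} \approx \frac{1}{2N} = \frac{1}{2 \times 200\,000} = \frac{1}{400\,000} = 2.5 \times 10^{-6} = 0.0000025
$$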

  1. Do I need to add a PoN file when running paired samples?
  2. Could anyone check whether the command above should produce a list of somatic variants with the PASS flag? I did not get any PASS variants in my VCF file.

My GATK version is 4.1.0.0

Many thanks.

Answers

  • JiantaoShi (Boston, Member)
    1. A PoN is optional, but it gives better results when you have one.
    2. The raw Mutect2 VCF is unfiltered. You need to run FilterMutectCalls on it to apply the filters; calls that pass are marked PASS.
  • Lijia Yu (Beijing, Member)

    Hi @JiantaoShi and @davidben

    > By the way, is $GERMLINE the AF-only gnomAD from the GATK resource bucket?

    Yes, of course.

    Thank you for your help. Following your guidance, I successfully generated the filtered VCF.

  • Tomliu (Member), edited March 11
    Hi all,
    I am trying to create a PoN from my normal sample with GATK4+Mutect2, but it is very slow: it has already run for 100 hours on my WGS data. Is something wrong?
    java -Xmx40g -Djava.io.tmpdir=/tmp \
    -jar gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar Mutect2 \
    -R GDDH13.fa \
    -I realn.bam \
    -tumor C4 \
    --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
    --native-pair-hmm-threads 8 \
    -O mutect2_pon/1.vcf.gz

    Thanks in advance.
  • davidben (Boston, Member, Broadie, Dev)

    @Tomliu What is the coverage of your normal sample? Is the data FFPE, or is its quality compromised in some other way? How far has it gotten in 100 hours? What reference is GDDH13.fa?

  • Lijia Yu (Beijing, Member)

    > @davidben said:
    > What reference is GDDH13.fa?

    I guess it is apple.

  • Tomliu (Member)
    > @davidben said:
    > @Tomliu What is the coverage of your normal sample? Is the data FFPE, or is its quality compromised in some other way? How far has it gotten in 100 hours? What reference is GDDH13.fa?

    My sample's coverage is about 20x, and its quality is good enough for germline variant calling.
    So far the run is only 88% complete.
    I think something is wrong with my parameters.
  • Tomliu (Member)
    @davidben @Lijia Yu
    Yes, it's apple.
  • Tomliu (Member), edited March 11
    @davidben I'm not sure about this warning:
    05:56:09.500 WARN NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (snp/tmp/libgkl_pairhmm_omp2213064987687237120.so: /usr/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by snp/tmp/libgkl_pairhmm_omp2213064987687237120.so))
    05:56:09.500 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
    05:56:09.500 INFO NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file: GATK-4.0.11.0/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
    05:56:09.838 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    05:56:09.839 WARN IntelPairHmm - Ignoring request for 8 threads; not using OpenMP implementation
  • davidben (Boston, Member, Broadie, Dev)

    @Tomliu the message "OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported" means that you are running on hardware that lacks the Intel AVX instruction set and read-to-haplotype likelihoods are being computed without hardware-accelerated native code. This slows Mutect2 significantly. While I am surprised that a 20x sample is taking so long, it's not completely unreasonable.

    My question about quality was whether there are a lot of little errors, not necessarily enough to compromise calling but enough to slow down assembly etc. For example, how complete is the apple reference? Does it have a lot of alignment artifacts?

    If you need more speed, you could try setting --max-num-haplotypes-in-population to something lower than its default of 128, like 20.

  • Tomliu (Member)
    @davidben Thanks for your help. I will try that setting.
    Yes, the genome is incomplete, and it is more complex because of its repeat regions and heterozygosity.
    But when I use Strelka2, it takes only a few hours.
  • davidben (Boston, Member, Broadie, Dev)

    @Tomliu Mutect2 can spend longer than it should in regions with complicated assemblies. We are currently working on that and expect to improve things greatly in a few months. In the meantime, --max-num-haplotypes-in-population will probably help, and running on machines with the AVX instruction set (which, I think, includes most Intel chips these days) will give you a speedup of 3x or more.
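The log Tomliu posted shows that libgkl_pairhmm_omp.so failed to load because /usr/lib64/libgomp.so.1 lacks the GOMP_4.0 symbol version, after which GATK loaded libgkl_pairhmm.so and ignored the request for 8 threads. A minimal bash sketch, assuming a Linux host (it reads /proc/cpuinfo by default), checks both conditions and reruns the normal with the lower haplotype cap davidben suggests. The function names are hypothetical helpers, not part of GATK; the command itself is Tomliu's, with only --max-num-haplotypes-in-population added.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GATK) for the slow PoN run discussed above.
# Assumes Linux; default paths come from the log and command in this thread.

# Succeeds if the given libgomp exports the GOMP_4.0 symbol version,
# which the log says libgkl_pairhmm_omp.so requires.
has_gomp_4_0() {
  local lib="${1:-/usr/lib64/libgomp.so.1}"
  [[ -r "$lib" ]] && grep -aq 'GOMP_4\.0' "$lib"
}

# Succeeds if the CPU flag list contains "avx" as a whole word
# (a line with only "avx2" does not count).
has_avx() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep -qw 'avx' "$cpuinfo"
}

# Prints one verdict line per check; optional args override the default paths.
report_pairhmm_support() {
  if has_gomp_4_0 "$1"; then
    echo "libgomp exports GOMP_4.0: the requirement named in the log is met"
  else
    echo "libgomp missing or lacks GOMP_4.0: expect the non-OpenMP fallback"
  fi
  if has_avx "$2"; then
    echo "CPU reports AVX"
  else
    echo "CPU does not report AVX"
  fi
}

# Tomliu's command with davidben's suggested cap (default 128, here 20).
# --native-pair-hmm-threads is kept, but the log shows it is ignored
# when the OpenMP library cannot be loaded.
run_pon_normal_capped() {
  java -Xmx40g -Djava.io.tmpdir=/tmp \
    -jar gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar Mutect2 \
    -R GDDH13.fa \
    -I realn.bam \
    -tumor C4 \
    --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
    --native-pair-hmm-threads 8 \
    --max-num-haplotypes-in-population 20 \
    -O mutect2_pon/1.vcf.gz
}
```

Usage: source the file, run report_pairhmm_support, then run_pon_normal_capped. The check uses grep -a rather than strings so it works even where binutils is not installed.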

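Returning to the original question: JiantaoShi's second point, running FilterMutectCalls on the raw VCF, is the step that produced the filtered VCF. A minimal bash sketch, reusing the placeholders from the question ($GENOME_REFERENCE, $OUT_DIR, $PREFIX); the .filtered.vcf name and the function names are hypothetical. -R is included because the 4.1-era FilterMutectCalls examples pass the reference; check the tool documentation for your exact release. The PASS counter is a plain awk check on the FILTER column, not a GATK feature.

```shell
#!/usr/bin/env bash
# Sketch only: $GENOME_REFERENCE, $OUT_DIR and $PREFIX are the question's placeholders;
# the output name and function names are hypothetical.

# Apply Mutect2's filters to the raw calls; calls that pass get FILTER=PASS.
# (Per the first answer, a PoN is optional; if you have one, pass it to the
# Mutect2 command itself with --panel-of-normals.)
filter_mutect_calls() {
  gatk FilterMutectCalls \
    -R "$GENOME_REFERENCE" \
    -V "$OUT_DIR/$PREFIX.vcf" \
    -O "$OUT_DIR/$PREFIX.filtered.vcf"
}

# Count data records whose FILTER column (7th tab-separated field) is exactly PASS.
# Accepts plain or gzip-compressed VCF.
count_pass() {
  local vcf="$1"
  if [[ "$vcf" == *.gz ]]; then
    gzip -dc -- "$vcf"
  else
    cat -- "$vcf"
  fi | awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}
```

Usage: filter_mutect_calls && count_pass "$OUT_DIR/$PREFIX.filtered.vcf". Running count_pass on the raw Mutect2 VCF returning 0 matches what the question describes.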