Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

MuTect2 does not call tumor-only variants

Hi,
I'm trying to use MuTect2, however I'm not getting tumor-specific variants called. To get MuTect2 working I'm using an artificial tumor/normal data set, however, somatic mutations I introduced are not detected.

I use an arbitrarily selected 100 kb region of chromosome 19 and two sets of variants from the Genome-in-a-Bottle database to generate the tumor and normal bam's (using bamsurgeon) whereby the tumor batch of 67 SNPs contains also all 33 'normal' SNPs All SNPs have allele frequencies close to 1.

For variant calling I followed in general the Best Practice pipeline used the picardTools MarkDuplicates and recalibrated and then called MuTect2 with

java -jar /usr/local/src/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T MuTect2 -R /data/references/human/genome/GRCh37_hg19/hg19_wochrs.fasta -I:tumor reg5_T_67snps_noDup_wRG_recal.bam -I:normal reg5_N_33snps_noDup_wRG_recal.bam -o reg5_redSNPs_MuTect2_redetection.vcf -L Regions_files/reg5.bed -dontTrimActiveRegions -forceActive

However, the tumor-only sites are not correctly called. The output states "0 active regions".
On the other hand, when I perform MuTect2 calling only with a single sample (as "-I:tumor") applying "--artifact_detection_mode" all variants are detected correctly. Likewise, the HaplotypeCaller calls the variants correctly for both sets.

Any thoughts? What am I missing when calling on the tumor/normal pair?
Alex Neef

Best Answers

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    Are you saying that when you call on the tumor + normal, no active regions are detected?
  • alexneefalexneef SpainMember

    Yes, exactly.
    I'm getting the following output:

    java -jar /usr/local/src/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T MuTect2 -R /data/references/human/genome/GRCh37_hg19/hg19_wochrs.fasta -I:tumor reg5_T_67snps.bam -I:normal reg5_N_33snps.bam -o giab-T-67redSNPsT-vs-N-33redSNPSN_MuTect2.vcf -L Regions_files/reg5.bed --dbsnp ../general_files/common_all_20151104.vcf -dontTrimActiveRegions -forceActive
    

    INFO 06:49:59,673 HelpFormatter - --------------------------------------------------------------------------------
    INFO 06:49:59,676 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
    INFO 06:49:59,676 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 06:49:59,676 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
    INFO 06:49:59,676 HelpFormatter - [Wed Nov 23 06:49:59 PST 2016] Executing on Linux 3.2.0-23-generic amd64
    INFO 06:49:59,677 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 JdkDeflater
    INFO 06:49:59,680 HelpFormatter - Program Args: -T MuTect2 -R /data/references/human/genome/GRCh37_hg19/hg19_wochrs.fasta -I:tumor reg5_T_67snps.bam -I:normal reg5_N_33snps.bam -o giab-T-67redSNPsT-vs-N-33redSNPSN_MuTect2.vcf -L Regions_files/reg5.bed --dbsnp ../general_files/common_all_20151104.vcf -dontTrimActiveRegions -forceActive
    INFO 06:49:59,682 HelpFormatter - Executing as [email protected] on Linux 3.2.0-23-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
    INFO 06:49:59,683 HelpFormatter - Date/Time: 2016/11/23 06:49:59
    INFO 06:49:59,683 HelpFormatter - --------------------------------------------------------------------------------
    INFO 06:49:59,683 HelpFormatter - --------------------------------------------------------------------------------
    INFO 06:49:59,700 GenomeAnalysisEngine - Strictness is SILENT
    INFO 06:49:59,781 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 06:49:59,787 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 06:49:59,809 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO 06:49:59,921 IntervalUtils - Processing 100000 bp from intervals
    INFO 06:49:59,983 GenomeAnalysisEngine - Preparing for traversal over 2 BAM files
    INFO 06:50:00,002 GenomeAnalysisEngine - Done preparing for traversal
    INFO 06:50:00,002 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 06:50:00,003 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 06:50:00,003 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
    INFO 06:50:08,982 MuTect2 - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    Using AVX accelerated implementation of PairHMM
    INFO 06:50:13,326 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
    INFO 06:50:13,326 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
    INFO 06:50:30,005 ProgressMeter - 19:34310300 0.0 30.0 s 49.6 w 10.3% 4.9 m 4.4 m
    INFO 06:51:00,006 ProgressMeter - 19:34321400 0.0 60.0 s 99.2 w 21.4% 4.7 m 3.7 m
    INFO 06:51:30,007 ProgressMeter - 19:34335538 0.0 90.0 s 148.8 w 35.5% 4.2 m 2.7 m
    INFO 06:52:00,008 ProgressMeter - 19:34345700 0.0 120.0 s 198.4 w 45.7% 4.4 m 2.4 m
    INFO 06:52:40,009 ProgressMeter - 19:34361000 0.0 2.7 m 264.6 w 61.0% 4.4 m 102.0 s
    INFO 06:53:20,010 ProgressMeter - 19:34371500 0.0 3.3 m 330.7 w 71.5% 4.7 m 79.0 s
    INFO 06:54:00,011 ProgressMeter - 19:34388600 0.0 4.0 m 396.8 w 88.6% 4.5 m 30.0 s
    INFO 06:54:21,663 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.194528518
    INFO 06:54:21,663 PairHMM - Total compute time in PairHMM computeLikelihoods() : 28.391553218000002
    INFO 06:54:21,664 MuTect2 - Ran local assembly on 0 active regions
    INFO 06:54:21,666 ProgressMeter - done 100000.0 4.4 m 43.6 m 100.0% 4.4 m 0.0 s
    INFO 06:54:21,667 ProgressMeter - Total runtime 261.66 secs, 4.36 min, 0.07 hours
    INFO 06:54:21,667 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 992744 total reads (0.00%)
    INFO 06:54:21,667 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO 06:54:21,668 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
    INFO 06:54:21,668 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 06:54:21,668 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 06:54:21,669 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 06:54:21,669 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter

    INFO 06:54:21,669 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

    Done. There were no warn messages.

    AS I mentioned, variant calling of single saples with the Haplotypecalleras well as with MuTect2 is completely as expected with reporting of all variants. Problems only when calling tumor + normal together.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    This sounds like a possible bug. Could you please try it again with the new 3.7 version (which came out today)? It has a few MuTect2 fixes. If that doesn't fix it we'll need some test files for a bug report.

Sign In or Register to comment.