Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Generating pool of normal

When generating a pooled VCF file to use as the pool of normals with mutect is it best to run the germ line samples through haplotypecaller or should germline variants be called with mutect?

Best Answer

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MA admin
    Accepted Answer

    You should run on normals with MuTect in artifact detection mode. Note that a brand new version of MuTect will be released in the near future, rewritten based on the HaplotypeCaller (so it also does indel calls).

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    Accepted Answer

    You should run on normals with MuTect in artifact detection mode. Note that a brand new version of MuTect will be released in the near future, rewritten based on the HaplotypeCaller (so it also does indel calls).

  • vrenaultvrenault FranceMember

    Hello, I have been trying to generate a VCF file from a normal bam but the vcf ends up empty (only header lines appear). I ran the following command with GATK version 3.5:
    java -jar /code/bin//GATK/GenomeAnalysisTK.jar -T MuTect2 -R /Genomes/hs37d5.fa -I:tumor normal.bam --dbsnp /code/bin/GATK/resources/dbsnp_138.b37.vcf --artifact_detection_mode -o output.normal1.vcf

    The same bam file has been used successfully to generate somatic variants using MuTect2 and the associated tumor bam file.

    Anything wrong there?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @vrenault,

    I can't see anything obviously wrong with your command. Can you post the full console output?

  • pachewychomppachewychomp OregonMember

    @vrenault This looks very similar to the command I ran, though I included a COSMIC reference and an -nct of 4. I also ended up with an empty VCF (header only). A small number of reads were filtered (0.89%) from my data. One thing I'm seeing is this line "INFO 10:21:54,512 MuTect2 - Ran local assembly on 0 active regions ". I would expect this rarely to be zero, but maybe I am misunderstanding what this is doing.

    Full console output:

    INFO 19:29:13,728 HelpFormatter - --------------------------------------------------------------------------------
    INFO 19:29:13,731 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
    INFO 19:29:13,731 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 19:29:13,732 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 19:29:13,735 HelpFormatter - Program Args: -T MuTect2 -R /opt/installed/galaxy_genomes/hg19/Homo_sapiens_assembly19.fasta -I:tumor /home/users/letaw/lustre1/projs/create_pon/raw/3000.bam --artifact_detection_mode --dbsnp /home/users/letaw/lustre1/projs/create_pon/temp/3000/grch37_p13_142.vcf --cosmic /home/users/letaw/lustre1/projs/create_pon/temp/3000/cosmic_75_coding_hg19_sorted.vcf -L /home/users/letaw/lustre1/projs/create_pon/temp/3000/agilent_cre.interval_list -o /home/users/letaw/lustre1/projs/create_pon/data/3000/3000.call_stats.vcf --disable_auto_index_creation_and_locking_when_reading_rods -nct 4
    INFO 19:29:13,742 HelpFormatter - Executing as [email protected] on Linux 2.6.32-504.30.3.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_91-mockbuild_2015_10_21_19_56-b00.
    INFO 19:29:13,742 HelpFormatter - Date/Time: 2016/01/06 19:29:13
    INFO 19:29:13,742 HelpFormatter - --------------------------------------------------------------------------------
    INFO 19:29:13,743 HelpFormatter - --------------------------------------------------------------------------------
    INFO 19:29:13,872 GenomeAnalysisEngine - Strictness is SILENT
    INFO 19:29:13,988 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 19:29:13,996 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 19:29:14,036 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
    INFO 19:29:14,681 IntervalUtils - Processing 54328695 bp from intervals
    INFO 19:29:14,708 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 24 processors available on this machine
    INFO 19:29:14,779 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 19:29:15,312 GenomeAnalysisEngine - Done preparing for traversal
    INFO 19:29:15,312 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 19:29:15,313 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 19:29:15,313 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
    INFO 19:29:15,438 MuTect2 - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    INFO 19:29:15,439 PairHMM - Performance profiling for PairHMM is disabled because HaplotypeCaller is being run with multiple threads (-nct>1) option
    Profiling is enabled only when running in single thread mode

    Using AVX accelerated implementation of PairHMM
    INFO 19:29:16,797 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
    INFO 19:29:16,797 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
    INFO 19:29:45,316 ProgressMeter - 1:1168742 1903735.0 30.0 s 15.0 s 0.1% 10.7 h 10.7 h
    INFO 19:30:15,317 ProgressMeter - 1:1650978 1.6281538E7 60.0 s 3.0 s 0.2% 8.4 h 8.4 h
    INFO 19:30:45,319 ProgressMeter - 1:2529846 4.2513529E7 90.0 s 2.0 s 0.3% 8.3 h 8.2 h
    INFO 19:31:15,321 ProgressMeter - 1:3649721 6.2147359E7 120.0 s 1.0 s 0.4% 9.3 h 9.2 h
    ...
    INFO 10:18:48,574 ProgressMeter - X:77295610 2.53400581141E11 14.8 h 0.0 s 98.1% 15.1 h 17.1 m
    INFO 10:19:48,578 ProgressMeter - X:114427167 2.54738084368E11 14.8 h 0.0 s 98.7% 15.0 h 11.5 m
    INFO 10:20:48,582 ProgressMeter - X:139866602 2.5635996929E11 14.9 h 0.0 s 99.3% 15.0 h 6.4 m
    INFO 10:21:48,586 ProgressMeter - Y:14953121 2.5778740981E11 14.9 h 0.0 s 99.9% 14.9 h 54.0 s
    INFO 10:21:54,512 MuTect2 - Ran local assembly on 0 active regions
    INFO 10:21:54,718 ProgressMeter - done 2.57805502705E11 14.9 h 0.0 s 100.0% 14.9 h 0.0 s
    INFO 10:21:54,719 ProgressMeter - Total runtime 53559.41 secs, 892.66 min, 14.88 hours
    INFO 10:21:54,719 MicroScheduler - 792145 reads were filtered out during the traversal out of approximately 88727587 total reads (0.89%)
    INFO 10:21:54,719 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO 10:21:54,719 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
    INFO 10:21:54,720 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 10:21:54,720 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 10:21:54,720 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 10:21:54,720 MicroScheduler - -> 7816 reads (0.01% of total) failing NotPrimaryAlignmentFilter
    INFO 10:21:54,720 MicroScheduler - -> 784329 reads (0.88% of total) failing UnmappedReadFilter
    INFO 10:21:56,382 GATKRunReport - Uploaded run statistics report to AWS S3

    Cheers!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Ah, well spotted. That would definitely explain why you see no output in the VCF, though I can't explain why no active regions are getting triggered. MuTect2 is supposed to be more sensitive to entropy, not less... we'll figure this out asap.

  • pachewychomppachewychomp OregonMember

    @Geraldine_VdAuwera I'm wondering if this should be migrated elsewhere, as the thread already has an accepted answer?

  • vrenaultvrenault FranceMember

    Just saw there were replies to my comments... I also noticed the message "Ran local assembly on 0 active regions" and I thus tried to force the PON file creation on an interval. I tried one chromosome then one nucleotide where I know there is a variation without more success actually.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @pachewychomp @vrenault

    We have received test data from other users and are working to fix this bug. When we have a fix we'll post an announcement on the blog.

    FYI you can get email notifications when there are new responses to your pots -- there's a FAQ article that describes how to set it up.

  • vrenaultvrenault FranceMember

    ok thank you for your help and for the tip regarding email notification!

  • pachewychomppachewychomp OregonMember

    @vrenault I would recommend downloading one of the daily builds from the "Downloads" section if you need to create the panel of normals quickly. The issue does appear to have been fixed, it has just not been included in the official release yet.

  • vrenaultvrenault FranceMember

    @pachewychomp Thanks for the tip, I will try that!

Sign In or Register to comment.