Empty ContEst Output

Hello,

I have some TruSeq cancer panel amplicon data and I am in the process of calling somatic variants in tumor samples with MuTect. With the default fraction_contamination, I am not getting any entries with "KEEP" status. However, reducing it does start yielding SNPs. I am not sure what the ideal value would be here, so I have been trying to run ContEst with GATK 3.7 as suggested on the website, but I am not getting any output. Here are the details of the command:

GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T ContEst -R hg19.fa -I:eval tumor_sorted.bam -I:genotype normal_sorted.bam --popfile popaf/hapmap_3.3_hg19_pop_stratified_af_sorted.vcf -isr INTERSECTION -o output.txt -L targets.bed

Here are the STDOUT and STDERR:

INFO  13:53:43,463 HelpFormatter - --------------------------------------------------------------------------------
INFO  13:53:43,652 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO  13:53:43,652 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO  13:53:43,652 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  13:53:43,652 HelpFormatter - [Wed Apr 05 13:53:43 BST 2017] Executing on Linux 3.10.0-229.el7.x86_64 amd64
INFO  13:53:43,653 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15
INFO  13:53:43,656 HelpFormatter - Program Args: -T ContEst -R hg19.fa -I:eval tumor_sorted.bam -I:genotype normal_sorted.bam --popfile popaf/hapmap_3.3_hg19_pop_stratified_af_sorted.vcf -isr INTERSECTION -o output.txt -L targets.bed
INFO  13:53:43,892 HelpFormatter - Executing as [email protected] on Linux 3.10.0-229.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15.
INFO  13:53:43,892 HelpFormatter - Date/Time: 2017/04/05 13:53:43
INFO  13:53:43,892 HelpFormatter - --------------------------------------------------------------------------------
INFO  13:53:43,892 HelpFormatter - --------------------------------------------------------------------------------
INFO  13:53:43,905 GenomeAnalysisEngine - Strictness is SILENT
INFO  13:53:44,787 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  13:53:44,794 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  13:53:44,973 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.17
INFO  13:53:45,491 IntervalUtils - Processing 38385 bp from intervals
INFO  13:53:46,311 GenomeAnalysisEngine - Preparing for traversal over 2 BAM files
INFO  13:53:46,390 GenomeAnalysisEngine - Done preparing for traversal
INFO  13:53:46,390 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  13:53:46,391 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  13:53:46,391 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  13:53:46,392 ContEst - Running in sequencing mode
INFO  13:54:08,909 ContEst - Total sites:  37742
INFO  13:54:08,910 ContEst - Population informed sites:  37
INFO  13:54:08,910 ContEst - Non homozygous variant sites: 32
INFO  13:54:08,910 ContEst - Homozygous variant sites: 5
INFO  13:54:08,910 ContEst - Passed coverage: 5
INFO  13:54:08,911 ContEst - Results: 0
INFO  13:54:08,913 ProgressMeter -            done     68536.0    22.0 s       5.5 m       99.8%    22.0 s       0.0 s
INFO  13:54:08,914 ProgressMeter - Total runtime 22.52 secs, 0.38 min, 0.01 hours
INFO  13:54:08,914 MicroScheduler - 918 reads were filtered out during the traversal out of approximately 893918 total reads (0.10%)
INFO  13:54:08,914 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
INFO  13:54:08,915 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  13:54:08,915 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  13:54:08,915 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO  13:54:08,915 MicroScheduler -   -> 918 reads (0.10% of total) failing NotPrimaryAlignmentFilter
INFO  13:54:08,915 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
Done. ------------------------------------------------------------------------------------------
There were no warn messages.

It seems that most things are filtered out because the data is downsampled to a target coverage of 1000. I would be grateful for any help.

Many thanks.

Issue · Github (by Sheila): Issue Number 1955, State: closed, Closed By: vdauwera

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Unfortunately it seems you have too few sites for the ContEst algorithm to work properly. I would recommend trying several different contamination values, then reviewing results manually to evaluate whether the calls make sense. If you have a fairly small amount of territory to look at, it shouldn't be too bad.
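To see how few sites are left at each stage, the ContEst counts can be pulled out of the log posted in the question. The following is only a minimal sketch: the helper name `contest_counts` and the regular expression are illustrative assumptions, not part of GATK, and the log excerpt is copied verbatim from the question.

```python
import re

# ContEst INFO lines copied from the log in the question.
LOG = """\
INFO  13:53:46,392 ContEst - Running in sequencing mode
INFO  13:54:08,909 ContEst - Total sites:  37742
INFO  13:54:08,910 ContEst - Population informed sites:  37
INFO  13:54:08,910 ContEst - Non homozygous variant sites: 32
INFO  13:54:08,910 ContEst - Homozygous variant sites: 5
INFO  13:54:08,910 ContEst - Passed coverage: 5
INFO  13:54:08,911 ContEst - Results: 0
"""

# Hypothetical pattern: matches "ContEst - <label>: <integer>" at end of line.
PATTERN = re.compile(r"ContEst - (?P<label>[^:]+):\s*(?P<count>\d+)\s*$")


def contest_counts(log_text):
    """Return {label: count} for every ContEst counter line in the log."""
    counts = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:  # lines without a numeric counter are skipped
            counts[match.group("label").strip()] = int(match.group("count"))
    return counts


if __name__ == "__main__":
    for label, count in contest_counts(LOG).items():
        print(f"{label:<30} {count:>6}")
```

Read this way, the log shows 37742 sites traversed, 37 of them with population allele frequency information, and only 5 homozygous variant sites passing coverage, which is consistent with the "too few sites" explanation rather than with downsampling.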

    Keep in mind that the contamination level you go with is basically going to be the lower threshold for allele frequency of mutations you can detect.
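One way to organize the suggested trial of several contamination values is to generate one MuTect command per value and review each output by hand. The sketch below only builds and prints the commands; it does not run anything. The jar name `muTect.jar`, the output file naming, and the list of fractions are assumptions chosen for illustration, and the input file names are taken from the ContEst command in the question. The option names follow MuTect 1 usage and should be checked against the version you have installed.

```python
import shlex


def mutect_sweep_commands(fractions, jar="muTect.jar"):
    """Build one MuTect command (as an argument list) per contamination value.

    `jar` and the output naming scheme are hypothetical placeholders.
    """
    commands = []
    for fraction in fractions:
        value = f"{fraction:g}"
        commands.append([
            "java", "-jar", jar,
            "--analysis_type", "MuTect",
            "--reference_sequence", "hg19.fa",
            "--intervals", "targets.bed",
            "--input_file:normal", "normal_sorted.bam",
            "--input_file:tumor", "tumor_sorted.bam",
            "--fraction_contamination", value,
            "--out", f"call_stats_fc{value}.txt",
        ])
    return commands


if __name__ == "__main__":
    # Illustrative values only; pick a range that suits your data.
    for cmd in mutect_sweep_commands([0.02, 0.01, 0.005, 0.001]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Since the chosen contamination level acts roughly as a floor on detectable allele frequency, comparing the "KEEP" calls across these outputs shows which low-frequency calls appear only once the setting is lowered.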
