Filters produce different results for MuTect and GATK Depth of Coverage

lkeeler (California, Member)

Hi,

We are seeing a problem where the DuplicateReadFilter is applied in both GATK DepthOfCoverage and SomaticAnalysisTK MuTect, but the number of reads it removes differs even though both runs use the same BAM file (0 reads in DepthOfCoverage versus 1472536 in MuTect, per the logs below). Is there a way to make the DuplicateReadFilter in GATK DepthOfCoverage behave as it does in MuTect, so that the coverage reported for a particular base matches?

See below for details:

Run in GATK Depth of Coverage:

/apps/java/jre1.7.0_67/bin/java -Xmx4g -jar /apps/cga/2014.3/GenomeAnalysisTK.jar -T DepthOfCoverage -R /apps/assay/referenceData/bwamem_reference/hg19.fa -dt None --minBaseQuality 13 --read_filter DuplicateRead --read_filter MappingQualityZero -L /hpc/dev/assay/encap/NexCourse-4-16-15/share/gpc/GPC-intervals-v1.0.0.intervals -I /hpc/dev/assay/NextSeqData/NextSeq545/150331_NS500913_0002_AH5LYCBGXX/Data/Aligned/1.0.0/Project_Pancancer/Sample_LU1Day1a/BQSR/LU1Day1a.cleaned.bam -o /hpc/dev/assay/NextSeqData/NextSeq545/150331_NS500913_0002_AH5LYCBGXX/Data/Aligned/1.0.0/Project_Pancancer/Sample_LU1Day1a/GATK-Coverage/LU1Day1a.coverage
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/scratch
INFO 10:07:33,766 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 10:07:33,768 HelpFormatter - The Genome Analysis Toolkit (GATK) v2014.3-3.2.2-7-gf9cba99, Compiled 2014/08/06 10:49:54
INFO 10:07:33,768 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:07:33,769 HelpFormatter - For support and documentation go to http://gatkdocs.appistry.com/
INFO 10:07:33,772 HelpFormatter - Program Args: -T DepthOfCoverage -R /apps/assay/referenceData/bwamem_reference/hg19.fa -dt None --minBaseQuality 13 --read_filter DuplicateRead --read_filter MappingQualityZero -L /hpc/dev/assay/encap/NexCourse-4-16-15/share/gpc/GPC-intervals-v1.0.0.intervals -I /hpc/dev/assay/NextSeqData/NextSeq545/150331_NS500913_0002_AH5LYCBGXX/Data/Aligned/1.0.0/Project_Pancancer/Sample_LU1Day1a/BQSR/LU1Day1a.cleaned.bam -o /hpc/dev/assay/NextSeqData/NextSeq545/150331_NS500913_0002_AH5LYCBGXX/Data/Aligned/1.0.0/Project_Pancancer/Sample_LU1Day1a/GATK-Coverage/LU1Day1a.coverage
INFO 10:07:33,776 HelpFormatter - Executing as on Linux 2.6.32-358.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_67-b01.
INFO 10:07:33,776 HelpFormatter - Date/Time: 2015/06/23 10:07:33
INFO 10:07:33,776 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 10:07:33,776 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 10:07:34,602 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:07:34,765 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 10:07:34,783 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:07:34,815 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
INFO 10:07:34,984 IntervalUtils - Processing 559718 bp from intervals
INFO 10:07:35,122 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:07:35,340 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:07:35,341 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:07:35,341 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:07:35,342 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:08:05,349 ProgressMeter - chr3:70008418 263384.0 30.0 s 113.0 s 13.7% 3.6 m 3.1 m
INFO 10:08:35,351 ProgressMeter - chr5:112173844 492392.0 60.0 s 2.0 m 26.9% 3.7 m 2.7 m
INFO 10:09:05,352 ProgressMeter - chr8:38306574 821023.0 90.0 s 109.0 s 42.2% 3.6 m 2.1 m
INFO 10:09:35,353 ProgressMeter - chr11:22647126 1156634.0 120.0 s 103.0 s 57.7% 3.5 m 88.0 s
INFO 10:10:05,354 ProgressMeter - chr13:32913896 1483357.0 2.5 m 101.0 s 71.8% 3.5 m 58.0 s
INFO 10:10:35,355 ProgressMeter - chr18:42533165 1754690.0 3.0 m 102.0 s 86.0% 3.5 m 29.0 s
INFO 10:11:01,079 DepthOfCoverage - Printing summary info
INFO 10:11:01,085 DepthOfCoverage - Printing locus summary
INFO 10:11:01,169 ProgressMeter - done 2038682.0 3.4 m 100.0 s 100.0% 3.4 m 0.0 s
INFO 10:11:01,169 ProgressMeter - Total runtime 205.83 secs, 3.43 min, 0.06 hours
INFO 10:11:01,176 MicroScheduler - 21017 reads were filtered out during the traversal out of approximately 9108983 total reads (0.23%)
INFO 10:11:01,177 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 10:11:01,178 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:11:01,178 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:11:01,179 MicroScheduler - -> 21017 reads (0.23% of total) failing MappingQualityZeroFilter
INFO 10:11:01,179 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:11:01,180 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

Run in MuTect:

/apps/java/jre1.7.0_67/bin/java -Xmx6g -jar /apps/cga/2014.3/SomaticAnalysisTK.jar --analysis_type MuTect -R /apps/assay/referenceData/bwamem_reference/hg19.fa --dbsnp /apps/referenceData/dbsnp138.vcf --cosmic /apps/referenceData/cosmic70.vcf -dt None -L /hpc/dev/assay/encap/NexCourse-4-16-15/share/gpc/GPC-intervals-v1.0.0.intervals -I:tumor /hpc/dev/assay/NextSeqData/NextSeq545/150331_NS500913_0002_AH5LYCBGXX/Data/Aligned/1.0.0/Project_Pancancer/Sample_LU1Day1a/BQSR/LU1Day1a.cleaned.bam -o LU1Day1a_bwa_mutect_stats.txt -vcf LU1Day1a_bwa_mutect.vcf
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/scratch
INFO 10:26:48,445 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 10:26:48,446 HelpFormatter - The Genome Analysis Toolkit (GATK) v2014.3-3.2.2-7-gf9cba99, Compiled 2014/08/06 10:49:54
INFO 10:26:48,447 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:26:48,447 HelpFormatter - For support and documentation go to http://gatkdocs.appistry.com/
INFO 10:26:48,450 HelpFormatter - Program Args: --analysis_type MuTect -R /apps/assay/referenceData/bwamem_reference/hg19.fa --dbsnp /apps/referenceData/dbsnp138.vcf --cosmic /apps/referenceData/cosmic70.vcf -dt None -L /hpc/dev/assay/encap/NexCourse-4-16-15/share/gpc/GPC-intervals-v1.0.0.intervals -I:tumor /hpc/dev/assay/NextSeqData/NextSeq545/150331_NS500913_0002_AH5LYCBGXX/Data/Aligned/1.0.0/Project_Pancancer/Sample_LU1Day1a/BQSR/LU1Day1a.cleaned.bam -o LU1Day1a_bwa_mutect_stats.txt -vcf LU1Day1a_bwa_mutect.vcf
INFO 10:26:48,455 HelpFormatter - Executing as on Linux 2.6.32-358.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_67-b01.
INFO 10:26:48,456 HelpFormatter - Date/Time: 2015/06/23 10:26:48
INFO 10:26:48,456 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 10:26:48,456 HelpFormatter - Executing version 1.1.7-2-g519b88f of the MuTect tool
INFO 10:26:48,456 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 10:26:48,847 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:26:48,884 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 10:26:48,891 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:26:48,910 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
WARN 10:26:54,275 FSLockWithShared$LockAcquisitionTask - WARNING: Unable to lock file /apps/referenceData/cosmic70.vcf.idx because we could not open a file channel
WARN 10:26:54,276 RMDTrackBuilder - Unable to write to /apps/referenceData/cosmic70.vcf.idx for the index file, creating index in memory only
INFO 10:26:54,400 IntervalUtils - Processing 559718 bp from intervals
INFO 10:26:54,535 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:26:54,761 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:26:54,762 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:26:54,762 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:26:54,763 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:26:54,764 MuTect - VERSION INFO: MuTect:1.1.7-2-g519b88f Gatk:2014.3-3.2.2-7-gf9cba99
INFO 10:26:57,865 MuTect - [MUTECT] Processed 1001190 reads in 2940 ms
INFO 10:34:21,021 MuTect - [MUTECT] Processed 774585269 reads in 597 ms
INFO 10:34:21,601 MuTect - [MUTECT] Processed 775585863 reads in 580 ms
INFO 10:34:21,914 Walker - [REDUCE RESULT] Traversal result is: 0
INFO 10:34:22,094 ProgressMeter - done 2038682.0 7.5 m 3.7 m 100.0% 7.5 m 0.0 s
INFO 10:34:22,094 ProgressMeter - Total runtime 447.33 secs, 7.46 min, 0.12 hours
INFO 10:34:22,096 MicroScheduler - 1487876 reads were filtered out during the traversal out of approximately 9108983 total reads (16.33%)
INFO 10:34:22,097 MicroScheduler - -> 1472536 reads (16.17% of total) failing DuplicateReadFilter
INFO 10:34:22,097 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:34:22,097 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:34:22,097 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:34:22,097 MicroScheduler - -> 15340 reads (0.17% of total) failing UnmappedReadFilter
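
For anyone comparing the two runs, the filter summaries at the end of each log can be diffed mechanically. The following is a minimal Python sketch, not part of GATK or MuTect: the function names are made up for illustration, and the regular expression assumes the MicroScheduler line format shown in the logs above. The sample strings are a subset of the summary lines copied from those logs.

```python
import re

# Assumed line format, taken from the logs in this post:
#   INFO 10:11:01,177 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
FILTER_LINE = re.compile(r"->\s+(\d+) reads \([\d.]+% of total\) failing (\w+)")


def filter_counts(log_text):
    """Return {filter name: reads removed} parsed from a GATK-style log."""
    return {m.group(2): int(m.group(1)) for m in FILTER_LINE.finditer(log_text)}


def compare_filters(log_a, log_b):
    """Return {filter: (count_a, count_b)} for filters whose counts differ.

    A filter that does not appear in one log is reported as None for that log.
    """
    a, b = filter_counts(log_a), filter_counts(log_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Summary lines copied from the DepthOfCoverage run above.
DOC_LOG = """\
INFO 10:11:01,177 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 10:11:01,178 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:11:01,179 MicroScheduler - -> 21017 reads (0.23% of total) failing MappingQualityZeroFilter
INFO 10:11:01,180 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
"""

# Summary lines copied from the MuTect run above.
MUTECT_LOG = """\
INFO 10:34:22,097 MicroScheduler - -> 1472536 reads (16.17% of total) failing DuplicateReadFilter
INFO 10:34:22,097 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:34:22,097 MicroScheduler - -> 15340 reads (0.17% of total) failing UnmappedReadFilter
"""

for name, (doc, mutect) in compare_filters(DOC_LOG, MUTECT_LOG).items():
    print(f"{name}: DepthOfCoverage={doc} MuTect={mutect}")
```

Only filters whose counts differ are reported, so identical lines such as MalformedReadFilter drop out and the DuplicateReadFilter discrepancy stands out.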

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @lkeeler Are you saying that these were run on the exact same BAM, and in the GATK run, the duplicates were not recognized? That's very odd. I see that you're using an Appistry version -- can you download our latest version and try with that? This will help determine whether it's something to do with the data or with the software.
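
One way to look at the data side of that question is to count how many reads in the BAM actually carry the SAM duplicate flag (0x400), which, as far as I know, is the flag a duplicate read filter acts on. This is a minimal Python sketch, assuming the BAM has been streamed to SAM text (for example with `samtools view`); the function name and the sample records are hypothetical and not taken from the poster's data.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def count_duplicates(sam_lines):
    """Return (total_reads, duplicate_reads) for an iterable of SAM text lines."""
    total = dupes = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        total += 1
        if flag & DUPLICATE_FLAG:
            dupes += 1
    return total, dupes


# Made-up records for illustration only.
SAMPLE_SAM = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII",
    "read2\t1123\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII",  # 99 + 1024
    "read3\t1024\tchr1\t150\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]

total, dupes = count_duplicates(SAMPLE_SAM)
print(f"{dupes} of {total} reads carry the duplicate flag")
```

On real data the same function could be fed the output of `samtools view` on the BAM, or the count could be obtained directly with `samtools view -c -f 1024`. A count of zero would point toward the file, and a non-zero count toward the tool.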

  • Genefinder (Cambridge, MA; Member)

    Hi @lkeeler, these are indeed Appistry versions of the CGA Suite and GATK that you are using. At Appistry, we do not change any of the underlying filter or tool code or behavior, so I am surprised the two jar files are acting differently.

    I'm not sure @Geraldine will be able to help you here, as the Broad doesn't routinely build MuTect on top of new GATK releases, though they may have. Trying the latest versions is always sound advice though, and Appistry has just released new versions of both GATK and our CGA Suite (versions 2015.1, based on Broad's GATK v.3.4). You may want to try using these latest versions, and see if the issue persists.

    Remember, you can always contact Appistry support at [email protected] for assistance with Appistry licensed software.
