Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

MarkDuplicates: "Exception in thread "main" htsjdk.samtools.SAMException"

benjaminpelissiebenjaminpelissie Madison, WIMember
edited May 2016 in Ask the GATK team

Hi,

When trying to mark duplicates with MarkDuplicates, I constantly get an error (I copied the entire output below my message) while using MarkDuplicatesWithMateCigar seem to work just fine. I tried to modify TMP_DIR a few times already (I tried the same repository as input files, their parental repository, another partition), with no more success. What am I missing?

Ben

[Mon May 23 14:57:20 CDT 2016] picard.sam.markduplicates.MarkDuplicates INPUT=[CPBWGS_11_piped.bam] OUTPUT=CPBWGS_11_markduplicates.bam METRICS_FILE=CPBWGS_11_markduplicates_metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 TMP_DIR=[/data2/CPBWGS/all_fastq_files/markdup_tmp] CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX= VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon May 23 14:57:20 CDT 2016] Executing as [email protected] on Linux 3.13.0-85-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13; Picard version: 2.2.4(f7bcc560e4e936f8c5782421b8196cda46fab833_1462390762) JdkDeflater
INFO 2016-05-23 14:57:20 MarkDuplicates Start of doWork freeMemory: 2013246576; totalMemory: 2025848832; maxMemory: 30542397440
INFO 2016-05-23 14:57:20 MarkDuplicates Reading input file and constructing read end information.
INFO 2016-05-23 14:57:20 MarkDuplicates Will retain up to 117470759 data points before spilling to disk.
[Mon May 23 14:57:26 CDT 2016] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=4487905280
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /data2/CPBWGS/all_fastq_files/markdup_tmp/CSPI.8763470411475836719.tmp/12511.tmpnot found
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:63)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:447)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:193)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.io.FileNotFoundException: /data2/CPBWGS/all_fastq_files/markdup_tmp/CSPI.8763470411475836719.tmp/12511.tmp (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:60)
... 10 more

Best Answers

Answers

Sign In or Register to comment.