
MarkDuplicates: "Exception in thread "main" htsjdk.samtools.SAMException"

benjaminpelissie (Madison, WI) · Member
edited May 2016 in Ask the GATK team

Hi,

When trying to mark duplicates with MarkDuplicates, I consistently get an error (the entire output is copied below my message), while MarkDuplicatesWithMateCigar seems to work just fine. I have already tried changing TMP_DIR a few times (the same directory as the input files, their parent directory, another partition), with no more success. What am I missing?
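For reference, here is roughly how I launched the run (a sketch only: the java -jar form and the picard.jar path are placeholders; the full parameter list as Picard recorded it is in the log below):

# Sketch of the MarkDuplicates invocation; the arguments are taken from the log below,
# the jar path is a placeholder.
java -jar picard.jar MarkDuplicates \
    INPUT=CPBWGS_11_piped.bam \
    OUTPUT=CPBWGS_11_markduplicates.bam \
    METRICS_FILE=CPBWGS_11_markduplicates_metrics.txt \
    OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 \
    TMP_DIR=/data2/CPBWGS/all_fastq_files/markdup_tmp \
    MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 \
    CREATE_INDEX=true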

Ben

[Mon May 23 14:57:20 CDT 2016] picard.sam.markduplicates.MarkDuplicates INPUT=[CPBWGS_11_piped.bam] OUTPUT=CPBWGS_11_markduplicates.bam METRICS_FILE=CPBWGS_11_markduplicates_metrics.txt OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 TMP_DIR=[/data2/CPBWGS/all_fastq_files/markdup_tmp] CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX= VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon May 23 14:57:20 CDT 2016] Executing as pelissie@denali on Linux 3.13.0-85-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13; Picard version: 2.2.4(f7bcc560e4e936f8c5782421b8196cda46fab833_1462390762) JdkDeflater
INFO 2016-05-23 14:57:20 MarkDuplicates Start of doWork freeMemory: 2013246576; totalMemory: 2025848832; maxMemory: 30542397440
INFO 2016-05-23 14:57:20 MarkDuplicates Reading input file and constructing read end information.
INFO 2016-05-23 14:57:20 MarkDuplicates Will retain up to 117470759 data points before spilling to disk.
[Mon May 23 14:57:26 CDT 2016] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=4487905280
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: /data2/CPBWGS/all_fastq_files/markdup_tmp/CSPI.8763470411475836719.tmp/12511.tmp not found
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:63)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:447)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:193)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.io.FileNotFoundException: /data2/CPBWGS/all_fastq_files/markdup_tmp/CSPI.8763470411475836719.tmp/12511.tmp (Too many open files)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:60)
... 10 more
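
For what it's worth, the underlying cause at the bottom of the trace is a FileNotFoundException with "Too many open files", which suggests the per-process limit on open file descriptors was hit rather than a problem with the TMP_DIR location itself. That limit can be checked on the host (assuming a bash shell) and compared against the MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 value in the command line above:

# Print the per-process limit on open file descriptors for the current shell;
# Picard's MAX_FILE_HANDLES_FOR_READ_ENDS_MAP is meant to stay a little below this value.
ulimit -n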
