MarkDuplicates: “did not start with a parsable number”

I was running RNA-seq data through Picard's MarkDuplicates as part of SNP calling and got this message:

WARNING 2016-08-31 16:48:12 AbstractOpticalDuplicateFinderCommandLineProgram A field field parsed out of a read name was expected to contain an integer and did not. Read name: SN1001:449:HGTN3ADXX:1:2115:4606:89966:R2#1#CTCCT. Cause: String 'R2#1#CTCCT' did not start with a parsable number.
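
For context, here is a minimal sketch of what appears to trigger the warning. It assumes Picard's default behavior when no READ_NAME_REGEX is given: split the read name on ':' and read the last three fields as tile, x and y. For this read name those fields are `4606`, `89966` and `R2#1#CTCCT`, and the last one does not start with a digit. The Python below only approximates that logic. The helper names and the custom regex are hypothetical, and the regex assumes every name has the shape instrument:run:flowcell:lane:tile:x:y:suffix.

```python
import re


def leading_int(field):
    """Parse the leading digits of a field, roughly like htsjdk's rapidParseInt."""
    match = re.match(r"-?\d+", field)
    if match is None:
        raise ValueError(f"String '{field}' did not start with a parsable number.")
    return int(match.group(0))


def default_location(read_name):
    """Approximate default: the last three ':'-separated fields are tile, x, y."""
    fields = read_name.split(":")
    if len(fields) < 3:
        raise ValueError(f"too few ':'-separated fields in {read_name!r}")
    tile, x, y = fields[-3:]
    return leading_int(tile), leading_int(x), leading_int(y)


# Hypothetical custom READ_NAME_REGEX for names shaped like
# instrument:run:flowcell:lane:tile:x:y:suffix. It has three capture groups
# (tile, x, y) and must match the whole read name.
CUSTOM_READ_NAME_REGEX = re.compile(
    r"[A-Za-z0-9]+:[0-9]+:[A-Za-z0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*"
)


def regex_location(read_name):
    """Extract (tile, x, y) with the custom regex, requiring a full match."""
    match = CUSTOM_READ_NAME_REGEX.fullmatch(read_name)
    if match is None:
        raise ValueError(f"regex did not match {read_name!r}")
    return tuple(int(group) for group in match.groups())


name = "SN1001:449:HGTN3ADXX:1:2115:4606:89966:R2#1#CTCCT"
try:
    default_location(name)
except ValueError as err:
    print("default parsing fails:", err)
print("custom regex gives (tile, x, y):", regex_location(name))
```

If the regex fits the data, it could be passed to MarkDuplicates as `READ_NAME_REGEX='[A-Za-z0-9]+:[0-9]+:[A-Za-z0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*'`. The Picard documentation also describes `READ_NAME_REGEX=null` for skipping optical-duplicate detection altogether. According to that documentation, the tile and x/y values are used to estimate optical duplication and library size. The warning should therefore not change which reads are marked as duplicates.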

Then I ran this command:

samtools view resources/HepG2.RBFOX2/HepG2.RBFOX2.rep2.R2.tag.uniq.sorted.bam | grep R2#1#CTCCT

And I got this:

SN1001:449:HGTN3ADXX:2:2110:11231:12370:R2#1#CTCCT 16 chrX 133035777 37 31M * 0 0 TTTTTGAGTTAAAGTTATACACCTGAAGAGG BFFBFB
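
Each line printed by `samtools view` is a tab-separated SAM record. Its first 11 columns are QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL. Read that way, the hit above is a different read (lane 2, tile 2110) that only shares the `R2#1#CTCCT` suffix with the read in the warning. Its FLAG of 16 marks it as aligned to the reverse strand. Below is a minimal stdlib-only Python sketch for decoding such lines. The function name and the subset of FLAG bits it reports are illustrative choices, not part of any tool.

```python
SAM_COLUMNS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# A few FLAG bits from the SAM specification (not the full list).
FLAG_BITS = {
    0x1: "paired",
    0x4: "unmapped",
    0x10: "reverse strand",
    0x100: "secondary",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line):
    """Split one SAM record into named mandatory columns plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    record["tags"] = fields[11:]  # e.g. NM:i:2, if present
    flag = int(record["FLAG"])
    record["flag_names"] = [name for bit, name in FLAG_BITS.items() if flag & bit]
    return record
```

Lines from `samtools view` (header lines start with '@' and would need to be skipped) can be passed one at a time to `parse_sam_line` to inspect names, flags and tags.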

So I ran ValidateSamFile, which reported errors like these:

ERROR: Record 2243391, Read name SN1001:449:HGTN3ADXX:1:1207:18758:88057:R2#2#TATAT, NM tag (nucleotide differences) in file [2] does not match reality [6]
ERROR: Record 2500598, Read name SN1001:449:HGTN3ADXX:2:1206:6729:26515:R2#1#TAATC, NM tag (nucleotide differences) in file [3] does not match reality [7]
ERROR: Record 2500599, Read name SN1001:449:HGTN3ADXX:2:2110:7481:90889:R2#1#TTCAG, NM tag (nucleotide differences) in file [3] does not match reality [8]
ERROR: Record 2500601, Read name SN1001:449:HGTN3ADXX:1:1210:6384:54132:R2#1#TAGAT, NM tag (nucleotide differences) in file [3] does not match reality [9]
ERROR: Record 2500602, Read name SN1001:449:HGTN3ADXX:2:2111:15383:87620:R2#1#GGGGG, NM tag (nucleotide differences) in file [3] does not match reality [9]
ERROR: Record 2500603, Read name SN1001:449:HGTN3ADXX:2:2212:9564:100522:R2#1#GGCGC, NM tag (nucleotide differences) in file [3] does not match reality [10]
ERROR: Record 2500604, Read name SN1001:449:HGTN3ADXX:1:2110:14050:67425:R2#1#GGCCC, NM tag (nucleotide differences) in file [2] does not match reality [9]
ERROR: Record 2500605, Read name SN1001:449:HGTN3ADXX:1:2205:20926:12488:R2#1#TCCCC, NM tag (nucleotide differences) in file [1] does not match reality [3]
ERROR: Record 2500606, Read name SN1001:449:HGTN3ADXX:2:2109:8561:77192:R2#1#CCCCC, NM tag (nucleotide differences) in file [1] does not match reality [3]
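
NM is the edit distance between the aligned read and the reference. It counts mismatched bases plus inserted and deleted bases. Clipped bases and spliced gaps (CIGAR `N`, common in RNA-seq) are not counted. ValidateSamFile recomputes this value from the reference and reports records where the stored value differs. The sketch below approximates that calculation for one record. It assumes the read and reference are plain strings and that `ref_start` is the 0-based reference offset of the first aligned base. All names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def compute_nm(read_seq, ref_seq, ref_start, cigar):
    """Approximate the SAM NM tag (edit distance to the reference) for one read."""
    nm = 0
    read_pos, ref_pos = 0, ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            # Aligned bases: count each position where read and reference differ.
            for i in range(length):
                if read_seq[read_pos + i].upper() != ref_seq[ref_pos + i].upper():
                    nm += 1
            read_pos += length
            ref_pos += length
        elif op == "I":  # bases present in the read only
            nm += length
            read_pos += length
        elif op == "D":  # bases present in the reference only
            nm += length
            ref_pos += length
        elif op == "N":  # spliced gap: skips reference, not an edit
            ref_pos += length
        elif op == "S":  # soft clip: skips read bases, not an edit
            read_pos += length
        # H and P consume neither the read sequence nor the reference.
    return nm
```

If the stored tags are simply stale, Picard's SetNmMdAndUqTags (SetNmAndUqTags in older releases) may be used to recompute them. Alternatively, ValidateSamFile can be told to skip this check with `IGNORE=INVALID_TAG_NM`.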

Does this warning message matter?

Answers

  • zillurbmb51 (USA, Member), edited May 2017

    Hi there,
    I have the same problem. I tried ValidateSamFile and got the following:

    $ java -jar /gondor/zillur/tools/picard.jar ValidateSamFile I=SRR1858992_gca.sorted.bam MODE=SUMMARY
    [Tue May 16 12:55:00 EDT 2017] picard.sam.ValidateSamFile INPUT=SRR1858992_gca.sorted.bam MODE=SUMMARY    MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Tue May 16 12:55:00 EDT 2017] Executing as [email protected] on Linux 3.10.0-514.16.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_131-b11; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT
    [Tue May 16 12:55:14 EDT 2017] picard.sam.ValidateSamFile done. Elapsed time: 0.25 minutes.
    Runtime.totalMemory()=5984747520
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" htsjdk.samtools.SAMException: /tmp/zillur/CSPI.1880274338990258510.tmp/5847.tmpnot found
        at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:63)
        at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
        at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
        at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
        at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
        at htsjdk.samtools.SamFileValidator$CoordinateSortedPairEndInfoMap.put(SamFileValidator.java:761)
        at htsjdk.samtools.SamFileValidator.validateMateFields(SamFileValidator.java:501)
        at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:297)
        at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:215)
        at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:143)
        at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:196)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
    Caused by: java.io.FileNotFoundException: /tmp/zillur/CSPI.1880274338990258510.tmp/5847.tmp (Too many open files)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:60)
        ... 13 more
    

    What does this mean? The problem still persists. Thanks in advance for any suggestions.

    Best regards,
    Zillur
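
The root cause is in the last `Caused by` line: `Too many open files`. For coordinate-sorted paired data, ValidateSamFile spills mate information to temporary files and keeps up to MAX_OPEN_TEMP_FILES of them open. That value is 8000 in the command line above, which can exceed the per-process limit reported by `ulimit -n`. The Picard documentation for this option advises setting it a little lower than that limit. Below is a small Python sketch (Unix only, since it uses the `resource` module) that reads the current limit and suggests a value. The 100-file headroom is an arbitrary assumption, not a Picard requirement.

```python
import resource


def suggest_max_open_temp_files(soft_limit, headroom=100, requested=8000):
    """Pick a MAX_OPEN_TEMP_FILES value below the process's open-file limit.

    headroom is an arbitrary allowance for the JVM's own jars, sockets and
    the input BAM.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return requested
    return max(1, min(requested, soft_limit - headroom))


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
print(f"suggested MAX_OPEN_TEMP_FILES={suggest_max_open_temp_files(soft)}")
```

There are two ways to use the result. The suggested number could be passed as `MAX_OPEN_TEMP_FILES=<value>` when rerunning ValidateSamFile. Alternatively, where the system permits, raise the shell's limit with `ulimit -n` before starting java.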
