Errors when running Picard ValidateSamFile on a BAM file produced by SplitNCigarReads

hongen (Germany, Member)

Hi,

When I ran Picard ValidateSamFile on the BAM file produced by GATK (version 3.5) SplitNCigarReads, I got the following errors:

HISTOGRAM java.lang.String

Error Type Count
ERROR:INVALID_CIGAR 1638
ERROR:MATES_ARE_SAME_END 3769966
ERROR:MATE_NOT_FOUND 2126887
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 7539932
ERROR:MISMATCH_MATE_ALIGNMENT_START 9738077
WARNING:MISSING_TAG_NM 50341381

My command for ValidateSamFile is:

java -jar ~/softwares/picard-tools-1.141/picard.jar ValidateSamFile I=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.RG.dedupped.split.bam MODE=SUMMARY

And my command for SplitNCigarReads is:

java -Xmx32g -jar /mnt/home/hongenxu/softwares/gatk-3.5/GenomeAnalysisTK.jar -T SplitNCigarReads -R /mnt/home/hongenxu/softwares/bwa/galgal5.fa -I /mnt/scratch/hongenxu/RNAseq/STAR/017748_merged.dedupped.bam -o /mnt/scratch/hongenxu/RNAseq/STAR/017748_merged.dedupped.splited.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

I also tried the current GATK version (3.6), and ValidateSamFile gave the same errors.

Would you please help me?

Best regards,
Hongen
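
When a summary mixes errors and warnings like the one above, it can help to total them by severity first. The following is only a minimal sketch and not part of Picard: the function name `count_by_severity` is made up for illustration, and it assumes every histogram row has the shape `TYPE COUNT`, as in the MODE=SUMMARY output posted here.

```shell
#!/bin/sh
# Hypothetical helper (not a Picard tool): total the ERROR and WARNING counts
# in a ValidateSamFile MODE=SUMMARY histogram read from standard input.
# Assumes rows look like "ERROR:INVALID_CIGAR 1638"; other lines are ignored.
count_by_severity() {
    awk '
        $1 ~ /^ERROR:/   { errors   += $NF }   # last field is the count
        $1 ~ /^WARNING:/ { warnings += $NF }
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    '
}

# Example with the histogram from this post.
count_by_severity <<'EOF'
ERROR:INVALID_CIGAR 1638
ERROR:MATES_ARE_SAME_END 3769966
ERROR:MATE_NOT_FOUND 2126887
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 7539932
ERROR:MISMATCH_MATE_ALIGNMENT_START 9738077
WARNING:MISSING_TAG_NM 50341381
EOF
# prints: errors=23176500 warnings=50341381
```

The header lines (`HISTOGRAM java.lang.String`, `Error Type Count`) and the INFO progress lines do not start with `ERROR:` or `WARNING:`, so the full ValidateSamFile output can be piped in unchanged.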

Issue · Github (by Sheila): Issue Number 1057 · State: closed · Closed By: chandrans

Answers

  • Sheila (Broad Institute; Member, Broadie admin)

    @hongen
    Hi Hongen,

    Can you tell us how you aligned your reads and what pre-processing steps you did before SplitNCigarReads? Also, please confirm the errors from ValidateSamFile did not exist before SplitNCigarReads.

    Thanks,
    Sheila

  • hongen (Germany, Member)

    Hi Sheila,

    Thanks for your help.

    Before SplitNCigarReads, ValidateSamFile gave me only warnings; see below.

    java -jar ~/softwares/picard-tools-1.141/picard.jar ValidateSamFile I=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.RG.dedupped.bam MODE=SUMMARY

    [Thu Jul 07 12:06:55 EDT 2016] picard.sam.ValidateSamFile INPUT=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.RG.dedupped.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Thu Jul 07 12:06:55 EDT 2016] Executing as [email protected] on Linux 2.6.32-504.30.3.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_35-b35; Picard version: 1.141(8ece590411350163e7689e9e77aab8efcb622170_1447695087) IntelDeflater
    INFO 2016-07-07 12:07:41 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:45s. Time for last 10,000,000: 45s. Last read position: 3:14,633,167
    INFO 2016-07-07 12:08:28 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:32s. Time for last 10,000,000: 47s. Last read position: 7:23,654,243
    INFO 2016-07-07 12:09:16 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:02:20s. Time for last 10,000,000: 47s. Last read position: 19:150,688
    INFO 2016-07-07 12:10:03 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:03:07s. Time for last 10,000,000: 47s. Last read position: NT_456325.1:6,267

    HISTOGRAM java.lang.String

    Error Type Count
    WARNING:MISSING_TAG_NM 43002472

    I aligned my RNA-seq reads using STAR and used Picard to add read groups and mark duplicates. See below for details.

    /mnt/home/hongenxu/softwares/STAR-2.5.1b/bin/Linux_x86_64_static/STAR --twopassMode Basic --genomeDir /mnt/scratch/hongenxu/RNAseq/genomeDir --readFilesIn /mnt/scratch/hongenxu/RNAseq/trimm/017748_L005_R1_paired.fastq.gz /mnt/scratch/hongenxu/RNAseq/trimm/017748_L005_R2_paired.fastq.gz --readFilesCommand zcat --outFileNamePrefix /mnt/scratch/hongenxu/RNAseq/STAR/017748_L005_paired_ --runThreadN 8 --outFilterMultimapNmax 10 --sjdbOverhang 124

    java -Xmx32g -jar /mnt/home/hongenxu/softwares/picard-tools-1.141/picard.jar SamFormatConverter I=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005_paired_Aligned.out.sam O=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.bam

    java -Xmx32g -jar /mnt/home/hongenxu/softwares/picard-tools-1.141/picard.jar AddOrReplaceReadGroups I=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.bam O=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.RG.bam SO=coordinate RGID=017748_L005 RGLB=Normal RGPL=ILLUMINA RGSM=017748 RGPU=CGTAGA_L005

    java -Xmx32g -jar /mnt/home/hongenxu/softwares/picard-tools-1.141/picard.jar MarkDuplicates I=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.RG.bam O=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.merged.RG.dedupped.bam CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=/mnt/scratch/hongenxu/RNAseq/STAR/017748_L005.metrics.txt

    Hongen

  • Sheila (Broad Institute; Member, Broadie admin)

    @hongen
    Hi Hongen,

    Sorry. It seems this is a known issue, and there is a fix coming soon.

    I will let you know when it is in.

    -Sheila

  • hongen (Germany, Member)

    Hi Sheila,

    Any news?

  • Sheila (Broad Institute; Member, Broadie admin)

    @hongen
    Hi Hongen,

    Sorry for the confusion. It seems this issue is going to be fixed in GATK4, but we're not sure whether the fix will be backported to GATK3. It depends on how much work is involved.

    -Sheila

  • Sheila (Broad Institute; Member, Broadie admin)

    @hongen
    Hi Hongen,

    The fix will only be in GATK4, not in GATK3.

    -Sheila
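
For anyone repeating the check requested earlier in the thread (confirming that the errors did not exist before SplitNCigarReads), the two summaries can be compared directly. This is a rough sketch under assumptions: `new_error_types` is a hypothetical helper, and the two arguments are text files holding the MODE=SUMMARY output from before and after the split step, saved for example by shell redirection or with ValidateSamFile's `O=` option.

```shell
#!/bin/sh
# Hypothetical helper (not a Picard or GATK tool).
# Usage: new_error_types BEFORE_SUMMARY AFTER_SUMMARY
# Prints each ERROR type, with its count, that appears in AFTER_SUMMARY
# but not in BEFORE_SUMMARY. WARNING rows and log lines are ignored.
new_error_types() {
    awk '
        $1 !~ /^ERROR:/     { next }                # keep only ERROR rows
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # remember "before" types
        !($1 in seen)       { print $1, $NF }       # report types new in "after"
    ' "$1" "$2"
}
```

With the two histograms posted in this thread, every ERROR type in the post-split summary would be reported, since the pre-split summary contains only `WARNING:MISSING_TAG_NM`.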
