
After IndelRealigner and BQSR, the BAM fails Picard ValidateSamFile

shenglai (Chicago, Member)
edited October 2015 in Ask the GATK team

Hi GATK,

Recently, I ran the preprocessing pipeline on a paired tumor/normal sample. I ran RealignerTargetCreator on all samples together and used the resulting interval list to run IndelRealigner on them jointly. I then ran BQSR (BaseRecalibrator followed by PrintReads) on each sample individually.
The pipeline is shown below:

cmd = ['java', '-d64', '-Xmx16G', '-jar', gatk_path, '-nt', str(thread_count), '-T', 'RealignerTargetCreator', '-R', reference_fasta_name, '-I', bam_list_path, '-known', known_1k_genome_indel_sites, '-o', intervals_path]

cmd = ['java', '-d64', '-Xmx16G', '-jar', gatk_path, '-T', 'IndelRealigner', '-R', reference_fasta_name, '-I', bam_list_path, '-known', known_1k_genome_indel_sites, '-targetIntervals', intervals_path, '-disable_auto_index_creation_and_locking_when_reading_rods', '-nWayOut', output_map]

for IR_bam in IR_bam_path:
    # BQSR_table_path and BQSR_bam_path must be set per sample here, otherwise each iteration overwrites the previous sample's output
    cmd = ['java', '-d64', '-Xmx16G', '-jar', gatk_path, '-nct', str(thread_count), '-T', 'BaseRecalibrator', '-R', reference_fasta_name, '-I', IR_bam, '-knownSites', dbsnp_known_snp_sites, '-o', BQSR_table_path]
    cmd = ['java', '-d64', '-Xmx16G', '-jar', gatk_path, '-nct', str(thread_count), '-T', 'PrintReads', '-R', reference_fasta_name, '-I', IR_bam, '-BQSR', BQSR_table_path, '-o', BQSR_bam_path]
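A minimal sketch of the per-sample BQSR step as a helper that keeps every flag and its value as separate list items (so the lists can go straight to `subprocess.run` without `shell=True`) and gives each sample its own recalibration table and output BAM. The function name and the `<sample>.recal.table` / `<sample>.recal.bam` naming scheme are hypothetical; the GATK flags are the ones used in the commands above.

```python
import os


def bqsr_commands(gatk_path, reference, known_sites, realigned_bam, out_dir, threads=4):
    """Build the BaseRecalibrator and PrintReads commands for one sample.

    Flags and values are separate list items, so each list can be passed
    to subprocess.run() directly. The output naming scheme
    (<sample>.recal.table / <sample>.recal.bam) is hypothetical.
    """
    stem = os.path.basename(realigned_bam)
    if stem.endswith(".bam"):
        stem = stem[:-4]
    # One table and one BAM per sample, so samples never overwrite each other.
    table = os.path.join(out_dir, stem + ".recal.table")
    recal_bam = os.path.join(out_dir, stem + ".recal.bam")
    java = ["java", "-d64", "-Xmx16G", "-jar", gatk_path]
    base_recalibrator = java + [
        "-nct", str(threads), "-T", "BaseRecalibrator",
        "-R", reference, "-I", realigned_bam,
        "-knownSites", known_sites, "-o", table,
    ]
    print_reads = java + [
        "-nct", str(threads), "-T", "PrintReads",
        "-R", reference, "-I", realigned_bam,
        "-BQSR", table, "-o", recal_bam,
    ]
    return base_recalibrator, print_reads
```

Inside the loop over `IR_bam_path`, each returned command could then be run in order with `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` as soon as a step exits non-zero instead of letting the next step run on a missing table.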

The preprocessing pipeline ran without errors. To make sure the final BAMs were valid, I ran Picard ValidateSamFile:

cmd = ['java', '-d64', '-Xmx40G', '-jar', picard_path, 'ValidateSamFile', 'INPUT=' + BQSR_bam_path, 'OUTPUT=' + validate_file]
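ValidateSamFile reports problems in two places: record-level findings go to the file given as `OUTPUT=`, while a fatal exception (like the `PairInfoMap` one below) is printed to stderr and the process exits non-zero. A minimal sketch, with a hypothetical helper name, that captures the exit code, stderr and the report file separately so that neither gets lost when the command runs inside a pipeline:

```python
import subprocess


def run_validation(cmd, report_path):
    """Run a ValidateSamFile-style command; return (exit_code, stderr, report).

    `cmd` is any argv list (e.g. the Picard command above) and
    `report_path` is the file passed as OUTPUT=. stderr is captured on its
    own so that a crash is not mistaken for a clean or complete report.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    try:
        with open(report_path) as handle:
            report = handle.read()
    except FileNotFoundError:
        report = ""  # the tool may crash before writing anything
    return proc.returncode, proc.stderr, report
```

A non-zero exit code means the validator stopped early, so whatever is in the report may be incomplete even if it already lists errors.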

However, I got the following error:

Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 0: B09V7ABXX110411:2:1203:13567:65642
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:132)
at htsjdk.samtools.CoordinateSortedPairInfoMap.access$300(CoordinateSortedPairInfoMap.java:53)
at htsjdk.samtools.CoordinateSortedPairInfoMap$MapIterator.advanceToNextNonEmptyReferenceIndex(CoordinateSortedPairInfoMap.java:227)
at htsjdk.samtools.CoordinateSortedPairInfoMap$MapIterator.<init>(CoordinateSortedPairInfoMap.java:221)
at htsjdk.samtools.CoordinateSortedPairInfoMap$MapIterator.<init>(CoordinateSortedPairInfoMap.java:211)
at htsjdk.samtools.CoordinateSortedPairInfoMap.iterator(CoordinateSortedPairInfoMap.java:208)
at htsjdk.samtools.SamFileValidator$CoordinateSortedPairEndInfoMap.iterator(SamFileValidator.java:728)
at htsjdk.samtools.SamFileValidator.validateUnmatchedPairs(SamFileValidator.java:227)
at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:200)
at htsjdk.samtools.SamFileValidator.validateSamFileVerbose(SamFileValidator.java:159)
at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:166)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Command exited with non-zero status 1
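The trace shows the exception is thrown in `validateUnmatchedPairs`, i.e. while ValidateSamFile is pairing up mates by read name. One plausible cause worth ruling out is the same read name occurring in more than one primary record for the same end of a pair. A stdlib-only sketch of that check, assuming the records are available as SAM text lines (for example the output of `samtools view`); the function name is hypothetical:

```python
from collections import Counter

SECONDARY = 0x100
SUPPLEMENTARY = 0x800
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def duplicate_primary_mates(sam_lines):
    """Return {(read_name, end): count} for pair ends seen more than once.

    `sam_lines` are SAM text records; header lines starting with '@' are
    skipped. Only primary alignments are counted, because secondary and
    supplementary records legitimately reuse read names.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & FIRST_IN_PAIR:
            end = 1
        elif flag & SECOND_IN_PAIR:
            end = 2
        else:
            end = 0  # unpaired read
        counts[(name, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

An empty result makes duplicated records an unlikely explanation and points back at inconsistent mate information instead.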

I then sorted the preprocessed BAM file with Picard SortSam and ran ValidateSamFile again, but got the same error.

The validation report file contained:

ERROR: Record 60229, Read name XXXXXXXXXXXXXXX:3:1201:12294:71391, Mate alignment does not match alignment start of mate
ERROR: Record 60229, Read name XXXXXXXXXXXXXXX:3:1201:12294:71391, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 66872, Read name XXXXXXXXXXXXXXX:1:1311:16342:69759, Mate alignment does not match alignment start of mate
ERROR: Record 66872, Read name XXXXXXXXXXXXXXX:1:1311:16342:69759, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 71257, Read name XXXXXXXXXXXXXXX:6:1113:2807:97456, Mate alignment does not match alignment start of mate
ERROR: Record 71257, Read name XXXXXXXXXXXXXXX:6:1113:2807:97456, Mate negative strand flag does not match read negative strand flag of mate
ERROR: Record 71257, Read name XXXXXXXXXXXXXXX:6:1113:2807:97456, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 94667, Read name XXXXXXXXXXXXXXX:1:1314:1448:56683, Mate alignment does not match alignment start of mate
ERROR: Record 94667, Read name XXXXXXXXXXXXXXX:1:1314:1448:56683, Mate CIGAR string does not match CIGAR string of mate
ERROR: Record 133492, Read name XXXXXXXXXXXXXXX:7:1306:18584:18693, Mate alignment does not match alignment start of mate
ERROR: Record 133492, Read name XXXXXXXXXXXXXXX:7:1306:18584:18693, Mate negative strand flag does not match read negative strand flag of mate

(I replaced the read names with X's to avoid PHI issues.)
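With thousands of lines like these, a per-type count is easier to read. Picard's own `MODE=SUMMARY` option on ValidateSamFile reports counts by error type directly; for an existing verbose report, a minimal sketch follows. The line layout is inferred from the excerpt above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Layout inferred from lines such as:
# "ERROR: Record 60229, Read name X:3:1201:12294:71391, Mate alignment does not match ..."
# WARNING lines are assumed to share the same layout; other lines are ignored.
LINE_RE = re.compile(r"^(ERROR|WARNING): Record (\d+), Read name ([^,]+), (.+)$")


def summarize_report(lines):
    """Count report messages by (level, message) and collect affected read names."""
    by_message = Counter()
    reads = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            level, _record, read_name, message = match.groups()
            by_message[(level, message)] += 1
            reads.add(read_name)
    return by_message, reads
```

The set of read names is handy for pulling the affected pairs out of the BAM before and after IndelRealigner to see which step changed them.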

All the BAM files passed ValidateSamFile before preprocessing. FYI, I used GATK v3.4-0-g7e26428 and tried both Picard 1.133 and 1.139.

Thank you in advance.

Answers

  • shenglai (Chicago, Member)
    edited October 2015

    I noticed there is a flag, --noOriginalAlignmentTags. Should I use this parameter to avoid this type of issue, and then run sorting and MarkDuplicates again?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I don't think that will help you. Let's try to figure out what's causing these errors. But I'm confused; you say you initially got this error:

    Value was put into PairInfoMap more than once
    

    and that after running SortSam you got the same validation error, but the errors you actually posted are all:

    Mate alignment does not match alignment start of mate
    

    Can you please clarify what's happening where?

  • shenglai (Chicago, Member)

    I apologize for the confusion.

    Value was put into PairInfoMap more than once

    This error is printed to stderr by Picard ValidateSamFile.

    Mate alignment does not match alignment start of mate

    This appears in the output file written by Picard ValidateSamFile.

    These two errors appear together whether or not I run SortSam first.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah that makes a lot more sense, thank you :)

    Did you happen to run ValidateSamFile on the original BAMs before running IndelRealigner?

  • shenglai (Chicago, Member)

    Yes, I did. They all passed without errors.

  • shenglai (Chicago, Member)

    Before indel realignment, all BAMs went through Picard SortSam, FixMateInformation, and MarkDuplicatesWithMateCigar. Should I repeat these steps after indel realignment?

  • shenglai (Chicago, Member)

    Hmm. It looks like I only get the stderr exception when ValidateSamFile runs inside the pipeline, not when I run it manually. I will check the pipeline. Thanks!

  • shenglai (Chicago, Member)

    After I switched to the latest version, all the errors went away. Thanks!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Glad to hear it! I love it when all the errors go away by just upgrading :)
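For reference on the question about rerunning FixMateInformation: the "Mate ... does not match" errors in the report come from comparing each read's mate fields (mate reference and position, mate-strand flag, and the MC mate-CIGAR tag) with the mate record itself. The sketch below is a simplified, hypothetical model of those comparisons, not Picard's actual code; the `Read` type and the short problem labels are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

REVERSE = 0x10       # SAM flag: this read is on the reverse strand
MATE_REVERSE = 0x20  # SAM flag: the mate is on the reverse strand


@dataclass
class Read:
    """Minimal stand-in for a SAM record (hypothetical, for illustration)."""
    name: str
    flag: int
    ref: str
    pos: int                          # 1-based alignment start
    cigar: str
    mate_ref: str                     # RNEXT, with '=' already expanded
    mate_pos: int                     # PNEXT
    mate_cigar: Optional[str] = None  # MC tag, if present


def mate_mismatches(read: Read, mate: Read) -> List[str]:
    """List which of `read`'s mate fields disagree with the mate record."""
    problems = []
    if read.mate_ref != mate.ref:
        problems.append("mate_reference")
    if read.mate_pos != mate.pos:
        problems.append("mate_start")
    if bool(read.flag & MATE_REVERSE) != bool(mate.flag & REVERSE):
        problems.append("mate_strand")
    if read.mate_cigar is not None and read.mate_cigar != mate.cigar:
        problems.append("mate_cigar")
    return problems
```

In this model, a step that moves or re-CIGARs one read without updating its mate produces "mate_start" and "mate_cigar" problems on the mate; FixMateInformation's job roughly corresponds to copying the mate record's reference, position, strand and CIGAR back into these fields.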
