SAM bin field error for the GATK run

Hello,

I am running a Picard + GATK pipeline on paired-end Illumina samples. The BAM files were downloaded from TCGA, and I used GATK 3.1.1 and Java v1.7.0. I ran into the errors shown below. I first saw the same errors at the Picard MarkDuplicates step, but they went away after I switched to Picard 1.88 (as I read on another forum). GATK now picks up these errors again. When I set --validation_strictness to LENIENT, the errors do not affect the GATK run. Is there a better way to solve this problem?

BTW, is the option IGNORE=INVALID_INDEXING_BIN of Picard ValidateSamFile related to this problem?

INFO 22:50:57,977 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
INFO 22:50:57,977 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 22:50:57,977 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 22:50:57,983 HelpFormatter - Program Args: -T RealignerTargetCreator -R b37_2.8/human_g1k_v37.fasta -I A3NJ_NB_rmdup.bam -I A3NJ_TP_rmdup.bam -known b37_2.8/1000G_phase1.indels.b37.vcf -known b37_2.8/Mills_and_1000G_gold_standard.indels.b37.vcf -o realigner.A3NJ.intervals --validation_strictness LENIENT
INFO 22:50:57,987 HelpFormatter - Executing as [email protected] on Linux 2.6.18-194.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0-b147.
INFO 22:50:57,987 HelpFormatter - Date/Time: 2014/06/10 22:50:57
INFO 22:50:57,987 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:50:57,987 HelpFormatter - --------------------------------------------------------------------------------
INFO 22:50:58,801 GenomeAnalysisEngine - Strictness is LENIENT
INFO 22:50:58,973 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 22:50:58,984 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 22:50:59,062 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 22:50:59,567 GenomeAnalysisEngine - Preparing for traversal over 2 BAM files
INFO 22:51:00,913 GenomeAnalysisEngine - Done preparing for traversal
INFO 22:51:00,914 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 22:51:00,914 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
Ignoring SAM validation error: ERROR: Record 13726, Read name HWI-ST735:144061002:C3D17ACXX:1:2202:14399:15015, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
Ignoring SAM validation error: ERROR: Record 8265, Read name HWI-ST735:144061002:C3D17ACXX:8:2107:2975:86239, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
Ignoring SAM validation error: ERROR: Record 79, Read name HWI-ST735:144061002:C3D17ACXX:1:2202:14399:15015, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
............

Any input would be very much appreciated!

Thanks,

Xiayu

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @rxy712

    Hi Xiayu,

    Picard 1.88 doesn't report this error because validation of the index bin was only added in 1.90: http://sourceforge.net/projects/picard/files/picard-tools/1.90/

    I don't see a problem with setting validation stringency = lenient except for the annoyance of getting the messages. My concern is that the bam index associated with the bam might be incorrect if it is based on these bin numbers.

    The only way I can think of to fix this is to fix the BAM. In order to do that, you would need to convert to SAM format, which does not have a field for indexing bin, and then back to BAM. That would be the safest thing, followed by recreating the BAI.

    -Sheila

  • rxy712 (Member)

    Yes!! I followed your instructions and solved the problem. Thank you very much!!!!!!!

  • cmalley, JHU (Member)

    I'm having the same problem with Picard 2.4.1 and Illumina BAM files. I used VALIDATION_STRINGENCY=LENIENT and there are innumerable bin field errors, though Picard isn't dying. Can anyone verify whether this error truly doesn't affect GATK later on?

    Issue · Github (by Sheila): Issue Number 1014, State closed, Closed By vdauwera

  • dekling, Broad Institute (Member, admin)

    @cmalley: It is difficult to predict whether the errors you are observing will have consequences on downstream tools. If you run ValidateSamFile on your BAM files in SUMMARY mode, what do you see?
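
A minimal Python sketch of the ValidateSamFile check suggested above. It only assembles a legacy-style Picard command line; the jar path, the BAM name, and the helper name `validate_summary_cmd` are placeholders made up for illustration. MODE=SUMMARY and IGNORE=INVALID_INDEXING_BIN are the options mentioned in this thread; whether your Picard version accepts this exact syntax is an assumption to check against its own help output.

```python
def validate_summary_cmd(bam, picard_jar="/path/to/picard.jar", ignore_bin_errors=False):
    """Build (but do not run) a Picard ValidateSamFile command in SUMMARY mode.

    picard_jar is a placeholder path; adjust it to your installation.
    """
    cmd = ["java", "-jar", picard_jar, "ValidateSamFile", f"I={bam}", "MODE=SUMMARY"]
    if ignore_bin_errors:
        # Hides the bin field errors from the report; it does not repair the file.
        cmd.append("IGNORE=INVALID_INDEXING_BIN")
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(validate_summary_cmd("sample.bam")))
```

Comparing the summary with and without the IGNORE option gives a quick idea of whether bin field errors are the only problem in the file.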

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    Bin field errors may affect the ability of downstream programs to access specific records in the BAM based on their position (as opposed to just reading in the entire file contents sequentially). Picard tools that don't use random access would not be affected, but many GATK tools do use random access. So I would recommend fixing the problem rather than hoping it won't cause issues.
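
For readers wondering what the "bin field" is: the sketch below is a Python port of the `reg2bin` function from the SAM/BAM format specification, which maps an alignment's start and end to an indexing bin. It is meant only to illustrate the value the validator compares against. The exact check in Picard/htsjdk may treat edge cases (for example unmapped reads) differently, so treat this as an assumption-laden illustration rather than the validator's code.

```python
def reg2bin(beg, end):
    """Indexing bin for a 0-based, half-open interval [beg, end).

    Follows the binning scheme in the SAM/BAM specification: the smallest
    bin that fully contains the interval is chosen, starting from 16 kb bins.
    """
    end -= 1  # switch to the last covered position
    if beg >> 14 == end >> 14:
        return 4681 + (beg >> 14)  # 16 kb bins
    if beg >> 17 == end >> 17:
        return 585 + (beg >> 17)   # 128 kb bins
    if beg >> 20 == end >> 20:
        return 73 + (beg >> 20)    # 1 Mb bins
    if beg >> 23 == end >> 23:
        return 9 + (beg >> 23)     # 8 Mb bins
    if beg >> 26 == end >> 26:
        return 1 + (beg >> 26)     # 64 Mb bins
    return 0                       # whole-sequence bin


if __name__ == "__main__":
    # A read at 1-based POS 100 spanning 101 reference bases covers [99, 200).
    print(reg2bin(99, 200))       # 4681
    # A read straddling the 16384 boundary falls back to a 128 kb bin.
    print(reg2bin(16380, 16481))  # 585
```

If the bin stored in a BAM record disagrees with this computed value, position-based lookups through the index can miss the record, which is the random-access concern described above.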

  • alexey_larionov, UK (Member)

    A tool to fix this specific error is available in htsjdk library (previously known as, for instance, sam-1.99.jar).
    This library is provided with picard. It can be used like this:

    java -cp /path/to/picard/bin/htsjdk-1.133.jar htsjdk.samtools.FixBAMFile source.bam fixed.bam

    Obviously, path to picard and version of htsjdk should be adjusted to match ones in your system :)

    See more details about this solution here:
    https://sourceforge.net/p/samtools/mailman/message/31853465/

  • el116, usa (Member)

    @alexey_larionov said:
    A tool to fix this specific error is available in htsjdk library (previously known as, for instance, sam-1.99.jar).
    This library is provided with picard. It can be used like this:

    java -cp /path/to/picard/bin/htsjdk-1.133.jar htsjdk.samtools.FixBAMFile source.bam fixed.bam

    Obviously, path to picard and version of htsjdk should be adjusted to match ones in your system :)

    See more details about this solution here:
    https://sourceforge.net/p/samtools/mailman/message/31853465/

    Hello,

    I am encountering the same error as the original poster, but it appears that FixBAMFile has been deprecated and removed from htsjdk:

    https://github.com/samtools/htsjdk/pull/947
    https://github.com/samtools/htsjdk/pull/1213

    Would anyone have suggestions on how to resolve this issue with the latest version of htsjdk, now that FixBAMFile is gone? Thank you!

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    Hi @el116,

    From reading online, it seems like the tool was not consistent, so it was removed.

    I am just pasting Sheila's original suggestion here for reference:

    I don't see a problem with setting validation stringency = lenient except for the annoyance of getting the messages. My concern is that the bam index associated with the bam might be incorrect if it is based on these bin numbers.
    
    The only way I can think of to fix this is to fix the BAM. In order to do that, you would need to convert to SAM format, which does not have a field for indexing bin, and then back to BAM. That would be the safest thing, followed by recreating the BAI.
    

    So you can do this with GATK tools using the following commands.

    gatk SamFormatConverter - to convert BAM to SAM

    gatk SamFormatConverter - to convert SAM back to BAM

    gatk BuildBamIndex - to create the BAM index file.
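
The three steps above can be sketched in Python as follows. This is a hedged illustration, not an official recipe: the file names and the helpers `fix_bin_cmds` and `run_all` are hypothetical, it assumes a GATK4-style `gatk` wrapper on the PATH with `-I`/`-O` arguments, and it relies on SamFormatConverter picking the output format from the file extension. By default it only prints the commands.

```python
import subprocess


def fix_bin_cmds(src_bam, tmp_sam, fixed_bam, gatk="gatk"):
    """Return the BAM -> SAM -> BAM -> index commands as argument lists."""
    return [
        [gatk, "SamFormatConverter", "-I", src_bam, "-O", tmp_sam],    # BAM to SAM (no bin field in SAM)
        [gatk, "SamFormatConverter", "-I", tmp_sam, "-O", fixed_bam],  # SAM back to BAM
        [gatk, "BuildBamIndex", "-I", fixed_bam],                      # rebuild the index
    ]


def run_all(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # File names are placeholders; the intermediate SAM can be very large.
    run_all(fix_bin_cmds("source.bam", "source.sam", "fixed.bam"))
```

Call `run_all(..., dry_run=False)` once the printed commands look right, then re-validate the fixed BAM before deleting the original.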

    Also, please make sure your Picard tools and GATK are updated to the most recent versions.
