Mapping, processing and duplicate marking with Picard tools: ValidateSamFile errors

Vanilla Member Posts: 6

I am trying to follow the best practices for mapping my (Paired-end Illumina HiSeq) reads to the reference, by following this presentation:

From what I understand, I should use MergeBamAlignment to clean up the output from bwa, and then use this cleaned-up output for the rest of the analysis. However, when I run ValidateSamFile on the MergeBamAlignment output I get a lot of errors, and running CleanSam on the file does not resolve any of them. What am I doing wrong? I've tried searching the web for more details about MergeBamAlignment, but I haven't been able to find much. Please let me know if you require any additional information.

How I ran MergeBamAlignment
picard-tools MergeBamAlignment \
UNMAPPED_BAM=unmapped_reads.sam \
ALIGNED_BAM=aligned_reads.sam \
OUTPUT=aligned_reads.merged.bam \
REFERENCE_SEQUENCE=/path/to/reference.fasta \
PAIRED_RUN=true # Why is this needed?

Error report from ValidateSamFile
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_CIGAR 5261
ERROR:MATES_ARE_SAME_END 30
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 30
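
A summary like the one above is easier to track across runs once it is turned into counts per error type. The following is a minimal sketch, assuming the MODE=SUMMARY output keeps the two-column layout shown here; the function name `parse_validation_summary` is hypothetical and not part of Picard.

```python
def parse_validation_summary(text):
    """Return {error_type: count} from ValidateSamFile MODE=SUMMARY text.

    Assumes each data row has exactly two whitespace-separated fields,
    e.g. "ERROR:INVALID_CIGAR 5261", as in the report quoted above.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "## HISTOGRAM" line and the "Error Type Count" header.
        if (
            len(fields) == 2
            and fields[0].startswith(("ERROR:", "WARNING:"))
            and fields[1].isdigit()
        ):
            counts[fields[0]] = int(fields[1])
    return counts


# The report from this post, pasted verbatim.
summary = """\
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_CIGAR 5261
ERROR:MATES_ARE_SAME_END 30
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 30
"""

counts = parse_validation_summary(summary)
# Print the error types, most frequent first.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name}\t{n}")
```

Here almost all of the reported problems are INVALID_CIGAR records; the two mate-related error types account for 30 records each.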

Answers

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,118

    Hi there,

    Some of these errors aren't fixable by CleanSam. Try running FixMateInformation to solve the mate pair errors. For the CIGAR errors, you'll be able to filter out the affected reads using -rf BadCigar in the subsequent GATK steps.

    Geraldine Van der Auwera, PhD

  • Vanilla Member Posts: 6

    Thank you for your response. I've tried running FixMateInformation on the file to fix the errors, but evidently I'm doing something wrong, since this only leads to more errors!

    picard-tools ValidateSamFile INPUT=input.bam MODE=SUMMARY
    
    ## HISTOGRAM    java.lang.String
    Error Type  Count
    ERROR:INVALID_CIGAR 5261
    ERROR:MATES_ARE_SAME_END    30
    ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 30
    

    Above is the SAM validation for the original file. To fix these errors, I ran:
    picard-tools FixMateInformation INPUT=input.bam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT

    However, after running FixMateInformation the number of errors increased, and I'm not sure what I did wrong that could cause this.

    picard-tools ValidateSamFile INPUT=output.bam MODE=SUMMARY
    
    ## HISTOGRAM    java.lang.String
    Error Type  Count
    ERROR:INVALID_CIGAR 5261
    ERROR:MATES_ARE_SAME_END    27346
    ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 54
    ERROR:MISMATCH_FLAG_MATE_UNMAPPED   108
    

    Thanks again for your time

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,118

    Hi @Vanilla,

    Sorry for the delayed response. I'm not 100% sure, but I think this is expected. Some of these mapping issues can't be fixed as such; in that case, the tools flag the problem reads so that downstream tools know how to handle (or ignore) them appropriately. I will check, but I think you can move past these errors.

    Geraldine Van der Auwera, PhD
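
To see which error types changed between the two ValidateSamFile runs posted in this thread, the counts can be compared directly. This is a minimal sketch using the numbers quoted above; the helper name `count_changes` is hypothetical, and the comparison says nothing about why the counts changed.

```python
# Counts copied from the two ValidateSamFile summaries in this thread.
before = {
    "ERROR:INVALID_CIGAR": 5261,
    "ERROR:MATES_ARE_SAME_END": 30,
    "ERROR:MISMATCH_FLAG_MATE_NEG_STRAND": 30,
}
after = {
    "ERROR:INVALID_CIGAR": 5261,
    "ERROR:MATES_ARE_SAME_END": 27346,
    "ERROR:MISMATCH_FLAG_MATE_NEG_STRAND": 54,
    "ERROR:MISMATCH_FLAG_MATE_UNMAPPED": 108,
}


def count_changes(before, after):
    """Return {error_type: after - before} for every type whose count changed."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        # A type missing from one report is treated as a count of zero.
        delta = after.get(name, 0) - before.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes


changes = count_changes(before, after)
for name, delta in changes.items():
    print(f"{name}\t{delta:+d}")
```

INVALID_CIGAR is unchanged at 5261, so it does not appear in the result; every change is in the mate-related error types, with MISMATCH_FLAG_MATE_UNMAPPED appearing only after FixMateInformation was run.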
