The current GATK version is 3.7-0
Picard 2.10.4 has MAJOR CHANGES that impact throughput of pipelines. Default compression is now 1 instead of 5, and Picard now handles compressed data with the Intel Deflator/Inflator instead of JDK.
GATK version 4.beta.2 (i.e. the second beta release) is out. See the GATK4 BETA page for download and details.

Mapping, processing and duplicate marking with Picard tools: ValidateSamFile errors

I am trying to follow the Best Practices for mapping my paired-end Illumina HiSeq reads to the reference, following this presentation:

From what I understand, I should use MergeBamAlignment to clean up the output from bwa, and then use this cleaned up output for the rest of the analysis. However, when I run ValidateSamFile after running MergeBamAlignment I get a lot of errors, and running CleanSam on the file does not resolve any of them. What am I doing wrong? I've tried searching the web for more details about MergeBamAlignment but I haven't been able to find much. Please let me know if you require any additional information.

How I ran MergeBamAlignment:
```
picard-tools MergeBamAlignment \
UNMAPPED_BAM=unmapped_reads.sam \
ALIGNED_BAM=aligned_reads.sam \
OUTPUT=aligned_reads.merged.bam \
REFERENCE_SEQUENCE=/path/to/reference.fasta \
PAIRED_RUN=true # Why is this needed?
```

Error report from ValidateSamFile:
```
## HISTOGRAM java.lang.String
Error Type Count
```


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Some of these errors aren't fixable by CleanSam. Try running FixMateInformation to solve the mate-pair errors. For the CIGAR errors, you'll be able to filter those reads out using `-rf BadCigar` in the subsequent GATK steps.
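
To make "CIGAR errors" concrete, here is a minimal sketch of one simple consistency check, assuming plain-text SAM input: per the SAM specification, the lengths of the query-consuming CIGAR operations (M, I, S, = and X) must add up to the length of SEQ. The function name `check_cigar_length` is hypothetical; this is not a reimplementation of ValidateSamFile or of GATK's BadCigar read filter, only an illustration of the kind of inconsistency such tools look for.

```shell
#!/bin/sh
# Hypothetical helper, not part of Picard or GATK: flags SAM records whose
# CIGAR does not account for every base in SEQ. Per the SAM spec, the
# operations M, I, S, = and X consume query bases, so their lengths must sum
# to length(SEQ). Input is assumed to be plain-text SAM (file or stdin).
check_cigar_length() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    $6 == "*" || $10 == "*" { next }   # no CIGAR or no SEQ: nothing to compare
    {
      cigar = $6; qlen = 0
      # Consume one <length><op> element per iteration.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MIS=X]/) qlen += n  # query-consuming operations only
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (cigar != "") {
        print "line " NR ": " $1 " has an unparsable CIGAR " $6; bad++; next
      }
      if (qlen != length($10)) {
        print "line " NR ": " $1 " CIGAR " $6 " covers " qlen " bases, SEQ has " length($10)
        bad++
      }
    }
    END { if (bad) exit 1 }            # non-zero status if anything was reported
  ' "$@"
}
```

For a BAM, stream it as SAM text first, e.g. `samtools view input.bam | check_cigar_length` (samtools is assumed to be installed). The function prints one line per offending record and returns a non-zero status if it found any.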

  • Thank you for your response. I've tried running FixMate on the file to fix the errors but evidently I'm doing something wrong, since this only leads to more errors!

    ```
    picard-tools ValidateSamFile INPUT=input.bam MODE=SUMMARY
    ## HISTOGRAM    java.lang.String
    Error Type  Count
    ```

    Above is the SAM validation summary for the original file. To fix these errors, I ran:
    ```
    picard-tools FixMateInformation INPUT=input.bam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT
    ```

    However, after running FixMateInformation the number of errors increased, and I'm not sure what I did wrong that could cause this.

    ```
    picard-tools ValidateSamFile INPUT=output.bam MODE=SUMMARY
    ## HISTOGRAM    java.lang.String
    Error Type  Count
    ```

    Thanks again for your time
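
One way to see what FixMateInformation is supposed to reconcile is to compare the mate fields directly. The sketch below uses a hypothetical function `check_mate_fields` and assumes plain-text SAM grouped by read name (for example, a queryname-sorted file). For each pair of primary records it checks that RNEXT/PNEXT of one record match RNAME/POS of the other. It is a simplified illustration of mate-information consistency, not Picard's own validation logic, and it ignores other mate fields such as the mate-strand flag and TLEN.

```shell
#!/bin/sh
# Hypothetical helper, not part of Picard: for each read pair, checks that
# RNEXT/PNEXT of one primary record match RNAME/POS of its mate.
# Assumes plain-text SAM grouped by read name (e.g. queryname-sorted).
check_mate_fields() {
  awk -F'\t' '
    # Test one FLAG bit with integer arithmetic (POSIX awk has no and()).
    function bit(flag, b) { return int(flag / b) % 2 }
    /^@/ { next }
    # Keep primary records of paired reads: 0x1 set, 0x100 and 0x800 unset.
    !bit($2, 1) || bit($2, 256) || bit($2, 2048) { next }
    {
      rnext = ($7 == "=") ? $3 : $7   # "=" means same reference as RNAME
      if (!($1 in seen)) {             # first record of the pair: remember it
        seen[$1] = $3 "\t" $4 "\t" rnext "\t" $8
        next
      }
      split(seen[$1], m, "\t")         # m[1..4] = RNAME POS RNEXT PNEXT
      if (m[3] != $3 || m[4] != $4) {
        print $1 ": first record points to " m[3] ":" m[4] " but its mate is at " $3 ":" $4; bad++
      }
      if (rnext != m[1] || $8 != m[2]) {
        print $1 ": second record points to " rnext ":" $8 " but its mate is at " m[1] ":" m[2]; bad++
      }
      delete seen[$1]
    }
    END {
      for (q in seen) { print q ": no mate record found"; bad++ }
      if (bad) exit 1
    }
  ' "$@"
}
```

For example, after `picard-tools SortSam INPUT=input.bam OUTPUT=byname.bam SORT_ORDER=queryname`, run `samtools view byname.bam | check_mate_fields` (samtools is assumed to be installed). FLAG bits are tested with integer division rather than gawk's `and()` so the script also runs under a plain POSIX awk.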

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @Vanilla,

    Sorry for the delayed response. I'm not 100% sure, but I think this is expected. Some of these mapping issues can't be fixed as such; in that case the tools flag the problem reads so that downstream tools know how to handle (or ignore) them appropriately. I will check, but I think you can move past these errors.

  • jingmeng (Australia; Member)

    Hi, when I run Picard ValidateSamFile, I get a different error report:

    ```
    INFO 2017-06-28 15:26:12 SamFileValidator Validated Read 1,100,000,000 records. Elapsed time: 04:23:53s. Time for last 10,000,000: 106s. Last read position: chrX:143,377,982
    INFO 2017-06-28 15:28:46 SamFileValidator Validated Read 1,110,000,000 records. Elapsed time: 04:26:26s. Time for last 10,000,000: 153s. Last read position: chrEBV:121,817
    INFO 2017-06-28 15:31:15 SamFileValidator Validated Read 1,120,000,000 records. Elapsed time: 04:28:55s. Time for last 10,000,000: 149s. Last read position: */*

    ## HISTOGRAM java.lang.String
    Error Type Count
    ```

    What does this error mean? How can I fix it? Thanks for your time!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    It looks like you have records that do not respect the sort order of your BAM. You should re-sort the BAM; Picard's SortSam tool does that.
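
To locate the first record that breaks the order, a short check is enough. This minimal sketch (hypothetical function `check_coordinate_order`) assumes plain-text SAM that includes the @SQ header lines, and uses coordinate order as the SAM specification defines it: records grouped by reference in @SQ order, ascending POS within each reference, and unplaced unmapped reads (RNAME `*`) at the end. It stops at the first violation.

```shell
#!/bin/sh
# Hypothetical helper, not part of Picard: reports the first SAM record that
# breaks coordinate order, i.e. references in @SQ order, then ascending POS,
# with unplaced unmapped reads (RNAME "*") last. Assumes plain-text SAM
# including the @SQ header lines.
check_coordinate_order() {
  awk -F'\t' '
    # Record the rank of each reference sequence as listed in the header.
    /^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) rank[substr($i, 4)] = ++nref; next }
    /^@/ { next }
    {
      if ($3 == "*") { ref = nref + 1; pos = 0 }        # unplaced reads sort last
      else if ($3 in rank) { ref = rank[$3]; pos = $4 + 0 }
      else { print "line " NR ": " $1 " is on " $3 ", which has no @SQ line"; exit 1 }
      if (ref < prev_ref || (ref == prev_ref && pos < prev_pos)) {
        print "line " NR ": " $1 " at " $3 ":" $4 " comes after " prev_loc
        exit 1
      }
      prev_ref = ref; prev_pos = pos; prev_loc = $3 ":" $4
    }
  ' "$@"
}
```

For a BAM, convert it to SAM text on the fly, e.g. `samtools view -h input.bam | check_coordinate_order` (samtools is assumed to be installed; `-h` keeps the @SQ lines the check needs). After re-sorting, e.g. with `picard-tools SortSam INPUT=input.bam OUTPUT=sorted.bam SORT_ORDER=coordinate`, the same check on the new file should report nothing.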