Picard AddOrReplaceReadGroups doesn't like my BAM file

kohrnb, Portland State University, Member, Posts: 19
edited March 2016 in Ask the GATK team

I'm processing chloroplast-captured DNA sequence data, and Picard doesn't like a few of my files. My pipeline up to the failure point (as best I can tell) is:

Adapter-trimming with cutadapt
Quality trimming with sickle
Alignment to a reference genome (KM360047.1) using bwa aln + sampe
Sorting with samtools (using the -F 4 option to remove unmapped reads)
Indexing with samtools
Reheadering with samtools (to make sure the file is marked as sorted)
AddOrReplaceReadGroups with Picard

The command for AddOrReplaceReadGroups is:

java -jar /disk/scratch/Cruzan/programs/picard-tools-1.128/picard.jar AddOrReplaceReadGroups INPUT=LascalWhetSSL15_pe.sort.clean.bam OUTPUT=LascalWhetSSL15_pe.sort.rg.bam RGLB=7_2 RGPL=Illumina RGPU=TAATGCGC-GGCTCTGA RGSM=LascalWhetSSL15

and the message printed on screen is:

[Thu Mar 17 13:03:23 PDT 2016] picard.sam.AddOrReplaceReadGroups INPUT=LascalWhetSSL15_pe.sort.clean.bam OUTPUT=LascalWhetSSL15_pe.sort.rg.bam RGLB=7_2 RGPL=Illumina RGPU=TAATGCGC-GGCTCTGA RGSM=LascalWhetSSL15 RGID=1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Mar 17 13:03:23 PDT 2016] Executing as bkohrn@circe.rc.pdx.edu on Linux 2.6.32-573.12.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_91-mockbuild_2015_10_21_19_56-b00; Picard version: 1.128(c8e12338d226532b30e9ecdbf33180a073c3ffc7_1421081159) IntelDeflater
INFO 2016-03-17 13:03:23 AddOrReplaceReadGroups Created read group ID=1 PL=Illumina LB=7_2 SM=LascalWhetSSL15

[Thu Mar 17 13:03:23 PDT 2016] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name D00735:83:C8TJUANXX:4:2208:17299:69297, Mate Alignment start should != 0 because reference name != *.
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:439)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:616)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:604)
at htsjdk.samtools.BAMFileReader.getIterator(BAMFileReader.java:312)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.iterator(SamReader.java:424)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.iterator(SamReader.java:334)
at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:113)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

I looked at the first lines of the BAM file in question, and they show:
D00735:83:C8TJUANXX:4:2208:17299:69297 73 gi|704001763|gb|KM360047.1| 0 25 99M = 0 0 AACAGATCGTATGGTAGGACACAAATTGGGAGAATTTTCGCCTACTTTAAATTTCCGAGGACATGCAAAAAGCGATAATAGATCTCGTCGTTAATTTAA @B@AC
D00735:83:C8TJUANXX:4:1301:7783:38994 145 gi|704001763|gb|KM360047.1| 1 37 55M = 150478 150422 GGCGAACGACGGGAATTGAACCCGCGCATGGTGGATTCACAATCCACTGCCTTGA E/F0E/EB=/E111C1=0/<C//E9>11@=>11@F1E1=1=1GFC=/GGGFDE
D00735:83:C8TJUANXX:4:2202:11078:83774 177 gi|704001763|gb|KM360047.1| 5 37 49M1I39M = 82574 82581 AACGACGGGAATTGAACCCGCGCATGGTGGATTCACAATCCACTGCCTTGATCCACTTGGCTACATCCGCCCCTTTACTAATATCTCGT GGGGG
D00735:83:C8TJUANXX:4:2105:1447:79915 153 gi|704001763|gb|KM360047.1| 5 37 49M1I35M = 5 0 AACGACGGGAATTGAACCCGCGCATGGTG

So the first line does indeed have a position of 0, for both the read and its mate. I'm guessing this has to do with how BWA handles unmapped mates, along with the fact that the chloroplast genome is circular. Any idea how I can keep this from happening in the future? I'd like to run these samples through IndelRealigner, and I need AddOrReplaceReadGroups working first.

I'm using BWA version 0.7.5a-r405, Samtools version 0.1.17 (for reasons of downstream-pipeline processes), and Picard version 1.128 (I also tried version 1.141, which is the most recent version that will run on my cluster, but it failed the same way).

Any advice about how to avoid this problem in the future would be appreciated; I want to be able to automate this part of processing, and until I get this worked out, I can't.
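The condition named in the error message can be checked on SAM text before running Picard. The following is a minimal sketch, not a reproduction of htsjdk's validation: it assumes tab-delimited SAM lines with all 11 mandatory fields, and the function names `decode_flag` and `mate_start_problems` are hypothetical. The example records reuse the read names, flags and positions from the post, with sequence and quality replaced by `*` placeholders.

```python
# Bit values of the FLAG field as defined in the SAM specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
]


def decode_flag(flag):
    """Return the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def mate_start_problems(sam_lines):
    """Return (read name, decoded flag) for paired records whose mate
    reference is set (not '*') while the mate start (PNEXT) is 0."""
    problems = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # assumption: ignore truncated records
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        rnext, pnext = fields[6], int(fields[7])
        # '=' in RNEXT means "same reference as RNAME".
        mate_ref = rname if rnext == "=" else rnext
        if flag & 0x1 and mate_ref != "*" and pnext == 0:
            problems.append((name, decode_flag(flag)))
    return problems


REF = "gi|704001763|gb|KM360047.1|"
records = [
    "\t".join(["D00735:83:C8TJUANXX:4:2208:17299:69297", "73", REF,
               "0", "25", "99M", "=", "0", "0", "*", "*"]),
    "\t".join(["D00735:83:C8TJUANXX:4:1301:7783:38994", "145", REF,
               "1", "37", "55M", "=", "150478", "150422", "*", "*"]),
]

for name, flags in mate_start_problems(records):
    print(name, flags)
```

Only the first record is reported. Its flag of 73 decodes to paired, mate unmapped and first in pair, which fits the guess that the error comes from how the unmapped mate was recorded.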

GitHub issue 718, filed by Sheila. State: closed (closed by chandrans).

Answers

  • Sheila, Broad Institute, Member, Broadie, Moderator, Dev, Posts: 4,583, admin

    @kohrnb
    Hi,

    Thanks for reporting the potential solution. Yes, we were thinking the circular reference might be the issue, as GATK tools are not designed to handle it.

    Let us know how everything works!

    -Sheila

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie, Posts: 11,388, admin

    The problem with this strategy is that for the portion that you copied, reads will map non-uniquely so they will be considered suspect. And for what does get mapped with useable scores, the coverage will be split between the two copies so calls made in those regions will probably be only weakly supported.

    I'm not sure what the right way to deal with this problem is, but I don't think the solution presented here is going to work out. You should look up how others deal with this, e.g. in the microbial literature.

    Geraldine Van der Auwera, PhD
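The non-unique mapping concern raised in the last answer can be illustrated with a toy example. This is only a sketch under stated assumptions: the strategy itself is not shown in the thread, so the code assumes it means appending a copy of the start of the circular reference to its end. The sequence, the overlap length and the helper `find_all` are all hypothetical, and exact string matching stands in for a real aligner.

```python
def find_all(reference, read):
    """Return every 0-based start at which read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Hypothetical toy "circular" genome, written out as a linear string.
circular = "ACGTTGCAAGGCTTAACCGGATCGTACCATG"
overlap = 8  # assumption: number of leading bases copied to the end
extended = circular + circular[:overlap]

read_in_copy = circular[1:7]                  # lies inside the copied portion
read_unique = circular[12:20]                 # lies outside the copied portion
read_junction = circular[-4:] + circular[:4]  # spans the circular junction

print("in copy :", find_all(extended, read_in_copy))
print("unique  :", find_all(extended, read_unique))
print("junction:", find_all(circular, read_junction),
      find_all(extended, read_junction))
```

In this toy case the junction-spanning read only finds a placement on the extended reference, but the read from the copied portion now has two equally good placements. That is the situation in which an aligner would treat the read as ambiguous and coverage would be split between the two copies.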
