Picard AddOrReplaceReadGroups doesn't like my BAM file
I'm processing chloroplast-captured DNA sequences, and Picard doesn't like a few of my files. As best I can tell, my pipeline up to the failure point is:
Adapter trimming with cutadapt
Quality trimming with sickle
Alignment to a reference genome (KM360047.1) using bwa aln + sampe
Sorting with samtools (using the -F 4 option to remove unmapped reads)
Indexing with samtools
Reheadering with samtools (to make sure the file is marked as sorted)
AddOrReplaceReadGroups with Picard
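The -F 4 step keeps only records whose FLAG lacks bit 0x4 (read unmapped). A minimal Python sketch of that filter, assuming uncompressed tab-separated SAM text rather than BAM, with hypothetical function names and toy records, shows the side effect that matters here: a kept read whose mate was unmapped still carries bit 0x8 (mate unmapped) and its mate coordinates, even though the mate's own record is gone.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4       # FLAG bit: this read is unmapped
MATE_UNMAPPED = 0x8  # FLAG bit: this read's mate is unmapped


def drop_unmapped(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every record whose FLAG lacks bit 0x4 (like -F 4)."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
        elif not (int(line.split("\t")[1]) & UNMAPPED):
            yield line


def reads_left_without_mates(lines: Iterable[str]) -> list[str]:
    """Names of kept reads flagged as having an unmapped (and now removed) mate."""
    names = []
    for line in drop_unmapped(lines):
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        if int(fields[1]) & MATE_UNMAPPED:
            names.append(fields[0])
    return names


# Toy records (hypothetical names and coordinates).
records = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t73\tref\t10\t25\t5M\t=\t10\t0\tACGTA\tIIIII",   # mapped, mate unmapped
    "r1\t133\tref\t10\t0\t*\t=\t10\t0\tACGTA\tIIIII",    # the unmapped mate
    "r2\t99\tref\t20\t60\t5M\t=\t40\t25\tACGTA\tIIIII",  # properly paired read
]
print(reads_left_without_mates(records))  # ['r1']
```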
The command for AddOrReplaceReadGroups is:
java -jar /disk/scratch/Cruzan/programs/picard-tools-1.128/picard.jar AddOrReplaceReadGroups INPUT=LascalWhetSSL15_pe.sort.clean.bam OUTPUT=LascalWhetSSL15_pe.sort.rg.bam RGLB=7_2 RGPL=Illumina RGPU=TAATGCGC-GGCTCTGA RGSM=LascalWhetSSL15
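Because the aim is to automate this step, here is a hedged sketch that assembles the same command as an argument list, which is easier to drive from a script than a hand-built string. The helper name is hypothetical; the jar path and read-group values are copied from the command above. It also exposes VALIDATION_STRINGENCY, a standard Picard option that defaults to STRICT. Setting it to LENIENT should make Picard log invalid records as warnings instead of aborting, but that hides the bad records rather than fixing them.

```python
import shlex

# Path and read-group values copied from the command above.
PICARD_JAR = "/disk/scratch/Cruzan/programs/picard-tools-1.128/picard.jar"


def add_read_groups_cmd(inp: str, out: str, lb: str, pl: str, pu: str, sm: str,
                        stringency: str = "STRICT") -> list[str]:
    """Build a Picard 1.x AddOrReplaceReadGroups command using KEY=VALUE arguments."""
    return [
        "java", "-jar", PICARD_JAR, "AddOrReplaceReadGroups",
        f"INPUT={inp}",
        f"OUTPUT={out}",
        f"RGLB={lb}",
        f"RGPL={pl}",
        f"RGPU={pu}",
        f"RGSM={sm}",
        # STRICT (Picard's default) aborts on the first invalid record;
        # LENIENT reports the problem as a warning and keeps going.
        f"VALIDATION_STRINGENCY={stringency}",
    ]


cmd = add_read_groups_cmd(
    "LascalWhetSSL15_pe.sort.clean.bam", "LascalWhetSSL15_pe.sort.rg.bam",
    lb="7_2", pl="Illumina", pu="TAATGCGC-GGCTCTGA", sm="LascalWhetSSL15",
)
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```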
The message printed on screen is:
[Thu Mar 17 13:03:23 PDT 2016] picard.sam.AddOrReplaceReadGroups INPUT=LascalWhetSSL15_pe.sort.clean.bam OUTPUT=LascalWhetSSL15_pe.sort.rg.bam RGLB=7_2 RGPL=Illumina RGPU=TAATGCGC-GGCTCTGA RGSM=LascalWhetSSL15 RGID=1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Mar 17 13:03:23 PDT 2016] Executing as [email protected] on Linux 2.6.32-573.12.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_91-mockbuild_2015_10_21_19_56-b00; Picard version: 1.128(c8e12338d226532b30e9ecdbf33180a073c3ffc7_1421081159) IntelDeflater
INFO 2016-03-17 13:03:23 AddOrReplaceReadGroups Created read group ID=1 PL=Illumina LB=7_2 SM=LascalWhetSSL15
[Thu Mar 17 13:03:23 PDT 2016] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.00 minutes.
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name D00735:83:C8TJUANXX:4:2208:17299:69297, Mate Alignment start should != 0 because reference name != *.
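The message refers to the mate fields: RNEXT (column 7) is set, here to "=", yet PNEXT (column 8) is 0. A simplified Python check of that rule, plus the analogous rule for the read's own RNAME and POS, could be used to scan a SAM dump for offending records before running Picard. It assumes plain SAM text and covers only these two coordinate rules, not everything htsjdk validates. The function name is hypothetical.

```python
def coordinate_errors(sam_line: str) -> list[str]:
    """Check one SAM record against two coordinate rules (a simplified subset)."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos = fields[2], int(fields[3])
    rnext, pnext = fields[6], int(fields[7])
    errors = []
    if rname != "*" and pos == 0:
        errors.append("POS is 0 although RNAME is set")
    # "=" in RNEXT means "same reference as RNAME", so it counts as set.
    if rnext != "*" and pnext == 0:
        errors.append("PNEXT is 0 although RNEXT is set")
    return errors


# First record of the BAM file, with SEQ and QUAL replaced by "*" for brevity.
first = "\t".join([
    "D00735:83:C8TJUANXX:4:2208:17299:69297", "73", "gi|704001763|gb|KM360047.1|",
    "0", "25", "99M", "=", "0", "0", "*", "*",
])
print(coordinate_errors(first))
# ['POS is 0 although RNAME is set', 'PNEXT is 0 although RNEXT is set']
```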
I looked at the first few lines of the BAM file in question, and they show:
D00735:83:C8TJUANXX:4:2208:17299:69297 73 gi|704001763|gb|KM360047.1| 0 25 99M = 0 0 AACAGATCGTATGGTAGGACACAAATTGGGAGAATTTTCGCCTACTTTAAATTTCCGAGGACATGCAAAAAGCGATAATAGATCTCGTCGTTAATTTAA @[email protected]
D00735:83:C8TJUANXX:4:1301:7783:38994 145 gi|704001763|gb|KM360047.1| 1 37 55M = 150478 150422 GGCGAACGACGGGAATTGAACCCGCGCATGGTGGATTCACAATCCACTGCCTTGA E/F0E/EB=/E111C1=0/<C//E9>[email protected]=>[email protected]=1=1GFC=/GGGFDE
D00735:83:C8TJUANXX:4:2202:11078:83774 177 gi|704001763|gb|KM360047.1| 5 37 49M1I39M = 82574 82581 AACGACGGGAATTGAACCCGCGCATGGTGGATTCACAATCCACTGCCTTGATCCACTTGGCTACATCCGCCCCTTTACTAATATCTCGT GGGGG
D00735:83:C8TJUANXX:4:2105:1447:79915 153 gi|704001763|gb|KM360047.1| 5 37 49M1I35M = 5 0 AACGACGGGAATTGAACCCGCGCATGGTG
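The second column of each record is the SAM FLAG, a bit field. A minimal Python decoder using the bit meanings from the SAM specification (the labels here are informal) turns the four values in this excerpt into readable properties. The records with flags 73 and 153 are reads whose mate is unmapped, which is consistent with their mates having been removed by the -F 4 filter.

```python
# FLAG bit meanings from the SAM specification (labels are informal).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """List the properties set in a SAM FLAG value, lowest bit first."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


for flag in (73, 145, 177, 153):
    print(flag, decode_flag(flag))
# 73 ['paired', 'mate unmapped', 'first in pair']
# 145 ['paired', 'reverse strand', 'second in pair']
# 177 ['paired', 'reverse strand', 'mate reverse strand', 'second in pair']
# 153 ['paired', 'mate unmapped', 'reverse strand', 'second in pair']
```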
The first record (read D00735:83:C8TJUANXX:4:2208:17299:69297) does indeed have a position of 0 for both itself and its mate. Its flag, 73, marks it as a mapped read whose mate is unmapped, so I'm guessing this has to do with how BWA handles unmapped mates, combined with the fact that the chloroplast genome is circular. Any idea how I can keep this from happening in the future? I'd like to run these samples through IndelRealigner, and I need AddOrReplaceReadGroups working first.
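One possible stopgap while the root cause is investigated, assuming it is acceptable to discard the few reads reported at position 0, is to drop any record whose RNAME is set but POS is 0, or whose RNEXT is set but PNEXT is 0, between samtools and Picard. This is a hypothetical filter, not a Picard or samtools feature. It could sit in a pipeline such as `samtools view -h in.bam | python drop_zero_pos.py | samtools view -bS - > out.bam`, where the script name is made up for illustration.

```python
import io
from typing import Iterable, TextIO


def has_zero_coordinate(fields: list[str]) -> bool:
    """True if RNAME is set but POS is 0, or RNEXT is set but PNEXT is 0."""
    rname, pos, rnext, pnext = fields[2], int(fields[3]), fields[6], int(fields[7])
    return (rname != "*" and pos == 0) or (rnext != "*" and pnext == 0)


def filter_sam(lines: Iterable[str], out: TextIO) -> int:
    """Copy header lines and valid records to `out`; return how many were dropped."""
    dropped = 0
    for line in lines:
        if line.startswith("@") or not has_zero_coordinate(line.split("\t")):
            out.write(line)
        else:
            dropped += 1
    return dropped


# In a standalone script (after `import sys`): filter_sam(sys.stdin, sys.stdout)
sample = io.StringIO(
    "@SQ\tSN:ref\tLN:1000\n"
    "r1\t73\tref\t0\t25\t5M\t=\t0\t0\tACGTA\tIIIII\n"
    "r2\t99\tref\t20\t60\t5M\t=\t40\t25\tACGTA\tIIIII\n"
)
kept = io.StringIO()
print(filter_sam(sample, kept))  # 1
```

The surviving mate of a dropped read still points at it, so stricter checks (for example Picard's ValidateSamFile) may still complain. It is worth logging the dropped count to see how much data this discards.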
I'm using BWA version 0.7.5a-r405, samtools version 0.1.17 (required by downstream steps in the pipeline), and Picard version 1.128. I also tried Picard 1.141, the most recent version that will run on my cluster, but it failed the same way.
Any advice on how to avoid this problem would be appreciated; I want to automate this part of the processing, and I can't until this is worked out.