Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

How to make Picard and BWA compatible for mtDNA

ericminikelericminikel Member
edited January 2013 in Ask the GATK team

Picard appears not to like the way BWA codes mtDNA. I am doing human exome sequencing using a copy of hg19 which I obtained from UCSC and indexed using BWA per the instructions here:

Example 1

[Tue Aug 28 12:45:16 EDT 2012] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=125435904
FAQ: http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in ISIZE column; Line 3982
Line: FCC0CHTACXX:1:1101:14789:3170#TAGCTTAT 117 chrM 304415842 0 100M = -1610645157 2379906297 TGCGACTTGGAAGCGGATTCAGAGGACAGGACAGAACACTTGGGCAAGTGAATCTCTGTCTGTCTGTCTGTCTCATTGGTTGGTTTATTTCCATTTTCTT [email protected]<:>[email protected]JJJIJIJIJJJJIJJJGIHHGHFFEFFFCCC RG:Z:1868 XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:39G45G14 XA:Z:chrM,-391302964,100M,2;
at net.sf.samtools.SAMTextReader.reportFatalErrorParsingLine(SAMTextReader.java:223)
at net.sf.samtools.SAMTextReader.access$400(SAMTextReader.java:40)
at net.sf.samtools.SAMTextReader$RecordIterator.parseInt(SAMTextReader.java:293)
at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:394)
at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:278)
at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:250)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:641)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:619)
at net.sf.picard.sam.SortSam.doWork(SortSam.java:68)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.SortSam.main(SortSam.java:57)

Example 2

java -jar ~/bin/picard-tools-1.74/MarkDuplicates.jar \
INPUT=1sorted.bam \
OUTPUT=1dedup.bam \
ASSUME_SORTED=true \
METRICS_FILE=metrics \
CREATE_INDEX=true \
VALIDATION_STRINGENCY=LENIENT

...
Ignoring SAM validation error: ERROR: Record 691, Read name FCC0CHTACXX:1:1302:4748:176644#GGCTACAT, Mate Alignment start (436154938) must be <= reference sequence length (16571) on reference chrM
Ignoring SAM validation error: ERROR: Record 692, Read name FCC0CHTACXX:1:2104:8494:167812#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
Ignoring SAM validation error: ERROR: Record 693, Read name FCC0CHTACXX:1:1201:21002:183608#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
Ignoring SAM validation error: ERROR: Record 694, Read name FCC0CHTACXX:1:2303:3184:35872#GGCTACAT, Mate Alignment start (436154812) must be <= reference sequence length (16571) on reference chrM
...

I've truncated the output; in fact it throws such an error for every single line of mitochondrial reads.

I suspect I could solve this by writing my own script to go in and change the way one column is coded, but more broadly, I am interested in the answer to "how do you make BWA, Picard and GATK work seamlessly together without needing to do your own scripting"?

Post edited by Geraldine_VdAuwera on
Tagged:

Best Answer

Answers

Sign In or Register to comment.