# BaseRecalibrator in 2.4-9 RUNTIME ERROR

São Paulo - BrazilPosts: 28Member
edited March 2013

Hello,

Sorry if I missed the same problem in other threads on the forum, but we are having trouble running BaseRecalibrator on one sample and I couldn't find a solution.

I tried several things, and here is what I've found so far:

1 - Other samples work fine

2 - Running Picard ValidateSamFile on realigned.bam (after IndelRealigner) gives many errors:
2a - Mate negative strand flag does not match read negative strand flag of mate
2b - Mate alignment does not match alignment start of mate
3c - Value was put into PairInfoMap more than once. (fatal)

3 - Running BaseRecalibrator with option -L 1:428-249250621 works fine!

After the fact that -L works fine i discarded the problem in vcf files and reference file. I don't know how to go further in this investigation since GATK 1 realined.bam also gives me the errors in (2) and those error are peanuts comparing the total number of reads.

The big difference here is that we are using bwa7.

Any ideas? Thanks!

(I'm filtering out the "secondary hits" reported by bwa7 and will update this thread; if it works, it may be helpful to others in the future.)

GATK output:

INFO 14:11:47,441 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:11:47,443 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.4-9-g532efad, Compiled 2013/03/19 07:35:36
INFO 14:11:47,443 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:11:47,443 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:11:47,447 HelpFormatter - Program Args: -nct 8 -T BaseRecalibrator -I /mnt/work/rlb/pac661825//OUT_661825.realigned.bam -R ../data/databases//1KGP/GRCh37_female_exome_mt1kg.fasta --knownSites ../data/databases//dbSNP/dbSNP_137/00-All.vcf -o /mnt/work/rlb/pac661825//OUT_661825.grp
INFO 14:11:47,447 HelpFormatter - Date/Time: 2013/03/26 14:11:47
INFO 14:11:47,447 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:11:47,447 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:11:47,458 ArgumentTypeDescriptor - Dynamically determined type of ../data/databases/dbSNP/dbSNP_137/00-All.vcf to be VCF
INFO 14:11:47,500 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:11:47,558 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 14:11:47,565 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:11:47,577 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO 14:11:47,587 RMDTrackBuilder - Loading Tribble index from disk for file ../data/databases/dbSNP/dbSNP_137/00-All.vcf
INFO 14:11:47,704 MicroScheduler - Running the GATK in parallel mode with 8 total threads, 8 CPU thread(s) for each of 1 data thread(s), of 8 processors available on this machine
INFO 14:11:47,745 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO 14:11:47,750 GenomeAnalysisEngine - Done creating shard strategy
INFO 14:11:47,750 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:11:47,750 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 14:11:47,773 BaseRecalibrator - The covariates being used here:
INFO 14:11:47,773 BaseRecalibrator - ReadGroupCovariate
INFO 14:11:47,773 BaseRecalibrator - QualityScoreCovariate
INFO 14:11:47,773 BaseRecalibrator - ContextCovariate
INFO 14:11:47,774 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 14:11:47,774 BaseRecalibrator - CycleCovariate
INFO 14:11:47,776 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 14:11:47,777 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 14:12:18,626 ProgressMeter - 1:15956928 1.10e+06 30.0 s 28.0 s 0.5% 95.1 m 94.6 m
INFO 14:12:48,655 ProgressMeter - 1:34102053 2.70e+06 60.0 s 22.0 s 1.1% 89.0 m 88.0 m
INFO 14:13:18,685 ProgressMeter - 1:59096606 4.50e+06 90.0 s 20.0 s 1.9% 77.1 m 75.6 m
INFO 14:13:48,714 ProgressMeter - 1:103467532 5.90e+06 120.0 s 20.0 s 3.4% 58.7 m 56.7 m
INFO 14:14:18,745 ProgressMeter - 1:153234111 7.50e+06 2.5 m 20.0 s 5.0% 49.5 m 47.0 m
INFO 14:14:48,774 ProgressMeter - 1:172414433 9.30e+06 3.0 m 19.0 s 5.7% 53.1 m 50.1 m
INFO 14:15:19,054 ProgressMeter - 1:208266349 1.10e+07 3.5 m 19.0 s 6.9% 51.3 m 47.8 m
INFO 14:15:49,095 ProgressMeter - 1:247611815 1.27e+07 4.0 m 19.0 s 8.2% 49.3 m 45.2 m
INFO 14:15:56,507 GATKRunReport - Uploaded run statistics report to AWS S3

##### ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (0) > (-1) STOP -- this should never happen -- call Mauricio!
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:230)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:218)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

##### ERROR ------------------------------------------------------------------------------------------
Post edited by Geraldine_VdAuwera on

Heng Li has just fixed this bug on BWA.

The latest version in git should work now.

• São Paulo - BrazilPosts: 28Member

I forgot to say that it would be nice to give us Mauricio's phone number!

edited March 2013

It's 'call Mauricio' in the epic sense of the word. Like screaming to the mountains, calling for him.

Post edited by Carneiro on
• São Paulo - BrazilPosts: 28Member
edited March 2013

and he replies!

So, I was wrong about the locus of the problem. It's in 2:7273810-7273820, which still gives me the error. I was able to extract those reads from the BAM, and Picard ValidateSamFile gives me:

Am I on the right path?

Post edited by brdido on
• São Paulo - BrazilPosts: 28Member

I made a mistake; please disregard my last comment. I built the "small BAM" incorrectly: the reads' mates do exist. So I'm back to the problem at the 2:7273810-7273820 locus.

• São Paulo - BrazilPosts: 28Member

The original reference in gatk bundle gives me the same error.

• São Paulo - BrazilPosts: 28Member
edited March 2013

I tested with all reads except the two at the locus above, which I filtered out. It runs until chr10, where it hits the next similar error. So, it must be something in my pipeline before GATK? This was aligned with bwa 0.7 mem, sorted with novosort, then run through IndelRealigner.

Here are the reads I filtered out (the ones I believe are problematic):

HWI-ST993:370:D1PHEACXX:3:1206:13326:14308 147 2 7273691 42 24D101M = 7273694 -122 GGTACTCTGCTGGGCTGGTTCCCTGGTACTCTGCTGGGCTGGTTCCCTGGTACTCAGCTGGCCTGGCCCCCTGCTGCTCTGCTAGCCTCATCCCCTGCCAC AA>:@DBCD@BDBABA?BBDB@>@ RG:Z:661825 NM:i:25 MQ:i:42 AS:i:66 XS:i:54
HWI-ST993:370:D1PHEACXX:3:1206:13326:14308 99 2 7273694 42 101M = 7273691 122 ACTCTGCTGGGCTGGTTCCCTGGTACTCTGCTGGGCTGGTTCCCTGGTACTCTGCTGGGCTGGTTCCCTGGTACTCAGCTGGCCTGGCCCCCTGCTGCTCT @CCFFFFFGGHHHJDBGGIJJJJHGIJJJEHIIHIJIIJJJJJJJJJFHIGHIJIJJ@CHIIJGHIEHHHEBCFE?DAEEEDDDDBDDDDDDDDDDCDCDA RG:Z:661825 NM:i:1 MQ:i:42 AS:i:96 XS:i:77
@CFFFFFFD?FFIIJIJJJJJFIJIDHIEIIEIICGGGGGHGHFGFGHGCGGEDHGFGHGIEGCEHCFHE@;CDDECEEECDECAACCCDCDDDDC@EDEDEEDEDDFFEHHFEEHHE=JIIGIHHEDDGGAD<IIGIJIHFBGDGDIHIHHGHGFIGIGHHHHDFFDDB@@ RG:Z:661825 NM:i:0 MQ:i:60 AS:i:101 XS:i:0
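
As a side check on the records above, the sketch below decodes the two complete FLAG values (147 and 99) using the bit definitions from the SAM specification and tests whether each record's "mate reverse strand" bit agrees with the other record's own "read reverse strand" bit. That is the idea behind validation message 2a. The function name `mate_strand_consistent` is made up for illustration, and this is not Picard's implementation.

```python
# FLAG bits as defined in the SAM specification.
READ_REVERSE = 0x10   # this read is on the reverse strand
MATE_REVERSE = 0x20   # this read's mate is on the reverse strand


def mate_strand_consistent(flag_a, flag_b):
    """Return True if each record's MATE_REVERSE bit matches the
    READ_REVERSE bit of the other record in the pair."""
    a_is_reverse = bool(flag_a & READ_REVERSE)
    b_is_reverse = bool(flag_b & READ_REVERSE)
    a_says_mate_reverse = bool(flag_a & MATE_REVERSE)
    b_says_mate_reverse = bool(flag_b & MATE_REVERSE)
    return a_says_mate_reverse == b_is_reverse and b_says_mate_reverse == a_is_reverse


# FLAG values of the two complete records posted above.
print(mate_strand_consistent(147, 99))  # prints True
```

Under this simplified check the pair is consistent, and the POS and mate-position fields (7273691 and 7273694) also point at each other, so messages 2a and 2b would not be expected for these two records.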

Post edited by brdido on

Even if there is something wrong with your reads, the GATK should handle the error gracefully and tell you what the problem is rather than blowing up, so we will look at this more closely. We may need some more info from you to figure this out -- please stand by and we'll get back to you asap.

Geraldine Van der Auwera, PhD

• São Paulo - BrazilPosts: 28Member

Thanks! I'll be glad to help. And i'll be updating the issue if I have any more clues.

edited March 2013

There are several things here that make me nervous.

First your BAM file is not passing validation. You say you are using bwa7, are you referring to BWA 0.7.3a ? If so, are you using bwa aln, sw or mem? If it's mem, are you using the -M flag? All are good leads as to why your BAM is malformed.

Second, You say that the validation of the BAM file only incur errors after Indel Realignment? Is it passing validation before indel realignment?

Third, you should never run BaseRecalibrator with -L unless you really know what you're doing. -L 1:428-249250621 is not nearly enough data to calibrate the error model for base recalibration. When you say 'it works fine' what do you mean? That it doesn't give you errors, or that it produces the right output? Have you looked at the recalibration plots?

I did not understand this sentence:

"After the fact that -L works fine i discarded the problem in vcf files and reference file. I don't know how to go further in this investigation since GATK 1 realined.bam also gives me the errors in (2) and those error are peanuts comparing the total number of reads."

can you clarify?

Post edited by Carneiro on
• São Paulo - BrazilPosts: 28Member
edited March 2013

> First your BAM file is not passing validation. You say you are using bwa7, are you referring to BWA 0.7.3a ? If so, are you using bwa aln, sw or mem? If it's mem, are you using the -M flag? All are good leads as to why your BAM is malformed.

Yes, I'm referring to BWA 0.7.3a-r367, using 'bwa mem' with the -M flag.
It's important to note that I'm using novosort and Picard MarkDuplicates before getting into GATK. I'm reviewing each step right now.

> Second, You say that the validation of the BAM file only incur errors after Indel Realignment? Is it passing validation before indel realignment?

Errors 2a and 2b also occur before indel realignment, BUT error 2c (3c in the original message, sorry about that) only happens after IndelRealigner.

> Third, you should never run BaseRecalibrator with -L unless you really know what you're doing. -L 1:428-249250621 is not nearly enough data to calibrate the error model for base recalibration. When you say 'it works fine' what do you mean? That it doesn't give you errors, or that it produces the right output? Have you looked at the recalibration plots?

I only used the -L option to narrow down where the error occurs. When I say 'it works fine', I mean I don't get the error.

> I did not understand this sentence:
>
> "After the fact that -L works fine i discarded the problem in vcf files and reference file. I don't know how to go further in this investigation since GATK 1 realined.bam also gives me the errors in (2) and those error are peanuts comparing the total number of reads."

I mixed up 2 things:

1 - The -L parameter gave me a hint that the reference file and the VCF are OK.
2 - I get the same errors (2a and 2b) from SamFileValidator with the bwa 0.6.2-r126 + GATK-1.6-13 pipeline, and GATK doesn't complain about them. Considering the number of reads in the sample, I wouldn't worry about it, because there are only a few of these errors (2a and 2b).

Is it clearer now?

Post edited by Geraldine_VdAuwera on
edited March 2013

2a and 2b are intrinsic to your data, but shouldn't matter to the GATK.

Error 2c will only happen if you use BWA MEM, because it splits single read alignments into multiple ones. I have updated the indel realigner to understand those reads, but we would have to check it on your data to see what's going on.

Geraldine can coordinate with you to send us a snippet of the offending BAM file so we can debug the Indel Realigner to see what's happening here.
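
One plausible way to look for the split alignments mentioned above is to count how many primary records exist for each (read name, mate number) combination; more than one suggests the aligner emitted several alignments for the same read. The sketch below works on SAM text lines. The function name `repeated_mates` and the demo records are made up for illustration, the 0x800 (supplementary) bit comes from the current SAM specification rather than from this thread, and none of this is Picard's actual logic.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def repeated_mates(sam_lines):
    """Return sorted (read name, mate number) keys that have more than
    one record not flagged as secondary or supplementary."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        counts[(name, mate)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


def _demo_record(name, flag):
    """Build a synthetic 11-column SAM record (illustrative values only)."""
    return "\t".join([name, str(flag), "2", "100", "60", "10M", "=",
                      "200", "100", "ACGTACGTAC", "IIIIIIIIII"])


demo = [_demo_record("readA", 99), _demo_record("readA", 147),
        _demo_record("readB", 65), _demo_record("readB", 65),
        _demo_record("readB", 129)]
print(repeated_mates(demo))
```

In the demo, `readB` has two unflagged first-in-pair records, so it is the only key reported. On real data the lines could come from a SAM file or from a BAM converted to SAM text.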

Post edited by Carneiro on

Hi @brdido,

As Mauricio said you'll need to upload a snippet of your bam file so we can reproduce the error locally. Please see the detailed instructions here: http://www.broadinstitute.org/gatk/guide/article?id=1894

Let me know if you need any help to do this.

Geraldine Van der Auwera, PhD

• São Paulo - BrazilPosts: 28Member

Ok! Thanks a lot. I'll prepare the files needed and get back to you.

• Posts: 3Member

Hi,

I seem to be getting the same error. We have reads mapped with stampy, duplicates removed with Picard, and indel realignment done with GATK. Then I attempted to run BaseRecalibrator:

Args: -T BaseRecalibrator -R ../../pantro3/panTro3.bamorder.fasta -I ../indel_realigner/dennis.realigned.bam -knownSites ../init_calling_per_chrom/pedigree.gatk.raw.vcf -o dennis.recal.grp

Error:

INFO 20:44:03,973 ProgressMeter - chr10:130748575 1.03e+08 2.9 h 100.0 s 10.9% 26.3 h 23.5 h
INFO 20:45:03,991 ProgressMeter - chr10:132896073 1.03e+08 2.9 h 100.0 s 10.9% 26.3 h 23.5 h
INFO 20:46:04,009 ProgressMeter - chr10_GL391380_random:214 1.04e+08 2.9 h 100.0 s 11.0% 26.4 h 23.5 h
INFO 20:47:04,029 ProgressMeter - chr10_AACZ03166204_random:694 1.04e+08 2.9 h 100.0 s 11.0% 26.5 h 23.6 h
INFO 20:48:04,058 ProgressMeter - chr10_AACZ03166550_random:1274 1.05e+08 2.9 h 100.0 s 11.0% 26.6 h 23.6 h

##### ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (100) > (99) STOP -- this should never happen -- call Mauricio!
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:230)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:218)

##### ERROR ------------------------------------------------------------------------------------------

INFO 20:48:45,045 HelpFormatter - --------------------------------------------------------------------------------

Let me know if you need any data to debug.

Isaac

Hi Isaac,

Yes, it would help if you could upload a snippet of your file to our FTP server. Please see the article I linked to above for full instructions.

Geraldine Van der Auwera, PhD

• São Paulo - BrazilPosts: 28Member

Geraldine and Mauricio, I've uploaded the files to your FTP server: RLB/bugReportBrdido4725.tar.gz

If anything else is needed please let me know. Thanks!

• São Paulo - BrazilPosts: 28Member

Thanks @Carneiro, the problem is before IndelRealigner. Cheers.

• Posts: 6Member
edited April 2013

I wanted to report that I had the same error. I used bwasw. The issue was corrected after including the flag -rf BadCigar as suggested above.

bwa version: 0.7.3a

Post edited by Mutagenic on
• Posts: 61Member ✭✭

Same identical error here.

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (100) > (99) STOP -- this should never happen -- call Mauricio!

I'll try to upload my file tomorrow. Thanks for all your help guys!

I don't think we need a file for this -- version 2.5 should now catch this issue cleanly. Can you please upgrade your gatk version and run again?

Geraldine Van der Auwera, PhD

• Posts: 61Member ✭✭

Unfortunately it's happening again.

The version is 2.5-2-gf57256b
and the error looks very similar (except the call to Mauricio :-P)

##### ERROR stack trace
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (100) > (99) STOP -- this should never happen, please check read: FCD1R7BACXX:2:2101:19788:52999#ATGAACCT 1/2 100b aligned read. (CIGAR: 94M4I2M3D)

Ah, it's an issue with the cigar -- the read ends in deletions. This is undesirable -- we've seen this in data output by BWA-mem. More recent versions of BWA should no longer do that. If you don't want to have to realign your data, use the bad cigar filter (add -rf BadCigar to your command).
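
To make the diagnosis concrete, here is a minimal sketch of a check for CIGAR strings that begin or end with a deletion, the pattern described above. It is an illustration only, not the GATK BadCigar filter itself: the helper names are hypothetical, and dropping clipping (S, H) and padding (P) operators before looking at the edges is an assumption of this sketch.

```python
import re

# A CIGAR is one or more <length><operator> pairs; operators are from the SAM specification.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]


def has_edge_deletion(cigar):
    """Return True if the first or last operator is a deletion (D)."""
    if cigar == "*":  # CIGAR unavailable, e.g. an unmapped read
        return False
    # Assumption: S, H and P operators are dropped before checking the edges.
    ops = [op for _, op in parse_cigar(cigar) if op not in "SHP"]
    return bool(ops) and (ops[0] == "D" or ops[-1] == "D")


# The CIGAR from the error message, one from a read posted earlier in
# the thread, and an ordinary full-length match for comparison.
for cigar in ("94M4I2M3D", "24D101M", "101M"):
    print(cigar, has_edge_deletion(cigar))
```

The first two strings are flagged (one ends with `3D`, the other starts with `24D`), while `101M` is not.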

Geraldine Van der Auwera, PhD

• Posts: 61Member ✭✭

thanks @Geraldine_VdAuwera,
It's strange, because I'm using the very latest BWA version. Maybe I should report the issue there as well.

I imagine I can add "-rf BadCigar" as an option in my Scala script.

Would you be so kind as to clarify what the equivalent is for Queue?
It might be useful for others as well.

thank you so much!

flescai, read filters ("-rf") are engine arguments and are therefore available to every walker, in Queue or on the command line. You can add it to your Queue script directly.

• Posts: 1Member

I'm getting the same error from alignments with bwa-0.7.4-r385.
Seems to be solved with the -rf BadCigar argument

Note that the BadCigar filter doesn't "solve" anything. It just discards the reads with malformed (unsupported) CIGARs from your data. Does your CIGAR also have deletions at the ends? If so, you should report this issue to BWA.

• Posts: 16Member

I had this problem when running bwa aln 0.7.3a, but can confirm that updating bwa to 0.7.5a has fixed this for me.