
Unified Genotyper Error

San Diego, CA · Member · Posts: 13

Hi. I was hoping you could help track down the source of this error; I haven't seen this exact error anywhere else on the forum. I am trying to run UnifiedGenotyper on 48 BAM files at once. Each BAM file was aligned with BWA and processed according to the Best Practices described on the website. They all pass validation with no errors (I had to run the Picard tools CleanSam and FixMateInformation, and remove one read from a single file). The data are from targeted sequencing. I am running GATKLite 2.3-5-gd738181.

The error message is:
Somehow the requested coordinate is not covered by the read. Alignment 189665677 | 48M103S

The command line I am using is:
java -Xmx4g -jar pathToGATK.jar -R standardRefFileHG37 -I (48 bam files) -o OutputFile.vcf -nt 4 -nct 4 -T UnifiedGenotyper --dbsnp standardRef -glm Both -metrics metric.file -debug_file debug.txt -A DepthOfCoverage -A AlleleBalance
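With 48 inputs it can be easier to keep the BAM paths in a text file: GATK's -I also accepts a file ending in .list with one BAM path per line (as in the reducedbams.list example further down this thread). A minimal bash sketch that writes such a list and warns about any BAM without an index next to it; the function name and paths are hypothetical:

```bash
# Hypothetical helper: write one BAM path per line to a .list file for -I,
# warning about any BAM that has neither a name.bam.bai nor a name.bai index.
make_bam_list() {
  local out="$1"; shift
  local bam
  : > "$out"                          # start with an empty list file
  for bam in "$@"; do
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
      echo "warning: no index found for $bam" >&2
    fi
    printf '%s\n' "$bam" >> "$out"
  done
}

# Usage (paths are placeholders):
#   make_bam_list bams.list /data/targeted/*.bam
#   java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I bams.list -o out.vcf
```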

Is there any way to see whether a particular BAM file is causing the error? In the error message, what Alignment is it referring to? I am doing the processing on a machine that is not connected to the internet, so it is harder for me to copy and paste information here. Is there anything else you would need to help me track down the source of this error?
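One way to approach the first question, assuming samtools is available: the error line appears to give the read's alignment start followed by its CIGAR string (e.g. 189665677 | 48M103S), so each input BAM can be scanned for a read with exactly that start and CIGAR. The sketch below makes a full pass over every file, which is slow but needs no contig name (the error does not print one). The function name is hypothetical:

```bash
# Hypothetical helper: scan each BAM for reads whose POS (column 4) and CIGAR
# (column 6) match the values printed in the error, and print
# "<bam> <read name> <contig> <read group>" for every hit.
find_offending_read() {
  local pos="$1" cigar="$2"; shift 2
  local bam
  for bam in "$@"; do
    samtools view "$bam" |
      awk -F '\t' -v pos="$pos" -v cigar="$cigar" -v bam="$bam" '
        $4 == pos && $6 == cigar {
          rg = "NA"
          for (i = 12; i <= NF; i++)          # optional tags start at column 12
            if ($i ~ /^RG:Z:/) rg = substr($i, 6)
          print bam, $1, $3, rg
        }'
  done
}

# Usage:
#   find_offending_read 189665677 48M103S sample*.bam
```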

In the meantime I am going to try to run it with a .bed file of targeted areas to see if that helps. Should I also try to process each file individually? Thank you for your input.

Lisa


Hi Lisa, can you tell me if you reduced your bams with ReduceReads?

Geraldine Van der Auwera, PhD

• San Diego, CA · Member · Posts: 13

I did not, as ReduceReads is not available in GATK Lite. My coverage is expected to be at most 80X, so I didn't think it mattered much; my files aren't that big.

OK, just checking -- reduced bams were throwing a similar error a little while ago and I wanted to rule that out.

I'm not sure what is going on, so let me know if you still get the same error when you run with the intervals file. FYI, for targeted sequencing you'll get better performance if you use the intervals file at all steps.
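Following that advice, a minimal bash sketch that passes the same target file to every GATK step through -L (GATK accepts BED files there). The jar, reference and BED paths are placeholders, and the function name is hypothetical:

```bash
# Placeholder paths: point these at your own files.
GATK_JAR=GenomeAnalysisTK.jar
REF=human_g1k_v37.fasta
TARGETS=targets.bed

# Run one GATK walker restricted to the capture targets; any extra
# arguments (inputs, outputs, walker options) are passed through unchanged.
run_targeted() {
  local walker="$1"; shift
  java -Xmx4g -jar "$GATK_JAR" -T "$walker" -R "$REF" -L "$TARGETS" "$@"
}

# Usage:
#   run_targeted RealignerTargetCreator -I sample.bam -o sample.intervals
#   run_targeted IndelRealigner -I sample.bam -targetIntervals sample.intervals -o sample.realigned.bam
#   run_targeted UnifiedGenotyper -I bams.list -glm BOTH -o calls.vcf
```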

Geraldine Van der Auwera, PhD

• San Diego, CA · Member · Posts: 13

Ok. It now finished without any error when using the targeted sequence file. I'm guessing there was probably just some sort of funny alignment in a file. Thank you.

• San Diego, CA · Member · Posts: 13

I just wanted to post some follow up with this in case other people may be having this error. I have found this error to be wholly unreproducible. I have run into it again when running different parameters. A re-start of the run often "fixes" it, i.e. it doesn't happen again and the run finishes through. I've also gotten an error when running something again that had finished the first time through with no issues. The Alignment number in the error message changes along with the text string after the pipe sign. Still don't know what is going on with it.
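Since a restart often gets past the error, a retry wrapper is a possible stopgap until the cause is found. It only hides the symptom, and every attempt re-runs the whole job. A minimal bash sketch (the function name is hypothetical):

```bash
# Run a command up to MAX times, stopping at the first successful (exit 0) run.
retry() {
  local max="$1"; shift
  local attempt=1
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying, attempt $attempt of $max" >&2
  done
}

# Usage (GATK arguments elided):
#   retry 3 java -Xmx4g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper ... -o out.vcf
```

Queue, mentioned later in this thread, gives similar behaviour at a finer grain by re-running only the parts of a job that failed.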

Thanks for following up. It will be interesting to see if this bug still occurs after release 2.4 (coming really really soon!) as we added a lot of internal tests that should prevent subtle bugs from getting out.

Geraldine Van der Auwera, PhD

• Member Posts: 6

Hi, sorry for tagging on to this thread, but I believe I have the same problem - I can open a new thread if you wish.

Like Lisa, I have experienced this error "randomly" and usually re-running works. I have tried GATK versions 2.4-9-g532efad and 2.5-2-gf57256b. Will test the nightly build today.

I get the same stack trace every time:

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 8358067 | 20M1I36M87S
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:268)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:256)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

ERROR ------------------------------------------------------------------------------------------

but different Alignments:

ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 8358067 | 20M1I36M87S

I have not ReduceReads'd the bam files.

cheers,
Pallo

Hi Pallo,

Let me know if the nightly build fixes that for you. This is clearly an issue with soft-clipped reads but I'm not aware of any outstanding problems with those in the current version.
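For readers puzzling over "Alignment 8358067 | 20M1I36M87S": the first number appears to be the read's alignment start (SAM POS) and the text after the pipe its CIGAR. In SAM, the operations M, D, N, = and X consume reference bases, while I, S, H and P do not, and POS is the first aligned (not soft-clipped) base. So the aligned part of a read ends at POS plus the reference-consuming lengths minus one, and soft-clipped bases have no reference coordinate at all. A bash sketch of that arithmetic (the function name is hypothetical):

```bash
# Hypothetical helper: given a read's alignment start (SAM POS, 1-based) and
# its CIGAR, print the reference coordinate of the last aligned base.
cigar_ref_end() {
  local pos="$1" cigar="$2" span=0 len op
  local re='^([0-9]+)([MIDNSHP=X])(.*)$'
  while [[ $cigar =~ $re ]]; do
    len=${BASH_REMATCH[1]}
    op=${BASH_REMATCH[2]}
    cigar=${BASH_REMATCH[3]}
    case $op in
      M|D|N|'='|X) span=$((span + len)) ;;   # these operations consume reference bases
      *) ;;                                  # I, S, H and P do not
    esac
  done
  if [[ -n $cigar ]]; then
    echo "could not parse CIGAR remainder: $cigar" >&2
    return 1
  fi
  echo $((pos + span - 1))
}
```

For example, cigar_ref_end 189665677 48M103S prints 189665724: only 48 bases are aligned, and the 103 soft-clipped bases are not covered. One plausible reading of the error, hedged because the GATK code is not shown here, is that the engine asked such a read for a position that falls in its clipped tail.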

Geraldine Van der Auwera, PhD

• Member Posts: 6

Hi Geraldine, thanks for the reply.

No, build nightly-2013-06-05-g83f341b did not solve the problem. Ran the same job in parallel in two different directories, one finished, other gave same error as described above. I should point out that we have been having some file system hangups for the last couple of months, just in case that has anything to do with this GATK error, but as you point out, the cigar string rather suggests that this has to do with handling of soft clipped or otherwise non-perfectly mapped reads...

cheers,
Pallo

I see, thanks for checking. Could you please upload a bam snippet that reproduces the error so we can debug locally? Detailed instructions are here:
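The link to those instructions is missing from this copy of the thread. As a generic sketch (not those instructions), samtools can cut the reads overlapping a small window out of each indexed BAM and index the result. The region and function name are placeholders:

```bash
# Hypothetical helper: extract the reads overlapping REGION (e.g. chr1:1000-2000)
# from each coordinate-sorted, indexed BAM into <name>.snippet.bam and index it.
make_snippets() {
  local region="$1"; shift
  local bam out
  for bam in "$@"; do
    out="$(basename "$bam" .bam).snippet.bam"
    samtools view -b -o "$out" "$bam" "$region" && samtools index "$out"
  done
}

# Usage (placeholder region and files):
#   make_snippets chr1:8357000-8359000 sample1.bam sample2.bam
```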

Geraldine Van der Auwera, PhD

• Member Posts: 21

Hi Dear GATK Team,
I've got a similar error message using the UnifiedGenotyper...

the abbreviated part of my code is as follows:

function UnifiedGenotyper () {
$JAVA -Xmx4g -jar $ProgDir$GATK/GenomeAnalysisTK.jar \
-T UnifiedGenotyper \
-R $reference \
-L $region \
-I ./$Reg/$Reg.55069.realigned.recal.bam \
... ... ...
-I ./$Reg/$Reg.68332.realigned.recal.bam \
-nct 4 \
-o ./$Reg/$Reg.ALL.UG.raw.vcf \
--dbsnp $dbSNP_work \
-out_mode EMIT_VARIANTS_ONLY \
-stand_call_conf 30.0 \
-stand_emit_conf 30.0 \
--genotype_likelihoods_model BOTH \
-l INFO \
-log ./$Reg/$Reg.ALL.UnifiedGenotyper.log
}
UnifiedGenotyper

I then get the following error message:

##### ERROR ------------------------------------------------------------------------------------------

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Alignment 1500564 | 33S24M1I12M1I11M1I15M3S
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

ERROR ------------------------------------------------------------------------------------------

I hope it's OK to post this here...

The HaplotypeCaller is still running (it's been 11 days now...), but it seems to have made it past this position.
Thanks in advance for any insight.
Cheers,
Chris

@baesc, could you please upload a test file that reproduces the error? See link in my previous post for instructions.

Geraldine Van der Auwera, PhD

• Member Posts: 21

@Geraldine_VdAuwera, sorry for the delayed response. I couldn't reproduce the error... which I find pretty weird. I ran a script over 65 animals (Bos taurus) and found 7 of them with this exact error (but at different positions) in the log file. I'll have a look at the others and try to reproduce it with them.
Have you had other stuff like that happening with the UnifiedGenotyper?
Cheers,
Chris

Darn, the Heisenbug pops up again... We've never had it happen in our hands but other users have reported it as a transient bug that goes away when they run the job again. Unfortunately without a test file we can't debug it. This is where Queue really comes in handy because it will automatically re-run the parts of the job that failed without re-running the whole thing.

Geraldine Van der Auwera, PhD

• Member Posts: 3

So we continue to get this error when trying to run UG on 1186 BAM files. The error happens at different positions, and I can't reproduce the same position from one run to the next. Since this thread suggests it may be an intermittent bug, we checked out the latest version from Git today (July 20, 2013) and edited the code so that the error wouldn't kill the run. However, we're now getting an error from that build:

ERROR MESSAGE: The likelihoods model provided for the -glm argument (SNP) is not a valid option: null

This seems to be a different bug altogether, perhaps related to the nightly version of the source? I'm not sure, but no matter what I do with the -glm flag, it fails. I assume you may be changing how UG handles -glm. Ultimately, my goal is simply to get joint variant calling running without the original "Somehow the requested coordinate is not covered by the read." error...

Thanks in advance for any thoughts on this.

Hi @perin,

As far as I know we haven't touched the -glm argument at all anytime recently. There's no reason for it to not be working, and the current version in the Github repo passes all tests. You say you edited the code -- are you sure you didn't break something there?

Geraldine Van der Auwera, PhD

• Member Posts: 3

Thanks Geraldine. I suppose our edit may have introduced the -glm issue. You are right, the untouched version works fine. Sorry about that. I do want to keep trying to solve this problem, as it continues to plague us and I haven't found a permanent solution. We're trying to run our samples on smaller regions, on the assumption that this may be a resource allocation issue. I am running UnifiedGenotyper on a list of 1186 BAM files, each a WES sample with roughly 30x coverage or more. The server we are using has 96 GB of RAM, dual 8-core Xeon CPUs, and a large SSD for high I/O.

My first attempt was to run on only chr1, which again failed on us:

ERROR ------------------------------------------------------------------------------------------

Every time I run this, I get the same error, but with a different alignment. I am also not sure how to figure out exactly where this coordinate is, or which sample or samples are contributing to the problem.

The only solution I can gather from my searching is to 'rerun'. But this happens every time, and in different places. I'm at a bit of a loss right now and looking for any advice. This seems to be a pretty persistent bug, so I'd definitely like to figure out the real reason for it and how to fix it.

Thank you.

Hmm, it looks like it's always happening in the presence of soft-clips -- weird that it's not consistently reproducible. Well, if you can narrow down the error to a region where the error reproduces every time (even if it's at different alignments) and small enough that you can reasonably upload the files to us, we'd be happy to take a stab at debugging this. But without test files we can't do anything...

Geraldine Van der Auwera, PhD

• Member Posts: 59

Hi, I have two problems here. The first one is not related to GATK; it's about BWA. When I run bwa-mem on paired-end files of around 8 GB each, I get a SAM file of about 16 GB, which is much smaller than what I usually get (around 40-50 GB). When I run GATK on that file, all steps seem to run fine, but the last step, where I call UnifiedGenotyper, throws the following error. The file produced by BWA seems to be valid, but for some reason I am getting this error. Could you kindly let me know how to correct it?

java.lang.IllegalArgumentException: FragmentUtils.create assumes that the incoming objects are ordered by SAMRecord alignment start, but saw a read C12222485 0 scaffold_1649 -23 1 90M * 0 0 TAAAATAGTCATACATAAAATACTTATAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATCCAAAAATTTTTTTATTT 9:::9;87;87:67::;:::::7:::::98:4998887787677877788788889999986898998976;9-5,,,,(8;:56*.845 BD:Z:GECEFHFIHHEHGHFBFFGDGHGFEEFGHFFCCCDCBBCCBBBBBAACCCDCBCEFFEGEEEFCFDFFCFDFFFBBCHGDDDEEJIFBFF RG:Z:Maverick BI:Z:FECDEFDEFFEFFEFBFFEDEFEEEDFEGFFDDCCCBBBCBBBBBBBBCBCCBBDEDCEDDDDCEDEECEEFFFBBBEGCCDCEHHGCFF RR:B:c,2,0,-1,0,1,-1,0,0,-1,0,0,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,1,2,2,1,1,3,3,2,2,2,2,3,2,3,3,2,3,5,2,1,4,3,4,3,1,4,2,4,4,5,3,5,2,3,1,1,2,1,1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1


Your first question is out of scope for us -- we provide support for GATK, not for BWA. If you have BWA-specific problems, it would be best to contact the author of BWA for support.

As to your second question, it looks like your BAM file is not properly sorted. See our documentation on input files and requirements for more details. You can use Picard tools to sort the file correctly.
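A minimal sketch of the Picard sort, assuming the standalone SortSam.jar distributed with Picard at the time (newer Picard releases bundle it as java -jar picard.jar SortSam with the same options). The jar location and function name are placeholders:

```bash
# Placeholder location of the Picard SortSam jar.
PICARD_SORTSAM=SortSam.jar

# Coordinate-sort a SAM/BAM and write an index next to the output.
sort_bam() {
  local in="$1" out="$2"
  java -Xmx4g -jar "$PICARD_SORTSAM" \
    INPUT="$in" \
    OUTPUT="$out" \
    SORT_ORDER=coordinate \
    CREATE_INDEX=true
}

# Usage:
#   sort_bam aligned.sam aligned.sorted.bam
```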

Geraldine Van der Auwera, PhD

Actually -- if you used ReduceReads, your second issue might be a bug that was introduced at that stage. Can you try running UnifiedGenotyper on the unreduced bam? If that works without error, then we'll need to get a test file from you to debug the ReduceReads error.
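The read quoted in that error has an alignment start of -23 on scaffold_1649, which is not a valid SAM position, and it carries an RR tag that looks like the reduced-read count tag. A quick bash/awk sketch for spotting such records, or positions that go backwards within a contig, in SAM text; the function name is hypothetical:

```bash
# Hypothetical helper: read SAM text on stdin and report records whose POS is
# negative or decreases within a contig. Exits 1 if anything was reported.
check_positions() {
  awk -F '\t' '
    /^@/ { next }                               # skip header lines
    $4 < 0 {
      print "negative POS: " $1 " " $3 ":" $4
      bad = 1
      next
    }
    $3 == prev_contig && $4 < prev_pos {
      print "out of order: " $1 " " $3 ":" $4 " after " prev_pos
      bad = 1
    }
    { prev_contig = $3; prev_pos = $4 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   samtools view -h sample.reduced.bam | check_positions
```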

Geraldine Van der Auwera, PhD

• Member Posts: 3

FWIW, I continued to get the "Somehow the requested coordinate is not covered by the read. Alignment etc..." error. No matter what I tried, or on which machine, it didn't work: it would always fail at some point, and never at the same position. I have no idea why this happens or why it's random, but as was suggested, it is likely related to soft clipping by Novoalign. I have found a working solution for my problem, however, and it happened to improve our speed dramatically as well.

My problem was running UG on 1186 whole-exome (average 60x coverage) BAM files. I first broke our genome up into about 600 pieces using the logic from this Biostar discussion: http://www.biostars.org/p/77828/#77934 (specifically Pierre's solution)

I then wrote a script that ran UG (v2.6.4) on each of these pieces individually across our cluster. On our system this took only about 24 hours to complete, not counting merging and re-sorting the VCFs. Originally, UG was estimating 5-6 weeks of runtime for the single run on a really powerful server. None of these smaller pieces failed in UnifiedGenotyper.
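Pierre's exact approach is on the linked Biostar page and is not reproduced here. As a simpler stand-in, this sketch cuts every contig listed in a samtools faidx index (.fai: name in column 1, length in column 2) into fixed-size pieces written as contig:start-end, a form -L accepts. Plain fixed cuts can split a variant or indel at a piece boundary, which a smarter split would avoid. Names are hypothetical:

```bash
# Hypothetical helper: read a .fai index on stdin and print 1-based, inclusive
# intervals of at most CHUNK bases per contig.
make_intervals() {
  local chunk="$1"
  awk -F '\t' -v chunk="$chunk" '{
    for (start = 1; start <= $2; start += chunk) {
      stop = start + chunk - 1
      if (stop > $2) stop = $2          # last piece stops at the contig end
      print $1 ":" start "-" stop
    }
  }'
}

# Usage:
#   make_intervals 5000000 < ucsc.hg19.fasta.fai > pieces.txt
#   then submit one UnifiedGenotyper job per line, e.g. with -L "$piece"
```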

Great, thanks for reporting your solution!

You can also use Queue to scatter gather pieces of your genome, which will handle the merging process for you.

Geraldine Van der Auwera, PhD

• Member Posts: 2

Hello Geraldine,

I believe I'm running into the ReduceReads bug you mention above. I'm using a series of steps from the best practices in a quite large set of samples, and this is the first time I've encountered this error.

The error looks like this:

ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):

ERROR with alignment start -5 before the previous start -1

As I mentioned before, the previous step in the pipeline is ReduceReads and it finished alright.
I would rather not change the GATK version at this point since that would mean either rerunning all samples or using different versions in the same analysis. What do you think about this?

Thanks a lot,
Camille

Hi Camille,

What you could do as a compromise is reprocess the problem sample (specifically, the unreduced bam) through ReduceReads with the latest GATK to get a non-buggy reduced bam, then resume analysis at the UG step with the same version as the rest of your samples. It's not perfect (the ideal is of course to reprocess everything) but I think that should have minimal impact on your analysis.
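A sketch of that compromise, with two placeholder jar paths: the newer GATK is used only to re-reduce the problem sample, and calling keeps the version used for everything else. The function name is hypothetical:

```bash
# Placeholder jar locations: the newer release only for ReduceReads,
# the original 2.5-2 release for calling, as for the other samples.
NEW_GATK=gatk-latest/GenomeAnalysisTK.jar
OLD_GATK=gatk-2.5-2/GenomeAnalysisTK.jar
REF=reference.fasta

# Re-reduce one sample's unreduced BAM with the newer GATK.
rereduce_sample() {
  local unreduced="$1" reduced="$2"
  java -Xmx4g -jar "$NEW_GATK" -T ReduceReads -R "$REF" -I "$unreduced" -o "$reduced"
}

# Usage:
#   rereduce_sample sample.recal.bam sample.reduced.bam
#   java -Xmx4g -jar "$OLD_GATK" -T UnifiedGenotyper -R "$REF" -I reduced_bams.list -o calls.vcf
```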

Geraldine Van der Auwera, PhD

• Member Posts: 4

Hi GATK team and @Geraldine_VdAuwera,

We have a recurring problem with this same error message from the UnifiedGenotyper, which stops our SNP/INDEL calling runs dead in their tracks:

##### ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569):
…
##### ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 38928867 | 35S65M1S


In each of several different cases I’ve seen now, the error message is the same but the alignment coordinate and CIGAR string differ.

Our inputs are all human whole genome and exome samples from Illumina HiSeqs. Genome and exome datasets are processed in separate UG runs.

Input BAMS were prepared according to the best practices protocol: bwa / picard / realignment / base recalibration / reduced BAMs. There were no obvious errors from each of these stages of processing.

After seeing this error with reduced BAM inputs, my general troubleshooting pattern is to fall back on un-reduced BAMs and try again. We first encountered this error in v2.6-5, waited for the v2.7 release to try again, and now have encountered the same problem with certain datasets in v2.7-2.

I'm tacking my question onto this thread since other forum posts about this error seem to involve slightly different circumstances (older versions of GATK, etc).

I’m not sure how best to proceed. Can you suggest any other troubleshooting steps at this point? Thanks!

This sounds like an issue we've seen with soft-clipped reads. Technically it's not a bug in GATK, it's a problem with how the aligner expresses the softclips, if I remember correctly. The simplest way to deal with this (if it's the issue I'm thinking of) is to run with -rf BadCigar, which will filter those reads out. You'll want to check the run summary for the filtering stats, to make sure not too many reads are getting excluded.
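A sketch of that, in the command-line style used in this thread: -rf BadCigar is added to the UnifiedGenotyper arguments, and afterwards the log is searched for any line mentioning a filter. The exact wording of GATK's filter summary is not reproduced here, so the case-insensitive grep for "filter" is deliberately loose; the jar path and function name are placeholders:

```bash
# Placeholder jar location.
GATK_JAR=GenomeAnalysisTK.jar

# Run UnifiedGenotyper with the BadCigar read filter, logging to LOG, then print
# any log lines that mention filtering. Returns GATK's own exit status.
ug_with_badcigar() {
  local log="$1"; shift
  java -Xmx8g -jar "$GATK_JAR" \
    -T UnifiedGenotyper \
    -rf BadCigar \
    --log_to_file "$log" \
    "$@"
  local status=$?
  grep -i 'filter' "$log" >&2
  return "$status"
}

# Usage:
#   ug_with_badcigar ug.log -R ucsc.hg19.fasta -glm BOTH -I reducedbams.list -o ug.raw.vcf
```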

Geraldine Van der Auwera, PhD

• Member Posts: 4

I re-ran my UG job with the BadCigar read filter applied and the calling process got a little further along than before, but died again:

##### ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Alignment 62720180 | 8S91M2S

Since it didn't finish, I don't have the run summary to inspect, although I could enable debug message logging.

Is there another culprit I should investigate, or something else I should try?

This is the command line for the job in question, representative of several of our UG jobs that are failing currently on both exome and whole genome datasets.

$JAVA -Xmx90g -jar $CLASSPATH/GenomeAnalysisTK.jar \
-l INFO \
-T UnifiedGenotyper \
-R ref/hg19/ucsc.hg19.fasta \
--dbsnp ref/hg19/dbsnp_137.hg19.vcf \
--performanceLog ug.perflog \
--log_to_file ug.log \
-nct 40 \
--output_mode EMIT_VARIANTS_ONLY \
-glm BOTH \
-I reducedbams.list \
-o ug.raw.vcf

• Member Posts: 13

Hi.
I had the same problem; with some data, UG gave me the unreproducible error
Somehow the requested coordinate is not covered by the read

My solution was to run the analysis separately on each chromosome, which seems to get rid of the error completely:

• I created a list of chromosome names
• I ran UG on each chromosome separately

for Chr in $(cat ChromosomeList.txt); do
  echo "$Chr"
  java -Xmx32G -jar ~/Java/GenomeAnalysisTK-2.7-2/GenomeAnalysisTK.jar \
    -R /prj/gf-rhizo/data/reference/WB42_1.2/WB42v1_sorted_by_RefBeet-1.2.fa \
    -T UnifiedGenotyper -nct 3 -nt 8 \
    --genotype_likelihoods_model BOTH \
    -I B2445D1W31ACXX.final.bam -I B2447C15LWACXX.final.bam \
    -L "$Chr" \
    -o "$Chr"_raw_SNPs_Indels.vcf
done
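Building on that loop, a sketch that also records which chromosomes failed, so only those need re-running (the bookkeeping Queue automates). run_one is a hypothetical placeholder for your own per-chromosome UnifiedGenotyper command. The chromosome list can come from the reference index with cut -f1 reference.fa.fai. The list is read with while read, and each job gets /dev/null as stdin so it cannot accidentally consume the rest of the list:

```bash
# Hypothetical per-chromosome job; replace the body with your own UG command.
run_one() {
  local chr="$1"
  java -Xmx32G -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R reference.fa \
    -glm BOTH -I sample.bam -L "$chr" -o "${chr}_raw_SNPs_Indels.vcf"
}

# Run run_one for every name in LIST and write the names of failed jobs to FAILED.
run_per_chrom() {
  local list="$1" failed="$2" chr
  : > "$failed"
  while IFS= read -r chr; do
    [ -n "$chr" ] || continue                  # skip blank lines
    run_one "$chr" < /dev/null || echo "$chr" >> "$failed"
  done < "$list"
}

# Usage:
#   cut -f1 reference.fa.fai > ChromosomeList.txt
#   run_per_chrom ChromosomeList.txt failed.txt
#   run_per_chrom failed.txt failed_again.txt    # re-run only the failures
```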

@allinav, it seems you have a different bug than I thought. @DavidRies's suggestion is a good one; you can run it with Queue scatter-gather if you want the job management to be easier.

Geraldine Van der Auwera, PhD

• Member Posts: 4

Hi @Geraldine_VdAuwera, I'd like to post an update on the last issue discussed here.

I implemented the excellent suggestion to parallelize our UG runs by chromosome. I've not used Queue for scatter-gather or job management just yet, but my GATK command lines are virtually identical to my previous example, with the addition of -L and a whole chromosome target per job.

This change did drastically reduce the total walltime needed for our UG run, which is great, BUT alas, some of our chromosome-specific jobs failed (on chr1,2,4,6,7,8,12,X) with the aforementioned error:

##### ERROR MESSAGE: Somehow the requested coordinate is not covered by the read.

My test input was a set of 27 reduced WGS BAMs prepared with v2.7-2 and the best practices guideline. Aside from trying this same approach again with Queue on the odd chance it's a workflow management issue, I'm stuck again. Any thoughts or ideas? Thanks!