stack trace error using IndelRealigner

cbossu (Uppsala University, Member)

Hello,
I'm trying to realign approximately 115 BAM files. I am able to do this with the -o option, but this results in an impressively large BAM file that I cannot fix in Picard (FixMateInformation and SortSam). Unfortunately, these corrections need to happen before the downstream GATK SNP discovery. So I tried the -nWayOut option to get an individual realigned BAM file for each input, but this returns a stack trace ERROR that includes something about an unavailable reader id. I've pasted it below.

INFO 14:06:32,838 ProgressMeter - scaffold_0:4430818 1.17606954E8 59.5 m 30.0 s 0.4% 9.8 d 9.7 d
INFO 14:07:32,840 ProgressMeter - scaffold_0:4474066 1.18707144E8 60.5 m 30.0 s 0.4% 9.8 d 9.8 d
INFO 14:08:32,841 ProgressMeter - scaffold_0:4505563 1.1980727E8 61.5 m 30.0 s 0.4% 9.9 d 9.9 d
INFO 14:09:32,843 ProgressMeter - scaffold_0:4506325 1.20407434E8 62.5 m 31.0 s 0.4% 10.1 d 10.0 d
INFO 14:09:55,236 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: No such reader id is available
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource$SAMResourcePool.getReaderID(SAMDataSource.java:809)
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource.getReaderID(SAMDataSource.java:430)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReaderIDForRead(GenomeAnalysisEngine.java:803)
at org.broadinstitute.gatk.utils.sam.NWaySAMFileWriter.addAlignment(NWaySAMFileWriter.java:158)
at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.writeRead(ConstrainedMateFixingManager.java:356)
at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.addRead(ConstrainedMateFixingManager.java:261)
at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.addRead(ConstrainedMateFixingManager.java:237)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.emit(IndelRealigner.java:492)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:529)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:146)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No such reader id is available
ERROR ------------------------------------------------------------------------------------------

I'm not sure if it's the number of bam files submitted or if it is something with the specific position when the error occurs. Any help in this matter would be greatly appreciated!

Christen

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hmm, that's not something I've seen before -- does it happen again if you retry the job? In any case my recommendation would be to spare yourself some trouble and just realign the samples separately. Apart from certain specific experimental designs (like for cancer samples), realigning across samples is not really necessary; the gains are not worth the extra compute. Especially if you are going to use HaplotypeCaller to call variants.
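Realigning the samples separately, as suggested above, could be sketched roughly as follows. This is only an illustration: the default values of `GATK_JAR`, `REF` and `INTERVALS` are placeholders rather than paths from the thread, the helper names are made up, and the functions only print the commands so they can be inspected before anything is run. It reuses a single target-intervals file for every sample, which is an assumption; intervals could also be generated per sample.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints one GATK 3 IndelRealigner command per input BAM.
# GATK_JAR, REF and INTERVALS are placeholder paths (assumptions).
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}
REF=${REF:-reference.fasta}
INTERVALS=${INTERVALS:-forIndelRealigner.intervals}

# Print the command for a single BAM; output name is <input>.realigned.bam.
realign_cmd() {
    local bam=$1
    local out="${bam%.bam}.realigned.bam"
    printf 'java -Xmx4g -jar %s -T IndelRealigner -R %s -I %s -targetIntervals %s -o %s\n' \
        "$GATK_JAR" "$REF" "$bam" "$INTERVALS" "$out"
}

# Print the commands for every BAM given as an argument.
print_realign_cmds() {
    local bam
    for bam in "$@"; do
        realign_cmd "$bam"
    done
}
```

For example, `print_realign_cmds crow/*.bam` lists the commands, and piping that output to `sh` would run them one after another (paths containing spaces would need extra quoting).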

  • cbossu (Uppsala University, Member)

    Interesting. The reason we were realigning across all samples is that we recently did population-specific indel realignment and genotype calling using UnifiedGenotyper. Our worry was that when we merged the realigned population BAM files, several variant sites were not actually polymorphic (or fixed), but misaligned around an indel that was placed in different positions in each population. For instance, two variants at back-to-back positions, 14942878 and 14942879, were not actually fixed. So we thought realigning across all samples would reduce the number of miscalled variants. Is this worry unfounded? I will try to run the job again, and if this error occurs again, I'll realign the samples separately. Thanks.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ah, I didn't realize you were using UG -- then that is something you need to address, yes. One workaround that avoids having to do multisample realignment is to generate a set of the true variants (if you know which ones have the right position) and use that as known indels for realignment. The better solution is to use HaplotypeCaller because the reassembly step that it performs will typically lead to more accurate and therefore more consistent placement of indels.
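The known-indels workaround might look something like the sketch below, which passes a VCF of trusted indels to both realignment steps through the GATK 3 `-known` argument. The file name `trusted_indels.vcf`, the other default paths and the helper name are hypothetical, and the function only prints the two commands for one sample rather than running them.

```shell
#!/usr/bin/env bash
# Dry-run helper: prints the two realignment commands for one BAM,
# both given the same VCF of trusted indels via -known.
# All default paths below are placeholders (assumptions).
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}
REF=${REF:-reference.fasta}
KNOWN=${KNOWN:-trusted_indels.vcf}

known_realign_cmds() {
    local bam=$1
    local base="${bam%.bam}"
    # Step 1: find the intervals to realign, seeded with the known indels.
    printf 'java -jar %s -T RealignerTargetCreator -R %s -I %s -known %s -o %s.intervals\n' \
        "$GATK_JAR" "$REF" "$bam" "$KNOWN" "$base"
    # Step 2: realign reads over those intervals, again with the known indels.
    printf 'java -jar %s -T IndelRealigner -R %s -I %s -known %s -targetIntervals %s.intervals -o %s.realigned.bam\n' \
        "$GATK_JAR" "$REF" "$bam" "$KNOWN" "$base" "$base"
}
```

Because every sample is realigned against the same indel set, placement should be more consistent across populations without a single multi-sample run.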

  • cbossu (Uppsala University, Member)

    Hello. I'm trying HaplotypeCaller now, but we have several females in our data, and since only the diploid option is available in HC, I will need to use UnifiedGenotyper for SNP discovery and genotyping on the scaffolds of our sex chromosome! That said, I re-ran the IndelRealigner job and the same error came up at the exact same position. First, here's my command line:

    java -Xmx22g -jar /sw/apps/bioinfo/GATK/3.2.2/GenomeAnalysisTK.jar -T IndelRealigner -R $reseq/example.fasta $(printf ' -I %s ' crow/indmerged.*test3.bam) -targetIntervals $region/forIndelRealigner_all.intervals -nWayOut .realigned.bam

    Second, here's the error. It mentions an absent EOF marker at the very end of the stack trace, but I have had EOF errors before in GATK and they are very different. And I re-checked all my input bam files and they are not truncated. I can attach the entire error message if that would help. Thanks!

    INFO 09:15:08,596 ProgressMeter - scaffold_0:4505579 1.1980727E8 62.5 m 31.0 s 0.4% 10.1 d 10.0 d
    INFO 09:16:08,598 ProgressMeter - scaffold_0:4506325 1.20407434E8 63.5 m 31.0 s 0.4% 10.2 d 10.2 d
    INFO 09:16:31,788 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: No such reader id is available
    at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource$SAMResourcePool.getReaderID(SAMDataSource.java:809)
    at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource.getReaderID(SAMDataSource.java:430)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReaderIDForRead(GenomeAnalysisEngine.java:803)
    at org.broadinstitute.gatk.utils.sam.NWaySAMFileWriter.addAlignment(NWaySAMFileWriter.java:158)
    at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.writeRead(ConstrainedMateFixingManager.java:356)
    at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.addRead(ConstrainedMateFixingManager.java:261)
    at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.addRead(ConstrainedMateFixingManager.java:237)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.emit(IndelRealigner.java:492)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:529)
    at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:146)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No such reader id is available
    ERROR ------------------------------------------------------------------------------------------

    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_index_core] truncated file? Continue anyway. (-4)

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @cbossu

    FYI the development version of HC now accepts any ploidy, so you could use HC for everything, including sex chromosomes in the males. That said until now we always just ran in diploid mode even on males, then threw out anything that came up heterozygous...

    Regarding your error, there's not much I can say, as this looks like it could have been caused by a file system glitch during a previous run. You can try deleting the index files, which is always a good first-line thing to try; running Picard's ValidateSamFile may also help determine whether the error is GATK-specific or not.
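As a rough aid for those checks, the sketch below tests whether each BAM ends with the 28-byte BGZF end-of-file block defined in the SAM/BAM specification, which is what the "EOF marker is absent" message refers to. The function names are made up for illustration, and the commented commands at the end use placeholder file names and assume a single `picard.jar`; older Picard releases shipped one jar per tool.

```shell
#!/usr/bin/env bash
# Hex encoding of the 28-byte empty BGZF block that terminates a complete BAM.
BAM_EOF_HEX=1f8b08040000000000ff0600424302001b0003000000000000000000

# Succeeds if the last 28 bytes of the file are the BGZF EOF block.
has_bam_eof() {
    local tail_hex
    tail_hex=$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BAM_EOF_HEX" ]
}

# Report every BAM given as an argument; returns non-zero if any lacks the marker.
check_bams() {
    local bam status=0
    for bam in "$@"; do
        if has_bam_eof "$bam"; then
            echo "OK  $bam"
        else
            echo "NO_EOF  $bam"
            status=1
        fi
    done
    return $status
}

# Follow-up steps, shown as comments because they modify or read real data:
#   rm -f sample.bai sample.bam.bai      # drop stale indexes
#   samtools index sample.bam            # rebuild the index
#   java -jar picard.jar ValidateSamFile I=sample.bam MODE=SUMMARY
```

Running `check_bams crow/indmerged.*test3.bam` would flag any input missing the marker. A file that passes can still be damaged elsewhere, so ValidateSamFile remains the more thorough check.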