Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

MuTect2: Reference name for '3223094' not found in sequence dictionary

Hi,

I'm trying to run MuTect2 using GATK 4.0.3.0, but I'm having a similar issue to this topic. I posted my problem as a comment there but I decided to make my own topic seeing as this is for GATK 4.

I'm trying to run:

gatk Mutect2 \
 -R human_g1k_v37_decoy.fasta \
 -I ${TUMOUR}.recal.bam \
 -tumor ${TUMOUR} \
 -I ${NORMAL}.recal.bam \
 -normal ${NORMAL} \
 --germline-resource af-only-gnomad.raw.sites.b37.vcf \
 --af-of-alleles-not-in-resource 0.00003125 \
 -O ${TUMOUR}_${NORMAL}.vcf

And I'm getting this error message, with a different number after reference name each time:

[August 8, 2018 3:55:11 PM BST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.14 minutes.
Runtime.totalMemory()=2076049408
java.lang.IllegalArgumentException: Reference name for '3223094' not found in sequence dictionary.
        at htsjdk.samtools.SAMRecord.resolveNameFromIndex(SAMRecord.java:569)
        at htsjdk.samtools.SAMRecord.setReferenceIndex(SAMRecord.java:422)
        at htsjdk.samtools.BAMRecord.<init>(BAMRecord.java:87)
        at htsjdk.samtools.DefaultSAMRecordFactory.createBAMRecord(DefaultSAMRecordFactory.java:42)
        at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:210)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:829)
        at htsjdk.samtools.BAMFileReader$BAMFileIndexIterator.getNextRecord(BAMFileReader.java:981)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:803)
        at htsjdk.samtools.BAMFileReader$BAMFileIndexIterator.<init>(BAMFileReader.java:963)
        at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:918)
        at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:575)
        at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:528)
        at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
        at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:125)
        at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:66)
        at org.broadinstitute.hellbender.engine.ReadsDataSource.prepareIteratorsForTraversal(ReadsDataSource.java:404)
        at org.broadinstitute.hellbender.engine.ReadsDataSource.iterator(ReadsDataSource.java:330)
        at org.broadinstitute.hellbender.engine.MultiIntervalLocalReadShard.iterator(MultiIntervalLocalReadShard.java:134)
        at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.<init>(AssemblyRegionIterator.java:109)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:286)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

I have checked the .dict file and the .bam headers, but can't see any differences between them. All of the .bam files have been successfully run through HaplotypeCaller, but I only get this error in MuTect2.

I've tried using the latest version (4.0.8.0), but I'm getting the same error. As I mentioned in the other topic, I changed something in the header of one of my bam files using samtools reheader, could this be the issue?

Thanks,

Michael

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @michaelmc
    Hi Michael,

    As I mentioned in the other topic, I changed something in the header of one of my bam files using samtools reheader, could this be the issue?

    It could be. What is it that you changed? Can you confirm this change still runs correctly in HaplotypeCaller but not in Mutect2?

    -Sheila

  • michaelmcmichaelmc Member

    Hi Sheila,

    My read group included the lane number under SM:, so it wasn't recognising the ${NORMAL} sample name I was feeding into MuTect, so I used samtools reheader and sed like so:

    samtools view -H ${SAMPLE}.recal.bam | sed "s/SM:${SAMPLE}_L00[0-9]/SM:${SAMPLE}/" | samtools reheader - ${SAMPLE}.recal.bam > tmp
    mv tmp ${SAMPLE}.recal.bam
    

    I can confirm that this is what is causing the issue, the normal sample I am using in MuTect is the only one I have edited in this way, and it is the only one of its batch now failing HaplotypeCaller with the same error.

    I ran BQSR again to regenerate the file I edited, and just included the lane number in the sample name I fed into MuTect, but I am now getting an invalid argument exception for the tumour sample:

    ***********************************************************************
    
    A USER ERROR has occurred: Invalid argument '/scratch/875683_1/tumour.recal.bam'.
    
    ***********************************************************************
    org.broadinstitute.barclay.argparser.CommandLineException: Invalid argument '/scratch/875683_1/tumour.recal.bam'.
            at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.setPositionalArgument(CommandLineArgumentParser.java:600)
            at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.parseArguments(CommandLineArgumentParser.java:432)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:220)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:194)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
    

    I know the tumour .bam and .bai files exist, so I'm not sure what is causing this issue.

    Thanks,

    Michael

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @michaelmc
    Hi Michael,

    So, can this thread be closed?

    In the new issue, it looks like this can be solved by adding a -I in front of /scratch/875683_1/tumour.recal.bam. Can you post the exact command you ran?

    Thanks,
    Sheila

  • michaelmcmichaelmc Member
    edited September 2018

    Hi Sheila,

    Yes, the other thread can be closed, but I'm still having trouble with this new issue.

    This is the command I ran:

    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' Mutect2 \
     -R human_g1k_v37.fasta \
     -I ${TMPDIR}/${TUMOUR}.recal.bam \
     -tumor ${TUMOUR} \
     -I ${TMPDIR}/${NORMAL}.recal.bam \
     -normal ${NORMAL} \
     --germline-resource af-only-gnomad.raw.sites.b37.vcf \
     --af-of-alleles-not-in-resource 0.00003125 \
     -O ${TMPDIR}/${TUMOUR}_${NORMAL}.vcf
    

    Thanks,

    Michael

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @michaelmc
    Hi Michael,

    Hmm. I am pretty sure this is a typo issue. Are you copying and pasting this? Can you try typing the command out without the backslashes?

    Thanks,
    Sheila

  • Hi Sheila,

    I've found the issue and it's working now. I noticed in the log file that neither the reference genome nor the germline resource were being printed after their respective flags. I've been using variables for these, but I typed out the file names here to show everything I was using.

    I found a thread for an invalid argument error that suggested typing out the command again - I copied and pasted the variable names for both of these resources again and it ran successfully.

    Thanks for your help,

    Michael

Sign In or Register to comment.