Local realignment issues for version 2.4-3-g2a7af43

Hi,

I downloaded the newest version of GATK (version 2.4-3) this week and tried to perform local realignment on my targeted sequencing data. The reference genome and the SNP and indel data files were downloaded from the resource bundle. However, I encountered two issues during realignment.

First, in the RealignerTargetCreator step: with the same command line, version 2.4-3 fails with the error message "MESSAGE: -49" (no further details are provided), whereas the older version 2.3-9 runs without errors.

Second, in the IndelRealigner step, I got the error message "MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'". However, the reference genome was downloaded from the bundle, so I am not sure how to fix this issue.

I hope someone can help me with these issues. Let me know if more info is needed.

Thanks!

Comments

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Can you please post the complete stack trace that is output to the console when these errors occur?

  • Here is the complete stack trace for RealignerTargetCreator.

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: -49
    at org.broadinstitute.sting.utils.BaseUtils.convertIUPACtoN(BaseUtils.java:172)
    at org.broadinstitute.sting.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:288)
    at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.4-3-g2a7af43):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: -49
    ERROR ------------------------------------------------------------------------------------------
  • Here is the error for IndelRealigner.

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-3-g2a7af43):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, can you also post your command line please?

  • Here are the command lines. Thank you!

    RealignerTargetCreator:
    java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta.gz -o indel.intervals -known 1000G_phase1.indels.hg19.vcf

    IndelRealigner:
    java -Xmx4g -Djava.io.tmpdir=/temp/${file} \
    -jar GenomeAnalysisTK.jar \
    -I ${file}-rg.bam \
    -R ucsc.hg19.fasta.gz \
    -T IndelRealigner \
    -targetIntervals indel.intervals \
    -o ${file}-rai.bam \
    -known 1000G_phase1.indels.hg19.vcf \
    --consensusDeterminationModel KNOWNS_ONLY \
    -LOD 0.4

  • The command lines are listed above. Thanks.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    The GATK does not accept compressed reference files. You'll need to unzip it before using it with the GATK.

  • Thanks, I see. I will give it a try. But I still do not understand why v2.3 worked. Did v2.4 disable the use of compressed reference files?

  • ebanks (Broad Institute; Member, Broadie, Dev)

    That's a great question. I just looked carefully and I don't think using compressed references was working properly in the past. It's not necessarily blowing up with errors, but I wouldn't trust the results you get using the compressed reference. Sorry to be the bearer of bad news.
    I'll update the GATK to detect this and generate a proper error message.

  • After uncompressing the reference files, it works very well. Thank you.
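
As a rough sketch of the fix ebanks describes, the shell helpers below check whether a reference is gzip-compressed and, if so, write an uncompressed copy before it is passed to `-R`. The function names `is_gzipped` and `prepare_reference` are hypothetical, introduced only for illustration; the file names in the usage comments are the ones from the command lines posted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GATK), shown only as a sketch.

# is_gzipped FILE: succeed if FILE starts with the gzip magic bytes 1f 8b.
is_gzipped() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# prepare_reference FILE: if FILE is gzip-compressed, write an uncompressed
# copy next to it; print the path of the file that is safe to give to -R.
prepare_reference() {
    local ref="$1"
    if is_gzipped "$ref"; then
        local out="${ref%.gz}"
        # If the name did not end in .gz, pick a different output name
        # so the input file is never overwritten.
        [ "$out" = "$ref" ] && out="${ref}.unzipped"
        gunzip -c "$ref" > "$out"
        ref="$out"
    fi
    printf '%s\n' "$ref"
}

# Example usage with the file names from this thread:
#   ref=$(prepare_reference ucsc.hg19.fasta.gz)
#   java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
#       -R "$ref" -o indel.intervals -known 1000G_phase1.indels.hg19.vcf
```

GATK also expects the matching `.fai` index and `.dict` dictionary next to the uncompressed FASTA. If they are missing, they can be regenerated with `samtools faidx` and Picard's `CreateSequenceDictionary`.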

  • Hello,

    I have the same problem (ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10') with the Fasta Reference Maker tool, but I'm using an uncompressed reference genome, which was created with Fasta Alternate Reference Maker. How can I solve this problem?

    Thanks,
    Tamara

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Tamara, can you tell me what version you're using?

  • I think this is a function of running GATK on Windows. ASCII value 10 is the line feed character, which is part of the screwy Windows two-character line ending. I'll bet GATK is only looking for the carriage return and flips out when it sees the "unknown" character

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ooh, good point. Tamara, are you running on Windows?

  • I'm using GATK 2.4-7.
    OK, I will try to run GATK on a Linux virtual machine.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, let me know if that works. You may need to re-download the reference from our resource bundle, since the one you've been using will have the Windows characters that are tripping up the GATK.

  • This works. Thanks

  • Hi,
    I'm having the same trouble with "Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'". I downloaded the reference from the resource bundle (v37) and only edited the chr names using pico -w. I don't think any additional characters were put in. Thanks

  • I'm running on the Mac OS X server. Thanks

  • This problem seems to be unique to the new version of GATK; v1.6 has no trouble with this and runs fine. I tried removing any carriage return characters using the tr command, but that did not fix the problem. I also thought later versions of Mac OS X shouldn't be affected by this issue.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    v1.6 has no trouble because we only recently added the check for bad reference characters. In previous versions these characters would silently cause the tools to produce incorrect results.

  • Eric, could you tell me what GATK is looking for at the end of each sequence line? I've tried using '\n' or '\r' on Mac OS X, and either way I get an error. Would it be possible for the bad reference character check to let these line feed or carriage return characters pass? In theory Mac OS X is somewhat like Linux. This would give people greater flexibility in creating reference sequences.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Unfortunately the check is necessary. The characters in your reference are going to cause the ref bases to be lined up incorrectly relative to their correct positions on the genome.

    I'm not sure what GATK looks for/tolerates in terms of return characters, as we use someone else's tools for loading in FastA files (Samtools/Picard). Technically you don't need to add any return characters within the sequence in a FastA record.
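
For readers hitting the '13' or '10' variants of this error, the sketch below shows one way to inspect a reference for stray characters before running GATK: count carriage returns (ASCII 13), write a copy with them stripped, and list sequence lines that contain anything other than IUPAC nucleotide codes. The function names are hypothetical, and the allowed character set is an assumption based on the standard IUPAC codes; the exact set that GATK accepts is not stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GATK), shown only as a sketch.

# count_cr FILE: print the number of carriage return bytes (ASCII 13) in FILE.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# strip_cr IN OUT: write a copy of IN to OUT with all carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# find_non_iupac FILE: print "LINE:CONTENT" for every non-header line that
# contains a character outside the IUPAC nucleotide codes (assumed set).
find_non_iupac() {
    LC_ALL=C grep -n -v '^>' "$1" |
        LC_ALL=C grep -v -E '^[0-9]+:[ACGTURYKMSWBDHVNacgturykmswbdhvn]*$'
}

# Example usage (my_reference.fasta is a placeholder name):
#   count_cr my_reference.fasta
#   find_non_iupac my_reference.fasta | head
#   strip_cr my_reference.fasta my_reference.clean.fasta
```

After cleaning a reference this way, the `.fai` index and `.dict` dictionary would need to be rebuilt, since the byte offsets of the sequence lines change.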