Hi,
I downloaded the newest version of GATK (version 2.4-3) this week and tried to perform local realignment on my targeted sequencing data. The reference genome and the SNP and indel data files were downloaded from the resource bundle. However, I encountered two issues during realignment.
First, in the RealignerTargetCreator step: with the same command line, version 2.4-3 gives the error message "MESSAGE: -49" (with no further detail), whereas the older version 2.3-9 runs with no errors.
Second, in the IndelRealigner step, I got the error message "MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'". However, the reference genome was downloaded from the bundle, so I am not sure how to fix this issue.
I hope someone can help me with these issues. Let me know if more info is needed.
Thanks!
Comments
Hi there,
Can you please post the complete stack trace that is output to the console when these errors occur?
Geraldine Van der Auwera, PhD
Here is the complete stack trace for RealignerTargetCreator.
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
java.lang.ArrayIndexOutOfBoundsException: -49
    at org.broadinstitute.sting.utils.BaseUtils.convertIUPACtoN(BaseUtils.java:172)
    at org.broadinstitute.sting.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:288)
    at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.4-3-g2a7af43):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: -49
ERROR ------------------------------------------------------------------------------------------
Here is the error for IndelRealigner.
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.4-3-g2a7af43):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'
OK, can you also post your command line please?
Geraldine Van der Auwera, PhD
Here are the command lines. Thank you!
RealignerTargetCreator:
    java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta.gz -o indel.intervals -known 1000G_phase1.indels.hg19.vcf
IndelRealigner:
    java -Xmx4g -Djava.io.tmpdir=/temp/${file} \
        -jar GenomeAnalysisTK.jar \
        -I ${file}-rg.bam \
        -R ucsc.hg19.fasta.gz \
        -T IndelRealigner \
        -targetIntervals indel.intervals \
        -o ${file}-rai.bam \
        -known 1000G_phase1.indels.hg19.vcf \
        --consensusDeterminationModel KNOWNS_ONLY \
        -LOD 0.4
The command lines are listed above. Thanks.
The GATK does not accept compressed reference files. You'll need to decompress the reference before using it with the GATK.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
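For readers who want to check a reference before passing it to `-R`, here is a minimal Python sketch. It relies only on the fact that every gzip stream starts with the two bytes 0x1f 0x8b. The helper names `is_gzipped` and `decompress_reference` are hypothetical, chosen for illustration, and are not part of the GATK.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress_reference(src: str, dst: str) -> None:
    """Write an uncompressed copy of a gzip-compressed file to dst."""
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        # Stream in chunks so a multi-gigabyte genome is never held in memory.
        shutil.copyfileobj(compressed, plain)


# Example call (file names taken from the command lines in this thread):
#   if is_gzipped("ucsc.hg19.fasta.gz"):
#       decompress_reference("ucsc.hg19.fasta.gz", "ucsc.hg19.fasta")
```

The GATK also expects a `.fai` index and a `.dict` sequence dictionary next to the uncompressed FASTA, so those files need to be present and uncompressed as well.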
Thanks, I see. I will give it a try. But I still do not understand why v2.3 works. Does v2.4 disable the use of compressed reference files?
That's a great question. I just looked carefully and I don't think using compressed references was working properly in the past. It's not necessarily blowing up with errors, but I wouldn't trust the results you get using the compressed reference. Sorry to be the bearer of bad news. I'll update the GATK to detect this and generate a proper error message.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
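A side note on the bare "-49" message: the stack trace shows an array being indexed with a negative number inside `BaseUtils.convertIUPACtoN`. One plausible explanation, which is an assumption and not confirmed anywhere in this thread, is that a raw byte read from the compressed file was used as a lookup index. Java bytes are signed, so any byte value of 128 or more reads as a negative number. The Python sketch below only illustrates that arithmetic. `as_java_byte` is a hypothetical helper name.

```python
def as_java_byte(value: int) -> int:
    """Interpret an unsigned byte value (0..255) the way Java's signed byte would."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value in 0..255")
    return value - 256 if value >= 128 else value


# Plain-text FASTA bases are ASCII, so they stay non-negative.
print(as_java_byte(ord("A")))  # 65

# A byte such as 0xCF, which can occur anywhere in compressed data, becomes negative.
print(as_java_byte(0xCF))      # -49
```

Under that assumption, the IndelRealigner message about base '13' and the RealignerTargetCreator "-49" would be two symptoms of the same problem: bytes of the gzip stream being read as if they were bases.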
After uncompressing the reference files, it works very well. Thank you.
Hello,
I have the same problem (ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10') with the Fasta Reference Maker tool, but I'm using an uncompressed reference genome that was created with Fasta Alternate Reference Maker. How can I solve this problem?
Thanks, Tamara
Hi Tamara, can you tell me what version you're using?
Geraldine Van der Auwera, PhD
I think this is a function of running GATK on Windows. ASCII value 10 is the line feed character, which is part of the screwy Windows two-character line ending. I'll bet GATK is only looking for the carriage return and flips out when it sees the "unknown" character.
Ooh, good point. Tamara, are you running on Windows?
Geraldine Van der Auwera, PhD
I'm using GATK 2.4-7. OK, I will try to run GATK on a Linux virtual machine.
OK, let me know if that works. You may need to re-download the reference from our resource bundle, since the one you've been using will have the Windows characters that are tripping up the GATK.
Geraldine Van der Auwera, PhD
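To see whether a reference actually contains such characters before re-downloading it, a scan along the following lines may help. This is a minimal Python sketch: the names `IUPAC_BASES` and `unexpected_bytes` are hypothetical, and the accepted character set is an assumption (the IUPAC nucleotide codes in both cases), so the GATK's own list may differ. It also assumes lines end in LF or CR LF.

```python
from collections import Counter

# IUPAC nucleotide codes, upper and lower case (assumed accepted set).
IUPAC_BASES = set(b"ACGTUNRYSWKMBDHVacgtunryswkmbdhv")


def unexpected_bytes(path: str) -> Counter:
    """Count bytes on sequence lines that are not IUPAC nucleotide codes.

    Header lines (starting with '>') are skipped and the trailing line feed
    of each line is ignored. Everything else, including carriage returns
    (ASCII 13), is counted under its numeric byte value.
    """
    counts: Counter = Counter()
    with open(path, "rb") as handle:
        for line in handle:  # binary mode splits on b"\n" only
            if line.startswith(b">"):
                continue
            body = line[:-1] if line.endswith(b"\n") else line
            counts.update(b for b in body if b not in IUPAC_BASES)
    return counts
```

An empty result means no unexpected bytes were found on sequence lines. A result with a key of 13 points to carriage returns, which is what a file saved with Windows line endings would contain.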
This works. Thanks
Hi, I'm having the same trouble with "Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'". I downloaded the reference from the resource bundle (v37) and just edited the chr names using pico -w. I don't think any additional characters were put in. Thanks
I'm running on Mac OS X Server. Thanks
This problem seems to be unique to the new version of GATK. v1.6 has no trouble with this and runs great. I attempted to remove any carriage return characters using the tr command, but that did not fix the problem. I also thought later versions of Mac OS X shouldn't be affected by this issue.
v1.6 has no trouble because we only recently added the check for bad reference characters. In previous versions these characters would silently cause the tools to produce incorrect results.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
Eric, could you tell me what GATK is looking for at the end of each sequence line? I've tried using '\n' or '\r' on Mac OS X, and either way I'm getting an error. Would it be possible for the bad reference character check to let these line feed or carriage return characters pass? In theory Mac OS X is somewhat like Linux. This would allow greater flexibility for people creating reference sequences.
Hi there,
Unfortunately the check is necessary. The characters in your reference are going to cause the ref bases to be lined up incorrectly relative to their correct positions on the genome.
I'm not sure what GATK looks for/tolerates in terms of return characters, as we use someone else's tools for loading in FastA files (Samtools/Picard). Technically you don't need to add any return characters within the sequence in a FastA record.
Geraldine Van der Auwera, PhD
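The point about bases lining up incorrectly follows from how indexed FASTA access typically works. The samtools-style `.fai` index stores, for each sequence, the byte offset of its first base, the number of bases per line and the number of bytes per line. A reader then computes where a base lives instead of scanning for it. The Python sketch below shows that arithmetic on a toy record. Whether this is exactly what happened in the cases above is an assumption, and `base_offset` is a hypothetical helper name.

```python
def base_offset(pos: int, offset: int, line_bases: int, line_width: int) -> int:
    """Byte position in the file of 0-based base `pos` of one sequence.

    offset      -- byte position of the sequence's first base
    line_bases  -- number of bases on each full line
    line_width  -- number of bytes on each full line, line terminator included
    """
    full_lines, within_line = divmod(pos, line_bases)
    return offset + full_lines * line_width + within_line


# Toy record: a header line, then 4 bases per line.
unix_fasta = b">chr1\nACGT\nTTGA\n"           # LF endings: 5 bytes per line
windows_fasta = b">chr1\r\nACGT\r\nTTGA\r\n"  # CR LF endings: 6 bytes per line

# Index values that match the LF file: first base at byte 6, 4 bases and 5 bytes per line.
i = base_offset(4, offset=6, line_bases=4, line_width=5)
print(chr(unix_fasta[i]))  # T (the first base of the second line)

# The same index values applied to the CR LF file land on line terminators.
print(windows_fasta[base_offset(0, offset=6, line_bases=4, line_width=5)])  # 10
print(windows_fasta[base_offset(4, offset=6, line_bases=4, line_width=5)])  # 13
```

So an index that no longer matches the bytes on disk can return '10' or '13' as a "base". That may be the case when a FASTA is edited (for example, renaming the chr headers) or has its line endings changed after the `.fai` was built. Regenerating the index after any edit is a reasonable first thing to try.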