Local realignment issues for version 2.4-3-g2a7af43

Hi,

I downloaded the newest version of GATK (version 2.4-3) this week and tried to perform local realignment on my targeted sequencing data. The reference genome and the SNP and indel data files were downloaded from the resource bundle. However, I ran into two issues during realignment.

First, at the RealignerTargetCreator step: with the same command line, version 2.4-3 fails with the error message "MESSAGE: -49" (no further detail provided), whereas the older version 2.3-9 runs without errors.

Second, at the IndelRealigner step, I got the error message "MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'". However, the reference genome was downloaded from the bundle, so I am not sure how to fix this.

I hope someone can help me with these issues. Let me know if more info is needed.

Thanks!

Comments

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Can you please post the complete stack trace that is output to the console when these errors occur?

  • @Geraldine_VdAuwera said:
    Hi there,

    Can you please post the complete stack trace that is output to the console when these errors occur?

    Here is the complete stack trace for RealignerTargetCreator.

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: -49
    at org.broadinstitute.sting.utils.BaseUtils.convertIUPACtoN(BaseUtils.java:172)
    at org.broadinstitute.sting.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:288)
    at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.4-3-g2a7af43):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: -49
    ERROR ------------------------------------------------------------------------------------------
  • Here is the error for IndelRealigner.

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-3-g2a7af43):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, can you also post your command line please?

  • Here are the command lines. Thank you!

    RealignerTargetCreator:
    java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta.gz -o indel.intervals -known 1000G_phase1.indels.hg19.vcf

    IndelRealigner:
    java -Xmx4g -Djava.io.tmpdir=/temp/${file} \
    -jar GenomeAnalysisTK.jar \
    -I ${file}-rg.bam \
    -R ucsc.hg19.fasta.gz \
    -T IndelRealigner \
    -targetIntervals indel.intervals \
    -o ${file}-rai.bam \
    -known 1000G_phase1.indels.hg19.vcf \
    --consensusDeterminationModel KNOWNS_ONLY \
    -LOD 0.4

  • @Geraldine_VdAuwera said:
    OK, can you also post your command line please?

    The command lines are listed above. Thanks.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    The GATK does not accept compressed reference files. You'll need to unzip it before using it with the GATK.
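
A minimal sketch of the decompression step, assuming `gzip` is on the PATH. The helper name `decompress_reference` is hypothetical and not part of GATK; it writes the uncompressed FASTA next to the `.gz` file and leaves the original in place.

```shell
# Hypothetical helper (not part of GATK): decompress a gzipped FASTA.
# Usage: decompress_reference path/to/reference.fasta.gz
decompress_reference() {
    ref_gz="$1"
    ref_out="${ref_gz%.gz}"           # strip the .gz suffix
    # -d decompresses, -c writes to stdout so the .gz file is kept
    gzip -dc "$ref_gz" > "$ref_out" || return 1
    printf '%s\n' "$ref_out"          # report the path to pass to -R
}
```

For the commands in this thread, that would mean running `decompress_reference ucsc.hg19.fasta.gz` and then passing `-R ucsc.hg19.fasta`. If the accompanying index and dictionary files are also gzipped, they would presumably need the same treatment.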

  • @ebanks said:
    The GATK does not accept compressed reference files. You'll need to unzip it before using it with the GATK.

    Thanks, I see. I will give it a try. But I still do not understand why v2.3 works. Does v2.4 disable the usage of compressed reference files?

  • ebanks (Broad Institute; Member, Broadie, Dev)

    That's a great question. I just looked carefully and I don't think using compressed references was working properly in the past. It's not necessarily blowing up with errors, but I wouldn't trust the results you get using the compressed reference. Sorry to be the bearer of bad news.
    I'll update the GATK to detect this and generate a proper error message.

  • After uncompressing the reference files, it works very well. Thank you.

  • Hello,

    I have the same problem (ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10') with the FastaReferenceMaker tool, but I'm using an uncompressed reference genome that was created with FastaAlternateReferenceMaker. How can I solve this problem?

    Thanks,
    Tamara

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Tamara, can you tell me what version you're using?

  • pdexheimer (Member, Dev)

    I think this is a function of running GATK on Windows. ASCII value 10 is the line feed character, which is part of the screwy Windows two-character line ending (carriage return followed by line feed). I'll bet GATK is only looking for the carriage return and flips out when it sees the "unknown" character.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ooh, good point. Tamara, are you running on Windows?

  • I'm using GATK 2.4-7.
    OK, I will try to run GATK on a Linux virtual machine.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, let me know if that works. You may need to re-download the reference from our resource bundle, since the one you've been using will have the Windows characters that are tripping up the GATK.
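
One way to check whether a reference carries Windows line endings is sketched below. The helper names `count_cr_lines` and `strip_cr` are hypothetical (not part of GATK), and the sketch assumes only standard `grep` and `tr`.

```shell
# Hypothetical helpers, not part of GATK.

# Print how many lines of a file contain a carriage return (byte 13).
count_cr_lines() {
    # grep exits non-zero when the count is 0, so mask that status
    grep -c "$(printf '\r')" "$1" || true
}

# Write a copy of the file with every carriage return removed.
# Usage: strip_cr input.fasta output.fasta
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

A count above zero suggests the file was written with Windows line endings. Removing the carriage returns changes the byte offsets within the file, so an existing `.fai` index built from the old copy would no longer match and would need to be regenerated.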

  • This works. Thanks

  • Hi,
    I'm having the same trouble with "Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'". I downloaded the reference from the resource bundle (v37) and just edited the chr names using pico -w. I don't think any additional characters were put in. Thanks.

  • I'm running on a Mac OS X server. Thanks.

  • This problem seems to be unique to the new version of GATK; v1.6 has no trouble with this and runs fine. I tried removing any carriage return characters using the tr command, but that did not fix the problem. I also thought later versions of Mac OS X shouldn't be affected by this issue.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    v1.6 has no trouble because we only recently added the check for bad reference characters. In previous versions these characters would silently cause the tools to produce incorrect results.
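
The error messages in this thread report the offending byte as a decimal value ('13', '10'). A rough way to look for such bytes before running GATK is sketched below. The helper `list_non_iupac_bytes` is hypothetical and does not reproduce GATK's own check: it only lists the decimal values of bytes in the sequence lines that are neither IUPAC nucleotide letters nor line feeds.

```shell
# Hypothetical helper, not part of GATK.
# Print the distinct decimal byte values found in the sequence lines
# of a FASTA file that are not IUPAC nucleotide letters or line feeds.
list_non_iupac_bytes() {
    grep -v '^>' "$1" |                                  # drop header lines
        tr -d 'ACGTUNRYSWKMBDHVacgtunryswkmbdhv\n' |     # drop expected bytes
        od -A n -v -t u1 |                               # dump the rest as decimals
        tr -s ' ' '\n' |                                 # one value per line
        grep -v '^$' |                                   # drop blank lines
        sort -un                                         # distinct values, ascending
}
```

Empty output means only the expected bytes were found. A value of 13 would point to carriage returns, and 32 to stray spaces.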

  • Eric, could you tell me what GATK expects at the end of each sequence line? I've tried using '\n' or '\r' on Mac OS X, and either way I get an error. Would it be possible for the bad reference character check to let line feeds and carriage returns pass? In theory Mac OS X is somewhat like Linux. This would allow greater flexibility for people creating reference sequences.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Unfortunately the check is necessary. The characters in your reference are going to cause the ref bases to be lined up incorrectly relative to their correct positions on the genome.

    I'm not sure what GATK looks for/tolerates in terms of return characters, as we use someone else's tools for loading in FastA files (Samtools/Picard). Technically you don't need to add any return characters within the sequence in a FastA record.
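
Since the FASTA is read through an index, one thing worth checking after editing a reference is whether the `.fai` file still describes it. In the samtools `.fai` format, column 4 is the number of bases per line and column 5 is the number of bytes per line including the terminator, so their difference is the line terminator width (1 for a line feed, 2 for carriage return plus line feed). The helper `fai_terminator_widths` below is a hypothetical sketch, and whether a stale index explains the errors reported above is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper, not part of GATK or samtools.
# For each sequence in a .fai index, print its name and the number of
# line-terminator bytes per line (bytes per line minus bases per line).
fai_terminator_widths() {
    awk -F '\t' '{ print $1 "\t" ($5 - $4) }' "$1"
}
```

If the reported width does not match the line endings actually present in the FASTA, rebuilding the index (for example with `samtools faidx`) would be the natural next step.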
