Local realignment issues for version 2.4-3-g2a7af43

wwmm933 (Posts: 6; Member)

Hi,

I downloaded the newest version of GATK (version 2.4-3) this week and tried to perform local realignment on my targeted sequencing data. The reference genome and the SNP and indel data files were downloaded from the resource bundle. However, I encountered two issues during realignment.

First, in the RealignerTargetCreator step: with the same command line, version 2.4-3 gives the error message "MESSAGE: -49" (with no further detail), whereas the older version 2.3-9 runs with no errors.

Second, in the IndelRealigner step, I got the error message "MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'". However, the reference genome was downloaded from the bundle, so I am not sure how to fix this issue.

I hope someone can help me with these issues. Let me know if more info is needed.

Thanks!

Comments

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    Hi there,

    Can you please post the complete stack trace that is output to the console when these errors occur?

    Geraldine Van der Auwera, PhD

  • wwmm933 (Posts: 6; Member)

    Here is the complete stack trace for RealignerTargetCreator.

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: -49
        at org.broadinstitute.sting.utils.BaseUtils.convertIUPACtoN(BaseUtils.java:172)
        at org.broadinstitute.sting.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:288)
        at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
        at org.broadinstitute.sting.gatk.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:100)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.4-3-g2a7af43):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: -49
    ERROR ------------------------------------------------------------------------------------------
  • wwmm933 (Posts: 6; Member)

    Here is the error for IndelRealigner.

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-3-g2a7af43):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'
  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    OK, can you also post your command line please?

    Geraldine Van der Auwera, PhD

  • wwmm933 (Posts: 6; Member)

    Here are the command lines. Thank you!

    RealignerTargetCreator:

        java -Xmx2g -jar GenomeAnalysisTK.jar \
            -T RealignerTargetCreator \
            -R ucsc.hg19.fasta.gz \
            -o indel.intervals \
            -known 1000G_phase1.indels.hg19.vcf

    IndelRealigner:

        java -Xmx4g -Djava.io.tmpdir=/temp/${file} \
            -jar GenomeAnalysisTK.jar \
            -I ${file}-rg.bam \
            -R ucsc.hg19.fasta.gz \
            -T IndelRealigner \
            -targetIntervals indel.intervals \
            -o ${file}-rai.bam \
            -known 1000G_phase1.indels.hg19.vcf \
            --consensusDeterminationModel KNOWNS_ONLY \
            -LOD 0.4

  • wwmm933 (Posts: 6; Member)

    The command lines are listed above. Thanks.

  • ebanks (Posts: 671; GSA Member)

    The GATK does not accept compressed reference files. You'll need to unzip it before using it with the GATK.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
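
The advice above can be checked before launching a run. The following is a minimal sketch, not part of GATK: it assumes the reference was compressed with plain gzip (as the `.gz` extension in the command lines suggests) and uses two hypothetical helper names, `is_gzipped` and `decompress_reference`. Compressed bytes read as if they were bases would plausibly explain odd values such as -49 or 13 in the error messages, though the thread does not confirm this.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress_reference(path):
    """Write an uncompressed copy next to a gzipped FASTA and return its path.

    Files that are not gzipped are returned unchanged.
    """
    src = Path(path)
    if not is_gzipped(src):
        return src
    # Hypothetical naming rule: drop a trailing ".gz", else append ".plain".
    if src.suffix == ".gz":
        dst = src.with_suffix("")
    else:
        dst = src.with_name(src.name + ".plain")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `decompress_reference("ucsc.hg19.fasta.gz")` would write `ucsc.hg19.fasta` alongside the original. Any index files for the reference should then be built against the uncompressed file.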

  • wwmm933 (Posts: 6; Member)

    Thanks, I see. I will give it a try. But I still do not understand why v2.3 works. Does v2.4 disable the usage of compressed reference files?

  • ebanks (Posts: 671; GSA Member)

    That's a great question. I just looked carefully and I don't think using compressed references was working properly in the past. It's not necessarily blowing up with errors, but I wouldn't trust the results you get using the compressed reference. Sorry to be the bearer of bad news. I'll update the GATK to detect this and generate a proper error message.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • wwmm933 (Posts: 6; Member)

    After uncompressing the reference files, it works very well. Thank you.

  • Tamara (Posts: 3; Member)

    Hello,

    I have the same problem (ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10') with the FastaReferenceMaker tool, but I'm using an uncompressed reference genome which was created with FastaAlternateReferenceMaker. How can I solve this problem?

    Thanks, Tamara

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    Hi Tamara, can you tell me what version you're using?

    Geraldine Van der Auwera, PhD

  • pdexheimer (Posts: 297; Member, GSA Collaborator)

    I think this is a function of running GATK on Windows. ASCII value 10 is the line feed character, which is part of the screwy Windows two-character line ending. I'll bet GATK is only looking for the carriage return and flips out when it sees the "unknown" character.
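
One way to test a guess like this is to look at the raw bytes of the reference. The sketch below is an illustration only and does not reproduce GATK's own check: `unexpected_bytes` is a hypothetical helper that treats the line feed (10) as the only line separator and reports every other byte on a sequence line that is not an IUPAC nucleotide code, so a carriage return from a Windows-style ending shows up as 13.

```python
from collections import Counter

# IUPAC nucleotide codes, upper and lower case, as byte values.
IUPAC = set(b"ACGTURYSWKMBDHVNacgturyswkmbdhvn")


def unexpected_bytes(path):
    """Count non-IUPAC byte values found in a FASTA file.

    Only the trailing line feed of each line is stripped, so a carriage
    return (13) left over from CRLF or CR-only endings is reported.
    """
    counts = Counter()
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                # Header text is free-form; only look for carriage returns.
                n = line.count(b"\r")
                if n:
                    counts[13] += n
                continue
            body = line[:-1] if line.endswith(b"\n") else line
            counts.update(b for b in body if b not in IUPAC)
    return dict(counts)
```

An empty result means every sequence line contains only IUPAC codes. A result such as `{13: 3}` points at carriage returns rather than at the bases themselves.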

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    Ooh, good point. Tamara, are you running on Windows?

    Geraldine Van der Auwera, PhD

  • Tamara (Posts: 3; Member)

    I'm using GATK 2.4-7. OK, I will try to run GATK on a Linux virtual machine.

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    OK, let me know if that works. You may need to re-download the reference from our resource bundle, since the one you've been using will have the Windows characters that are tripping up the GATK.

    Geraldine Van der Auwera, PhD
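
If re-downloading is not practical, an alternative is to rewrite the file with Unix line endings. This is a rough sketch under the assumption that stray carriage returns are the only problem. `normalize_line_endings` and its `chunk_size` parameter are hypothetical names, and the function is not something GATK provides.

```python
def normalize_line_endings(src, dst, chunk_size=1 << 20):
    """Copy src to dst, converting CRLF and lone CR line endings to LF.

    The file is processed in chunks so that a whole genome does not have
    to fit in memory.
    """
    carry = b""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            data = carry + chunk
            # Hold back a trailing CR: its LF partner may start the next chunk.
            if data.endswith(b"\r"):
                data, carry = data[:-1], b"\r"
            else:
                carry = b""
            fout.write(data.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))
        if carry:
            # The file ended on a lone CR.
            fout.write(b"\n")
```

Because a FASTA index (`.fai`) stores byte offsets and line widths, an index built from the old file will no longer match the rewritten one and should be regenerated.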

  • mgala81 (Posts: 4; Member)

    Hi, I'm having the same trouble with "Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'". I downloaded the reference from the resource bundle (v37) and only edited the chr names using pico -w. I don't think any additional characters were put in. Thanks.

  • mgala81 (Posts: 4; Member)

    I'm running on a Mac OS X server. Thanks.

  • mgala81 (Posts: 4; Member)

    This problem seems to be unique to the new version of GATK. v1.6 has no trouble with this and runs great. I attempted to remove any carriage return characters using the tr command, but this did not fix the problem. I also thought that later versions of Mac OS X shouldn't be affected by this issue.

  • ebanks (Posts: 671; GSA Member)

    v1.6 has no trouble because we only recently added the check for bad reference characters. In previous versions these characters would silently cause the tools to produce incorrect results.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • mgala81 (Posts: 4; Member)

    Eric, could you tell me what GATK expects at the end of each sequence line? I've tried using '\n' or '\r' on Mac OS X, and either way I get the error. Would it be possible for the bad reference character check to let line feeds and carriage returns pass? In theory Mac OS X is somewhat like Linux. This would allow greater flexibility for people creating reference sequences.

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    Hi there,

    Unfortunately the check is necessary. The characters in your reference are going to cause the ref bases to be lined up incorrectly relative to their correct positions on the genome.

    I'm not sure what GATK looks for or tolerates in terms of return characters, as we use someone else's tools for loading FASTA files (Samtools/Picard). Technically you don't need to add any return characters within the sequence in a FASTA record.

    Geraldine Van der Auwera, PhD
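
The misalignment described above can be illustrated with the arithmetic that a samtools-style FASTA index (`.fai`) implies. This is a sketch of that convention, not of GATK's or Picard's actual loading code. `byte_offset` and `fetch` are hypothetical helpers, and the toy data at the end is made up for the example.

```python
def byte_offset(offset, line_bases, line_width, pos):
    """File offset of the 0-based base `pos`, using faidx-style index fields.

    offset:     byte position of the first base of the sequence
    line_bases: number of bases on each full line
    line_width: number of bytes on each full line, terminator included
    """
    return offset + (pos // line_bases) * line_width + pos % line_bases


def fetch(data, offset, line_bases, line_width, start, end):
    """Return the bytes addressed as bases in the 0-based range [start, end)."""
    return bytes(
        data[byte_offset(offset, line_bases, line_width, p)]
        for p in range(start, end)
    )


# LF endings: 4 bases per line, 5 bytes per line, sequence starts at byte 6.
unix = b">chr1\nACGT\nACGT\n"
good = fetch(unix, 6, 4, 5, 2, 6)  # b"GTAC"

# CRLF endings read with the same 5-byte line width: the fifth base now
# lands on a line feed (byte value 10) instead of on "A".
windows = b">chr1\r\nACGT\r\nACGT\r\n"
bad = fetch(windows, 7, 4, 5, 3, 5)  # b"T\n"
```

When the line width assumed by the index disagrees with the bytes actually on disk, lookups past the first line drift, and a line terminator can be handed back as if it were a base. That is one way a value like '10' could end up in a "non-IUPAC base" message, although the thread does not establish that this is what happened in either case.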
