

# Local realignment issues for version 2.4-3-g2a7af43

Posts: 6Member

Hi,

I downloaded the newest version of GATK (version 2.4-3) this week and tried to perform local realignment on my targeted sequencing data. The reference genome and the SNP and indel data files were downloaded from the resource bundle. However, I encountered two issues during realignment.

First, in the RealignerTargetCreator step: with the same command line, version 2.4-3 gives the error message "MESSAGE: -49" (with no further detail), whereas the older version 2.3-9 runs without errors.

Second, in the IndelRealigner step, I got the error message "MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'". However, the reference genome was downloaded from the bundle, so I am not sure how to fix this.

I hope someone can help me with these issues. Let me know if more info is needed.

Thanks!


Hi there,

Can you please post the complete stack trace that is output to the console when these errors occur?

Geraldine Van der Auwera, PhD

• Posts: 6Member


Here is the complete stack trace for RealignerTargetCreator.

##### ERROR ------------------------------------------------------------------------------------------
• Posts: 6Member

Here is the error for IndelRealigner.

##### ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '13'

Geraldine Van der Auwera, PhD

• Posts: 6Member

Here are the command lines. Thank you!

RealignerTargetCreator: java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta.gz -o indel.intervals -known 1000G_phase1.indels.hg19.vcf

IndelRealigner: java -Xmx4g -Djava.io.tmpdir=/temp/${file} -jar GenomeAnalysisTK.jar -I ${file}-rg.bam -R ucsc.hg19.fasta.gz -T IndelRealigner -targetIntervals indel.intervals -o ${file}-rai.bam -known 1000G_phase1.indels.hg19.vcf --consensusDeterminationModel KNOWNS_ONLY -LOD 0.4

• Posts: 6Member

The command lines are listed above. Thanks.

• Posts: 683GATK Developer mod

The GATK does not accept compressed reference files. You'll need to unzip it before using it with the GATK.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
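For anyone hitting the same error, here is a minimal Python sketch (not part of GATK) of the fix described above: check whether a reference file is gzip-compressed and, if so, write an uncompressed copy. The function names `is_gzipped` and `decompress_reference` are hypothetical helpers made up for illustration; the only assumption about the file is that a gzip stream begins with the two magic bytes `1f 8b`.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress_reference(src, dest):
    """Write an uncompressed copy of a gzipped FASTA file to dest."""
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)  # streams, so large genomes are fine
    return Path(dest)
```

With the file names from this thread, the call would look like `decompress_reference("ucsc.hg19.fasta.gz", "ucsc.hg19.fasta")`, after which `-R` would point at the uncompressed file. Checking the magic bytes rather than the `.gz` extension also catches compressed files that have been renamed.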

• Posts: 6Member


Thanks, I see. I will give it a try. But I still do not understand why v2.3 works. Does v2.4 disable the usage of compressed reference files?

• Posts: 683GATK Developer mod

That's a great question. I just looked carefully and I don't think using compressed references was working properly in the past. It's not necessarily blowing up with errors, but I wouldn't trust the results you get using the compressed reference. Sorry to be the bearer of bad news. I'll update the GATK to detect this and generate a proper error message.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 6Member

After uncompressing the reference files, it works very well. Thank you.

• Posts: 3Member

Hello,

I have the same problem (ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10') with the Fasta Reference Maker tool, but I'm using an uncompressed reference genome that was created with Fasta Alternate Reference Maker. How can I solve this problem?

Thanks, Tamara

Hi Tamara, can you tell me what version you're using?

Geraldine Van der Auwera, PhD

• Posts: 344Member, GSA Collaborator ✭✭✭

I think this is a function of running GATK on Windows. ASCII value 10 is the line feed character, which is part of the screwy Windows two-character line ending (carriage return, ASCII 13, followed by line feed). I'll bet GATK is only looking for the carriage return and flips out when it sees the "unknown" character.
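To see which byte values a reference actually contains, a small Python sketch like the following can help. It is not GATK's own check: the accepted set `IUPAC` below is an assumption (the standard IUPAC nucleotide codes in either case), and `unexpected_bytes` is a hypothetical helper name. The numbers it reports are ASCII codes, the same form the error message uses ('10' for line feed, '13' for carriage return).

```python
from collections import Counter

# Assumption: standard IUPAC nucleotide codes, upper and lower case.
# GATK's own accepted set may differ.
IUPAC = set(b"ACGTURYSWKMBDHVNacgturyswkmbdhvn")


def unexpected_bytes(path):
    """Count byte values that are neither IUPAC codes nor a line-ending LF."""
    counts = Counter()
    with open(path, "rb") as handle:
        for line in handle:  # binary mode splits on LF (10) only
            body = line.rstrip(b"\n")
            if body.startswith(b">"):
                # Header text is free-form; only flag stray carriage returns.
                counts.update(b for b in body if b == 13)
            else:
                counts.update(b for b in body if b not in IUPAC)
    return counts
```

An empty result means every sequence line holds only IUPAC codes and ends in a plain line feed. A non-zero count for 13 points to Windows-style (or old Mac-style) line endings somewhere in the file.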

Ooh, good point. Tamara, are you running on Windows?

Geraldine Van der Auwera, PhD

• Posts: 3Member

I'm using GATK 2.4-7. OK, I will try running GATK on a Linux virtual machine.

OK, let me know if that works. You may need to re-download the reference from our resource bundle, since the one you've been using will have the Windows characters that are tripping up the GATK.

Geraldine Van der Auwera, PhD
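If re-downloading is not practical (for example, when the reference was generated locally), an alternative is to rewrite the line endings. The following is a minimal Python sketch under that assumption, not a GATK-endorsed procedure; `normalize_line_endings` is a hypothetical helper name.

```python
def normalize_line_endings(src, dest):
    """Copy src to dest, rewriting CRLF and bare CR line endings as LF."""
    with open(src, "rb") as inp, open(dest, "wb") as out:
        for line in inp:  # binary mode splits on LF, so CRLF never straddles lines
            out.write(line.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))
```

Because this changes byte offsets and line lengths, any existing index built for the old file (such as a `.fai`) would most likely need to be regenerated for the new one.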

• Posts: 3Member

This works. Thanks

• Posts: 4Member

Hi, I'm having the same trouble with "Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'". I downloaded the reference from the resource bundle (v37) and just edited the chr names using pico -w. I don't think any additional characters were put in. Thanks

• Posts: 4Member

Running on a Mac OS X server. Thanks

• Posts: 4Member

This problem seems to be unique to the new version of GATK; v1.6 has no trouble with this and runs great. I attempted to remove any carriage return characters using the tr command, but that did not fix the problem. I also thought later versions of Mac OS X shouldn't be burdened with this issue.

• Posts: 683GATK Developer mod

v1.6 has no trouble because we only recently added the check for bad reference characters. In previous versions these characters would silently cause the tools to produce incorrect results.

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 4Member

Eric, could you tell me what GATK expects at the end of each sequence line? I've tried using '\n' and '\r' on Mac OS X, and either way I get the error. Would it be possible for the bad reference character check to let these line feed or carriage return characters pass? In theory Mac OS X is somewhat like Linux. This would give people greater flexibility in creating reference sequences.