The current GATK version is 3.2-2

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Bug Bulletin: The GenomeLocPArser error in SplitNCigarReads has been fixed; if you encounter it, use the latest nightly build.

# VQSR INDEL output error

Posts: 2Member
edited January 2013

Hi, I am new to GATK, I have been trying to figure a strange error that I haven't been able to resolve for days.

Process so far. 1. Run UnifiedGenotyper per chr using -L option on ~ 130 samples 2. Merge all output vcf files into one. (using tabix to gz and index each vcf file, then use vcf-concat to merge all chr* files) 3. Use a perl script to sort merged vcf file based on the reference file order. i.e (chr1, 2, 3...M) 4. Split Merged.sorted.vcf file into INDEL and SNV files. 5. Run VQSR on each file (SNV and INDEL).

Error that I get: During ApplyRecalibration for INDELs I get an error in chr9 that states that a coordinate A is after Coordinate B (A < B, and A and B are different values, each time). This always happens in chr9. I checked my input Merged.sorted.indel.vcf file around coordinate A and B and its file is in order. I checked the recal file and it is also in order. So I can't figure out where the error is coming from. The strange thing is that error is reported when GATK is creating the output file, not during its computation/applying recalibration.

Has anyone encountered such a situation before?  Or  have any ideas I should try to resolve the error.  I don't get any errors with SNVs only INDEL's


Exact error message:

##### ERROR ------------------------------------------------------------------------------------------

Exact command:

/usr/java/latest/bin/java -Xmx6g -XX:-UseGCOverheadLimit -Xms512m -jar /projects/apps/alignment/GenomeAnalysisTK/latest/GenomeAnalysisTK.jar -R /data2/reference/sequence/human/ncbi/37.1/allchr.fa -et NO_ET -K /projects/apps/alignment/GenomeAnalysisTK/latest/Hossain.Asif_mayo.edu.key -mode INDEL -T ApplyRecalibration -nt 4 -input /data2/secondary/multisample/Merged.variant.INDEL.vcf.temp -recalFile /data2/secondary/multisample/temp/Merged.variant.INDEL.recal -tranchesFile /data2/secondary/multisample/temp/Merged.variant.INDEL.tranches -o /data2/secondary/multisample/Merged.variant.filter.INDEL_2.vcf


Version of GATK : 1.7 and 1.6.7

Post edited by Geraldine_VdAuwera on
Tagged:

• Posts: 683GATK Developer mod

Try removing "-nt 4"

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 2Member
edited October 2012

Hi, Thanks for the pointer that seem to have resolved above error however now I have another one

    ##### ERROR MESSAGE: The allele with index 33360123 is not defined in the REF/ALT columns in the record


any pointers?

Thanks

Post edited by bjaysheel on