
VQSR INDEL output error

bjaysheel Posts: 2 Member
edited January 2013 in Ask the GATK team

Hi,
I am new to GATK and have been trying to figure out a strange error that I haven't been able to resolve for days.

Process so far:
1. Run UnifiedGenotyper per chromosome, using the -L option, on ~130 samples.
2. Merge all output VCF files into one (compress and index each VCF file with tabix, then use vcf-concat to merge all chr* files).
3. Use a Perl script to sort the merged VCF file by the reference file order, i.e. chr1, 2, 3 ... M.
4. Split the Merged.sorted.vcf file into INDEL and SNV files.
5. Run VQSR on each file (SNV and INDEL).
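
For reference, the sort in step 3 can be sketched in Python as below. This is not the Perl script used above, only a minimal illustration of sorting by reference contig order and then by numeric position. The function names `contig_ranks` and `sort_vcf_lines` are hypothetical, and the contig list is assumed to be supplied in reference order (for example, taken from the reference index).

```python
def contig_ranks(contig_names):
    """Map each contig name to its rank in the reference order."""
    return {name: rank for rank, name in enumerate(contig_names)}


def sort_vcf_lines(lines, ranks):
    """Return header lines first, then records sorted by (contig rank, position).

    Assumes tab-separated VCF records with CHROM and POS in the first two
    columns. POS is compared as an integer, not as a string.
    """
    header = [l for l in lines if l.startswith("#")]
    records = [l for l in lines if l.strip() and not l.startswith("#")]

    def key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return (ranks[chrom], int(pos))

    # sorted() is stable, so records at the same position keep their order.
    return header + sorted(records, key=key)
```

Comparing positions as integers matters here: a plain text sort would place position 100 before position 20, and chr10 before chr2.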

Error that I get:
During ApplyRecalibration for INDELs I get an error on chr9 stating that a record at coordinate A appears after a record at coordinate B (where A < B; the values of A and B differ on each run). This always happens on chr9. I checked my input Merged.sorted.indel.vcf file around coordinates A and B, and the file is in order. I checked the recal file, and it is also in order, so I can't figure out where the error is coming from. The strange thing is that the error is reported when GATK is writing the output file, not while it is computing or applying the recalibration.

Has anyone encountered this situation before, or have any ideas I should try to resolve the error? I don't get any errors with SNVs, only with INDELs.
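
The manual check described above (looking around coordinates A and B) can be automated over the whole file. The following is a minimal sketch, assuming tab-separated VCF lines and a dictionary of contig ranks in reference order; the function name `find_order_violations` is hypothetical. It reports each record whose start comes before the start of the preceding record, which is the condition named in the error message below.

```python
def find_order_violations(lines, ranks):
    """List records that start before the preceding record.

    `ranks` maps contig name -> rank in the reference order.
    Returns tuples of (line number, this record, previous record).
    """
    violations = []
    prev = None  # (sort key, "chrom:pos") of the previous record
    for line_no, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos = line.split("\t", 2)[:2]
        key = (ranks[chrom], int(pos))
        label = f"{chrom}:{pos}"
        if prev is not None and key < prev[0]:
            violations.append((line_no, label, prev[1]))
        prev = (key, label)
    return violations
```

Running the same check on the input VCF, the recal file's records, and any partial output file may help narrow down which file actually contains the out-of-order records.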

Exact error message:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Unable to merge temporary Tribble output file.
at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.mergeExistingOutput(HierarchicalMicroScheduler.java:259)
at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:103)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:248)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)
Caused by: org.broad.tribble.TribbleException$MalformedFeatureFile: We saw a record with a start of chr9:33020249 after a record with a start of chr9:34987121, for input source: /data2/bsi/secondary/multisample/Merged.variant.filter.INDEL_2.vcf
at org.broad.tribble.index.DynamicIndexCreator.addFeature(DynamicIndexCreator.java:164)
at org.broadinstitute.sting.utils.codecs.vcf.IndexingVCFWriter.add(IndexingVCFWriter.java:118)
at org.broadinstitute.sting.utils.codecs.vcf.StandardVCFWriter.add(StandardVCFWriter.java:163)
at org.broadinstitute.sting.gatk.io.storage.VCFWriterStorage.mergeInto(VCFWriterStorage.java:120)
at org.broadinstitute.sting.gatk.io.storage.VCFWriterStorage.mergeInto(VCFWriterStorage.java:26)
at org.broadinstitute.sting.gatk.executive.OutputMergeTask.merge(OutputMergeTask.java:48)
at org.broadinstitute.sting.gatk.executive.HierarchicalMicroScheduler.mergeExistingOutput(HierarchicalMicroScheduler.java:253)
... 6 more

ERROR ------------------------------------------------------------------------------------------

Exact command:

/usr/java/latest/bin/java -Xmx6g -XX:-UseGCOverheadLimit -Xms512m -jar /projects/apps/alignment/GenomeAnalysisTK/latest/GenomeAnalysisTK.jar -R /data2/reference/sequence/human/ncbi/37.1/allchr.fa -et NO_ET -K /projects/apps/alignment/GenomeAnalysisTK/latest/Hossain.Asif_mayo.edu.key -mode INDEL -T ApplyRecalibration -nt 4 -input /data2/secondary/multisample/Merged.variant.INDEL.vcf.temp -recalFile /data2/secondary/multisample/temp/Merged.variant.INDEL.recal -tranchesFile /data2/secondary/multisample/temp/Merged.variant.INDEL.tranches -o /data2/secondary/multisample/Merged.variant.filter.INDEL_2.vcf

GATK versions tried: 1.7 and 1.6.7


Answers

  • ebanks Broad Institute Posts: 684 Member, Administrator, GATK Developer, Broadie, Moderator, DSDE Member, GP Member

    Try removing "-nt 4"

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • bjaysheel Posts: 2 Member
    edited October 2012

    Hi,
    Thanks for the pointer; that seems to have resolved the error above. However, now I have another one:

        ##### ERROR MESSAGE: The allele with index 33360123 is not defined in the REF/ALT columns in the record
    

    Any pointers?

    Thanks

  • Geraldine_VdAuwera Posts: 7,528 Administrator, GATK Developer

    There must be a problem with your file. Please validate all your input files before posting this kind of error to the forum. Also, you are using an old version of GATK, so I strongly recommend you upgrade to the latest version.
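
As one illustration of what validating the input can look like for this particular error, the sketch below checks that every allele index in a record's GT fields refers to an allele that is actually listed in REF/ALT. It is not GATK's own validation, only a minimal stand-alone check under these assumptions: tab-separated VCF records with FORMAT in column 9 and samples from column 10 onward. The function name `bad_allele_indices` is hypothetical.

```python
import re


def bad_allele_indices(record_line):
    """Return (sample number, GT string) for genotypes with undefined alleles.

    Allele index 0 is REF; indices 1..n refer to the comma-separated ALT
    alleles. Missing alleles (".") are ignored.
    """
    fields = record_line.rstrip("\n").split("\t")
    alt = fields[4]
    n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return []
    gt_idx = fmt.index("GT")

    problems = []
    for sample_no, sample in enumerate(fields[9:]):
        parts = sample.split(":")
        if gt_idx >= len(parts):
            continue  # sample has no GT value at all
        gt = parts[gt_idx]
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue
            if not allele.isdigit() or int(allele) >= n_alleles:
                problems.append((sample_no, gt))
                break
    return problems
```

An index as large as the one in the error message above cannot correspond to a real ALT allele, so one possibility worth checking is whether a column was shifted or corrupted when the files were merged, sorted, or split.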

    Geraldine Van der Auwera, PhD
