Holiday Notice:
The Frontline Support team will be slow to respond December 17-18 due to an institute-wide retreat and offline December 22- January 1, while the institute is closed. Thank you for your patience during these next few weeks. Happy Holidays!

Variant Recalibrator Error: Track Input out of Coordinate Order on contig ....

Hi:

After running haplotypecaller on SNPs and Indels, I wanted to recalibrate the variant scores using VariantRecalibrator on SNPs and INDels simultaneously. Only 4 samples were used (just testing the pipeline right now). I am getting the following error saying one of the track inputs is out of coordinate order. Has anyone encountered a similar error?

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.4-7-g5e89f01):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: LocationAwareSeekableRODIterator: track input is out of coordinate order on contig chr4:140811083-140811084 compared to chr4:140811085
ERROR ------------------------------------------------------------------------------------------

[1]+ Exit 1 java -Xmx4g -jar $gatk -T VariantRecalibrator -R $ref2 -input ${data}/TruSeq/truseq.reduced.snps.indels.vcf --maxGaussians 6 -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 $hapmap -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 $omni -resource:mills,VCF,known=true,training=true,truth=true,prior=12.0 $goldstandard -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=6.0 $dbsnp -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an ClippingRankSum -mode BOTH -recalFile ${data}/TruSeq/truseq.interstroke.snps.indels.reduced.recal -tranchesFile ${data}/TruSeq/truseq.interstroke.snps.indels.reduced.tranches -rscriptFile ${data}/TruSeq/truseq.interstroke.snps.indels.reduced.recal.plots.R

Thanks,

MC

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there,

    Have you checked your ROD file at the coordinates indicated in the error message? It is essential that all elements in the ROD file be in the correct genomic order.

  • chongmchongm Member

    Hi Geraldine:

    Here is a snippet of the output from my vcf file. It appears as if the same variants at 140811083 & 140811085 are being called twice in succession. Any idea as to why this might occur? I am also wondering why different scores are generated for the same variants (is this just due to downsampling?).

    chr4 140811083 . GC G 14399.88 . AC=5;AF=0.625;AN=8;BaseQRankSum=0.982;ClippingRankSum=-4.031;DP=197;FS=0.603;MLEAC=5;MLEAF=0.625;MQ=54.78;MQ0=0;MQRankSum=-2.920;QD=96.
    00;ReadPosRankSum=-2.278 GT:AD:GQ:PL 0/1:24,34:99:3364,0,1616 0/0:47,0:99:0,247,5140 1/1:1,55:99:6762,292,0 1/1:0,35:99:4314,193,0

    chr4 140811085 . TGCTGCTGCTGC T 10922.88 . AC=5;AF=0.625;AN=8;BaseQRankSum=0.934;ClippingRankSum=-3.431;DP=220;FS=5.400;MLEAC=5;MLEAF=0.625;MQ=55.13;MQ0=0;MQRankSum=-3.14
    7;QD=5.71;ReadPosRankSum=-2.798 GT:AD:GQ:PL 0/1:29,38:99:2816,0,2604 0/0:46,0:99:0,247,7691 1/1:1,65:99:5091,195,0 1/1:0,41:99:3056,123,0

    chr4 140811083 . GC G 8453.88 . AC=5;AF=0.625;AN=8;BaseQRankSum=1.530;ClippingRankSum=-2.255;DP=197;FS=5.070;MLEAC=5;MLEAF=0.625;MQ=54.78;MQ0=0;MQRankSum=-5.042;QD=56.36;ReadP
    osRankSum=-0.325 GT:AD:GQ:PL 0/1:24,24:99:2140,0,2474 0/0:47,0:99:0,247,7691 1/1:2,38:99:4028,144,0 1/1:0,26:95:2326,95,0

    chr4 140811085 . TGCTGCTGCTGC T 8453.88 . AC=5;AF=0.625;AN=8;BaseQRankSum=1.372;ClippingRankSum=-3.292;DP=220;FS=12.190;MLEAC=5;MLEAF=0.625;MQ=55.13;MQ0=0;MQRankSum=-3.726;QD=4.
    42;ReadPosRankSum=-0.721 GT:AD:GQ:PL 0/1:29,28:99:2140,0,2410 0/0:46,0:99:0,247,7691 1/1:2,48:99:4028,144,0 1/1:0,32:95:2326,95,0

    Thanks,

    MC

    Thanks,

    MC

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hmm, that's odd. On the bright side it explains the error message -- but that shouldn't happen in a vcf output by gatk. Did this vcf come straight out of the UG, or has it gone through something like CombineVariants?

  • chongmchongm Member

    No it just came out of the haplotype caller... I am going to try to recall using the most recent version of GATK and pray that it will work somehow?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    OK, let me know if the issue still occurs with the latest version.

  • pkuertenpkuerten Member

    Any luck chongm? I am having the same issue with 2.4.7 using HaplotypeCaller. The double called variants are in the intervalpadding region which should only be covered by one of the two neighboring intervals. The interval on the other side is beyond my 50 bp padding; but not too far. Thanks.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    If the error persists, please upload a snippet of your bam along with the bed file so we can try to reproduce the error locally. Detailed instructions here: http://www.broadinstitute.org/gatk/guide/article?id=1894

  • chongmchongm Member

    Hi Everyone, the error still persists even with 2.4.9 haplotypecaller. I've uploaded an archived bug report with my 4 bams that I used to call variants, bai files, bed file, command text file, and the stack trace. It is in the chongmbugs folder and the archive name is variantrecalbug.tar.gz. Thanks for your help!

  • rpoplinrpoplin Member ✭✭✭

    Hi there,

    Thank you so much for uploading the bam snippets. Could you also give us the command line that you used with the HaplotypeCaller to produce the input vcf file: ${data}/TruSeq/truseq.reduced.snps.indels.vcf

    Thanks!

  • rpoplinrpoplin Member ✭✭✭
    edited March 2013

    @pkuerten said:
    Any luck chongm? I am having the same issue with 2.4.7 using HaplotypeCaller. The double called variants are in the intervalpadding region which should only be covered by one of the two neighboring intervals. The interval on the other side is beyond my 50 bp padding; but not too far. Thanks.

    I'm concerned by what you call the intervalpadding region. I suspect there is some interaction between the HC and the pipeline that is the root cause here. Could you please explain how the HaplotypeCaller is being used. Thanks!

  • chongmchongm Member

    To clarify for myself, I did not "pad" my intervals. I just used the original bed file that comes with my exome enrichment kit. Also, I will get these scripts to you tonight but I'm out of the office right now so I can't!

  • chongmchongm Member

    Hi, so the following command was used to generate the .vcf file:

    j="TruSeq/C1000/C1000.marked.realigned.recal.reduced.bam"
    l="TruSeq/10277b/10277b.marked.realigned.recal.reduced.bam"
    m="TruSeq/10361c/10361c.marked.realigned.recal.reduced.bam"
    n="TruSeq/14098/14098.marked.realigned.recal.reduced.bam"
    java -Xmx4g -jar $gatk \
    -R $ref2 \
    -T HaplotypeCaller \
    -I ${data}/${j} -I ${data}/${l} -I ${data}/${m} -I ${data}/${n} \
    --dbsnp $dbsnp135 \
    -o ${data}/TruSeq/truseq.reduced.snps.indels.vcf \
    -stand_call_conf 50.0 \
    -stand_emit_conf 30.0 \
    -A QualByDepth \
    -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest \
    -A FisherStrand \
    -A RMSMappingQuality \
    -A InbreedingCoeff \
    -A ClippingRankSumTest \
    -A DepthPerAlleleBySample \
    -L $genesonly \
    -minPruning 10

  • rpoplinrpoplin Member ✭✭✭

    I've been able to replicate the problem locally. Thanks for providing the command line! I'll post here when we have a fix in place.

  • pkuertenpkuerten Member

    @rpoplin said:
    I'm concerned by what you call the intervalpadding region. I suspect there is some interaction between the HC and the pipeline that is the root cause here. Could you please explain how the HaplotypeCaller is being used. Thanks!

    I used interval padding 50 with the haplotypecaller and got this error. A colleague of mine had the same error without the interval padding. In her case, the cause turned out to be overlap in neighboring intervals in the interval file. After merging overlapping intervals everything was fine. Now instead of interval padding I am creating an updated bed file which is padded by 50bp (with overlaps merged) and rerunning the variant caller. I will post an update once it completes.

  • pkuertenpkuerten Member

    @chongm said:
    Hi, so the following command was used to generate the .vcf file:

    j="TruSeq/C1000/C1000.marked.realigned.recal.reduced.bam"
    l="TruSeq/10277b/10277b.marked.realigned.recal.reduced.bam"
    m="TruSeq/10361c/10361c.marked.realigned.recal.reduced.bam"
    n="TruSeq/14098/14098.marked.realigned.recal.reduced.bam"
    java -Xmx4g -jar $gatk \
    -R $ref2 \
    -T HaplotypeCaller \
    -I ${data}/${j} -I ${data}/${l} -I ${data}/${m} -I ${data}/${n} \
    --dbsnp $dbsnp135 \
    -o ${data}/TruSeq/truseq.reduced.snps.indels.vcf \
    -stand_call_conf 50.0 \
    -stand_emit_conf 30.0 \
    -A QualByDepth \
    -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest \
    -A FisherStrand \
    -A RMSMappingQuality \
    -A InbreedingCoeff \
    -A ClippingRankSumTest \
    -A DepthPerAlleleBySample \
    -L $genesonly \
    -minPruning 10

    I used the following steps to fix the bed file.
    (to fix the names)
    awk -F"\t" 'OFS="\t" {if ($1== "chr1_gl000192_random") print "GL000192.1",$2,$3; else if ($1== "chr4_gl000194_random") print "GL000194.1",$2,$3; else if ($1== "chrUn_gl000218") print "GL000218.1",$2,$3; else print substr($1, 4, length($1)-3),$2,$3 }' S03723424_Covered.bed

    (to remove the header)
    tail -n+2 coordinates.bed >coordinates1.bed

    (to add 50 bp and merge overlapping intervals)
    slopBed -i coordinates1.bed -g genome -b 50 | mergeBed >coordinates2.bed

    (to sort)
    sort -k1,1V -k2,2g coordinates2.bed -o coordinates3.bed
    grep -v GL coordinates3.bed >coordinates4.bed
    grep GL000218.1 coordinates3.bed >>coordinates4.bed
    grep GL000194.1 coordinates3.bed >>coordinates4.bed
    grep GL000192.1 coordinates3.bed >>coordinates4.bed

    Hope it helps.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @pkuerten, it's odd that you would need to merge the overlapping intervals manually, as normally the GATK engine should do that for you.

  • pkuertenpkuerten Member
    edited April 2013

    @Geraldine_VdAuwera said:
    pkuerten, it's odd that you would need to merge the overlapping intervals manually, as normally the GATK engine should do that for you.

    You are right it did not work; still the same issue. I used

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R .human_g1k_v37.fasta -I .A.bam -I B.bam -I C.bam -I .D.bam --dbsnp dbsnp_137.b37.vcf -L coordinates4.bed -o E.raw.snps.indels.vcf

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @pkuerten, I believe we have a fix for this in our development version now. It will be fixed in the upcoming 2.5 release which is coming very soon.

  • pkuertenpkuerten Member

    Thank you. Looking forward to the next release.

  • rpoplinrpoplin Member ✭✭✭

    @Geraldine_VdAuwera said:
    Hi pkuerten, I believe we have a fix for this in our development version now. It will be fixed in the upcoming 2.5 release which is coming very soon.

    Yes, this is fixed now and will go out with the release of GATK v2.5. Thanks for your help in tracking this down.

    Cheers,

  • vifehevifehe SpainMember

    Dear Rpoplin and Geraldine,

    I was wondering if you could explain what the issue was with this error in versions previous to 2.5. I am using current GATK 3.1, used once unified genotyper and Variant Recalibrator once in a set of 53 samples. Now I re-run those 53 with additional 42, using same command, and got the error:

    ERROR MESSAGE: LocationAwareSeekableRODIterator: track input is out of coordinate order on contig chr1:36638205 compared to chr1:36638949

    any ideas why it previously worked and not now? thanks a lot

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @vihefe,

    It sounds like one of your files is not sorted properly, or maybe the index file is corrupted or out of date (sometimes this causes the program to think there is a sorting issue even if there is not). You can try deleting the .idx files and running again; the GATK will automatically create a new index file.

    By the way, if you are working with a diploid organism, you should consider switching over to the new HaplotypeCaller GVCF-based workflow. It will give you better results than UG and is more convenient if you need to add more samples to your analysis at several points in your project.

  • gsun2_UNLgsun2_UNL Member

    Hi, I recently encountered the same error with my snp calling with HaplotypeCaller. Since I have about 300 samples, so I divided each chromosomes into 200 fragments and then used CatVariants to catenate all the g.vcf files. But when I tried to merge all the g.vcf files for chromosome 6, I got ##### ERROR MESSAGE: LocationAwareSeekableRODIterator: track variant is out of coordinate order on contig 6:174003304 compared to 6:174027955
    But for other chromosomes, it worked fine. I also tried to recall snps for that fragments which generated a new index file, but this error occurred again. I am using gatk 3.7.
    Can anyone please give me some suggestions on this issue?
    Thanks!

  • bhanuGandhambhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @gsun2_UNL

    We have encountered this issue from another user before and here is the thread addressing this issue. Would you please try the recommendations provided in that thread and see how that works for you?
    Thank you. Let me know how it goes.

    Regards
    Bhanu

  • gsun2_UNLgsun2_UNL Member
    edited November 28
Sign In or Register to comment.