Advice on troubleshooting 'ERROR MESSAGE: Badly formed genome loc'

Hi, I am getting the following error message when using IndelRealigner in GATK 2.8.1:

ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 114033418 is less than start 114033419 in contig 13

I checked the targetIntervals file for these coordinates and found:

13:114033418-114033419

which seems fine. Other samples processed with exactly the same steps look OK. I checked the BAM files, and only a few reads overlap this site; however, this is an off-capture site, so I see the same thing in other samples as well. Any advice would be most helpful. Thank you.
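A quick way to double-check an interval list like this one for reversed entries is a small script. This is a hypothetical helper, not part of GATK. It assumes each non-empty line has the form contig:start-stop or contig:position (1-based, inclusive), and it does not handle whole-contig entries such as a bare "13".

```python
def parse_interval(text):
    """Parse 'contig:start-stop' or 'contig:pos' (1-based, inclusive) into (contig, start, stop)."""
    # rpartition keeps contig names that themselves contain ':' intact
    contig, sep, span = text.strip().rpartition(":")
    if not sep or not contig:
        raise ValueError(f"expected contig:start[-stop], got {text!r}")
    # allow thousands separators such as 114,033,418
    start_text, dash, stop_text = span.replace(",", "").partition("-")
    start = int(start_text)
    stop = int(stop_text) if dash else start  # a single position means start == stop
    return contig, start, stop


def find_bad_intervals(lines):
    """Return (line_number, contig, start, stop) for entries with start < 1 or stop < start."""
    bad = []
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue  # skip blank lines
        contig, start, stop = parse_interval(line)
        if start < 1 or stop < start:
            bad.append((lineno, contig, start, stop))
    return bad
```

For example, calling find_bad_intervals on the lines of the targetIntervals file returns an empty list when no entry is reversed; the entry 13:114033418-114033419 quoted above passes, which fits the impression that the file itself is fine.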

Comments

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Hmm, that's odd. Can you show a screenshot of this locus in IGV? Is there anything weird about the overlapping reads?

  • pkuerten (Member)

    Thank you Geraldine. Basically, only one read overlaps this interval, and its base disagrees with the reference. As a workaround, I provided the capture coordinates, which do not include this region, and everything went fine. I forgot to mention earlier that the BAM file is a merge of per-lane indel-realigned files.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    I see -- it looks like we're failing to process that flanking insertion properly. We can't treat this as a priority, especially since you have a workaround, but I'll put this in our bug tracker to make sure it gets fixed eventually.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Oh I forgot -- could you please share a snippet of the data on which this happens so we can debug this locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

  • nimarafatiUU (Sweden; Member)

    Hi,

    I am getting the same error when I run SplitNCigarReads (GATK 3.1.1) on my data. There are 9 samples, and 7 out of 9 give the same error. I posted a query last week but did not receive any feedback. I was wondering whether you have found a solution to this.
    Thanks

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @nimarafatiUU We are following up on the new thread you started. Please don't post the same question in different places. Thanks!

  • bioSG (Member)

    I'm experiencing the same issue for some samples. Is there any workaround/fix for this problem? GATK version I'm using is 3.2-2-gec30cee.

  • bioSG (Member)
    edited September 2014

    Forgot to mention: I'm mapping the samples with BWA aln (0.7.3a-r367) against the hg38 reference genome.

    My BED target file looks like this:

    chr17   80116948    80117129
    chr17   80117580    80117769
    chr17   80118173    80118377
    chr17   80118633    80118825
    chr17   80119252    80119351
    chr17   94592038    94592156
    chr17   94596235    94596332
    chr17   94614422    94615104
    chr17   94619175    94619488
    chr17   94622321    94622511
    chr17   94623591    94623834
    

    Command I'm using:
    java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -I sample_1.bam -R hg38.fa -o sample1.intervals -L target.bed

    Error:
    ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 83257441 is less than start 94592039 in contig chr17

    This seems very weird to me, because position 83257441 is outside every interval in the BED file. 94592039 is the start position of the interval closest to that position.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    edited September 2014

    @bioSG

    Hi,

    This has been fixed in our nightly builds. Please download the latest nightly here: https://www.broadinstitute.org/gatk/nightly

    -Sheila

  • bioSG (Member)

    @Sheila

    I've been testing again with the nightly build:

    The Genome Analysis Toolkit (GATK) vnightly-2014-09-26-g4c7b578, Compiled 2014/09/26 00:01:19

    I'm still getting the same error.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @bioSG Have you checked what happens if you extract just a few intervals around the problematic one into a new file and run with that? And have you checked that the file has consistent whitespace, meaning not a mix of spaces and tabs? This looks like it might be a formatting problem.
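The whitespace check suggested here can be scripted. This is a minimal sketch using a hypothetical helper, not a GATK or Picard tool. It assumes the BED file is read as a list of lines, skips comment, track and browser lines, and flags records whose chrom/start/end columns are not cleanly tab-separated or whose start/end are not plain integers.

```python
def check_bed_whitespace(lines):
    """Return (line_number, problem) pairs for BED records that are not cleanly tab-delimited."""
    problems = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        # header-style lines carry no interval
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            # typical symptom of spaces used instead of tabs
            problems.append((lineno, "fewer than 3 tab-separated fields"))
            continue
        chrom, start, end = fields[:3]
        if any(f == "" or f != f.strip() or " " in f for f in (chrom, start, end)):
            problems.append((lineno, "empty field or spaces in chrom/start/end"))
        elif not (start.isdigit() and end.isdigit()):
            problems.append((lineno, "start/end are not plain integers"))
    return problems
```

Only the first three columns are checked, because later BED columns (such as the name) are sometimes written with embedded spaces.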

  • bioSG (Member)

    @Geraldine_VdAuwera No, I haven't tried extracting the intervals around the problematic one, because I need them. Anyway, I'm going to try it just to see what happens. And yes, I've checked the BED file and it's correctly formed: all tabs, no spaces or any other separators in the file.

  • bioSG (Member)
    edited October 2014

    @Sheila @Geraldine_VdAuwera

    According to the GATK error log, the problematic interval seems to be:
    chr17 94592038 94592156

    I've deleted the previous intervals:

    chr17   80118173    80118377
    chr17   80118633    80118825
    chr17   80119252    80119351
    

    and the following intervals:

    chr17   94596235    94596332
    chr17   94614422    94615104
    chr17   94619175    94619488
    

    So the new BED looks like this:

    chr17   80113198    80113386
    chr17   80116948    80117129
    chr17   80117580    80117769
    chr17   80118173    80118377
    chr17   80118633    80118825
    chr17   94592038    94592156
    chr17   94614422    94615104
    chr17   94619175    94619488
    chr17   94622321    94622511
    chr17   94623591    94623834
    chr18   22171125    22172299
    

    I'm still getting the error, so I don't think it has to do with the BED file. I'm now trying the same intervals with a different BAM file and will report back ASAP.

    If I delete the "problematic" interval, I get the error on the next interval in the BED file, which is the first interval near the problematic position in the BAM:

    ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 83257441 is less than start 94614423 in contig chr17

  • bioSG (Member)
    edited October 2014

    I used two different samples with both the original BED intervals and the modified ones (as described in my previous post), and both fail with exactly the same GenomeLocParser error.

    I've also tried the latest nightly build, without success:

    The Genome Analysis Toolkit (GATK) vnightly-2014-10-02-g6eb7761, Compiled 2014/10/02 00:01:15

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    OK, so are you able to remove the problematic intervals and get the program running properly on the rest?

    Have you looked at the data in IGV to see if there is anything weird going on there? Validating with Picard ValidateSamFile is also a good idea.

    Is this RNAseq data by any chance?

  • bioSG (Member)

    I've found why it was failing and fixed it. Indeed, the BED file had not been correctly converted to hg38 coordinates for chr17. Coordinates and chromosome sizes differ a lot between hg19 and hg38; that's why it was failing.

    My fault, it's now working with latest nightly build.

  • prepagam (Member)

    I got this error myself. It turns out the BED file I was using had coordinates beyond the chromosome lengths listed in the VCF. So if you get this error, compare the contig lengths in the header of your VCF with the highest coordinate for each chromosome in your BED file.
    For example, in your VCF:

    contig=<ID=chr1,length=226785>

    contig=<ID=chr2,length=26708>

    contig=<ID=chr3,length=917893>
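The comparison described above can be scripted. This is a minimal sketch with hypothetical function names, assuming contig lengths come either from contig header lines of a VCF or from a samtools-style .fai index of the reference (tab-separated: contig name, then length, then other columns), and that the BED file has chrom, start and end as its first three columns. The VCF reader accepts header lines with or without the leading "##", since the lines quoted above appear without it.

```python
import re

# match ID=... and length=... wherever they appear inside <...>
ID_RE = re.compile(r"[<,]ID=([^,>]+)")
LENGTH_RE = re.compile(r"[<,]length=(\d+)")


def contig_lengths_from_vcf_header(lines):
    """Map contig name -> length using ##contig=<ID=...,length=...> header lines."""
    lengths = {}
    for line in lines:
        if line.startswith("#CHROM"):
            break  # end of the meta-information header
        if not line.lstrip("#").startswith("contig=<"):
            continue
        id_match, len_match = ID_RE.search(line), LENGTH_RE.search(line)
        if id_match and len_match:
            lengths[id_match.group(1)] = int(len_match.group(1))
    return lengths


def contig_lengths_from_fai(lines):
    """Map contig name -> length from a .fai index (columns: name, length, offset, ...)."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def intervals_off_contig(bed_lines, lengths):
    """Return (line_number, chrom, start, end, reason) for BED records that do not fit the reference."""
    problems = []
    for lineno, line in enumerate(bed_lines, 1):
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # header/comment lines carry no interval
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in lengths:
            problems.append((lineno, chrom, start, end, "contig not in reference"))
        elif end > lengths[chrom]:
            problems.append((lineno, chrom, start, end, f"end exceeds contig length {lengths[chrom]}"))
    return problems
```

As an illustration, the stop position 83257441 in bioSG's error is the length of chr17 in hg38, so the interval chr17 94592038 94592156 cannot fit on that contig; with {"chr17": 83257441} as the length table, intervals_off_contig flags exactly that interval.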

  • For what it's worth, I also encountered this error today, and after further digging I discovered that, like @bioSG, I had been trying to use a BED file of GRCh38 coordinates on a GRCh37-aligned BAM.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Sounds like GRCh38 is being a bajor pain in the mutt. Is it bad that I'm relieved we don't support it yet?

  • scottyn371 (Member)

    I'm getting this error too.
    GenomeLocParser complains that:
    "Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 652827 is less than start 652828 in contig chr1". It appears that I have no sequence reads in that interval. Could that have anything to do with it?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @scottyn371 GATK version, command line and stack trace (full error message with all the gross code bits) please?

  • scottyn371 (Member)

    GenomeAnalysisTK-3.3-0:

    java -Xmx4g -jar $GATK_dir/GenomeAnalysisTK.jar -T BaseRecalibrator -I input_bam.marked.realigned.fixed.bam -R $hg19 -knownSites $dbsnp -o recal_data.table

    INFO 13:50:34,006 HelpFormatter - --------------------------------------------------------------------------------
    INFO 13:50:34,013 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
    INFO 13:50:34,014 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 13:50:34,015 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 13:50:34,024 HelpFormatter - Program Args: -T BaseRecalibrator -I input_bam.marked.realigned.fixed.bam -R /home/Tools/galaxy_tools/genomes/hg19_chr_order_no_un/hg19.fa -knownSites /home/Tools/galaxy_tools/genomes/dbSNP/snp142Common.filter.bed -o recal_data.table
    INFO 13:50:34,041 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.17.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_55-mockbuild_2014_04_09_11_51-b00.
    INFO 13:50:34,042 HelpFormatter - Date/Time: 2015/03/26 13:50:34
    INFO 13:50:34,043 HelpFormatter - --------------------------------------------------------------------------------
    INFO 13:50:34,043 HelpFormatter - --------------------------------------------------------------------------------
    INFO 13:50:34,258 GenomeAnalysisEngine - Strictness is SILENT
    INFO 13:50:34,688 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO 13:50:34,731 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 13:50:34,799 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
    INFO 13:50:35,883 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 13:50:35,900 GenomeAnalysisEngine - Done preparing for traversal
    INFO 13:50:35,902 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 13:50:35,904 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 13:50:35,906 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
    INFO 13:50:36,456 BaseRecalibrator - The covariates being used here:
    INFO 13:50:36,456 BaseRecalibrator - ReadGroupCovariate
    INFO 13:50:36,457 BaseRecalibrator - QualityScoreCovariate
    INFO 13:50:36,457 BaseRecalibrator - ContextCovariate
    INFO 13:50:36,457 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
    INFO 13:50:36,457 BaseRecalibrator - CycleCovariate
    INFO 13:50:36,460 ReadShardBalancer$1 - Loading BAM index data
    INFO 13:50:36,461 ReadShardBalancer$1 - Done loading BAM index data
    INFO 13:50:40,762 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.3-0-g37228af):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 652827 is less than start 652828 in contig chr1
    ERROR ------------------------------------------------------------------------------------------

  • scottyn371 (Member)

    I really have tried to fix it myself by removing the offending SNP from $dbsnp, but it just finds another one to complain about.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @scottyn371
    Hi,

    Can you try using the latest nightly build of GATK and let us know if it works? This may be a bug that was fixed in a recent build. If it does not fix it, I will need you to submit a bug report.

    Thanks,
    Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @scottyn371 You should use a VCF file instead of a BED file for the known sites.

  • scottyn371 (Member)

    Thanks Geraldine, the VCF worked!

  • morgane5151 (Zurich; Member)

    Hello, I would like to convert my output files insertions.bed and deletions.bed into a single VCF, to use for realignment, with the VariantsToVCF tool.
    I get the following message:
    INFO 14:27:17,612 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:27:17,625 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-gf196186, Compiled 2015/06/26 10:08:42
    INFO 14:27:17,625 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 14:27:17,626 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 14:27:17,634 HelpFormatter - Program Args: -T VariantsToVCF -R /gdc_home4/morgane/Ref_withplastic_2014/ITAG2.4_ALL.fa -o insertions.vcf --variant:BED insertions.bed
    INFO 14:27:17,675 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.5.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_65-mockbuild_2014_07_14_06_19-b00.
    INFO 14:27:17,676 HelpFormatter - Date/Time: 2015/08/31 14:27:17
    INFO 14:27:17,676 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:27:17,677 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:27:17,799 GenomeAnalysisEngine - Strictness is SILENT
    INFO 14:27:17,962 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 14:27:19,330 RMDTrackBuilder - Writing Tribble index to disk for file /gdc_home4/morgane/RNAseq_2015/Cutadapt_treatment/Pool2/trim_not_left/trim_r2/A19/tophat_out/insertions.bed.idx
    INFO 14:27:22,681 GenomeAnalysisEngine - Preparing for traversal
    INFO 14:27:22,689 GenomeAnalysisEngine - Done preparing for traversal
    INFO 14:27:22,690 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 14:27:22,691 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 14:27:22,691 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 14:27:24,520 GATKRunReport - Uploaded run statistics report to AWS S3
    ....

    ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 3060 is less than start 3061 in contig Mito_Scaffold_001

    The command line is the following:
    java -jar /usr/local/gatk-protected-3.4-20150626/executable/GenomeAnalysisTK.jar -T VariantsToVCF -R /gdc_home4/morgane/Ref_withplastic_2014/ITAG2.4_ALL.fa -o insertions.vcf --variant:BED insertions.bed

    The file I am processing looks like this:
    [[email protected] tophat_out]$ head insertions.bed
    track name=insertions description="TopHat insertions"
    Mito_Scaffold_001 3060 3060 G 1
    Mito_Scaffold_001 5022 5022 A 2
    Mito_Scaffold_001 5028 5028 T 5
    Mito_Scaffold_001 5213 5213 T 4
    Mito_Scaffold_001 5216 5216 AT 4
    Mito_Scaffold_001 5217 5217 TA 1
    Mito_Scaffold_001 5218 5218 AC 2
    Mito_Scaffold_001 9911 9911 A 4
    Mito_Scaffold_001 9986 9986 A 4

    This is a very similar problem to those mentioned in previous messages, but I can't solve it.
    Do you have any idea how to avoid this problem, or another strategy to produce the VCF file?

    Thanks a lot
    Cheers
    Morgane

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @morgane5151
    Hi Morgane,

    The BED format is 0-based for the start coordinates, so coordinates taken from 1-based formats should be offset by 1. I am not sure if you have done that, but usually this error arises from that mistake.

    -Sheila

  • morgane5151 (Zurich; Member)

    Thank you Sheila, I understand. Is there a quick and correct way to apply this offset?
    Morgane

  • morgane5151 (Zurich; Member)

    I managed to apply the offset, but:

    • deletions.bed seems to work fine: it processes all sites and produces an empty file.
    • insertions.bed still gives the same error.
      Before the offset:
      ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 1281 is less than start 1282 in contig Mito_Scaffold
      After the offset:
      ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 1282 is less than start 1283 in contig Mito_Scaffold

    I do not have access to a database with "ready to use" indels.
    I wonder why I can't find any answer to my problem when so many people use TopHat and do realignment...
    Thank you for your help.
    Morgane
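Sheila's point about coordinate systems fits the exact numbers in these messages. BED starts are 0-based and ends are exclusive, so converting to 1-based, inclusive coordinates adds 1 to the start and leaves the end unchanged. A record whose start equals its end, such as the TopHat line Mito_Scaffold_001 3060 3060, is a zero-length interval and becomes start 3061, stop 3060, which is the pair in the error Morgane reported; shifting both columns by the same amount keeps the interval zero-length, consistent with the before/after messages. The dbSNP BED error earlier in the thread (stop 652827, start 652828) has the same shape, which suggests a zero-length record there too. The sketch below uses hypothetical helper names and is not GATK code; it only illustrates the conversion and lists such records.

```python
def bed_to_one_based(chrom, bed_start, bed_end):
    """Convert a BED interval (0-based start, exclusive end) to 1-based, inclusive (start, stop)."""
    return chrom, bed_start + 1, bed_end


def zero_length_records(bed_lines):
    """Return (line_number, chrom, start, stop) for BED records whose 1-based stop < start."""
    found = []
    for lineno, line in enumerate(bed_lines, 1):
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue  # header/comment lines carry no interval
        chrom, start, stop = bed_to_one_based(fields[0], int(fields[1]), int(fields[2]))
        if stop < start:
            found.append((lineno, chrom, start, stop))
    return found
```

This only identifies the problematic records; it does not decide how a zero-length insertion should be represented in a VCF.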

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @morgane5151
    Hi Morgane,

    Yes, unfortunately, we recommend the STAR aligner, so we cannot help you with using a different aligner. https://www.broadinstitute.org/gatk/guide/article?id=3891

    -Sheila

  • 55816815 (TN; Member)

    I am also experiencing this error today. Our alignment is merged from BWA and STAR. Although @Sheila mentioned that other aligners are currently not supported, I hope this is on your to-do list so that, in the future, it does not depend on a specific alignment tool...

    GATK version 3.5-0-g36282e4

    Program Args: -T SplitNCigarReads -rf ReassignOneMappingQuality -R mm9.fa -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS -fixNDN -I in.bam -o out.bam

    ERROR MESSAGE: Badly formed genome location: Parameters to GenomeLocParser are incorrect:The stop position 70462216 is less than start 70462217 in contig 12

    Issue · Github
    by Sheila

    Issue Number
    836
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @55816815 What do you mean by "Our alignment is merged from bwa and STAR"? Did you merge bam files produced with different aligners? That is a really bad idea. You need to reprocess the data correctly. If what you have is DNA and RNAseq, then they should be processed differently and separately. We also recommend doing the variant calling separately, because DNA and RNAseq datasets have different technical properties so the variant callsets should be filtered in different ways.

  • arshi (Member)
    edited May 2017

    Hi everyone,

    I am getting this error when I run MuTect. I think MuTect uses its own copy of GATK, since I did not provide a path to my local GATK installation. However, I do get a complete stats.out file with some variants marked KEEP.

    INFO 02:17:12,717 MuTect - [MUTECT] Inspected 197000 potential candidates

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.1-0-g72492bb):
    ERROR
    ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The stop position 65686 is less than start 65687 in contig chrX
    ERROR ------------------------------------------------------------------------------------------

    I am using hg19 as the reference.

    Any advice?
    Many Thanks!
    Arshi

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @arshi
    Hi Arshi,

    Are you using MuTect v1? Is there any way you can use MuTect2?

    Can you please post the exact command you ran? Also, can you validate your input BAM file with ValidateSamFile?

    Thanks,
    Sheila

  • arshi (Member)

    Hi @Sheila

    See below:
    java -Xmx2g -jar mutect-1.1.7.jar --version
    3.1-0-g72492bb
    I believe this is the latest one?

    ValidateSamFile output:

    java -Xmx2g -jar /path/Picard-1.100/ValidateSamFile.jar I=/path/R5.CCP.bam MODE=SUMMARY
    [Sun May 14 15:10:14 EDT 2017] net.sf.picard.sam.ValidateSamFile
    INPUT=/path/R5.CCP.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Sun May 14 15:10:14 EDT 2017] Executing as [email protected] on Linux 4.4.0-75-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_121-b00; Picard version: 1.100(1571)
    INFO 2017-05-14 15:12:16 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:02:01s. Time for last 10,000,000: 121s. Last read position: chr6:153,894,874
    INFO 2017-05-14 15:14:17 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:04:02s. Time for last 10,000,000: 121s. Last read position: chr17:59,938,830
    No errors found

    MuTect command used:

    java -Xmx2g -jar $MUTECT \
    --analysis_type MuTect \
    --reference_sequence hg19.fasta \
    --cosmic $COSMIC \
    --dbsnp $DBSNP \
    --input_file:normal $Normal \
    --input_file:tumor $Tumor \
    --out $saveas.mutect.stats.out \
    --coverage_file $saveas.mutect.coverage.wig.txt \
    -U ALLOW_SEQ_DICT_INCOMPATIBILITY

    COSMIC="my.withchr.b37_cosmic_v54_120711.vcf"
    DBSNP="genomes/hg19_broad/dbsnp_138.hg19.vcf"

    Thanks for your help!

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @arshi
    Hi Arshi,

    The new MuTect2 in GATK4 is coming out soon. This may be a bug, but there is no way the team will devote time to fixing issues in MuTect1. The best thing to do is upgrade to the latest version of MuTect2.

    -Sheila
