The stop position is less than start for Broad.human.exome.b37.scattered.txt

Migwell Member
edited March 11 in Ask the GATK team
I was running a test with the gatk3 germline workflow (located at `gatk-workflows/gatk3-germline-snps-indels` on GitHub). Since I'm only interested in exome performance, I used `Broad.human.exome.b37.scattered.txt`, located at `gs://gatk-test-data/intervals/Broad.human.exome.b37.scattered.txt`, rather than the default intervals file.

However, running the workflow with this intervals file results in the following error:

```
2019-03-11 03:51:27,464 cromwell-system-akka.dispatchers.engine-dispatcher-21 ERROR - WorkflowManagerActor Workflow 91edb9e9-0f44-4c5c-8995-2c77090c7022 failed (during ExecutingWorkflowState): Job HCV_3.HaplotypeCaller:13:1 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: s3://cromwell-results/cromwell-execution/best_practise/91edb9e9-0f44-4c5c-8995-2c77090c7022/call-HCV_3/haplotype.HCV_3/ccf7ae57-4d04-4f16-b4e0-02450bcd4aca/call-HaplotypeCaller/shard-13/HaplotypeCaller-13-stderr.log.
Using GATK jar /usr/gitc/gatk4/gatk-package-4.beta.5-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -Xms2g -jar /usr/gitc/gatk4/gatk-package-4.beta.5-local.jar PrintReads -I /cromwell_root/cromwell-results/cromwell-execution/best_practise/91edb9e9-0f44-4c5c-8995-2c77090c7022/call-GPPW/processing.GPPW/f6ef85cb-7488-4248-b31c-ba42addfcc7d/call-GBF/NA12878.bam --interval_padding 500 -L /cromwell_root/genovic-cromwell-inputs/reference_data/b37/intervals/Broad.human.exome.scattered/Broad.human.exome.b37_21.bed -O local.sharded.bam
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/cromwell-results/cromwell-execution/best_practise/91edb9e9-0f44-4c5c-8995-2c77090c7022/call-HCV_3/haplotype.HCV_3/ccf7ae57-4d04-4f16-b4e0-02450bcd4aca/call-HaplotypeCaller/shard-13/tmp.66897de2
[March 11, 2019 3:42:49 AM UTC] PrintReads --output local.sharded.bam --intervals /cromwell_root/genovic-cromwell-inputs/reference_data/b37/intervals/Broad.human.exome.scattered/Broad.human.exome.b37_21.bed --interval_padding 500 --input /cromwell_root/cromwell-results/cromwell-execution/best_practise/91edb9e9-0f44-4c5c-8995-2c77090c7022/call-GPPW/processing.GPPW/f6ef85cb-7488-4248-b31c-ba42addfcc7d/call-GBF/NA12878.bam --interval_set_rule UNION --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false
[March 11, 2019 3:42:49 AM UTC] Executing as [email protected] on Linux 4.14.97-74.72.amzn1.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2~bpo8+1-b14; Version: 4.beta.5
[March 11, 2019 3:42:51 AM UTC] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2058354688
***********************************************************************

A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 19506651 is less than start 19506652 in contig 21

***********************************************************************
```

The cause is visible in `gs://gatk-test-data/intervals/Broad.human.exome.scattered/Broad.human.exome.b37_21.bed`, one of the BED files referenced by this intervals file. Lines like this one cause GATK to fail:
```
21 19506651 19506651 + new_exome_1.1_content
```
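Here, the start and end positions are the same. Since BED start coordinates are 0-based, GATK appears to add 1 to the start when it reads the file, which gives a start of 19506652 and a stop of 19506651, the same values reported in the error message. I'm not sure what purpose a zero-length entry like this serves, but it is definitely the cause of the issue.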
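To check how many such records a BED file contains, something like the following could be used. This is only a minimal sketch: the function name `find_empty_bed_records` and the first sample line are made up for illustration, and it assumes whitespace-separated fields with the start and end in the second and third columns.

```python
def find_empty_bed_records(lines):
    """Return (line_number, text) for BED records whose end is not greater than start."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not text or text.startswith(("#", "track", "browser")):
            continue
        fields = text.split()
        start, end = int(fields[1]), int(fields[2])
        # BED is 0-based and half-open, so end <= start means the record covers no bases.
        if end <= start:
            bad.append((number, text))
    return bad


sample = [
    "21 19506000 19506500 + example_ok",             # hypothetical well-formed record
    "21 19506651 19506651 + new_exome_1.1_content",  # the record quoted above
]

for number, text in find_empty_bed_records(sample):
    print(f"line {number}: {text}")
```

On a real file, the same function can be given an open file handle instead of the `sample` list.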

Answers

  • bshifaw Member, Broadie, Moderator admin

    Hi @Migwell

    It might be linked to the fact that gs://gatk-test-data/intervals/Broad.human.exome.b37.scattered.txt is a list of BED files.
    "We also accept the widely-used BED format, where intervals are in the form <chr> <start> <stop>, with fields separated by tabs. However, you should be aware that this file format is 0-based for the start coordinates, so coordinates taken from 1-based formats (e.g. if you're cooking up a custom interval list derived from a file in a 1-based format) should be offset by 1. The GATK engine recognizes the .bed extension and interprets the coordinate system accordingly." source
    Try either offsetting the BED files by 1 or using an .interval_list file.
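As a rough illustration of the coordinate offset described in the quote, the sketch below turns BED records into 1-based `contig:start-stop` strings and drops records that cover no bases. The helper names are hypothetical, the fields are assumed to be whitespace-separated, and this is not how GATK itself performs the conversion.

```python
def bed_line_to_interval(line):
    """Convert one BED record (0-based start, exclusive end) to a 1-based inclusive string."""
    fields = line.split()
    contig, start0, end = fields[0], int(fields[1]), int(fields[2])
    # A record with end <= start covers no bases; shifting it would give start > stop.
    if end <= start0:
        return None
    return f"{contig}:{start0 + 1}-{end}"


def convert_bed(lines):
    """Convert every usable BED record, skipping headers, blanks and empty records."""
    intervals = []
    for raw in lines:
        text = raw.strip()
        if not text or text.startswith(("#", "track", "browser")):
            continue
        interval = bed_line_to_interval(text)
        if interval is not None:
            intervals.append(interval)
    return intervals


bed_records = [
    "21 19506000 19506500 + example_ok",             # hypothetical record
    "21 19506651 19506651 + new_exome_1.1_content",  # record from the failing file
]
print(convert_bed(bed_records))
```

Whether silently dropping the zero-length records is acceptable depends on what they were meant to represent, so it may be worth logging them instead.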
