Interval File Errors

100kristine, UC Berkeley (Member)
edited October 2013 in MuTect v1

Hi,

I've been having a lot of trouble with the format of the interval file input to GATK. I've gotten three different errors from the different versions of interval files that I've tried. It would be really helpful to get some pointers on where I might be going wrong.

Error 1. Interval file could not be parsed in any supported format. caused by Failed to parse Genome Location string: @HD VN:1.0 SO:coordinate

The file I was using for this is hg19.list.txt (when I run the tool, the file extension is .list, but the forum wouldn't let me upload it with that extension). MuTect has run fine with an example file that I pulled off the FTP site (hg19examplelist.txt), but when I compare the two files, I don't see a major difference in format. To create hg19.list.txt, I followed the directions in the FAQ: I pulled the header information from my reference .dict file and added my intervals. MuTect returns Error 1 with hg19.list.txt, but not with hg19examplelist.txt.
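The procedure described above (take the header from the reference's `.dict` file, then append the intervals) can be sketched as follows. This is a minimal illustration, not GATK or Picard code: it assumes the Picard interval-list layout of a SAM-style header followed by tab-separated `contig`, `start`, `end`, `strand`, `name` columns with 1-based, end-inclusive coordinates. The function `write_interval_list` and the toy dictionary are hypothetical.

```python
def write_interval_list(dict_text, intervals):
    """Return Picard-style interval_list text (hypothetical helper).

    dict_text: contents of the reference .dict file (SAM header lines).
    intervals: (contig, start, end, name) tuples, 1-based and end-inclusive.
    """
    # Keep only the header lines (@HD, @SQ, ...) from the dictionary.
    header = [line for line in dict_text.splitlines() if line.startswith("@")]
    body = []
    for contig, start, end, name in intervals:
        if start < 1 or end < start:
            raise ValueError(f"not a 1-based interval: {contig}:{start}-{end}")
        # Body columns are tab-separated: contig, start, end, strand, name.
        body.append("\t".join([contig, str(start), str(end), "+", name]))
    return "\n".join(header + body) + "\n"


# Toy dictionary; a real .dict also carries M5/UR tags on each @SQ line.
toy_dict = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
text = write_interval_list(toy_dict, [("chr1", 1, 500, "target_1")])
print(text, end="")
```

Both the header and the body are tab-delimited, so the tabs copied from the `.dict` file should be preserved when the file is assembled by hand.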

Error 2.

    INFO 15:33:54,965 GenomeAnalysisEngine - Processing 0 bp from intervals
    WARN 15:33:54,967 GenomeAnalysisEngine - The given combination of -L and -XL options results in an empty set. No intervals to process.
    INFO 15:33:54,981 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 15:33:54,982 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 15:33:55,032 Walker - [REDUCE RESULT] Traversal result is: 0
    INFO 15:33:55,034 ProgressMeter - done 0.00e+00 0.1 s 14.3 h 0.0% Infinity w Infinity w
    INFO 15:33:55,034 ProgressMeter - Total runtime 0.05 secs, 0.00 min, 0.00 hours
    INFO 15:33:56,851 GATKRunReport - Uploaded run statistics report to AWS S3

I've mostly gotten this error when I tried to supply the intervals in BED format. After seeing this, I think I might be misunderstanding what the interval file is for, even after reading both FAQs. Are these the intervals that MuTect will analyze, or something generated by the sequencing process? If it helps, I did not create these BAM files; they are from someone else in my lab.

Error 3. Comparison method violates its general contract!

I'm running Java 1.7 (this error does not always show up). It would be helpful to know why it might occur with some .list files and not others.

Thank you for any help that you can provide!

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Have you looked at the documentation on input files, which includes interval lists? http://www.broadinstitute.org/gatk/guide/article?id=1204

  • 100kristine, UC Berkeley (Member)

    Yes, I tried files in two of the formats mentioned in the FAQ: Picard-style interval list and BED. I'm getting the @HD VN:1.0 SO:coordinate error even though that line is present in the example in the FAQ. Also, neither FAQ explains where the intervals themselves come from or how MuTect uses them. Thank you for your help.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Kristine,

    Make sure your intervals list is named with either extension .bed or .list as appropriate; it cannot end in .txt. The program gets confused, thinks header lines are intervals and doesn't parse the file correctly. For the record, the simplest format for intervals (which I prefer, personally) is the <chr>:<start>-<stop> format, which doesn't require a sequence dictionary.
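A minimal sketch of reading the `<chr>:<start>-<stop>` form, to show why a header line such as `@HD VN:1.0 SO:coordinate` cannot be read as an interval when a file is parsed line by line in that format. It is not GATK's parser: it handles only this one form (GATK accepts others), treats coordinates as 1-based and inclusive, and `parse_locus` is a hypothetical name.

```python
import re

# Only the '<chr>:<start>-<stop>' form; the contig name may not contain ':' or whitespace.
LOCUS = re.compile(r"(?P<contig>[^\s:]+):(?P<start>\d+)-(?P<stop>\d+)")


def parse_locus(line):
    """Parse one '<chr>:<start>-<stop>' line into (contig, start, stop), 1-based and inclusive."""
    text = line.strip()
    match = LOCUS.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse interval: {text!r}")
    start, stop = int(match["start"]), int(match["stop"])
    if start < 1 or stop < start:
        raise ValueError(f"start must be >= 1 and <= stop: {text!r}")
    return match["contig"], start, stop


print(parse_locus("chr1:1-249250620"))  # ('chr1', 1, 249250620)
for bad in ["@HD\tVN:1.0\tSO:coordinate", "chr1:0-249250620"]:
    try:
        parse_locus(bad)
    except ValueError as err:
        print("rejected:", err)
```

Both lines in the loop are rejected: the header line because it is not an interval at all, and `chr1:0-...` because position 0 does not exist in 1-based coordinates.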

    The intervals list specifies which regions of the genome the analysis will be run on. I can't comment on how it's used in MuTect, but in GATK it's typically used to restrict analysis to exome capture targets, or to particular regions of interest.
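To make "restrict analysis to regions" concrete: with 1-based, end-inclusive intervals, each interval covers `stop - start + 1` bases, and overlapping intervals should be merged before counting so that no base is counted twice. The engine's `Processing N bp from intervals` log line appears to report a total of this kind, and an empty interval set gives 0, as in Error 2 above. The helpers below are a hypothetical sketch of that arithmetic; GATK's own interval handling may differ in details such as how abutting intervals are treated.

```python
def merge_intervals(intervals):
    """Merge overlapping 1-based, end-inclusive (contig, start, stop) intervals."""
    merged = []
    for contig, start, stop in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Overlaps the previous interval on the same contig: extend it.
            _, last_start, last_stop = merged[-1]
            merged[-1] = (contig, last_start, max(last_stop, stop))
        else:
            merged.append((contig, start, stop))
    return merged


def covered_bp(intervals):
    """Distinct bases covered; an inclusive interval spans stop - start + 1 bp."""
    return sum(stop - start + 1 for _, start, stop in merge_intervals(intervals))


def in_intervals(contig, pos, intervals):
    """True if the 1-based position lies inside any interval."""
    return any(c == contig and start <= pos <= stop for c, start, stop in intervals)


targets = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 1, 10)]
print(covered_bp(targets))                  # 211
print(in_intervals("chr1", 250, targets))   # True
print(in_intervals("chr1", 5000, targets))  # False
```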

  • 100kristine, UC Berkeley (Member)

    Thank you, Geraldine. I did try the bed format earlier, but kept getting a different error (below). The file is named intervals.bed, so the ending isn't the problem. (The files I uploaded earlier have a .txt extension because the forum wouldn't allow me to upload them as a .list file.)

    I tried two formats:

    chr1:0-249250620
    

    and

    1:0-249250620
    

    Both come up with the same error. I'm using UCSC's hg19 reference genome.

        WARN  13:41:41,742 IntervalUtils - The interval file ../hg19param/intervals.bed contains no intervals that could be parsed.
        INFO  13:41:41,743 GenomeAnalysisEngine - Processing 0 bp from intervals
        WARN  13:41:41,744 GenomeAnalysisEngine - The given combination of -L and -XL options results in an empty set.  No intervals to process.
        INFO  13:41:41,758 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
        INFO  13:41:41,758 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
        INFO  13:41:41,810 Walker - [REDUCE RESULT] Traversal result is: 0
        INFO  13:41:41,811 ProgressMeter -            done        0.00e+00    0.1 s       14.5 h      0.0%      Infinity w  Infinity w
        INFO  13:41:41,812 ProgressMeter - Total runtime 0.05 secs, 0.00 min, 0.00 hours
        INFO  13:41:42,979 GATKRunReport - Uploaded run statistics report to AWS S3
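Besides the extension, the contig names in an interval file have to match the reference. UCSC's hg19 names its contigs `chr1`, `chr2`, and so on, so an interval on contig `1` would not be found in that reference's sequence dictionary. Below is a minimal check against the `.dict` file, assuming its `@SQ` lines carry tab-separated `SN:` (name) and `LN:` (length) tags; `sequence_lengths` and `check_interval` are hypothetical helpers, not GATK functions, and the toy dictionary is made up.

```python
def sequence_lengths(dict_text):
    """Map contig name -> length using the @SQ lines of a .dict file."""
    lengths = {}
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            # Each field is TAG:value; split on the first ':' only (UR values contain ':').
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def check_interval(contig, start, stop, lengths):
    """List problems with a 1-based, end-inclusive interval; an empty list means none were found."""
    problems = []
    if contig not in lengths:
        problems.append(f"contig {contig!r} is not in the dictionary")
    elif stop > lengths[contig]:
        problems.append(f"stop {stop} is past the end of {contig} ({lengths[contig]} bp)")
    if start < 1:
        problems.append(f"start {start} is below 1")
    if stop < start:
        problems.append("stop is before start")
    return problems


# Toy dictionary with UCSC-style names (a real file lists every contig).
toy_dict = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\tUR:file:/ref/hg19.fa\n"
lengths = sequence_lengths(toy_dict)
print(check_interval("chr1", 1, 500, lengths))  # []
print(check_interval("1", 0, 500, lengths))
```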
    
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    That's odd. Can you tell me how you are generating the intervals file? Are you doing it in the shell using Unix tools, a GUI text editor, or something else?

  • 100kristine, UC Berkeley (Member)

    I've been using vi. I don't think there are any extra characters; I'm attaching a screenshot of how it looks in the editor. Thank you!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Oh, if you have the intervals in that format, the extension needs to be .interval_list or .list, not .bed. You'll also need to change the starting zeroes to ones. Sorry, the formatting requirements are a bit obscure, I realize.
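The zero-versus-one issue comes from two coordinate conventions. BED records are tab-separated (`chrom`, `chromStart`, `chromEnd`, ...) with a 0-based start and an exclusive end, while the `<chr>:<start>-<stop>` form is 1-based and inclusive. So `chr1<TAB>0<TAB>249250620` in a `.bed` file and `chr1:1-249250620` in a `.list` file describe the same bases, whereas the text `chr1:0-249250620` is not a valid BED record at all. A sketch of the conversion; the helper names are hypothetical, and BED track or header lines are not handled.

```python
def locus_to_bed(contig, start, stop):
    """'<chr>:<start>-<stop>' (1-based, inclusive) -> BED line (0-based start, exclusive end)."""
    if start < 1 or stop < start:
        raise ValueError(f"not a 1-based interval: {contig}:{start}-{stop}")
    return f"{contig}\t{start - 1}\t{stop}"


def bed_to_locus(bed_line):
    """First three tab-separated BED columns -> '<chr>:<start>-<stop>' (1-based, inclusive)."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"not a tab-separated BED record: {bed_line!r}")
    contig, start, end = fields[0], int(fields[1]), int(fields[2])
    return f"{contig}:{start + 1}-{end}"


print(repr(locus_to_bed("chr1", 1, 249250620)))  # 'chr1\t0\t249250620'
print(bed_to_locus("chr1\t0\t249250620"))        # chr1:1-249250620
try:
    bed_to_locus("chr1:0-249250620")  # no tabs, so not a BED record
except ValueError as err:
    print("rejected:", err)
```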

  • 100kristine, UC Berkeley (Member)

    Thank you, that file is working now. I'm really sorry for all of the questions, but now I'm getting the third error from my original post, "Comparison method violates its general contract!". Is it having trouble with a part of the BAM file at chr1:564568, or is this still an input error? Thank you!

        INFO 09:37:06,162 HelpFormatter - Date/Time: 2013/10/29 09:37:06
        INFO 09:37:06,162 HelpFormatter - ---------------------------------------------------------------------------------
        INFO 09:37:06,162 HelpFormatter - ---------------------------------------------------------------------------------
        INFO 09:37:06,183 ArgumentTypeDescriptor - Dynamically determined type of ../hg19param/dbsnp_137.hg19.vcf to be VCF
        INFO 09:37:06,187 ArgumentTypeDescriptor - Dynamically determined type of ../gatk/cosmic_hg19.vcf to be VCF
        INFO 09:37:06,191 GenomeAnalysisEngine - Strictness is SILENT
        INFO 09:37:06,263 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE Target Coverage: 1000
        INFO 09:37:06,273 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
        INFO 09:37:06,301 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
        INFO 09:37:06,319 RMDTrackBuilder - Loading Tribble index from disk for file ../hg19param/dbsnp_137.hg19.vcf
        INFO 09:37:06,612 RMDTrackBuilder - Loading Tribble index from disk for file ../gatk/cosmic_hg19.vcf
        INFO 09:37:06,668 GenomeAnalysisEngine - Processing 3036303846 bp from intervals
        INFO 09:37:06,682 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
        INFO 09:37:06,682 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
        INFO 09:37:13,899 MuTect - [MUTECTOR] Processed 1000041 reads in 5916 ms
        Error processing chr1:564568
        java.lang.IllegalArgumentException: Comparison method violates its general contract!
            at java.util.TimSort.mergeLo(TimSort.java:747)
            at java.util.TimSort.mergeAt(TimSort.java:483)
            at java.util.TimSort.mergeCollapse(TimSort.java:410)
            at java.util.TimSort.sort(TimSort.java:214)
            at java.util.TimSort.sort(TimSort.java:173)
            at java.util.Arrays.sort(Arrays.java:659)
            at java.util.Collections.sort(Collections.java:217)
            at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:480)
            at org.broadinstitute.cga.tools.gatk.walkers.cancer.mutect.MuTect.map(MuTect.java:34)
            at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:243)
            at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:231)
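Some background on the third error: since Java 7, `Collections.sort` on objects uses TimSort (visible in the stack trace above), and TimSort throws `IllegalArgumentException: Comparison method violates its general contract!` when it happens to detect that the comparator is inconsistent. Whether it notices depends on the data being sorted, which would fit the observation that the error does not always show up. The check below is a Python sketch of the three clauses of the `java.util.Comparator` contract, not MuTect's code (the comparator used at `MuTect.java:480` is not shown in the thread); `fuzzy_cmp` is a made-up example of a comparator that breaks the contract by treating nearby values as equal.

```python
from itertools import product


def sign(x):
    """Collapse a comparator result to -1, 0 or 1."""
    return (x > 0) - (x < 0)


def contract_violations(items, cmp):
    """Check cmp against the java.util.Comparator contract on the sample items."""
    found = []
    for a, b in product(items, repeat=2):
        # Antisymmetry: sgn(cmp(a, b)) == -sgn(cmp(b, a)).
        if sign(cmp(a, b)) != -sign(cmp(b, a)):
            found.append(("antisymmetry", a, b))
    for a, b, c in product(items, repeat=3):
        # Transitivity: a > b and b > c must imply a > c.
        if cmp(a, b) > 0 and cmp(b, c) > 0 and not cmp(a, c) > 0:
            found.append(("transitivity", a, b, c))
        # Consistent equality: cmp(a, b) == 0 must imply sgn(cmp(a, c)) == sgn(cmp(b, c)).
        if cmp(a, b) == 0 and sign(cmp(a, c)) != sign(cmp(b, c)):
            found.append(("equality", a, b, c))
    return found


def exact_cmp(x, y):
    """A well-behaved numeric comparator."""
    return (x > y) - (x < y)


def fuzzy_cmp(x, y, tol=10):
    """A made-up broken comparator: values closer than tol compare as equal."""
    if abs(x - y) < tol:
        return 0
    return -1 if x < y else 1


sample = [0, 8, 16]
print(contract_violations(sample, exact_cmp))  # []
print(contract_violations(sample, fuzzy_cmp))
```

With `fuzzy_cmp`, 0 "equals" 8 and 8 "equals" 16, yet 0 is less than 16, so the equality clause fails. Java 6's merge sort did not check for this and would typically just produce some order, which may be why running under Java 6 comes up later in the thread.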

  • 100kristine, UC Berkeley (Member)

    Thank you, Geraldine and Kcibul. I'm not allowed to run Java 6 on my university computer, so I tried to build from source.

    I hit another error when I tried to build (I just followed the instructions on the git page and put the bcel jars in the lib folder). Thank you again for the help!

        [unjar] Expanding: /Users/kyoshihara/oservData/mutect-src/gatk-protected/dist/xml-apis-1.0.b2.jar into /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/LICENSE.dom-documentation.txt
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/LICENSE.dom-software.txt
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/LICENSE.sax.txt
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/LICENSE.txt
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/README.dom.txt
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/README.sax.txt
        [unjar] Unable to expand to file /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging/license/README.txt
        [unjar] Expanding: /Users/kyoshihara/oservData/mutect-src/gatk-protected/dist/xmlbeans-2.3.0.jar into /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging
        [unjar] Expanding: /Users/kyoshihara/oservData/mutect-src/gatk-protected/dist/GenomeAnalysisTK.jar into /Users/kyoshihara/oservData/mutect-src/gatk-protected/staging

        require.executable:

        package:
        [mkdir] Created dir: /Users/kyoshihara/oservData/mutect-src/gatk-protected/dist/packages

        BUILD FAILED
        /Users/kyoshihara/oservData/mutect-src/gatk-protected/build.xml:936: no resources specified

        Total time: 2 minutes 8 seconds

  • 100kristine, UC Berkeley (Member)

    Never mind, it seems to be working now. In case anyone else gets this error: the problem was that I was using Ant 1.9 instead of Ant 1.8.x.
