Picard/GATK MergeVcfs throws errors

Testor (Germany, Member)

Dear all,
I am following your guidelines for germline SNP detection in GATK 4. However, I cannot complete the concatenation of the region-wise gVCFs.
Using GATK MergeVcfs I get the following error:
/package/sequencer/java/8/bin/java -jar -XX:+UseSerialGC -verbose:GC -Xmx8g -Djava.io.tmpdir=/scratch/cluster/seqcore/temp/smith /package/sequencer/gatk/current/gatk-package-4.0.1.1-local.jar MergeVcfs --INPUT ./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf --INPUT ./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf --OUTPUT ./03_GATK/core_L11935-2_Mystique.gvcf

[Fri Feb 09 13:20:55 CET 2018] MergeVcfs --INPUT ./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf --INPUT ./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf --OUTPUT ./03_GATK/core_L11935-2_Mystique.gvcf --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 1 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Fri Feb 09 13:20:55 CET 2018] Executing as [email protected] on Linux 4.14.17.mx64.205 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17; Deflater: Intel; Inflater: Intel; Picard version: Version:4.0.1.1

java.lang.IllegalArgumentException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI.create(URI.java:852)
at htsjdk.samtools.util.IOUtil.getPath(IOUtil.java:1134)
at htsjdk.samtools.util.IOUtil.lambda$unrollPaths$2(IOUtil.java:1088)
at htsjdk.samtools.util.IOUtil$$Lambda$29/1967434886.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at htsjdk.samtools.util.IOUtil.unrollPaths(IOUtil.java:1085)
at htsjdk.samtools.util.IOUtil.unrollFiles(IOUtil.java:1050)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:164)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:24)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)
Caused by: java.net.URISyntaxException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3067)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)

Applying the picard commands I get the following:
/package/sequencer/java/8/bin/java -jar -XX:+UseSerialGC -verbose:GC -Xmx8g -Djava.io.tmpdir=/scratch/cluster/seqcore/temp/smith /package/sequencer/picard-tools/current/picard.jar MergeVcfs INPUT=./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf INPUT=./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf OUTPUT=./03_GATK/core_L11935-2_Mystique.gvcf

13:24:51.701 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/package/sequencer/picard-tools/2.12.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Feb 09 13:24:51 CET 2018] MergeVcfs INPUT=[./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf, ./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf] OUTPUT=./03_GATK/core_L11935-2_Mystique.gvcf VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /project/seqcore-cluster/data/superhero/chrUn_KI270742v1 186727 . C <NON_REF> . . END=186739 GT:DP:GQ:MIN_DP:PL 0/0:9:0:4:0,0,0
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:347)
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:334)
at htsjdk.samtools.util.IOUtil.unrollFiles(IOUtil.java:948)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:98)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

I appreciate any help on this issue.
Best
Stefan

Answers

  • Testor (Germany, Member)

    Does nobody have an idea?

  • Sheila (Broad Institute; Member, Broadie admin)

    @Testor
    Hi Stefan,

    Can you try GatherVcfs? I think that may be the tool you need to use.

    -Sheila

  • Testor (Germany, Member)

    Hi Sheila,
    thanks. I have tried GatherVcfs and get the same error...

    Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /project/seqcore-cluster/data/superhero/chrEBV 1 . A <NON_REF> . . END=171823 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0

    Best
    Stefan

  • sklages (Member)
    edited February 2018

    Same here when using the gatk wrapper:

    /path/to/4.0.1.2/gatk GatherVcfs --spark-runner LOCAL --INPUT /path/to/file1.gvcf  --INPUT /path/to/file2.gvcf --VALIDATION_STRINGENCY SILENT --VERBOSITY DEBUG --OUTPUT xxx.gvcf
    

    resulted in:

    [..]
    java.lang.IllegalArgumentException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
    [..]
    

    I ran it with different combinations of --INPUT, -I=, -I: and so on.

    The last one just gives a different error:

    [..]
    No value found for tagged argument: I:/path/to/file1.gvcf
    Tool returned:
    1
    

    So either I am using this tool completely wrong or something is missing from the docs.

    Any ideas what's going wrong? :/

    best,
    Sven

  • sklages (Member)

    Hmm, is nobody here able to point me in the right direction? I am sure this is something very simple, but I don't get it :-(

  • Sheila (Broad Institute; Member, Broadie admin)
    edited February 2018

    @Testor
    Hi Stefan,

    Hmm. Can you confirm your input VCFs are valid with ValidateVariants? If so, can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • Sheila (Broad Institute; Member, Broadie admin)

    @sklages
    Hi Sven,

    See my above comment to Stefan. I am thinking this may be a bug. If you could submit a test case, that would be helpful.

    Thanks,
    Sheila

  • jfarrell (Member ✭✭)

    I use either GATK CombineGVCFs or the new GATK4 GenomicsDBImport for gVCFs. These tools are specialized for the GATK gVCF format.

    https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php

    https://gatkforums.broadinstitute.org/gatk/discussion/10061/using-genomicsdbimport-to-prepare-gvcfs-for-input-to-genotypegvcfs-in-gatk4

    Be aware the syntax has changed slightly for the parameters since this GenomicsDBImport article.
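
For the CombineGVCFs route mentioned above, the command line might look roughly like the sketch below. It assumes the GATK4 short arguments -R, -V and -O, and every file name is a placeholder. build_combine_cmd is a hypothetical helper that only prints the command, so it can be checked before anything is run.

```shell
# Hypothetical helper: print a CombineGVCFs command line without running it.
# Usage: build_combine_cmd <reference> <output> <input> [<input> ...]
# Assumes no spaces in the paths (the command is built as a plain string).
build_combine_cmd() {
    ref=$1
    out=$2
    shift 2
    cmd="gatk CombineGVCFs -R $ref"
    for f in "$@"; do
        cmd="$cmd -V $f"
    done
    printf '%s\n' "$cmd -O $out"
}

# Placeholder file names; replace them with real paths before use.
build_combine_cmd reference.fasta combined.g.vcf interval_1.g.vcf interval_2.g.vcf
```

Printing first and running second keeps a mistyped argument from starting a long job.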

  • Artamir (Member)
    edited March 12
    Sorry for digging up this old thread.
    Sadly, GatherVcfs is causing similar problems for me. According to Google, this thread is the only relevant source on this particular problem, so I hope you can help.

    So let's start from the beginning:

    I'm using GATK4 installed via Bioconda (gatk4-4.0.11.0-0). I've been running HaplotypeCaller with an interval-based (scatter-gather) approach to significantly speed up variant detection. In the next step I want to combine the interval-based gVCF files into a single gVCF for one organism (the gather step).
    According to the Tool Documentation, GatherVcfs should be the right tool for this task:
    "Gathers multiple VCF files from a scatter operation into a single VCF file."

    So I used this command as an initial test:
    gatk GatherVcfs -O test.gvcf -I ../file_1.gvcf -I ../file_2.gvcf

    Sadly I only got the following Java stack trace:

    ....
    To get help, see broadinstitute.github.io/picard/index.html#GettingHelp
    java.lang.IllegalArgumentException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
    at java.net.URI.create(URI.java:852)
    at htsjdk.samtools.util.IOUtil.getPath(IOUtil.java:1117)
    at htsjdk.samtools.util.IOUtil.lambda$unrollPaths$2(IOUtil.java:1071)
    ....

    So the tool already fails on the first line of the VCF file.
    I have absolutely no idea how to solve this problem.

    As a workaround I've been trying CombineGVCFs for the moment, which is working. But according to the Tool Documentation, CombineGVCFs serves a different purpose:
    "Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file"

    I don't intend to create a multi-sample gVCF file at this step. It's only about combining different intervals of a single sample.

    So any idea or help with this particular problem would be highly appreciated.

    Thanks,
    Jochen

    Edit: btw is there a way to format a post? I didn't see any formatting tools like bold, italics and so on.
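
One likely explanation, offered here as an assumption rather than a confirmed diagnosis: the stack traces in this thread pass through IOUtil.unrollPaths or IOUtil.unrollFiles, and the text each tool rejects is a line of the input file itself. That pattern fits a tool that accepts only names ending in .vcf, .vcf.gz or .bcf as VCF input and reads anything else as a list of file paths, one per line, so a .gvcf file would be read as such a list. A minimal sketch of a workaround under that assumption follows. g_vcf_name is a hypothetical helper, not part of GATK or Picard, and it only derives a .g.vcf name for each input.

```shell
# Hypothetical helper: derive a name ending in ".g.vcf" from a ".gvcf" path.
# Paths with any other ending are returned unchanged.
g_vcf_name() {
    case "$1" in
        *.gvcf) printf '%s\n' "${1%.gvcf}.g.vcf" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Print the new names for the two inputs from the post above.
for f in ../file_1.gvcf ../file_2.gvcf; do
    g_vcf_name "$f"
done
```

Renaming each input to the printed name, and giving the output a .g.vcf name as well, would then show whether the file extension is the cause.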
  • Artamir (Member)
    Dear bhanuGandham,

    thank you very much for your reply! This helps a lot!
    The plethora of programs is sometimes confusing to me as a novice GATK user, especially since I work with a non-model organism, which requires adapting the Best Practices at many steps.
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator admin)

    @Artamir

    I am happy to help!

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator admin)
    edited April 29

    Hi @RicardoHarripaul

    We have a FireCloud/Cromwell team that is dedicated to answering WDL-related questions. I am moving this question to the FireCloud team so that they can help you out.
