How to use multiple g.VCF files in GATK4.beta.1 GenotypeGVCFs?

Hi,
I tried to use GenotypeGVCFs from GATK4.beta.1, but there still seems to be a bug with the --variant argument. First I gave it a list of my g.VCF files (a file ending in .list, as worked in GATK 3.7), but got an error message that no suitable codecs were found. Giving multiple --variant arguments, each with one of my input files, produced the error that this option can only be set once. Running GenotypeGVCFs with a single input g.VCF worked (but not when I put just that one sample into the input list). Was there a change since 3.7, or is this a bug?
In addition, I'm wondering how to get the full stack trace: -DGATK_STACKTRACE_ON_USER_EXCEPTION was somehow recognized as -D (A USER ERROR has occurred: Argument '[D, dbsnp]' cannot be specified more than once.), and -GATK_STACKTRACE_ON_USER_EXCEPTION produced no change in the log.
Thanks in advance
Johannes

Using GATK jar /home/uni08/geibel/software/gatk-4.beta.1/gatk-package-4.beta.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -Xmx220g -jar /home/uni08/geibel/software/gatk-4.beta.1/gatk-package-4.beta.1-local.jar GenotypeGVCFs -R /home/uni08/geibel/chicken/chickenrefgen/galGal5_Dec2015/galGal5.fa --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/VCF_test/input_chr26.list --useNewAFCalculator -L /home/uni08/geibel/chicken/chickenrefgen/galGal5_Dec2015/contigs_chr26.intervals --dbsnp /home/uni08/geibel/chicken/chickenrefgen/ENSEMBL_20170106/Gallus_gallus.updated.vcf -O /usr/users/geibel/chicken/pool_sequence_nov2016/data/VCF_test/IndandPool_chr26.raw.vcf
[July 7, 2017 2:35:20 PM CEST] GenotypeGVCFs  --output /usr/users/geibel/chicken/pool_sequence_nov2016/data/VCF_test/IndandPool_chr26.raw.vcf --useNewAFCalculator true --dbsnp /home/uni08/geibel/chicken/chickenrefgen/ENSEMBL_20170106/Gallus_gallus.updated.vcf --variant /usr/users/geibel/chicken/pool_sequence_nov2016/data/VCF_test/input_chr26.list --intervals /home/uni08/geibel/chicken/chickenrefgen/galGal5_Dec2015/contigs_chr26.intervals --reference /home/uni08/geibel/chicken/chickenrefgen/galGal5_Dec2015/galGal5.fa  --annotateNDA false --heterozygosity 0.001 --indel_heterozygosity 1.25E-4 --heterozygosity_stdev 0.01 --standard_min_confidence_threshold_for_calling 10.0 --max_alternate_alleles 6 --max_genotype_count 1024 --sample_ploidy 2 --group StandardAnnotation --onlyOutputCallsStartingInIntervals false --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false
[July 7, 2017 2:35:20 PM CEST] Executing as geibel@gwdu101 on Linux 3.10.0-327.36.3.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15; Version: 4.beta.1
[July 7, 2017 2:35:33 PM CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.22 minutes.
Runtime.totalMemory()=985661440
***********************************************************************

A USER ERROR has occurred: Cannot read /usr/users/geibel/chicken/pool_sequence_nov2016/data/VCF_test/input_chr26.list because no suitable codecs found

***********************************************************************
Use -DGATK_STACKTRACE_ON_USER_EXCEPTIONto print the stack trace.
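On the stack-trace question: -DGATK_STACKTRACE_ON_USER_EXCEPTION is a Java system property, so it has to reach the java executable itself, before -jar. Anything placed after the jar name is handed to GATK's own argument parser, which is why it was read as the -D (--dbsnp) short option. A minimal sketch, assuming you invoke the jar directly (the "Running:" line in the log above shows the full java command that gatk-launch builds) and assuming the property should be set to true; build_java_cmd is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a GATK4 command with the stack-trace
# property passed to the JVM before -jar, where GATK's own argument parser
# never sees it. GATK_JAR is a placeholder; point it at your local jar.
GATK_JAR=${GATK_JAR:-gatk-package-4.beta.1-local.jar}

build_java_cmd() {
    # Arguments: the GATK tool name followed by that tool's own arguments
    echo "java -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar $GATK_JAR $*"
}

# Dry run: inspect the printed command, then run it yourself.
build_java_cmd GenotypeGVCFs -R galGal5.fa -V input.g.vcf -O out.vcf
```

Keeping the property to the left of -jar is the whole point; the remaining JVM flags from the log (-Xmx and the samjdk settings) can sit alongside it in the same position.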

GitHub issue #2259 by Geraldine_VdAuwera (closed by vdauwera)

Answers

  • jogeib · Germany · Member

    Hi @shlee,
    sorry for asking again, but I did not really manage to get a VCF out of GenomicsDBImport. Is this somehow possible, or can I pass this gendb folder (or one of the files in it) to GenotypeGVCFs?

  • shlee · Cambridge · Member, Broadie, Moderator

    @jogeib,

    I think you will find the GenomicsDB repo's wiki helpful. Here are two pages pertaining to producing a combined GVCF, much like GATK3's CombineGVCFs:

  • laehnemann · University of Duesseldorf · Member

    Aah, I was also trying all kinds of ways to specify multiple files -- mostly because the docs for GATK4.0 beta still give that as valid syntax: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.beta.1/org_broadinstitute_hellbender_tools_walkers_GenotypeGVCFs.php

    GenomicsDB looks like yet another dependency for me, so I think I'll just try bcftools merge for now. Or is there something to worry about when not using GenomicsDB? (BTW, a wiki for that is here: https://github.com/Intel-HLS/GenomicsDB/wiki)

  • shlee · Cambridge · Member, Broadie, Moderator

    @laehnemann, sorry about the state of the docs. Both the GATK4 release and its docs are in beta, so to speak, and are currently being updated. In the meantime, while we wait for the team to return from the UK workshop, can you try using GATK3.7's CombineGVCFs to create a single VCF to use with GenotypeGVCFs?

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    @laehnemann Using bcftools is not going to work, sorry -- it won't understand how to merge records appropriately.

    GenomicsDB isn't a dependency, at least not in the sense that you have to do anything special. The GenomicsDBImport tool takes in one or more single-sample GVCFs and imports data over a single interval into GenomicsDB. The output of the tool is a directory containing a GenomicsDB database with combined multi-sample data. GenotypeGVCFs can then read from the created GenomicsDB directly and output a VCF.

    Here are example commands to use it:

    gatk-launch GenomicsDBImport \
        -V data/gvcfs/mother.g.vcf \
        -V data/gvcfs/father.g.vcf \
        -V data/gvcfs/son.vcf \
        --genomicsDBWorkspace my_database \
        --intervals 20
    

    That generates a directory called my_database containing the combined gvcf data.

    Then you run joint genotyping; note the gendb:// prefix to the database input directory path.

    gatk-launch GenotypeGVCFs \
        -R data/ref/ref.fasta \
        -V gendb://my_database \
        -G StandardAnnotation -newQual \
        -O test_output.vcf 
    

    And that's all there is to it.
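    The GenomicsDBImport example above passes each GVCF with its own -V. If your samples are already in a GATK3-style .list file (like the input_chr26.list from the original question), a minimal sketch for expanding it into repeated -V arguments is below. It assumes one path per line and no whitespace in paths, and skips blank lines and # comments; list_to_v_args is a hypothetical helper, not a GATK feature, and the script only prints the command.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): expand a GATK3-style .list file,
# one GVCF path per line, into repeated "-V <path>" arguments.
# Assumes paths contain no whitespace; blank lines and '#' comments are skipped.
list_to_v_args() {
    args=""
    while IFS= read -r gvcf_path || [ -n "$gvcf_path" ]; do
        case "$gvcf_path" in
            ''|'#'*) continue ;;
        esac
        args="$args -V $gvcf_path"
    done < "$1"
    printf '%s\n' "${args# }"    # strip the single leading space
}

# Dry run: print the GenomicsDBImport command rather than executing it.
printf 'data/gvcfs/mother.g.vcf\ndata/gvcfs/father.g.vcf\n' > gvcfs.list
echo "gatk-launch GenomicsDBImport $(list_to_v_args gvcfs.list) --genomicsDBWorkspace my_database --intervals 20"
```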

    There are two caveats:

    1. You can't add data to an existing database; you have to keep the original GVCFs around and reimport them all together when you get new samples. For very large numbers of samples, there are some batching options.

    2. At the moment you can only run GenomicsDBImport on a single genomic interval (i.e. at most one contig). This will probably change because we'd like to enable running on more intervals in one go, but for now you need to run on each interval separately. We recommend scripting this, of course.
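    The per-interval scripting recommended in caveat 2 might look like the sketch below: one GenomicsDB workspace per contig, then GenotypeGVCFs reading that workspace through the gendb:// prefix. It reuses the arguments from the example commands above; the contigs.txt file (one contig name per line) and the genomicsdb_<contig> / joint_<contig>.vcf naming scheme are assumptions for illustration. The script only prints the commands; pipe its output to sh to run them.

```shell
#!/bin/sh
# Hypothetical dry-run generator: for each contig, import the GVCFs into its own
# GenomicsDB workspace, then joint-genotype from that workspace via gendb://.
V_ARGS="-V data/gvcfs/mother.g.vcf -V data/gvcfs/father.g.vcf -V data/gvcfs/son.vcf"

per_interval_cmds() {
    while IFS= read -r contig || [ -n "$contig" ]; do
        [ -n "$contig" ] || continue          # skip blank lines
        ws="genomicsdb_$contig"               # one workspace per interval
        echo "gatk-launch GenomicsDBImport $V_ARGS --genomicsDBWorkspace $ws --intervals $contig"
        echo "gatk-launch GenotypeGVCFs -R data/ref/ref.fasta -V gendb://$ws -G StandardAnnotation -newQual -O joint_$contig.vcf"
    done < "$1"
}

printf '20\n21\n' > contigs.txt
per_interval_cmds contigs.txt    # review, then: per_interval_cmds contigs.txt | sh
```

    Because each workspace is independent, the per-contig commands can also be submitted as separate cluster jobs instead of being run in sequence.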

  • jogeib · Germany · Member

    Hi @Geraldine_VdAuwera,

    thanks, that worked quite well and GenotypeGVCFs started. Unfortunately I got a java.lang.ArrayIndexOutOfBoundsException: 6 error. Do you have an idea what the problem is?
    It is the pooled data; I had problems with it in GATK 3.7, too.

    https://gatkforums.broadinstitute.org/gatk/discussion/9008/genotypegvcfs-max-genotype-count-not-working#latest

    Is this due to that problem, or has it been solved by now?

    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsn
    18:26:53.843 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/uni08/geibel/software/gatk-4.beta.1/gatk-package-4.beta.1-local.jar!/com/inte
    [July 14, 2017 6:26:53 PM CEST] GenotypeGVCFs  --output /usr/users/geibel/chicken/pool_sequence_nov2016/data/VCF_test/IndandPool_chr26.raw.vcf --useNewAFCalculator true
    [July 14, 2017 6:26:53 PM CEST] Executing as geibel@gwda022 on Linux 3.10.0-327.36.3.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15; Version: 4.beta.1
    18:26:53.882 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    18:26:53.882 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    18:26:53.883 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    18:26:53.883 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    18:26:53.883 INFO  GenotypeGVCFs - Deflater: IntelDeflater
    18:26:53.883 INFO  GenotypeGVCFs - Inflater: IntelInflater
    18:26:53.883 INFO  GenotypeGVCFs - Initializing engine
    18:26:57.071 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/uni08/geibel/chicken/chickenrefgen/ENSEMBL_20170106/Gallus_gallus.updated.vcf
    WARNING: No valid combination operation found for INFO field DB - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field HaplotypeScore - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DB - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field HaplotypeScore - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    18:27:49.536 INFO  GenotypeGVCFs - Done initializing engine
    log4j:WARN No appenders could be found for logger (org.broadinstitute.hellbender.utils.MathUtils$Log10Cache).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    18:27:49.993 WARN  PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
    18:27:50.731 WARN  PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
    18:27:57.069 INFO  ProgressMeter - Starting traversal
    18:27:57.070 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
    WARNING: No valid combination operation found for INFO field DB - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field HaplotypeScore - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    18:37:02.860 INFO  GenotypeGVCFs - Shutting down engine
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),0.021572665999999994,Cpu time(s),0.014992331999999997
    [July 14, 2017 6:37:02 PM CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 10.15 minutes.
    Runtime.totalMemory()=1806172160
    java.lang.ArrayIndexOutOfBoundsException: 6
            at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.generatePL(ReferenceConfidenceVariantContextMerger.java:456)
            at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.mergeRefConfidenceGenotypes(ReferenceConfidenceVariantContextMerger.java:
            at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:92)
            at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:212)
            at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
            at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
            at java.util.Iterator.forEachRemaining(Iterator.java:116)
            at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
            at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
            at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
            at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
            at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
            at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
            at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
            at org.broadinstitute.hellbender.Main.main(Main.java:230)
    
  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Can you please validate your GVCF inputs with ValidateVariants (the GATK 3.7 version) using the -gvcf option?

    Was the error you got with 3.7 similar, or a different message?
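    A minimal dry-run sketch of that validation over every GVCF in a .list file, assuming the GATK 3.7 command-line form (java -jar with -T ValidateVariants) and the -gvcf option named above. GATK3_JAR, REF and validate_cmds are placeholders/hypothetical; the function only prints one command per GVCF.

```shell
#!/bin/sh
# Hypothetical dry run: one GATK 3.7 ValidateVariants command per GVCF listed
# in a .list file. GATK3_JAR and REF are assumed placeholder paths.
GATK3_JAR=${GATK3_JAR:-GenomeAnalysisTK.jar}
REF=${REF:-ref.fasta}

validate_cmds() {
    while IFS= read -r gvcf || [ -n "$gvcf" ]; do
        [ -n "$gvcf" ] || continue           # skip blank lines
        echo "java -jar $GATK3_JAR -T ValidateVariants -R $REF -V $gvcf -gvcf"
    done < "$1"
}

# Usage: validate_cmds input_chr26.list        (review the printed commands)
#        validate_cmds input_chr26.list | sh   (then run them)
```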

  • jogeib · Germany · Member

    Hi @Geraldine_VdAuwera,
    I validated the g.VCF files; ValidateVariants didn't find any faults. I also tried running GenotypeGVCFs on one of the g.VCF files directly, and that worked. But importing this file into a GenomicsDB and then genotyping from there produced the error again. So it seems to me to be a problem with using GenomicsDB. How exactly does this work? Does GATK use external dependencies for GenomicsDB, or doesn't it need them? The dependencies of GenomicsDB are currently not installed on our system, as I would need to ask our admins to do that for me.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    @jogeib There are no external dependencies for using GenomicsDB. However some major parts of the code in GenotypeGVCFs have changed, and that is likely to be the cause of your problem.

    I suspect the problem may be related to ploidy; since you mention you are running on pooled data, I assume you are using a non-diploid --ploidy setting in HaplotypeCaller? I'm not certain that the non-diploid model is supported in the new version of GenotypeGVCFs. I'm waiting for a response from the developers on that question.

    You mentioned you got errors with 3.7 too; can you clarify what those errors were?

  • jogeib · Germany · Member

    The errors were due to the pooled data: the --maxGenotypeCounts option somehow did not seem to work properly. I therefore sent in this chromosome and was told that the problem would be solved in 4.0. In the meantime I called the data with ploidy 2, but as I'm interested in the frequencies of rare alleles, this is not a final solution.
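    Why a genotype cap matters so much for pooled data: with ploidy p and A alleles at a site (reference plus alternates), the number of distinct unordered genotypes is the multiset coefficient C(p + A - 1, A - 1), which grows quickly with ploidy. The small sketch below does that arithmetic; genotype_count is a hypothetical helper and the example ploidies are illustrative, not taken from this dataset. For comparison, the GATK4 log at the top of this thread shows --max_genotype_count 1024.

```shell
#!/bin/sh
# Number of unordered genotypes for a given ploidy and allele count:
# C(ploidy + alleles - 1, alleles - 1). Computed incrementally so that every
# intermediate division is exact (integer arithmetic only).
genotype_count() {
    ploidy=$1; alleles=$2
    n=$((ploidy + alleles - 1)); k=$((alleles - 1))
    result=1; i=1
    while [ "$i" -le "$k" ]; do
        result=$((result * (n - k + i) / i))
        i=$((i + 1))
    done
    echo "$result"
}

genotype_count 2 7     # diploid, REF + 6 ALT alleles -> 28
genotype_count 40 4    # e.g. a pool of 20 diploid samples, REF + 3 ALT -> 12341
```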

    GitHub issue #2303 by shlee (closed by vdauwera)

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    I'll make sure the team understands we need to support non-diploid cases and that --maxGenotypeCounts is needed for many studies. I can't guarantee when it will get done, but I'm hopeful it will be ready for the general release slated for September.

  • KrzysztofMKozak · Cambridge, UK · Member

    Hello,

    I have followed the above recommendation and used GATK 3.8 (latest build on 14/11/17) to run CombineGVCFs on 12 GVCFs generated by GATK 4 HaplotypeCaller. I have Java 1.8. Every time I try this, the process produces part of a combined VCF and then fails with a core dump. I am already at 100GB RAM (128GB requested from the server to allow for overhead). Is this a Java issue or an incompatibility between the 3.8 and 4.0 versions of GATK?

    Picked up _JAVA_OPTIONS: -XX:+UseSerialGC
    INFO 19:42:09,083 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO 19:42:09,085 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
    INFO 19:42:09,085 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 19:42:09,085 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO 19:42:09,086 HelpFormatter - [Wed Nov 15 19:42:09 EST 2017] Executing on Linux 2.6.32-696.6.3.el6.centos.plus.x86_64 amd64
    INFO 19:42:09,086 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14
    INFO 19:42:09,088 HelpFormatter - Program Args: -T CombineGVCFs -R /home/kozakk/ref/Herato_final.fasta -o test12.combined.vcf.gz --variant amazonaKK190.raw.snps.indels.g.vcf.gz --variant amazonaKK217.raw.snps.indels.g.vcf.gz --variant amphitriteKK491.raw.snps.indels.g.vcf.gz --variant amphitriteKK497.raw.snps.indels.g.vcf.gz --variant amphitriteKK520.raw.snps.indels.g.vcf.gz --variant amphitriteKK522.raw.snps.indels.g.vcf.gz --variant amphitriteKK528.raw.snps.indels.g.vcf.gz --variant guaricaM3121.raw.snps.indels.g.vcf.gz --variant guaricaM3125.raw.snps.indels.g.vcf.gz --variant hydaraHE1.raw.snps.indels.g.vcf.gz --variant phyllisKK102.raw.snps.indels.g.vcf.gz --variant phyllisKK104.raw.snps.indels.g.vcf.gz
    INFO 19:42:09,098 HelpFormatter - Executing as kozakk@compute-9-4.local on Linux 2.6.32-696.6.3.el6.centos.plus.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
    INFO 19:42:09,098 HelpFormatter - Date/Time: 2017/11/15 19:42:09
    INFO 19:42:09,098 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO 19:42:09,098 HelpFormatter - ---------------------------------------------------------------------------------------------
    ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/kozakk/bin/gatk3/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    INFO 19:42:09,402 GenomeAnalysisEngine - Deflater: IntelDeflater
    INFO 19:42:09,402 GenomeAnalysisEngine - Inflater: IntelInflater
    INFO 19:42:09,403 GenomeAnalysisEngine - Strictness is SILENT
    INFO 19:42:09,508 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    WARN 19:42:09,850 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,850 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,850 IndexDictionaryUtils - Track variant3 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant4 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant5 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant6 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant7 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant8 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant9 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant10 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant11 doesn't have a sequence dictionary built in, skipping dictionary validation
    WARN 19:42:09,851 IndexDictionaryUtils - Track variant12 doesn't have a sequence dictionary built in, skipping dictionary validation
    INFO 19:42:09,919 GenomeAnalysisEngine - Preparing for traversal
    INFO 19:42:09,921 GenomeAnalysisEngine - Done preparing for traversal
    INFO 19:42:09,921 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 19:42:09,921 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 19:42:09,921 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    WARN 19:42:10,196 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
    WARN 19:42:10,197 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00002b426d63ca20, pid=191437, tid=47672406116096
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 1.8.0_45-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 )
    # Problematic frame:
    # V [libjvm.so+0x641a20] InstanceKlass::oop_oop_iterate_nv(oopDesc*, FastScanClosure*)+0x1b0
    #
    # Core dump written. Default location: /pool/genomics/kozakk/erato_april/gvcf/core or core.191437
    #
    # An error report file with more information is saved as:
    # /pool/genomics/kozakk/erato_april/gvcf/hs_err_pid191437.log
    #
    # If you would like to submit a bug report, please visit:
    # http://bugreport.java.com/bugreport/crash.jsp
    #
    /opt/gridengine/default/spool/compute-9-4/job_scripts/707135: line 18: 191437 Aborted (core dumped) java -Xmx100G -Xms4G -jar /home/kozakk/bin/gatk3/GenomeAnalysisTK.jar -T CombineGVCFs -R /home/kozakk/ref/Herato_final.fasta -o test12.combined.vcf.gz --variant amazonaKK190.raw.snps.indels.g.vcf.gz --variant amazonaKK217.raw.snps.indels.g.vcf.gz --variant amphitriteKK491.raw.snps.indels.g.vcf.gz --variant amphitriteKK497.raw.snps.indels.g.vcf.gz --variant amphitriteKK520.raw.snps.indels.g.vcf.gz --variant amphitriteKK522.raw.snps.indels.g.vcf.gz --variant amphitriteKK528.raw.snps.indels.g.vcf.gz --variant guaricaM3121.raw.snps.indels.g.vcf.gz --variant guaricaM3125.raw.snps.indels.g.vcf.gz --variant hydaraHE1.raw.snps.indels.g.vcf.gz --variant phyllisKK102.raw.snps.indels.g.vcf.gz --variant phyllisKK104.raw.snps.indels.g.vcf.gz

  • KrzysztofMKozak · Cambridge, UK · Member

    The above problem was solved by switching to GATK 3.7 for CombineGVCFs. It looks like a bug in 3.8, similar to this one: https://gatkforums.broadinstitute.org/gatk/discussion/10367/indelrealigner-crashing-java-fatal-error

  • KrzysztofMKozak · Cambridge, UK · Member

    @shlee said:
    @laehnemann, [...] can you try using GATK3.7's CombineGVCFs to create a single VCF to use with GenotypeGVCFs?

    May I ask for a clarification, please? I see conflicting recommendations on a few discussion threads. Is it OK to take GVCFs generated with GATK v4, run CombineGVCFs in v3.7, and then genotype in v4, or is it absolutely necessary to combine using GenomicsDBImport (and then have to concatenate the intervals)?

  • shlee · Cambridge · Member, Broadie, Moderator

    @KrzysztofMKozak,

    If you want to use CombineGVCFs, you will have to use GATK v3.7 or the v3.8 latest nightly release as the tool has not (yet) been ported to GATK4 and there is a bug in the tool in v3.8.

    If you wish to use GenomicsDBImport, it is available in GATK v4.beta.

  • KrzysztofMKozak · Cambridge, UK · Member

    Dear shlee, my question was more about whether this is correct practice. In this post Geraldine says that GenomicsDBImport is the only valid way to merge GVCFs for GATK4:
    https://gatkforums.broadinstitute.org/gatk/discussion/10061/using-genomicsdbimport-to-prepare-gvcfs-for-input-to-genotypegvcfs-in-gatk4

    You seem to be contradicting this statement. Or are you saying that "if you want to use CombineGVCFs, you have to use GATK 3.7 for all steps, from HaplotypeCaller to GenotypeGVCFs"?

  • shlee · Cambridge · Member, Broadie, Moderator

    Hi @KrzysztofMKozak,

    GenomicsDB only supports diploid data, and it's my understanding that our tools are meant to support workflows with alternate ploidies. I see the sentence you are referring to:

    Although there are several tools in the GATK and Picard toolkits that provide some type of VCF or GVCF merging functionality, for this use case there is only one valid way to do it: with GenomicsDBImport.

    At the time the linked document was written, the CombineGVCFs tool had not been ported to (i.e. made available in) GATK4, a program in BETA release. So within GATK4, GenomicsDB is the only option for combining GVCFs.

    To be able to use CombineGVCFs, you had (and still have) to use it from the GATK version 3 program. It is my understanding that going forward this tool, CombineGVCFs, will become available in GATK4.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Hi @KrzysztofMKozak, to confirm, I did indeed mean to say that GenomicsDBImport was at the time the only way of merging GVCFs within GATK4. This was not intended to exclude the option of doing it with CombineGVCFs from GATK3, which is technically possible, although we prefer to avoid mixing versions as much as possible. In this case, I expect it should work fine, aside from CombineGVCFs' inherent flaws (which are what we're trying to avoid by using GenomicsDBImport).
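    Put together, the mixed-version route discussed here (GVCFs from GATK4 HaplotypeCaller, merged with GATK 3.7 CombineGVCFs, then genotyped with GATK4 GenotypeGVCFs on the single combined file) could be sketched as below. It relies on GATK 3.x accepting a .list file for --variant, as noted at the top of the thread; GATK3_JAR, REF, mixed_version_cmds and all file names are placeholders/hypothetical, and the script only prints the two commands.

```shell
#!/bin/sh
# Hypothetical dry run of the mixed-version workflow. All paths are placeholders.
GATK3_JAR=${GATK3_JAR:-GenomeAnalysisTK-3.7.jar}
REF=${REF:-ref.fasta}

mixed_version_cmds() {
    list=$1; combined=$2; out=$3
    # Step 1 (GATK 3.7): merge the per-sample GVCFs; GATK3 accepts a .list for --variant.
    echo "java -jar $GATK3_JAR -T CombineGVCFs -R $REF --variant $list -o $combined"
    # Step 2 (GATK4): joint-genotype the single combined GVCF.
    echo "gatk-launch GenotypeGVCFs -R $REF -V $combined -O $out"
}

mixed_version_cmds gvcfs.list combined.g.vcf.gz joint.vcf.gz
```

    Given the 3.8 crash reported earlier in this thread, pointing GATK3_JAR at a 3.7 jar is the conservative choice.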
