
CombineGVCFs error

KG_Fun_Gen Member
edited March 2018 in Ask the GATK team

I am trying to combine 240 GVCF files to run joint GenotypeGVCFs. I created 12 meta-merged GVCFs by combining 20 samples into each, and I did this separately for each chromosome. When I combine the 12 meta-merged files for a chromosome, I get this error:
09:54:21.509 INFO CombineGVCFs - Shutting down engine
[March 27, 2018 9:54:21 AM EDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 1.10 minutes.
Runtime.totalMemory()=6771179520
java.lang.IllegalArgumentException: Unexpected base in allele bases '*AAAAAAAAC'
at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:165)
at htsjdk.variant.variantcontext.Allele.create(Allele.java:239)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.extendAllele(ReferenceConfidenceVariantContextMerger.java:406)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.remapAlleles(ReferenceConfidenceVariantContextMerger.java:178)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:70)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.endPreviousStates(CombineGVCFs.java:340)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.createIntermediateVariants(CombineGVCFs.java:189)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:134)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:73)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)

On grepping the input files:

File 1

zgrep "AAAAAAAAC" ///sc/orga/projects/psychgen/scratch/meta-merged.1.chr22.gvcf.gz
chr22 16190298 . AAAAAAAAAAC A, . . DP=20;ExcessHet=3.01;RAW_MQ=23109.00 GT:AD:DP:GQ:MIN_DP:PL:SB ./.:0,4,0:4:12:.:127,12,0,127,12,127:0,0,3,1 ./.:0,2,0:2:6:.:74,6,0,74,6,74:0,0,2,0 ./.:.:2:3:2:0,3,45,3,45,45 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:3:1:0,3,27,3,27,27 ./.:.:2:3:2:0,3,45,3,45,45 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:3:1:0,3,16,3,16,16 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:0,4,0:4:12:.:125,12,0,125,12,125:0,0,2,2 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:0:1:0,0,0,0,0,0 ./.:.:1:0:1:0,0,0,0,0,0 ./.:.:2:3:2:0,3,45,3,45,45
^C

File 2

zgrep "AAAAAAAAC" ///sc/orga/projects/psychgen/scratch/meta-merged.101.chr22.gvcf.gz
chr22 16190298 . AAAAAAAAAAC A, . . DP=12;ExcessHet=3.01;RAW_MQ=12436.00 GT:AD:DP:GQ:MIN_DP:PL:SB ./.:.:0:0:0:0,0,0,0,0,0 ./.:0,3,0:3:9:.:135,9,0,135,9,135:0,0,2,1 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:2:6:2:0,6,34,6,34,34 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:3:1:0,3,35,3,35,35 ./.:0,2,0:2:6:.:77,6,0,77,6,77:0,0,1,1 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:3:0:2:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:3:1:0,3,36,3,36,36 ./.:.:1:3:1:0,3,37,3,37,37 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0
chr22 16190299 . AAAAAAAAAC A,*, . . DP=10;ExcessHet=3.01;RAW_MQ=3600.00 GT:AD:DP:GQ:MIN_DP:PL:SB ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:0,0,3,0:3:9:.:135,135,135,9,9,0,135,135,9,135:0,0,2,1 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:2:3:1:0,3,35,3,35,35,3,35,35,35 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:1:3:1:0,3,18,3,18,18,3,18,18,18 ./.:0,0,2,0:2:6:.:77,77,77,6,6,0,77,77,6,77:0,0,1,1 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:0,1,0,0:1:3:.:45,3,0,45,3,45,45,3,45,45:0,0,1,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:1:3:1:0,3,36,3,36,36,3,36,36,36 ./.:.:1:3:1:0,3,37,3,37,37,3,37,37,37 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0,0,0,0,0

zgrep "AAAAAAAAC" ///sc/orga/projects/psychgen/scratch/meta-merged.121.chr22.gvcf.gz
chr22 16190298 . AAAAAAAAAAC A, . . DP=10;ExcessHet=3.01;RAW_MQ=5882.00 GT:AD:DP:GQ:MIN_DP:PL:SB ./.:.:1:3:1:0,3,34,3,34,34 ./.:.:2:0:2:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:3:1:0,3,36,3,36,36 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:0,2,0:2:6:.:90,6,0,90,6,90:0,0,2,0 ./.:.:2:6:2:0,6,70,6,70,70 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0./.:.:0:0:0:0,0,0,0,0,0 ./.:.:1:3:1:0,3,18,3,18,18 ./.:.:1:3:1:0,3,37,3,37,37 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0 ./.:.:0:0:0:0,0,0,0,0,0

And when I run ValidateVariants on one of the inputs:
java -jar $GenomeAnalysisTK_jar ValidateVariants -V $FILE1 -R /hg19/chr22.fa -gvcf

java -jar $GenomeAnalysisTK_jar ValidateVariants -V $VCF -R /hpc/users/girdhk01/psychencode/resources/hg19/chr22.fa -gvcf
10:17:45.571 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/hpc/packages/minerva-common/gatk/4.0.1.2/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:17:45.740 INFO ValidateVariants - ------------------------------------------------------------
10:17:45.740 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.0.1.2
10:17:45.740 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
10:17:45.740 INFO ValidateVariants - Executing as [email protected] on Linux v2.6.32-358.6.2.el6.x86_64 amd64
10:17:45.741 INFO ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_111-b14
10:17:45.741 INFO ValidateVariants - Start Date/Time: March 27, 2018 10:17:45 AM EDT
10:17:45.741 INFO ValidateVariants - ------------------------------------------------------------
10:17:45.741 INFO ValidateVariants - ------------------------------------------------------------
10:17:45.742 INFO ValidateVariants - HTSJDK Version: 2.14.1
10:17:45.742 INFO ValidateVariants - Picard Version: 2.17.2
10:17:45.742 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 1
10:17:45.742 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:17:45.742 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:17:45.743 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:17:45.743 INFO ValidateVariants - Deflater: IntelDeflater
10:17:45.743 INFO ValidateVariants - Inflater: IntelInflater
10:17:45.743 INFO ValidateVariants - GCS max retries/reopens: 20
10:17:45.743 INFO ValidateVariants - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
10:17:45.743 INFO ValidateVariants - Initializing engine
10:17:46.560 INFO FeatureManager - Using codec VCFCodec to read file file:///sc/orga/projects/psychgen/scratch/meta-merged.41.chr22.gvcf.gz
10:17:46.619 INFO ValidateVariants - Done initializing engine
10:17:46.623 WARN ValidateVariants - GVCF format is currently incompatible with allele validation. Not validating Alleles.
10:17:46.623 INFO ProgressMeter - Starting traversal
10:17:46.623 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:17:56.628 INFO ProgressMeter - chr22:19214664 0.2 1650000 9897030.9
10:18:06.626 INFO ProgressMeter - chr22:22421630 0.3 3455000 10363445.5
10:18:16.631 INFO ProgressMeter - chr22:25287649 0.5 5290000 10577179.4
10:18:26.631 INFO ProgressMeter - chr22:27996611 0.7 7141000 10709358.1
10:18:36.635 INFO ProgressMeter - chr22:30653901 0.8 8935000 10719427.3
10:18:46.637 INFO ProgressMeter - chr22:33277050 1.0 10735000 10732495.8
10:18:56.642 INFO ProgressMeter - chr22:35995428 1.2 12542000 10747522.1
10:19:06.641 INFO ProgressMeter - chr22:38521573 1.3 14322000 10739083.7
10:19:16.645 INFO ProgressMeter - chr22:41203498 1.5 16145000 10760702.9
10:19:26.645 INFO ProgressMeter - chr22:43888896 1.7 17965000 10776629.1
10:19:36.647 INFO ProgressMeter - chr22:46816339 1.8 19807000 10801559.7
10:19:46.649 INFO ProgressMeter - chr22:50017181 2.0 21660000 10827654.0
10:19:50.658 INFO ProgressMeter - chr22:51238008 2.1 22414832 10842826.0
10:19:50.659 INFO ProgressMeter - Traversal complete. Processed 22414832 total variants in 2.1 minutes.
10:19:50.660 INFO ValidateVariants - Shutting down engine
[March 27, 2018 10:19:50 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 2.09 minutes.
Runtime.totalMemory()=4264034304


A USER ERROR has occurred: A GVCF must cover the entire region. Found 3044389417 loci with no VariantContext covering it. The first uncovered segment is:chr1:1-249250621


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Can you help me with this?
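A hedged side note on the ValidateVariants failure above: with -gvcf, the tool expects reference blocks covering every contig in the reference, so a per-chromosome GVCF trips the "must cover the entire region" check even when the chromosome itself is fully covered. Restricting validation to the covered contig with -L is a plausible workaround (the file name and paths below are the ones from this thread; the -L usage is an assumption, not a confirmed fix):

```shell
# Build the ValidateVariants command restricted to chr22 only.
# -L chr22 limits traversal to the contig the GVCF actually covers.
chrom="chr22"
cmd="java -jar \$GenomeAnalysisTK_jar ValidateVariants -V meta-merged.41.${chrom}.gvcf.gz -R /hg19/${chrom}.fa -L ${chrom} -gvcf"
echo "$cmd"
```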


Answers

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @KG_Fun_Gen
    Hi,

    This is a bug that has been reported. You can keep track of it here.

    -Sheila

  • I think CombineGVCFs doesn't handle the case where each input GVCF already contains multiple samples. You might try going straight from your 240 single-sample GVCFs into one combined GVCF, skipping the meta-merging step.
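That suggestion can be sketched as a single CombineGVCFs invocation over all 240 single-sample GVCFs (the sample_N.g.vcf.gz names and ref.fa are placeholders, not files from this thread):

```shell
# Collect all 240 single-sample GVCFs into one -V argument list, then
# print the CombineGVCFs command that skips the meta-merge step.
args=""
for i in $(seq 1 240); do
  args="$args -V sample_${i}.g.vcf.gz"
done
echo "gatk CombineGVCFs -R ref.fa${args} -O combined.chr22.g.vcf.gz"
```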

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @neato_nick
    Hi,

    Yes, I think that was part of the issue. In any case, this should be fixed soon.

    -Sheila

  • Can I just add that I have come across the same problem? Running on gatk-4.0.2.1.

  • manolis Member ✭✭
    edited May 2018

    Hi, I'm using gatk-4.0.3.0 and I get the same error (first in MergeVcfs, and then again when going back to the single GVCF files).

    MergeVCFs, after HaplotypeCaller

    Using GATK jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24g -jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar MergeVcfs -I 15_1132_00001_g.vcf.gz -I 15_1132_00002_g.vcf.gz ... ... (21589 input files) ... ... -I 15_1132_21588_g.vcf.gz -I 15_1132_21589_g.vcf.gz -O /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/storage/15_1132_g.vcf.gz
    
    
    [Wed May 09 01:59:48 CEST 2018] MergeVcfs  --INPUT 15_1132_00001_g.vcf.gz --INPUT 15_1132_00002_g.vcf.gz ... ... (21589 input files) --INPUT 15_1132_21588_g.vcf.gz --INPUT 15_1132_21589_g.vcf.gz --OUTPUT /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/storage/15_1132_g.vcf.gz  --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed May 09 01:59:48 CEST 2018] Executing as [email protected] on Linux 3.5.0-36-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14; Deflater: Intel; Inflater: Intel; Picard version: Version:4.0.3.0
    [Wed May 09 02:02:01 CEST 2018] picard.vcf.MergeVcfs done. Elapsed time: 2.22 minutes.
    Runtime.totalMemory()=12311855104
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to create BasicFeatureReader using feature file , for input source: file:///home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:113)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:74)
        at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:117)
        at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:68)
        at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:176)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Caused by: java.io.FileNotFoundException: /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz.tbi (Too many open files)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at htsjdk.samtools.seekablestream.SeekableFileStream.<init>(SeekableFileStream.java:47)
        at htsjdk.samtools.seekablestream.SeekableStreamFactory$DefaultSeekableStreamFactory.getStreamFor(SeekableStreamFactory.java:99)
        at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
        at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:106)
        ... 9 more
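The root cause in the trace above is the "Too many open files" line: MergeVcfs opens every input .vcf.gz plus its .tbi index, so 21589 inputs can exceed the per-process file-descriptor limit. A sketch of the usual check (raising the limit is a general OS remedy, not advice from this thread):

```shell
# Print the current per-process open-file limit; if it is well below
# roughly 2x the number of MergeVcfs inputs, raise it for this shell
# (e.g. "ulimit -n 65536", subject to the hard limit) before rerunning.
limit=$(ulimit -n)
echo "open-file limit: $limit"
```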
    

    ValidateVariants

    Then, going back to the single GVCFs...

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar ValidateVariants -V 15_1132_21589_g.vcf.gz -R /home/shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -gvcf
            08:05:20.448 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
            08:05:20.799 INFO  ValidateVariants - ------------------------------------------------------------
            08:05:20.799 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.0.3.0
            08:05:20.800 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
            08:05:20.800 INFO  ValidateVariants - Executing as [email protected] on Linux v3.5.0-36-generic amd64
            08:05:20.801 INFO  ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
            08:05:20.801 INFO  ValidateVariants - Start Date/Time: May 9, 2018 8:05:20 AM CEST
            08:05:20.801 INFO  ValidateVariants - ------------------------------------------------------------
            08:05:20.802 INFO  ValidateVariants - ------------------------------------------------------------
            08:05:20.803 INFO  ValidateVariants - HTSJDK Version: 2.14.3
            08:05:20.803 INFO  ValidateVariants - Picard Version: 2.17.2
            08:05:20.803 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
            08:05:20.803 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
            08:05:20.804 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
            08:05:20.804 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
            08:05:20.804 INFO  ValidateVariants - Deflater: IntelDeflater
            08:05:20.804 INFO  ValidateVariants - Inflater: IntelInflater
            08:05:20.804 INFO  ValidateVariants - GCS max retries/reopens: 20
            08:05:20.804 INFO  ValidateVariants - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
            08:05:20.805 INFO  ValidateVariants - Initializing engine
            08:05:22.146 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_21589_g.vcf.gz
            08:05:22.419 INFO  ValidateVariants - Done initializing engine
            08:05:22.446 WARN  ValidateVariants - GVCF format is currently incompatible with allele validation. Not validating Alleles.
            08:05:22.447 INFO  ProgressMeter - Starting traversal
            08:05:22.448 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
            08:05:22.510 INFO  ProgressMeter -             unmapped              0.0                     1           1132.1
            08:05:22.511 INFO  ProgressMeter - Traversal complete. Processed 1 total variants in 0.0 minutes.
            08:05:22.557 INFO  ValidateVariants - Shutting down engine
            [May 9, 2018 8:05:22 AM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.04 minutes.
            Runtime.totalMemory()=2362441728
            ***********************************************************************
    
            A USER ERROR has occurred: A GVCF must cover the entire region. Found 3217344190 loci with no VariantContext covering it. The first uncovered segment is:chr1:1-248956422
    
            ***********************************************************************
            Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    

    Is this fixed in gatk-4.0.4.0?

  • SkyWarrior (Turkey) Member ✭✭✭

    MergeVcfs will not work on GVCFs.

  • manolis Member ✭✭

    So that means I have to use CombineGVCFs instead of MergeVcfs, right?

    If I'm correct, the Best Practices say:

    # HaplotypeCaller per-sample in GVCF mode
         ${gatk_path} --java-options "-Xmx${command_mem_gb}G ${java_opt}" \
              HaplotypeCaller \
              -R ${ref_fasta} \
              -I ${input_bam} \
              -L ${interval_list} \
              -O ${output_filename} \
        -contamination ${default=0 contamination} ${true="-ERC GVCF" false="" make_gvcf}
    

    then ..

    # Merge GVCFs generated per-interval for the same sample
    ${gatk_path} --java-options "-Xmx${command_mem_gb}G"  \
    MergeVcfs \
    --INPUT ${sep=' --INPUT ' input_vcfs} \
    --OUTPUT ${output_filename}
    

    What am I doing wrong?
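Part of the confusion may be that the two tools perform different merges. A sketch of both (file names are illustrative, not from this thread):

```shell
# MergeVcfs concatenates per-interval GVCFs of the SAME sample, which is
# what the scatter-gather Best Practices snippet quoted above does.
gather="gatk MergeVcfs -I sampleA.shard1.g.vcf.gz -I sampleA.shard2.g.vcf.gz -O sampleA.g.vcf.gz"
# CombineGVCFs merges GVCFs of DIFFERENT samples into one multi-sample
# GVCF ahead of GenotypeGVCFs.
cohort="gatk CombineGVCFs -R ref.fa -V sampleA.g.vcf.gz -V sampleB.g.vcf.gz -O cohort.g.vcf.gz"
echo "$gather"
echo "$cohort"
```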

    I also get an error during validation of the GVCFs with ValidateVariants, after HaplotypeCaller... do you know anything about this?

    Many thanks

  • manolis Member ✭✭
    edited May 2018

    Hi, I fixed ValidateVariants, but I'm still having a problem with MergeVcfs.

    Considering the error message "htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to create BasicFeatureReader using feature file , for input source: file:///home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz"

    I went back and re-created the specific GVCF with HaplotypeCaller:

            Using GATK jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar
            Running:
            java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx10g -jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar HaplotypeCaller -R /home/shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -I 15_1132_bqsr.bam -O /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz -L chr1:234527890-234531779 -ip 100 -contamination 0 --max-alternate-alleles 2 -ERC GVCF
            10:06:30.933 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
            10:06:31.137 INFO  HaplotypeCaller - ------------------------------------------------------------
            10:06:31.137 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.3.0
            10:06:31.137 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
            10:06:31.138 INFO  HaplotypeCaller - Executing as [email protected] on Linux v3.5.0-36-generic amd64
            10:06:31.138 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
            10:06:31.138 INFO  HaplotypeCaller - Start Date/Time: May 8, 2018 10:06:30 AM CEST
            10:06:31.138 INFO  HaplotypeCaller - ------------------------------------------------------------
            10:06:31.138 INFO  HaplotypeCaller - ------------------------------------------------------------
            10:06:31.139 INFO  HaplotypeCaller - HTSJDK Version: 2.14.3
            10:06:31.139 INFO  HaplotypeCaller - Picard Version: 2.17.2
            10:06:31.139 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
            10:06:31.139 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
            10:06:31.139 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
            10:06:31.140 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
            10:06:31.140 INFO  HaplotypeCaller - Deflater: IntelDeflater
            10:06:31.140 INFO  HaplotypeCaller - Inflater: IntelInflater
            10:06:31.140 INFO  HaplotypeCaller - GCS max retries/reopens: 20
            10:06:31.140 INFO  HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
            10:06:31.140 INFO  HaplotypeCaller - Initializing engine
            10:06:32.189 INFO  IntervalArgumentCollection - Processing 4090 bp from intervals
            10:06:32.209 INFO  HaplotypeCaller - Done initializing engine
            10:06:32.285 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
            10:06:32.285 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
            10:06:33.270 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
            10:06:33.272 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
            10:06:33.325 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
            10:06:33.326 INFO  IntelPairHmm - Available threads: 64
            10:06:33.326 INFO  IntelPairHmm - Requested threads: 4
            10:06:33.326 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
            10:06:33.461 INFO  ProgressMeter - Starting traversal
            10:06:33.462 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
            10:06:33.957 INFO  HaplotypeCaller - 6 read(s) filtered by: ((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
              6 read(s) filtered by: (((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter)
                  6 read(s) filtered by: ((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter)
                      6 read(s) filtered by: (((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
                          6 read(s) filtered by: ((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
                              6 read(s) filtered by: NotDuplicateReadFilter 
    
            10:06:33.960 INFO  ProgressMeter -       chr1:234530490              0.0                    14           1697.0
            10:06:33.960 INFO  ProgressMeter - Traversal complete. Processed 14 total regions in 0.0 minutes.
            10:06:33.978 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
            10:06:33.978 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
            10:06:33.978 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
            10:06:33.979 INFO  HaplotypeCaller - Shutting down engine
            [May 8, 2018 10:06:33 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.05 minutes.
            Runtime.totalMemory()=2319450112
    

    Then I "zcat" the file:

            #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  15_1132
            chr1    234527790   .   A   <NON_REF>   .   .   END=234529998   GT:DP:GQ:MIN_DP:PL  0/0:0:0:0:0,0,0
            chr1    234529999   .   C   <NON_REF>   .   .   END=234530127   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,39
            chr1    234530128   .   T   <NON_REF>   .   .   END=234530703   GT:DP:GQ:MIN_DP:PL  0/0:0:0:0:0,0,0
            chr1    234530704   .   A   <NON_REF>   .   .   END=234530751   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234530752   .   G   <NON_REF>   .   .   END=234530807   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,39
            chr1    234530808   .   C   <NON_REF>   .   .   END=234530833   GT:DP:GQ:MIN_DP:PL  0/0:3:9:3:0,9,84
            chr1    234530834   .   G   <NON_REF>   .   .   END=234530882   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,70
            chr1    234530883   .   C   <NON_REF>   .   .   END=234530898   GT:DP:GQ:MIN_DP:PL  0/0:2:3:1:0,3,35
            chr1    234530899   .   G   <NON_REF>   .   .   END=234530909   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,70
            chr1    234530910   .   T   <NON_REF>   .   .   END=234530937   GT:DP:GQ:MIN_DP:PL  0/0:3:9:3:0,9,84
            chr1    234530938   .   T   <NON_REF>   .   .   END=234530975   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,49
            chr1    234530976   .   A   <NON_REF>   .   .   END=234530977   GT:DP:GQ:MIN_DP:PL  0/0:3:9:3:0,9,74
            chr1    234530978   .   C   <NON_REF>   .   .   END=234530978   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234530979   .   C   <NON_REF>   .   .   END=234530980   GT:DP:GQ:MIN_DP:PL  0/0:3:9:3:0,9,74
            chr1    234530981   .   G   <NON_REF>   .   .   END=234530981   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234530982   .   C   <NON_REF>   .   .   END=234530986   GT:DP:GQ:MIN_DP:PL  0/0:3:6:3:0,6,90
            chr1    234530987   .   A   <NON_REF>   .   .   END=234530987   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234530988   .   C   <NON_REF>   .   .   END=234530993   GT:DP:GQ:MIN_DP:PL  0/0:3:6:3:0,6,90
            chr1    234530994   .   G   <NON_REF>   .   .   END=234530994   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234530995   .   C   <NON_REF>   .   .   END=234530995   GT:DP:GQ:MIN_DP:PL  0/0:3:6:3:0,6,90
            chr1    234530996   .   A   <NON_REF>   .   .   END=234530996   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234530997   .   C   <NON_REF>   .   .   END=234530998   GT:DP:GQ:MIN_DP:PL  0/0:3:6:3:0,6,90
            chr1    234530999   .   G   <NON_REF>   .   .   END=234530999   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234531000   .   C   <NON_REF>   .   .   END=234531013   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,70
            chr1    234531014   .   A   <NON_REF>   .   .   END=234531014   GT:DP:GQ:MIN_DP:PL  0/0:2:0:2:0,0,0
            chr1    234531015   .   C   <NON_REF>   .   .   END=234531040   GT:DP:GQ:MIN_DP:PL  0/0:2:6:2:0,6,59
            chr1    234531041   .   T   <NON_REF>   .   .   END=234531106   GT:DP:GQ:MIN_DP:PL  0/0:1:3:1:0,3,35
            chr1    234531107   .   T   <NON_REF>   .   .   END=234531879   GT:DP:GQ:MIN_DP:PL  0/0:0:0:0:0,0,0
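The span of the reference blocks above follows from the HaplotypeCaller interval plus padding: -L chr1:234527890-234531779 with -ip 100 extends the interval by 100 bp on each side, matching the first block start (234527790) and the last block end (234531879):

```shell
# Reproduce the padded-interval arithmetic from the -L and -ip options.
start=234527890; end=234531779; pad=100
echo "padded interval: chr1:$((start - pad))-$((end + pad))"
# prints: padded interval: chr1:234527790-234531879
```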
    

    Then I validate the GVCF file with ValidateVariants:

            Using GATK jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar
            Running:
                java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar /share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar ValidateVariants -V 15_1132_02042_g.vcf.gz -R /home/shared/resources/hgRef/hg38/Homo_sapiens_assembly38.fasta -L chr1:234527890-234531779 -gvcf -Xtype ALLELES -D /home/shared/resources/gatk4hg38db/Homo_sapiens_assembly38.dbsnp138.vcf
            23:02:03.381 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/bio/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
            23:02:03.580 INFO  ValidateVariants - ------------------------------------------------------------
            23:02:03.581 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.0.3.0
            23:02:03.581 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
            23:02:03.581 INFO  ValidateVariants - Executing as [email protected] on Linux v3.5.0-36-generic amd64
            23:02:03.581 INFO  ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
            23:02:03.582 INFO  ValidateVariants - Start Date/Time: May 9, 2018 11:02:03 PM CEST
            23:02:03.582 INFO  ValidateVariants - ------------------------------------------------------------
            23:02:03.582 INFO  ValidateVariants - ------------------------------------------------------------
            23:02:03.583 INFO  ValidateVariants - HTSJDK Version: 2.14.3
            23:02:03.583 INFO  ValidateVariants - Picard Version: 2.17.2
            23:02:03.584 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
            23:02:03.584 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
            23:02:03.584 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
            23:02:03.584 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
            23:02:03.584 INFO  ValidateVariants - Deflater: IntelDeflater
            23:02:03.585 INFO  ValidateVariants - Inflater: IntelInflater
            23:02:03.585 INFO  ValidateVariants - GCS max retries/reopens: 20
            23:02:03.585 INFO  ValidateVariants - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
            23:02:03.585 INFO  ValidateVariants - Initializing engine
            23:02:04.387 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/shared/resources/gatk4hg38db/Homo_sapiens_assembly38.dbsnp138.vcf
            23:02:04.694 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz
            23:02:04.918 INFO  IntervalArgumentCollection - Processing 3890 bp from intervals
            23:02:04.993 INFO  ValidateVariants - Done initializing engine
            23:02:05.011 INFO  ProgressMeter - Starting traversal
            23:02:05.012 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
            23:02:05.189 INFO  ProgressMeter -             unmapped              0.0                    28           9600.0
            23:02:05.189 INFO  ProgressMeter - Traversal complete. Processed 28 total variants in 0.0 minutes.
            23:02:05.203 INFO  ValidateVariants - Shutting down engine
            [May 9, 2018 11:02:05 PM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.03 minutes.
            Runtime.totalMemory()=2324168704
    

    At the end I merged all files (intervals) of the same sample, including the 15_1132_02042_g.vcf.gz file... and I still get the following error...

            To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
            htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to create BasicFeatureReader using feature file , for input source: file:///home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz
                    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:113)
                    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:74)
                    at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:117)
                    at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:68)
                    at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:176)
                    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
                    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
                    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
                    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
                    at org.broadinstitute.hellbender.Main.main(Main.java:289)
            Caused by: java.io.FileNotFoundException: /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz.tbi (Too many open files)
                    at java.io.RandomAccessFile.open0(Native Method)
                    at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
                    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
                    at htsjdk.samtools.seekablestream.SeekableFileStream.<init>(SeekableFileStream.java:47)
                    at htsjdk.samtools.seekablestream.SeekableStreamFactory$DefaultSeekableStreamFactory.getStreamFor(SeekableStreamFactory.java:99)
                    at htsjdk.tribble.readers.TabixReader.readIndex(TabixReader.java:287)
                    at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:165)
                    at htsjdk.tribble.readers.TabixReader.<init>(TabixReader.java:129)
                    at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:80)
                    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:106)
                    ... 9 more
    
  • manolismanolis Member ✭✭
    edited May 2018

    What is the meaning of

    "Caused by: java.io.FileNotFoundException: /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz.tbi (Too many open files)"

  • manolismanolis Member ✭✭
    edited May 2018

    If I remove the 15_1132_02042_g.vcf.gz file from my list, then I have the same problem with the next interval's file, 15_1132_02043_g.vcf.gz.

    Combining the first 02041 intervals, everything works!

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭

    You need to increase ulimit to unlimited. That variable limits the number of open files by the applications.
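    For example, the current limits can be inspected and the soft limit raised for the current shell session (a minimal sketch; raising the hard limit or making the change permanent via /etc/security/limits.conf needs root):

```shell
# Show the current soft and hard limits on open file descriptors.
ulimit -Sn
ulimit -Hn
# Raise the soft limit up to the hard limit, for this session only.
ulimit -S -n "$(ulimit -Hn)"
ulimit -Sn
```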

  • manolismanolis Member ✭✭

    Thanks @SkyWarrior,

    we increased it from 1024 to 32768 but it still doesn't work! Our administrator doesn't want to set it to "unlimited".

    A question:
    for example, if I have 1000 GVCF files to merge, can I merge 100 files at a time, creating 10 "small-merged" files, and then merge those 10 "small-merged" files into one final file at the end?
    Is that correct?
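    Sketched in shell, that plan would look something like this (file names are placeholders, and the CombineGVCFs commands are only printed, not executed):

```shell
# Plan: merge 1000 GVCFs in batches of 100, then merge the 10 results.
# Make a placeholder list of 1000 GVCF paths and split it into batches.
printf 'sample_%04d.g.vcf.gz\n' $(seq 1 1000) > gvcfs.list
split -l 100 -d gvcfs.list batch_          # -> batch_00 .. batch_09

# Print one CombineGVCFs command per batch of 100.
for b in batch_*; do
  args=$(sed 's/^/-V /' "$b" | tr '\n' ' ')
  echo gatk CombineGVCFs -R ref.fasta $args -O "merged_${b}.g.vcf.gz"
done > combine_cmds.sh

# Final step (also printed): merge the 10 intermediate files into one.
echo gatk CombineGVCFs -R ref.fasta \
  $(printf -- '-V merged_batch_%02d.g.vcf.gz ' $(seq 0 9)) \
  -O final.g.vcf.gz >> combine_cmds.sh
```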

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭

    Use docker image if you can.

  • manolismanolis Member ✭✭

    I don't know anything about Docker images... I sent an e-mail to our server administrator about it...

    For now I will try the approach I described above...

  • manolismanolis Member ✭✭
    edited May 2018

    Hi, I fixed the MergeVcfs part (as in my example above). After merging all the files I ran ValidateVariants without the -L option and it doesn't work.

        ...
        Runtime.totalMemory()=3181903872
        ***********************************************************************
    
        A USER ERROR has occurred: A GVCF must cover the entire region. Found 1901176245 loci with no VariantContext covering it. The first uncovered segment is:chr1:1-11772
    
        ***********************************************************************
        org.broadinstitute.hellbender.exceptions.UserException: A GVCF must cover the entire region. Found 1901176245 loci with no VariantContext covering it. The first uncovered segment is:chr1:1-11772
            at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.onTraversalSuccess(ValidateVariants.java:269)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:894)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
    

    Does this mean I have to validate each file/interval one at a time, before merging them?

    Thanks

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @manolis
    Hi,

    No, something went wrong while merging. I am a little confused. Can you tell me what you had before you ran MergeVcfs? Do you have GVCFs from HaplotypeCaller that you are trying to combine before inputting to GenotypeGVCFs? Can you try GenomicsDBImport or CombineGVCFs?
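    For reference, a minimal GenomicsDBImport sketch could look like this (sample paths, workspace name, and interval are placeholders; the command is printed here rather than executed):

```shell
# Print (not run) a GenomicsDBImport command; every path, sample name,
# and interval below is an illustrative placeholder.
cmd='gatk GenomicsDBImport \
  -V sample1.g.vcf.gz \
  -V sample2.g.vcf.gz \
  --genomicsdb-workspace-path my_gdb_workspace \
  -L chr1'
echo "$cmd"
```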

    -Sheila

  • manolismanolis Member ✭✭
    edited May 2018

    @Sheila

    Hi,

    I ran HaplotypeCaller on one sample using 21589 intervals, generating 21589 GVCFs.
    Then I tried to merge all the files into one with MergeVcfs...

    In this merging step I got this error:

    htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to create BasicFeatureReader using feature file , for input source: file:///home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz
    ....
    ....
    "Caused by: java.io.FileNotFoundException: /home/manolis/GATK4/IlluminaExomePairEnd/5.gVCF/processing/15_1132_02042_g.vcf.gz.tbi (Too many open files)"
    

    Considering this type of error, we increased "ulimit" from 1024 to 32768, but it still doesn't work!

    Then I tried merging 1000 files at a time, creating 22 merged files. At the end I merged all 22 files into one (all_gVCFs) and ran ValidateVariants on this file...

    Runtime.totalMemory()=3181903872
        ***********************************************************************
    
        A USER ERROR has occurred: A GVCF must cover the entire region. Found 1901176245 loci with no VariantContext covering it. The first uncovered segment is:chr1:1-11772
    
        ***********************************************************************
        org.broadinstitute.hellbender.exceptions.UserException: A GVCF must cover the entire region. Found 1901176245 loci with no VariantContext covering it. The first uncovered segment is:chr1:1-11772
    

    The chr1:1-11772 region is an off-target interval, excluded from my interval list!

    With this multi-step merging, MergeVcfs works without any error, but ValidateVariants (without the -L option) does not.
    If I use the -L option in ValidateVariants with one of the 21589 intervals, using the "all_gVCFs" file as input, everything is OK.
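    For example, the per-interval validation can be scripted over the interval list (interval strings and file names below are placeholders; the commands are printed rather than executed):

```shell
# Run ValidateVariants once per interval instead of genome-wide, since
# the merged GVCF only covers the capture targets.
# The interval list and file names here are illustrative placeholders.
printf 'chr1:234527890-234531779\nchr2:100000-200000\n' > intervals.list
while read -r interval; do
  echo gatk ValidateVariants -V all_gVCFs.g.vcf.gz \
      -R Homo_sapiens_assembly38.fasta -gvcf -L "$interval"
done < intervals.list > validate_cmds.sh
```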

    Tomorrow I will run GenomicsDBImport with some samples and I will let you know.
    Many thanks

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @manolis
    Hi,

    I see.

    If I use the -L option in ValidateVariants with one of the 21589 intervals, using the "all_gVCFs" file as input, everything is OK.

    In this case, it sounds like everything is okay. Just make sure ValidateVariants passes with all input intervals.

    It sounds like you have one GVCF that you can input straight to GenotypeGVCFs now?

    -Sheila

  • manolismanolis Member ✭✭
    edited May 2018

    Hi Sheila,

    I will have 78 GVCFs (WES)... I was setting up the pipeline because I changed the interval list (in the BQSR and HC steps: before, few intervals = big contigs; now, many contigs = almost the real targeted regions), and I was also adding several checkpoints.

    I don't get any error or WARN when I run GenomicsDBImport with the above GVCF files, just "Tool returned: true" at the end.

    What is the meaning of the "Tool returned:" output in the BQSR step? Mine says "Tool returned: 243".

    2:10:13.343 INFO  ProgressMeter - Starting traversal
    02:10:13.344 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute
    02:10:14.025 INFO  BaseRecalibrator - 1890 read(s) filtered by: ((((((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND WellformedReadFilter)
      1890 read(s) filtered by: (((((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
          1890 read(s) filtered by: ((((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
              1864 read(s) filtered by: (((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter)
                  1864 read(s) filtered by: ((MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter)
                      1864 read(s) filtered by: (MappingQualityNotZeroReadFilter AND MappingQualityAvailableReadFilter)
                          1864 read(s) filtered by: MappingQualityNotZeroReadFilter 
              26 read(s) filtered by: NotDuplicateReadFilter 
    
    02:10:14.028 INFO  ProgressMeter -             unmapped              0.0                   243          21378.3
    02:10:14.029 INFO  ProgressMeter - Traversal complete. Processed 243 total reads in 0.0 minutes.
    02:10:14.136 INFO  BaseRecalibrator - Calculating quantized quality scores...
    02:10:14.200 INFO  BaseRecalibrator - Writing recalibration report...
    02:10:15.398 INFO  BaseRecalibrator - ...done!
    02:10:15.398 INFO  BaseRecalibrator - Shutting down engine
    [May 15, 2018 2:10:15 AM CEST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.10 minutes.
    Runtime.totalMemory()=2323120128
    Tool returned:
    243
    

    Thank you very much!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @manolis
    Hi,

    I was setting up the pipeline because I changed the interval list (in the BQSR and HC steps: before, few intervals = big contigs; now, many contigs = almost the real targeted regions)

    It looks like the 243 return value comes from invalid intervals. Have you tried using the same interval list throughout your entire pipeline?

    Thanks,
    Sheila

  • manolismanolis Member ✭✭
    edited May 2018

    When I change the interval, the "Tool returned:" number changes... In the BQSR and HC steps I use the same interval list (21,589 intervals). After BQSR I validate the BAM file and everything is OK. Next, the BAM file is used as input to HC and I don't get any error.

    In a normal situation, what should the "Tool returned:" number be in the BQSR step? Am I losing many reads during this BQSR step, or is the recalibration bad?

    Do you have any link to documentation about "Tool returned:" values? If I understand correctly: 0 is OK, 1 is an error, true (in HC) is OK... ???

    I will try running BQSR using whole chromosomes as intervals, and then another list of 404 intervals, to check the "Tool returned:" value. I will let you know.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @manolis
    Hi,

    We don't really have any documentation on what the tool should return, but a return of 0 is good. Anything other than 0 means something went wrong.
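    When scripting, the process exit status (what `$?` reports in the shell) is the reliable success signal, separate from the "Tool returned:" summary line; a minimal sketch, where `run_step` is a hypothetical wrapper:

```shell
# run_step stands in for a real GATK invocation, for example:
#   gatk BaseRecalibrator -I in.bam -R ref.fasta --known-sites dbsnp.vcf -O recal.table
run_step() {
    true   # replace with the real command
}

# Branch on the exit status, not on the "Tool returned:" text.
if run_step; then
    echo "step succeeded (exit status 0)"
else
    echo "step failed (non-zero exit status)" >&2
fi
```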

    Can you run ValidateVariants on your input GVCFs?

    I will try to run BQSR using chr as interval and then an another list of 404 intervals... to check the "Tool returned:" value. I will let you know

    I am not sure I understand what you mean? But, I will wait for you to report back :smile:

    -Sheila

  • manolismanolis Member ✭✭

    The "Tool returned:" value changes for each interval... maybe it is the number of variants/reads?

    About gVCF validation, I will report the errors/warnings in this thread ... just to avoid duplicates.

    Thanks

  • YangyxtYangyxt Member

    Dear @Sheila ,

    I have recently encountered a CombineGVCFs error. I have a combined g.vcf file consisting of all the GVCFs from old samples.

    Every time I process a new batch of samples, I combine their GVCFs into the whole-collection GVCF file, using the command shown below:
    time "$gatk" CombineGVCFs \
        -R "$ref_gen"/ucsc.hg19.fasta \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/9879.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/9879.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18002617LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18002617LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18002618LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18002618LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18002619LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18002619LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18025096LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18025096LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18025097LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18025097LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18025098LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/P18025098LU01.HC.g.vcf.gz \
        -V /paedwy/disk1/yangyxt/wes/backup_gvcfs/all_exome_samples.g.vcf \
        -O /paedwy/disk1/yangyxt/wes/backup_gvcfs/all_exome_samples_plus_${sample_batch}.g.vcf && echo "Combine_gvcfs done"

    And I encountered an error: IllegalArgumentException: Feature inputs must be unique.
    Here is the error log:
    bwa/0.7.17 is loaded
    java/8.0_161 is loaded
    samtools/1.8 is loaded
    Using GATK jar /home/yangyxt/software/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -D
    15:28:57.619 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/yangyxt/software/gatk-4.1.0.0/gatk-package-4.1
    15:28:59.373 INFO  CombineGVCFs - ------------------------------------------------------------
    15:28:59.373 INFO  CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.1.0.0
    15:28:59.373 INFO  CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:28:59.374 INFO  CombineGVCFs - Executing as [email protected] on Linux v3.10.0-957.10.1.el7.x86_64 amd64
    15:28:59.374 INFO  CombineGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_161-b12
    15:28:59.374 INFO  CombineGVCFs - Start Date/Time: August 2, 2019 3:28:57 PM HKT
    15:28:59.374 INFO  CombineGVCFs - ------------------------------------------------------------
    15:28:59.374 INFO  CombineGVCFs - ------------------------------------------------------------
    15:28:59.375 INFO  CombineGVCFs - HTSJDK Version: 2.18.2
    15:28:59.375 INFO  CombineGVCFs - Picard Version: 2.18.25
    15:28:59.375 INFO  CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:28:59.376 INFO  CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:28:59.376 INFO  CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:28:59.376 INFO  CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:28:59.376 INFO  CombineGVCFs - Deflater: IntelDeflater
    15:28:59.376 INFO  CombineGVCFs - Inflater: IntelInflater
    15:28:59.376 INFO  CombineGVCFs - GCS max retries/reopens: 20
    15:28:59.376 INFO  CombineGVCFs - Requester pays: disabled
    15:28:59.376 INFO  CombineGVCFs - Initializing engine
    15:28:59.863 INFO  FeatureManager - Using codec VCFCodec to read file file:///paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/9879.H
    15:29:00.047 INFO  CombineGVCFs - Shutting down engine
    [August 2, 2019 3:29:00 PM HKT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=1881669632
    java.lang.IllegalArgumentException: Feature inputs must be unique: /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/9879.HC.g.vcf.gz
            at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$initializeDrivingVariants$0(MultiVariantWalker.java:67)
            at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
            at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
            at org.broadinstitute.hellbender.engine.MultiVariantWalker.initializeDrivingVariants(MultiVariantWalker.java:63)
            at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:57)
            at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:638)
            at org.broadinstitute.hellbender.engine.MultiVariantWalker.onStartup(MultiVariantWalker.java:46)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
            at org.broadinstitute.hellbender.Main.main(Main.java:291)

    I would much appreciate a hint about where this went wrong.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited August 2

    Hi @Yangyxt

    Is there a reason each input gvcf is specified twice?
    For example, -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/9879.HC.g.vcf.gz appears twice in your command.

    Looks like that might be why you see this error.
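    If the command is generated by a script, de-duplicating the GVCF list before building the -V arguments avoids this; a minimal sketch with placeholder file names:

```shell
# De-duplicate a generated GVCF list before building -V arguments.
# The sample file names below are illustrative placeholders.
printf '%s\n' 9879.HC.g.vcf.gz 9879.HC.g.vcf.gz P18002617LU01.HC.g.vcf.gz > gvcf.list
sort -u gvcf.list > gvcf.unique.list
# Turn each unique path into a "-V <path>" argument for CombineGVCFs.
sed 's/^/-V /' gvcf.unique.list | tr '\n' ' ' > v_args.txt
cat v_args.txt
```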

  • YangyxtYangyxt Member

    @bhanuGandham said:
    Hi @Yangyxt

    Is there a reason each input gvcf is specified twice?
    For example, -V /paedwy/disk1/yangyxt/wes/samples_from_sh_Brian/gvcfs/9879.HC.g.vcf.gz appears twice in your command.

    Looks like that might be why you see this error.

    Ahh, thanks for the heads-up. I hadn't read through the scripts; I used awk to generate them for execution.
