
GenotypeGVCFs error: does not inflate expected amount

jenmod · durham · Member
edited June 2016 in Ask the GATK team

Hi,

I am running into a very confusing problem. I am running GenotypeGVCFs on ~30 gzipped, .tbi-indexed VCF files. The program starts running OK, but stops shortly into the run with the error message shown in the log below.

A few other notes:

I used Picard to create a sequence dictionary, so I am not sure why it is complaining about the sequence dictionary.

I was originally using VCF files that were gzipped and indexed by HaplotypeCaller. That gave me an error message about an invalid gzip header. I then indexed the VCF files using tabix, and this is my new error. Why is it mentioning a BAM/CRAM error?

Please help!

Thanks!


INFO 14:28:48,994 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:28:48,998 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 14:28:48,998 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:28:48,998 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:28:49,003 HelpFormatter - Program Args: -T GenotypeGVCFs -R /dscrhome/jlm50/genome/Mguttatus_256_v2.0.hardmasked.fa --variant /dscrhome/jlm50/fss_2016/vcf_list.list --out ALL.Mim.GATK.variants.RAW.1.vcf --heterozygosity 0.025 -L scaffold_1 --standard_min_confidence_threshold_for_emitting 10 --includeNonVariantSites
INFO 14:28:49,024 HelpFormatter - Executing as jlm50@dscr-econ-20 on Linux 2.6.32-573.18.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_95-mockbuild_2016_01_18_22_31-b00.
INFO 14:28:49,025 HelpFormatter - Date/Time: 2016/06/07 14:28:48
INFO 14:28:49,025 HelpFormatter - --------------------------------------------------------------------------------
////
WARN 14:28:52,165 IndexDictionaryUtils - Track variant30 doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 14:28:52,307 GenomeAnalysisEngine - Preparing for traversal
INFO 14:28:52,308 GenomeAnalysisEngine - Done preparing for traversal
INFO 14:28:52,309 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:28:52,309 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 14:28:52,310 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
WARN 14:28:52,758 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 14:28:52,759 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO 14:28:52,760 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
WARN 14:28:52,963 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs
WARN 14:29:12,807 ExactAFCalculator - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at scaffold_1:23977 has 11 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument. This warning message is output just once per run and further warnings will be suppressed unless the DEBUG logging level is used.
INFO 14:29:22,356 ProgressMeter - scaffold_1:38401 0.0 30.0 s 49.6 w 0.3% 3.0 h 3.0 h
////
INFO 14:37:48,625 ProgressMeter - scaffold_1:98501 0.0 8.9 m 886.8 w 0.7% 20.6 h 20.5 h
INFO 14:38:48,697 ProgressMeter - scaffold_1:130301 0.0 9.9 m 986.1 w 1.0% 17.3 h 17.2 h
INFO 14:38:58,990 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A BAM/CRAM ERROR has occurred (version 3.5-0-g36282e4):
ERROR
ERROR This means that there is something wrong with the BAM/CRAM file(s) you provided.
ERROR The error message below tells you what is the problem.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum until you have followed these instructions:
ERROR - Make sure that your BAM file is well-formed by running Picard's validator on it
ERROR (see http://picard.sourceforge.net/command-line-overview.shtml#ValidateSamFile for details)
ERROR - Ensure that your BAM index is not corrupted: delete the current one and regenerate it with 'samtools index'
ERROR - Ensure that your CRAM index is not corrupted: delete the current one and regenerate it with
ERROR 'java -jar cramtools-3.0.jar index --bam-style-index --input-file --reference-fasta-file '
ERROR (see https://github.com/enasequence/cramtools/tree/v3.0 for details)
ERROR
ERROR MESSAGE: Did not inflate expected amount
ERROR ------------------------------------------------------------------------------------------


Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @jenmod
    Hi,

    Can you tell us the exact command you ran to produce the input GVCFs? Also, can you try running ValidateVariants on your input GVCFs?
    I'm a little confused why you are getting a CRAM file error, because you should not be inputting any CRAM files into GenotypeGVCFs.

    Thanks,
    Sheila
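Since the GenotypeGVCFs run takes its inputs from a .list file, the ValidateVariants suggestion above could be applied to each input in turn. The following is a minimal sketch, assuming GATK 3 command-line syntax (`-T ValidateVariants -R ... -V ...`) and a list file with one VCF path per line; the function name `build_validate_commands` and the default jar name are hypothetical, and extra options may be needed for GVCF input depending on the GATK version.

```python
from pathlib import Path


def build_validate_commands(list_path, reference, gatk_jar="GenomeAnalysisTK.jar"):
    """Return one GATK 3 ValidateVariants argument list per path in a .list file.

    Assumes the list file holds one VCF path per line; blank lines are skipped.
    """
    commands = []
    for line in Path(list_path).read_text().splitlines():
        vcf = line.strip()
        if not vcf:
            continue
        commands.append([
            "java", "-jar", gatk_jar,
            "-T", "ValidateVariants",
            "-R", reference,
            "-V", vcf,
        ])
    return commands
```

Each returned list can be printed with `" ".join(cmd)` for inspection or passed to `subprocess.run(cmd)`, so that a failure points at one specific input file rather than at the combined run.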

  • jenmod · durham · Member

    Hi Sheila,

    I used the command below for diploids and changed the ploidy to 4 for some tetraploid samples. The GenotypeGVCFs command was run on all samples (diploids and tetraploids).


    java -Xmx24g -jar ~/GenomeAnalysisTK.jar -T HaplotypeCaller -R ~/genome/Mguttatus_256_v2.0.hardmasked.fasta -I $gnome.sort.FM.MD.RG.indel.bam -o GATK_HC.SNPs.$gnome.raw.mmq10.1.vcf.gz -L scaffold_1 -gt_mode DISCOVERY --heterozygosity 0.025 -writeFullFormat -ploidy 2 -out_mode EMIT_ALL_SITES --emitRefConfidence BP_RESOLUTION -mmq 10 -stand_emit_conf 10

    Thanks!
    Jen

  • jenmod · durham · Member

    Also, I could try the ValidateVariants tool, but I would have to regenerate the VCF files. Since posting this question I saw that I was using the now-outdated GATK version 3.5, so I am in the process of re-running the HaplotypeCaller step.

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @jenmod
    Hi,

    A couple of things:

    1) You are running on OpenJDK. We only support OracleJDK.
    2) -out_mode EMIT_ALL_SITES and --emitRefConfidence BP_RESOLUTION are not compatible. You should only use --emitRefConfidence BP_RESOLUTION as -out_mode EMIT_ALL_SITES does not work properly.
    3) If I recall correctly, there were some issues with outputting a .gz file. Can you try outputting a .g.vcf file instead?

    Thanks,
    Sheila
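The "Did not inflate expected amount" message appears to come from decompressing a compressed input, which would be consistent with point 3. One way to narrow it down is to read every gzipped input from start to finish and see which files fail. This is a minimal sketch using Python's standard `gzip` module, not GATK's own reader, so a file that passes here could in principle still fail in GATK; the function name `find_unreadable_gzip_files` is hypothetical.

```python
import gzip
import zlib


def find_unreadable_gzip_files(paths, chunk_size=1 << 20):
    """Return {path: error message} for files that fail to decompress end to end."""
    failures = {}
    for path in paths:
        try:
            with gzip.open(path, "rb") as handle:
                # Read to the end; a truncated or corrupt stream raises here.
                while handle.read(chunk_size):
                    pass
        except (OSError, EOFError, zlib.error) as exc:
            failures[path] = f"{type(exc).__name__}: {exc}"
    return failures
```

Passing the paths from the .list file to this function gives a quick per-file report; any file listed in the result is a candidate for regeneration.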

  • jenmod · durham · Member

    Hi Sheila,

    So I re-ran HaplotypeCaller before you mentioned the out_mode/emitRefConfidence conflict, and I also forgot to switch away from .gz output...

    But the good news is that upgrading to 3.6 seems to have solved the problem entirely...! :smile:

    These were the arguments logged for the GenotypeGVCFs run:


    INFO 16:05:00,826 HelpFormatter - [Mon Jun 13 16:05:00 EDT 2016] Executing on Linux 2.6.32-573.22.1.el6.x86_64 amd64
    INFO 16:05:00,828 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 JdkDeflater
    INFO 16:05:00,832 HelpFormatter - Program Args: -T GenotypeGVCFs -R /dscrhome/jlm50/genome/Mguttatus_256_v2.0.hardmasked.fa --variant /dscrhome/jlm50/fss_2016/vcf_list.list --out ALL.Mim.GATK.variants.RAW.1.vcf --heterozygosity 0.025 -L scaffold_1 --standard_min_confidence_threshold_for_emitting 10 --includeNonVariantSites
    INFO 16:05:00,874 HelpFormatter - Executing as jlm50@dscr-core-23 on Linux 2.6.32-573.22.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
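The two log headers in this thread differ in the JVM as well as the GATK version (OpenJDK 1.7 in the failing run, Java HotSpot 1.8 in the working one). To check which JVM a given run used, the "Executing as" header line can be parsed. This is a rough sketch that assumes the header layout seen in the two logs in this thread; the regular expression and the function name `parse_jvm` are illustrative, not part of GATK.

```python
import re

# Layout assumed from the log headers quoted in this thread:
# "Executing as <user> on <os details>; <JVM name> <JVM version>."
_EXECUTING_AS = re.compile(
    r"Executing as (?P<user>\S+) on .*;\s+(?P<jvm>.+)\s+(?P<version>\d\S*?)\.?\s*$"
)


def parse_jvm(log_line):
    """Extract user, JVM name, and JVM version from an 'Executing as' log line."""
    match = _EXECUTING_AS.search(log_line)
    if match is None:
        return None
    jvm = match.group("jvm")
    return {
        "user": match.group("user"),
        "jvm": jvm,
        "version": match.group("version"),
        "is_openjdk": "OpenJDK" in jvm,
    }
```

Running this over the header of each log makes it easy to spot runs that were launched under OpenJDK by accident, for example on a different cluster node.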

  • Sheila · Broad Institute · Member, Broadie, Moderator

    Wonderful news!
