CombineGVCFs java.lang.IllegalArgumentException: Unexpected base in allele bases '*AACC'

Hi,

Hoping to work around the limitations of GenomicsDBImport, I've used CombineGVCFs to combine my data into batches of 200, with the aim of then combining those batches again into a master GVCF for genotyping. Unfortunately, I have run into an exception when attempting to combine my 200-sample batch GVCFs prior to genotyping.

Using GATK jar /lustre/scratch115/realdata/mdt2/projects/gdap-wgs/gvcf-4.0/scripts/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar
Running:
    /software/jre1.8.0_74/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Djava.io.tmpdir=/lustre/scratch115/projects/gdap-wgs/gvcf-4.0/tmp -XX:-UsePerfData -Xrs -Xmx3200m -jar /lustre/scratch115/realdata/mdt2/projects/gdap-wgs/gvcf-4.0/scripts/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar CombineGVCFs -R /lustre/scratch115/resources/ref/Homo_sapiens/HS38DH/hs38DH.fa -V /tmp/tmp.GXoF3ghQt3.list -O output/1.g.vcf.gz -L /lustre/scratch115/resources/ref/Homo_sapiens/HS38DH/intervals/arvados/wgs_calling_regions.hg38.interval_list.1_of_200.interval_list
12:08:21.902 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/scratch115/realdata/mdt2/projects/gdap-wgs/gvcf-4.0/scripts/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:08:22.587 INFO  CombineGVCFs - ------------------------------------------------------------
12:08:22.587 INFO  CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.0.1.2
12:08:22.587 INFO  CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
12:08:22.588 INFO  CombineGVCFs - Executing as [email protected] on Linux v3.2.0-105-generic amd64
12:08:22.588 INFO  CombineGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_74-b02
12:08:22.588 INFO  CombineGVCFs - Start Date/Time: 23 February 2018 12:08:21 GMT
12:08:22.589 INFO  CombineGVCFs - ------------------------------------------------------------
12:08:22.589 INFO  CombineGVCFs - ------------------------------------------------------------
12:08:22.589 INFO  CombineGVCFs - HTSJDK Version: 2.14.1
12:08:22.590 INFO  CombineGVCFs - Picard Version: 2.17.2
12:08:22.590 INFO  CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
12:08:22.590 INFO  CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:08:22.590 INFO  CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:08:22.590 INFO  CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:08:22.590 INFO  CombineGVCFs - Deflater: IntelDeflater
12:08:22.591 INFO  CombineGVCFs - Inflater: IntelInflater
12:08:22.597 INFO  CombineGVCFs - GCS max retries/reopens: 20
12:08:22.597 INFO  CombineGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
12:08:22.597 INFO  CombineGVCFs - Initializing engine
12:08:25.101 INFO  FeatureManager - Using codec VCFCodec to read file file:///lustre/scratch115/projects/gdap-wgs/gvcf-4.0/gvcf-pcr_combined/1_1.g.vcf.gz
12:08:25.520 INFO  FeatureManager - Using codec VCFCodec to read file file:///lustre/scratch115/projects/gdap-wgs/gvcf-4.0/gvcf-pcrfree_combined/1_1.g.vcf.gz
12:08:26.136 INFO  FeatureManager - Using codec VCFCodec to read file file:///lustre/scratch115/projects/gdap-wgs/gvcf-4.0/gvcf-pcrfree_combined/1_2.g.vcf.gz
12:08:26.463 INFO  FeatureManager - Using codec VCFCodec to read file file:///lustre/scratch115/projects/gdap-wgs/gvcf-4.0/gvcf-pcrfree_combined/1_3.g.vcf.gz
12:09:15.365 INFO  IntervalArgumentCollection - Processing 14112327 bp from intervals
12:09:15.534 INFO  CombineGVCFs - Done initializing engine
12:09:17.050 INFO  ProgressMeter - Starting traversal
12:09:17.051 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
12:09:22.280 INFO  CombineGVCFs - Shutting down engine
[23 February 2018 12:09:22 GMT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 1.01 minutes.
Runtime.totalMemory()=2233991168
java.lang.IllegalArgumentException: Unexpected base in allele bases '*AACC'
    at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:165)
    at htsjdk.variant.variantcontext.Allele.create(Allele.java:239)
    at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.extendAllele(ReferenceConfidenceVariantContextMerger.java:406)
    at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.remapAlleles(ReferenceConfidenceVariantContextMerger.java:178)
    at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:70)
    at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.endPreviousStates(CombineGVCFs.java:340)
    at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.createIntermediateVariants(CombineGVCFs.java:189)
    at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:134)
    at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:73)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
    at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:118)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:277)

GitHub issue #4525 by Sheila (state: closed; closed by cmnbroad)

Answers

  • TechnicalVault (Cambridge, UK; Member)

    Any suggestions?

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @TechnicalVault
    Hi,

    Sorry for the delay. Just to confirm: do the input GVCFs all validate with ValidateVariants? Also, this thread and this thread may have some helpful tips, although they are older. I may need you to submit a bug report if nothing helps.

    Thanks,
    Sheila

  • cwisch (Member)
    edited March 2

    I'm having the same issue with CombineGVCFs. Both of my test files pass a run of ValidateVariants, although one does produce this warning:

     WARN  ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location PA1.0_Chr1:5584 the annotation MLEAC=[2, 0] was not a numerical value and was ignored
    

    Also, if I run CombineGVCFs on an individual file it does not crash immediately, although I have to stop it as the files are quite large.

    I'm using The Genome Analysis Toolkit (GATK) v4.0.1.2

  • TechnicalVault (Cambridge, UK; Member)

    They do indeed validate but that's unsurprising since they all came from GATK 4. Is there an option I can feed GATK to get it to dump the lines it's choking on?
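Independently of any GATK option, one way to narrow down candidate records is to scan the input GVCFs for `*` (spanning deletion) ALT alleles, since the exception message shows a `*` fused with bases. The following is a minimal sketch, assuming tab-delimited VCF text; the function names `open_vcf`, `spanning_deletion_records`, and `report` are hypothetical helpers and not part of GATK.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple


def open_vcf(path: str) -> TextIO:
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def spanning_deletion_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for records whose ALT list contains '*'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, _id, ref, alt = fields[:5]
        if "*" in alt.split(","):
            yield chrom, int(pos), ref, alt


def report(paths: Iterable[str]) -> None:
    """Print one tab-separated line per matching record in each file."""
    for path in paths:
        with open_vcf(path) as handle:
            for chrom, pos, ref, alt in spanning_deletion_records(handle):
                print(path, chrom, pos, ref, alt, sep="\t")
```

Calling `report([...])` with the batch GVCF paths lists the sites worth comparing against the position where the tool stops. It only finds candidates; it does not reproduce the merge itself.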

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @cwisch @TechnicalVault
    Hi,

    @cwisch I assume you created the GVCF that errors with HaplotypeCaller and did no processing on it before feeding it into CombineGVCFs? @TechnicalVault I figured that would be the case, but I had to make sure :smile:

    It would be great if one or both of you can submit a bug report. Instructions are here.

    Thanks,
    Sheila

  • cwisch (Member)

    Hi @Sheila ,

    Correct, I haven't modified the outputs of HaplotypeCaller. I was able to circumvent the problem by using GenomicsDBImport, which appears to be the way I was supposed to run it in the first place.

    I'll try to give the most complete bug report I can. I'm not able to share the FASTA.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @cwisch
    Hi,

    Glad to hear GenomicsDBImport works for you.

    For CombineGVCFs, I do need the reference to properly test this.

    I hope @TechnicalVault can submit a bug report so we can get to the bottom of this.

    -Sheila

  • TechnicalVault (Cambridge, UK; Member)
    edited March 6

    Hi @Sheila,
    I've just been trying to generate a minimal test case, and found that different combinations of my 4 input files fail on different alleles with the same error (all indels, with a * somehow ending up concatenated into the sequence). I wonder whether GATK 4's standard test suite includes any examples that combine two sub-batch gVCFs. If it's the fundamental issue I think it is (CombineGVCFs not being designed to combine gVCFs with more than one sample), I might see if I can create a test case using Illumina's Polaris sequencing of 1000G samples.
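The stack trace in the original post passes through `extendAllele` before `Allele.create` rejects `*AACC`. One plausible reading is that, when records with different reference lengths are merged, shorter alleles are padded with the extra reference bases, and a `*` allele gets padded like an ordinary one. The sketch below illustrates that failure mode only. It is not GATK's code; the function names and the example bases are assumptions.

```python
VALID_BASES = set("ACGTN")


def is_plain_bases(allele: str) -> bool:
    """True if the allele consists only of A, C, G, T or N (case-insensitive)."""
    return bool(allele) and set(allele.upper()) <= VALID_BASES


def extend_allele_naive(allele: str, extra_ref_bases: str) -> str:
    """Pad every allele with the extra reference bases, with no special cases."""
    return allele + extra_ref_bases


def extend_allele_guarded(allele: str, extra_ref_bases: str) -> str:
    """Pad ordinary alleles; leave '*' and symbolic alleles such as <NON_REF> untouched."""
    if allele == "*" or allele.startswith("<"):
        return allele
    return allele + extra_ref_bases


# Hypothetical merge: one record has REF "G", another has REF "GAACC",
# so alleles from the first record need "AACC" appended.
extension = "AACC"
broken = extend_allele_naive("*", extension)    # "*AACC", not a valid base string
fixed = extend_allele_guarded("*", extension)   # "*" stays as it is
snp = extend_allele_guarded("T", extension)     # "TAACC"
```

Here `is_plain_bases(broken)` is false, which mirrors the "Unexpected base in allele bases" check, while the guarded version passes `*` through unchanged.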

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @TechnicalVault
    Hi,

    I am not sure about the test suite (I am assuming they should test the combining of two sub-batch GVCFs), but if you could submit the 4 input files you are using, I think that would be great.

    Thanks,
    Sheila

  • TechnicalVault (Cambridge, UK; Member)

    Testcase uploaded as: sanger_exception_bugreport.tar.bz2

    GitHub issue #2990 by Sheila (state: closed; closed by chandrans)

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @TechnicalVault
    Hi again,

    This should be fixed soon (within a week or so). Your other bug may take a bit longer but the team is aware of it, and it should get fixed as a side effect of some new code :smiley:

    -Sheila

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    Accepted Answer

    Hi everyone,

    There were a few others interested in this, but this is now fixed :smile:

    -Sheila

  • alcam1 (Member)
    edited April 30

    Hi @Sheila, what was the fix for this? I am getting the message "WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr1:762109 the annotation MLEAC=[1, 0] was not a numerical value and was ignored" when running CombineGVCFs with GATK-4.0.2.1 (variants called using HaplotypeCaller).

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @alcam1
    Hi,

    This thread was about a different error message, one that was thrown when combining spanning deletions. Your WARN statement looks like a different issue.

    In your case does the tool run to completion, or does it throw an error that causes it to stop? Can you post position chr1:762109? Also, can you try with the very latest version?

    Thanks,
    Sheila

  • wkyan (Member)
    edited September 17

    @Sheila
    When I use GenotypeGVCFs on a single-sample GVCF:

    gatk GenotypeGVCFs \
    -R genome.fa \
    -V R97.sorted.markdup.realign.BQSR.g.vcf \
    -O R97.sorted.markdup.realign.BQSR.vcf

    I get the warning "10:04:24.236 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr00:60 the annotation MLEAC=[1, 0] was not a numerical value and was ignored".

    In my case the tool runs to completion; it's just a WARN that doesn't cause it to stop.

    chr00 60 . C T,<NON_REF> 157.77 . BaseQRankSum=1.675;DP=13;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.823;RAW_MQ=-1.823;RAW_MQ=25654.00;ReadPosRankSum=1.468 GT:AD:DP:GQ:PGT:PID:PL:SB 0/1:8,5,0:13:99:0|1:60_C_T:186,0,593,210,608,818:5,3,3,2

    GATK version is gatk-4.0.8.1
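To see what the warning is referring to, it can help to split the INFO column of the record above and check which annotations carry more than one value. The sketch below only illustrates why `MLEAC=1,0` cannot be read as a single number; it makes no claim about how GATK parses or merges annotations. The helper names are hypothetical, and the INFO string is an excerpt of the posted record.

```python
from typing import Dict, List


def parse_info(info: str) -> Dict[str, List[str]]:
    """Split a VCF INFO string into {key: list of comma-separated values}."""
    parsed: Dict[str, List[str]] = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value.split(",") if sep else []  # flag keys carry no value
    return parsed


def is_single_number(values: List[str]) -> bool:
    """True only if there is exactly one value and it parses as a number."""
    if len(values) != 1:
        return False
    try:
        float(values[0])
    except ValueError:
        return False
    return True


# Excerpt of the INFO column from the record posted above.
info = "BaseQRankSum=1.675;DP=13;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00"
fields = parse_info(info)
not_scalar = sorted(key for key, values in fields.items() if not is_single_number(values))
print(not_scalar)
```

For this excerpt the non-scalar keys are `MLEAC` and `MLEAF`, each holding two comma-separated values.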
