# Error with the HaplotypeCaller when genotyping given alleles

edited March 2013

Hi,

I have a number of indels that were called using different methods (GATK UG and PINDEL mainly). I am now attempting to genotype them using the GATK HaplotypeCaller and am running into the following error for some variants, e.g.:

1 768116 . AGTTTTGTTTTGTTTTGTTTT AGTTTTGTTTTGTTTTGTTTTGTTTT,A

The command line I use is:

java -Xmx2g -jar ~/tools/GenomeAnalysisTK-2.4-7-g5e89f01/GenomeAnalysisTK.jar  \
-T HaplotypeCaller \
--genotyping_mode GENOTYPE_GIVEN_ALLELES \
-R /target/gpfs2/gcc/resources/hg19/indices/human_g1k_v37.fa \
-minPruning 5 \
-I /target/gpfs2/gcc/groups/gonl/projects/trio-analysis/intermediate/BQSR2/A102.human_g1k_v37.trio_realigned.recal.bam \
-L 1:1-10000000 \
-L ~/jobs/trio-analysis/ug/comp/hc/test.mu.vcf \
-o ~/jobs/trio-analysis/ug/comp/hc/test.out.vcf \
-isr INTERSECTION \
-alleles ~/results/trio-analysis/ug/gonl.merged.ug_pindel_asm_clever.left.biallelic.20bp.N2.vcf


Any help on how to overcome this would be much appreciated!

Below is the stack trace:

##### ERROR stack trace

java.lang.IllegalStateException: BUG: GenomeLoc 1:768116-768116 has a size == 1 but the variation reference allele has length 21 this = [VC UG_call @ 1:768116 Q136.86 of type=INDEL alleles=[AGTTTTGTTTTGTTTTGTTTT*, A, AGTTTTGTTTTGTTTTGTTTTGTTTT] attr={} GT=[[A102a AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 6 PL 116,6,0,137,21,881],[A102b AGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTT GQ 0 PL 0,0,0,3,3,165],[A102c AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 3 PL 45,3,0,60,15,885]]

##### ERROR ------------------------------------------------------------------------------------------

Edit: After excluding all multi-allelic sites, I no longer seem to get this type of error. However, I am now getting another error on a bi-allelic site. I haven't checked which site it is, as the error doesn't specify it, but I could look into it if you think it would be valuable.
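For anyone who wants to try the same workaround, here is a minimal sketch of how multi-allelic records could be dropped from a VCF before passing it to `-alleles`. It is not the script used in this thread; the function name `drop_multiallelic` is made up for illustration, and it assumes a plain tab-delimited VCF where multiple ALT alleles are comma-separated in column 5.

```python
def drop_multiallelic(lines):
    """Yield VCF lines, skipping records that list more than one ALT allele.

    Hypothetical helper: assumes tab-delimited records with ALT in column 5.
    """
    for line in lines:
        # Header and metadata lines are passed through unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Multiple alternate alleles are comma-separated in the ALT column.
        if len(fields) > 4 and "," in fields[4]:
            continue
        yield line


if __name__ == "__main__":
    records = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        # The multi-allelic site quoted earlier in this post.
        "1\t768116\t.\tAGTTTTGTTTTGTTTTGTTTT\tAGTTTTGTTTTGTTTTGTTTTGTTTT,A\n",
        # A made-up bi-allelic record, for contrast.
        "1\t800000\t.\tAT\tA\n",
    ]
    for kept in drop_multiallelic(records):
        print(kept, end="")
```

Note that this discards the multi-allelic sites entirely rather than splitting them into bi-allelic records, so those sites will not be genotyped.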

Cheers,
Laurent

Hi Laurent,

Could you upload a bam snippet so we can reproduce the error locally? Instructions here:

Geraldine Van der Auwera, PhD

Hi Geraldine,

2013_03_08.BUG_GenomeLoc_HC.tar.gz

They correspond to the 2 errors posted in this thread. Note that I managed to reproduce the first one on a bi-allelic site after all, so I thought I'd post it too.

The first file uses 1 bam file containing 1 trio; the second uses 1 bam file containing 248 families (trios, quartets), as I have not been able to isolate the problematic one so far, but I could attempt to if necessary.

Another thing I should have mentioned earlier is the pipeline these bam files were generated with as it is fairly unusual:

1. BWA aln [per lane]
2. BWA sampe [per lane]
3. Picard MarkDuplicates [per lane]
4. GATK Indel realignment using known indels from 1KG pilot [per sample]
5. GATK BQSR v1 [per sample]
6. GATK Indel realignment using known indels from 1KG Phase1 + from the reads [per family]
7. GATK BQSR v2 [per family]

Maybe the pipeline above is to blame for the errors I am encountering.

Thanks a lot!
Laurent

Hi Laurent, thanks for the files. Others have encountered the "Bad likelihoods" issue so it's probably not related to your pipeline. Not sure about the other one -- we'll take a look. FYI we're focusing on the "bad likelihoods" bug first since it seems to affect more people. I'll let you know when we have a fix.

Geraldine Van der Auwera, PhD

Sounds great, thanks a lot!

Hi Laurent,

FYI we have a theoretical fix for the "Bad likelihoods detected" error and are now working on implementing it.

Geraldine Van der Auwera, PhD

Great! Thanks for the update

Regarding the BUG: GenomeLoc issue, I found the problem: some positions in my VCF had an "END" annotation with the wrong coordinates. I am not sure where in the pipeline this happened, but simply removing the annotation got rid of the problem. Maybe this will be useful for others too...
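In case it helps, here is a rough sketch of how such records could be found and cleaned up. It is not the exact fix used here (the annotation was simply removed); it assumes that for a non-symbolic allele END should equal POS + len(REF) - 1, and it drops END only when it disagrees with that. The function name `strip_inconsistent_end` and the `DP=10` value are made up for illustration.

```python
def strip_inconsistent_end(line):
    """Return a VCF record with END dropped from INFO when it disagrees
    with POS + len(REF) - 1.

    Hypothetical helper: header lines, short lines, and records with
    symbolic ALT alleles (e.g. <DEL>), where END legitimately differs,
    are returned unchanged.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8 or fields[4].startswith("<"):
        return line
    # Expected last reference base covered by the REF allele.
    expected = int(fields[1]) + len(fields[3]) - 1
    kept = [
        entry
        for entry in fields[7].split(";")
        if not (entry.startswith("END=") and entry[4:] != str(expected))
    ]
    # An empty INFO column is written as "." in VCF.
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields) + newline


if __name__ == "__main__":
    # Same site as in the stack trace: the REF allele is 21 bases long, but
    # END=768116 would give a location of size 1.
    bad = "1\t768116\t.\tAGTTTTGTTTTGTTTTGTTTT\tA\t136.86\t.\tDP=10;END=768116\n"
    print(strip_inconsistent_end(bad), end="")
```

For the site in the stack trace, the 21-base reference allele starting at 768116 should end at 768136, which is consistent with the "size == 1 but the variation reference allele has length 21" message if END was set to 768116.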

Cheers,
Laurent

Can you elaborate a bit more? You are saying that your end position number was incorrect?
You had to manually change the position?

I am running into a similar error when trying to merge (supposedly) GATK-compatible VCF files. Thank you.