The current GATK version is 3.6-0

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

# Error with the HaplotypeCaller when genotyping given alleles

Member, Collaborator Posts: 43 ✭✭
edited March 2013

Hi,

I have a number of indels that were called using different methods (GATK UG and PINDEL mainly). I am now attempting to genotype them using the GATK HaplotypeCaller and am running into the following error for some variants, e.g.:

1 768116 . AGTTTTGTTTTGTTTTGTTTT AGTTTTGTTTTGTTTTGTTTTGTTTT,A

The command line I use is:

java -Xmx2g -jar ~/tools/GenomeAnalysisTK-2.4-7-g5e89f01/GenomeAnalysisTK.jar  \
-T HaplotypeCaller \
--genotyping_mode GENOTYPE_GIVEN_ALLELES \
-R /target/gpfs2/gcc/resources/hg19/indices/human_g1k_v37.fa \
-minPruning 5 \
-I /target/gpfs2/gcc/groups/gonl/projects/trio-analysis/intermediate/BQSR2/A102.human_g1k_v37.trio_realigned.recal.bam \
-L 1:1-10000000 \
-L ~/jobs/trio-analysis/ug/comp/hc/test.mu.vcf \
-o ~/jobs/trio-analysis/ug/comp/hc/test.out.vcf \
-isr INTERSECTION \
-alleles ~/results/trio-analysis/ug/gonl.merged.ug_pindel_asm_clever.left.biallelic.20bp.N2.vcf


Any help on how to overcome this would be much appreciated!

Below is the stack trace:

##### ERROR stack trace

java.lang.IllegalStateException: BUG: GenomeLoc 1:768116-768116 has a size == 1 but the variation reference allele has length 21 this = [VC UG_call @ 1:768116 Q136.86 of type=INDEL alleles=[AGTTTTGTTTTGTTTTGTTTT*, A, AGTTTTGTTTTGTTTTGTTTTGTTTT] attr={} GT=[[A102a AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 6 PL 116,6,0,137,21,881],[A102b AGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTT GQ 0 PL 0,0,0,3,3,165],[A102c AGTTTTGTTTTGTTTTGTTTTGTTTT/AGTTTTGTTTTGTTTTGTTTTGTTTT GQ 3 PL 45,3,0,60,15,885]]

##### ERROR ------------------------------------------------------------------------------------------

Edit: Excluding all multi-allelic sites, it seems like I am not getting this type of errors any more. However I am now getting another error on a bi-allelic site. I haven't checked which site it is as the error doesn't specify it but could look into it if you think it is valuable.

Cheers,
Laurent

##### ERROR ------------------------------------------------------------------------------------------
Post edited by Laurent on
Tagged:

Hi Laurent,

Could you upload a bam snippet so we can reproduce the error locally? Instructions here:

Geraldine Van der Auwera, PhD

• Member, Collaborator Posts: 43 ✭✭

Hi Geraldine,

2013_03_08.BUG_GenomeLoc_HC.tar.gz

They correspond to the 2 errors posted in this thread. Note that I managed to reproduce the first one on a bi-allelic site after all, so I thought I'd post it too.

The first file uses 1 bam file containing 1 trio, the second one 1 bam file containing 248 families (trios,quartets) as I could not isolate the problematic one so far but could attempt to if necessary.

Another thing I should have mentioned earlier is the pipeline these bam files were generated with as it is fairly unusual:

1. BWA aln [per lane]
2. BWA sampe [per lane]
3. Picard MarkDuplicates [per lane]
4. GATK Indel realignment using known indels using 1KG pilot [per sample]
5. GATK BQSR v1 [per sample]
6. GATK Indel realignment using known indels from 1KG Phase1 + from the reads [per family]
7. GATK BQSR v2 [per family]

Maybe the pipeline above is to blame for the errors I am encountering.

Thanks a lot!
Laurent

Hi Laurent, thanks for the files. Others have encountered the "Bad likelihoods" issue so it's probably not related to your pipeline. Not sure about the other one -- we'll take a look. FYI we're focusing on the "bad likelihoods" bug first since it seems to affect more people. I'll let you know when we have a fix.

Geraldine Van der Auwera, PhD

• Member, Collaborator Posts: 43 ✭✭

Sounds great, thanks a lot!

Hi Laurent,

FYI we have a theoretical fix for the "Bad likelihoods detected error" and are now working on implementing it.

Geraldine Van der Auwera, PhD

• Member, Collaborator Posts: 43 ✭✭

Great! Thanks for the update

• Member, Collaborator Posts: 43 ✭✭

Regarding the BUG: GenomeLoc issue, I found out the problem: I had the "END" annotation with the wrong coordinates in my VCF for some positions. I am not sure where it happened in the pipeline but simply removing the annotation got rid of the problem. Maybe this will also be useful for others...

Cheers,
Laurent

• Member Posts: 51

Can you elaborate a bit more? You are saying that your end position number was incorrect?
You had to manually change the position?

I am running into a similar error when trying to merge GATK compatible (supposidly) vcf files. Thank you.