Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

HaplotypeCaller Incorrectly making Heterozygous Calls (Again)

aeonsimaeonsim Posts: 44Member
edited March 2013 in Ask the team

Hi

For GATK: GenomeAnalysisTK-2.4-7-g5e89f01

It would appear that the issue with the HaplotypeCaller making incorrect Het calls when it should be Hom has turned up again (if it ever actually went away). Note this appears to be the same issue I reported last time: http://gatkforums.broadinstitute.org/discussion/1805/haplotype-caller-incorrectly-calling-blocks-of-variants-heterozygous but considering the time since that report, and that this is from a different sample set and different versions of GATK I thought it best to create a new post. If your prefer to merge them please do so.

So in this occasion we've been looking at a single animal (~16x) using the HaplotypeCaller & UnifiedGenotyper and once again we are finding that the HaplotypeCaller is making Heterozygous calls where there is no support for them in the BAM.

Example Regions, BosTau6 reference: chr18:55,432,023-55,432,220 chr18:55,350,724-55,351,079

Attached you will find images showing this: For the first position & image chr18:55,432,023-55,432,220 the tracks in IGV are: HaplotypeCaller VCF, Population UnifiedGenotyper VCF, Sample BAM, Sample ReducedReads BAM. If we look at this call we have 9x depth, all reads with mapQ 60, Cigar 101M and BaseQ between 21 & 33, a good balance between forward and reverse and ALL 9 reads contain the Alternate allele. Which means the Site should have been called Alt/Alt not Alt/Ref, but for some reason even through there are no reference reads the HaplotypeCaller has called the site Ref/Alt. Note the UnifiedGenotyper correctly calls this site Alt/Alt.

For the second image, position chr18:55,350,724-55,351,079 the tracks in IGV are: Haplotype Caller VCF, UnifiedGenotyper VCF, UG Population VCF, Sample BAM (PCRdedup, IR, BQSR), Sample ReduceReads Bam In this example we have 4 Variants in a cluster (chr18:55,350,895-55,350,975) that the HaplotypeCaller has called as Het (Ref/Alt) when there is no support for this call in the Reads, secondly the UnifiedGenotyper has successfully called each site as Homozygous (Alt/Alt). The reads are a bit more complex at this site but in each case there are 11 reads all of which are Alt alleles and no Reference allele.

Note: HaplotypeCaller & UnifiedGenotyper were run on the full bams, not the ReducedRead bams.

I will upload the region of the BAM file and the VCF files as: Chr18-HC-Het-issues-ULG.tar.gz

HET-Mistake2-chr18-55432101-55432140.png
1648 x 1021 - 61K
Het-mistakes1-chr18-55350744-55351099.png
1652 x 1021 - 75K
Post edited by aeonsim on

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Hi there,

    OK, we'll take a closer look at this. Note that we were never able to replicate any of your issues in the past, which is probably why they were never fixed.

    Thanks for uploading the data files; we'll also need the exact command line that your are using with the latest stable version, and the BosTau6 reference files too. Looks like they were deleted off of our FTP from the last time you uploaded them.

    Geraldine Van der Auwera, PhD

  • aeonsimaeonsim Posts: 44Member

    Ok, is your FTP currently working I'm getting errors about service not available when attempting to upload the files.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    It should be working, yes. Are you sure you're using the correct credentials? They are different for uploading vs. downloading.

    Otherwise there is currently an issue that prevents direct connection through shell protocols, so if you're trying that you may need to switch to a GUI client.

    Geraldine Van der Auwera, PhD

  • aeonsimaeonsim Posts: 44Member

    Well I've tested it on the extracted BAM I'm trying to upload with the command:

     java -jar ../../tools/GenomeAnalysisTK-2.4-7-g5e89f01/GenomeAnalysisTK.jar -R ../../refs/bosTau6.fasta -T HaplotypeCaller -I Chr18-53m-55m.bam -L chr18:53970861-55552981 -o test.mara.vcf
    

    And gotten exactly the same result as in the Screenshots and previous VCFs so hopefully you'll be able to replicate it this time.

  • aeonsimaeonsim Posts: 44Member

    Ok that's an annoying bug I was using cli ftp, filezilla works. Upload of the BAM and VCFs is completed, reference will be up in the next 30mins or so.

  • aeonsimaeonsim Posts: 44Member

    Reference: bosTau6.fasta.bz2 uploaded, hopefully your able to replicate the issue this time. Thanks

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Good news: we've been able to track down your issues.

    The first region is due to a bug that is fixed in our internal development version. It will be available in the next release, or you can use our nightly builds (see downloads page) if you can't wait. The bug was that we were accidentally filling in the bases for the nearby deletion which caused the wrong haplotype to be used for genotyping.

    The second region is an assembly/pruning problem that should be fixed by upcoming changes to the HC's assembly method. This should also be available in the next release. It is not currently available in the nightly builds however.

    Thanks for uploading your files and helping us fix these bugs!

    Geraldine Van der Auwera, PhD

  • aeonsimaeonsim Posts: 44Member

    Excellent, I will look forward to the update/release.

  • mseidelmseidel Posts: 8Member

    Hi,

    sorry for hijacking this thread but I thought this might be a related issue. I was using GATK for variant calling on maize and stumbled across the following call:

    chr_1 8823 ZmSYNBREED_10000_294 AC A 40626.35 . AC=58;BaseQRankSum=3.583;MQRankSum=1.436;InbreedingCoeff=0.4577;AF=0.935;HRun=1;AN=62;MQ0=0;FS=0.491;MQ=55.30;QD=21.19;HaplotypeScore=312.4568;SB=-19795.84;DS;DP=1931;ReadPosRankSum=0.686 GT:AD:DP:GQ:PL 0/0:14,0:14:42.14:0,42,635 1/1:26,28:55:95.13:1213,95,0 1/1:2,19:21:60.19:811,60,0 1/1:3,21:24:72.23:975,72,0 1/1:21,19:42:99:1004,101,0 1/1:84,126:214:99:5980,532,0 1/1:0,22:22:66.22:911,66,0 1/1:77,19:96:99:1386,193,0 1/1:53,46:102:99:2426,229,0 1/1:46,31:78:99:1676,178,0 1/1:39,11:52:94.14:747,94,0 1/1:10,4:15:12.72:208,13,0 1/1:68,7:76:99:617,134,0 1/1:4,22:26:78.19:994,78,0 0/1:11,15:26:29.99:606,0,30 1/1:18,19:37:40.93:868,41,0 1/1:52,17:70:99:1145,121,0 1/1:63,16:79:99:1177,158,0 1/1:89,112:202:99:5408,495,0 1/1:32,14:46:22.64:800,23,0 1/1:29,32:61:99:1723,159,0 1/1:11,25:36:93.85:1127,94,0 1/1:14,13:27:66:639,66,0 1/1:117,49:167:99:2834,368,0 1/1:19,20:39:58.17:853,58,0 1/1:45,8:53:97.28:576,97,0 1/1:60,18:77:99:1115,147,0 1/1:54,12:66:99:951,136,0 1/1:1,19:20:60.20:827,60,0 1/1:38,16:54:99:969,118,0 0/1:32,0:33:60:60,0,644

    What appears odd to me are the genotypes given the actual read depths, e.g. the very last sample: 32x reference, 0x alternative => heterozygous or the very first sample: 26x ref, 28x alt => homozygous alt. When looking into the BAM file for this example, I foudn that all reads are mapped with good mapQ and baseQ. Is there any obvious explanation for this behaviour that I am missing?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Hi @mseidel,

    That sounds like the same bug aeonsim reported. Were you also using HaplotypeCaller?

    Geraldine Van der Auwera, PhD

  • mseidelmseidel Posts: 8Member

    Ok, I feared so. I am actually using the UnifiedGenotyper with reference, my bamfiles and --dbsnp, --genotype_likelihoods_model BOTH -L -o .

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Hmm, might not be the same issue then since aeonsim's bug is specific to HaplotypeCaller. Are you using the latest version of GATK?

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Note that your problem might simply be due to the AD annotation not being properly computed for indels. We have seen that happen before. You should check the pileup of reads (eg in a genome browser like IGV) and see for yourself what is the allele depth at that site.

    Geraldine Van der Auwera, PhD

  • mseidelmseidel Posts: 8Member

    I was using GenomeAnalysisTK-1.6-5-g557da77 which at the time of the analysis was the latest version but I just found this oddity today. Any chance this is a known bug which has been fixed in more recent versions?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Oh, that's a very old version. It is extremely likely that if that was a bug, it has been fixed since then. You should definitely upgrade to a more recent version. We are currently on 2.4-9.

    Geraldine Van der Auwera, PhD

  • mseidelmseidel Posts: 8Member

    Ok, I'll rerun with the latest version and report back if the problem persists. Sorry for bothering with an issue from an out-of-date version :S

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Hah, no worries :) Good luck!

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Hi @aeonsim, FYI we've been able to confirm that the bug you saw in your second region is fixed in our development build, and the fix is now available in the nightly builds. The official release containing all these fixes should come out in a couple of weeks.

    Geraldine Van der Auwera, PhD

  • prepagamprepagam Posts: 8Member

    I ran Haplotype Caller with 2.4-9 and I have the issue with incorrect heterozygous calls. Specifically many of the SNPs, are showing up as heterzygous - but the SNP is absent in the bam files and in Unified Genotyper (run on the same bam files).image

    HapCaller.png
    1571 x 639 - 29K
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    We believe this is due to a bug that has been fixed internally. You can either use the latest nightly builds to test whether the problem still occurs for you, or wait for the upcoming 2.5 release.

    Geraldine Van der Auwera, PhD

  • mseidelmseidel Posts: 8Member

    Hi,

    sorry to bother again. I reran my analysis using GenomeAnalysisTK-2.4-9-g532efad and still find various odd calls, e.g. the following:

    chr_8 132298485 . CAGCCAGCA C 7720.52 . AC=41;AF=0.707;AN=58;BaseQRankSum=10.626;DP=202;FS=20.148;InbreedingCoeff=0.5379;MLEAC=41;MLEAF=0.707;MQ=42.23;MQ0=0;MQRankSum=-8.882;QD=7.26;ReadPosRankSum=-2.279 GT:AD:DP:GQ:PL [...] 0/1:0,19:21:50:1451,0,50 [...]

    The sample is clearly homozygous alt but gets called as het. Other samples are:

    0/1:0,25:27:2:1915,0,2 0/0:0,2:21:51:0,51,511 1/1:4,0,0:53:0:298,27,0,27,0,0

    Is there some filtering after the call has been made that removes coverage or is there something else that I might be missing?

    I noticed that those issues usually appear at sites with indels. I also notice that the PL difference in the first example is rather low (0 vs 2) while the overal quality at this position is extremly high with 14639.66.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    We are aware that the odd het calls issue still occurs in 2.4-9. The fix for the problem will be available in version 2.5, which we aim to release within the next few days. Otherwise you can try the latest nightly build (see Download page) to check that your issue no longer occurs with our latest dev fixes.

    Geraldine Van der Auwera, PhD

  • ZaagZaag Posts: 6Member

    Could this also lead to a not called heterozygous SNP?

    I have a heterozygous SNP that get's called with a depth of around 600 with the Unifiedgenotyper but not at all with the Haplotypecaller. All steps done with GATK 2.4-9

    I could provide more info, but if 2.5 or the nightly build might solve it I'll try that first.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    You can go ahead and try with the nightly build, sure. If it's still not called then please post the VCF line of the UG call, and an IGV screenshot is always helpful as well.

    Geraldine Van der Auwera, PhD

  • ZaagZaag Posts: 6Member

    Nightly build called it correct. Just to be complete, here is the line from the HaplotypeCaller and an IGV shot.

    chr10 88478529 . G A 1900.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-2.806;ClippingRankSum=-1.819;DP=256;FS=8.994;MLEAC=1;MLEAF=0.500;MQ=47.09;MQ0=0;MQRankSum=-1.022;QD=7.42;ReadPosRankSum=0.678 GT:AD:GQ:PL 0/1:109,95:99:1929,0,1564

    Thanks for your answer :)

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    Glad to see it's working for you! Nice IGV shot :)

    Geraldine Van der Auwera, PhD

  • mike_boursnellmike_boursnell Posts: 69Member

    Hi - I'm getting HaplotypeCaller (version 2.8-1-g932cd3a) calling heterozygotes where there is no evidence in IGV as well. It is also calling a homozygote when it looks like a heterozygote.

    What's the latest on this one? (see attached screenshots for incorrect heterozygote)

    bmp
    bmp
    Variant callers 7.bmp
    5M
    bmp
    bmp
    Variant callers 8.bmp
    5M
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    @mike_boursnell, could you please upload a test snippet for us to debug locally? We'll try to sort this out asap for you.

    Geraldine Van der Auwera, PhD

  • mike_boursnellmike_boursnell Posts: 69Member

    How do I make a snippet? With samtools view?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    You can do it easily with PrintReads, just specify the interval that you want the snippet from.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.