HaplotypeCaller may fail to detect variant with the same reads with a different composition.

maikedamaikeda JapanMember
edited February 15 in Ask the GATK team

I have experienced a variant detection issue with confusion. The png file attached is the result of exact same NextSeq experiment but the read extraction range is different.

NextSeq2_point.bam: bam is composed of the reads which cover chr16: 89100686 position only.

NextSeq2_region.bam: bam is composed of the reads which cover the region of chr16: 89100686 +-100bp.
On position chr16:89100686, I presume T>C should be detected, but HaplotypeCaller failed to detect the variant with NextSeq2_region.bam.

NextSeq2_point.vcf:
chr16 89100686 . T C,<NON_REF> 7397.77 . DP=199;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=716400.00 GT:AD:DP:GQ:PL:SB 1/1:0,199,0:199:99:7426,599,0,7426,599,7426:0,0,155,44

NextSeq2_region.vcf:
chr16 89100686 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:0,199:199:0:0,0,0

What causes the difference and why?

--- GATK Version (Docker latest)
    Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
    Running:
        /gatk/build/install/gatk/bin/gatk HaplotypeCaller --version
    Version:4.0.1.2
---

--- Command used
gatk HaplotypeCaller -I /temp/NextSeq2_region.bam -O /temp/NextSeq2_region.vcf -R /temp/genome.fa -L /temp/only16.bed --debug true --output-mode EMIT_ALL_SITES --all-site-pls true --dont-trim-active-regions true --emit-ref-confidence BP_RESOLUTION
---
--- Genome Version: hg38
--- bed
chr16   89100681    89101347    NM_174917.4_cds_2_0_chr16_89100682_f    0   +

If you need the bams and vcfs, I can post them here.

Tagged:

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @maikeda
    Hi,

    Can you confirm the base qualities are all good/high in the region BAM file? Are the base qualities better in the point BAM file?

    Thanks,
    Sheila

  • maikedamaikeda JapanMember

    Thank you, Sheila.
    Two bam's reads are identical. The "point BAM" is a subset of the "region BAM." The base qualities of the read are high enough, I think. The image below shows the overview of base quality. Shaded base's quality is less than 30 here.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @maikeda
    Hi,
    Just to confirm, when you say shaded bases, you mean basically all the bases in your screenshot? Can you check if the base qualities for the missed sites are higher than 20? If they are, I may need you to submit a bug report.

    Thanks,
    Sheila

  • maikedamaikeda JapanMember

    Thank you. Sheila.

    Almost all the bases have base quality value over 20.

    The following shows a pileup sequence of "NextSeq2_region.bam."

    $ samtools mpileup -r chr16:89100686-89100686 NextSeq2_region.bam
    chr16   89100686    N   199 CcCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCcccccCCCCcccCCCCCCCCCCCcccCCCCCCCCCCCCcCcCCCCCCCCCCCCCCCCCCCCCcccccCCCCCCCCCCCCCcCcccccccCCCCCCCCcCCCCCCCCCCCCCcccccCCCCCCCCcccCCCCCCCCCCCccccCCCCCCccc^]C^]C^]C^]C^]C^]C^]C^]C^]C^]C^]c   D<[email protected]<<<;;EEED<<<EEEEDEEDEDE=<=DEEEEDEDEDDE<D:DDDEEEDDDEEEEDDDEDEDD<<<<<DDEDEEEDEDDEE:D<<<===5DEDEEDDE;DEEDEDDDEEEEE<<<<<DDBBAAAA;:;[email protected]@@@[email protected]@[email protected]:
    

    Small fraction of bases are fallen under base quality value 20.

    As a reference, version of the samtools to create pileup is following.

    $ samtools --version
    samtools 1.7-5-g2bd9d6b
    Using htslib 1.7-23-g9f9631d
    Copyright (C) 2018 Genome Research Ltd.
    
  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @maikeda
    Hi,

    Alright, I am not sure what is going on. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • maikedamaikeda JapanMember

    Thank you. Sheila.

    I put a report on the ftp server. The filename is 11436_gatk_report.tar.gz, just in case.

    The older version of GATK's results are also included.
    Based on this result, gatk_3.2-2 and gatk_3.3-0 could detect the variant I mentioned, but gatk 3.4-46 and newer failed to detect it.

    Issue · Github
    by shlee

    Issue Number
    3008
    State
    closed
    Last Updated
    Assignee
    Array
    Closed By
    sooheelee
  • shleeshlee CambridgeMember, Broadie, Moderator

    Thanks, @maikeda. We've received your bug report data and will follow up with you soon.

  • shleeshlee CambridgeMember, Broadie, Moderator
    edited March 22

    Hi @maikeda, I'm just letting you know I can recapitulate your observations. I am consulting with the team currently. Our developer is away at a workshop currently and can take a look next week. Thanks for your patience.

    Post edited by shlee on
  • maikedamaikeda JapanMember

    Thank you, Shell.
    I'm glad to know you could replicate the result. I am looking forward to hearing from you next week.
    Have a nice weekend.

Sign In or Register to comment.