We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

GVCF of amplicon sequencing

Hi,

I'm trying to generate a GVCF with HaplotypeCaller based on amplicon (PCR) sequencing, and bases towards the end of the reads (~10 bases last bases) consistently get genotype quality of 0, even when the base quality is high. I tried simulating reads, and even if the position is covered by reads from both strands, with a large number of reads, and high base quality the genotype quality of the reference is 0. If there is a SNP in that position, it does get called with reasonable genotype quality, only the reference bases get quality 0. I tried changing various parameters but nothing seems to make a difference? Is there a parameter I'm missing? should I submit a detailed bug report?

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited August 2019

    Hi @ulitskyi

    1) What version of gatk are you using.
    2)Please post the exact command you are using
    3) Have you taken a look at the bamout for the regions with GQ = 0?
    4) Please post records of the variants with GQ = 0

    Post edited by bhanuGandham on
  • ulitskyiulitskyi Member ✭✭

    1) I'm using 4.1.3.0
    2) java -Xmx6g -jar ../gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar HaplotypeCaller -R ~/gatk/hg19.unmasked.fa -G StandardAnnotation -mbq 17 --standard-min-confidence-threshold-for-calling 25 --max-reads-per-alignment-start 0 --disable-read-filter NotDuplicateReadFilter --min-dangling-branch-length 5 --allow-non-unique-kmers-in-ref --kmer-size 30 --kmer-size 10 --kmer-size 15 -ERC GVCF -I read1.sampe.bam -O small.i.vcf -L small.intervals --recover-all-dangling-branches --assembly-region-out test.txt --dont-trim-active-regions --min-assembly-region-size 28

    3) The bamout does seem to include the region of interest where GT=0 (see image https://us.v-cdn.net/5019796/uploads/editor/ye/ty6e3xa8vqom.png "")

    4) this it the relevant region in the VCF:
    chr1 14155328 . G . . END=14155401 GT:DP:GQ:MIN_DP:PL 0/0:10:30:10:0,30,385
    chr1 14155402 . G A, 414.02 . DP=10;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=36000,10 GT:AD:DP:GQ:PL:SB 1/1:0,10,0:10:30:428,30,0,428,30,428:0,0,10,0
    chr1 14155403 . T . . END=14155440 GT:DP:GQ:MIN_DP:PL 0/0:10:30:10:0,30,426
    chr1 14155441 . G . . END=14155596 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0

    This is a region if I'm altering the read to include a SNP in the region where there is a GT=0:
    chr1 14155328 . G . . END=14155401 GT:DP:GQ:MIN_DP:PL 0/0:100:99:100:0,120,1800
    chr1 14155402 . G A, 4486.03 . DP=100;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=360000,100 GT:AD:DP:GQ:PL:SB 1/1:0,100,0:100:99:4500,301,0,4500,301,4500:0,0,100,0
    chr1 14155403 . T . . END=14155445 GT:DP:GQ:MIN_DP:PL 0/0:100:99:100:0,120,1800
    chr1 14155446 . C . . END=14155446 GT:DP:GQ:MIN_DP:PL 0/0:100:0:100:0,0,0
    chr1 14155447 . T G, 1829.60 . BaseQRankSum=0.000;DP=100;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=360000,100;ReadPosRankSum=0.00GT:AD:DP:GQ:PL:SB 0/1:50,50,0:100:99:1837,0,1837,1988,1988,3976:50,0,50,0
    chr1 14155448 . T . . END=14155596 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0

    Issue · Github
    by bhanuGandham

    Issue Number
    6100
    State
    open
    Last Updated
    Assignee
    Array
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @ulitskyi

    I am sorry I meant GQ, genotype quality = 0

  • ulitskyiulitskyi Member ✭✭

    That's ok I meant GQ as well :) - you can see that chr1 14155446 and chr1 14155448-chr1 14155455 get GQ=0 here, even though they are all covered by hiqh quality reads, both in the input BAM and in the bamout. Please advise...

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Can you please send me your gvcf and bamout using instructions provided here: https://software.broadinstitute.org/gatk/guide/article?id=1894

  • ulitskyiulitskyi Member ✭✭

    uploaded ulitsky.zip - please let me know if you see it

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    @ulitskyi

    I am looking into this and will get back to you shortly.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @ulitskyi

    This might be a bug in the code specific to amplicon sequencing. I have opened a github issue ticket for it and you can track its progress here: https://github.com/broadinstitute/gatk/issues/6100

  • huwenhuwen Member
    Hi, I had a similar problem: The base site has high depth and good quality, but the result is GQ=0.
    version: GATK4.1.4.1
    java -Xmx16g -Djava.io.tmpdir=./tmp -jar gatk-package-4.1.4.1-local.jar HaplotypeCaller -R hs37d5.fa -L MT_Hypervariable_region.bed -D hotspot_MT.vcf --max-reads-per-alignment-start 50000 -I MT_aftercut.sort.bam -O MT.vcf -ERC GVCF --indel-size-to-eliminate-in-ref-model 0

    MT 16313 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2547,559:3106:0:0,0,70414
    MT 16314 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2532,571:3103:0:0,0,68676
    MT 16315 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2408,618:3026:0:0,0,63009
    MT 16316 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2502,571:3073:0:0,0,67973
    MT 16317 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:3042,40:3082:99:0,120,1800
    MT 16318 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2503,34:2537:99:0,120,1800
    MT 16319 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2488,49:2537:99:0,120,1800
    MT 16320 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:1249,1201:2450:0:0,0,0
    MT 16321 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2481,42:2523:99:0,120,1800
    MT 16322 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2532,24:2556:99:0,120,1800
    MT 16323 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2440,68:2508:99:0,120,1800
    MT 16324 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2428,79:2507:99:0,120,1800
    MT 16325 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2503,41:2544:99:0,120,1800
    MT 16326 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2525,27:2552:99:0,120,1800
    MT 16327 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2481,52:2533:99:0,120,1800
    MT 16328 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2523,29:2552:99:0,120,1800
  • DerekCADerekCA Member, Administrator, Broadie, Moderator admin

    Hi, @huwen. Have you tried lowering the --indel-size-to-eliminate-in-ref-model from the default value?

    If you have read through the Github ticket that was linked in relation to this issue, it appears that this is not necessarily related to a bug, but to the conservativeness of the method being applied.

  • huwenhuwen Member
    Yes, I've read though the discussion on github ticket, so I've tried lowering parameter, or even set it to 0, but it still doesn't work. Moreover, the 16320 variant site is still 20bp away from the reads terminal, so I think it is far enough to detect.
  • DerekCADerekCA Member, Administrator, Broadie, Moderator admin

    Thanks for the clarification, @huwen. I'll look into this issue and get back to you shortly.

  • DerekCADerekCA Member, Administrator, Broadie, Moderator admin

    Okay, @huwen. First of all, it should be noted that GQ=0 is a common readout for when an assembly fails.

    The original issue that was posted in this thread required some manual debugging to sort out, but I think that your issue may be simpler to resolve.

    It looks like you are using mitochrondrial data, which isn't typically diploid and is not best suited for HaplotypeCaller. Even if we resolved the error, you would find issues with the output data that would be problematic.

    Instead, try using Mutect running in mitochondrial mode, instead of HaplotypeCaller, in order to call mitochondrial reads.

    There is a mitochondrial pipeline being developed for Mutect2 which you may be interested to read about here.

Sign In or Register to comment.