Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

GVCF of amplicon sequencing

Hi,

I'm trying to generate a GVCF with HaplotypeCaller based on amplicon (PCR) sequencing, and bases towards the end of the reads (~10 bases last bases) consistently get genotype quality of 0, even when the base quality is high. I tried simulating reads, and even if the position is covered by reads from both strands, with a large number of reads, and high base quality the genotype quality of the reference is 0. If there is a SNP in that position, it does get called with reasonable genotype quality, only the reference bases get quality 0. I tried changing various parameters but nothing seems to make a difference? Is there a parameter I'm missing? should I submit a detailed bug report?

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited August 14

    Hi @ulitskyi

    1) What version of gatk are you using.
    2)Please post the exact command you are using
    3) Have you taken a look at the bamout for the regions with GQ = 0?
    4) Please post records of the variants with GQ = 0

    Post edited by bhanuGandham on
  • ulitskyiulitskyi Member ✭✭

    1) I'm using 4.1.3.0
    2) java -Xmx6g -jar ../gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar HaplotypeCaller -R ~/gatk/hg19.unmasked.fa -G StandardAnnotation -mbq 17 --standard-min-confidence-threshold-for-calling 25 --max-reads-per-alignment-start 0 --disable-read-filter NotDuplicateReadFilter --min-dangling-branch-length 5 --allow-non-unique-kmers-in-ref --kmer-size 30 --kmer-size 10 --kmer-size 15 -ERC GVCF -I read1.sampe.bam -O small.i.vcf -L small.intervals --recover-all-dangling-branches --assembly-region-out test.txt --dont-trim-active-regions --min-assembly-region-size 28

    3) The bamout does seem to include the region of interest where GT=0 (see image https://us.v-cdn.net/5019796/uploads/editor/ye/ty6e3xa8vqom.png "")

    4) this it the relevant region in the VCF:
    chr1 14155328 . G . . END=14155401 GT:DP:GQ:MIN_DP:PL 0/0:10:30:10:0,30,385
    chr1 14155402 . G A, 414.02 . DP=10;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=36000,10 GT:AD:DP:GQ:PL:SB 1/1:0,10,0:10:30:428,30,0,428,30,428:0,0,10,0
    chr1 14155403 . T . . END=14155440 GT:DP:GQ:MIN_DP:PL 0/0:10:30:10:0,30,426
    chr1 14155441 . G . . END=14155596 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0

    This is a region if I'm altering the read to include a SNP in the region where there is a GT=0:
    chr1 14155328 . G . . END=14155401 GT:DP:GQ:MIN_DP:PL 0/0:100:99:100:0,120,1800
    chr1 14155402 . G A, 4486.03 . DP=100;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=360000,100 GT:AD:DP:GQ:PL:SB 1/1:0,100,0:100:99:4500,301,0,4500,301,4500:0,0,100,0
    chr1 14155403 . T . . END=14155445 GT:DP:GQ:MIN_DP:PL 0/0:100:99:100:0,120,1800
    chr1 14155446 . C . . END=14155446 GT:DP:GQ:MIN_DP:PL 0/0:100:0:100:0,0,0
    chr1 14155447 . T G, 1829.60 . BaseQRankSum=0.000;DP=100;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=360000,100;ReadPosRankSum=0.00GT:AD:DP:GQ:PL:SB 0/1:50,50,0:100:99:1837,0,1837,1988,1988,3976:50,0,50,0
    chr1 14155448 . T . . END=14155596 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0

    Issue · Github
    by bhanuGandham

    Issue Number
    6100
    State
    open
    Last Updated
    Assignee
    Array
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @ulitskyi

    I am sorry I meant GQ, genotype quality = 0

  • ulitskyiulitskyi Member ✭✭

    That's ok I meant GQ as well :) - you can see that chr1 14155446 and chr1 14155448-chr1 14155455 get GQ=0 here, even though they are all covered by hiqh quality reads, both in the input BAM and in the bamout. Please advise...

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Can you please send me your gvcf and bamout using instructions provided here: https://software.broadinstitute.org/gatk/guide/article?id=1894

  • ulitskyiulitskyi Member ✭✭

    uploaded ulitsky.zip - please let me know if you see it

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    @ulitskyi

    I am looking into this and will get back to you shortly.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @ulitskyi

    This might be a bug in the code specific to amplicon sequencing. I have opened a github issue ticket for it and you can track its progress here: https://github.com/broadinstitute/gatk/issues/6100

Sign In or Register to comment.