Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

PL scores for non-variant sites

KatieKatie United StatesMember ✭✭

Hi,
I have called variants with HaplotypeCaller and genotyped with GenotypeGVCFs and have noticed that in ~4000 sites of a 4.4-bp genome, there are 'no-calls' or '.' called as the genotype for non-variant sites. At these positions, there is variable coverage, but at some sites for which there is sufficient coverage, the PL scores are unexpectedly low and the output genotype call is '.' . I am wondering if this is due to low quality bases or changes in read coverage along the genome because I've noticed this sometimes occurs at sites of different coverage relative to neighboring sites. Any ideas would be great, because I'd like to have confidence about which sites are truly 'no-call' and can't be called any base.

Thank you!

# call variants with HaplotypeCaller
gatk  --java-options "-Xmx50g" HaplotypeCaller \
-R ${REF_DIR}${ref} \
-ploidy 1 \
-I ${BAMS_DIR}${bam} \
-ERC GVCF \
-O ${vcf}

# genotype variants with GenotypeGVCFs
gatk --java-options '-Xmx10g' GenotypeGVCFs \
-R ${REF_DIR}${ref} \
--variant ${gvcf} \
-ploidy ${ploidy} \
--include-non-variant-sites true \
--output ${gvcf%.g.vcf}".vcf"

A snippet of the gVCF file: see site 171703.

NC_000962.3     171702  .       G       <NON_REF>       .       .       END=171702      GT:DP:GQ:MIN_DP:PL      0:19:99:19:0,717
NC_000962.3     171703  .       A       <NON_REF>       .       .       END=171708      GT:DP:GQ:MIN_DP:PL      0:19:0:19:0,0
NC_000962.3     171709  .       T       <NON_REF>       .       .       END=171710      GT:DP:GQ:MIN_DP:PL      0:20:99:19:0,672
NC_000962.3     171711  .       C       <NON_REF>       .       .       END=171711      GT:DP:GQ:MIN_DP:PL      0:20:0:20:0,0

A snippet of the VCF file: see site 171703. why is QUAL 33.01?

NC_000962.3     171703  .       A       .       33.01   .       DP=19   GT:AD:DP        .:19,0:19
NC_000962.3     171704  .       C       .       33.01   .       DP=19   GT:AD:DP        .:19,0:19
NC_000962.3     171705  .       G       .       33.01   .       DP=19   GT:AD:DP        .:19,0:19
NC_000962.3     171706  .       G       .       33.01   .       DP=19   GT:AD:DP        .:19,0:19
NC_000962.3     171707  .       C       .       33.01   .       DP=19   GT:AD:DP        .:19,0:19
NC_000962.3     171708  .       A       .       33.01   .       DP=19   GT:AD:DP        .:19,0:19
NC_000962.3     171709  .       T       .       Infinity        .       DP=19   GT:AD:DP:RGQ    0:19,0:19:672
NC_000962.3     171710  .       G       .       Infinity        .       DP=19   GT:AD:DP:RGQ    0:19,0:19:672
NC_000962.3     171711  .       C       .       33.01   .       DP=20   GT:AD:DP        .:20,0:20

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @Katie

    ./. or no-call genotypes occur when the tool does not have enough information to make a proper genotype call. This could be due to low coverage or bad quality data or messy regions in general.

Sign In or Register to comment.