Is it inaccurate to use only one HaplotypeCaller gVCF as the input to GenotypeGVCFs?

YingLiu, China, Member

Hi,
Is it inaccurate to use only one HaplotypeCaller gVCF as the input to GenotypeGVCFs?
If so, how can I prepare more gVCFs for my sample? I only care about this one sample's gVCF.

Answers

  • Fabienne, Paris, Member

    Hello
    We have noticed some differences between the genotypes from GenotypeGVCFs run on a single individual's GVCF and the genotypes of the same individual called through CombineGVCFs + GenotypeGVCFs on a cohort. Looking only at the sites common to the two VCF files, the genotypes are sometimes different for the same variant (especially when DP is low). Is this intended? Which calling is more reliable?
    Thank you for your answer

    Fabienne

  • YingLiu, China, Member

    @Fabienne
    I need to look at the sites unique to the individual, not only the common sites.
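The comparison discussed in the two posts above (genotypes at shared sites, plus sites unique to one call set) can be sketched in a few lines of Python. This is only an illustrative sketch, not a GATK tool: the function names `genotypes_by_site` and `compare_calls` are made up for this example, the VCFs are assumed to be uncompressed text, and sites are matched on an exact (CHROM, POS, REF, ALT) key. That last assumption is a real limitation, because a cohort VCF can list extra ALT alleles contributed by other samples, so multi-allelic sites may show up as "unique" when they are not.

```python
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def genotypes_by_site(vcf_lines: Iterable[str], sample_index: int = 0) -> Dict[Site, str]:
    """Map each site to the GT string of one sample column (0-based)."""
    calls: Dict[Site, str] = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        keys = fields[8].split(":")                   # FORMAT column
        values = fields[9 + sample_index].split(":")  # chosen sample column
        gt = dict(zip(keys, values)).get("GT", ".")
        # Ignore phasing and allele order so that 0|1, 1|0 and 0/1 compare equal.
        alleles = sorted(gt.replace("|", "/").split("/"))
        calls[(chrom, pos, ref, alt)] = "/".join(alleles)
    return calls


def compare_calls(single: Dict[Site, str], cohort: Dict[Site, str]) -> Dict[str, List[Site]]:
    """Report discordant shared sites and sites present in only one call set."""
    shared = single.keys() & cohort.keys()
    return {
        "discordant": sorted(s for s in shared if single[s] != cohort[s]),
        "only_single": sorted(single.keys() - cohort.keys()),
        "only_cohort": sorted(cohort.keys() - single.keys()),
    }
```

To use it on real files, pass open file handles, for example `genotypes_by_site(open("single.vcf"))` and `genotypes_by_site(open("cohort.vcf"), sample_index=3)`, where the file names and the sample column are placeholders for your own data.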

  • Sheila, Broad Institute, Member, Broadie, Moderator admin
    edited April 2017

    @YingLiu
    Hi,

    If you have only one sample and you are only interested in that sample's variants, you can run HaplotypeCaller in either normal mode or GVCF mode. The GVCF workflow helps when you have many samples to analyze together, because it saves compute, or when you expect to add more samples to your analysis later on. Have a look at this article.

    -Sheila

    P.S. If you would like to add more samples to your analysis, you can get data from the 1000Genomes project.

    Edit: Also, have a look at this thread.
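For readers who want to see the two single-sample options side by side, here is a minimal sketch that builds the command lines as Python argument lists. It assumes GATK 3.x syntax (the version current when this thread was written; GATK4 uses a different launcher and option names), and the jar path and file names are placeholders. The helper names `haplotypecaller_cmd` and `genotypegvcfs_cmd` are invented for this example.

```python
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"  # placeholder path to a GATK 3.x jar


def haplotypecaller_cmd(ref: str, bam: str, out: str, gvcf: bool = False) -> List[str]:
    """Build a GATK 3.x HaplotypeCaller command line as an argv list."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "HaplotypeCaller",
           "-R", ref, "-I", bam, "-o", out]
    if gvcf:
        # GVCF mode: emit reference-confidence records for later genotyping.
        cmd += ["--emitRefConfidence", "GVCF"]
    return cmd


def genotypegvcfs_cmd(ref: str, gvcfs: List[str], out: str) -> List[str]:
    """Build a GATK 3.x GenotypeGVCFs command line for one or more GVCFs."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", ref]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out]


if __name__ == "__main__":
    # Option 1: normal mode, one step, VCF out.
    print(" ".join(haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.vcf")))
    # Option 2: GVCF mode, then genotype the single GVCF.
    print(" ".join(haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.g.vcf", gvcf=True)))
    print(" ".join(genotypegvcfs_cmd("ref.fasta", ["sample.g.vcf"], "sample.vcf")))
```

Each list can be handed to `subprocess.run(cmd, check=True)` to actually launch the tool. Adding more samples later only means passing more GVCF paths to `genotypegvcfs_cmd`.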

  • Sheila, Broad Institute, Member, Broadie, Moderator admin

    @Fabienne
    Hi Fabienne,

    Have a look at this article, specifically after "Why "almost always"?".

    It is true that for low-DP sites the tool will not be able to emit a high-confidence genotype. That would explain the differences you are seeing. The rest of the article I linked to above explains some other reasons. You may also be interested in the new QUAL calculation that helps with the missing singletons. You can invoke it with --useNewAFCalculator.

    -Sheila
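As a rough illustration of the cohort route with the new QUAL calculation switched on, the sketch below builds a CombineGVCFs command followed by a GenotypeGVCFs command that carries the `--useNewAFCalculator` flag mentioned above. It again assumes GATK 3.x syntax, placeholder file names, and invented helper names (`combine_gvcfs_cmd`, `genotype_cmd`); check the tool documentation for your GATK version before relying on it.

```python
from typing import List

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # placeholder GATK 3.x launcher


def combine_gvcfs_cmd(ref: str, gvcfs: List[str], out: str) -> List[str]:
    """Merge per-sample GVCFs into one cohort GVCF."""
    cmd = GATK + ["-T", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    return cmd + ["-o", out]


def genotype_cmd(ref: str, gvcf: str, out: str, new_qual: bool = False) -> List[str]:
    """Genotype a (combined) GVCF, optionally with the new QUAL calculation."""
    cmd = GATK + ["-T", "GenotypeGVCFs", "-R", ref, "--variant", gvcf, "-o", out]
    if new_qual:
        cmd.append("--useNewAFCalculator")  # flag named in the post above
    return cmd


if __name__ == "__main__":
    samples = ["s1.g.vcf", "s2.g.vcf", "s3.g.vcf"]
    print(" ".join(combine_gvcfs_cmd("ref.fasta", samples, "cohort.g.vcf")))
    print(" ".join(genotype_cmd("ref.fasta", "cohort.g.vcf", "cohort.vcf", new_qual=True)))
```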
