Discovering singletons with GenotypeGVCFs?

Hi,

I have several samples on which I ran HaplotypeCaller (in normal mode), and I want to discover germline variants from them. I read that GenotypeGVCFs isn't good at discovering singletons, and my samples will likely contain many singletons. Does anyone have a solution to this? I was planning to run GenotypeGVCFs on each sample individually so that singletons are not lost.

Answers

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator, admin
    edited October 2019

    Hi,

    The GATK support team will primarily focus on resolving questions about GATK tool errors or abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and the tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For more information:

    https://software.broadinstitute.org/gatk/blog?id=24419

    https://gatkforums.broadinstitute.org/gatk/discussion/24417/what-types-of-questions-will-the-gatk-frontline-team-answer/p1?new=1

  • gauthier (Member, Broadie, Dev ✭✭✭)

    Where did you read that GenotypeGVCFs "isn't good with discovering singletons"? A computational experiment I did years ago shows that there is no loss of singleton sensitivity with increasing cohort size: https://drive.google.com/open?id=0BzI1CyccGsZiTFYzeXMzNUYxU2s

    Analysis should never be performed on ungenotyped GVCFs. They contain many low-quality variants that are likely false positives; GenotypeGVCFs removes these because, by default, it requires enough evidence that the chance of a false positive is less than 1/1000. (About one in every thousand bases in the human genome is variant, so we require more confidence than a 1/1000 chance of a false positive.)

    Running GenotypeGVCFs on a multi-sample GVCF increases discovery power for variants with AC > 1, while variants with AC = 1 (singletons) should be unaffected.
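The 1/1000 confidence rule and the AC distinction above can be illustrated with Phred arithmetic. This is a minimal sketch, assuming the 1/1000 default corresponds to a Phred-scaled site QUAL of 30 (as with GATK4's `--standard-min-confidence-threshold-for-calling`, default 30.0). The records, site names, and helper functions are hypothetical toy data and code, not GATK output or GATK's actual genotyping model.

```python
import math
from typing import List, Tuple

def phred_to_prob(qual: float) -> float:
    """Phred-scaled confidence -> probability that the call is wrong."""
    return 10 ** (-qual / 10)

def prob_to_phred(p: float) -> float:
    """Probability of error -> Phred-scaled confidence."""
    return -10 * math.log10(p)

def alt_allele_count(genotypes: List[str]) -> int:
    """Total non-reference alleles across GT strings such as '0/1' or '1|1'.
    Reference ('0') and missing ('.') alleles are not counted.
    Note: at multiallelic sites this sums over all ALT alleles instead of
    giving the per-allele AC that a VCF reports."""
    count = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in (".", "0"):
                count += 1
    return count

# Assumed default: a 1/1000 chance of a false positive, i.e. Phred 30.
MIN_QUAL = prob_to_phred(1 / 1000)

def classify(records, min_qual: float = MIN_QUAL) -> List[Tuple[str, int, bool, bool]]:
    """Return (site, AC, is_singleton, passes_threshold) for each toy record."""
    out = []
    for site, qual, genotypes in records:
        ac = alt_allele_count(genotypes)
        out.append((site, ac, ac == 1, qual >= min_qual))
    return out

# Hypothetical three-sample toy records: (site, QUAL, per-sample GT).
records = [
    ("chr1:1000", 55.2, ["0/1", "0/0", "0/0"]),  # confident singleton
    ("chr1:2000", 12.0, ["0/1", "0/0", "./."]),  # low-confidence singleton
    ("chr1:3000", 80.0, ["0/1", "1/1", "0/0"]),  # AC = 3
]

for row in classify(records):
    print(row)
```

In this toy data, the confident singleton at chr1:1000 passes the threshold just like the AC = 3 site; only the low-confidence record at chr1:2000 falls below it. Being a singleton is not itself what gets a call filtered.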
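Instead of running GenotypeGVCFs on each sample individually, as the question proposes, the answer points toward genotyping a multi-sample GVCF jointly. The Python sketch below only builds the `gatk` command lines and does not run them. It assumes the per-sample GVCFs come from HaplotypeCaller run with `-ERC GVCF`. The file names, the `gvcf_name` naming scheme, and the helper functions are hypothetical.

```python
import shlex
from typing import List

def gvcf_name(bam: str) -> str:
    """Hypothetical naming scheme: sample.bam -> sample.g.vcf.gz."""
    stem = bam[:-4] if bam.endswith(".bam") else bam
    return stem + ".g.vcf.gz"

def haplotype_caller_cmd(ref: str, bam: str) -> List[str]:
    # Per-sample calling in GVCF mode (-ERC GVCF), which emits the
    # ungenotyped GVCF that GenotypeGVCFs consumes later.
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
            "-O", gvcf_name(bam), "-ERC", "GVCF"]

def combine_gvcfs_cmd(ref: str, gvcfs: List[str], out: str) -> List[str]:
    # Merge all per-sample GVCFs into one multi-sample GVCF.
    cmd = ["gatk", "CombineGVCFs", "-R", ref]
    for g in gvcfs:
        cmd += ["-V", g]
    return cmd + ["-O", out]

def genotype_gvcfs_cmd(ref: str, cohort_gvcf: str, out_vcf: str) -> List[str]:
    # Joint genotyping across every sample in the combined GVCF.
    return ["gatk", "GenotypeGVCFs", "-R", ref, "-V", cohort_gvcf, "-O", out_vcf]

def joint_genotyping_plan(ref: str, bams: List[str],
                          cohort_gvcf: str = "cohort.g.vcf.gz",
                          out_vcf: str = "cohort.vcf.gz") -> List[List[str]]:
    """Build (but do not run) the command lines for the joint workflow."""
    plan = [haplotype_caller_cmd(ref, b) for b in bams]
    plan.append(combine_gvcfs_cmd(ref, [gvcf_name(b) for b in bams], cohort_gvcf))
    plan.append(genotype_gvcfs_cmd(ref, cohort_gvcf, out_vcf))
    return plan

if __name__ == "__main__":
    for cmd in joint_genotyping_plan("ref.fa", ["sampleA.bam", "sampleB.bam"]):
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the paths are real. For larger cohorts, GATK's GenomicsDBImport is commonly used in place of CombineGVCFs.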
