I have 50 exome samples belonging to 25 families. Should I run GenotypeGVCFs per family or on all 50 together?

Nanda, Canada (Member)

We have exome sequencing data for 50 samples in total for a cardiac disease. The samples were sequenced in different batches, and some of the batches are 2 years old. We have relationship information for these 50 samples, which group into 25 families of 2 samples each. The relationship within each family is one of the following: siblings, sisters, brothers, father and son, or mother and daughter. **Currently, I have GVCFs available for all 50 samples.**

According to the article "GATK Tutorial: Variant Callset Evaluation & Filtering", Variant Quality Score Recalibration (VQSR) has two requirements:
1) At least 30 exome samples, or at least 1 whole-genome sample
2) Known variant databases

Case 1: If I run GenotypeGVCFs on each family separately, I won't be able to filter with VQSR, because each family has only 2 exome samples. I would need to use hard filtering instead.

Case 2: If I run GenotypeGVCFs on all 50 samples together, I can filter with VQSR.

Should I run GenotypeGVCFs (joint calling) on each family individually, or on all 50 samples together?
If I opt for Case 2, won't I miss family-specific mutations?
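
For reference, the two cases differ only in which GVCFs are passed to a single GenotypeGVCFs invocation. Below is a minimal Python sketch that assembles both variants of the command without running anything. It assumes GATK 3.x command-line syntax (`-T GenotypeGVCFs`, `-R`, `--variant`, `-o`); the jar name, the file names, the family IDs, and the helper `genotype_gvcfs_cmd` are all hypothetical placeholders.

```python
from collections import defaultdict


def genotype_gvcfs_cmd(gvcfs, reference, output, jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3.x GenotypeGVCFs argument list (nothing is executed)."""
    cmd = ["java", "-jar", jar, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]  # one --variant argument per input GVCF
    cmd += ["-o", output]
    return cmd


# Hypothetical layout: 25 families with 2 samples each (50 GVCFs in total).
samples = [
    (f"fam{fam:02d}", f"fam{fam:02d}_s{idx}.g.vcf")
    for fam in range(1, 26)
    for idx in (1, 2)
]

# Case 2: a single joint call over all 50 GVCFs.
joint_cmd = genotype_gvcfs_cmd([gvcf for _, gvcf in samples], "ref.fasta", "cohort.vcf")

# Case 1: one call per family, each with only 2 GVCFs.
by_family = defaultdict(list)
for fam, gvcf in samples:
    by_family[fam].append(gvcf)

family_cmds = {
    fam: genotype_gvcfs_cmd(gvcfs, "ref.fasta", f"{fam}.vcf")
    for fam, gvcfs in by_family.items()
}
```

Any of these argument lists could then be handed to `subprocess.run(cmd, check=True)` on a machine where GATK is installed.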

Answers

  • Nanda, Canada (Member)
    edited April 2017

    Thanks, Sheila. I read the article you provided. Last Thursday, I started GenotypeGVCFs on all 50 samples together, but I didn't specify "--useNewAFCalculator", so I will submit another job with this parameter. What is the significance of the new QUAL calculated with the "--useNewAFCalculator" option?

    As the next step, I am running VQSR on the raw VCF generated for my exome samples, as per the GATK Best Practices. Regarding the variant annotations:
    1) I read that "DP" (depth of coverage) should not be used for exome datasets.
    2) The "InbreedingCoeff" annotation requires at least 10 samples to be computed. I have 50 samples (25 families), but another line says that I should omit this annotation
    - if I have fewer samples, or
    - if I have closely related samples (such as a family). In my case, I have 25 families with the different relationships mentioned above.
    Therefore, I omitted the DP and InbreedingCoeff annotations from my command. Is my understanding correct?
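
The annotation choice described above can be sketched as a small helper. This is an illustration only: the starting list is assumed to be the SNP annotation set from the GATK 3.x Best Practices recommendations, the function names and parameters are hypothetical, and `-an` is the VariantRecalibrator argument that names one annotation to use.

```python
# Assumed starting set of SNP annotations for VQSR (GATK 3.x era recommendations).
BASE_SNP_ANNOTATIONS = [
    "QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR", "DP", "InbreedingCoeff",
]


def vqsr_annotations(is_exome, n_samples, has_related_samples):
    """Return the annotations to keep, applying the two rules from the post."""
    skip = set()
    if is_exome:
        # Rule 1: depth of coverage should not be used for exome datasets.
        skip.add("DP")
    if n_samples < 10 or has_related_samples:
        # Rule 2: InbreedingCoeff needs at least 10 samples and is omitted
        # when the cohort contains closely related samples.
        skip.add("InbreedingCoeff")
    return [name for name in BASE_SNP_ANNOTATIONS if name not in skip]


def an_args(annotations):
    """Expand annotation names into repeated '-an NAME' arguments."""
    args = []
    for name in annotations:
        args += ["-an", name]
    return args


# The situation in this thread: 50 exomes from 25 two-member families.
annotations = vqsr_annotations(is_exome=True, n_samples=50, has_related_samples=True)
print(annotations)
print(an_args(annotations))
```

For this cohort both rules apply, so the sketch drops DP and InbreedingCoeff and keeps the remaining six annotations.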

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)