
I have 50 exome samples belonging to 25 families. Do I run GenotypeGVCFs family-wise or on all 50 together?

Nanda (Canada, Member)

We have exome sequencing data for 50 samples in total for a cardiac disease study. The samples were sequenced in different batches, and some of the batches are 2 years old. We have relationship information for all 50 samples, which have been grouped into 25 families of 2 samples each. The relationship within a family can be any one of the following: siblings, sisters, brothers, father & son, or mother & daughter. **Currently, I have GVCFs available for all 50 samples.**

As per the article "GATK Tutorial: Variant Callset Evaluation & Filtering", there are two requirements for Variant Quality Score Recalibration (VQSR):
1) At least 30 exome samples, or 1 whole-genome sample
2) Known variant databases

Case 1: If I run GenotypeGVCFs on each family separately, I won't be able to filter with VQSR, because each family has only 2 exome samples. I would need to use hard filtering instead.

Case 2: If I run GenotypeGVCFs on all 50 samples together, I can filter with VQSR.
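
To make the two cases concrete, here is a minimal Python sketch of the sample-count rule quoted above. The function and constant names are hypothetical, and the sketch only models the first requirement (sample count), not the availability of known variant databases.

```python
# Threshold quoted in the post: VQSR needs at least 30 exomes
# (or a single whole genome). Names below are hypothetical.
MIN_EXOMES_FOR_VQSR = 30


def filtering_strategy(n_exomes: int, n_genomes: int = 0) -> str:
    """Return 'VQSR' if the cohort meets the quoted sample-count rule,
    otherwise 'hard filtering'."""
    if n_genomes >= 1 or n_exomes >= MIN_EXOMES_FOR_VQSR:
        return "VQSR"
    return "hard filtering"


# Case 1: a single family of 2 exomes genotyped on its own.
print("Case 1:", filtering_strategy(2))
# Case 2: all 50 exomes genotyped together.
print("Case 2:", filtering_strategy(50))
```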

Should I run GenotypeGVCFs (joint calling) on each family individually, or on all 50 samples together?
If I opt for Case 2, won't I miss family-specific mutations?
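
For reference, joint genotyping in Case 2 means passing every GVCF to a single GenotypeGVCFs run. The sketch below only assembles such a command line in Python. It assumes a GATK 3.x-style invocation (the thread dates from April 2017), and the jar name, reference name, and sample file names are placeholders rather than anything taken from this thread. The optional `--useNewAFCalculator` flag is the one named in the reply further down.

```python
from typing import List


def genotype_gvcfs_command(
    gvcfs: List[str],
    reference: str,
    output: str,
    use_new_af_calculator: bool = False,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # placeholder path
) -> List[str]:
    """Assemble a GATK 3.x-style GenotypeGVCFs command (assumed syntax)."""
    cmd = ["java", "-jar", gatk_jar, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        # One --variant argument per per-sample GVCF.
        cmd += ["--variant", gvcf]
    cmd += ["-o", output]
    if use_new_af_calculator:
        cmd.append("--useNewAFCalculator")
    return cmd


# Hypothetical file names for the 50 per-sample GVCFs.
gvcfs = [f"sample{i:02d}.g.vcf" for i in range(1, 51)]
cmd = genotype_gvcfs_command(gvcfs, "reference.fasta", "cohort50.raw.vcf")
print(" ".join(cmd))
```

Running one family at a time (Case 1) would use the same function with only that family's two GVCFs in the list.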

Best Answer


  • Nanda (Canada, Member)
    edited April 2017

    Thanks, Sheila. I read the article you provided. Last Thursday I started GenotypeGVCFs on all 50 samples together, but I did not specify "--useNewAFCalculator", so I will submit another job with this parameter. What is the significance of the new QUAL calculated with the "--useNewAFCalculator" option?

    As the next step, I am running VQSR on the raw VCF generated from my exome samples, following the GATK Best Practices. Regarding the variant annotations:
    1) I read that "DP" (depth of coverage) should not be used for exome datasets.
    2) "InbreedingCoeff" requires at least 10 samples to be computed. I have 50 samples (25 families), but another line says that I should omit this annotation
    - if I have fewer samples, or
    - if I have closely related samples (such as a family). In my case, I have 25 families with the relationships described above.
    Therefore, I omitted DP and InbreedingCoeff from my command. Is my understanding correct?
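
The annotation choice described above can be written out as a small Python sketch. The starting list is an assumption (the annotations commonly listed for SNP recalibration in the GATK 3 documentation), the function name is hypothetical, and the output is only the `-an` portion of a VariantRecalibrator command, not a full invocation.

```python
from typing import List

# Assumed starting list of SNP annotations for VariantRecalibrator.
DEFAULT_SNP_ANNOTATIONS = [
    "QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR",
    "DP", "InbreedingCoeff",
]


def vqsr_annotation_args(
    is_exome: bool, n_samples: int, has_related_samples: bool
) -> List[str]:
    """Build the -an arguments, applying the two rules from the post."""
    skip = set()
    if is_exome:
        # Rule 1: DP should not be used for exome datasets.
        skip.add("DP")
    if n_samples < 10 or has_related_samples:
        # Rule 2: InbreedingCoeff needs at least 10 samples and is
        # omitted when samples are closely related.
        skip.add("InbreedingCoeff")
    args: List[str] = []
    for annotation in DEFAULT_SNP_ANNOTATIONS:
        if annotation not in skip:
            args += ["-an", annotation]
    return args


# The setup in this thread: 50 exomes made up of 25 related pairs.
print(" ".join(vqsr_annotation_args(True, 50, True)))
```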

  • Sheila (Broad Institute, Member, Broadie)