I have 50 exome samples belonging to 25 families. Should I run GenotypeGVCFs per family or on all 50 together?
We have exome sequencing data for 50 samples in total for a cardiac disease. The samples were sequenced in different batches, and some of the batches are 2 years old. We have relationship information for these 50 samples, which group into 25 families of 2 samples each. The relationship within a family can be any one of the following: siblings, sisters, brothers, father & son, or mother & daughter. **Currently, I have GVCFs available for all 50 samples.**
As per the article "GATK Tutorial: Variant Callset Evaluation & Filtering", there are two requirements for Variant Quality Score Recalibration (VQSR):
1) GATK requires at least 30 exome samples or 1 whole genome sample
2) Known variant databases
Case 1: If I run GenotypeGVCFs per family, I won't be able to filter with VQSR and will have to use hard filtering, because each family has only 2 exome samples.
Case 2: If I run GenotypeGVCFs on all 50 samples together, I can filter with VQSR.
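To make the two cases concrete, here is a minimal sketch of the sample-count check as I understand it. It only encodes the first requirement quoted above (at least 30 exomes or 1 whole genome) and ignores the known-variant-database requirement. The function name, the constants, and the `family_XX` labels are hypothetical and are not part of GATK.

```python
# Thresholds as quoted in the post from the GATK tutorial (treated as rules of thumb).
MIN_EXOMES_FOR_VQSR = 30
MIN_GENOMES_FOR_VQSR = 1


def filtering_strategy(n_exomes: int, n_genomes: int = 0) -> str:
    """Return 'VQSR' if the callset meets the quoted sample minimum, else 'hard filtering'."""
    if n_exomes >= MIN_EXOMES_FOR_VQSR or n_genomes >= MIN_GENOMES_FOR_VQSR:
        return "VQSR"
    return "hard filtering"


# Hypothetical layout matching the post: 25 families with 2 exomes each.
families = {f"family_{i:02d}": 2 for i in range(1, 26)}

# Case 1: genotype each family separately (2 exomes per callset).
per_family = {name: filtering_strategy(n) for name, n in families.items()}

# Case 2: genotype all samples together (50 exomes in one callset).
joint = filtering_strategy(sum(families.values()))

print("Case 1:", sorted(set(per_family.values())))
print("Case 2:", joint)
```

Under these assumptions, every per-family callset falls below the threshold, while the joint callset of 50 exomes clears it.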
Should I run "GenotypeGVCFs (Joint Calling)" on each family individually or on all 50 samples together?
If I opt for Case 2, won't I miss family-specific mutations?