I have 50 exome samples belonging to 25 families. Do I run GenotypeVCFs per family or on all 50 together?
We have exome sequencing data for 50 samples in total for a cardiac disease. The samples were sequenced in different batches, and some of the batches are 2 years old. We have relationship information for all 50 samples, which group into 25 families of 2 samples each. The relationship within a family is one of the following: siblings, sisters, brothers, father and son, or mother and daughter. **Currently, I have GVCFs available for all 50 samples.**
According to the article "GATK Tutorial: Variant Callset Evaluation & Filtering", Variant Quality Score Recalibration (VQSR) has two requirements:
1) At least 30 exome samples, or 1 whole genome sample
2) Known variant databases
Case 1: If I run GenotypeVCFs on each family separately, I won't be able to filter with VQSR, because each family has only 2 exome samples. I would need to use hard filtering instead.
Case 2: If I run GenotypeVCFs on all 50 samples together, I can filter with VQSR.
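To make the two cases concrete, here is a minimal Python sketch that groups samples by family, checks the 30-exome threshold quoted above, and builds (without running) the command lines each case would need. The sample names, family IDs, file names, and `reference.fasta` are hypothetical placeholders. The commands assume the GATK4 `CombineGVCFs` and `GenotypeVCFs` tools with their `-R`, `-V`, and `-O` arguments, so check them against your GATK version before use.

```python
from collections import defaultdict

# Threshold quoted in the post; an assumption for this sketch, not read from GATK.
MIN_EXOMES_FOR_VQSR = 30


def group_by_family(sample_to_family):
    """Return {family_id: [sample, ...]} from a {sample: family_id} mapping."""
    families = defaultdict(list)
    for sample, family in sorted(sample_to_family.items()):
        families[family].append(sample)
    return dict(families)


def vqsr_feasible(n_exomes, minimum=MIN_EXOMES_FOR_VQSR):
    """True if the callset meets the sample-count rule of thumb for VQSR."""
    return n_exomes >= minimum


def joint_calling_commands(samples, label, reference="reference.fasta"):
    """Build CombineGVCFs and GenotypeVCFs argument lists for one group of samples."""
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for sample in samples:
        combine += ["-V", f"{sample}.g.vcf.gz"]  # hypothetical per-sample GVCF name
    combine += ["-O", f"{label}.g.vcf.gz"]
    genotype = ["gatk", "GenotypeVCFs", "-R", reference,
                "-V", f"{label}.g.vcf.gz", "-O", f"{label}.vcf.gz"]
    return [combine, genotype]


# Hypothetical sample sheet: 25 families with 2 samples each.
sample_to_family = {f"S{i:02d}": f"FAM{(i + 1) // 2:02d}" for i in range(1, 51)}
families = group_by_family(sample_to_family)

# Case 1: one small callset per family.
case1 = {fam: joint_calling_commands(members, fam) for fam, members in families.items()}
case1_vqsr = {fam: vqsr_feasible(len(members)) for fam, members in families.items()}

# Case 2: a single callset over all 50 samples.
case2 = joint_calling_commands(sorted(sample_to_family), "cohort")
case2_vqsr = vqsr_feasible(len(sample_to_family))

print("Case 1 families eligible for VQSR:", sum(case1_vqsr.values()))
print("Case 2 eligible for VQSR:", case2_vqsr)
```

The argument lists can be passed to `subprocess.run` once the paths are real. The sketch only encodes the sample-count rule and does not check for the known variant databases that VQSR also needs.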
Should I run GenotypeVCFs (joint calling) on each family individually, or on all 50 samples together?
If I opt for Case 2, won't I miss family-specific mutations?
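One way to frame this last question: a jointly genotyped VCF still records a genotype for every sample, so variants carried by only one family can in principle be picked out afterwards. The toy Python sketch below illustrates that idea on made-up data. The variant keys, sample names, and the `family_private_variants` helper are all hypothetical, and it does not parse real GATK output.

```python
def family_private_variants(carriers_by_variant, sample_to_family):
    """Return {variant: family_id} for variants whose carriers all share one family."""
    private = {}
    for variant, carriers in carriers_by_variant.items():
        carrier_families = {sample_to_family[sample] for sample in carriers}
        if len(carrier_families) == 1:
            private[variant] = carrier_families.pop()
    return private


# Made-up example: 2 families with 2 samples each.
sample_to_family = {"S01": "FAM01", "S02": "FAM01", "S03": "FAM02", "S04": "FAM02"}

# Made-up carriers (samples with a non-reference genotype) for each variant.
carriers_by_variant = {
    "chr1:1000:A>G": {"S01", "S02"},  # seen only in FAM01
    "chr2:2000:C>T": {"S01", "S03"},  # shared across families
    "chr3:3000:G>A": {"S04"},         # seen only in FAM02
}

private = family_private_variants(carriers_by_variant, sample_to_family)
for variant, family in sorted(private.items()):
    print(variant, "is private to", family)
```

This shows only the bookkeeping. Whether joint calling and VQSR change the sensitivity for rare, family-specific variants is a separate question that this sketch does not answer.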