Individual VCF for each sample vs. a single VCF for all samples: does the output content differ?
I have performed variant calling for 24 samples using the GATK pipeline and need clarification on the following points.
1) If I generate a separate VCF file for each of the 24 samples and also generate a single VCF file containing all 24 samples, are there any differences between the two outputs? If yes, what are they?
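One concrete difference lies in how samples without a variant are represented. A single-sample VCF from HaplotypeCaller in its default (non-GVCF) mode typically lists only the sites where that sample has a variant call, so a site absent from sample2.vcf could mean either homozygous reference or simply no coverage. A multi-sample VCF, such as one produced by joint genotyping with GenotypeGVCFs, has one record per site with a genotype column for every sample, so hom-ref (0/0) and no-call (./.) can be told apart. The minimal sketch below assumes the standard tab-separated VCF layout with GT in the FORMAT column; the helper names and the example record are made up for illustration.

```python
# Minimal sketch (plain Python, no VCF library): read the per-sample genotypes
# from one data line of a multi-sample VCF. parse_genotypes and classify are
# hypothetical helper names, not part of GATK.


def parse_genotypes(record: str, sample_names: list[str]) -> dict[str, str]:
    """Return {sample name: GT string} for one tab-separated VCF data line."""
    fields = record.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")  # FORMAT column, e.g. "GT:AD:DP:GQ:PL"
    gt_index = fmt_keys.index("GT")
    genotypes = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        values = sample_field.split(":")
        # Treat a sample field that lacks a GT value as a no-call.
        genotypes[name] = values[gt_index] if gt_index < len(values) else "./."
    return genotypes


def classify(gt: str) -> str:
    """Label a GT string as 'no-call', 'hom-ref' or 'variant'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "no-call"
    if all(a == "0" for a in alleles):
        return "hom-ref"
    return "variant"


# Made-up record: three samples, one het, one hom-ref, one with no coverage.
samples = ["Sample1", "Sample2", "Sample3"]
line = "chr1\t12345\t.\tA\tG\t50.0\tPASS\t.\tGT:DP\t0/1:30\t0/0:25\t./.:0"
print({s: classify(gt) for s, gt in parse_genotypes(line, samples).items()})
# {'Sample1': 'variant', 'Sample2': 'hom-ref', 'Sample3': 'no-call'}
```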
I am asking because I have both family-level and symptom-level information for these 24 samples.
Family-level information for the 24 samples:
FamilyA : Sample1, Sample2, Sample3
FamilyB : Sample4, Sample5, Sample6
FamilyH : Sample22, Sample23, Sample24
Symptom-level information for the 24 samples:
Joint pain : Sample1, Sample4, Sample14, Sample15, Sample16, Sample17
Bleeding : Sample2, Sample5, Sample6
Symptom X : …..
I would like to know whether the samples grouped together above share any genetic variants. In other words, are there 'secondary' variants elsewhere in the exome (other than in the X gene) that are common among patients who suffer from the same symptoms?
- For example, if I want to find the variants common to the bleeding group, do the common variants differ between case1 and case2?
case1: I compare the individual VCF files (sample2.vcf, sample5.vcf and sample6.vcf) and keep the variants common to all three.
case2: I extract only Sample2, Sample5 and Sample6 from the single VCF file containing all 24 samples and keep the variants common to them.
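The two cases can be sketched as follows, assuming plain tab-separated VCF lines. case1 intersects (CHROM, POS, REF, ALT) keys across the per-sample files; case2 reads the chosen sample columns of the multi-sample file and keeps sites where each sample has a called non-reference genotype. The toy records are invented to show one way the results can diverge: a joint-called record can be multi-allelic (ALT "G,T") while each single-sample file lists only its own allele. case2 also lets you tell a no-call (./.) apart from hom-ref (0/0), which case1 cannot. All helper names are hypothetical.

```python
# Sketch comparing case1 and case2 on plain VCF data lines. Helper names are
# hypothetical; a real analysis would use a VCF parser and normalize variants.


def site_key(fields: list[str]) -> tuple[str, str, str, str]:
    """Identify a record by CHROM, POS, REF and ALT."""
    return (fields[0], fields[1], fields[3], fields[4])


def shared_from_individual_vcfs(per_sample: dict[str, list[str]]) -> set:
    """case1: keep site keys present in every single-sample VCF."""
    key_sets = [{site_key(r.rstrip("\n").split("\t")) for r in recs}
                for recs in per_sample.values()]
    return set.intersection(*key_sets) if key_sets else set()


def is_variant(gt: str) -> bool:
    """True for a fully called genotype carrying any non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and any(a != "0" for a in alleles)


def shared_from_multisample_vcf(records: list[str], header: list[str],
                                group: list[str]) -> set:
    """case2: keep sites where every sample in group has a non-reference GT.

    Any ALT allele counts, so 0/1 and 0/2 at the same site are treated as shared.
    GT is assumed to be the first FORMAT key, as the VCF spec requires.
    """
    cols = [9 + header.index(s) for s in group]
    shared = set()
    for record in records:
        fields = record.rstrip("\n").split("\t")
        if all(is_variant(fields[c].split(":")[0]) for c in cols):
            shared.add(site_key(fields))
    return shared


# Made-up records: the two samples carry different ALT alleles at chr1:100.
per_sample = {
    "Sample2": ["chr1\t100\t.\tA\tG\t60\tPASS\t.\tGT\t0/1"],
    "Sample5": ["chr1\t100\t.\tA\tT\t55\tPASS\t.\tGT\t0/1"],
}
joint = ["chr1\t100\t.\tA\tG,T\t90\tPASS\t.\tGT\t0/1\t0/2"]

print(shared_from_individual_vcfs(per_sample))  # set()
print(shared_from_multisample_vcf(joint, ["Sample2", "Sample5"], ["Sample2", "Sample5"]))
# {('chr1', '100', 'A', 'G,T')}
```

A stricter case2 would compare the actual alleles in each genotype rather than accepting any non-reference allele; which definition of "common" fits depends on the analysis.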
Following the same approach, I would also like to find common variants at the family level.
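Because the family and symptom groups are just named lists of samples, the same "variant in every member" check can be looped over every group in a joint-called multi-sample VCF. This sketch assumes group lines written in the "Name : Sample1, Sample2" form used in the question and a VCF whose header sample names match those IDs without spaces; the helper names and the record are hypothetical.

```python
# Sketch: apply the "variant in every member" test to each family (or symptom
# group) from a joint-called multi-sample VCF. Names are hypothetical; GT is
# assumed to be the first FORMAT key.
import re


def parse_groups(lines: list[str]) -> dict[str, list[str]]:
    """Turn 'FamilyA : Sample1, Sample2, Sample3' lines into {group: [samples]}."""
    groups = {}
    for line in lines:
        name, _, members = line.partition(":")
        # Drop internal spaces so "Sample 4" matches a header name like "Sample4".
        groups[name.strip()] = [re.sub(r"\s+", "", m)
                                for m in members.split(",") if m.strip()]
    return groups


def is_variant(gt: str) -> bool:
    """True for a fully called genotype carrying any non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and any(a != "0" for a in alleles)


def shared_per_group(records: list[str], header: list[str],
                     groups: dict[str, list[str]]) -> dict[str, set]:
    """For each group, collect (CHROM, POS, REF, ALT) where all members are variant."""
    result = {name: set() for name in groups}
    for record in records:
        fields = record.rstrip("\n").split("\t")
        gts = {s: f.split(":")[0] for s, f in zip(header, fields[9:])}
        for name, members in groups.items():
            if all(is_variant(gts[m]) for m in members):
                result[name].add((fields[0], fields[1], fields[3], fields[4]))
    return result


groups = parse_groups([
    "FamilyA : Sample1, Sample2, Sample3",
    "FamilyB : Sample4, Sample5, Sample6",
])
header = ["Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6"]
# Made-up record: all of FamilyA carry the ALT allele; Sample5 in FamilyB is hom-ref.
records = ["chr2\t500\t.\tG\tA\t70\tPASS\t.\tGT\t0/1\t1/1\t0/1\t0/1\t0/0\t./."]
print(shared_per_group(records, header, groups))
# {'FamilyA': {('chr2', '500', 'G', 'A')}, 'FamilyB': set()}
```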