I have 50 exome samples belonging to 25 families. Should I run GenotypeGVCFs family-wise or on all 50 together?

We have exome sequencing data for 50 samples in total for a cardiac disease. The samples were sequenced in different batches, and some of the batches are 2 years old. We have relationship information for these 50 samples: they are grouped into 25 families, with 2 samples per family. The relationship within each family is one of the following: siblings, sisters, brothers, father & son, or mother & daughter. **Currently, I have GVCFs available for all 50 samples.**

As per the article "GATK Tutorial: Variant Callset Evaluation & Filtering", there are two requirements for Variant Quality Score Recalibration (VQSR):
1) At least 30 exome samples, or at least 1 whole genome sample
2) Known variant databases

Case 1: If I run GenotypeGVCFs on each family separately, I won't be able to filter using VQSR because each family has only 2 exome samples. I would need to use hard filtering instead.

Case 2: If I run GenotypeGVCFs on all 50 samples together, I can filter using VQSR.
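
The reasoning behind the two cases can be written as a small check. This is only a sketch of the sample-count requirement quoted above; the function name and its arguments are made up for illustration, and the second requirement (known variant databases) is not modeled.

```python
# Sample-count threshold for VQSR as quoted in the post (30 exomes or 1 genome).
MIN_EXOMES_FOR_VQSR = 30


def choose_filtering(n_exomes, n_genomes=0):
    """Return 'VQSR' if the sample-count requirement is met, else 'hard filtering'."""
    if n_exomes >= MIN_EXOMES_FOR_VQSR or n_genomes >= 1:
        return "VQSR"
    return "hard filtering"


print(choose_filtering(2))   # Case 1: one family (2 exomes) genotyped on its own
print(choose_filtering(50))  # Case 2: all 50 exomes genotyped together
```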

Should I run GenotypeGVCFs (joint calling) on each family individually or on all 50 samples together?
If I opt for Case 2, will I miss family-specific mutations?
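
For reference, a Case 2 run could be assembled roughly as below. This is a sketch that assumes a GATK 3.x style invocation (`-T GenotypeGVCFs`, one `--variant` per GVCF); the jar location, reference, and all file names are placeholders, not the actual files. It only shows how the 50 GVCFs go into one job and does not settle the question about family-specific variants.

```python
def genotype_gvcfs_cmd(reference, gvcfs, output):
    """Build a GATK 3.x style GenotypeGVCFs command as a list of arguments."""
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar",  # placeholder jar path
           "-T", "GenotypeGVCFs",
           "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]  # one --variant argument per sample GVCF
    cmd += ["-o", output]
    return cmd


# Hypothetical file names for the 50 per-sample GVCFs.
gvcfs = ["sample%02d.g.vcf" % i for i in range(1, 51)]
cmd = genotype_gvcfs_cmd("reference.fasta", gvcfs, "cohort50.raw.vcf")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.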

Answers

  • Nanda, Canada, Member
    edited April 2017

    Thanks, Sheila. I read the article you provided. Last Thursday, I started GenotypeGVCFs on all 50 samples together, but I did not specify "--useNewAFCalculator", so I will submit another job with this parameter. What is the significance of the new QUAL calculated with the "--useNewAFCalculator" option?

    As the next step, I am running VQSR on the raw VCF generated for my exome samples, as per the GATK Best Practices. Regarding the variant annotations:
    1) I read that "DP" (depth of coverage) should not be used for exome datasets.
    2) The "InbreedingCoeff" annotation requires at least 10 samples to be computed. I have 50 samples (25 families), but the documentation also says to omit this annotation
    - if I have fewer samples, or
    - if I have closely related samples (such as a family). In my case, I have 25 families with the relationships mentioned above.
    Therefore, I omitted DP and InbreedingCoeff from my command. Is my understanding correct?
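
The annotation choice described above can be sketched as follows. The candidate list is an assumption (annotations commonly passed to VariantRecalibrator with `-an`), and the function and argument names are hypothetical; the rules encoded are only the two stated in the post.

```python
# Assumed candidate annotations; adjust to whatever your callset actually carries.
CANDIDATE_ANNOTATIONS = ["QD", "MQ", "MQRankSum", "ReadPosRankSum",
                         "FS", "SOR", "DP", "InbreedingCoeff"]


def vqsr_annotation_args(is_exome, has_related_samples, n_samples):
    """Return the '-an' arguments after applying the two rules from the post."""
    omit = set()
    if is_exome:
        omit.add("DP")  # depth of coverage is not recommended for exomes
    if has_related_samples or n_samples < 10:
        omit.add("InbreedingCoeff")  # needs >= 10 samples and no close relatives
    args = []
    for name in CANDIDATE_ANNOTATIONS:
        if name not in omit:
            args += ["-an", name]
    return args


# 50 exomes from 25 families: both DP and InbreedingCoeff are dropped.
print(" ".join(vqsr_annotation_args(True, True, 50)))
```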

  • Sheila, Broad Institute, Member, Broadie, Moderator