Hi there GATK team!
I'm new to this field and have a question about which strategy to adopt. I have 65 human WES samples, but it's not a simple cases vs. controls design: I have several types of cases (different phenotypes) that I want to analyze as separate groups. I don’t have 30 samples for each phenotype, so I don’t know whether I should use hard filtering or VQSR.
What would be the right way to go?
Thanks so much in advance!
We recommend that you have at least 30 exomes for VQSR. If you have fewer than that, we recommend hard filtering; you can find more info on that here: https://software.broadinstitute.org/gatk/documentation/article.php?id=2806.
If a phenotype group is on the borderline of that threshold, it will be more of a judgment call. Please try it and let me know if you face any issues.
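In case it helps, here is a rough sketch of what the hard-filtering route could look like for SNPs. It only builds the command line and does not run anything. It assumes GATK4 syntax and an input VCF that already contains SNPs only (indels need different thresholds). The helper name, the file names, and the filter labels are made up for illustration, and the thresholds are the generic starting points commonly quoted for germline SNPs, so please check them against the hard-filtering article and tune them for your data.

```python
import shlex

# Generic starting thresholds for germline SNPs (assumption: tune for your data).
# Keys are arbitrary filter labels; values are JEXL expressions that FLAG a site.
SNP_FILTERS = {
    "QD2": "QD < 2.0",
    "FS60": "FS > 60.0",
    "MQ40": "MQ < 40.0",
    "MQRankSum-12.5": "MQRankSum < -12.5",
    "ReadPosRankSum-8": "ReadPosRankSum < -8.0",
}


def hard_filter_cmd(in_vcf, out_vcf, filters=SNP_FILTERS):
    """Build (not run) a GATK4 VariantFiltration command as an argument list.

    Hypothetical helper for illustration; in_vcf is assumed to be SNP-only.
    """
    cmd = ["gatk", "VariantFiltration", "-V", in_vcf, "-O", out_vcf]
    for name, expr in filters.items():
        # Each --filter-name is paired with the --filter-expression that follows it.
        cmd += ["--filter-name", name, "--filter-expression", expr]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell command (file names are placeholders).
    print(shlex.join(hard_filter_cmd("cohort.snps.vcf.gz", "cohort.snps.filtered.vcf.gz")))
```

Sites that match any expression get the corresponding label in the FILTER column; they are flagged rather than removed, so you can still inspect them afterwards.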
I ask because the Joint Genotyping should be done separately for each phenotype, right?
You can do joint genotyping on your entire set of samples. Here is a doc that gives you more info on why you should do joint genotyping.
Sorry, I don't think I'm explaining my question clearly.
If I do Joint Genotyping on my entire set of samples, can I then separate them into the different subsets (different phenotypes)? Or should the Joint Genotyping be done for each subset? I ask because I’ll obtain a multi-sample VCF, but the filter values correspond to all samples, not to the subsets.
Are you working on somatic or germline variants?
If it is somatic, then we recommend you do not do joint genotyping. If it is germline, you can find info in this thread.
In that case, you should do joint genotyping for all the 31 samples together (as explained in this thread) and then use SelectVariants to extract one or more samples from the callset, based on either a complete sample name or a pattern match.
VQSR filters the variants at the level of the whole callset, while SelectVariants is used to subset the callset (for example, by sample). Hence, you should run VQSR before SelectVariants.
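To make the order concrete, here is a minimal sketch of the subsetting step that comes after joint genotyping and filtering. It only builds one SelectVariants command per phenotype group and does not run anything. It assumes GATK4 syntax; the helper names, sample names, phenotype labels, and file names are placeholders, not anything from your project.

```python
import shlex


def select_samples_cmd(filtered_vcf, samples, out_vcf):
    """Build (not run) a GATK4 SelectVariants command for one group of samples.

    Hypothetical helper for illustration only.
    """
    cmd = ["gatk", "SelectVariants", "-V", filtered_vcf, "-O", out_vcf]
    for sample in samples:
        cmd += ["-sn", sample]  # one -sn (--sample-name) per sample to keep
    # Drop sites where none of the selected samples carries a variant,
    # and trim alternate alleles that no longer appear in the subset.
    cmd += ["--exclude-non-variants", "--remove-unused-alternates"]
    return cmd


def per_phenotype_cmds(filtered_vcf, phenotype_to_samples):
    """Return {phenotype: command} for every phenotype group."""
    return {
        phenotype: select_samples_cmd(filtered_vcf, samples, f"{phenotype}.vcf.gz")
        for phenotype, samples in phenotype_to_samples.items()
    }


if __name__ == "__main__":
    # Placeholder grouping: replace with your own sample names.
    groups = {"phenoA": ["S1", "S2"], "phenoB": ["S3"]}
    for phenotype, cmd in per_phenotype_cmds("cohort.filtered.vcf.gz", groups).items():
        print(phenotype, "->", shlex.join(cmd))
```

The input here is the jointly genotyped, already filtered callset, so every subset inherits the same FILTER annotations that were computed on the full cohort.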