Could you please explain:
1) Why is genotyping performed jointly on a group of targeted samples?
2) What is the effect of grouping samples that have very different sets of variants?
Thanks in advance,
Have a look at this article: https://www.broadinstitute.org/gatk/guide/article?id=4150
I have a few oncology samples in a batch, such as neuroblastoma, prostate cancer, etc.
Is it OK to group these samples for joint genotyping, given that the target genes for these diseases are different and hence the variants are also different?
You should probably not be using these variant calling tools on cancer samples. HaplotypeCaller is not able to model somatic variation appropriately. You should use a somatic caller like MuTect instead. We'll have a new version of MuTect out soon that does both SNPs and indels.
I have one more question.
Suppose I have a batch of samples for neurological diseases. Can I group samples for Roussy-Levy syndrome, distal hereditary motor neuropathy type IIB, etc. together and perform joint genotyping (after running HaplotypeCaller on each sample individually), given that the target genes for these diseases are different and hence the variants are also different?
Is it OK to group Cystic Fibrosis, Goldberg-Shprintzen Syndrome, and Glycogen Storage Disease Type 0 samples together for joint genotyping?
Yes, you can perform joint genotyping on all of these samples together. Have a look at the preprint of the ExAC paper, which discusses the rationale for analyzing many samples together: http://biorxiv.org/content/early/2015/10/30/030338
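To make the idea of pooling evidence across samples concrete, here is a minimal Python sketch of a textbook joint-genotyping calculation at a single biallelic site in diploid samples. It is not how GenotypeGVCFs actually computes genotypes: the binomial read model, the 1% error rate, the Hardy-Weinberg prior, the EM loop and the function names (`genotype_likelihoods`, `posteriors`, `joint_genotype`) are all illustrative assumptions. What it shows is that samples that do not carry the variant still contribute hom-ref evidence at the site, so a cohort whose samples carry different variants still yields a genotype for every sample at every site.

```python
def genotype_likelihoods(ref_reads, alt_reads, error=0.01):
    """Read likelihoods under the diploid genotypes 0/0, 0/1 and 1/1.

    Assumes each read independently shows the alt allele with probability
    `error`, 0.5 or 1 - error for the three genotypes. The binomial
    coefficient is omitted because it cancels when posteriors are normalized.
    """
    alt_fraction = [error, 0.5, 1.0 - error]
    return [p**alt_reads * (1.0 - p) ** ref_reads for p in alt_fraction]


def hwe_prior(freq):
    """Hardy-Weinberg genotype prior for alt-allele frequency `freq`."""
    return [(1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq**2]


def posteriors(likelihoods, freq):
    """Normalize likelihood x prior for every sample."""
    prior = hwe_prior(freq)
    result = []
    for lik in likelihoods:
        weights = [lik_g * prior_g for lik_g, prior_g in zip(lik, prior)]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result


def joint_genotype(samples, error=0.01, iterations=50):
    """EM estimate of the cohort alt-allele frequency, then per-sample posteriors.

    `samples` is a list of (ref_reads, alt_reads) pairs at one site.
    """
    likelihoods = [genotype_likelihoods(r, a, error) for r, a in samples]
    freq = 0.5  # arbitrary starting guess
    for _ in range(iterations):
        # E-step: expected number of alt alleles (0, 1 or 2) summed over samples
        expected_alt = sum(p[1] + 2.0 * p[2] for p in posteriors(likelihoods, freq))
        # M-step: allele frequency = expected alt alleles / total alleles
        freq = expected_alt / (2.0 * len(samples))
    return freq, posteriors(likelihoods, freq)


# Eight samples with only reference reads, two with roughly half alt reads.
cohort = [(20, 0)] * 8 + [(10, 10), (12, 8)]
freq, post = joint_genotype(cohort)
print(f"cohort alt-allele frequency: {freq:.3f}")
for i, p in enumerate(post):
    print(i, ["0/0", "0/1", "1/1"][p.index(max(p))])
```

In this toy cohort the estimated frequency settles near 0.1 (two alt alleles out of twenty), and each sample's most probable genotype still follows its own read evidence.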
Could you please explain how HaplotypeCaller variant calling differs for oncology samples compared with other types of samples, such as neurology samples (as you mentioned earlier)?
Why is HaplotypeCaller not able to model somatic variants?
It mostly comes down to the caller's expectations about allele frequency. HaplotypeCaller expects allele frequencies to conform roughly to proportions determined by the organism's ploidy (for a diploid, about 0%, 50% or 100% of reads supporting the alternate allele). The further an observed allele frequency deviates from those expectations, the more likely it is that the low-frequency allele is treated as an artifact. In somatic variant calling, by contrast, we know that allele frequencies can be wildly different from ploidy-based expectations because of various properties of somatic samples and mutations (tumor purity, copy number, etc.). The new MuTect2, which will be released later this week, is a hybrid of HaplotypeCaller and the original MuTect somatic variant caller, so it is basically HC adapted for modeling somatic variation.
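The allele-frequency argument can be illustrated with a small Python sketch built on a simple binomial read model. That model is an assumption for illustration; neither HaplotypeCaller nor MuTect2 uses this exact model. Consider a heterozygous somatic mutation in a tumor sample that is 30% tumor cells (with no copy-number change): only about 0.3 × 0.5 = 15% of reads are expected to carry it. The hypothetical helpers `best_germline` and `best_somatic` compare how well ploidy-constrained allele fractions and an unconstrained fraction explain 6 alternate reads out of 40.

```python
import math


def binomial_likelihood(alt_reads, depth, fraction):
    """P(alt_reads | depth) when each read carries the alt allele with probability `fraction`."""
    ref_reads = depth - alt_reads
    return math.comb(depth, alt_reads) * fraction**alt_reads * (1.0 - fraction) ** ref_reads


def germline_fractions(ploidy=2, error=0.01):
    """Alt-allele fractions a ploidy-based model allows: k / ploidy for k alt copies.

    0 and 1 are pulled in to `error` and 1 - error to allow for sequencing error.
    """
    return [min(max(k / ploidy, error), 1.0 - error) for k in range(ploidy + 1)]


def best_germline(alt_reads, depth, ploidy=2, error=0.01):
    """Best (likelihood, fraction) among the ploidy-constrained fractions."""
    return max(
        (binomial_likelihood(alt_reads, depth, f), f)
        for f in germline_fractions(ploidy, error)
    )


def best_somatic(alt_reads, depth):
    """Best (likelihood, fraction) when the fraction may take any value in [0, 1].

    Under the binomial model the maximum-likelihood fraction is alt_reads / depth.
    """
    f = alt_reads / depth
    return binomial_likelihood(alt_reads, depth, f), f


# 6 of 40 reads support the alt allele, roughly what a heterozygous somatic
# mutation in a 30%-pure tumor would produce.
germ_lik, germ_f = best_germline(6, 40)
som_lik, som_f = best_somatic(6, 40)
print(f"best germline fraction {germ_f}: likelihood {germ_lik:.2e}")
print(f"somatic fraction {som_f}: likelihood {som_lik:.2e}")
print(f"likelihood ratio: {som_lik / germ_lik:.0f}")
```

Here the best diploid fit is the 50% fraction, yet the unconstrained 15% fraction explains the reads more than ten thousand times better. A ploidy-based model simply has no genotype near that fraction, which illustrates why a germline caller is poorly suited to somatic variants.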