Joint variant calling using RNA-Seq data
Dear GATK Team,
I realized that the Best Practices do not recommend joint genotyping for cohort analysis of RNA-Seq data. Since I didn't see a clear alternative recommendation (or I may well have missed key points), I tried the ERC (GVCF) mode of HaplotypeCaller. I was trying to find fixed SNPs between two populations of 11 and 9 individuals, respectively. The number of such SNPs I found (~10,000 after window filtering) seemed reasonable compared with the number of segregating SNPs between individual pairs drawn from the two populations (~160,000 after window filtering).
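For readers less familiar with window (cluster) filtering, here is a minimal Python sketch of the idea. It assumes the 35 bp window mentioned in this post and a cluster size of 3 SNPs (the cluster size is an assumption), treats positions on a single contig, and uses a hypothetical helper `clustered_snps`. It is a simplified stand-in, not GATK's VariantFiltration code, whose exact boundary handling may differ. The point it illustrates is that whether a SNP gets flagged depends on which samples' SNPs are considered together.

```python
from typing import Iterable, Set


def clustered_snps(positions: Iterable[int], window: int = 35, cluster: int = 3) -> Set[int]:
    """Return positions belonging to a group of at least `cluster` SNPs whose
    first and last positions are less than `window` bp apart.

    Hypothetical helper: a simplified cluster/window filter, not GATK's own
    implementation (its boundary handling may differ).
    """
    pos = sorted(set(positions))
    flagged: Set[int] = set()
    start = 0
    for end in range(len(pos)):
        # Slide the left edge until all SNPs in pos[start:end + 1] span < `window` bp.
        while pos[end] - pos[start] >= window:
            start += 1
        # Enough SNPs inside the window: flag every SNP in it.
        if end - start + 1 >= cluster:
            flagged.update(pos[start:end + 1])
    return flagged


# Hypothetical data: one SNP per sample at nearby positions on one contig.
sample_1 = [100]
sample_2 = [110]
sample_3 = [120]

# Filtering each sample separately flags nothing...
per_sample = [clustered_snps(s) for s in (sample_1, sample_2, sample_3)]
# ...but pooling the same SNPs forms a 3-SNP cluster within 35 bp.
pooled = clustered_snps(sample_1 + sample_2 + sample_3)
print(per_sample, pooled)
```

A sliding window over sorted positions keeps the check linear after sorting; a real filter would also need to respect contig boundaries.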
I expected the fixed/segregating SNPs between the two populations to be present in every individual pair, since the SNPs segregating between the populations should be a subset of the SNPs between any pair of individuals drawn from them. However, about half of the SNPs between the populations were not found in the first individual pair I examined. At first I thought this was because window filtering each individual sample could have removed valid between-population SNPs: some SNPs that form a cluster within a 35 bp window in one sample would no longer form a cluster once all individuals are taken into consideration. So I reran the analysis without the window filtering steps, but more than 40% of the SNPs segregating between the populations were still not found between the individual pair I examined.
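The expectation above (population-level fixed differences should reappear in every individual pair) can be checked with a small set computation. This is a minimal Python sketch, assuming genotypes have already been parsed from the VCFs into dictionaries keyed by (contig, position); all function names and the toy data are hypothetical. "Fixed" is taken here to mean that every sample in one population is homozygous for one allele and every sample in the other population is homozygous for a different allele, with no missing calls. One design choice to note: a site absent from either sample's calls is treated as uncalled, so in this sketch it cannot count as a pair difference and instead shows up in the missing fraction.

```python
from typing import Dict, List, Optional, Set, Tuple

Site = Tuple[str, int]        # (contig, position)
Genotypes = Dict[Site, str]   # site -> GT string such as "A/A"; absent = not called


def homozygous_allele(gt: Optional[str]) -> Optional[str]:
    """Return the allele of a called homozygote ("A/A" or "A|A"), else None."""
    if gt is None:
        return None
    alleles = gt.replace("|", "/").split("/")
    if len(set(alleles)) == 1 and alleles[0] != ".":
        return alleles[0]
    return None


def normalize(gt: str) -> str:
    """Make "G/A", "A/G" and "A|G" compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def fixed_differences(pop_a: List[Genotypes], pop_b: List[Genotypes]) -> Set[Site]:
    """Sites where all of A share one homozygous allele and all of B another."""
    fixed: Set[Site] = set()
    for site in set().union(*pop_a, *pop_b):
        a = {homozygous_allele(s.get(site)) for s in pop_a}
        b = {homozygous_allele(s.get(site)) for s in pop_b}
        if len(a) == 1 and len(b) == 1 and None not in a | b and a != b:
            fixed.add(site)
    return fixed


def pair_differences(x: Genotypes, y: Genotypes) -> Set[Site]:
    """Sites called in both samples whose genotypes differ."""
    return {s for s in x.keys() & y.keys() if normalize(x[s]) != normalize(y[s])}


def missing_fraction(fixed: Set[Site], pair: Set[Site]) -> float:
    """Fraction of population-level fixed differences absent from one pair."""
    return len(fixed - pair) / len(fixed) if fixed else 0.0


# Toy joint calls, two samples per population (hypothetical data).
a1 = {("chr1", 100): "A/A", ("chr1", 200): "C/C", ("chr1", 300): "G/G"}
a2 = {("chr1", 100): "A/A", ("chr1", 200): "C/C", ("chr1", 300): "G/G"}
b1 = {("chr1", 100): "T/T", ("chr1", 200): "T/T", ("chr1", 300): "G/G"}
b2 = {("chr1", 100): "T/T", ("chr1", 200): "T/T"}
fixed = fixed_differences([a1, a2], [b1, b2])

# Separate calls for the pair a1/b1, where chr1:200 is missing from a1.
pair = pair_differences({("chr1", 100): "A/A"}, b1)
print(missing_fraction(fixed, pair))  # prints 0.5
```

Running the same comparison on both the jointly genotyped and the per-pair call sets would show whether the "missing" sites are absent calls or genuinely concordant genotypes.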
I understand that I am using joint genotyping here at my own risk, but I would greatly appreciate any insight into what might have happened! Also, if joint genotyping is not appropriate for RNA-Seq data, why is that, and what alternatives could I try for calling SNPs across multiple individual libraries?
Many thanks for taking the time to consider my questions,