Joint variant calling using RNA-Seq data
Dear GATK Team,
I realize that the Best Practices do not recommend joint genotyping for cohort analysis of RNA-Seq data. Since I didn't see a clear alternative recommendation (though I may well have missed key points), I tried the ERC mode of HaplotypeCaller. I was trying to identify fixed SNPs between two populations of 11 and 9 individuals, respectively. The number of SNPs I found between the populations (~10,000 after window filtering) seemed reasonable compared to the number of segregating SNPs between individual pairs drawn from the two populations (~160,000 after window filtering).
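To make "fixed SNPs between two populations" concrete, here is a minimal sketch, assuming genotypes have already been extracted from the joint-called VCF into a plain dictionary with alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt). The data layout and the function name `fixed_differences` are hypothetical illustrations, not part of any GATK tool. A site counts as fixed only if every sample in one population is homozygous for one allele and every sample in the other is homozygous for the other allele.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: (contig, position) -> {sample name: alt-allele count}
Site = Tuple[str, int]


def fixed_differences(
    genotypes: Dict[Site, Dict[str, int]],
    pop_a: List[str],
    pop_b: List[str],
) -> Set[Site]:
    """Return sites where pop_a is uniformly homozygous for one allele and
    pop_b is uniformly homozygous for the other. Sites lacking a call in any
    sample are skipped rather than guessed."""
    fixed: Set[Site] = set()
    for site, calls in genotypes.items():
        if any(sample not in calls for sample in pop_a + pop_b):
            continue  # require a genotype for every sample
        a = {calls[s] for s in pop_a}
        b = {calls[s] for s in pop_b}
        if (a == {0} and b == {2}) or (a == {2} and b == {0}):
            fixed.add(site)
    return fixed
```

Under this definition, any fixed site is by construction a homozygous difference in every cross-population pair, which is why the subset expectation in the question follows whenever the per-pair calls are made on the same genotypes.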
I expected the SNPs that are fixed (segregating) between the two populations to also appear between every cross-population pair of individuals, since the population-level SNPs should be a subset of the SNPs found between each individual pair. However, about half of the SNPs between populations were missing from the first individual pair I examined. I originally suspected the window filtering: applied to each individual sample separately, it could remove valid population-level SNPs, because SNPs that form a cluster within a 35 bp window in one individual may no longer form a cluster once all individuals are considered together. So I reran the analysis without the window-filtering step, but more than 40% of the segregating SNPs between populations were still absent from the individual pair I examined.
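The window-filtering hypothesis above can be illustrated with a small sketch. It assumes a cluster rule of the form "flag every SNP that lies in some 35 bp window containing at least 3 SNPs", similar in spirit to the cluster filter in GATK's VariantFiltration; this is a hypothetical re-implementation, and GATK's exact semantics may differ. The point it shows is that the same position can be flagged in one individual's call set (where it has clustered private neighbours) but not in the population-level set.

```python
from bisect import bisect_right
from typing import Iterable, Set


def clustered_snps(positions: Iterable[int], window: int = 35, cluster: int = 3) -> Set[int]:
    """Hypothetical cluster filter (not GATK's code): return every position
    that falls inside a `window`-bp span holding at least `cluster` SNPs."""
    pos = sorted(set(positions))
    flagged: Set[int] = set()
    for i, start in enumerate(pos):
        # Any qualifying window can be slid so it starts at its first SNP,
        # so checking windows that begin at each SNP position is sufficient.
        j = bisect_right(pos, start + window - 1)  # SNPs in [start, start + window - 1]
        if j - i >= cluster:
            flagged.update(pos[i:j])
    return flagged


# One individual carries two private SNPs next to position 110, so 110 is filtered...
individual = [100, 110, 120, 500]
print(sorted(set(individual) - clustered_snps(individual)))  # [500]

# ...but among the population-level sites, 110 has no close neighbours and survives.
population_level = [110, 500]
print(sorted(set(population_level) - clustered_snps(population_level)))  # [110, 500]
```

Since the question reports that more than 40% of population-level SNPs remained missing even without window filtering, this sketch only illustrates the original hypothesis; it does not account for the remaining discrepancy.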
I realize that I am using joint genotyping here at my own risk, but I would greatly appreciate any insight into what might have happened. Also, if joint genotyping is not appropriate for RNA-Seq data, why is that, and what alternatives could I try for calling SNPs across multiple individual libraries?
Many thanks for taking time to consider my questions,