SNP calling from RNA-seq for population genetics analysis
My final goal is to call SNPs from RNA-seq data for population genetics analysis, but I have run into some problems.
For the population genetics analysis I need a single VCF file containing all the samples, with the call information for every sample at each locus. My first attempt was to follow the GATK Best Practices for RNA-seq (per-sample variant calling and filtering) and then merge the per-sample VCF files, but this doesn't work: the merged VCF file doesn't have all the per-locus information that the population genetics program needs.
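To illustrate what I think goes wrong with the merge, here is a toy Python sketch. The sample names, positions, genotypes, and the `naive_merge` helper are all made up for illustration and are not GATK output. My understanding is that a per-sample VCF lists only the sites where that sample has a variant, so after merging, a sample with no record at a site ends up as missing (`./.`) rather than as a reference call (`0/0`).

```python
# Toy per-sample call sets: (chrom, pos) -> genotype.
# Only variant sites are present, as in a VCF from per-sample variant calling.
# All names and values here are hypothetical.
sample_calls = {
    "sampleA": {("chr1", 100): "0/1", ("chr1", 250): "1/1"},
    "sampleB": {("chr1", 100): "1/1"},
    "sampleC": {("chr1", 400): "0/1"},
}


def naive_merge(calls_by_sample):
    """Take the union of sites across samples.

    A sample with no record at a site gets './.' (missing), because a
    variant-only call set cannot tell 'homozygous reference' apart from
    'no coverage'.
    """
    samples = sorted(calls_by_sample)
    sites = sorted({site for calls in calls_by_sample.values() for site in calls})
    return {
        site: {s: calls_by_sample[s].get(site, "./.") for s in samples}
        for site in sites
    }


merged = naive_merge(sample_calls)
for (chrom, pos), genotypes in merged.items():
    print(chrom, pos, *(genotypes[s] for s in sorted(genotypes)), sep="\t")
```

In this toy case sampleB and sampleC come out as `./.` at chr1:250, which is the kind of gap the population genetics program cannot work with.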
I also tried the EMIT_ALL_CONFIDENT_SITES option, but then the clustered SNPs filter doesn't work.
I thought of two possible solutions:
1) Do the SNP calling with all the samples together. However, the GATK guide clearly says: "At the moment, we do not recommend applying the GVCF-based workflow to RNAseq data because although there is no obvious obstacle to doing so, we have not validated that configuration. Therefore, we cannot guarantee the quality of results that this would produce."
2) Do the SNP calling per sample twice, with and without EMIT_ALL_CONFIDENT_SITES, and then merge the two resulting VCF files for each sample. This way I would have the filtered SNPs and the reference calls in one VCF file.
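To make option 2 more concrete, this is roughly the per-sample merge I have in mind, written as a minimal Python sketch on toy data. The `overlay_calls` helper and the example sites are hypothetical, and treating a variant that failed filtering as missing (`./.`) is my own assumption, not something taken from the GATK documentation.

```python
def overlay_calls(all_sites, filtered_snps):
    """Combine the two call sets of ONE sample (hypothetical helper).

    all_sites:     (chrom, pos) -> genotype from the run that emits every
                   confident site (reference calls and unfiltered variants).
    filtered_snps: (chrom, pos) -> genotype for SNPs that passed filtering.
    """
    combined = {}
    for site, genotype in all_sites.items():
        if genotype == "0/0":
            combined[site] = genotype             # keep confident reference call
        elif site in filtered_snps:
            combined[site] = filtered_snps[site]  # keep SNP that passed filters
        else:
            combined[site] = "./."                # assumption: failed SNP -> missing
    # Keep filtered SNPs that the all-sites run did not report at all.
    for site, genotype in filtered_snps.items():
        combined.setdefault(site, genotype)
    return dict(sorted(combined.items()))


# Toy data for one sample (made-up positions and genotypes).
all_sites = {("chr1", 100): "0/1", ("chr1", 150): "0/0", ("chr1", 250): "0/1"}
filtered_snps = {("chr1", 100): "0/1"}

combined = overlay_calls(all_sites, filtered_snps)
for (chrom, pos), genotype in combined.items():
    print(chrom, pos, genotype, sep="\t")
```

Here chr1:150 stays a reference call, chr1:100 keeps its filtered genotype, and chr1:250 (a variant that did not pass the filters) becomes missing. The combined per-sample files would then be merged across samples.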
What would you recommend? Is there a third option?
Thanks in advance!