Extracting consensus variants from a VCF with 27 RNA-seq samples from the same genotype
Is there a tool or recommended best practice for generating a consensus set of variants from multiple samples of the same genotype? In short, I have 27 RNA libraries from different individuals, different tissues, and different sequencing lanes, but all from the same genotype. I analyzed them following the RNA best practices listed, using the gVCF/HaplotypeCaller workflow (I understand this is unsupported, but it seemed the most appropriate).

The end result is a VCF with 27 “columns” for each SNP, one for each sample (for instance root_1, root_2, leaf_1, leaf_2, etc.). I would like to generate a VCF with a single column that combines the information from all the samples. Based on the website descriptions, CombineVariants does not seem appropriate, and I cannot see a way to do it with SelectVariants.

This is perhaps complex because, for a given SNP, different samples may carry different alleles even though they are from the same genotype, as they come from different individuals. I would prefer to select the most common variant if possible. My downstream goal is to generate a new reference genome for the genotype that all 27 samples are derived from.
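To make the "most common variant" idea concrete, here is a minimal sketch of a per-site majority vote over the sample columns of a multi-sample VCF. It is plain Python with no GATK involvement, and the function names (`normalize_gt`, `consensus_gt`, `collapse_line`) and the output sample name `consensus` are hypothetical, chosen only for illustration. It assumes tab-separated VCF records with a `GT` key in the FORMAT column, treats phased and unphased calls as equivalent, and ignores no-calls.

```python
from collections import Counter


def normalize_gt(gt):
    """Return an unphased, allele-sorted genotype string, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None  # no-call or partial call: excluded from the vote
    return "/".join(sorted(alleles, key=int))


def consensus_gt(sample_fields, gt_index=0):
    """Majority vote over called genotypes.

    Returns (genotype, votes_for_it, number_of_called_samples).
    Ties go to the genotype seen first (Counter.most_common ordering).
    """
    calls = []
    for field in sample_fields:
        parts = field.split(":")
        gt = normalize_gt(parts[gt_index]) if gt_index < len(parts) else None
        if gt is not None:
            calls.append(gt)
    if not calls:
        return "./.", 0, 0
    gt, votes = Counter(calls).most_common(1)[0]
    return gt, votes, len(calls)


def collapse_line(line, sample_name="consensus"):
    """Collapse one VCF line to a single sample column holding the voted GT."""
    line = line.rstrip("\n")
    if line.startswith("##"):
        return line  # meta-information lines pass through unchanged
    cols = line.split("\t")
    if line.startswith("#CHROM"):
        return "\t".join(cols[:9] + [sample_name])
    fmt = cols[8].split(":")
    if "GT" not in fmt:
        return "\t".join(cols[:8] + ["GT", "./."])
    gt, _votes, _called = consensus_gt(cols[9:], fmt.index("GT"))
    return "\t".join(cols[:8] + ["GT", gt])


if __name__ == "__main__":
    record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1/1:8\t1|1:12\t./.:0"
    print(collapse_line(record))
```

Note that this sketch copies the INFO column through unchanged, so any cohort-level annotations there would no longer match the single output column. It also does not weight votes by depth or genotype quality, which may matter for RNA-seq data where coverage varies by tissue.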