How to keep unique sample IDs when combining gVCF files?

I am working with RNA-seq data, and I need SNP calls for multiple samples (12). I first followed the best-practices method with HaplotypeCaller and then merged my VCF files. However, I realized that when I do this, any site that is variant in only some of my samples is marked as missing data for the samples where it is not variant. This is a problem because I need to know which of these samples are actually missing data and which match the reference.

I don't think the gVCF mode of HaplotypeCaller is fully supported for RNA-seq yet, but a paper doing similar work to mine used it, and it seemed to work well for them. So I gave it a try, but I keep running into the same problem: when I combine my .g.vcf files, all of my samples merge into one. I need a combined VCF file in which every sample ID remains unique. Is there a way to do this?

Thank you very much for your help, and I'm sorry if this has been asked before. I have done a lot of searching but can't seem to find this question.



  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)


    The GVCF workflow for RNA-seq data has not yet been validated (as far as I know), but I think in this case, it would be worth trying it out. Just make sure to validate your results at the end :smile:

    For the sample name issue, you can simply change the sample name in the GVCF manually.
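As a rough illustration of the manual rename, here is a minimal Python sketch. It assumes an uncompressed, single-sample gVCF in which the sample name appears only as the last column of the `#CHROM` header line. The function name `rename_sample` and the sample name `sample_01` are hypothetical, chosen only for this example.

```python
import io


def rename_sample(lines, new_name):
    """Yield VCF lines, renaming the single sample column in the #CHROM header."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 0-8 are CHROM..FORMAT; column 9 is the one sample column.
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column")
            fields[9] = new_name
            line = "\t".join(fields) + "\n"
        yield line


# Tiny in-memory gVCF used only to demonstrate the function.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n"
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=200\tGT:DP\t0/0:30\n"
)
renamed = list(rename_sample(demo, "sample_01"))
```

For a real file, the same generator could be fed an open file handle and its output written to a new file, once per sample with a different name each time. A compressed `.g.vcf.gz` would need to be opened with `gzip.open` in text mode instead, and any existing index for the file would likely need to be regenerated after the edit.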

    Let us know how things work out!

