
How to keep unique sample IDs when combining GVCF files?

I am working with RNA-seq data, and I need SNP calls for multiple samples (12). I first followed the Best Practices method with HaplotypeCaller and then merged my VCF files. However, I realized that when I do this, any site that is not a variant in all of my samples is marked as missing data for the non-variant samples. This is a problem because I need to know which of these samples are actually missing data and which match the reference.

I don't think the GVCF mode of HaplotypeCaller is completely supported for RNA-seq yet, but a paper doing similar work to mine used it, and it seemed to work well for them. Because of this, I gave it a try, but I keep running into the same problem: when I combine my .g.vcf files, all of my samples merge into one. I need a combined VCF file in which all of my sample IDs remain unique. Is there a way to do this?

Thank you very much for your help, and I'm sorry if this has been asked before. I have done a lot of searching but can't seem to find this question.



  • Sheila (Broad Institute; Member, Broadie)


    The GVCF workflow for RNA-seq data has not yet been validated (as far as I know), but in this case I think it is worth trying out. Just make sure to validate your results at the end.

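    For the sample name issue, you can simply change the sample name in each GVCF manually (it is the last column of the #CHROM header line), so that every file carries a unique name before you combine them.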
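The manual rename can also be scripted. The following is a minimal sketch, not an official GATK tool: it assumes each .g.vcf is uncompressed plain text and holds exactly one sample, and it rewrites only the sample column of the #CHROM header line. The function names (`rename_sample`, `rename_sample_in_file`) and the file names in the usage note are hypothetical. Samples most likely merge because every file carries the same sample name, so giving each file its own name before combining should keep them separate.

```python
def rename_sample(lines, new_name):
    """Yield VCF lines unchanged, except that the single sample column
    in the #CHROM header line is replaced by new_name."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # A VCF has 9 fixed columns (CHROM ... FORMAT); column 10 is the sample.
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column in the header")
            fields[9] = new_name
            line = "\t".join(fields) + "\n"
        yield line


def rename_sample_in_file(src_path, dst_path, new_name):
    """Copy an uncompressed single-sample (g)VCF to dst_path with the sample renamed."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(rename_sample(src, new_name))
```

For example, `rename_sample_in_file("sample01.g.vcf", "sample01.renamed.g.vcf", "sample01")` would write a copy whose sample column reads `sample01`. Because the header changes, any index built for the original file will not match the new one and would need to be regenerated.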

    Let us know how things work out!

