Merging VCF files after HaplotypeCaller->VariantFiltration

thkitapci

I have an RNA dataset (10 samples) on which I am calling variants. After running GATK4 on each sample independently, I want to merge the resulting VCF files for a downstream analysis that requires a multi-sample VCF file. I tried vcf-merge and bcftools, but I noticed that the INFO field of only one sample is included in the merged file.

How can I merge VCF files without losing any information about the samples?




  Tiffany_at_Broad
  • I have to "be around for a little while longer" before I can post links, but there are two pages/posts that may help you if you search for these titles:

    1) My previous forum post, titled, "Correct GATK4 tools to use for combining scattered gVCFS and VCFs from multiple calls" - discussion 5653.

    2) A post by delangel of Broad, titled, "Combining variants from different files into one" - discussion 53.

    As Tiffany suggests, you can use CombineVariants from GATK3 to achieve this.
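
A minimal sketch of what that GATK3 call might look like, written as a small Python helper that only assembles the command line. The file names (`reference.fasta`, `sample1.filtered.vcf`, `merged.vcf`), the jar location, and the helper name `build_combine_variants_cmd` are hypothetical placeholders; the flags (`-T CombineVariants`, `-R`, `--variant`, `-o`) follow the GATK3 usage pattern, but check them against the documentation for your GATK3 version before running.

```python
import shlex
from typing import List, Sequence


def build_combine_variants_cmd(
    reference: str,
    vcfs: Sequence[str],
    output: str,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # assumed jar location
) -> List[str]:
    """Assemble a GATK3 CombineVariants command as an argument list.

    Nothing is executed here; the list can be passed to subprocess.run().
    """
    if len(vcfs) < 2:
        raise ValueError("need at least two VCF files to combine")

    cmd = ["java", "-jar", gatk_jar, "-T", "CombineVariants", "-R", reference]
    # One --variant argument per single-sample VCF.
    for vcf in vcfs:
        cmd += ["--variant", vcf]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names for the 10 per-sample VCFs.
    samples = [f"sample{i}.filtered.vcf" for i in range(1, 11)]
    cmd = build_combine_variants_cmd("reference.fasta", samples, "merged.vcf")
    print(shlex.join(cmd))  # inspect the command before running it
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if any path contains spaces.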
  thkitapci

    Hi Tiffany and bramblepuss,

    Thanks for your replies and thanks for the links to other posts.

    It seems CombineVariants should work for me. I generated my VCF files with GATK4; can I use CombineVariants from GATK3 to combine them?


  Tiffany_at_Broad

    Yes, this should be fine @thkitapci
    You could also try GatherVcfs from GATK4, but I've seen folks report that it cannot concatenate unsorted VCFs or merge differing INFO fields correctly.

