Merge VCF Files
BACKGROUND: I am working with a public data set that consists of VCF files (I cannot go back upstream in the process). The files are split by patient sample. For each patient, the 0/0 calls, which list NON_REF as the ALT, are further split by chromosome, while the variant calls (0/1, 1/1, and so forth) are in a separate genome-wide VCF file. I concatenated all the files for each patient, so within each patient's file the ALT for a 0/0 call is NON_REF and the ALT for a variant call is always an actual allele, such as "G" or "TT". Now I wish to merge my 5000 patient samples into a single VCF file.
1. I went back to an older version of GATK (3.5), used CombineVariants, and got flagged with this message:
ERROR MESSAGE: CombineVariants should not be used to merge gVCFs produced by the HaplotypeCaller; use CombineGVCFs instead
2. I also tried GATK4, used CombineGVCFs, and got flagged with this message:
ERROR MESSAGE: The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 15274; please use the Haplotype Caller with gVCF output to generate appropriate records
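To see which records trigger the CombineGVCFs complaint, a minimal sketch in plain Python (no GATK involved) could scan a concatenated per-patient file for records whose ALT column does not include `<NON_REF>`. The function name `records_missing_non_ref` and the two example records are hypothetical; only position 15274 comes from the error message above, and the chromosome and alleles are made up for illustration.

```python
import io


def records_missing_non_ref(vcf_lines):
    """Yield (chrom, pos, alt) for data records whose ALT lacks <NON_REF>."""
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # ALT may hold several comma-separated alleles.
        if "<NON_REF>" not in alt.split(","):
            yield chrom, pos, alt


# Made-up two-record example: one reference block and one variant call.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n"
    "1\t15000\t.\tA\t<NON_REF>\t.\t.\tEND=15273\tGT\t0/0\n"
    "1\t15274\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"
)

for chrom, pos, alt in records_missing_non_ref(example):
    print(chrom, pos, alt)
```

In this example only the variant record is reported, which matches my situation: the 0/0 records carry `<NON_REF>`, but the variant records list only the called allele.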
QUESTION: How do I solve this and merge my files? Is there a VCF merge function that can handle a mix of calls that sometimes list NON_REF as the ALT and sometimes list an actual value for ALT?
P.S. bcftools will not let me do this. vcf-tools merge does handle it, but it is very slow, so I am hoping to use GATK.