CatVariants does not combine header

rcholic Denver Member Posts: 68

Below is the command:

java -cp $CLASSPATH/GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
-R GATK_ref/hg19.fasta \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-1.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-2.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-3.vcf \
-out ../GATK/VQSR/parallel_batch/combined_raw.snps_indels.vcf \
-log ../GATK/VQSR/parallel_batch/log/combined.log \
-assumeSorted

After this, the combined_raw.snps_indels.vcf file contains only the header from raw.snps_indels-1.vcf. What might be wrong?

Answers

  • erikfas Member Posts: 9

    A related issue I had was that when I was trying to concatenate two VCFs from the same sample, one containing the (filtered) variants and one with non-variants, I got an error saying that the FS filter field wasn't in the header. This was because I had set the non-filtered VCF as first input, making the script take the header from that file, which of course didn't have a FS filter field (because no VariantFiltration had been run on it). It was easily solved by just reversing the input ordering, making the script take the most complete header available. Just an FYI if somebody else runs into a similar problem!
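
    The header mismatch described here can be spotted before concatenating. The following is only a minimal sketch: the file names and toy header contents are hypothetical, and `filter_ids` is a helper invented for illustration. It lists the FILTER IDs declared in each file's header, so you can see which input carries the more complete header.

    ```shell
    # Toy stand-ins for the two inputs; file names and contents are hypothetical.
    tmp=$(mktemp -d)
    printf '##fileformat=VCFv4.1\n##FILTER=<ID=FS,Description="Example filter">\n' > "$tmp/filtered.vcf"
    printf '##fileformat=VCFv4.1\n' > "$tmp/unfiltered.vcf"

    # Print the FILTER IDs declared in a VCF header, one per line.
    filter_ids() {
        grep '^##FILTER=<ID=' "$1" | sed 's/^##FILTER=<ID=\([^,>]*\).*/\1/'
    }

    # Show the declared FILTER IDs for each input side by side.
    for f in "$tmp/filtered.vcf" "$tmp/unfiltered.vcf"; do
        echo "$(basename "$f"): $(filter_ids "$f" | tr '\n' ' ')"
    done
    ```

    The input whose header declares every ID used in the records is the one to list first.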

  • Geraldine_VdAuwera Administrator, Dev Posts: 10,305

    FYI, the behavior of CatVariants regarding headers is documented in the tool doc.

    More importantly, this tool is not appropriate for the use you're making of it. As noted above, the tool expects that the input VCFs represent non-overlapping intervals. The way you're using it, that expectation is not satisfied and the output VCF will most probably not be sorted correctly. You should be using CombineVariants for this.

    Geraldine Van der Auwera, PhD
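
For reference, a CombineVariants call for the variant plus non-variant case might look like the sketch below. It is not taken from the thread: the jar location and the three VCF file names are assumptions, and only the reference path comes from the original command. Depending on the GATK version, merging files that share a sample may also need an option such as -genotypeMergeOptions or --assumeIdenticalSamples, so check the tool doc before relying on it.

```shell
# Assumed jar location; override by setting GATK_JAR in the environment.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"

# Build the argument list once so it can be inspected before running.
# The VCF file names below are hypothetical placeholders.
set -- java -jar "$GATK_JAR" -T CombineVariants \
    -R GATK_ref/hg19.fasta \
    -V filtered_variants.vcf \
    -V non_variants.vcf \
    -o combined.vcf

# Dry run: print the command that would be executed.
echo "$@"

# Execute only when the jar is actually present.
if [ -f "$GATK_JAR" ]; then
    "$@"
fi
```

Unlike CatVariants, CombineVariants is run through the main engine with -T, and its output file is given with -o rather than -out.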
