Combine Variants from different sample variant files


Hi,

I have two VCF files: 1000G SNPs (bi-allelic) and MYSAMPLE variant SNPs (bi-allelic). The two files contain non-overlapping samples.

I want to generate a combined VCF file containing all samples (1000G + MYSAMPLE), restricted to the sites that are present in both VCF files.

I tried the SelectVariants --concordance option. My command lines were:

Command-line 1:

java -Xmx8g -jar $GATK -T SelectVariants \
-nt 10 \
-R $REF \
-V $MYSAMPLE/VQSR_PHASE2_snp99.5-Combine_Biallelic-MAF-0.01.recode.vcf \
--concordance $VAR1/ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf \
-o $OUTPUT1/VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf \
-selectType SNP \
-restrictAllelesTo BIALLELIC

Command-line 2:

java -Xmx8g -jar $GATK -T SelectVariants \
-nt 10 \
-R $REF \
-V $VAR/ALL.WGS.chr.phase3_biallelic.vcf \
--concordance $MYSAMPLE1/VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf \
-o $OUTPUT/ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf \
-selectType SNP \
-restrictAllelesTo BIALLELIC

I was expecting the same number of variants in the two resulting concordant files (with different sample genotypes), but the counts differ slightly:
ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf: 7533554
VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf: 7533549

Later, when I combined the two VCF files with CombineVariants, the number of variants decreased further to 7511112. My command line was:

java -Xmx8g -jar $GATK -T CombineVariants \
-nt 10 \
-R $REF \
-V $VAR/ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf \
-V $MYSAMPLE/VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf \
-genotypeMergeOptions UNIQUIFY \
-o $OUTPUT/ALL_PHASE2_snp99.5_Biallelic-MAF-0.01-COMBINE-1KGP.vcf
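
To narrow down where the counts diverge, one check I could run is to compare the two concordant files by a CHROM:POS:REF:ALT key and to look for sites that occur more than once within a file. The following is only a rough sketch using standard Unix tools; the function names and the file names in the usage comment are my own placeholders, and it assumes uncompressed, tab-delimited VCFs. I am only guessing that duplicate records or alleles that differ at the same position could explain the differences.

```shell
# Hypothetical helper: print one CHROM:POS:REF:ALT key per record, sorted.
# Assumes an uncompressed, tab-delimited VCF (header lines start with '#').
site_keys() {
    grep -v '^#' "$1" | awk -F'\t' '{ print $1 ":" $2 ":" $4 ":" $5 }' | LC_ALL=C sort
}

# Print keys that occur more than once within a single file.
duplicate_sites() {
    site_keys "$1" | uniq -d
}

# Print keys that are present in the first file but absent from the second.
sites_only_in_first() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    site_keys "$1" | uniq > "$tmp_a"
    site_keys "$2" | uniq > "$tmp_b"
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Usage (file names are placeholders for the two concordant outputs):
#   sites_only_in_first 1kg_concord.vcf mysample_concord.vcf
#   sites_only_in_first mysample_concord.vcf 1kg_concord.vcf
#   duplicate_sites 1kg_concord.vcf
```

If the keys printed by these checks share a position but differ in REF or ALT, or if the same key appears twice in one file, that would at least tell me which records are being treated differently by the two tools.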

Could you please help me use the correct command lines to solve this problem?



