Combine Variants from different sample variant files


Hi,

I have two VCF files: 1000G SNPs (bi-allelic) and MYSAMPLE variant SNPs (bi-allelic). The two files contain non-overlapping samples.

I want to generate a combined VCF file containing all samples (1000G + MYSAMPLE), restricted to the sites that are present in both VCF files.

I tried the SelectVariants --concordance option. My command lines were:

Command-line 1:

java -Xmx8g -jar $GATK -T SelectVariants \
-nt 10 \
-R $REF \
-V $MYSAMPLE/VQSR_PHASE2_snp99.5-Combine_Biallelic-MAF-0.01.recode.vcf \
--concordance $VAR1/ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf \
-o $OUTPUT1/VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf \
-selectType SNP \
-restrictAllelesTo BIALLELIC

Command-line 2:

java -Xmx8g -jar $GATK -T SelectVariants \
-nt 10 \
-R $REF \
-V $VAR/ALL.WGS.chr.phase3_biallelic.vcf \
--concordance $MYSAMPLE1/VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf \
-o $OUTPUT/ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf \
-selectType SNP \
-restrictAllelesTo BIALLELIC

I expected the two resulting concordant files to contain the same number of variants (with different sample genotypes), but the counts differ:
ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf: 7533554
VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf: 7533549
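
One way to see which sites account for a difference like this is to compare the site keys of the two files directly. The following is a minimal sketch, not a GATK feature: the function names site_keys and sites_only_in_first are hypothetical, and it assumes uncompressed, tab-delimited VCF files and that a site is identified by CHROM, POS, REF and ALT.

```shell
# Hypothetical helpers (not part of GATK); assume uncompressed, tab-delimited VCFs.

# Print CHROM:POS:REF:ALT for every record in a VCF, skipping header lines,
# sorted and de-duplicated so the output can be compared with comm.
site_keys() {
    awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# Print the site keys that occur in the first VCF but not in the second.
sites_only_in_first() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    site_keys "$1" > "$keys_a"
    site_keys "$2" > "$keys_b"
    LC_ALL=C comm -23 "$keys_a" "$keys_b"
    rm -f "$keys_a" "$keys_b"
}
```

Calling sites_only_in_first with the two concordance outputs as arguments, once in each order, would list the records that appear in only one of the files.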

Later, when I combined the two VCF files with CombineVariants, the number of variants decreased further to 7511112. My command line was:

java -Xmx8g -jar $GATK -T CombineVariants \
-nt 10 \
-R $REF \
-V $VAR/ALL.WGS.chr.phase3_biallelic-concordMYSAMPLE.vcf \
-V $MYSAMPLE/VQSR_PHASE2_snp99.5_Biallelic-MAF-0.01-CONCORD-1KGPnew.vcf \
-genotypeMergeOptions UNIQUIFY \
-o $OUTPUT/ALL_PHASE2_snp99.5_Biallelic-MAF-0.01-COMBINE-1KGP.vcf
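
One possible explanation for the lower count after merging, offered here as an assumption rather than a confirmed cause, is that records sharing a position but carrying different REF or ALT alleles in the two inputs end up in a single merged record. A minimal sketch for listing such positions follows; the function name positions_with_allele_mismatch is hypothetical, the files are assumed to be uncompressed and tab-delimited, and the first file's sites are held in memory, which may be heavy for several million records.

```shell
# Hypothetical helper (not part of GATK); assumes uncompressed, tab-delimited VCFs.
# Lists CHROM:POS pairs found in both files whose REF:ALT differ, printing
# the position, the alleles in the first file, and the alleles in the second.
positions_with_allele_mismatch() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        FILENAME == ARGV[1] {                           # first file: remember alleles per position
            seen[$1 ":" $2] = $4 ":" $5
            next
        }
        {                                               # second file: report mismatching alleles
            key = $1 ":" $2
            if ((key in seen) && seen[key] != $4 ":" $5)
                print key "\t" seen[key] "\t" $4 ":" $5
        }
    ' "$1" "$2"
}
```

Piping the output through wc -l would give the number of shared positions with differing alleles, which could then be compared against the drop in the variant count.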

Could you please help me use the command line correctly to solve this problem?


