a problem about gVCF

wubinwubin, China, Member

A position that is covered by the target region file "./target.bed" is missing from the gVCF file, but after GenotypeGVCFs a SNP turns up at this position.

I ran the "HaplotypeCaller" walker to generate a gVCF file. The command line was as follows:

java -Xmx15g \
-jar ./GATK/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R ./hg19/ucsc.hg19.fasta \
-I ./output.recal.cleaned.bam \
--dbsnp ./Data/dbsnp_138.hg19.excluding_sites_after_129.vcf \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-L ./target.bed \
-o ./SNP_Indel_HaplotypeCaller.g.vcf

Then I used "GenotypeGVCFs" to generate a VCF file that contains only variants. The command line was as follows:

java -Xmx10g -jar ./GATK/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R ./hg19/ucsc.hg19.fasta \
--variant ./SNP_Indel_HaplotypeCaller.g.vcf \
-stand_call_conf 30 \
-stand_emit_conf 10 \
-o ./pedi_merged.vcf

In the file "pedi_merged.vcf" I found many variants that cannot be found in the corresponding gVCF file, such as:


chr10 126089434 . G A 36.78 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=3;FS=0.000;GQ_MEAN=26.00;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-7.360e-01;NCC=0;QD=12.26;ReadPosRankSum=0.736;SOR=1.179 GT:AD:DP:GQ:PL 0/1:1,2:3:26:65,0,26

This SNP cannot be found in the file "SNP_Indel_HaplotypeCaller.g.vcf". Around this position, that file contains only:

chr10 126089432 . G <NON_REF> . . END=126089433 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,139

chr10 126089435 . T <NON_REF> . . END=126089437 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,171

So not only the SNP but the position chr10:126089434 itself is absent from the gVCF file: the two reference blocks cover 126089432-126089433 and 126089435-126089437. Yet after "GenotypeGVCFs" I get a SNP at a position for which the corresponding gVCF file holds no record.
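To make the gap explicit, here is a minimal Python sketch (not part of GATK; the function names `covered_span` and `is_covered` are hypothetical) that checks whether a position falls inside any of the gVCF records quoted above. It assumes whitespace-separated lines and that a reference block ends at the value of its END= tag.

```python
def covered_span(record):
    """Return (chrom, start, end), 1-based inclusive, for one gVCF data line."""
    fields = record.split()
    chrom, pos = fields[0], int(fields[1])
    # Reference blocks carry END=<pos> in INFO. Every field is scanned so that
    # a missing or symbolic ALT column does not shift the lookup.
    for field in fields[2:]:
        for item in field.split(";"):
            if item.startswith("END="):
                return chrom, pos, int(item[4:])
    # No END tag: the record spans the REF allele only.
    return chrom, pos, pos + len(fields[3]) - 1


def is_covered(records, chrom, pos):
    """True if any record's span includes the 1-based position."""
    for record in records:
        r_chrom, start, end = covered_span(record)
        if r_chrom == chrom and start <= pos <= end:
            return True
    return False


# The two records from the gVCF; only CHROM, POS and END= are read.
records = [
    "chr10 126089432 . G . . END=126089433 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,139",
    "chr10 126089435 . T . . END=126089437 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,171",
]

for pos in (126089433, 126089434, 126089435):
    print(pos, is_covered(records, "chr10", pos))
```

With these two records, 126089433 and 126089435 are covered but 126089434 is not, which matches what I see by eye in the file.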

When I ran "HaplotypeCaller" to generate the gVCF file, I used the "-L ./target.bed" argument. The file "./target.bed" contains the position chr10:126089434, within this interval:


chr10 126089161 126089800
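As a sanity check on the interval, the sketch below tests whether a 1-based VCF position lies inside a BED line. It assumes the usual BED convention (0-based start, end-exclusive), so a 1-based position p is inside when start < p <= end. The helper name `bed_contains` is hypothetical.

```python
def bed_contains(bed_line, chrom, pos):
    """True if the 1-based position lies inside the BED interval.

    BED uses a 0-based start and an exclusive end, so the interval
    covers 1-based positions start+1 .. end.
    """
    b_chrom, start, end = bed_line.split()[:3]
    return b_chrom == chrom and int(start) < pos <= int(end)


target = "chr10 126089161 126089800"
print(bed_contains(target, "chr10", 126089434))
```

Under that convention the interval spans 1-based positions 126089162 to 126089800, so 126089434 is well inside the target region.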

So a position that lies inside "./target.bed" is missing from the gVCF file, yet after GenotypeGVCFs a SNP turns up at that position! Can anyone tell me whether something is wrong with my command line, or whether this is a problem with GATK's "HaplotypeCaller"?

btw, my GATK version is "The Genome Analysis Toolkit (GATK) v3.3-0-g37228af"


