A problem about gVCF

wubinwubin ChinaMember

A position that exists in the target region file "./target.bed" does not exist in the gVCF file, but after GenotypeGVCFs a SNP turns up at this position.

I'm running the "HaplotypeCaller" walker to generate a gVCF file. The command line was as follows:

java -Xmx15g -Djava.io.tmpdir=pwd/tmp \
-jar ./GATK/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R ./hg19/ucsc.hg19.fasta \
-I ./output.recal.cleaned.bam \
--dbsnp ./Data/dbsnp_138.hg19.excluding_sites_after_129.vcf \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
-L ./target.bed \
-o ./SNP_Indel_HaplotypeCaller.g.vcf

Then I used "GenotypeGVCFs" to generate a VCF file that contains only variants. The command line was as follows:

==============================================
java -Xmx10g -Djava.io.tmpdir=pwd/tmp -jar ./GATK/GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R ./hg19/ucsc.hg19.fasta \
--variant ./SNP_Indel_HaplotypeCaller.g.vcf \
-stand_call_conf 30 \
-stand_emit_conf 10 \
-o ./pedi_merged.vcf

In the file "pedi_merged.vcf", I found many variants that cannot be found in the corresponding gVCF file, such as:

==============================================

chr10 126089434 . G A 36.78 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=3;FS=0.000;GQ_MEAN=26.00;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-7.360e-01;NCC=0;QD=12.26;ReadPosRankSum=0.736;SOR=1.179 GT:AD:DP:GQ:PL 0/1:1,2:3:26:65,0,26

This SNP cannot be found in the file "SNP_Indel_HaplotypeCaller.g.vcf". In that file, we can see:

chr10 126089432 . G . . END=126089433 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,139

chr10 126089435 . T . . END=126089437 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,171

We can see that not only the SNP but even the position "chr10 126089434" is absent from the gVCF file. Yet after "GenotypeGVCFs" we get a SNP that has no information in the corresponding gVCF file.

When I used the "HaplotypeCaller" walker to generate the gVCF file, I used the "-L ./target.bed" argument. The file "./target.bed" contains the position "chr10 126089434":
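The gap between the two reference blocks can be double-checked with a small script. This is only a sketch in Python, assuming that each gVCF record covers the 1-based inclusive range from POS to END (or to the end of the REF allele when there is no END tag). The helper names `record_span` and `is_covered` are made up for illustration and are not part of GATK.

```python
import re

# END=<int> as it appears in the INFO column of a gVCF reference block.
END_RE = re.compile(r"(?:^|;)END=(\d+)")


def record_span(line):
    """Return (chrom, start, end) for one gVCF data line, 1-based inclusive."""
    fields = line.split()  # split on any whitespace so pasted lines also parse
    chrom, start, ref = fields[0], int(fields[1]), fields[3]
    end = start + len(ref) - 1  # default span: length of the REF allele
    for field in fields[4:]:
        match = END_RE.search(field)
        if match:
            end = int(match.group(1))
            break
    return chrom, start, end


def is_covered(gvcf_lines, chrom, pos):
    """True if any record in gvcf_lines spans the given position."""
    for line in gvcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        rec_chrom, start, end = record_span(line)
        if rec_chrom == chrom and start <= pos <= end:
            return True
    return False


# The two records quoted above, exactly as pasted.
records = [
    "chr10 126089432 . G . . END=126089433 GT:DP:GQ:MIN_DP:PL 0/0:4:12:4:0,12,139",
    "chr10 126089435 . T . . END=126089437 GT:DP:GQ:MIN_DP:PL 0/0:5:15:5:0,15,171",
]
print(is_covered(records, "chr10", 126089434))  # False
print(is_covered(records, "chr10", 126089433))  # True
```

Under that assumption, the first block ends at 126089433 and the next one starts at 126089435, so neither record spans 126089434.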

==============================================

chr10 126089161 126089800
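To rule out a coordinate mix-up, the interval check can be sketched in Python as well. This assumes the usual BED convention of a 0-based start and an exclusive end, so the interval above would cover the 1-based positions 126089162 through 126089800. `bed_contains` is a hypothetical helper written for this post, not a GATK function.

```python
def bed_contains(bed_lines, chrom, pos):
    """True if the 1-based position pos lies inside any BED interval.

    Assumes BED coordinates: 0-based start, end-exclusive.
    """
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or line.startswith("#"):
            continue  # skip comments and malformed lines
        start, end = int(fields[1]), int(fields[2])
        # 1-based pos is inside [start, end) when start < pos <= end.
        if fields[0] == chrom and start < pos <= end:
            return True
    return False


target = ["chr10 126089161 126089800"]
print(bed_contains(target, "chr10", 126089434))  # True
```

Even with the off-by-one difference between BED and VCF coordinates, 126089434 sits well inside the target interval.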

So a position that exists in "./target.bed" does not exist in the gVCF file, but after GenotypeGVCFs a SNP turns up at this position! Can anyone tell me what is wrong with my command line, or whether there is some other problem with GATK "HaplotypeCaller"?

By the way, my GATK version is "The Genome Analysis Toolkit (GATK) v3.3-0-g37228af".
