Why is HaplotypeCaller not calling this Sanger sequencing confirmed variant?
I am using HaplotypeCaller in GVCF mode followed by GenotypeGVCFs to call variants in high-depth targeted sequencing data. Some variants in this data have been confirmed by Sanger sequencing, and I am using them as a gold standard to evaluate the pipeline. HaplotypeCaller is missing some of these confirmed variants and I would like to understand why.
Here is the record in the g.vcf file for a site that I thought should have been called as a variant:
15 75644465 . C <NON_REF> . . END=75644465 GT:DP:GQ:MIN_DP:PL 0/0:546:0:546:0,0,1860
The allele depth in the pileup is 335 C / 266 T. It looks like the variant should have been called, but it is not present in the final VCF file. I have attached a pileup at that site.
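To make the numbers concrete, here is a minimal Python sketch (not part of GATK; the helper names `parse_sample` and `gq_from_pl` are made up for illustration) that unpacks the FORMAT and sample fields of the record above and computes the alternate allele fraction from the pileup counts. It assumes GQ is the gap between the two smallest PL values, capped at 99, which is my understanding of how it is reported.

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key with the corresponding sample value."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def gq_from_pl(pl, cap=99):
    """Assumed GQ rule: second-smallest PL minus smallest PL, capped at `cap`."""
    ordered = sorted(pl)
    return min(ordered[1] - ordered[0], cap)


# FORMAT and sample columns copied from the g.vcf record above.
fields = parse_sample("GT:DP:GQ:MIN_DP:PL", "0/0:546:0:546:0,0,1860")
pl = [int(x) for x in fields["PL"].split(",")]
gq = gq_from_pl(pl)

# Allele counts taken from the pileup at 15:75644465.
ref_count, alt_count = 335, 266
alt_fraction = alt_count / (ref_count + alt_count)

print("GT:", fields["GT"], "DP:", fields["DP"], "PL:", pl)
print("GQ recomputed from PL:", gq)
print("ALT allele fraction from pileup: %.3f" % alt_fraction)
```

If that reading of PL is right, the first two likelihoods being tied at 0 means the hom-ref and het genotypes were scored as equally likely in this reference block, so the 0/0 call carries no confidence (GQ 0), even though roughly 44% of the pileup reads carry the T allele.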
Here are the commands I ran with GATK 3.4-0. I followed the Best Practices pipeline except for duplicate removal, which I skipped because of the way the platform I am using works.
$JAVA -Xmx2048m -jar $GATK -T HaplotypeCaller -R $REF -I $BAM --dbsnp $DBSNP --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -L $REGION -o $GVCF
$JAVA -Xmx2048m -jar $GATK -T GenotypeGVCFs -R $REF -L $REGION -o $OUT/sample.vcf --variant $GVCF --dbsnp $DBSNP
Do you know why this variant was not called heterozygous?