Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
HaplotypeCaller 3.3-0 Homozygous variant calls
I just finished running HaplotypeCaller version 3.3-0 separately on 6 exome samples with the new best practices.
java -Xmx8g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19/hg19_Ordered.fa -I K87/HG19_Analysis/K87-929_final.recalibrated_final.bam --dbsnp dbsnp_138_hg19_Ordered.vcf --pair_hmm_implementation VECTOR_LOGLESS_CACHING -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 --output_mode EMIT_VARIANTS_ONLY -gt_mode DISCOVERY --pcr_indel_model CONSERVATIVE -o ./Haplotypes_929.vcf
Many Variant sites are called as homozygous alt (1/1), but none of these sites that are processed to infer haplotype are called as homozygous alt in their PGT field, they are all called as hets, PGT=0|1. for example:
The allelic depths agree with the phased genotype but out of all 6 exomes processed, not a single 1/1 is also phased as 1|1.
I checked all output vcfs with a simple grep combo:
grep 'PGT' Haplotypes_929.vcf | grep '1/1' - | grep '0|1' - | wc -l = 19046
grep 'PGT' Haplotypes_929.vcf | grep '1/1' - | grep '1|0' - | wc -l = 79
grep 'PGT' Haplotypes_929.vcf | grep '1/1' - | grep '1|1' - | wc -l = 0
This seemed odd, but I continued with GenotyeGVCF:
java -Xmx32g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R hg19/hg19_Ordered.fa -V Haplotypes_450.vcf -V Haplotypes_452.vcf -V Haplotypes_925.vcf -V Haplotypes_926.vcf -V Haplotypes_927.vcf -V Haplotypes_929.vcf -D dbsnp_138_hg19_Ordered.vcf -ped K87/HG19_Analysis/K87_6.ped -o Haplotypes_K87_GVCFs.vcf
I'm looking at the output vcf as it's being generated and now there are homozygous alt calls but they conflict with the associated Allelic Depths:
.... GT:AD:DP:GQ:PGT:PID:PL .... 1/1:0,29:29:85:1|1:33957151_G_T:948,85,0 .....
chr1 33957152 rs4403594 T G 3166.96 . AC=12;AF=1.00;AN=12;DB;DP=99;FS=0.000;GQ_MEAN=48.50;GQ_STDDEV=27.55;MLEAC=12;MLEAF=1.00;MQ=39.65;MQ0=0;NCC=0;QD=32.32;SOR=0.693 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,9:9:27:.:.:330,27,0 1/1:0,5:5:15:.:.:141,15,0 1/1:0,29:29:85:1|1:33957151_G_T:948,85,0 1/1:0,20:20:60:.:.:722,60,0 1/1:0,24:24:71:.:.:685,71,0 1/1:0,11:11:33:.:.:366,33,0
Can you help me interpret what seems to me as conflicting results?