HaplotypeCaller 3.3-0 Homozygous variant calls

PatrickJReed, University of Chicago, Member

I just finished running HaplotypeCaller version 3.3-0 separately on 6 exome samples with the new best practices.

java -Xmx8g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19/hg19_Ordered.fa -I K87/HG19_Analysis/K87-929_final.recalibrated_final.bam --dbsnp dbsnp_138_hg19_Ordered.vcf --pair_hmm_implementation VECTOR_LOGLESS_CACHING -ERC GVCF -variant_index_type LINEAR -variant_index_parameter 128000 --output_mode EMIT_VARIANTS_ONLY -gt_mode DISCOVERY --pcr_indel_model CONSERVATIVE -o ./Haplotypes_929.vcf

Many variant sites are called as homozygous alt (1/1), but none of the sites that are processed to infer haplotypes are called as homozygous alt in their PGT field; they are all called as hets (PGT=0|1). For example:

GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,29,0:29:93:0|1:121483392_C_G:1331,93,0,1331,93,1331:0,0,13,16

The allelic depths agree with the phased genotype but out of all 6 exomes processed, not a single 1/1 is also phased as 1|1.

I checked all output vcfs with a simple grep combo:
grep 'PGT' Haplotypes_929.vcf | grep '1/1' - | grep '0|1' - | wc -l = 19046
grep 'PGT' Haplotypes_929.vcf | grep '1/1' - | grep '1|0' - | wc -l = 79
grep 'PGT' Haplotypes_929.vcf | grep '1/1' - | grep '1|1' - | wc -l = 0
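
The grep chain above matches `1/1` and `0|1` anywhere on a line, so in a multi-sample file it could pair a GT from one sample with a PGT from another. A minimal sketch of a stricter tally, assuming a standard VCF layout (FORMAT in column 9, samples from column 10 onward); the function name `tally_gt_pgt` is hypothetical and not part of GATK:

```python
from collections import Counter


def tally_gt_pgt(vcf_lines):
    """Count (GT, PGT) pairs per sample, skipping samples with no PGT value."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        # VCF is tab-delimited; split() also copes with space-separated pastes.
        cols = line.split()
        if len(cols) < 10:
            continue  # no FORMAT/sample columns on this record
        keys = cols[8].split(":")
        if "PGT" not in keys:
            continue  # record carries no phasing information
        for sample in cols[9:]:
            values = dict(zip(keys, sample.split(":")))
            pgt = values.get("PGT", ".")
            if pgt != ".":
                counts[(values.get("GT", "."), pgt)] += 1
    return counts
```

To use it on one of the files above, open the file and pass the handle, for example `with open("Haplotypes_929.vcf") as fh: print(tally_gt_pgt(fh))`. The result maps each (GT, PGT) pair to the number of sample entries carrying it.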

This seemed odd, but I continued with GenotypeGVCFs:

java -Xmx32g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R hg19/hg19_Ordered.fa -V Haplotypes_450.vcf -V Haplotypes_452.vcf -V Haplotypes_925.vcf -V Haplotypes_926.vcf -V Haplotypes_927.vcf -V Haplotypes_929.vcf -D dbsnp_138_hg19_Ordered.vcf -ped K87/HG19_Analysis/K87_6.ped -o Haplotypes_K87_GVCFs.vcf

I'm looking at the output VCF as it's being generated, and now there are homozygous alt calls, but they conflict with the associated allelic depths:

.... GT:AD:DP:GQ:PGT:PID:PL .... 1/1:0,29:29:85:1|1:33957151_G_T:948,85,0 .....

Full Line:
chr1 33957152 rs4403594 T G 3166.96 . AC=12;AF=1.00;AN=12;DB;DP=99;FS=0.000;GQ_MEAN=48.50;GQ_STDDEV=27.55;MLEAC=12;MLEAF=1.00;MQ=39.65;MQ0=0;NCC=0;QD=32.32;SOR=0.693 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,9:9:27:.:.:330,27,0 1/1:0,5:5:15:.:.:141,15,0 1/1:0,29:29:85:1|1:33957151_G_T:948,85,0 1/1:0,20:20:60:.:.:722,60,0 1/1:0,24:24:71:.:.:685,71,0 1/1:0,11:11:33:.:.:366,33,0

Can you help me interpret what seem to me to be conflicting results?
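
One way to compare the genotypes with the allelic depths is to compute, for each sample, the fraction of reads in AD that support a non-reference allele. The sketch below assumes the standard FORMAT/sample layout and that AD lists the reference depth first; `alt_fractions` is a hypothetical helper name, not a GATK function:

```python
def alt_fractions(vcf_line):
    """Return (GT, AD depths, alt read fraction) for each sample on one VCF record."""
    cols = vcf_line.split()  # tab-delimited in a real file; split() also handles spaces
    keys = cols[8].split(":")
    results = []
    for sample in cols[9:]:
        values = dict(zip(keys, sample.split(":")))
        # AD is "ref,alt1[,alt2...]"; a missing value (".") yields an empty list.
        depths = [int(d) for d in values.get("AD", "").split(",") if d.isdigit()]
        total = sum(depths)
        # Fraction of AD reads supporting any alt allele; None when there is no depth.
        fraction = sum(depths[1:]) / total if total else None
        results.append((values.get("GT", "."), depths, fraction))
    return results
```

Applied to the full line quoted above, each sample's AD starts with 0 reference reads, so the alt fraction would be 1.0 for all six samples. Note that AD counts only the reads the genotyper considered informative, so it can differ from DP.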




  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin

    Hi Patrick,

    The PGT tags are always shown as heterozygous in the GVCF as an intermediate step; what counts is the final result in the final VCF. I know that seems confusing, but it's a design decision made for technical reasons.

    As for why some of your hom-var calls have allelic ratios that are not clear cut, it's hard to say without seeing the data. The most frequent reason is when there's evidence of bias that leads the genotyper to discount some reads.
