We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Using HC GVCF mode for determining rate of heterozygosity

msrmsr Stowers Institute for Medical ResearchMember

Hi GATK team,

I was hoping I could get some insight on determining rate of heterozygosity from a gvcf file. We have three diploid lizard samples. Each was run through our GATK pipeline using HC in GVCF mode with -ERC GVCF followed by joint genotyping using all three samples.

I want to determine the rate of heterozygosity in each lizard by counting the number of heterozygous sites and dividing by the number of callable sites (i.e not './.') for every position in the genome (whether or not it is a variant).

However after reading a response to a question on the forum http://gatkforums.broadinstitute.org/discussion/4017/what-is-a-gvcf-and-how-is-it-different-from-a-regular-vcf,
"Short answer is that you shouldn't be looking at the genotype calls emitted by HC in GVCF mode. Longer answer, the gVCF is meant to be only an intermediate and the genotype calls are not final"

I am note sure this is the correct way to count heterozygous sites.

From the HC gvcf file, could I extract the number of callable sites from the GCVFBlocks and variant entries in the file to get the total number of callable sites (excluding './.' entries)? Then count the number of heterozygouse genotypes from the joint genotyping gvcf output?

HC command:

java -Xmx3g -jar /home/dut/bin/GATK/GenomeAnalysisTK.jar \ -T HaplotypeCaller \ --variant_index_type LINEAR \ --variant_index_parameter 128000 \ -ERC GVCF \ -R reference.fa \ -I sample001.bam \ -stand_call_conf 30 \ -stand_emit_conf 30 \ -mbq 17 \ -o sample001.rawVAR.vcf)

JointGenotyping command:
java -Xmx3g -jar /home/dut/bin/GATK/GenomeAnalysisTK.jar \ -T GenotypeGVCFs \ --includeNonVariantSites \ -ploidy 2 \ -R reference.fa \ --variant sample8450.rawVAR.vcf \ --variant sample003.rawVAR.vcf \ --variant sample001.rawVAR.vcf \ -o jg_sample001_sample003_sample8450.vcf

We are using GATK version 3.3.0

Any suggestions are appreciated! Thank you for your time.


Best Answer


  • msrmsr Stowers Institute for Medical ResearchMember

    Thank you Sheila,
    Yes that makes sense and worked for me.


Sign In or Register to comment.