Heterozygous X variants observed in male samples called by HaplotypeCaller in normal VCF mode

Arvand88, Member
edited July 2017 in Ask the GATK team

Hello GATK team,

I have called variants with HaplotypeCaller on whole-exome data from 6 people (4 affected males and 2 unaffected). All 4 male patients are heterozygous for three variants in a gene located on the X chromosome. One of the unaffected individuals, the father of one patient, is homozygous for the reference allele, while the other, the mother of two of the patients, is heterozygous for the same variants (which makes sense).
I have checked the bamout from HC and it confirms what I see in the VCF. You can see the command I used below.
Why did the caller decide to call the sons of the unaffected mother heterozygous? Can't HaplotypeCaller distinguish between male and female samples? Moreover, the number of reads showing the ALT allele is greater than the number showing REF at all three variant positions, so I am confused about the patients' actual status.

-T HaplotypeCaller \
-R ucsc.hg19.fasta \
-I recalibrated_reads_final.bam \
--genotyping_mode DISCOVERY \
-bamout bamout.bam \
--dbsnp dbsnp_138.hg19.vcf \
-A Coverage -A TandemRepeatAnnotator -A QualByDepth -A VariantType \
-o raw.vcf


  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin
    No, HaplotypeCaller does not have any logic to distinguish male from female samples or to special-case the sex chromosomes. If you want sex-aware calls, it's up to you to run the program with the correct ploidy setting on the sex chromosomes, depending on the sex of your samples.
  • Arvand88, Member
    edited July 2017

    Dear @Geraldine_VdAuwera

    I ran HC again and added the following arguments:

    -ploidy 1 
    -L ChrX -L ChrY -L ChrM

    However, none of the three variants I was hoping to see again was called this time. I do not understand what could be different, because in diploid mode HC detected reads carrying an ALT allele that mapped to these locations and called the sites heterozygous. Are these reads now simply ignored? I do not think they can be dismissed as sequencing errors, because I can see at least 30 reads showing the ALT allele in the bamout output.

    I have run into a somewhat similar situation on the autosomes as well. Of two cousins, one is called heterozygous and the other homozygous, while in the bamout the number of reads showing the ALT allele is 30 and 20, respectively, out of about 180 total reads.

  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin
    If you're joint calling these samples you should see genotype stats for the ref calls; that might shed some light on what's happening here. Without that it's impossible for me to comment.
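
As a rough illustration of the two points discussed in this thread, here is a minimal Python sketch. It is not part of GATK: `expected_ploidy` and `alt_fraction` are hypothetical helper names. The first function encodes the ploidy one would typically choose per sample sex on chrX and chrY (using the UCSC-style contig names of the hg19 reference mentioned above), and it assumes the pseudoautosomal regions, which are diploid in males, are ignored. The second simply computes the fraction of reads supporting ALT, using the 30 and 20 out of roughly 180 reads quoted in the thread.

```python
def expected_ploidy(sex, contig):
    """Return the ploidy one would typically assume for a sample on a contig.

    Hypothetical helper, not a GATK API. Assumptions: UCSC-style contig
    names (chrX, chrY, chrM) and no special handling of the
    pseudoautosomal regions, which are diploid in males.
    """
    sex = sex.lower()
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if contig == "chrX":
        return 1 if sex == "male" else 2
    if contig == "chrY":
        # Females carry no Y chromosome, so there is nothing to genotype.
        return 1 if sex == "male" else 0
    if contig == "chrM":
        # Mitochondrial copy number varies, so no fixed ploidy is assumed here.
        raise ValueError("chrM is not covered by this sketch")
    # Every other contig is treated as an autosome.
    return 2


def alt_fraction(alt_reads, total_reads):
    """Fraction of reads at a site that support the ALT allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


if __name__ == "__main__":
    for sex in ("male", "female"):
        for contig in ("chr1", "chrX", "chrY"):
            print(sex, contig, expected_ploidy(sex, contig))

    # Read counts quoted in the thread: 30 and 20 ALT reads out of ~180.
    print(round(alt_fraction(30, 180), 3))  # 0.167
    print(round(alt_fraction(20, 180), 3))  # 0.111
```

Both cousins' ALT fractions fall well below the roughly 0.5 expected for a clean heterozygous site in a diploid sample, which is one reason the per-sample genotype statistics requested above are needed to interpret the calls.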