To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

UG calling a SNP as heterozygous when it (most likely) isn't

The title kind of explains the situation, but basically I've got a SNP that shows up in IGV that I would call homozygous that the Unified Genotyper has labeled as heterozygous. The total read depth is 35, 32 of which were called as a SNP (A-->T), 2 were called the reference base (A), and one read contained a G. I went through your article describing why a SNP visible in IGV might not get called, and none of those five questions explained this situation. I didn't alter the --hets option at all either. Any help you might be able to offer would be greatly appreciated.

Best Answer


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi there, can you tell me what version you are using and what is your command line?

  • JlowryJlowry Member
    edited January 2013

    I'm using v2.1-11-g13c0244, and here's my command line usage (I'm using it via Galaxy wrapper, not sure if that makes a difference):

    java -XX:DefaultMaxRAMFraction=1 -XX:+UseParallelGC -jar /galaxy/galaxy-dist/tool-data/shared/jars/gatk/GenomeAnalysisTK.jar -T UnifiedGenotyper --num_threads 4 --out /galaxy/galaxy-dist/database/files/001/dataset_1046.dat --metrics_file /galaxy/galaxy-dist/database/files/001/dataset_1047.dat -et STANDARD --genotype_likelihoods_model BOTH --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 20.0 --pedigreeValidationType STRICT --read_filter MappingQuality --min_mapping_quality_score 10 --interval_set_rule UNION --downsampling_type NONE --baq OFF --baqGapOpenPenalty 40.0 --defaultBaseQualities -1 --validation_strictness STRICT --interval_merging ALL --p_nonref_model EXACT --heterozygosity 0.001 --pcr_error_rate 0.0001 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY --min_base_quality_score 15 --max_deletion_fraction 0.05 --max_alternate_alleles 5 --min_indel_count_for_genotyping 5 --indel_heterozygosity 0.000125 --indelGapContinuationPenalty 10 --indelGapOpenPenalty 45 --indelHaplotypeSize 80 -I /tmp/tmp-gatk-e2zCxM/gatk_input_0.bam -R /tmp/tmp-gatk-e2zCxM/gatk_input.fasta
  • Thanks for the help, I've made the upgrade and am waiting to see if this resolves the issue. Regardless, it seems like I'd want to check the GQ for possible het var calls, is there a score below which you would say, "Hmm, that seems questionable"?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    It's all relative to the quality of your dataset. One way to empirically derive a threshold is to plot the distribution of GQs for your callset and look in closer detail at a few subsets of calls along the distribution.

  • That makes sense. Thanks for the suggestion, I'll give it a shot.

Sign In or Register to comment.