UG calling a SNP as heterozygous when it (most likely) isn't

The title kind of explains the situation: I've got a SNP that I would call homozygous based on what I see in IGV, but the Unified Genotyper has labeled it heterozygous. The total read depth is 35. Of those reads, 32 carry the SNP (A-->T), 2 carry the reference base (A), and 1 carries a G. I went through your article describing why a SNP visible in IGV might not get called, and none of its five questions explains this situation. I didn't alter the --hets option either. Any help you can offer would be greatly appreciated.
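To make the expectation concrete, the read counts above (2 reference, 32 alternate, 1 other) can be run through a toy genotype-likelihood calculation. This is only a sketch under simplifying assumptions: every base gets the same error rate (`error_rate=0.01`, an arbitrary choice) and errors are spread evenly over the three other bases. It is not the model the Unified Genotyper uses, which works from per-base qualities, so its numbers will differ. The function names are illustrative.

```python
import math


def genotype_log10_likelihoods(n_ref, n_alt, n_other, error_rate=0.01):
    """Log10 likelihoods of hom-ref, het and hom-alt under a toy model.

    Assumption: one uniform error rate for all bases, with an error turning
    the true base into each of the three other bases with equal probability.
    """
    p_match = 1.0 - error_rate   # read shows the allele it was sampled from
    p_miss = error_rate / 3.0    # read shows one specific wrong base

    def loglik(p_ref, p_alt, p_other):
        return (n_ref * math.log10(p_ref)
                + n_alt * math.log10(p_alt)
                + n_other * math.log10(p_other))

    # For a het, each read comes from either allele with probability 0.5.
    p_het = 0.5 * p_match + 0.5 * p_miss
    return {
        "hom_ref": loglik(p_match, p_miss, p_miss),
        "het": loglik(p_het, p_het, p_miss),
        "hom_alt": loglik(p_miss, p_match, p_miss),
    }


def phred_scaled(log10_liks):
    """Normalize so the most likely genotype gets 0, like PL values in a VCF."""
    best = max(log10_liks.values())
    return {gt: round(-10 * (val - best)) for gt, val in log10_liks.items()}


# Read counts from the post: 2 reference (A), 32 alternate (T), 1 other (G).
liks = genotype_log10_likelihoods(n_ref=2, n_alt=32, n_other=1)
pls = phred_scaled(liks)
print(pls)
```

Under these assumptions the homozygous-alternate genotype gets the score of 0, meaning it is the most likely of the three, which matches the intuition from IGV. A real caller can still disagree if, for example, the base qualities of the reads differ a lot from this uniform assumption.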

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi there, can you tell me what version you are using and what is your command line?

  • Jlowry (Member)
    edited January 2013

    I'm using v2.1-11-g13c0244, and here's my command line (I'm running it through a Galaxy wrapper; not sure if that makes a difference):

    java -XX:DefaultMaxRAMFraction=1 -XX:+UseParallelGC \
      -jar /galaxy/galaxy-dist/tool-data/shared/jars/gatk/GenomeAnalysisTK.jar \
      -T UnifiedGenotyper \
      --num_threads 4 \
      --out /galaxy/galaxy-dist/database/files/001/dataset_1046.dat \
      --metrics_file /galaxy/galaxy-dist/database/files/001/dataset_1047.dat \
      -et STANDARD \
      --genotype_likelihoods_model BOTH \
      --standard_min_confidence_threshold_for_calling 20.0 \
      --standard_min_confidence_threshold_for_emitting 20.0 \
      --pedigreeValidationType STRICT \
      --read_filter MappingQuality \
      --min_mapping_quality_score 10 \
      --interval_set_rule UNION \
      --downsampling_type NONE \
      --baq OFF \
      --baqGapOpenPenalty 40.0 \
      --defaultBaseQualities -1 \
      --validation_strictness STRICT \
      --interval_merging ALL \
      --p_nonref_model EXACT \
      --heterozygosity 0.001 \
      --pcr_error_rate 0.0001 \
      --genotyping_mode DISCOVERY \
      --output_mode EMIT_VARIANTS_ONLY \
      --min_base_quality_score 15 \
      --max_deletion_fraction 0.05 \
      --max_alternate_alleles 5 \
      --min_indel_count_for_genotyping 5 \
      --indel_heterozygosity 0.000125 \
      --indelGapContinuationPenalty 10 \
      --indelGapOpenPenalty 45 \
      --indelHaplotypeSize 80 \
      -I /tmp/tmp-gatk-e2zCxM/gatk_input_0.bam \
      -R /tmp/tmp-gatk-e2zCxM/gatk_input.fasta
  • Thanks for the help. I've made the upgrade and am waiting to see if this resolves the issue. Regardless, it seems like I'd want to check the GQ for possible het variant calls. Is there a score below which you would say, "Hmm, that seems questionable"?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    It's all relative to the quality of your dataset. One way to empirically derive a threshold is to plot the distribution of GQs for your callset and look in closer detail at a few subsets of calls along the distribution.
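One way to start on that suggestion is to pull the GQ values out of the VCF and bin them. The sketch below uses only the Python standard library and assumes the usual VCF layout (FORMAT in column 9, samples from column 10 on, `GT` and `GQ` as FORMAT keys). The function names, the bin width, and the example records are made up for illustration; in practice you would pass an open VCF file handle instead of `example_lines`.

```python
from collections import Counter


def iter_gqs(lines):
    """Yield (genotype, GQ) for every sample call that carries a GQ value."""
    for line in lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        keys = fields[8].split(":")
        if "GT" not in keys or "GQ" not in keys:
            continue
        gt_idx, gq_idx = keys.index("GT"), keys.index("GQ")
        for sample in fields[9:]:
            values = sample.split(":")
            if gq_idx >= len(values) or values[gq_idx] in (".", ""):
                continue  # GQ missing for this sample
            yield values[gt_idx], int(float(values[gq_idx]))


def gq_histogram(pairs, bin_width=10):
    """Count calls per GQ bin; keys are the lower edge of each bin."""
    counts = Counter()
    for _gt, gq in pairs:
        counts[(gq // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up records, only to show the expected input shape.
example_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t100\t.\tA\tT\t50\t.\t.\tGT:GQ\t0/1:12\n",
    "chr1\t200\t.\tC\tG\t900\t.\t.\tGT:GQ\t1/1:99\n",
    "chr1\t300\t.\tG\tC\t30\t.\t.\tGT\t0/1\n",
    "chr1\t400\t.\tT\tA\t200\t.\t.\tGT:GQ\t0/1:45\n",
]

het_calls = [(gt, gq) for gt, gq in iter_gqs(example_lines) if gt in ("0/1", "1/0")]
print(gq_histogram(iter_gqs(example_lines)))
print(sorted(gq for _gt, gq in het_calls))
```

From there, picking a few calls from the low, middle, and high bins and inspecting them in IGV gives a feel for where calls in your own dataset start to look questionable.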

  • That makes sense. Thanks for the suggestion, I'll give it a shot.
