Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

UG calling a SNP as heterozygous when it (most likely) isn't

JlowryJlowry Posts: 6Member

The title kind of explains the situation, but basically I've got a SNP that shows up in IGV that I would call homozygous that the Unified Genotyper has labeled as heterozygous. The total read depth is 35, 32 of which were called as a SNP (A-->T), 2 were called the reference base (A), and one read contained a G. I went through your article describing why a SNP visible in IGV might not get called, and none of those five questions explained this situation. I didn't alter the --hets option at all either. Any help you might be able to offer would be greatly appreciated.

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,277Administrator, GSA Member admin

    Hi there, can you tell me what version you are using and what is your command line?

    Geraldine Van der Auwera, PhD

  • JlowryJlowry Posts: 6Member
    edited January 2013

    I'm using v2.1-11-g13c0244, and here's my command line usage (I'm using it via Galaxy wrapper, not sure if that makes a difference):

    java -XX:DefaultMaxRAMFraction=1 -XX:+UseParallelGC -jar /galaxy/galaxy-dist/tool-data/shared/jars/gatk/GenomeAnalysisTK.jar -T UnifiedGenotyper --num_threads 4 --out /galaxy/galaxy-dist/database/files/001/dataset_1046.dat --metrics_file /galaxy/galaxy-dist/database/files/001/dataset_1047.dat -et STANDARD --genotype_likelihoods_model BOTH --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 20.0 --pedigreeValidationType STRICT --read_filter MappingQuality --min_mapping_quality_score 10 --interval_set_rule UNION --downsampling_type NONE --baq OFF --baqGapOpenPenalty 40.0 --defaultBaseQualities -1 --validation_strictness STRICT --interval_merging ALL --p_nonref_model EXACT --heterozygosity 0.001 --pcr_error_rate 0.0001 --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY --min_base_quality_score 15 --max_deletion_fraction 0.05 --max_alternate_alleles 5 --min_indel_count_for_genotyping 5 --indel_heterozygosity 0.000125 --indelGapContinuationPenalty 10 --indelGapOpenPenalty 45 --indelHaplotypeSize 80 -I /tmp/tmp-gatk-e2zCxM/gatk_input_0.bam -R /tmp/tmp-gatk-e2zCxM/gatk_input.fasta
    
    Post edited by Geraldine_VdAuwera on
  • JlowryJlowry Posts: 6Member

    Thanks for the help, I've made the upgrade and am waiting to see if this resolves the issue. Regardless, it seems like I'd want to check the GQ for possible het var calls, is there a score below which you would say, "Hmm, that seems questionable"?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,277Administrator, GSA Member admin

    It's all relative to the quality of your dataset. One way to empirically derive a threshold is to plot the distribution of GQs for your callset and look in closer detail at a few subsets of calls along the distribution.

    Geraldine Van der Auwera, PhD

  • JlowryJlowry Posts: 6Member

    That makes sense. Thanks for the suggestion, I'll give it a shot.

Sign In or Register to comment.