The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.10.2 is now available at
GATK version 4.beta.2 (i.e. the second beta release) is out. See the GATK4 BETA page for download and details.

Bug with multi-sample UnifiedGenotyper

sicottesicotte Member
edited February 2013 in Ask the GATK team

I am running into a bug with an older version of GATK. Calls for 1bp insertions sometimes have bad genotypes, e.g. I see 11 reference reads and 0 insertions and the genotype is 1/1


In one recent project, 72 out of 4069 1bp insertions are of this pattern.

When I look in IGV at the realigned BAM's, I see that the allele counts are OK (did that on multiple variants and samples).. but that the genotype call is not. (note the likelyhoods match the genotype call).

Can someone run the following command on a recent multi-sample vcf? to see if the bug still exists?

The command prints 1bp insertions, limits to indels not near runs of single nucleotides (may be false positives), and looks for any homozygous calls.. finally the perl looks for the pattern above

gawk -F "\t" '$0 ~ /^#.*/{next}(length($5)==(length($4)+1)){print $0}'  group_1.variants.filter.vcf | grep 'HRun=0' | gawk -F "\t" '{for(i=10;i<=NF;i++) {if($i ~ /1\/1/) {print $i}}}' | perl -ane 'if($_ =~ /1\/1\:([0-9])+\,0\:/){if($1>5) {print $_;}}'

I get something like


Sorry, this is part of a large workflow and updating to a more GATK version is a large endeavour that I only want to undertake if the bug is fixed.

These were the main parameters of the calling.


Here is the header information

##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[/..chr5.cleaned.bam] read_buffer_size=null phone_home=NO_ET gatk_key=/projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/1.7/ read_filter=[] intervals=[] excludeIntervals=null i
nterval_set_rule=UNION interval_merging=ALL reference_sequence=/data2/bsi/reference/sequence/human/ncbi/37.1/allchr.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null d
ownsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=-1 defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=
4 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false logging_leve
l=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confide
nce_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=30.0 noSLOD=false annotateNDA=false alleles=(RodBinding name= source=UNBOUND) min_base_quality_score=17 max_deletion_fraction=0.05
max_alternate_alleles=5 min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 noBandedInde
l=false indelDebug=false ignoreSNPAlleles=false dbsnp=(RodBinding name= source=UNBOUND) comp=[]
erStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false"
##VariantAnnotator="analysis_type=VariantAnnotator input_file=[chr5.cleaned.bam] read_buffer_size=null phone_home=NO_ET gatk_key=/projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/1.7/ read_filter=[] intervals=[variants.chr5.raw.vcf] excludeIntervals=
null interval_set_rule=UNION interval_merging=ALL reference_sequence=/data2/bsi/reference/sequence/human/ncbi/37.1/allchr.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=
null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=-1 defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_t
hreads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false loggi
ng_level=INFO log_to_file=null help=false variant=(RodBinding name=variant source=variants.chr5.raw.vcf) snpEffFile=(RodBinding name= source=UNBOUND) dbsnp=(RodBinding name=dbsnp source=/data2/bsi/reference/annotation/dbSNP/hg19/dbsnp_135.hg19.
vcf.gz) comp=[] resource=[]
ub annotation=[QualByDepth, MappingQualityRankSumTest, ReadPosRankSumTest, HaplotypeScore, DepthOfCoverage, MappingQualityZero, DepthPerAlleleBySample, RMSMappingQuality, FisherStrand, ForwardReverseAlleleCoun
ts] excludeAnnotation=[] group=[] expression=[] useAllAnnotations=false list=false alwaysAppendDbsnpId=false MendelViolationGenotypeQualityThreshold=0.0 requireStrictAlleleMatch=false filter_mismatching_base_a
Post edited by Geraldine_VdAuwera on

Best Answer


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    It is extremely likely that this bug has been fixed. Feel free to test the latest version on a problematic region of your bam file.

  • Given the recent post about wanting to extensively test the 2.4 release, I think that running the very simple 1-liner I gave is a quick way to test for a bug. I did not see any post about it this type of bug, so it may not be fixed.
    From my side, I'd have a lot of code to change to upgrade to the new commands and options in 2.3 . Can you please run the command on a VCF you have already generated, it'll take one minute tops.

    I did a little more digging and this affects both SNV's and INDELS (0.4% of all homozygous alternate alleles are affected). I just happened to be looking at indels.


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hugues, I understand your point of view but we really can't take requests for specific tests. We have an established workflow and plenty of work on our plate trying to get release 2.4 ready to ship. If you show that the bug persists in the latest version, we will be happy to fix it of course.

  • I just ran the above script on a multi-sample vcf (14 whole genome samples at 30-50x coverage per sample) generated by GATK unified genotyper 2.39. No lines were flagged. It looks like the bug was fixed.

Sign In or Register to comment.