Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

Bug with multi-sample UnifiedGenotyper

sicottesicotte Posts: 3Member
edited February 2013 in Ask the team

I am running into a bug with an older version of GATK. Calls for 1bp insertions sometimes have bad genotypes, e.g. I see 11 reference reads and 0 insertions and the genotype is 1/1

GT:AD:DP:DP4:GQ:PL
1/1:11,0:10:7,4,0,0:18.03:160,18,0

In one recent project, 72 out of 4069 1bp insertions are of this pattern.

When I look in IGV at the realigned BAM's, I see that the allele counts are OK (did that on multiple variants and samples).. but that the genotype call is not. (note the likelyhoods match the genotype call).

Can someone run the following command on a recent multi-sample vcf? to see if the bug still exists?

The command prints 1bp insertions, limits to indels not near runs of single nucleotides (may be false positives), and looks for any homozygous calls.. finally the perl looks for the pattern above

gawk -F "\t" '$0 ~ /^#.*/{next}(length($5)==(length($4)+1)){print $0}'  group_1.variants.filter.vcf | grep 'HRun=0' | gawk -F "\t" '{for(i=10;i<=NF;i++) {if($i ~ /1\/1/) {print $i}}}' | perl -ane 'if($_ =~ /1\/1\:([0-9])+\,0\:/){if($1>5) {print $_;}}'

I get something like

1/1:16,0:10:8,8,0,0:23.91:274,24,0
1/1:28,0:24:16,12,0,0:72.24:1033,72,0
1/1:17,0:14:6,11,0,0:42.14:598,42,0
1/1:18,0:18:9,9,0,0:54.15:776,54,0
1/1:19,0:16:10,9,0,0:48.16:713,48,0
1/1:9,0:2:7,2,0,0:6.02:100,6,0
1/1:18,0:18:8,10,0,0:38.82:133,39,0
1/1:28,0:28:12,16,0,0:39.54:117,40,0
...

Sorry, this is part of a large workflow and updating to a more GATK version is a large endeavour that I only want to undertake if the bug is fixed.

These were the main parameters of the calling.

genotype_likelihoods_model=BOTH 
p_nonref_model=EXACT 
genotyping_mode=DISCOVERY  
output_mode=EMIT_VARIANTS_ONLY 

Here is the header information

##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[/..chr5.cleaned.bam] read_buffer_size=null phone_home=NO_ET gatk_key=/projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/1.7/Hossain.Asif_mayo.edu.key read_filter=[] intervals=[chr5.target.bed] excludeIntervals=null i
nterval_set_rule=UNION interval_merging=ALL reference_sequence=/data2/bsi/reference/sequence/human/ncbi/37.1/allchr.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null d
ownsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=-1 defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=
4 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false logging_leve
l=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confide
nce_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=30.0 noSLOD=false annotateNDA=false alleles=(RodBinding name= source=UNBOUND) min_base_quality_score=17 max_deletion_fraction=0.05
max_alternate_alleles=5 min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 noBandedInde
l=false indelDebug=false ignoreSNPAlleles=false dbsnp=(RodBinding name= source=UNBOUND) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWrit
erStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false"
##VariantAnnotator="analysis_type=VariantAnnotator input_file=[chr5.cleaned.bam] read_buffer_size=null phone_home=NO_ET gatk_key=/projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/1.7/Hossain.Asif_mayo.edu.key read_filter=[] intervals=[variants.chr5.raw.vcf] excludeIntervals=
null interval_set_rule=UNION interval_merging=ALL reference_sequence=/data2/bsi/reference/sequence/human/ncbi/37.1/allchr.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=
null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=-1 defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_t
hreads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false loggi
ng_level=INFO log_to_file=null help=false variant=(RodBinding name=variant source=variants.chr5.raw.vcf) snpEffFile=(RodBinding name= source=UNBOUND) dbsnp=(RodBinding name=dbsnp source=/data2/bsi/reference/annotation/dbSNP/hg19/dbsnp_135.hg19.
vcf.gz) comp=[] resource=[] out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterSt
ub annotation=[QualByDepth, MappingQualityRankSumTest, ReadPosRankSumTest, HaplotypeScore, DepthOfCoverage, MappingQualityZero, DepthPerAlleleBySample, RMSMappingQuality, FisherStrand, ForwardReverseAlleleCoun
ts] excludeAnnotation=[] group=[] expression=[] useAllAnnotations=false list=false alwaysAppendDbsnpId=false MendelViolationGenotypeQualityThreshold=0.0 requireStrictAlleleMatch=false filter_mismatching_base_a
nd_quals=false"
Post edited by Geraldine_VdAuwera on

Answers

Sign In or Register to comment.