QUAL is "." but FILTER is PASS in VCF

Hi
I have a VCF that was generated with UnifiedGenotyper with the output mode set to EMIT_ALL_SITES. Several positions with ALT "." also have QUAL ".", which I understand to mean a reference call with unknown quality. However, FILTER for these positions is set to PASS. How is this possible? Does it mean that UnifiedGenotyper did not print a QUAL even though the site scored high enough to PASS?

I am pasting some parts of the vcf below. Any help is appreciated.

##fileformat=VCFv4.0
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Filtered Depth">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[x.bam] sample_metadata=[] read_buffer_size=null phone_home=STANDARD read_filter=[] intervals=[x.bed] excludeIntervals=null reference_sequence=hg19.fasta rodBind=[dbsnp_132.hg19.vcf] rodToIntervalTrackName=null BTI_merge_rule=UNION nonDeterministicRandomSeed=false DBSNP=null downsampling_type=null downsample_to_fraction=null downsample_to_coverage=null baq=CALCULATE_AS_NECESSARY baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 interval_merging=ALL read_group_black_list=null processingTracker=null restartProcessingTracker=false processingTrackerStatusFile=null processingTrackerID=-1 allow_intervals_with_unindexed_bam=false disable_experimental_low_memory_sharding=false logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_ALL_SITES standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=10.0 noSLOD=false assume_single_sample_reads=null abort_at_too_much_coverage=-1 min_base_quality_score=17 min_mapping_quality_score=20 max_deletion_fraction=0.05 min_indel_count_for_genotyping=5 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10.0 indelGapOpenPenalty=45.0 indelHaplotypeSize=80 doContextDependentGapPenalties=true getGapPenaltiesFromData=false indel_recal_file=indel.recal_data.csv indelDebug=false dovit=false GSA_PRODUCTION_ONLY=false exactCalculation=LINEAR_EXPERIMENTAL ignoreSNPAlleles=false output_all_callable_bases=false genotype=false out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub debug_file=null metrics_file=null annotation=[DepthOfCoverage, RMSMappingQuality]"
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  YH1
chr1    14468   .       G       .       .       PASS    DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104    GT      ./.
chr1    14469   .       C       .       .       PASS    DP=109;HaplotypeScore=0.0000;MQ=2.11;MQ0=103    GT      ./.
chr1    14470   .       G       .       .       PASS    DP=105;HaplotypeScore=0.0000;MQ=2.15;MQ0=99     GT      ./.
chr1    14471   .       C       .       .       PASS    DP=103;HaplotypeScore=0.0000;MQ=2.17;MQ0=97     GT      ./.
chr1    14472   .       A       .       .       PASS    DP=106;HaplotypeScore=0.0000;MQ=2.14;MQ0=100    GT      ./.
chr1    14473   .       G       .       .       PASS    DP=103;HaplotypeScore=0.0000;MQ=2.17;MQ0=97     GT      ./.
chr1    14474   .       G       .       .       PASS    DP=98;HaplotypeScore=0.0000;MQ=2.23;MQ0=92      GT      ./.
chr1    14553   .       C       .       .       PASS    DP=98;HaplotypeScore=0.0000;MQ=2.33;MQ0=94      GT      ./.
chr1    14554   .       G       .       .       PASS    DP=99;HaplotypeScore=0.0000;MQ=2.32;MQ0=95      GT      ./.
chr1    14555   .       C       .       .       PASS    DP=101;HaplotypeScore=0.0000;MQ=3.17;MQ0=96     GT      ./.
chr1    14556   .       T       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=101;MQ=3.17;MQ0=96 GT:DP:GQ:PL     0/0:101:3:0,3,27
chr1    14557   .       C       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=102;MQ=3.15;MQ0=97 GT:DP:GQ:PL     0/0:102:3:0,3,27
....
chr1    14587   .       T       .       35.99   LowQual AC=0;AF=0.00;AN=2;DP=100;MQ=4.41;MQ0=90 GT:DP:GQ:PL     0/0:100:6:0,6,51
chr1    14640   .       C       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.73;MQ0=107        GT:DP:GQ:PL     0/0:123:20.97:0,21,174
chr1    14641   .       A       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.84;MQ0=106        GT:DP:GQ:PL     0/0:123:20.97:0,21,174

Answers

  • newbie16 Member

    Hi Sheila,
    Thanks for your explanation.
    Yes, the majority of my reads are MQ0. What would you suggest? Should I filter out the reads with low mapping quality?

    Thanks

  • newbie16 Member

    One comment I have is that it is a bit confusing to have PASS in the FILTER column for a no-call. Maybe this can be changed to a dot?

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @newbie16

    Hello,

    You will need to do some QC on your data and find out what proportion of the data is usable. If the MQ0 reads are just a localized problem, you can ignore them. But if there are a lot of MQ0 reads throughout the genome, you should try remapping.

    I will look into what we can do about the . instead of PASS in the filter field.

    -Sheila
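
One way to put a number on "what proportion of the data is usable" is to compare MQ0 with DP in the INFO column of each record. The following is only a minimal sketch, assuming plain-text VCF records like the ones pasted above. The function names `mq0_fraction` and `mq0_heavy_sites` and the 0.5 cutoff are illustrative choices, not part of GATK.

```python
def mq0_fraction(info):
    """Return MQ0/DP for a VCF INFO string, or None if it cannot be computed."""
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    try:
        dp = int(fields["DP"])
        mq0 = int(fields["MQ0"])
    except (KeyError, ValueError):
        return None
    return mq0 / dp if dp > 0 else None


def mq0_heavy_sites(lines, min_fraction=0.5):
    """Yield (chrom, pos, fraction) for records whose MQ0/DP >= min_fraction.

    min_fraction=0.5 is an arbitrary cutoff chosen for illustration.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        # Splitting on any whitespace also copes with pasted text in which
        # tabs were converted to spaces (INFO itself contains no whitespace).
        cols = line.split()
        if len(cols) < 8:
            continue
        frac = mq0_fraction(cols[7])
        if frac is not None and frac >= min_fraction:
            yield cols[0], int(cols[1]), frac


if __name__ == "__main__":
    example = [
        "chr1 14468 . G . . PASS DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104 GT ./.",
        "chr1 54639 . T . 58.23 . AN=2;DP=10;MQ=42.00;MQ0=0 GT:DP 0/0:10",
    ]
    for chrom, pos, frac in mq0_heavy_sites(example):
        print(chrom, pos, round(frac, 2))
```

For the first record pasted in the question (DP=110, MQ0=104) the fraction is about 0.95. Checking whether such sites cluster in a few regions or are spread across the genome is what separates the "localized problem" case from the "try remapping" case.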

  • newbie16 Member

    Hi Sheila

    I have one more related question and would appreciate your help. For a UnifiedGenotyper run, I specified
    --standard_min_confidence_threshold_for_calling 50 --standard_min_confidence_threshold_for_emitting 0 --output_mode EMIT_ALL_SITES

    In the VCF file, I see LowQual on one call with QUAL 31.23. However, for reference calls where QUAL is higher than 50, I do not see PASS. Does this imply that other filters or criteria are keeping these calls from getting PASS?

    I am pasting a subset of the vcf below:

        ##fileformat=VCFv4.1
        ##FILTER=<ID=LowQual,Description="Low quality">
        ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
        ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
        ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
        ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
        ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
        ##GATKCommandLine=<ID=UnifiedGenotyper,Version=3.2-2-gec30cee,Date="Wed Oct 08 11:17:01 PDT 2014",Epoch=1412792221457,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[tests1_gatk_wbed/withrg_reorder.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=[test.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/home/rjain/software/gatk/resource_bundle/v2.8/ucsc.hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false genotype_likelihoods_model=SNP pcr_error_rate=1.0E-4 computeSLOD=false pair_hmm_implementation=LOGLESS_CACHING min_base_quality_score=17 max_deletion_fraction=0.05 min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_sample_name=null min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=0.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null output_mode=EMIT_ALL_SITES allSitePLs=false dbsnp=(RodBinding name= source=UNBOUND) comp=[] out=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub onlyEmitSamples=[] debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
        ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
        ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ......
        ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
        ##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
        ##contig=<ID=chrM,length=16571,assembly=hg19>
        ##contig=<ID=chr1,length=249250621,assembly=hg19>
        ##contig=<ID=chr2,length=243199373,assembly=hg19>
        ##contig=<ID=chr3,length=198022430,assembly=hg19>
    .....
        ##reference=file:///home/rjain/software/gatk/resource_bundle/v2.8/ucsc.hg19.fasta
        #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA19238
        chr1    54631   .       A       .       .       .       .       GT      ./.
        chr1    54632   .       C       .       .       .       .       GT      ./.
        chr1    54633   .       T       .       .       .       .       GT      ./.
        chr1    54634   .       T       .       .       .       .       GT      ./.
        chr1    54635   .       A       .       .       .       .       GT      ./.
        chr1    54636   .       G       .       .       .       .       GT      ./.
        chr1    54637   .       A       .       .       .       .       GT      ./.
        chr1    54638   .       T       .       .       .       .       GT      ./.
        chr1    54639   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54640   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54641   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54642   .       A       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54643   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54644   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54645   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54646   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54647   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54648   .       A       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54649   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54650   .       G       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    564391  .       G       .       31.23   LowQual AN=2;DP=1;MQ=42.00;MQ0=0        GT:DP   0/0:1
        chr1    564392  .       A       .       .       .       .       GT      ./.
        chr1    564393  .       A       .       .       .       .       GT      ./.
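
To see which combinations of QUAL, FILTER and genotype actually occur in an EMIT_ALL_SITES file, the records can be tabulated. This is a rough sketch rather than anything GATK provides: it assumes a single-sample VCF, the names `classify` and `tabulate` are hypothetical, and the default `call_conf=50.0` simply mirrors the standard_min_confidence_threshold_for_calling value used in both runs in this thread.

```python
from collections import Counter


def classify(line, call_conf=50.0):
    """Return (qual_bin, filter_value, call_status) for one VCF record.

    Assumes a single sample whose genotype column is the 10th field.
    call_conf is the calling threshold to compare QUAL against.
    """
    cols = line.split()  # tolerant of tabs that were pasted as spaces
    qual, filt = cols[5], cols[6]
    gt = cols[9].split(":")[0]  # GT is the first FORMAT field when present
    if qual == ".":
        qual_bin = "missing"
    elif float(qual) < call_conf:
        qual_bin = "below"
    else:
        qual_bin = "at_or_above"
    call_status = "no-call" if gt in ("./.", ".|.", ".") else "called"
    return qual_bin, filt, call_status


def tabulate(lines, call_conf=50.0):
    """Count records per (qual_bin, filter_value, call_status) combination."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        if len(line.split()) < 10:
            continue  # not a full single-sample record (e.g. an elision)
        counts[classify(line, call_conf)] += 1
    return counts


if __name__ == "__main__":
    example = [
        "chr1 14468 . G . . PASS DP=110;MQ=2.10;MQ0=104 GT ./.",
        "chr1 14556 . T . 32.99 LowQual AC=0;AF=0.00;AN=2;DP=101 GT:DP:GQ:PL 0/0:101:3:0,3,27",
        "chr1 54639 . T . 58.23 . AN=2;DP=10;MQ=42.00;MQ0=0 GT:DP 0/0:10",
    ]
    for key, n in sorted(tabulate(example).items()):
        print(key, n)
```

In both excerpts pasted in this thread, every record with QUAL "." also has the genotype ./., so a table like this makes it easy to check whether that pattern holds across the whole file and how the FILTER value differs between the two outputs.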
    