QUAL is a "dot" and FILTER is PASS in VCF

Hi,
I have a VCF that was generated with UnifiedGenotyper using --output_mode EMIT_ALL_SITES. Several positions in the VCF with ALT as "." also have QUAL as ".", which I understand to mean "reference" with unknown quality. However, FILTER for these sites is set to PASS. How is this possible? Does this mean that UnifiedGenotyper did not print a QUAL even though the site scored high enough to PASS?

I am pasting some parts of the vcf below. Any help is appreciated.

##fileformat=VCFv4.0
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Filtered Depth">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[x.bam] sample_metadata=[] read_buffer_size=n
ull phone_home=STANDARD read_filter=[] intervals=[x.bed] excludeIntervals=null reference_sequence=hg19.fasta rodBind=[dbsnp_132.hg19.vcf] rodToIntervalTrackName=null BTI_merge_rule=UNION nonDeterministicRandomSeed=false DBSNP=null downsampling_type=null downs
ample_to_fraction=null downsample_to_coverage=null baq=CALCULATE_AS_NECESSARY baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SIL
ENT unsafe=null num_threads=1 interval_merging=ALL read_group_black_list=null processingTracker=null restartProcessingTracker=false processingTrackerStatusFile=null processingTrackerID=-1 allow_int
ervals_with_unindexed_bam=false disable_experimental_low_memory_sharding=false logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0
010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_ALL_SITES standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=10.0 noSLOD=false as
sume_single_sample_reads=null abort_at_too_much_coverage=-1 min_base_quality_score=17 min_mapping_quality_score=20 max_deletion_fraction=0.05 min_indel_count_for_genotyping=5 indel_heterozygosity=1
.25E-4 indelGapContinuationPenalty=10.0 indelGapOpenPenalty=45.0 indelHaplotypeSize=80 doContextDependentGapPenalties=true getGapPenaltiesFromData=false indel_recal_file=indel.recal_data.csv indelD
ebug=false dovit=false GSA_PRODUCTION_ONLY=false exactCalculation=LINEAR_EXPERIMENTAL ignoreSNPAlleles=false output_all_callable_bases=false genotype=false out=org.broadinstitute.sting.gatk.io.stub
s.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub debug_file=null metrics_file=null annotation=[DepthOfC
overage, RMSMappingQuality]"
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  YH1
chr1    14468   .       G       .       .       PASS    DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104    GT      ./.
chr1    14469   .       C       .       .       PASS    DP=109;HaplotypeScore=0.0000;MQ=2.11;MQ0=103    GT      ./.
chr1    14470   .       G       .       .       PASS    DP=105;HaplotypeScore=0.0000;MQ=2.15;MQ0=99     GT      ./.
chr1    14471   .       C       .       .       PASS    DP=103;HaplotypeScore=0.0000;MQ=2.17;MQ0=97     GT      ./.
chr1    14472   .       A       .       .       PASS    DP=106;HaplotypeScore=0.0000;MQ=2.14;MQ0=100    GT      ./.
chr1    14473   .       G       .       .       PASS    DP=103;HaplotypeScore=0.0000;MQ=2.17;MQ0=97     GT      ./.
chr1    14474   .       G       .       .       PASS    DP=98;HaplotypeScore=0.0000;MQ=2.23;MQ0=92      GT      ./.
chr1    14553   .       C       .       .       PASS    DP=98;HaplotypeScore=0.0000;MQ=2.33;MQ0=94      GT      ./.
chr1    14554   .       G       .       .       PASS    DP=99;HaplotypeScore=0.0000;MQ=2.32;MQ0=95      GT      ./.
chr1    14555   .       C       .       .       PASS    DP=101;HaplotypeScore=0.0000;MQ=3.17;MQ0=96     GT      ./.
chr1    14556   .       T       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=101;MQ=3.17;MQ0=96 GT:DP:GQ:PL     0/0:101:3:0,3,27
chr1    14557   .       C       .       32.99   LowQual AC=0;AF=0.00;AN=2;DP=102;MQ=3.15;MQ0=97 GT:DP:GQ:PL     0/0:102:3:0,3,27
....
chr1    14587   .       T       .       35.99   LowQual AC=0;AF=0.00;AN=2;DP=100;MQ=4.41;MQ0=90 GT:DP:GQ:PL     0/0:100:6:0,6,51
chr1    14640   .       C       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.73;MQ0=107        GT:DP:GQ:PL     0/0:123:20.97:0,21,174
chr1    14641   .       A       .       50.96   PASS    AC=0;AF=0.00;AN=2;DP=123;MQ=5.84;MQ0=106        GT:DP:GQ:PL     0/0:123:20.97:0,21,174
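
To see how often this combination occurs across a whole file, the records can be tallied by call status and FILTER value. The following is a minimal sketch, not part of GATK: the function name `tally_filter_by_call` is hypothetical, it assumes a single-sample VCF, and it splits on any whitespace so that it also works on records pasted with spaces instead of tabs.

```python
from collections import Counter

def tally_filter_by_call(vcf_lines):
    """Count records by (genotype called?, QUAL present?, FILTER value)."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue  # not a complete single-sample record
        qual, filt = fields[5], fields[6]
        genotype = fields[9].split(":")[0]
        called = genotype not in ("./.", ".")
        counts[(called, qual != ".", filt)] += 1
    return counts

# Three records copied from the VCF above.
records = [
    "chr1 14468 . G . . PASS DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104 GT ./.",
    "chr1 14556 . T . 32.99 LowQual AC=0;AF=0.00;AN=2;DP=101;MQ=3.17;MQ0=96 GT:DP:GQ:PL 0/0:101:3:0,3,27",
    "chr1 14640 . C . 50.96 PASS AC=0;AF=0.00;AN=2;DP=123;MQ=5.73;MQ0=107 GT:DP:GQ:PL 0/0:123:20.97:0,21,174",
]
summary = tally_filter_by_call(records)
for key, n in sorted(summary.items()):
    print(key, n)
```

A key of `(False, False, "PASS")` corresponds to the case asked about here: a no-call (`./.`) with no QUAL that is still marked PASS.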

Answers

  • newbie16 (Member)

    Hi Sheila,
    Thanks for your explanation.
    Yes, the majority of my reads are MQ0. What would you suggest? Should I filter out the reads with low mapping quality?

    Thanks

  • newbie16 (Member)

    One comment I have is that it is a bit confusing to have PASS in the FILTER column for a no-call. Maybe this can be changed to a dot?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @newbie16

    Hello,

    You will need to do some QC on your data and find out what proportion of it is usable. If the MQ0 reads are just a localized problem, you can ignore them. But if there are a lot of MQ0 reads throughout the genome, you should try remapping.

    I will look into whether we can emit "." instead of PASS in the FILTER field for these sites.

    -Sheila
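
One rough way to judge whether the MQ0 problem is localized is to compute, per site, the ratio of MQ0 to DP from the INFO column and look at how it varies along the genome. This is only a sketch under stated assumptions: the helper name `mq0_fraction` is hypothetical, and because DP is a filtered depth in this file the ratio is an approximate indicator rather than an exact read fraction.

```python
def mq0_fraction(info):
    """Return MQ0/DP from a VCF INFO string, or None if it cannot be computed."""
    # Keep only key=value entries; flags and "." are ignored.
    pairs = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    try:
        depth = int(pairs["DP"])
        mq0 = int(pairs["MQ0"])
    except (KeyError, ValueError):
        return None
    return mq0 / depth if depth > 0 else None

# INFO strings copied from the first VCF in this thread.
high = mq0_fraction("DP=110;HaplotypeScore=0.0000;MQ=2.10;MQ0=104")
print(round(high, 3))

# A record with an empty INFO column yields None.
print(mq0_fraction("."))
```

Averaging this value per contig or per interval would show whether high MQ0 fractions are confined to a few regions or spread throughout the data.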

  • newbie16 (Member)

    Hi Sheila

    I have one more related question and would appreciate your help. For a UnifiedGenotyper run, I specified
    --standard_min_confidence_threshold_for_calling 50 --standard_min_confidence_threshold_for_emitting 0 --output_mode EMIT_ALL_SITES

    In the VCF file, I see that LowQual is assigned to one call with QUAL 31.23. However, for ref calls where QUAL is higher than 50, I do not see PASS; the FILTER column contains "." instead. Does this imply that there are other filters or criteria that prevent these calls from being marked PASS?

    I am pasting a subset of the vcf below:

        ##fileformat=VCFv4.1
        ##FILTER=<ID=LowQual,Description="Low quality">
        ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
        ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
        ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
        ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
        ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
        ##GATKCommandLine=<ID=UnifiedGenotyper,Version=3.2-2-gec30cee,Date="Wed Oct 08 11:17:01 PDT 2014",Epoch=1412792221457,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[tests1_gatk_wbed/withrg_re
        order.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=[test.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padd
        ing=0 reference_sequence=/home/rjain/software/gatk/resource_bundle/v2.8/ucsc.hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMP
        LE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false
        useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 valid
        ation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_c
        pu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_wi
        th_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false genotype_likelihoods_model=SNP pcr_error
        _rate=1.0E-4 computeSLOD=false pair_hmm_implementation=LOGLESS_CACHING min_base_quality_score=17 max_deletion_fraction=0.05 min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indelGapContinu
        ationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 indelDebug=false ignoreSNPAlleles=false allReadsSP=false ignoreLaneInfo=false reference_sample_calls=(RodBinding name= source=UNBOUND) reference_s
        ample_name=null min_quality_score=1 max_quality_score=40 site_quality_prior=20 min_power_threshold_for_calling=0.95 annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence
        _threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=0.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) conta
        mination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null output_mode=EMIT_ALL_SITES allSitePLs=false dbsnp=(RodBinding name= source=UNBO
        UND) comp=[] out=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.gatk.engi
        ne.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub onlyEmitSamples=[] debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_reads_
        with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
        ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
        ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ......
        ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
        ##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
        ##contig=<ID=chrM,length=16571,assembly=hg19>
        ##contig=<ID=chr1,length=249250621,assembly=hg19>
        ##contig=<ID=chr2,length=243199373,assembly=hg19>
        ##contig=<ID=chr3,length=198022430,assembly=hg19>
    .....
        ##reference=file:///home/rjain/software/gatk/resource_bundle/v2.8/ucsc.hg19.fasta
        #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA19238
        chr1    54631   .       A       .       .       .       .       GT      ./.
        chr1    54632   .       C       .       .       .       .       GT      ./.
        chr1    54633   .       T       .       .       .       .       GT      ./.
        chr1    54634   .       T       .       .       .       .       GT      ./.
        chr1    54635   .       A       .       .       .       .       GT      ./.
        chr1    54636   .       G       .       .       .       .       GT      ./.
        chr1    54637   .       A       .       .       .       .       GT      ./.
        chr1    54638   .       T       .       .       .       .       GT      ./.
        chr1    54639   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54640   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54641   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54642   .       A       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54643   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54644   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54645   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54646   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54647   .       T       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54648   .       A       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54649   .       C       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    54650   .       G       .       58.23   .       AN=2;DP=10;MQ=42.00;MQ0=0       GT:DP   0/0:10
        chr1    564391  .       G       .       31.23   LowQual AN=2;DP=1;MQ=42.00;MQ0=0        GT:DP   0/0:1
        chr1    564392  .       A       .       .       .       .       GT      ./.
        chr1    564393  .       A       .       .       .       .       GT      ./.
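
The records above can be checked against the calling threshold with a short script. This is a minimal sketch and makes two assumptions that are not confirmed in this thread: that LowQual depends only on QUAL falling below the calling threshold (50 in this run), and that an unfiltered site may appear as either "." or PASS depending on the GATK version. In the VCF specification, "." in FILTER means that no filters have been applied. The function name `check_filter_vs_threshold` is hypothetical.

```python
def check_filter_vs_threshold(line, call_threshold=50.0):
    """Classify one VCF record by comparing QUAL and FILTER to the calling threshold."""
    fields = line.split()
    qual, filt = fields[5], fields[6]
    if qual == ".":
        return "no QUAL (no-call)"
    below = float(qual) < call_threshold
    if below and filt == "LowQual":
        return "LowQual, as expected"
    if not below and filt in (".", "PASS"):
        return "unfiltered, as expected"
    return "unexpected"

# Records copied from the VCF above.
rows = [
    "chr1 54638 . T . . . . GT ./.",
    "chr1 54639 . A . 58.23 . AN=2;DP=10;MQ=42.00;MQ0=0 GT:DP 0/0:10",
    "chr1 564391 . G . 31.23 LowQual AN=2;DP=1;MQ=42.00;MQ0=0 GT:DP 0/0:1",
]
results = [check_filter_vs_threshold(r) for r in rows]
for row, result in zip(rows, results):
    print(row.split()[1], result)
```

Under these assumptions, none of the three records is classified as "unexpected", which is consistent with the calls above QUAL 50 simply being unfiltered rather than failing some other criterion.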