The current GATK version is 3.3-0

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

# GATK's quality scores on VCF files

Posts: 4Member
edited March 2013

Hi,

quality score (phred score) is defined as below. (i.e. 1% error rate is equal to phred score of 20 (-10xlog 0.01))

QUAL phred-scaled quality score for the assertion made in ALT. i.e. -10log_10 prob(call in ALT is wrong). If ALT is ”.” (no variant) then this is -10log_10 p(variant), and if ALT is not ”.” this is -10log_10p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. If unknown, the missing value should be specified. (Numeric)

Using GATK to generate vcf files and looking through the quality column of those files, I found out that the max quality score is 441,453 which is extremely huge number.

I wonder if the quality score of GATK tool follows the phred score system; if not, how do you calculate the quality score and what do the numbers of quality score represent?

Look forward to hearing back from you soon and thank you very much.

Post edited by Geraldine_VdAuwera on
Tagged:

Yes, the QUAL emitted by GATK is phred-scaled and corresponds to the definition given in the VCF specification.

Geraldine Van der Auwera, PhD

• Posts: 4Member
edited March 2013
java -jar $GATK -T UnifiedGenotyper -R ucsc.hg19.fasta { -I$BAM }x17 -o $VCF --dbsnp$DBSNP -stand_call_conf 50 -stand_emit_conf 10 --min_base_quality_score 10 --max_deletion_fraction 0.05 -dt NONE -glm BOTH -baq CALCULATE_AS_NECESSARY -A QualByDepth -A FisherStrand -A HaplotypeScore -A ReadPosRankSumTest -A MappingQualityRankSumTest -A InbreedingCoeff -A RMSMappingQuality -A DepthOfCoverage


Above is the command line I used for generating vcf files

Post edited by Geraldine_VdAuwera on

I'm sorry, I'm not sure I understand your question -- can you please tell me what is the problem you would like help with?

Geraldine Van der Auwera, PhD

• Posts: 4Member

Hi, Thank you for your reply, so my question is if the quality scores of vcf files are estimated using phred score system.
When I compare with the quality scores I got from other tools, the max quality score of 441,453 is way too big so wonder if GATK follows the phred score system.

• Posts: 4Member
edited March 2013

I understand the scoring system and the below line is one excerpted from my vcf file. It would be great if you let me know the reason why this has extremely high qual score compared to that of other tools.

chr17     45214544        rs62075619      T       C       557015 VQSRTrancheBOTH99.90to100.00     .       GT:AD:DP:GQ:PL 1/1:118,2105:2116:99:32767,5777,0

Post edited by Geraldine_VdAuwera on

Different tools may emit values on very different scales depending on how the probability calculations are handled internally, e.g. at what stage a number is rounded off. I cannot comment on values emitted by other tools. This simply looks like a case where the GATK has determined that it is extremely likely that there is indeed a variant at this site (which is supported by the allele depths). This is not a cause for concern.

Geraldine Van der Auwera, PhD

• Posts: 2Member

I have done the alignment with bwa and variant calling with GATK. I have all the variant data. I copy,paste a part of my data. please let me know, is it good for analysis ??! Many of variations have "low quality" in column with title "FILTER" ??!! Thank you

# CHROM POS REF ALT QUAL FILTER INFO FORMAT sm

chr3 60596 C A 15.65 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=15.65 GT:AD:DP:GQ:PL 1/1:0,1:1:3:42,3,0
chr3 60648 A G 15.65 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=15.65 GT:AD:DP:GQ:PL 1/1:0,1:1:3:42,3,0
chr3 96098 G A 15.65 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=15.65 GT:AD:DP:GQ:PL 1/1:0,1:1:3:42,3,0
chr3 104222 T C 14.68 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=14.68 GT:AD:DP:GQ:PL 1/1:0,1:1:3:41,3,0
chr3 121954 C T 15.65 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=15.65 GT:AD:DP:GQ:PL 1/1:0,1:1:3:42,3,0
chr3 124835 G A 10.9 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=10.90 GT:AD:DP:GQ:PL 1/1:0,1:1:3:37,3,0
chr3 168610 G A 15.65 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=15.65 GT:AD:DP:GQ:PL 1/1:0,1:1:3:42,3,0
chr3 174816 A G 15.65 LowQual AC=2;AF=1.00;AN=2;DB;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=15.65 GT:AD:DP:GQ:PL 1/1:0,1:1:3:42,3,0

Hi @ofogh1974,

It looks like you have very low depth at those sites (DP = 1). That's probably why you are getting low qualities for your variants. There is not enough data for the program to call the sites confidently.

Geraldine Van der Auwera, PhD

• Posts: 2Member

Best regards
Afagh