# Question about a VCF file generated by UnifiedGenotyper: cannot run the downstream VQSR analysis

Hi,

I know this question should probably not be posted to the GATK forum, because the error message told me: "Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself." However, I really can't find which step caused this error, even though I have read a lot of the GATK forum documentation about it. Could you please give me some suggestions? Many thanks!

In my VCF file, I find that not all SNP records have the same set of annotations; some annotations are missing from some records, like this:

chr1    5036777 rs898335        T       G       36.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.37  GT:AD:DP:GQ:PL  1/1:0,2:2:6:64,6,0
chr1    9507566 rs12742542      T       C       33.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.87   GT:AD:DP:GQ:PL  1/1:0,2:2:6:61,6,0
chr1    9507621 rs12755964      G       A       37.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.87   GT:AD:DP:GQ:PL  1/1:0,2:2:6:65,6,0
chr1    22376947        rs2473327       A       G       40.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=20.37   GT:AD:DP:GQ:PL  1/1:0,2:2:6:68,6,0
chr1    38061706        rs10908362      G       C       32.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.37   GT:AD:DP:GQ:PL  1/1:0,2:2:6:60,6,0
chr1    78317717        rs10782656      G       C       36.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.37   GT:AD:DP:GQ:PL  1/1:0,2:2:6:64,6,0
chr1    111457142       rs1282019       A       G       35.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=30.81;MQ0=0;QD=17.87   GT:AD:DP:GQ:PL  1/1:0,2:2:6:63,6,0
chr1    121484153       rs9701684       C       G       32.74   .       AC=2;AF=1.00;AN=2;DB;DP=5;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=19.48;MQ0=3;QD=6.55    GT:AD:DP:GQ:PL  1/1:2,3:5:6:60,6,0
chr1    121484423       rs7368003       T       C       83.03   .       AC=2;AF=1.00;AN=2;DB;DP=5;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=28.24;MQ0=1;QD=16.61   GT:AD:DP:GQ:PL  1/1:0,5:5:12:111,12,0
chr1    121484503       rs4898086       T       A       311.77  .       AC=2;AF=1.00;AN=2;DB;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=33.63;MQ0=1;QD=25.98  GT:AD:DP:GQ:PL  1/1:0,12:12:33:340,33,0
chr1    121484591       rs4898109       T       A       463.77  .       AC=2;AF=1.00;AN=2;DB;DP=20;Dels=0.00;FS=0.000;HaplotypeScore=0.9947;MLEAC=2;MLEAF=1.00;MQ=30.44;MQ0=1;QD=23.19  GT:AD:DP:GQ:PL  1/1:0,20:20:54:492,54,0


For example, MQRankSum can be found only in the last 4 records.
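One way to see at a glance which annotations are present in which records is to tally the INFO keys. The following is only a minimal Python sketch, not a GATK tool: the function names `info_keys` and `tally_info_keys` are made up for illustration, the two records are copied from the excerpt above, and splitting on any whitespace is a simplification that assumes the INFO column itself contains no spaces.

```python
from collections import Counter


def info_keys(vcf_line):
    """Return the set of INFO keys found in one VCF data line."""
    fields = vcf_line.split()  # simplification: split on any whitespace
    info = fields[7]           # INFO is the 8th column of a VCF record
    # Flags such as DB have no "=", so the whole entry is the key.
    return {entry.split("=", 1)[0] for entry in info.split(";")}


def tally_info_keys(lines):
    """Count, for each INFO key, how many data lines carry it."""
    counts = Counter()
    total = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        total += 1
        counts.update(info_keys(line))
    return total, counts


# Two records copied from the excerpt above.
records = [
    "chr1 9507566 rs12742542 T C 33.74 . AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;"
    "FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.87 "
    "GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0",
    "chr1 121484591 rs4898109 T A 463.77 . AC=2;AF=1.00;AN=2;DB;DP=20;Dels=0.00;"
    "FS=0.000;HaplotypeScore=0.9947;MLEAC=2;MLEAF=1.00;MQ=30.44;MQ0=1;QD=23.19 "
    "GT:AD:DP:GQ:PL 1/1:0,20:20:54:492,54,0",
]

total, counts = tally_info_keys(records)
for key in sorted(counts):
    print(f"{key}\t{counts[key]}/{total}")
```

Any key whose count is lower than the total is missing from at least one record; a key that never appears is simply absent from the printed list.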

And my command line:

java -Xmx15g -jar /ifs1/ST_POP/USER/lantianming/HUM/bin/GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar \
-glm BOTH \
-l INFO \
-R /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/ucsc.hg19.fasta \
-T UnifiedGenotyper \
-I /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/recal_03/test.realn_8.recal.bam \
-D /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/dbsnp_137.hg19.vcf \
-o /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/callsnp/realn_8.recal.bam.vcf \
-metrics /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/callsnp/snpcall.metrics


What can I do to solve this problem?

Thanks a lot!

Hi Tianming,

Ask yourself: what do the sites that have the annotation have in common vs. the ones that don't?

Answer: the sites that lack the annotation are all homozygous, not heterozygous.

From the documentation for MappingQualityRankSumTest:

Caveat
The mapping quality rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles.

The annotation is missing because it cannot be calculated. It is not a problem.
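To check this on your own file, you could compare each record's genotype with whether MQRankSum appears in its INFO field. What follows is only a rough Python sketch under stated assumptions, not anything shipped with GATK: the helper name `genotype_and_mqranksum` is hypothetical, it assumes a single-sample VCF with a GT entry in the FORMAT column, and it splits on whitespace for simplicity. The two records are taken from the question.

```python
def genotype_and_mqranksum(vcf_line):
    """Return (GT string, is_heterozygous, has_MQRankSum) for one data line.

    Assumes a single-sample VCF whose FORMAT column includes GT.
    """
    fields = vcf_line.split()  # simplification: split on any whitespace
    info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
    # Pair FORMAT keys (column 9) with the first sample's values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    alleles = sample["GT"].replace("|", "/").split("/")
    # Heterozygous: more than one distinct allele and no missing calls.
    is_het = "." not in alleles and len(set(alleles)) > 1
    return sample["GT"], is_het, "MQRankSum" in info_keys


# Two records copied from the question.
records = [
    "chr1 5036777 rs898335 T G 36.74 . AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;"
    "FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.37 "
    "GT:AD:DP:GQ:PL 1/1:0,2:2:6:64,6,0",
    "chr1 121484503 rs4898086 T A 311.77 . AC=2;AF=1.00;AN=2;DB;DP=12;Dels=0.00;"
    "FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=33.63;MQ0=1;QD=25.98 "
    "GT:AD:DP:GQ:PL 1/1:0,12:12:33:340,33,0",
]

for line in records:
    gt, is_het, has_rank_sum = genotype_and_mqranksum(line)
    pos = line.split()[1]
    print(f"{pos}\tGT={gt}\thet={is_het}\tMQRankSum={has_rank_sum}")
```

If the explanation above holds for your data, records reported as not heterozygous should be the ones without MQRankSum.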

Geraldine Van der Auwera, PhD

Hi Geraldine

I am sorry that my previous comment was hardly readable; I am resending it now.

I was really confused when I opened the VCF file NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.sites.vcf, which I downloaded from the GATK bundle. In this file, all the records are annotated with MappingQualityRankSumTest. Are these SNPs all heterozygous?

When I ran VQSR on the NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19 file, I found that not all records are heterozygous, yet they still carry MappingQualityRankSumTest annotations.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878