
Question about my VCF file generated by UnifiedGenotyper: I cannot run the downstream VQSR analysis

tianming · China · Posts: 4 · Member
edited September 2013 in Ask the GATK team

Hi,

I know that this question should not be posted to the GATK forum, because the error message told me: "Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself." However, I really can't find the step where I made this error, although I have read many GATK forum documents about it. Could you please give me some suggestions? Thanks very much!

In my VCF file, I find that not all SNP records have the same set of annotations: some annotations are missing from some records, like this:

chr1    5036777 rs898335        T       G       36.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.37  GT:AD:DP:GQ:PL  1/1:0,2:2:6:64,6,0
chr1    9507566 rs12742542      T       C       33.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.87   GT:AD:DP:GQ:PL  1/1:0,2:2:6:61,6,0
chr1    9507621 rs12755964      G       A       37.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.87   GT:AD:DP:GQ:PL  1/1:0,2:2:6:65,6,0
chr1    22376947        rs2473327       A       G       40.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=20.37   GT:AD:DP:GQ:PL  1/1:0,2:2:6:68,6,0
chr1    38061706        rs10908362      G       C       32.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.37   GT:AD:DP:GQ:PL  1/1:0,2:2:6:60,6,0
chr1    78317717        rs10782656      G       C       36.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=18.37   GT:AD:DP:GQ:PL  1/1:0,2:2:6:64,6,0
chr1    111457142       rs1282019       A       G       35.74   .       AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=30.81;MQ0=0;QD=17.87   GT:AD:DP:GQ:PL  1/1:0,2:2:6:63,6,0
chr1    121484153       rs9701684       C       G       32.74   .       AC=2;AF=1.00;AN=2;DB;DP=5;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=19.48;MQ0=3;QD=6.55    GT:AD:DP:GQ:PL  1/1:2,3:5:6:60,6,0
chr1    121484423       rs7368003       T       C       83.03   .       AC=2;AF=1.00;AN=2;DB;DP=5;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=28.24;MQ0=1;QD=16.61   GT:AD:DP:GQ:PL  1/1:0,5:5:12:111,12,0
chr1    121484503       rs4898086       T       A       311.77  .       AC=2;AF=1.00;AN=2;DB;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=33.63;MQ0=1;QD=25.98  GT:AD:DP:GQ:PL  1/1:0,12:12:33:340,33,0
chr1    121484591       rs4898109       T       A       463.77  .       AC=2;AF=1.00;AN=2;DB;DP=20;Dels=0.00;FS=0.000;HaplotypeScore=0.9947;MLEAC=2;MLEAF=1.00;MQ=30.44;MQ0=1;QD=23.19  GT:AD:DP:GQ:PL  1/1:0,20:20:54:492,54,0
chr1    121484599       rs4898111       C       G       109.77  .       AC=1;AF=0.500;AN=2;BaseQRankSum=-0.555;DB;DP=21;Dels=0.00;FS=0.000;HaplotypeScore=4.9775;MLEAC=1;MLEAF=0.500;MQ=30.78;MQ0=1;MQRankSum=-1.030;QD=5.23;ReadPosRankSum=1.664       GT:AD:DP:GQ:PL  0/1:14,7:21:99:138,0,312
chr1    121484600       rs1825267       T       C       100.77  .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.347;DB;DP=21;Dels=0.00;FS=0.000;HaplotypeScore=4.9775;MLEAC=1;MLEAF=0.500;MQ=30.78;MQ0=1;MQRankSum=-1.743;QD=4.80;ReadPosRankSum=1.585       GT:AD:DP:GQ:PL  0/1:14,7:21:99:129,0,310
chr1    121484602       rs74187930      T       C       190.77  .       AC=1;AF=0.500;AN=2;BaseQRankSum=1.323;DB;DP=22;Dels=0.00;FS=0.000;HaplotypeScore=4.9775;MLEAC=1;MLEAF=0.500;MQ=31.09;MQ0=1;MQRankSum=1.323;QD=8.67;ReadPosRankSum=-2.306        GT:AD:DP:GQ:PL  0/1:10,11:22:99:219,0,223
chr1    121484650       rs4092774       A       G       58.77   .       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.620;DB;DP=15;Dels=0.00;FS=0.000;HaplotypeScore=0.9989;MLEAC=1;MLEAF=0.500;MQ=30.73;MQ0=2;MQRankSum=-0.540;QD=3.92;ReadPosRankSum=-0.231      GT:AD:DP:GQ:PL  0/1:10,4:15:87:87,0,222

For example, MQRankSum appears only in the last 4 records.
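One quick way to see what the annotated and unannotated records have in common is to tabulate each record's genotype against the presence of the annotation. The helpers below (`parse_info`, `genotype_vs_annotation`) are hypothetical and not part of GATK; this is a minimal sketch assuming a single-sample VCF with whitespace-separated columns and a FORMAT column that includes GT. The three records are copied from the listing above, with the columns re-spaced.

```python
# Minimal sketch (hypothetical helpers, not GATK code): report each record's
# genotype (GT) and whether its INFO field carries a given annotation.
# Assumes a single-sample, whitespace-separated VCF whose FORMAT includes GT.

def parse_info(info):
    """Turn 'KEY=VAL;FLAG;...' into a dict; bare flags such as DB map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, val = item.partition("=")
        out[key] = val if sep else True
    return out

def genotype_vs_annotation(line, key="MQRankSum"):
    """Return (POS, GT, has_key) for one VCF data line."""
    fields = line.split()
    info = parse_info(fields[7])                         # column 8: INFO
    sample = dict(zip(fields[8].split(":"),              # column 9: FORMAT keys
                      fields[9].split(":")))             # column 10: sample values
    return fields[1], sample["GT"], key in info

records = [
    "chr1 9507566 rs12742542 T C 33.74 . AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.87 GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0",
    "chr1 121484153 rs9701684 C G 32.74 . AC=2;AF=1.00;AN=2;DB;DP=5;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=19.48;MQ0=3;QD=6.55 GT:AD:DP:GQ:PL 1/1:2,3:5:6:60,6,0",
    "chr1 121484599 rs4898111 C G 109.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.555;DB;DP=21;Dels=0.00;FS=0.000;HaplotypeScore=4.9775;MLEAC=1;MLEAF=0.500;MQ=30.78;MQ0=1;MQRankSum=-1.030;QD=5.23;ReadPosRankSum=1.664 GT:AD:DP:GQ:PL 0/1:14,7:21:99:138,0,312",
]

for rec in records:
    pos, gt, has_key = genotype_vs_annotation(rec)
    print(pos, gt, "MQRankSum present" if has_key else "MQRankSum missing")
```

Run over the whole file, a table like this shows at a glance that the missing annotations line up with the 1/1 genotypes.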

And here is my command line:

java -Xmx15g -jar /ifs1/ST_POP/USER/lantianming/HUM/bin/GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar \
    -glm BOTH \
    -l INFO \
    -R /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/ucsc.hg19.fasta \
    -T UnifiedGenotyper \
    -I /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/recal_03/test.realn_8.recal.bam \
    -D /nas/RD_09C/resequencing/soft/pipeline/GATK/bundle/2.5/hg19/dbsnp_137.hg19.vcf \
    -o /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/callsnp/realn_8.recal.bam.vcf \
    -metrics /ifs1/ST_POP/USER/lantianming/HUM/align/bwa/callsnp/snpcall.metrics

What can I do to solve this problem?

Thanks a lot!


Best Answer

Answers

  • Geraldine_VdAuwera · Posts: 5,877 · Administrator, GATK Developer
    edited September 2013

    Hi Tianming,

    Ask yourself: what do the sites that have the annotation have in common vs. the ones that don't?

    Answer: the sites without the annotation are all homozygous, while the sites that have it are heterozygous.

    From the documentation for MappingQualityRankSumTest:

    Caveat
    The mapping quality rank sum test can not be calculated for sites without a mixture of reads showing both the reference and alternate alleles.

    The annotation is missing because it cannot be calculated. It is not a problem.
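    To make the caveat concrete, here is a toy rank sum test. It is a plain Mann-Whitney U z-score (normal approximation, tied values given their average rank, no tie correction to the variance) comparing the mapping qualities of REF-supporting reads with those of ALT-supporting reads. This is an illustrative sketch under those assumptions, not GATK's MappingQualityRankSumTest code, whose sign convention and corrections may differ; `rank_sum_z` is a hypothetical name. The key line is the early return: if a site has no REF reads or no ALT reads, there is no second group to rank against, so no statistic exists.

```python
import math

def rank_sum_z(ref_values, alt_values):
    """Toy Mann-Whitney U z-score for REF-read vs ALT-read values.
    Returns None when either group is empty (the test is undefined)."""
    ref_values, alt_values = list(ref_values), list(alt_values)
    n1, n2 = len(ref_values), len(alt_values)
    if n1 == 0 or n2 == 0:
        return None  # no mixture of REF and ALT reads: nothing to compare
    pooled = sorted(ref_values + alt_values)
    # Assign 1-based ranks; tied values share the mean of their ranks.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    r1 = sum(ranks[v] for v in ref_values)   # rank sum of the REF group
    u1 = r1 - n1 * (n1 + 1) / 2              # Mann-Whitney U for REF
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mean_u) / sd_u

print(rank_sum_z([], [37, 37]))              # prints None: only ALT reads
print(rank_sum_z([60, 60, 50], [20, 30]))    # REF reads map better: positive z
```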


    Geraldine Van der Auwera, PhD

  • tianming · China · Posts: 4 · Member

    Hi Geraldine

    I am so sorry that my previous comment was hard to read; I am resending it now.

    Thanks very much for your patient and helpful answer. I had already read the MappingQualityRankSumTest documentation you pointed me to before I asked this question.

    I was really confused when I opened NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19.sites.vcf, which I downloaded from the GATK bundle. In this file, every record is annotated with MappingQualityRankSumTest. Are these SNPs all heterozygous?

    When I ran VQSR on the NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.hg19 file, I found that not all records are heterozygous, yet they still carry MappingQualityRankSumTest annotations:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878

    chr1 14907 . A G 2109.89 VQSRTrancheSNP95.00to97.00+ AC=2;AF=1.00;AN=2;BaseQRankSum=1.526;DP=118;Dels=0.00;HRun=1;HaplotypeScore=1.9512;MQ=25.12;MQ0=33;MQRankSum=1.691;QD=17.88;ReadPosRankSum=-0.247;SB=-1218.77;VQSLOD=1.95;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 1/1:6,111:118:99:2143,207,0

    chr1 16257 . G C 88.16 VQSRTrancheSNP90.00to93.00 AC=1;AF=0.50;AN=2;BaseQRankSum=-1.614;DP=96;Dels=0.00;HRun=0;HaplotypeScore=1.4450;MQ=19.21;MQ0=18;MQRankSum=-0.289;QD=0.92;ReadPosRankSum=0.688;SB=5.96;VQSLOD=2.78;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 0/1:78,18:96:99:118,0,870

    chr1 16534 . C T 332.92 VQSRTrancheSNP95.00to97.00 AC=1;AF=0.50;AN=2;BaseQRankSum=0.852;DP=97;Dels=0.00;HRun=0;HaplotypeScore=0.8321;MQ=17.04;MQ0=11;MQRankSum=1.444;QD=3.43;ReadPosRankSum=-1.214;SB=-67.95;VQSLOD=2.20;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 0/1:42,55:97:68.81:363,0,69

    chr1 30923 . G T 1133.36 VQSRTrancheSNP95.00to97.00 AC=2;AF=1.00;AN=2;BaseQRankSum=-Infinity;DP=43;Dels=0.00;HRun=0;HaplotypeScore=0.8941;MQ=34.17;MQ0=5;MQRankSum=-Infinity;QD=26.36;ReadPosRankSum=-Infinity;SB=-655.51;VQSLOD=2.39;culprit=NULL GT:AD:DP:GQ:PL 1/1:0,43:43:90.26:1166,90,0

    chr1 51898 . C A 212.30 VQSRTrancheSNP95.00to97.00+ AC=1;AF=0.50;AN=2;BaseQRankSum=-2.889;DP=31;Dels=0.00;HRun=0;HaplotypeScore=2.7737;MQ=27.10;MQ0=9;MQRankSum=-2.824;QD=6.85;ReadPosRankSum=1.248;SB=21.05;VQSLOD=-6.144e-01;culprit=MQRankSum GT:AD:DP:GQ:PL 0/1:16,15:31:99:242,0,153

    chr1 55326 rs3107975 T C 1275.65 VQSRTrancheSNP93.00to95.00 AC=2;AF=1.00;AN=2;BaseQRankSum=-Infinity;DB;DP=46;Dels=0.00;HRun=0;HaplotypeScore=0.0000;MQ=29.05;MQ0=0;MQRankSum=-Infinity;QD=27.73;ReadPosRankSum=-Infinity;SB=-539.95;VQSLOD=2.57;culprit=NULL GT:AD:DP:GQ:PL 1/1:0,46:46:99:1309,129,0

    chr1 57856 . T A 407.38 VQSRTrancheSNP93.00to95.00 AC=1;AF=0.50;AN=2;BaseQRankSum=-3.439;DP=80;Dels=0.00;HRun=2;HaplotypeScore=1.9783;MQ=16.77;MQ0=29;MQRankSum=-1.385;QD=5.09;ReadPosRankSum=-0.367;SB=-103.89;VQSLOD=2.61;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 0/1:49,31:80:99:437,0,467

    chr1 58211 . A G 134.43 VQSRTrancheSNP95.00to97.00 AC=2;AF=1.00;AN=2;BaseQRankSum=-Infinity;DP=8;Dels=0.00;HRun=1;HaplotypeScore=0.0000;MQ=22.93;MQ0=3;MQRankSum=-Infinity;QD=16.80;ReadPosRankSum=-Infinity;SB=-0.01;VQSLOD=2.30;culprit=NULL GT:AD:DP:GQ:PL 1/1:0,8:8:15.04:167,15,0

    chr1 61987 . A G 403.50 VQSRTrancheSNP95.00to97.00 AC=1;AF=0.50;AN=2;BaseQRankSum=-4.744;DP=60;Dels=0.00;HRun=0;HaplotypeScore=0.8667;MQ=37.62;MQ0=8;MQRankSum=-0.458;QD=6.72;ReadPosRankSum=1.300;SB=-172.40;VQSLOD=2.41;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 0/1:33,27:60:99:433,0,676

    chr1 61989 . G C 423.47 VQSRTrancheSNP95.00to97.00 AC=1;AF=0.50;AN=2;BaseQRankSum=0.275;DP=60;Dels=0.00;HRun=0;HaplotypeScore=0.8667;MQ=37.62;MQ0=8;MQRankSum=-0.293;QD=7.06;ReadPosRankSum=1.300;SB=-188.61;VQSLOD=2.38;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 0/1:33,27:60:99:453,0,661

    Could you please give me some more suggestions? Thank you very much!
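    In the bundle records quoted above, the homozygous sites whose AD shows zero REF reads (chr1:30923, chr1:55326, chr1:58211) carry MQRankSum=-Infinity, while chr1:14907 is also 1/1 but has 6 REF reads and a finite value. So the presence of the key alone does not show that the test produced a usable number. The hypothetical helper `annotation_status` below (not part of GATK) separates missing, non-finite, and finite values and reports the REF read count from AD; it is a sketch assuming a single-sample, whitespace-separated VCF whose FORMAT includes GT and AD. It relies on Python's `float()` accepting the string "-Infinity".

```python
import math

def annotation_status(line, key="MQRankSum"):
    """Return (GT, REF read count from AD, status) for one VCF data line,
    where status is 'missing', 'non-finite' or 'finite'."""
    fields = line.split()
    info = {}
    for item in fields[7].split(";"):          # column 8: INFO
        k, _, v = item.partition("=")
        info[k] = v
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ref_reads = int(sample["AD"].split(",")[0])  # first AD entry = REF reads
    if key not in info:
        status = "missing"
    elif not math.isfinite(float(info[key])):  # e.g. "-Infinity"
        status = "non-finite"
    else:
        status = "finite"
    return sample["GT"], ref_reads, status

bundle_records = [
    "chr1 14907 . A G 2109.89 VQSRTrancheSNP95.00to97.00+ AC=2;AF=1.00;AN=2;BaseQRankSum=1.526;DP=118;Dels=0.00;HRun=1;HaplotypeScore=1.9512;MQ=25.12;MQ0=33;MQRankSum=1.691;QD=17.88;ReadPosRankSum=-0.247;SB=-1218.77;VQSLOD=1.95;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 1/1:6,111:118:99:2143,207,0",
    "chr1 30923 . G T 1133.36 VQSRTrancheSNP95.00to97.00 AC=2;AF=1.00;AN=2;BaseQRankSum=-Infinity;DP=43;Dels=0.00;HRun=0;HaplotypeScore=0.8941;MQ=34.17;MQ0=5;MQRankSum=-Infinity;QD=26.36;ReadPosRankSum=-Infinity;SB=-655.51;VQSLOD=2.39;culprit=NULL GT:AD:DP:GQ:PL 1/1:0,43:43:90.26:1166,90,0",
    "chr1 16257 . G C 88.16 VQSRTrancheSNP90.00to93.00 AC=1;AF=0.50;AN=2;BaseQRankSum=-1.614;DP=96;Dels=0.00;HRun=0;HaplotypeScore=1.4450;MQ=19.21;MQ0=18;MQRankSum=-0.289;QD=0.92;ReadPosRankSum=0.688;SB=5.96;VQSLOD=2.78;culprit=ReadPosRankSum GT:AD:DP:GQ:PL 0/1:78,18:96:99:118,0,870",
]
# A UnifiedGenotyper record from the first post, where the key is absent.
ug_record = "chr1 9507566 rs12742542 T C 33.74 . AC=2;AF=1.00;AN=2;DB;DP=2;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=37.00;MQ0=0;QD=16.87 GT:AD:DP:GQ:PL 1/1:0,2:2:6:61,6,0"

for rec in bundle_records + [ug_record]:
    print(annotation_status(rec))
```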

  • tianming · China · Posts: 4 · Member

    Hi Geraldine, I have finished this step following your suggestions. Many thanks for your help!
