the Allele Count (AC) tag is incorrect

naarkhoo Posts: 38 Member
edited February 2013 in Ask the GATK team

I get this error when I run ValidateVariants on a VCF file produced by GATK, directly after UnifiedGenotyper. I am using GenomeAnalysisTK-2.3-6-gebbba25 and dbsnp_137.hg19.vcf. The variants are annotated with DepthOfCoverage, HaplotypeScore, InbreedingCoeff, LowMQ, ...

I generate two VCF files with UnifiedGenotyper in separate runs, one for SNPs and the other for indels.

For both files the error concerns the Allele Count (AC) tag:

##### ERROR MESSAGE: File F93.snp.vcf fails strict validation: the Allele Count (AC) tag is incorrect for the record at position chr1:1225579, 1 vs. 1

I appreciate your comments,


Answers

  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    Can you upgrade to the latest version and see if the error persists?

    Geraldine Van der Auwera, PhD

  • naarkhoo Posts: 38 Member

    I tried GenomeAnalysisTK-2.3-9-ge5ebf34 and the error persists: the Allele Count (AC) tag is incorrect for the record at position chrM:302, 2 vs. 2

  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    Ok. Can you post the VCF record where the error occurs (chrM:302)?


  • naarkhoo Posts: 38 Member
    edited February 2013
    chrM    302 rs66492218  AC  ACC,A   2239.04 PASS    AC=3,2;AF=0.375,0.250;AN=8;BaseQRankSum=0.822;DB;DP=195;FS=3.608;HaplotypeScore=45.0534;IndelType=MULTIALLELIC_INDEL;LowMQ=0.0000,0.0000,195;MLEAC=3,2;MLEAF=0.375,0.250;MQ=58.74;MQ0=0;MQRankSum=1.342;QD=11.48;RPA=8,9,7;RU=C;ReadPosRankSum=-1.658;STR;set=variant   
    
  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    Can you please post the complete record? This is missing the format field and sample values.


  • naarkhoo Posts: 38 Member
    edited February 2013
    chrM    302 rs66492218  AC  ACC,A   2239.04 PASS    AC=3,2;AF=0.375,0.250;AN=8;BaseQRankSum=0.822;DB;DP=195;FS=3.608;HaplotypeScore=45.0534;IndelType=MULTIALLELIC_INDEL;LowMQ=0.0000,0.0000,195;MLEAC=3,2;MLEAF=0.375,0.250;MQ=58.74;MQ0=0;MQRankSum=1.342;QD=11.48;RPA=8,9,7;RU=C;ReadPosRankSum=-1.658;STR;set=variant   GT:AD:DP:GQ:PL  ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1:41,10,2:66:99:186,0,1210,211,1129,1608  0/1:3,34,0:47:21:987,0,21,1084,127,1467 2/2:0,0,36:52:99:1179,1264,1506,108,108,0   0/1:20,1,0:30:2:2,0,542,70,580,739
    
  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    Ah, there you go -- the counts are indeed incorrect. The record includes 3 samples with the first allele but only 1 of the second, unlike what is reported in the AC tag. Did you exclude some samples after processing, perhaps?


  • naarkhoo Posts: 38 Member
    edited February 2013

    No, I didn't! I ran UnifiedGenotyper separately to call SNPs and indels from 93 samples, then I applied hard filters.

    For indels:

    --filterExpression "QD < 2.0" \
    --filterName "QDFilter" \
    --filterExpression "ReadPosRankSum < -20.0" \
    --filterName "ReadPosFilter" \
    --filterExpression "FS > 200.0" \
    --filterName "FSFilter" \
    --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
    --filterName "HARD_TO_VALIDATE" \
    --filterExpression "QUAL < 30.0 || DP < 6 || DP > 5000 || HRun > 5" \
    --filterName "QualFilter"
    

    For SNPs:

    --clusterSize 3 \
    --clusterWindowSize 10 \
    --filterExpression "QD < 2.0" \
    --filterName "QDFilter" \
    --filterExpression "MQ < 40.0" \
    --filterName "MQFilter" \
    --filterExpression "FS > 60.0" \
    --filterName "FSFilter" \
    --filterExpression "HaplotypeScore > 13.0" \
    --filterName "HaplotypeScoreFilter" \
    --filterExpression "MQRankSum < -12.5" \
    --filterName "MQRankSumFilter" \
    --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "ReadPosRankSumFilter" \
    --filterExpression "QUAL < 30.0 || DP < 6 || DP > 5000 || HRun > 5" \
    

    and then I combined these two, using CombineVariants; as you see, I didn't remove any sample.
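
To make the indel filter logic above concrete, here is a minimal Python sketch that re-expresses those expressions as plain predicates and applies them to the INFO values of the chrM:302 record posted earlier. It is not how GATK evaluates JEXL expressions. The names `INDEL_FILTERS` and `apply_filters` are hypothetical, and skipping a filter when an annotation is absent (HRun is not in that record) is an assumption made for this illustration only.

```python
# Hypothetical Python restatement of the indel filter expressions from the post.
# This is an illustration, not GATK's JEXL evaluation.
INDEL_FILTERS = {
    "QDFilter": lambda r: r["QD"] < 2.0,
    "ReadPosFilter": lambda r: r["ReadPosRankSum"] < -20.0,
    "FSFilter": lambda r: r["FS"] > 200.0,
    "HARD_TO_VALIDATE": lambda r: r["MQ0"] >= 4 and (r["MQ0"] / (1.0 * r["DP"])) > 0.1,
    "QualFilter": lambda r: (r["QUAL"] < 30.0 or r["DP"] < 6
                             or r["DP"] > 5000 or r["HRun"] > 5),
}


def apply_filters(record, filters):
    """Return the names of the filters a record fails, or ["PASS"] if none."""
    failed = []
    for name, predicate in filters.items():
        try:
            if predicate(record):
                failed.append(name)
        except KeyError:
            # Assumption for this sketch: an annotation missing from the
            # record means the filter is skipped rather than triggered.
            pass
    return failed or ["PASS"]


# QUAL and INFO values copied from the chrM:302 record in this thread.
record = {"QUAL": 2239.04, "QD": 11.48, "ReadPosRankSum": -1.658,
          "FS": 3.608, "MQ0": 0, "DP": 195}
print(apply_filters(record, INDEL_FILTERS))  # ['PASS']
```

Under these assumptions the record passes every filter, which agrees with the PASS value in its FILTER column.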

  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    I see -- Just to be clear, are you getting the error when you run ValidateVariants on the files that come straight out of the UnifiedGenotyper, or on the combined VCF that results from CombineVariants?


  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    Ah, actually I was mistaken -- the sample that is 2/2 counts for two alleles, so the tag is in fact correct. We'll have a closer look at this.
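
The recount can be checked by hand with a short Python sketch that tallies alleles from the GT fields. The function name `recount_alleles` is hypothetical and this is not the check ValidateVariants itself performs. It only assumes the usual VCF convention that each called allele in a genotype counts once and that no-calls (`./.`) count toward neither AC nor AN.

```python
# Recount AC and AN from the genotype (GT) strings of a single VCF record.
from collections import Counter


def recount_alleles(genotypes, n_alt):
    """Return (AC per ALT allele, AN) from GT strings such as '0/1' or './.'."""
    counts = Counter()
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # no-calls add nothing to AC or AN
                counts[int(allele)] += 1
    ac = [counts[i] for i in range(1, n_alt + 1)]
    an = sum(counts.values())
    return ac, an


# The four called genotypes at chrM:302 (ALT alleles ACC and A); the long run
# of no-call samples in the record is abbreviated to two entries here.
gts = ["./.", "./.", "0/1", "0/1", "2/2", "0/1"]
ac, an = recount_alleles(gts, n_alt=2)
print(ac, an)  # [3, 2] 8
```

Three heterozygous 0/1 calls give three copies of the first ALT allele, and the single 2/2 call gives two copies of the second, so the recount matches the AC=3,2 and AN=8 reported in the record.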


  • naarkhoo Posts: 38 Member

    Surprisingly, I no longer see that error! It seems like a miracle :D Now I am getting a different one: File F93.all.vcf fails strict validation: the rsID rs35614524 for the record at position chr9:139565479 is not in dbSNP ... I haven't started praying for it to fix itself yet.

  • Geraldine_VdAuwera Posts: 6,466 Administrator, GATK Developer

    Hmm, that's odd. Well, the new version will be out very soon, which is more thoroughly tested -- hopefully you won't suffer these weird phantom bugs then.

