
the Allele Count (AC) tag is incorrect

naarkhoo Member
edited February 2013 in Ask the GATK team

I am getting this error when I try to validate the variants (ValidateVariants) in a VCF file produced by GATK's UnifiedGenotyper. I am using GenomeAnalysisTK-2.3-6-gebbba25 and dbsnp_137.hg19.vcf. The variants are annotated with DepthOfCoverage, HaplotypeScore, InbreedingCoeff and LowMQ ...

Basically, I generate two VCF files by running UnifiedGenotyper separately, one for SNPs and the other for indels.

The error for both is about the Allele Count (AC) tag:
##### ERROR MESSAGE: File F93.snp.vcf fails strict validation: the Allele Count (AC) tag is incorrect for the record at position chr1:1225579, 1 vs. 1

I appreciate your comments,

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Can you upgrade to the latest version and see if the error persists?

  • I tried GenomeAnalysisTK-2.3-9-ge5ebf34 and the error still persists:
    the Allele Count (AC) tag is incorrect for the record at position chrM:302, 2 vs. 2

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Ok. Can you post the VCF record where the error occurs (chrM:302)?

  • naarkhoo Member
    edited February 2013
    chrM    302 rs66492218  AC  ACC,A   2239.04 PASS    AC=3,2;AF=0.375,0.250;AN=8;BaseQRankSum=0.822;DB;DP=195;FS=3.608;HaplotypeScore=45.0534;IndelType=MULTIALLELIC_INDEL;LowMQ=0.0000,0.0000,195;MLEAC=3,2;MLEAF=0.375,0.250;MQ=58.74;MQ0=0;MQRankSum=1.342;QD=11.48;RPA=8,9,7;RU=C;ReadPosRankSum=-1.658;STR;set=variant   
  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Can you please post the complete record? This one is missing the FORMAT field and the sample values.

  • naarkhoo Member
    edited February 2013
    chrM    302 rs66492218  AC  ACC,A   2239.04 PASS    AC=3,2;AF=0.375,0.250;AN=8;BaseQRankSum=0.822;DB;DP=195;FS=3.608;HaplotypeScore=45.0534;IndelType=MULTIALLELIC_INDEL;LowMQ=0.0000,0.0000,195;MLEAC=3,2;MLEAF=0.375,0.250;MQ=58.74;MQ0=0;MQRankSum=1.342;QD=11.48;RPA=8,9,7;RU=C;ReadPosRankSum=-1.658;STR;set=variant   GT:AD:DP:GQ:PL  ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/1:41,10,2:66:99:186,0,1210,211,1129,1608  0/1:3,34,0:47:21:987,0,21,1084,127,1467 2/2:0,0,36:52:99:1179,1264,1506,108,108,0   0/1:20,1,0:30:2:2,0,542,70,580,739
  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Ah, there you go -- the counts are indeed incorrect. The record includes 3 samples with the first allele but only 1 of the second, unlike what is reported in the AC tag. Did you exclude some samples after processing, perhaps?

  • naarkhoo Member
    edited February 2013

    No, I didn't! I ran UnifiedGenotyper separately to call SNPs and indels from 93 samples, then I filtered the calls.

    For indels:

    --filterExpression "QD < 2.0" \
    --filterName "QDFilter" \
    --filterExpression "ReadPosRankSum < -20.0" \
    --filterName "ReadPosFilter" \
    --filterExpression "FS > 200.0" \
    --filterName "FSFilter" \
    --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
    --filterName "HARD_TO_VALIDATE" \
    --filterExpression "QUAL < 30.0 || DP < 6 || DP > 5000 || HRun > 5" \
    --filterName "QualFilter"
    

    For SNPs:

    --clusterSize 3 \
    --clusterWindowSize 10 \
    --filterExpression "QD < 2.0" \
    --filterName "QDFilter" \
    --filterExpression "MQ < 40.0" \
    --filterName "MQFilter" \
    --filterExpression "FS > 60.0" \
    --filterName "FSFilter" \
    --filterExpression "HaplotypeScore > 13.0" \
    --filterName "HaplotypeScoreFilter" \
    --filterExpression "MQRankSum < -12.5" \
    --filterName "MQRankSumFilter" \
    --filterExpression "ReadPosRankSum < -8.0" \
    --filterName "ReadPosRankSumFilter" \
    --filterExpression "QUAL < 30.0 || DP < 6 || DP > 5000 || HRun > 5" \
    

    Then I combined the two files using CombineVariants. As you can see, I didn't remove any samples.
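
As a rough sanity check on the indel filters listed above, the sketch below re-evaluates four of them in plain Python against the INFO values of the chrM:302 record posted earlier. This is only an illustration: the lambdas are hand translations of the filter expressions, not how GATK evaluates them, and the names `info`, `indel_filters` and `failed_filters` are made up for this example. The "QualFilter" expression is left out because it references HRun, which does not appear in the posted record.

```python
# INFO values copied from the chrM:302 record posted in this thread.
info = {"QD": 11.48, "ReadPosRankSum": -1.658, "FS": 3.608, "MQ0": 0, "DP": 195}

# Hand-written Python equivalents of four indel filter expressions
# (illustrative only; GATK evaluates the original expressions itself).
indel_filters = {
    "QDFilter": lambda a: a["QD"] < 2.0,
    "ReadPosFilter": lambda a: a["ReadPosRankSum"] < -20.0,
    "FSFilter": lambda a: a["FS"] > 200.0,
    "HARD_TO_VALIDATE": lambda a: a["MQ0"] >= 4
    and (a["MQ0"] / (1.0 * a["DP"])) > 0.1,
}


def failed_filters(annotations):
    """Return the names of the filters whose expression is true."""
    return [name for name, expr in indel_filters.items() if expr(annotations)]


failed = failed_filters(info)
print(failed if failed else "PASS")
```

None of the four expressions is true for this record, which agrees with the PASS in its FILTER column.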

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    I see -- Just to be clear, are you getting the error when you run ValidateVariants on the files that come straight out of the UnifiedGenotyper, or on the combined VCF that results from CombineVariants?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Ah, actually I was mistaken -- the sample that is 2/2 counts for two alleles, so the tag is in fact correct. We'll have a closer look at this.
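
The counting behind that correction can be reproduced by hand. The sketch below is a minimal illustration, not ValidateVariants' own code: it tallies alleles from the GT values of the chrM:302 record, skipping no-calls. The function name `count_alleles` is hypothetical, and the long run of `./.` samples is abbreviated to three entries since no-calls contribute nothing to the totals.

```python
import re
from collections import Counter


def count_alleles(genotypes, n_alt):
    """Tally called alleles; return (AC per ALT allele, AN)."""
    counts = Counter()
    for gt in genotypes:
        # Genotypes may be unphased ("0/1") or phased ("0|1").
        for allele in re.split(r"[/|]", gt):
            if allele != ".":  # "." is a no-call and is not counted
                counts[int(allele)] += 1
    ac = [counts[i] for i in range(1, n_alt + 1)]
    an = sum(counts.values())
    return ac, an


# GT values of the four called samples at chrM:302, plus a few no-calls
# standing in for the remaining uncalled samples.
genotypes = ["./.", "./.", "./.", "0/1", "0/1", "2/2", "0/1"]

ac, an = count_alleles(genotypes, n_alt=2)
af = [c / an for c in ac]
print(ac, an, af)
```

The three 0/1 samples give three copies of the first ALT allele, and the single 2/2 sample gives two copies of the second, so the result agrees with AC=3,2, AN=8 and AF=0.375,0.250 in the record.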

  • Surprisingly, I no longer see the error! Sounds like a miracle :D Now I am getting something like: File F93.all.vcf fails strict validation: the rsID rs35614524 for the record at position chr9:139565479 is not in dbSNP ... I haven't started praying for it to get fixed automatically.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Hmm, that's odd. Well, the new version, which is more thoroughly tested, will be out very soon -- hopefully you won't suffer these weird phantom bugs then.
