AD value higher than the DP value

redred Posts: 12 Member
edited February 2013 in Ask the GATK team

I called SNPs on a single BAM file with the command:

 java -Xmx30g -jar GenomeAnalysisTK.jar \
 -T UnifiedGenotyper \
 -R ref.fasta \
 -I input.bam \
 -o output.vcf

When I looked at the output file, most DP values were equal to the AD values, and in a few cases the AD value was higher. I thought that the AD values are the unfiltered counts of all reads and that the DP field describes the total depth of reads that passed the UnifiedGenotyper's internal quality control. Is it normal for the AD values to be higher than the DP value?

 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  eo227
 gi|218430358|emb|CU928163.2|    2317180 .       T       G       76.55   .       AC=1;AF=0.50;AN=2;BaseQRankSum=1.568;DP=10;Dels=0.00;FS=11.181;HRun=0;HaplotypeScore=15.8585;MQ=28.63;MQ0=0;MQRankSum=1.036;QD=7.66;ReadPosRankSum=-0.633;SB=-0.01      GT:AD:DP:GQ:PL  0/1:6,4:10:99:107,0,154

 gi|218430358|emb|CU928163.2|    2317181 .       T       G       71.96   .       AC=1;AF=0.50;AN=2;BaseQRankSum=0.550;DP=10;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=19.8574;MQ=28.63;MQ0=0;MQRankSum=-1.754;QD=7.20;ReadPosRankSum=-1.754;SB=-0.01      GT:AD:DP:GQ:PL  0/1:3,4:10:87.90:102,0,88

Answers

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    Yes. Since DP is filtered and AD is unfiltered, AD can be higher than DP.

    http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthPerAlleleBySample.html

    Geraldine Van der Auwera, PhD

  • Belen Posts: 2 Member

    Hi Geraldine,

    We have found a similar problem, and reading this entry I noticed that the sum of AD is actually lower than DP in the second example shown here (0/1:3,4:10:87.90:102,0,88: the AD values sum to 7 while DP is 10). How can that be? If the AD values are read depths from unfiltered reads, their sum should never be lower than DP, which counts only filtered reads. Is that right?

    Thanks,
    Belen

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    Hi Belen,

    The allele depth only accounts for reads that are assigned to one of the alleles considered in the genotype. It is possible that some reads have an allele that is not considered, so they'll be counted in DP, but not in the sum of AD values.

    Geraldine Van der Auwera, PhD
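
The comparison discussed above can be checked directly on the sample columns. The following is a minimal sketch, not part of GATK: the helper name `ad_vs_dp` is hypothetical, and the FORMAT and sample strings are copied from the two records in the original question.

```python
def ad_vs_dp(format_field, sample_field):
    """Return (sum of AD values, DP) for one sample column of a VCF record."""
    # Pair each FORMAT key (GT, AD, DP, ...) with its value in the sample column.
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    allele_depths = [int(x) for x in values["AD"].split(",")]
    return sum(allele_depths), int(values["DP"])


# FORMAT and sample columns copied from the two records in the question.
records = [
    ("GT:AD:DP:GQ:PL", "0/1:6,4:10:99:107,0,154"),
    ("GT:AD:DP:GQ:PL", "0/1:3,4:10:87.90:102,0,88"),
]

for fmt, sample in records:
    ad_sum, dp = ad_vs_dp(fmt, sample)
    # A positive difference means some reads are counted in DP but not in AD.
    print(f"sum(AD)={ad_sum} DP={dp} difference={dp - ad_sum}")
```

For the second record the difference is 3, which matches Belen's observation that three reads counted in DP were not assigned to either genotyped allele.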

  • Belen Posts: 2 Member

    Ah, ok that makes sense, thanks!

  • chenyu600 Posts: 22 Member
    edited July 2013

    Hi, Geraldine,
    I'm having a little trouble understanding 'the alleles considered in the genotype'. What are the criteria? I also found something strange when looking at the output file (shown below):

    1/1:0,1:127:99:4993,382,0

    1/1:0,1:85:99:3342,256,0

    1/1:0,1:72:99:2831,217,0

    Almost no reads are counted in AD despite such high depth. Why?
    The FILTER field is 'PASS' and GT/PL strongly support the genotype, but with AD/DP values like those above, how should I decide whether to trust the call?

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    Hmm, that doesn't look very good. Can you confirm that you are using the latest version of GATK (2.6)?

    The best way to check if the call is reasonable is to look at the pileup in a genome browser like IGV.

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 22 Member

    Hi,

    Could you please help me understand why the tool reported G when its depth is 0?

    chr2 198257795 rs4685 T C,**G**,<NON_REF> 405.77 0 BaseQRankSum=-4.126 ClippingRankSum=-2.580 DB DP=394 MLEAC=1,0,0 MLEAF=0.500,0.00,0.00 MQ=41.57 MQ0=0 MQRankSum=-0.581 ReadPosRankSum=2.199 GT:AD:DP:GQ:PL:SB 0/1:336,58,**0**,0:394:99:434,0,8805,1442,8979,10422,1442,8979,10422,10421:187,149,33,25

    Thank you in advance!

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    @Hasani This looks like a bug we had in an older version where the AD field values were given in the wrong order. Did you check the data to see if there is really 0 depth for that allele?

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 22 Member
    edited October 2014

    I'm using GATK/3.2-2, and you are right: there are no reads supporting G.
    Could you please explain what you meant by "wrong order"?

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    Ah, it sounds like this is a different case, not the bug I was thinking of. By wrong order I mean the AD counts for the alleles were sometimes reversed. But this sounds different, possibly something to do with soft clips. It is possible that the program saw a G on some reads that were soft-clipped (which don't count toward the AD value), and kept the allele in the list of candidate alleles. Since it seems the right call is made anyway, I don't think this is a cause for concern.

    Geraldine Van der Auwera, PhD
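
To see which allele each AD value belongs to, the counts can be paired with the alleles by position. This is a small sketch under the usual VCF convention that AD lists the REF count first, followed by one count per ALT allele in the order they appear in the ALT column. The values are copied from Hasani's record; the variable names are illustrative only.

```python
# REF, ALT and AD values copied from the record posted above.
ref = "T"
alts = ["C", "G", "<NON_REF>"]
ad_field = "336,58,0,0"

# AD follows the order REF, ALT1, ALT2, ... so pair the two lists by position.
alleles = [ref] + alts
depths = [int(x) for x in ad_field.split(",")]
per_allele = dict(zip(alleles, depths))

# Alleles that were kept as candidates but have no reads counted toward AD.
zero_depth = [allele for allele, depth in per_allele.items() if depth == 0]

print(per_allele)
print("alleles with zero AD:", zero_depth)
print("sum(AD):", sum(depths))
```

Here the AD values sum to 394, the same as the sample DP in the record, so every counted read is assigned to T or C and none to G.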

  • chenyu600 Posts: 22 Member

    @Geraldine_VdAuwera said:
    Hi Belen,

    The allele depth only accounts for reads that are assigned to one of the alleles considered in the genotype. It is possible that some reads have an allele that is not considered, so they'll be counted in DP, but not in the sum of AD values.

    Hi @Geraldine_VdAuwera,
    What kind of reads carry an allele that is not considered when genotyping?

    Thanks!

  • Geraldine_VdAuwera Posts: 8,019 Administrator, GATK Dev admin

    @chenyu600 In the context of that discussion, I meant, for example, a site where the ref is A supported by 9 reads, 10 reads have T, and maybe one read has C. You will typically get an A->T call with DP=20 and AD=9,10, so sum(AD) < DP because the C read is counted in DP but not in AD.

    Geraldine Van der Auwera, PhD
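
The arithmetic in that last example can be reproduced with a toy pileup. This is only an illustration of the counting described above, not GATK's actual algorithm: it ignores all read filtering and simply assumes that the called alleles are A and T.

```python
from collections import Counter

# Toy pileup from the example: 9 reads with A (ref), 10 with T, 1 with C.
pileup = ["A"] * 9 + ["T"] * 10 + ["C"] * 1
counts = Counter(pileup)

# Assumption for this sketch: only A (ref) and T (alt) are genotyped.
called_alleles = ["A", "T"]

dp = len(pileup)                          # every read counts toward DP
ad = [counts[a] for a in called_alleles]  # only reads matching a called allele
uncounted = dp - sum(ad)                  # reads in DP but not in any AD value

print(f"DP={dp} AD={','.join(map(str, ad))} uncounted={uncounted}")
```

The single C read is the one that makes sum(AD) smaller than DP.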
