AD value higher than the DP value

redred Posts: 12
edited February 2013 in Ask the GATK team

I called SNPs from a single BAM file with the command:

 java -Xmx30g -jar GenomeAnalysisTK.jar \
 -T UnifiedGenotyper \
 -R ref.fasta \
 -I input.bam \
 -o output.vcf

When I looked at the output file, most DP values were equal to the sum of the AD values, and in a few cases the AD sum was higher. I thought that the AD values are the unfiltered counts of all reads and that the DP field describes the total depth of reads that passed the UnifiedGenotyper's internal quality control. Is it normal for the AD values to be higher than the DP value?

 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  eo227
 gi|218430358|emb|CU928163.2|    2317180 .       T       G       76.55   .       AC=1;AF=0.50;AN=2;BaseQRankSum=1.568;DP=10;Dels=0.00;FS=11.181;HRun=0;HaplotypeScore=15.8585;MQ=28.63;MQ0=0;MQRankSum=1.036;QD=7.66;ReadPosRankSum=-0.633;SB=-0.01      GT:AD:DP:GQ:PL  0/1:6,4:10:99:107,0,154

 gi|218430358|emb|CU928163.2|    2317181 .       T       G       71.96   .       AC=1;AF=0.50;AN=2;BaseQRankSum=0.550;DP=10;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=19.8574;MQ=28.63;MQ0=0;MQRankSum=-1.754;QD=7.20;ReadPosRankSum=-1.754;SB=-0.01      GT:AD:DP:GQ:PL  0/1:3,4:10:87.90:102,0,88

Answers

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    Since DP is filtered and AD is unfiltered, then yes, AD can be higher than DP.

    http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthPerAlleleBySample.html

    Geraldine Van der Auwera, PhD

  • Belen Posts: 2

    Hi Geraldine,

    Reading this entry, and because we have found a similar problem, I noticed that the sum of AD is actually lower than DP in the second example shown here (0/1:3,4:10:87.90:102,0,88, where the AD values sum to 7 while DP is 10). How can that be? If the AD values are the depths from unfiltered reads, their sum should never be lower than DP, which counts only filtered reads. Is that right?

    Thanks,
    Belen

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    Hi Belen,

    The allele depth only accounts for reads that are assigned to one of the alleles considered in the genotype. It is possible that some reads have an allele that is not considered, so they'll be counted in DP, but not in the sum of AD values.

    Geraldine Van der Auwera, PhD
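
To check this relationship on a whole file rather than by eye, a small script can compare the summed AD values with DP for each sample column. The following is a minimal sketch, not a GATK utility: the function name `ad_vs_dp` is made up for illustration, and it assumes the FORMAT column contains both AD and DP, as in the two records posted in the question.

```python
def ad_vs_dp(format_field, sample_field):
    """Return (sum of AD, DP) for one sample column of a VCF record.

    Hypothetical helper for illustration; assumes AD and DP are both
    present in the FORMAT string.
    """
    # Pair each FORMAT key (GT, AD, DP, ...) with the sample's value.
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    # AD is a comma-separated list; skip missing entries written as ".".
    ad_sum = sum(int(x) for x in values["AD"].split(",") if x != ".")
    dp = int(values["DP"])
    return ad_sum, dp


# The FORMAT and sample columns of the two records from the question.
records = [
    ("GT:AD:DP:GQ:PL", "0/1:6,4:10:99:107,0,154"),
    ("GT:AD:DP:GQ:PL", "0/1:3,4:10:87.90:102,0,88"),
]

for fmt, sample in records:
    ad_sum, dp = ad_vs_dp(fmt, sample)
    if ad_sum == dp:
        relation = "=="
    elif ad_sum > dp:
        relation = ">"
    else:
        relation = "<"
    print(f"sum(AD)={ad_sum} {relation} DP={dp}")
```

For the two posted records this reports sum(AD) equal to DP at the first site and sum(AD) lower than DP at the second, which is the case Belen asks about.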

  • Belen Posts: 2

    Ah, ok that makes sense, thanks!

  • chenyu600 Posts: 22
    edited July 2013

    Hi, Geraldine,
    I'm having a little trouble understanding 'the alleles considered in the genotype'. What are the criteria? I also found something strange when looking at the output file (shown below):

    1/1:0,1:127:99:4993,382,0

    1/1:0,1:85:99:3342,256,0

    1/1:0,1:72:99:2831,217,0

    Almost no reads are counted in AD despite such high depth. Why?
    The FILTER field is 'PASS' and the GT and PL values support the genotype well, but with AD and DP as shown above, how should I decide whether to trust the call?

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    Hmm, that doesn't look very good. Can you confirm that you are using the latest version of GATK (2.6)?

    The best way to check if the call is reasonable is to look at the pileup in a genome browser like IGV.

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 30

    Hi,

    could you please help me understand why the tool reported G when the depth is 0?

    chr2 198257795 rs4685 T C,**G**,<NON_REF> 405.77 0 BaseQRankSum=-4.126

    ClippingRankSum=-2.580 DB DP=394 MLEAC=1,0,0 MLEAF=0.500,0.00,0.00 MQ=41.57 MQ0=0

    MQRankSum=-0.581 ReadPosRankSum=2.199 GT:AD:DP:GQ:PL:SB

    0/1:336,58,**0**,0:394:99:434,0,8805,1442,8979,10422,1442,8979,10422,10421:187,149,33,25

    Thank you in advance!

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    @Hasani This looks like a bug we had in an older version where the AD field values were given in the wrong order. Did you check the data to see if there is really 0 depth for that allele?

    Geraldine Van der Auwera, PhD

  • Hasani Germany Posts: 30
    edited October 2014

    I'm using GATK/3.2-2, and you are right! There are no reads supporting G.
    Could you please explain what you meant by "wrong order"?

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    Ah, it sounds like this is a different case, not the bug I was thinking of. By wrong order I mean the AD counts for the alleles were sometimes reversed. But this sounds different, possibly something to do with soft clips. It is possible that the program saw a G on some reads that were soft-clipped (which don't count toward the AD value), and kept the allele in the list of candidate alleles. Since it seems the right call is made anyway, I don't think this is a cause for concern.

    Geraldine Van der Auwera, PhD
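
Since the order of the AD values matters here, it can help to pair each count with its allele explicitly. In the VCF format, AD lists the REF allele first and then each ALT allele in the order given in the ALT column. The sketch below applies that to Hasani's record, with the `**` emphasis markers dropped; the function name `allele_depths` is hypothetical and not part of GATK.

```python
def allele_depths(ref, alt_field, ad_field):
    """Map each allele to its AD count (REF first, then ALTs in order).

    Hypothetical helper for illustration only.
    """
    alleles = [ref] + alt_field.split(",")
    depths = [int(x) for x in ad_field.split(",")]
    if len(alleles) != len(depths):
        raise ValueError("AD must have one value per allele (REF + ALTs)")
    return dict(zip(alleles, depths))


# REF, ALT and AD values taken from the record posted above.
depths = allele_depths("T", "C,G,<NON_REF>", "336,58,0,0")
for allele, depth in depths.items():
    print(allele, depth)
```

Laid out this way, the record reads as T=336, C=58, G=0 and <NON_REF>=0, and the four counts add up to the reported DP of 394.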

  • chenyu600 Posts: 22

    @Geraldine_VdAuwera said:
    Hi Belen,

    The allele depth only accounts for reads that are assigned to one of the alleles considered in the genotype. It is possible that some reads have an allele that is not considered, so they'll be counted in DP, but not in the sum of AD values.

    Hi @Geraldine_VdAuwera,
    What kind of reads would have an allele that is not considered when genotyping?

    Thanks!

  • Geraldine_VdAuwera Cambridge, MA Posts: 11,743 admin

    @chenyu600 In the context of that discussion, I meant, for example, a site where your ref is A supported by 9 reads, and you have 10 reads that have T, and maybe one read that has C. You will typically get an A->T call with DP=20 and AD=9,10, so sum(AD) < DP because the C read is counted in DP but not AD.

    Geraldine Van der Auwera, PhD
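
The arithmetic in that example can be reproduced with a toy tally. This is only a sketch of the counting logic described above, assuming one base per read at the site and ignoring all read filtering; it is not how GATK computes these annotations, and `depth_fields` is a made-up name.

```python
from collections import Counter


def depth_fields(bases, ref, called_alts):
    """Toy model: DP counts every read, AD only reads matching a called allele.

    Hypothetical helper; ignores the read filtering GATK applies.
    """
    counts = Counter(bases)
    dp = sum(counts.values())
    # AD is reported for REF first, then for each ALT allele in the call.
    ad = [counts[ref]] + [counts[alt] for alt in called_alts]
    return dp, ad


# 9 reads with A (ref), 10 reads with T, 1 read with C; only T is called.
bases = "A" * 9 + "T" * 10 + "C"
dp, ad = depth_fields(bases, ref="A", called_alts=["T"])
print(f"DP={dp} AD={','.join(map(str, ad))} sum(AD)={sum(ad)}")
```

The single C read contributes to DP but to neither AD value, which is why the sum of AD comes out as 19 against a DP of 20.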
