DiagnoseTargets output questions

moran (The Broad, Member)

Hello,

  1. Is DiagnoseTargets counting only reads that have mapped uniquely? Is that one of the default filters?

  2. From the vcf I see that
    IDP - Average depth across the interval. Sum of the depth in a loci divided by interval size.
    LL - Number of loci for this sample, in this interval with low coverage (below the minimum coverage) but not zero
    ZL - Number of loci for this sample, in this interval with zero coverage.

I'm interested in the total number of reads mapped to each interval.

Is it true that IDP = #reads_in_this_interval/(LL+ZL)?
If so, I could recover #reads_in_this_interval as IDP*(LL+ZL). My samples have different numbers of reads, so I need to normalize first.

Thanks!
Moran.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Moran,

    1. Yes, all GATK tools only look at uniquely mapped reads / primary alignments.

    2. No, this isn't true. I don't think you can directly compute the number of reads in an interval from the coverage metrics. For one, you can't be sure that all your reads have the same usable length (e.g. if some are clipped) so they may not be contributing equally to coverage. I'm not sure trying to normalize this way makes sense. If you have different mean coverage overall for your samples, then it makes sense to me to normalize against that; but normalizing per interval? I don't see it. But I suppose it depends what you're trying to compare.
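To make the point about coverage versus read counts concrete, here is a minimal Python sketch. It only follows the header definitions quoted in the question (IDP as summed depth divided by interval size, LL and ZL as counts of low-coverage and zero-coverage loci). The helper names, the toy reads, and the `min_coverage` value are all hypothetical and are not part of DiagnoseTargets itself.

```python
def depth_from_reads(reads, size):
    """Per-locus depth for reads given as (start, aligned_length) on a 0-based interval."""
    depths = [0] * size
    for start, length in reads:
        for pos in range(start, min(start + length, size)):
            depths[pos] += 1
    return depths


def interval_metrics(depths, min_coverage):
    """IDP, LL and ZL computed from the header definitions quoted in the question."""
    idp = sum(depths) / len(depths)                        # summed depth / interval size
    ll = sum(1 for d in depths if 0 < d < min_coverage)    # low but non-zero coverage
    zl = sum(1 for d in depths if d == 0)                  # zero coverage
    return idp, ll, zl


# Hypothetical toy data: three reads each time, on a 10-locus interval.
full_reads = [(0, 5), (2, 5), (5, 5)]       # all reads contribute 5 aligned bases
clipped_reads = [(0, 5), (2, 3), (5, 2)]    # two reads are clipped

full = interval_metrics(depth_from_reads(full_reads, 10), min_coverage=2)
clipped = interval_metrics(depth_from_reads(clipped_reads, 10), min_coverage=2)

print(full)     # (IDP, LL, ZL) for the unclipped reads
print(clipped)  # (IDP, LL, ZL) for the clipped reads
```

Both toy sets contain three reads, yet IDP differs between them, and IDP*(LL+ZL) matches the read count in neither case. What IDP times the interval size does give back is the summed depth, that is, the number of aligned bases in the interval, not the number of reads.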

  • moran (The Broad, Member)

    Thanks for the uniqueness answer!

    Regarding normalization, consider the following scenario: I have 3 samples, with 10M, 20M and 100M total mapped reads respectively. Before comparing coverage values for a specific interval across samples, I want to normalize each one by that sample's total number of mapped reads. I expect higher coverage in the third sample simply because I have more reads from it.

    Does that make sense?

    thanks!
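The scenario above could be sketched in Python as follows. This is only an illustration: the per-interval IDP values are made up, the function name and the per-million scaling are assumptions, and the total mapped read counts are assumed to come from outside the DiagnoseTargets VCF.

```python
def normalize_depth(idp, total_mapped_reads, scale=1_000_000):
    """Scale an interval's average depth to depth per `scale` mapped reads."""
    return idp * scale / total_mapped_reads


# Totals from the scenario above: 10M, 20M and 100M mapped reads.
total_mapped = {
    "sample1": 10_000_000,
    "sample2": 20_000_000,
    "sample3": 100_000_000,
}

# Hypothetical IDP values for one interval (not real data).
idp = {"sample1": 12.0, "sample2": 24.0, "sample3": 120.0}

normalized = {s: normalize_depth(idp[s], total_mapped[s]) for s in idp}

for sample, value in normalized.items():
    print(sample, value)
```

In this made-up case the raw IDP values differ tenfold between the first and third samples, but after scaling by total mapped reads they are the same, which is the effect the normalization is meant to capture.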

  • moran (The Broad, Member)

    Ya, sample normalization is exactly what I intended to do. sorry if I wasn't clear.

    What I'm wondering is whether there is a way to extract the total number of mapped reads for each sample from the VCF output file.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I don't think DiagnoseTargets currently reports #reads per sample per interval, unfortunately. I'll see if we can put that on the todo feature list.
