
DP and chromosome filtering

Hi team,

I have two separate questions:

  1. Starting with a VCF file, plotting the depth (DP) distribution gives a nice, slightly asymmetrical bell-shaped curve. Given that SNPs with very high and very low coverage should be excluded, how does one decide what counts as very high or very low? For example, the top and bottom 5%?

  2. I'm only interested in chromosomes 2L, 2R, 3L, 3R and X of my Drosophila sequences. Filtering for these is easy with a Perl script, but I'm trying to do it earlier in the GATK processing. I've tried ...-L 2L -L 2R -L 3L ...etc, -L 2L 2R 3L ....etc, and -L 2L, 2R, 3R...etc, but the result is either an input error message or chromosome 2L only.

Many thanks and apologies if I've missed anything in the instructions.



Best Answer


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Blue,

    1. That is entirely up to you; it depends on your experimental design, expected coverage, etc.

    2. -L 2L -L 2R -L 3L should work. What is the error message you get when you try that?
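The repeated-flag form above (one -L per contig) is easy to mistype by hand, so it can help to generate it. The following is a minimal Python sketch, not part of GATK: the helper name `interval_args` is hypothetical, and it assumes the contig names are spelled exactly as they appear in the reference.

```python
def interval_args(contigs):
    """Return a flat argument list with one -L flag per contig name."""
    args = []
    for contig in contigs:
        # Each contig gets its own -L flag; names are not comma- or space-joined.
        args.extend(["-L", contig])
    return args


# Chromosome arms of interest, taken from the question above.
contigs = ["2L", "2R", "3L", "3R", "X"]
print(" ".join(interval_args(contigs)))
# prints: -L 2L -L 2R -L 3L -L 3R -L X
```

The printed string can be pasted into the command line after the other arguments; the list form is convenient if the command is launched from a script.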

  • Blue (Member)

    Thanks for answering question 2. It was probably just a typo on my part, so apologies for the menial question.

    My expected average coverage is 35X per sample. I've been advised to exclude SNPs in the tails of the depth distribution because those calls are unreliable, which is fairly straightforward as long as I know exactly where to draw the lines. That said, I suppose this could exclude a small proportion of true SNPs located in regions of extreme GC content (which therefore have fewer reads), and equally exclude SNPs that just happen to have a high total read depth (DP). On this basis, am I better off filtering by QUAL alone?
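To make "where to draw the lines" concrete, here is a minimal Python sketch of percentile-based depth cutoffs. It is only an illustration under stated assumptions: the 5% and 95% defaults come from the example in the original question and are not a recommendation, DP is read from the INFO column (total depth across samples), the VCF is uncompressed text, and all function names are hypothetical.

```python
def info_dp(vcf_line):
    """Return the INFO DP value of a VCF record line, or None if absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    for entry in fields[7].split(";"):
        if entry.startswith("DP="):
            return int(entry[3:])
    return None


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted list; pct is an integer 1..100."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def depth_cutoffs(depths, lower_pct=5, upper_pct=95):
    """Return (low, high) depth thresholds at the given percentiles."""
    ordered = sorted(depths)
    return percentile(ordered, lower_pct), percentile(ordered, upper_pct)


def filter_by_depth(vcf_lines, lower_pct=5, upper_pct=95):
    """Keep records whose INFO DP lies within the percentile cutoffs."""
    records = [line for line in vcf_lines if not line.startswith("#")]
    depths = [d for d in (info_dp(r) for r in records) if d is not None]
    low, high = depth_cutoffs(depths, lower_pct, upper_pct)
    kept = []
    for record in records:
        dp = info_dp(record)
        # Records without a DP annotation are dropped in this sketch.
        if dp is not None and low <= dp <= high:
            kept.append(record)
    return low, high, kept
```

A typical use would be `low, high, kept = filter_by_depth(open("calls.vcf"))`, where `calls.vcf` is a placeholder file name. Changing `lower_pct` and `upper_pct` and re-plotting the retained DP values is a quick way to see how many sites each choice of cutoff removes.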
