
DP and chromosome filtering

Hi team,

This is two separate questions:

  1. Starting with a VCF file, plotting the depth (DP) distribution gives a nice, slightly asymmetrical bell-shaped curve. Given that SNPs with very high or very low coverage should be excluded, how does one decide what counts as very high and very low? For example, 5% on either side?

  2. I'm only interested in chromosomes 2L, 2R, 3L, 3R and X of my Drosophila sequences. Filtering for these afterwards is easy with a Perl script, but I'd like to do it earlier in the GATK workflow. I've tried -L 2L -L 2R -L 3L ...etc, -L 2L 2R 3L ...etc and -L 2L, 2R, 3R ...etc, but the result is either an input error message or output for chromosome 2L only.
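As a sketch of the repeated-flag form (one -L per contig), the hypothetical helpers below build the interval arguments for the five contigs and, as an alternative, write them to a text file with one contig per line. GATK also accepts such a file (with a .list or .intervals extension) as the value of -L. The contig names must match the reference exactly (for example 2L rather than chr2L). The surrounding tool invocation is left out because it differs between GATK versions.

```python
import shlex
from pathlib import Path

# Contigs of interest from the question; names must match the reference
# sequence dictionary exactly.
CONTIGS = ["2L", "2R", "3L", "3R", "X"]


def interval_args(contigs):
    """Return a flat argument list with a separate '-L <contig>' pair per contig."""
    args = []
    for contig in contigs:
        args.extend(["-L", contig])
    return args


def write_interval_list(contigs, path):
    """Write a plain-text interval list: one interval (here a whole contig) per line."""
    path = Path(path)
    path.write_text("".join(f"{contig}\n" for contig in contigs))
    return path


if __name__ == "__main__":
    # Prints the arguments to append to your GATK command line.
    print(shlex.join(interval_args(CONTIGS)))
```

Running it prints -L 2L -L 2R -L 3L -L 3R -L X. Passing a single -L pointing at the written file should select the same contigs while keeping long command lines manageable.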

Many thanks, and apologies if I've missed anything in the instructions.



Best Answer


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi Blue,

    1. That is entirely up to you; it depends on your experimental design, expected coverage, etc.

    2. -L 2L -L 2R -L 3L should work. What is the error message you get when you try that?

  • Blue (Member)

    Thanks for answering question 2; it was probably just a typo on my part, so apologies for the trivial question.

    My expected average coverage is 35X per sample. I've been advised to exclude SNPs in the tails of the depth distribution because those calls are unreliable. That is fairly straightforward, as long as I know exactly where to draw the lines. That said, I suppose this could exclude a small proportion of true SNPs located in regions of extreme GC content (which therefore have fewer reads), and equally exclude SNPs that just happen to have a high total read depth (DP). On this basis, am I better off filtering by QUAL alone?
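One way to see how much the choice matters on your own data is to compute the "5% either side" DP cutoffs and compare how many sites a DP window keeps against a QUAL threshold. The sketch below assumes a standard VCF (plain or bgzip-compressed) with DP in the INFO column. All function names, the nearest-rank percentile definition, and the QUAL threshold of 30 are illustrative choices, not GATK recommendations. In GATK output, INFO/DP is summed over all samples, so with 35X per sample the centre of the distribution should sit roughly at 35 times the number of samples.

```python
import gzip
import math


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def parse_info_dp(info):
    """Return INFO/DP as an int, or None if the site has no DP key."""
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def read_sites(lines):
    """Yield (chrom, pos, qual, dp) per data line; qual is None when QUAL is '.'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        qual = None if cols[5] == "." else float(cols[5])
        yield cols[0], int(cols[1]), qual, parse_info_dp(cols[7])


def percentile(sorted_values, p):
    """Nearest-rank percentile (0 <= p <= 100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no values to take a percentile of")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def dp_cutoffs(sites, tail_pct=5.0):
    """(low, high) DP cutoffs trimming roughly tail_pct percent from each tail."""
    dps = sorted(dp for *_, dp in sites if dp is not None)
    return percentile(dps, tail_pct), percentile(dps, 100 - tail_pct)


def count_kept(sites, low, high, min_qual):
    """Count sites inside the inclusive DP window versus sites with QUAL >= min_qual."""
    kept_dp = kept_qual = 0
    for _chrom, _pos, qual, dp in sites:
        if dp is not None and low <= dp <= high:
            kept_dp += 1
        if qual is not None and qual >= min_qual:
            kept_qual += 1
    return kept_dp, kept_qual


if __name__ == "__main__":
    import sys

    with open_vcf(sys.argv[1]) as fh:  # path to your VCF (hypothetical usage)
        sites = list(read_sites(fh))
    low, high = dp_cutoffs(sites, tail_pct=5.0)
    # 30.0 is an arbitrary example threshold, not a recommendation.
    kept_dp, kept_qual = count_kept(sites, low, high, min_qual=30.0)
    print(f"DP window [{low}, {high}]: kept {kept_dp} of {len(sites)} sites")
    print(f"QUAL >= 30: kept {kept_qual} of {len(sites)} sites")
```

Trimming a fixed percentage always removes that fraction of sites, whether or not they are artefacts. That is the trade-off raised in the question. Inspecting which sites fall outside the DP window but pass the QUAL threshold (and vice versa) can make the decision less arbitrary.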
