The current GATK version is 3.7-0

DP and chromosome filtering

Blue Member Posts: 48

Hi team,

These are two separate questions:

  1. Starting with a VCF file, plotting the depth (DP) distribution gives a nice, slightly asymmetrical bell-shaped curve. Given that SNPs with very high and very low coverage should be excluded, how does one decide what counts as very high or very low? For example, would cutting 5% from either tail be reasonable?

  2. I'm only interested in chromosomes 2L, 2R, 3L, 3R and X of my Drosophila sequences. Filtering for these is easy with a Perl script, but I'm trying to do it earlier in the GATK process. I've tried "-L 2L -L 2R -L 3L ...", "-L 2L 2R 3L ..." and "-L 2L, 2R, 3R ...", but the result is either an input error message or output for chromosome 2L only.

Many thanks and apologies if I've missed anything in the instructions.



Best Answer


  • Geraldine_VdAuwera Administrator, Dev Posts: 11,015

    Hi Blue,

    1. That is entirely up to you; it depends on your experimental design, expected coverage, etc.

    2. -L 2L -L 2R -L 3L should work. What is the error message you get when you try that?

    Geraldine Van der Auwera, PhD
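
The repeated -L form recommended in this answer can be sketched as a small Python helper that emits one -L flag per chromosome. This is only an illustration: the thread does not say which GATK tool was being run, so SelectVariants, the jar name, and all file names below are assumed placeholders.

```python
def interval_args(contigs):
    """Return one '-L <contig>' pair per contig, preserving order."""
    args = []
    for contig in contigs:
        name = contig.strip()
        # Each -L takes exactly one interval, so reject anything that looks
        # like several names packed into one string ("2L 2R" or "2L,2R").
        if not name or " " in name or "," in name:
            raise ValueError(f"not a single contig name: {contig!r}")
        args += ["-L", name]
    return args


# Hypothetical GATK 3 command line; the tool and file names are placeholders.
command = [
    "java", "-jar", "GenomeAnalysisTK.jar",
    "-T", "SelectVariants",
    "-R", "reference.fasta",
    "-V", "input.vcf",
    "-o", "subset.vcf",
] + interval_args(["2L", "2R", "3L", "3R", "X"])

print(" ".join(command))
```

Building the command as a list rather than a single string keeps each chromosome name as its own argument, which avoids the space- and comma-separated forms that failed in the question.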

  • Blue Member Posts: 48

    Thanks for answering question 2. It was probably just a typo on my part, so apologies for the trivial question.

    My expected average coverage is 35X per sample. I've been advised to exclude SNPs in the tails of the depth distribution because those calls are unreliable, which is fairly straightforward as long as I know exactly where to draw the lines. That said, I suppose this could exclude a small proportion of true SNPs located in regions of extreme GC content (which therefore have fewer reads), and equally exclude true SNPs that just happen to have a high total read depth (DP). On this basis, am I better off filtering by QUAL alone?
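
One way to make "where to draw the lines" concrete is to read the cutoffs off the observed DP distribution itself. The sketch below is a minimal illustration, not a recommendation from the thread: the 5% tail size is just the figure floated in the original question, the nearest-rank percentile is one of several possible definitions, and the function names are made up for this example. It reads the site-level DP from the INFO column, which in a multi-sample VCF is the depth summed over samples rather than the per-sample 35X.

```python
import math
import re

# Matches the DP key in a VCF INFO string such as "AC=1;DP=35;MQ=60".
DP_PATTERN = re.compile(r"(?:^|;)DP=(\d+)(?:;|$)")


def site_depths(vcf_lines):
    """Yield the INFO-field DP of every record that has one."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        match = DP_PATTERN.search(fields[7])
        if match:
            yield int(match.group(1))


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def depth_cutoffs(depths, tail_pct=5.0):
    """Return (low, high) DP values that bound the central part of the data.

    tail_pct is the percentage assigned to each tail; 5.0 is only the
    example value from the question, not an endorsed threshold.
    """
    ordered = sorted(depths)
    if not ordered:
        raise ValueError("no DP values found")
    return nearest_rank(ordered, tail_pct), nearest_rank(ordered, 100 - tail_pct)


if __name__ == "__main__":
    # "input.vcf" is a placeholder path for an uncompressed VCF.
    with open("input.vcf") as handle:
        low, high = depth_cutoffs(site_depths(handle))
    print(f"keep sites with {low} <= DP <= {high}")
```

Changing tail_pct and re-plotting the retained sites shows how many calls each choice discards, which helps when weighing depth filtering against filtering on QUAL alone.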
