
DP and chromosome filtering

Blue | Posts: 24 | Member

Hi team,

I have two separate questions:

  1. Starting with a VCF file, plotting the depth (DP) distribution gives a nice, slightly asymmetrical bell-shaped curve. Given that SNPs with very high and very low coverage should be excluded, how does one decide what counts as very high or very low? For example, 5% on either side?

  2. I'm only interested in chromosomes 2L, 2R, 3L, 3R and X of my Drosophila sequences. Filtering for these is easy with a Perl script, but I'm trying to do it earlier in the GATK process. I've tried "...-L 2L -L 2R -L 3L ...etc", "-L 2L 2R 3L ....etc" and "-L 2L, 2R, 3R...etc", but the result is either an input error message or chromosome 2L only.

Many thanks and apologies if I've missed anything in the instructions.

Cheers,

Blue


Answers

  • Geraldine_VdAuwera | Posts: 6,274 | Administrator, GATK Developer

    Hi Blue,

    1. That is entirely up to you; it depends on your experimental design, expected coverage, etc.

    2. -L 2L -L 2R -L 3L should work. What is the error message you get when you try that?

    Geraldine Van der Auwera, PhD
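
To illustrate the repeated-flag form confirmed above, here is a minimal Python sketch that builds one -L argument per chromosome, so the list can be pasted into a GATK command line without typos. The chromosome names come from the question; the helper name `interval_args` is hypothetical and not part of GATK.

```python
# Chromosomes of interest, as listed in the original question.
CHROMOSOMES = ["2L", "2R", "3L", "3R", "X"]


def interval_args(contigs):
    """Return a flat argument list with one -L flag per contig.

    Hypothetical helper for illustration only; it just formats arguments
    and does not call GATK.
    """
    args = []
    for contig in contigs:
        args.extend(["-L", contig])
    return args


if __name__ == "__main__":
    # Prints: -L 2L -L 2R -L 3L -L 3R -L X
    print(" ".join(interval_args(CHROMOSOMES)))
```

Each contig gets its own -L flag, which is the form the answer says should work, rather than a space- or comma-separated list after a single -L.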

  • Blue | Posts: 24 | Member

    Thanks for answering question 2. It was probably just a typo on my part, so apologies for the trivial question.

    My expected average coverage is 35X per sample. I've been advised to exclude SNPs in the tails of the depth distribution because those calls are unreliable, which is fairly straightforward as long as I know exactly where to draw the lines. That said, I suppose this could exclude a small proportion of true SNPs located in regions of extreme GC content (which therefore have fewer reads), and equally exclude SNPs that just happen to have a high total read depth (DP). On this basis, am I better off filtering by QUAL alone?
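
For anyone who wants to see where percentile-based lines would fall, the following is a rough Python sketch that reads the site-level DP value from the INFO column of a VCF and reports lower and upper cutoffs. The 5% tail is only the example figure from the question, not a recommendation; the sample records, the function names and the nearest-rank percentile rule are all assumptions made for illustration.

```python
import io

# Tiny made-up VCF used only to exercise the functions below.
SAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "2L\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5\n"
    "2L\t200\t.\tC\tT\t60\tPASS\tAF=0.5;DP=35\n"
    "2R\t300\t.\tG\tA\t20\tPASS\tDP=4\n"
    "3L\t400\t.\tT\tC\t90\tPASS\tDP=120\n"
    "X\t500\t.\tA\tT\t70\tPASS\tDP=36\n"
)


def read_site_depths(handle):
    """Collect the INFO-field DP value of every record that has one."""
    depths = []
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            if entry.startswith("DP="):
                depths.append(int(entry[3:]))
                break
    return depths


def percentile(sorted_values, percent):
    """Nearest-rank percentile; percent is an integer from 0 to 100."""
    if not sorted_values:
        raise ValueError("no depth values to summarise")
    # Integer ceiling of percent * n / 100, with a minimum rank of 1.
    rank = max(1, -(-percent * len(sorted_values) // 100))
    return sorted_values[rank - 1]


def depth_cutoffs(depths, tail_percent=5):
    """Return (low, high) DP cutoffs that trim tail_percent from each side."""
    ordered = sorted(depths)
    return percentile(ordered, tail_percent), percentile(ordered, 100 - tail_percent)


if __name__ == "__main__":
    depths = read_site_depths(io.StringIO(SAMPLE_VCF))
    low, high = depth_cutoffs(depths, tail_percent=5)
    print("DP cutoffs:", low, high)
```

To run it on a real file, pass an open file handle to `read_site_depths` in place of the `io.StringIO` wrapper. Changing `tail_percent` shows how sensitive the cutoffs are to the choice of tail, which is the judgment call the answer above leaves to the experimental design.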
