Read trimming

Kath Posts: 36, Member

Hello,

Sorry if this is the wrong forum for this question - I just thought someone might have an idea/opinion...

Should I trim or filter exome sequencing reads before mapping with BWA and calling variants with GATK? I am currently filtering out reads in which fewer than 80% of bases have quality >= Q30, but I lose more than 20% of my reads this way. Does GATK take quality into account, making pre-mapping filtering unnecessary?
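For reference, the filter described above can be sketched roughly as follows. This is only an illustration, not the tool Kath used: the function names are hypothetical, and it assumes Phred+33 encoded quality strings as found in most current FASTQ files.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality string); a hypothetical in-memory FASTQ record
FastqRecord = Tuple[str, str, str]


def fraction_high_quality(quality: str, min_q: int = 30, offset: int = 33) -> float:
    """Return the fraction of bases whose Phred score is >= min_q.

    Assumes Phred+33 encoding (offset=33); adjust for older encodings.
    """
    if not quality:
        return 0.0
    passing = sum(1 for ch in quality if ord(ch) - offset >= min_q)
    return passing / len(quality)


def keep_read(quality: str, min_fraction: float = 0.80, min_q: int = 30) -> bool:
    """Keep a read only if at least min_fraction of its bases reach min_q."""
    return fraction_high_quality(quality, min_q) >= min_fraction


def filter_records(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield only the records that pass the quality filter."""
    for header, seq, qual in records:
        if keep_read(qual):
            yield header, seq, qual
```

A filter like this discards the whole read, including its high-quality bases, which is why it can remove a large share of the data.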

Thanks in advance, Kath

Answers

  • Geraldine_VdAuwera Posts: 6,818, Administrator, GATK Developer

    Hi Kath,

    Yes, GATK tools take base qualities into account, so you don't need to do any filtering beforehand.

    Geraldine Van der Auwera, PhD

  • jjinking Posts: 3, Member

    I ran some concordance simulations comparing trimmed and untrimmed reads using the GATK 1.6 and GATK 2.0 best-practice pipelines on 20 samples. For the genotype calls, I emitted all sites corresponding to CytoScan HD's marker positions, ran the analysis-ready pipeline, and extracted only the sites that PASS. I then compared the calls with CytoScan HD, filtering for HWE 0.05 and marker call rate > 0.90. Taking the average concordance rates across all samples, I got GATK 2.5 trimmed > GATK 2.5 untrimmed.

    While I'm convinced that I should upgrade to GATK 2, I'm still not sure whether I should trim or not. The difference in concordance rates between the GATK 2 trimmed and untrimmed pipelines is very small, around 0.03 percent, but I do see consistently higher discordance rates for untrimmed-pipeline variant calls that came out heterozygous where CytoScan called homozygous.

    I'm sure GSA must have run some simulations to decide whether to trim or not, and I understand that GATK tools take quality scores into account, but am I correct to think that in order to maximize the "correctness" of the genotype/variant calls, I should trim the raw reads? If so, there is obviously a tradeoff between "correctness" and the time/space used in the trimming pipeline, since extra alignment and merging steps are involved. Was the extra bit considered not much of a gain compared to the time and space costs of the extra few steps, data management, etc.?

    I would appreciate any extra insights. Thank you very much. I've attached an Excel file containing a general summary of the concordance results.

    Attachment: submit.to.broad.20130812.xlsx (12K)
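As a rough illustration of the kind of comparison described in this post, the sketch below computes a genotype concordance rate between pipeline calls and array genotypes and counts the discordant sites where the pipeline called heterozygous but the array called homozygous. It is not the poster's actual method: the function names, the site identifiers, and the `"0/1"`-style genotype strings are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Tuple


def normalize(gt: str) -> str:
    """Sort alleles so that '1/0' and '0/1' compare as equal."""
    return "/".join(sorted(gt.split("/")))


def is_het(gt: str) -> bool:
    """True if the two alleles of a diploid genotype differ."""
    a, b = gt.split("/")
    return a != b


def compare_calls(calls: Dict[str, str], array: Dict[str, str]) -> Tuple[float, Counter]:
    """Compare genotypes at sites present in both call sets.

    Returns (concordance rate, counts of discordance types).
    """
    shared = calls.keys() & array.keys()
    kinds: Counter = Counter()
    concordant = 0
    for site in shared:
        c, t = normalize(calls[site]), normalize(array[site])
        if c == t:
            concordant += 1
        elif is_het(c) and not is_het(t):
            kinds["het_call_vs_hom_array"] += 1
        else:
            kinds["other"] += 1
    rate = concordant / len(shared) if shared else 0.0
    return rate, kinds
```

Running it once on the trimmed and once on the untrimmed call set, against the same array genotypes, would give the two rates being compared.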
  • Geraldine_VdAuwera Posts: 6,818, Administrator, GATK Developer
    edited August 2013

    Hi there,

    First, I'm glad you found that it's worth upgrading to GATK 2! It is a lot better than the older version.

    Second, because the GATK takes mapping quality into account, it is not necessary to trim the raw reads. In addition to the time/processing cost, trimming throws away information that can still be useable as long as you evaluate its worth properly (which the GATK does).



  • jjinking Posts: 3, Member

    Hi, I have 2 additional questions:

    1) When you say we don't need to clip raw reads, do you also mean that we don't need to use the GATK's ClipReads tool, or does GSA use this tool? Since this tool allows soft clipping, we won't have to throw away any information, right?

    2) If you do recommend that we use this tool, at what stage should we use it? Right after mapping to the reference file, before the dedup, realign, recal pipeline?

    Thank you very much!

  • Geraldine_VdAuwera Posts: 6,818, Administrator, GATK Developer

    We really don't use any clipping tools anymore as part of our processing pipeline; we used to, but now our callers have become smart enough about dealing with these things that we don't need to.


  • kjclowers UW Madison Posts: 14, Member

    Will submitting trimmed reads to BQSR affect recalibration? Doesn't it take read position into account?

  • Geraldine_VdAuwera Posts: 6,818, Administrator, GATK Developer

    @kjclowers, that's a good point. BQSR should still be able to handle softclips correctly, but not hard clips.
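To make the soft clip versus hard clip distinction concrete: in the SAM format, soft-clipped bases (CIGAR operator `S`) remain in the record's SEQ field, while hard-clipped bases (`H`) are removed from it, so the original read length can no longer be recovered from the record. The sketch below only summarizes a CIGAR string to show that difference; the function name is hypothetical and it says nothing about how BQSR itself is implemented.

```python
import re
from typing import Dict

# One CIGAR element: a length followed by an operator from the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_summary(cigar: str) -> Dict[str, int]:
    """Count clipped bases and the number of bases still stored in SEQ.

    Per the SAM specification, M, I, S, = and X consume the query,
    so their lengths add up to the length of SEQ; H does not.
    """
    soft = hard = stored = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "S":
            soft += n
        elif op == "H":
            hard += n
        if op in "MIS=X":
            stored += n
    return {"soft_clipped": soft, "hard_clipped": hard, "bases_in_seq": stored}
```

For a 100-base read, `clip_summary("5S90M5S")` still reports 100 bases in SEQ, whereas `clip_summary("5H90M5H")` reports only 90.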

