
Read trimming


Sorry if this is the wrong forum for this question - I just thought someone might have an idea/opinion...

Should I trim/filter exome sequencing reads prior to mapping with BWA and variant calling using GATK? I am currently filtering out reads in which <80% of bases have quality >= Q30, but I lose >20% of my reads this way. Does GATK take quality into account, thereby rendering pre-mapping filtering unnecessary?

Thanks in advance,
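For reference, the filter described in the question (discard a read when fewer than 80% of its bases reach Q30) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes Phred+33 quality encoding, which the post does not state.

```python
def fraction_at_or_above(quality_string, min_q=30, phred_offset=33):
    """Return the fraction of bases whose Phred score is >= min_q.

    Assumes Phred+33 ASCII encoding (an assumption, not stated in the thread).
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - phred_offset for ch in quality_string]
    return sum(q >= min_q for q in scores) / len(scores)


def passes_filter(quality_string, min_q=30, min_fraction=0.80):
    """Keep a read only if at least min_fraction of its bases reach min_q."""
    return fraction_at_or_above(quality_string, min_q) >= min_fraction


# Under Phred+33, '?' encodes Q30 and '5' encodes Q20.
good_read = "?" * 9 + "5"      # 9 of 10 bases at Q30
poor_read = "?" * 7 + "5" * 3  # 7 of 10 bases at Q30

print(passes_filter(good_read))  # kept
print(passes_filter(poor_read))  # discarded
```

A whole-read filter like this is all or nothing: the second read is dropped even though seven of its ten bases are high quality, which is how a large share of reads can be lost.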


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Kath,

    Yes, GATK tools take base qualities into account, so you don't need to do any filtering beforehand.
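As background on what "taking base qualities into account" can mean: a Phred score Q corresponds to an error probability of 10^(-Q/10), so a quality-aware tool can down-weight an unreliable base instead of discarding the whole read. The sketch below shows only that standard conversion plus a toy weighting. The `weighted_support` helper is hypothetical and is not GATK's actual genotype likelihood model.

```python
def phred_to_error_probability(q):
    """Standard Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def weighted_support(base_qualities):
    """Toy illustration (not GATK's model): sum the probability that each
    supporting base was read correctly, so low-quality bases count for less."""
    return sum(1 - phred_to_error_probability(q) for q in base_qualities)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))

# Three Q30 bases carry more weight than three Q10 bases.
print(weighted_support([30, 30, 30]), weighted_support([10, 10, 10]))
```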

  • jjinking (Member)

    I ran some concordance simulations comparing trimmed and untrimmed reads with the GATK 1.6 and GATK 2.0 best practice pipelines on 20 samples. For each sample I emitted all sites corresponding to CytoScan HD's marker positions, ran the analysisready pipeline, extracted only the sites that PASS, and compared the genotype calls with CytoScan HD (filtering for HWE 0.05 and marker call rate > 0.90). Averaging the concordance rates across all samples, I got GATK 2.5 trimmed > GATK 2.5 untrimmed.

    While I'm convinced that I should upgrade to GATK 2, I'm still not sure whether I should trim. The difference in concordance rate between the GATK 2 trimmed and untrimmed pipelines is very small, around 0.03 percent, but the untrimmed pipeline consistently shows more discordant calls in which the pipeline called a heterozygous genotype where CytoScan HD called a homozygous one.

    I'm sure GSA must have run some simulations to decide whether to trim, and I understand that GATK tools take quality scores into account, but am I correct to think that, in order to maximize the "correctness" of the genotype/variant calls, I should trim the raw reads? If so, there is obviously a tradeoff between "correctness" and the time/space used by the trimming pipeline, since extra alignment and merging steps are involved. Was the extra bit of accuracy considered too small a gain to justify the time and space costs of the extra steps, data management, etc.? I would appreciate any extra insights. Thank you very much.

    I've attached an Excel file containing a general summary of the concordance results.
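The kind of comparison described in this post (genotype concordance against an array, plus a breakdown of discordant calls by zygosity) can be sketched as follows. This is a minimal illustration, not the poster's actual analysis: the marker IDs and genotypes are invented, the function names are hypothetical, and the HWE and call-rate filters are omitted.

```python
from collections import Counter


def zygosity(genotype):
    """Classify a diploid genotype (pair of alleles) as 'hom' or 'het'."""
    return "hom" if genotype[0] == genotype[1] else "het"


def concordance_summary(sequencing_calls, array_calls):
    """Compare genotypes at sites present in both call sets.

    Returns (concordance rate, Counter of discordances keyed by
    (sequencing zygosity, array zygosity)).
    """
    shared = sorted(set(sequencing_calls) & set(array_calls))
    concordant = 0
    discordance_types = Counter()
    for site in shared:
        # Sort alleles so that A/G and G/A count as the same genotype.
        seq_gt = tuple(sorted(sequencing_calls[site]))
        arr_gt = tuple(sorted(array_calls[site]))
        if seq_gt == arr_gt:
            concordant += 1
        else:
            discordance_types[(zygosity(seq_gt), zygosity(arr_gt))] += 1
    rate = concordant / len(shared) if shared else float("nan")
    return rate, discordance_types


# Invented example data; marker names are placeholders.
sequencing = {"m1": ("A", "A"), "m2": ("A", "G"), "m3": ("C", "T"), "m4": ("G", "G")}
array = {"m1": ("A", "A"), "m2": ("G", "A"), "m3": ("C", "C"), "m4": ("G", "G")}

rate, types = concordance_summary(sequencing, array)
print(rate)                    # fraction of shared sites with matching genotypes
print(types[("het", "hom")])   # het in sequencing, hom on the array
```

Counting discordances by zygosity pair is what makes a pattern such as "heterozygous in the pipeline, homozygous on the array" visible even when the overall rates differ by only a few hundredths of a percent.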

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
    edited August 2013

    Hi there,

    First, I'm glad you found that it's worth upgrading to GATK 2! It is a lot better than the older version.

    Second, because the GATK takes mapping quality into account, it is not necessary to trim the raw reads. In addition to the time/processing cost, trimming throws away information that can still be useable as long as you evaluate its worth properly (which the GATK does).

  • jjinking (Member)

    Hi, I have 2 additional questions:
    1) When you say we don't need to clip raw reads, do you also mean that we don't need to use the GATK's ClipReads tool, or does GSA use this tool? Since this tool allows soft clipping, we won't have to throw away any information, right?
    2) If you do recommend that we use this tool, at what stage should we use it? Right after mapping to the reference file before the dedup, realign, recal pipeline?
    Thank you very much!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    We really don't use any clipping tools anymore as part of our processing pipeline; we used to, but now our callers have become smart enough about dealing with these things that we don't need to.

  • kjclowers, UW Madison (Member)

    Will submitting trimmed reads to BQSR affect recalibration? Doesn't it take read position into account?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    @kjclowers, that's a good point. BQSR should still be able to handle softclips correctly, but not hard clips.
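To make the soft clip versus hard clip distinction concrete: in the SAM format, soft-clipped bases (CIGAR operator `S`) remain in the read's SEQ and QUAL fields, while hard-clipped bases (`H`) are removed from the record. The sketch below summarizes a CIGAR string accordingly. The `clip_summary` function is hypothetical and written for this example; it is not part of GATK.

```python
import re

# CIGAR operators defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_summary(cigar):
    """Summarize clipping in a CIGAR string.

    Operators M, I, S, = and X consume bases stored in SEQ;
    H (hard clip) refers to bases that are no longer in the record.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    soft = sum(n for n, op in ops if op == "S")
    hard = sum(n for n, op in ops if op == "H")
    stored = sum(n for n, op in ops if op in "MIS=X")
    return {
        "soft_clipped": soft,
        "hard_clipped": hard,
        "bases_in_seq": stored,
        "length_before_hard_clip": stored + hard,
    }


print(clip_summary("5S90M5S"))  # all 100 bases and their qualities are still stored
print(clip_summary("5H90M5H"))  # only 90 bases remain in the record
```

With soft clips, a tool can still work out where each base sat in the original read. With hard clips, the removed bases and qualities are gone, and if reads are trimmed in the FASTQ before alignment, not even an `H` entry records how much was removed.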
