
Read trimming


Sorry if this is the wrong forum for this question - I just thought someone might have an idea/opinion...

Should I trim/filter exome sequencing reads prior to mapping with BWA and variant calling using GATK? I am currently filtering out reads in which <80% of bases have quality >= Q30, but I lose >20% of my reads this way. Does GATK take quality into account, thereby rendering pre-mapping filtering unnecessary?
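
For reference, the pre-mapping filter described above (discard a read when fewer than 80% of its bases reach Q30) can be sketched as follows. This is a minimal illustration, not the poster's actual script: it assumes Phred+33 encoded quality strings, as found in current FASTQ files, and the function names `fraction_at_or_above` and `keep_read` are hypothetical.

```python
def fraction_at_or_above(quality_string, min_q=30, offset=33):
    """Return the fraction of bases whose Phred score is >= min_q.

    Assumes Phred+33 encoding (ASCII code minus 33 gives the score).
    """
    if not quality_string:
        return 0.0
    passing = sum(1 for ch in quality_string if ord(ch) - offset >= min_q)
    return passing / len(quality_string)


def keep_read(quality_string, min_q=30, min_fraction=0.8):
    """True if at least min_fraction of the bases reach min_q."""
    return fraction_at_or_above(quality_string, min_q) >= min_fraction


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(keep_read("IIIIIIII##"))  # 8 of 10 bases pass
    print(keep_read("IIIIIII###"))  # 7 of 10 bases pass
```

A filter of this kind drops the whole read, including its high-quality bases, which is the information loss the replies below refer to.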

Thanks in advance,


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi Kath,

    Yes, GATK tools take base qualities into account, so you don't need to do any filtering beforehand.

  • I ran some concordance simulations comparing trimmed and untrimmed reads through the GATK 1.6 and GATK 2.0 best practice pipelines, using 20 samples. For each sample I emitted all sites corresponding to CytoScan HD marker positions, ran the analysis-ready pipeline, and extracted only the sites that PASS. I then compared the genotype calls with CytoScan HD, filtering for HWE 0.05 and marker call rate > 0.90. Averaging the concordance rates across all samples, I got GATK 2.5 trimmed > GATK 2.5 untrimmed.

    While I'm convinced that I should upgrade to GATK 2, I'm still not sure whether I should trim or not. The difference in concordance rate between the GATK 2 trimmed and untrimmed pipelines is very small, around 0.03 percent, but the untrimmed pipeline consistently shows a higher discordance rate for variant calls that came out heterozygous where CytoScan called them homozygous.

    I'm sure GSA must have run some simulations to decide whether to trim or not, and I understand that GATK tools take quality scores into account, but am I correct to think that in order to maximize the "correctness" of the genotype/variant calls, I should trim the raw reads? If this is true, then there is obviously a tradeoff between "correctness" and the time/space used in the trimming pipeline, since there are extra alignment and merging steps involved. Was the extra gain considered too small to justify the time and space costs of the extra steps, data management, etc.? I would appreciate any extra insights. Thank you very much.

    I've attached an Excel file containing a general summary of the concordance results.
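
The kind of comparison described in this post can be sketched roughly as below. This is only an illustration under stated assumptions, not the poster's pipeline: genotypes are represented as allele tuples keyed by a site identifier, any HWE and call-rate filtering is assumed to have happened upstream, and the function name `concordance_summary` and the category labels are hypothetical.

```python
def concordance_summary(calls, truth):
    """Compare sequencing genotype calls with array genotypes at shared sites.

    calls, truth: dict mapping site id -> genotype as a tuple of alleles,
    e.g. ("A", "G"). Allele order within a genotype is ignored.
    """
    concordant = het_vs_hom = other = 0
    shared = [site for site in calls if site in truth]
    for site in shared:
        call = tuple(sorted(calls[site]))
        array = tuple(sorted(truth[site]))
        if call == array:
            concordant += 1
        elif len(set(call)) > 1 and len(set(array)) == 1:
            # Heterozygous call where the array genotype is homozygous.
            het_vs_hom += 1
        else:
            other += 1
    n = len(shared)
    return {
        "sites": n,
        "concordance": concordant / n if n else float("nan"),
        "het_call_vs_hom_array": het_vs_hom,
        "other_discordant": other,
    }


if __name__ == "__main__":
    calls = {"s1": ("A", "G"), "s2": ("C", "C"), "s3": ("G", "A")}
    array = {"s1": ("A", "A"), "s2": ("C", "C"), "s3": ("A", "G")}
    print(concordance_summary(calls, array))
```

Running the same function on the trimmed and untrimmed call sets for each sample, then averaging the `concordance` values, would give per-pipeline figures that can be compared directly.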

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    edited August 2013

    Hi there,

    First, I'm glad you found that it's worth upgrading to GATK 2! It is a lot better than the older version.

    Second, because the GATK takes mapping quality into account, it is not necessary to trim the raw reads. In addition to the time/processing cost, trimming throws away information that can still be usable as long as you evaluate its worth properly (which the GATK does).

  • Hi, I have 2 additional questions:
    1) When you say we don't need to clip raw reads, do you also mean that we don't need to use the GATK's ClipReads tool, or does GSA use this tool? Since this tool allows soft clipping, we won't have to throw away any information, right?
    2) If you do recommend that we use this tool, at what stage should we use it? Right after mapping to the reference file before the dedup, realign, recal pipeline?
    Thank you very much!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    We really don't use any clipping tools anymore as part of our processing pipeline; we used to, but now our callers have become smart enough about dealing with these things that we don't need to.

  • kjclowers, UW Madison (Member)

    Will submitting trimmed reads to BQSR affect recalibration? Doesn't it take read position into account?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    @kjclowers, that's a good point. BQSR should still be able to handle soft clips correctly, but not hard clips.
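
The difference between soft and hard clips mentioned here is visible in the alignment record itself. In the SAM format, soft-clipped bases (CIGAR `S`) remain in the stored read sequence, while hard-clipped bases (CIGAR `H`) are removed from it, so positions counted from the stored sequence no longer match positions in the original read. The sketch below only illustrates that bookkeeping and does not reproduce anything BQSR does internally; the function name `clip_summary` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_summary(cigar):
    """Summarize clipping in a CIGAR string.

    Per the SAM specification, M, I, S, = and X consume read bases that are
    present in SEQ; H bases are absent from SEQ.
    """
    soft = hard = stored = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "S":
            soft += n
        elif op == "H":
            hard += n
        if op in "MIS=X":
            stored += n
    return {
        "soft_clipped": soft,
        "hard_clipped": hard,
        "bases_in_seq": stored,
        "original_read_length": stored + hard,
    }


if __name__ == "__main__":
    print(clip_summary("5S90M5S"))  # all 100 bases still stored
    print(clip_summary("5H90M5H"))  # only 90 bases stored
```

With soft clipping, the stored sequence length equals the original read length, so the offset of each base within the read is preserved; with hard clipping it is not.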
