The current GATK version is 3.7-0



Read trimming

Member Posts: 43

Hello,

Sorry if this is the wrong forum for this question - I just thought someone might have an idea/opinion...

Should I trim/filter exome sequencing reads prior to mapping with BWA and variant calling with GATK? I am currently filtering out reads in which <80% of bases have quality >= Q30, but I lose >20% of my reads this way. Does GATK take base quality into account, rendering pre-mapping filtering unnecessary?

Thanks in advance,
Kath
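
For reference, the filter Kath describes (keep a read only if at least 80% of its bases are Q30 or better) can be sketched in Python. The function name and thresholds here are illustrative, not a GATK tool; quality strings are assumed to be Phred+33 encoded (Sanger / Illumina 1.8+):

```python
def passes_quality_filter(qual_string, min_q=30, min_fraction=0.8):
    """Return True if at least `min_fraction` of bases have quality >= `min_q`.

    Assumes the FASTQ quality string is Phred+33 encoded.
    """
    quals = [ord(c) - 33 for c in qual_string]  # decode ASCII to Phred scores
    good = sum(1 for q in quals if q >= min_q)
    return good / len(quals) >= min_fraction

# 'I' encodes Q40, '#' encodes Q2
print(passes_quality_filter("IIII#"))  # 4/5 = 80% of bases pass -> True
print(passes_quality_filter("III##"))  # 3/5 = 60% of bases pass -> False
```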


Answers

• Administrator, Dev Posts: 11,146 admin

Hi Kath,

Yes, GATK tools take base qualities into account, so you don't need to do any filtering beforehand.

Geraldine Van der Auwera, PhD
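
The base qualities GATK consumes are Phred-scaled error probabilities, which is why a low-quality base is down-weighted rather than needing to be removed. A minimal sketch of the conversion (the function name is illustrative):

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

print(phred_to_error_prob(30))  # 0.001: a Q30 base has a 1-in-1000 chance of error
print(phred_to_error_prob(20))  # 0.01: a Q20 base has a 1-in-100 chance of error
```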

• Member Posts: 3

I ran some concordance simulations comparing trimmed and untrimmed reads using the GATK 1.6 and GATK 2.0 best-practice pipelines on 20 samples. I emitted all sites corresponding to CytoScan HD's marker positions, ran the analysis-ready pipeline, and extracted only the sites that PASS. Averaging the genotype-call concordance rates across all samples (comparing against CytoScan HD, filtering for HWE 0.05 and marker call rate > 0.90), I got GATK 2.5 trimmed > GATK 2.5 untrimmed.

While I'm convinced that I should upgrade to GATK 2, I'm still not sure whether I should trim. The difference in concordance rates between the GATK 2 trimmed and untrimmed pipelines is very small, around 0.03 percent, but I do see consistently higher discordance rates for untrimmed-pipeline variant calls that came out heterozygous where CytoScan called a homozygote.

I'm sure GSA must have run some simulations to decide whether to trim, and I understand that GATK tools take quality scores into account, but am I correct to think that to maximize the "correctness" of the genotype/variant calls I should trim the raw reads? If so, there is obviously a tradeoff between "correctness" and the time/space used in the trimming pipeline, since extra alignment and merging steps are involved. Was the extra gain considered not worth the time and space costs of the extra steps, data management, etc.? I would appreciate any extra insight. Thank you very much.

I've attached an Excel file containing a general summary of the concordance results.
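
The markers-in-common comparison described above can be sketched as follows. This is a simplified illustration, not the actual pipeline: genotypes are represented as dicts keyed by site, and only sites present in both callsets contribute to the rate:

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared sites with identical genotype calls.

    `calls_a` and `calls_b` map a site key, e.g. (chrom, pos), to a
    genotype string such as "0/1". Sites missing from either callset
    are excluded, mirroring a markers-in-common comparison.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matches / len(shared)

# Toy data: the array and sequencing callsets share two sites, one concordant
array_calls = {("1", 100): "0/1", ("1", 200): "1/1", ("1", 300): "0/0"}
seq_calls   = {("1", 100): "0/1", ("1", 200): "0/1", ("1", 400): "0/0"}
print(genotype_concordance(array_calls, seq_calls))  # 1 of 2 shared sites -> 0.5
```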

• Administrator, Dev Posts: 11,146 admin
edited August 2013

Hi there,

First, I'm glad you found that it's worth upgrading to GATK 2! It is a lot better than the older version.

Second, because GATK takes base quality into account, it is not necessary to trim the raw reads. Beyond the time/processing cost, trimming throws away information that can still be usable as long as its worth is evaluated properly (which GATK does).


Geraldine Van der Auwera, PhD

• Member Posts: 3

Hi, I have 2 additional questions:
1) When you say we don't need to clip raw reads, do you also mean that we don't need to use the GATK's ClipReads tool, or does GSA use this tool? Since this tool allows soft clipping, we won't have to throw away any information, right?
2) If you do recommend that we use this tool, at what stage should we use it? Right after mapping to the reference file before the dedup, realign, recal pipeline?
Thank you very much!

• Administrator, Dev Posts: 11,146 admin

We really don't use any clipping tools anymore as part of our processing pipeline; we used to, but now our callers have become smart enough about dealing with these things that we don't need to.

Geraldine Van der Auwera, PhD

• UW MadisonMember Posts: 14

Will submitting trimmed reads to BQSR affect recalibration? Doesn't it take read position into account?

• Administrator, Dev Posts: 11,146 admin

@kjclowers, that's a good point. BQSR should still be able to handle soft clips correctly, but not hard clips.

Geraldine Van der Auwera, PhD
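
The soft/hard distinction above comes down to what survives in the SAM record: soft-clipped bases (CIGAR `S`) stay in the record's SEQ/QUAL fields, so position and quality information remains available to tools like BQSR, while hard-clipped bases (CIGAR `H`) are removed entirely. A small sketch of counting each kind from a CIGAR string (the function is illustrative, not a GATK tool):

```python
import re

# SAM CIGAR grammar: runs of <length><operation>
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_base_counts(cigar):
    """Return (soft_clipped, hard_clipped) base counts for a SAM CIGAR string."""
    soft = hard = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "S":
            soft += int(length)  # bases retained in SEQ/QUAL
        elif op == "H":
            hard += int(length)  # bases removed from the record
    return soft, hard

print(clipped_base_counts("5S90M5S"))  # (10, 0): clipped bases are still in the record
print(clipped_base_counts("5H95M"))    # (0, 5): clipped bases are gone
```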
