Quality trimming

Hi GATK team,

I was wondering: your best practices for data preprocessing (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165) don't mention any trimming (e.g. with trim_galore or fastx). At least some time ago, trimming seemed to be pretty standard. Does this mean it is no longer advised? Can I really take my raw FASTQ files and map them with BWA directly, without any filtering?

Highest regards,

Freek.

Best Answer

  • shlee, Cambridge
    Accepted Answer

    Hi @freek,

    Our current best practice workflows do not require trimming reads. Trimming of adapter sequence is something your sequencing facility should have done on your behalf. What then remains in your BAM are adapter sequences that arise from 3' read-through, which happens when an insert is shorter than expected. Given good sample preparation, e.g. that provided by your sequencing center, the expectation is that instances of 3' read-through are rare. In addition, HaplotypeCaller reassembly will soft-clip such overlapping sequences and discount the mate, which (i) removes adapter sequence from consideration and (ii) removes duplicate evidence from consideration in germline variant calling.

    If your data requires extensive trimming, it is up to you to perform such preprocessing before using our workflows.
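For readers who do need to trim, here is a minimal Python sketch of what removing 3' read-through adapter sequence looks like. It is not part of any GATK workflow, and it is not how trim_galore or similar tools are implemented. The function name `trim_3prime_adapter`, the `min_overlap` parameter and the toy sequences are hypothetical, the adapter string is only an illustrative Illumina-style prefix, and matching is exact (no mismatches or indels).

```python
# Illustrative adapter prefix (an assumption for this sketch; use the
# adapter that matches your own library preparation).
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Return `read` with exact-match 3' adapter read-through removed.

    Hypothetical helper: exact matching only, no quality handling.
    """
    # Case 1: the whole adapter occurs inside the read.
    # Everything from the adapter start onward is not insert sequence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]

    # Case 2: the read ends partway through the adapter, so only a
    # prefix of the adapter is visible at the 3' end. Try the longest
    # possible overlap first and stop at `min_overlap` bases.
    longest = min(len(adapter), len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]

    # No adapter evidence found: leave the read untouched.
    return read


# Toy example: a 10-base insert sequenced for 15 cycles reads 5 bases
# into the adapter.
insert = "ACGTACGTAC"
read = insert + ADAPTER[:5]
trimmed = trim_3prime_adapter(read)
```

In a real FASTQ record the base quality string would have to be cut to the same length as the trimmed read, and dedicated trimming tools generally also tolerate sequencing errors in the adapter match, which this sketch does not.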
