Quality trimming

Hi GATK team,

I was wondering: your best practices for data preprocessing (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165) don't mention any trimming (using, e.g., trim_galore or fastx). It seems that, at least some time ago, this was pretty standard. Does this mean it is no longer advised? Can I really take my raw FASTQ files and map them with BWA directly, without any filtering?

Highest regards,

Freek.

Best Answer

  • shlee (Cambridge, admin)

    Hi @freek,

    Our current Best Practices workflows do not require read trimming. Trimming adapter sequence is something your sequencing facility should have done on your behalf. What remains in your BAM is adapter sequence from 3' read-through, which occurs when a fragment's insert is shorter than the read length, so the sequencer reads past the insert and into the adapter. Given good sample preparation, e.g., that provided by your sequencing center, 3' read-through should be rare. Where it does occur, HaplotypeCaller reassembly soft-clips such overlapping sequence and discounts the mate in order to (i) remove adapter sequence from consideration and (ii) remove duplicate evidence from consideration in germline variant calling.
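
    To make the read-through arithmetic concrete, here is a minimal Python sketch; it is not GATK's implementation. It assumes the insert size is the true fragment length (for example, as carried in the SAM TLEN field, where 0 means unknown) and that the insert portion of the read aligns without indels. Adapter bases sit at the read's 3' end, which lies to the right in reference coordinates for a forward-strand read and to the left for a reverse-strand read. The function names are hypothetical.

```python
def adapter_readthrough_length(read_length: int, insert_size: int) -> int:
    """Number of 3' bases that come from adapter rather than insert.

    Assumption: insert_size is the true fragment length (sign ignored,
    as with SAM TLEN); 0 is treated as 'unknown' and yields no clipping.
    """
    if insert_size == 0:
        return 0
    return max(0, read_length - abs(insert_size))


def soft_clipped_cigar(read_length: int, insert_size: int, reverse_strand: bool) -> str:
    """CIGAR for a read whose insert portion aligns without indels (assumption),
    with any adapter read-through soft-clipped.

    Forward strand: the read's 3' end is on the right in reference coordinates.
    Reverse strand: the read's 3' end is on the left.
    """
    clip = adapter_readthrough_length(read_length, insert_size)
    aligned = read_length - clip
    if clip == 0:
        return f"{aligned}M"
    return f"{clip}S{aligned}M" if reverse_strand else f"{aligned}M{clip}S"
```

    For example, a 100 bp read from an 80 bp fragment carries 20 adapter bases, giving 80M20S on the forward strand and 20S80M on the reverse strand, while a 300 bp fragment gives plain 100M.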

    If your data requires extensive trimming, it is up to you to perform such preprocessing before using our workflows.
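
    If you do decide to trim, the core of 3' adapter trimming can be sketched as below. This is a hypothetical exact-match illustration only: the adapter string AGATCGGAAGAGC (the common Illumina adapter prefix that Trim Galore uses by default) and the 3-base minimum overlap are assumptions, and real tools such as cutadapt, which Trim Galore wraps, also tolerate sequencing errors and trim low-quality bases.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed: common Illumina adapter prefix
MIN_OVERLAP = 3            # assumed: ignore shorter partial matches at the 3' end


def trim_3prime_adapter(read: str, adapter: str = ADAPTER,
                        min_overlap: int = MIN_OVERLAP) -> str:
    """Exact-match 3' adapter trimming (hypothetical sketch, no error tolerance)."""
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: a prefix of the adapter runs off the read's 3' end.
    # Try the longest possible overlap first, down to min_overlap.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read
```

    For instance, both "ACGTACGTAGATCGGAAGAGCTTTT" (full adapter) and "ACGTACGTAGATC" (5-base partial adapter) trim to "ACGTACGT", while "ACGTACGTAG" is left as-is because its 2-base overlap is below the minimum.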
