ReadAdaptorTrimmer Best Practices

micknudsen (Denmark, Member)


Are there "Best Practices" for using ReadAdaptorTrimmer? It seems to me that there is a Catch-22 if one wants to use both GATK and Picard.

According to the ReadAdaptorTrimmer documentation: "Read data MUST be in query name ordering as produced, for example with Picard's FastqToBam". Therefore, I would start by doing

java -jar picard.jar FastqToSam FASTQ={r1_file} FASTQ2={r2_file} OUTPUT={bam_file} SM={sample} SORT_ORDER=queryname

to convert my FASTQ files into a query-name-sorted uBAM file. However, ReadAdaptorTrimmer requires the BAM file to be indexed, and if I then try

java -jar picard.jar BuildBamIndex INPUT={bam_file}

it fails because BuildBamIndex requires the BAM file to be sorted by coordinate, which makes no sense here since the reads are not yet aligned.


  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    Hi Michael,

    I have asked one of our team members to help with this one. She will get back to me sometime next week.


  • shlee (Cambridge; Member, Broadie)
    edited November 2015

    Hi Michael,

    I recommend two Picard tools that together achieve what I assume you need: MarkIlluminaAdapters and SamToFastq.

    Use Picard's MarkIlluminaAdapters to mark your 3' adapter sequences. The tool adds an XT tag to each read that contains adapter sequence, recording the position where the adapter starts, and also writes a metrics file that tabulates the number of bases clipped against the number of reads. You can replace the default standard Illumina adapter sequences with any adapter sequences you want using the FIVE_PRIME_ADAPTER and THREE_PRIME_ADAPTER parameters. To clear the defaults and add new adapter sequences, first set ADAPTERS to 'null', then specify each sequence with the corresponding parameter.
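
    A minimal sketch of this marking step, assuming a query-name-sorted unmapped BAM. The file names, the location of picard.jar, and the helper names run and mark_adapters are placeholders of my own, not anything Picard defines, and I am assuming the two custom adapter parameters are given together. Commands are only printed unless DRY_RUN=0 is set.

    ```shell
    # Assumption: picard.jar sits in the working directory unless PICARD_JAR is set.
    PICARD_JAR="${PICARD_JAR:-picard.jar}"

    run() {
        # Print the command; execute it only when DRY_RUN=0.
        echo "$*"
        if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
    }

    mark_adapters() {
        # $1: query-name-sorted unmapped BAM (hypothetical file name)
        # $2, $3 (optional): custom 5' and 3' adapter sequences
        in_bam=$1
        prefix=${in_bam%.bam}
        if [ -n "${2:-}" ] && [ -n "${3:-}" ]; then
            # ADAPTERS=null clears the default Illumina adapters first.
            run java -jar "$PICARD_JAR" MarkIlluminaAdapters \
                INPUT="$in_bam" OUTPUT="${prefix}.marked.bam" \
                METRICS="${prefix}.adapter_metrics.txt" \
                ADAPTERS=null FIVE_PRIME_ADAPTER="$2" THREE_PRIME_ADAPTER="$3"
        else
            # Default behaviour: mark the standard Illumina adapters.
            run java -jar "$PICARD_JAR" MarkIlluminaAdapters \
                INPUT="$in_bam" OUTPUT="${prefix}.marked.bam" \
                METRICS="${prefix}.adapter_metrics.txt"
        fi
    }

    mark_adapters sample.unmapped.bam
    ```

    The output BAM keeps every read intact; only the XT tags and the metrics file are new, so nothing is lost at this stage.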

    To clip the adapter sequences, use Picard's SamToFastq. Set CLIPPING_ATTRIBUTE=XT and set CLIPPING_ACTION to one of: (1) X to hard-clip, (2) N to change the bases to Ns, or (3) a number, e.g. 2, to set the base qualities at those positions to that value.
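
    A sketch of the clipping step for paired-end data, under the same caveats: the file names and the helpers run and clip_to_fastq are illustrative placeholders, and the default action of X is an arbitrary choice for this example. Commands are only printed unless DRY_RUN=0 is set.

    ```shell
    # Assumption: picard.jar sits in the working directory unless PICARD_JAR is set.
    PICARD_JAR="${PICARD_JAR:-picard.jar}"

    run() {
        # Print the command; execute it only when DRY_RUN=0.
        echo "$*"
        if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
    }

    clip_to_fastq() {
        # $1: BAM carrying XT tags (hypothetical file name)
        # $2: CLIPPING_ACTION: X (hard-clip), N (mask with Ns), or a number
        #     (base qualities at adapter positions are set to that value)
        in_bam=$1
        action=${2:-X}
        case $action in
            X|N) ;;
            *[!0-9]*|'')
                echo "CLIPPING_ACTION must be X, N, or a number" >&2
                return 1 ;;
        esac
        prefix=${in_bam%.bam}
        run java -jar "$PICARD_JAR" SamToFastq \
            INPUT="$in_bam" \
            FASTQ="${prefix}_R1.fastq" SECOND_END_FASTQ="${prefix}_R2.fastq" \
            CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION="$action"
    }

    clip_to_fastq sample.marked.bam 2
    ```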

    Remember that you can restore original read sequences and base qualities, amongst other attributes, after alignment using Picard's MergeBamAlignment.
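
    As a rough sketch of that merge, assuming you have aligned the clipped FASTQ files with an aligner of your choice and have a reference FASTA with its sequence dictionary: all file names and the helpers run and merge_alignment are placeholders. Commands are only printed unless DRY_RUN=0 is set.

    ```shell
    # Assumption: picard.jar sits in the working directory unless PICARD_JAR is set.
    PICARD_JAR="${PICARD_JAR:-picard.jar}"

    run() {
        # Print the command; execute it only when DRY_RUN=0.
        echo "$*"
        if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
    }

    merge_alignment() {
        # $1: unmapped BAM holding the original sequences and qualities
        # $2: aligned BAM produced from the clipped FASTQ files
        # $3: reference FASTA (its sequence dictionary must exist alongside)
        # $4: merged output BAM
        run java -jar "$PICARD_JAR" MergeBamAlignment \
            UNMAPPED_BAM="$1" ALIGNED_BAM="$2" \
            REFERENCE_SEQUENCE="$3" OUTPUT="$4"
    }

    merge_alignment sample.marked.bam sample.aligned.bam reference.fasta sample.merged.bam
    ```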

    These recommendations aside, I was able to reproduce your errors with my own file. The errors persist even when the commands are run in unsafe mode, designated with -U, which allows GATK commands to process files without indexes. Since I am new to the GATK team, I had to ask around to learn that ReadAdaptorTrimmer isn't on the team's radar; that is, we don't use it. Its presence is a vestige of earlier development. The tool blindly strips what it assumes are adaptor sequences, but which are technically just whatever lies 3' of overlapping sequences of a certain length. If you are processing sequencing samples with typical aims, I would strongly discourage using any tool that does not specifically take the adapter sequences into account when trimming.

    I hope I've been helpful. Let me know if I can clarify any points.

  • micknudsen (Denmark, Member)

    Thanks, @shlee! I will go ahead and try your approach. I will let you know if I run into something unexpected.
