ReadAdaptorTrimmer Best Practices

micknudsen · Denmark · Member · Posts: 34

Hi,

Is there a "Best Practices" recommendation for how to use ReadAdaptorTrimmer? It seems to me that there is a Catch-22 if one wants to use GATK and Picard together.

According to the ReadAdaptorTrimmer documentation: "Read data MUST be in query name ordering as produced, for example with Picard's FastqToBam". Therefore, I would start by doing

java -jar picard.jar FastqToSam FASTQ={r1_file} FASTQ2={r2_file} OUTPUT={bam_file} SM={sample} SORT_ORDER=queryname

to convert my FASTQ files into a sorted uBAM file. However, ReadAdaptorTrimmer requires the BAM file to be indexed, but if I then try

java -jar picard.jar BuildBamIndex INPUT={bam_file}

it fails because BuildBamIndex requires that the BAM file is sorted by coordinate (which does not make sense since the reads are not yet aligned).

Thanks,
Michael

Comments

  • Sheila · Broad Institute · Member, Broadie, Moderator, Dev · Posts: 4,583 · admin

    @micknudsen
    Hi Michael,

    I have asked one of our team members to help with this one. She will get back to me sometime next week.

    -Sheila

  • shlee · Cambridge · Member, Broadie, Moderator · Posts: 497 · admin
    edited November 2015

    Hi Michael,

    I recommend two tools that together achieve what I assume you need: MarkIlluminaAdapters and SamToFastq.

    Please use Picard's MarkIlluminaAdapters to mark your 3' adapter sequences. The tool adds an XT tag to reads in the BAM file indicating the start position of any adapter sequence, and it also provides a metrics file of counts of bases clipped versus reads. You can change the default standard Illumina adapter sequences to any adapter sequence you want using the FIVE_PRIME_ADAPTER and THREE_PRIME_ADAPTER parameters. To replace the defaults with your own adapter sequences, first set ADAPTERS to 'null', then specify each sequence with the corresponding parameter.

    To clip the adapter sequences, use Picard's SamToFastq. You will specify the CLIPPING_ATTRIBUTE=XT and a CLIPPING_ACTION of either (1) X to hard-clip, (2) N to change bases to Ns or (3) a number, e.g. 2, to change the base qualities of those positions to the value, e.g. 2.

    Remember that you can restore original read sequences and base qualities, amongst other attributes, after alignment using Picard's MergeBamAlignment.

    These recommendations aside, I was able to reproduce your errors using my own file. The errors persist even when the commands are run in unsafe mode, designated with -U, which allows GATK commands to process files without indexes. Since I am new to the GATK team, I had to ask to find out that ReadAdaptorTrimmer isn't on the team's radar; that is, we don't use it. Its presence is a vestige of earlier development. The tool blindly strips what it assumes are adaptor sequences, but which are technically just sequences 3' of overlapping sequences of a certain length. If you are processing sequencing samples with typical aims, I would strongly discourage using any tool that doesn't specifically take the adapter sequences into account when trimming.

    I hope I've been helpful. Let me know if I can clarify any points.

  • micknudsenmicknudsen DenmarkMember Posts: 34

    Thanks, @shlee! I will go ahead and try your approach. I will let you know if I run into something unexpected.
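
The workflow shlee recommends in this thread (mark adapters, clip while converting to FASTQ, then restore the original read data after alignment) might be sketched roughly as follows. This is a dry-run sketch, not a tested pipeline: each function only prints a Picard command line. The file names, the `picard.jar` location, the function names, and the default `CLIPPING_ACTION=2` are illustrative assumptions, and the alignment step between SamToFastq and MergeBamAlignment is left to whichever aligner you use.

```shell
#!/usr/bin/env bash
# Dry-run sketch: each function only PRINTS a Picard command line.
# File names and the picard.jar path are placeholders (assumptions),
# not values taken from the thread.

PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Step 1: tag adapter start positions with XT and write a metrics file.
mark_adapters_cmd() {    # args: unmapped BAM, marked BAM, metrics file
  echo "java -jar ${PICARD_JAR} MarkIlluminaAdapters INPUT=$1 OUTPUT=$2 METRICS=$3"
}

# Step 2: convert to FASTQ, acting on the XT-tagged bases.
# CLIPPING_ACTION: X hard-clips, N changes bases to Ns,
# an integer sets the base qualities of those positions to that value.
clip_to_fastq_cmd() {    # args: marked BAM, R1 FASTQ, R2 FASTQ, [clipping action]
  echo "java -jar ${PICARD_JAR} SamToFastq INPUT=$1 FASTQ=$2 SECOND_END_FASTQ=$3 CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=${4:-2}"
}

# Step 3 (after alignment): merge aligned reads with the unmapped BAM
# so that original sequences and base qualities can be restored.
merge_alignment_cmd() {  # args: aligned BAM, unmapped BAM, reference FASTA, output BAM
  echo "java -jar ${PICARD_JAR} MergeBamAlignment ALIGNED_BAM=$1 UNMAPPED_BAM=$2 REFERENCE_SEQUENCE=$3 OUTPUT=$4"
}

mark_adapters_cmd unmapped.bam marked.bam adapter_metrics.txt
clip_to_fastq_cmd marked.bam r1.fastq r2.fastq 2
merge_alignment_cmd aligned.bam unmapped.bam reference.fasta merged.bam
```

Printing the commands first makes it easy to check the parameters before anything touches the data; to actually run a step, the printed line can be pasted into a shell. For custom adapters, `ADAPTERS=null` plus `FIVE_PRIME_ADAPTER=` and `THREE_PRIME_ADAPTER=` would be appended to the MarkIlluminaAdapters line, as described in the reply above.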
