BAM with soft-clipped primer sequences as an input for RevertSam (Tutorial #6483)

garrulus_glandarius (Ukraine, Member)
edited October 2017 in Ask the GATK team

Can I run RevertSam on a primer-clipped BAM and then run MergeBamAlignment with the primer-clipped BAM and the uBAM derived from it as inputs?
I'm analyzing paired-end TruSeq Custom Amplicon panel data, so my workflow includes soft-clipping primer sequences with BamClipper. How do I fit the soft-clipping step into the data-cleaning pipeline described in Tutorial #6483?

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @garrulus_glandarius
    Hi,

    We will get back to you here.

    -Sheila

  • garrulus_glandarius (Ukraine, Member)
    edited October 2017

    So, I've tried this. RevertSam couldn't generate an unmapped BAM from the soft-clipped BAM. I am using Picard 2.13.2.

    java -Xmx2G -jar /usr/local/bin/picard/build/libs/picard.jar RevertSam I=test.clipped.bam O=unmapped.clipped.bam SANITIZE=true MAX_DISCARD_FRACTION=0.005 ATTRIBUTE_TO_CLEAR=XT ATTRIBUTE_TO_CLEAR=XN ATTRIBUTE_TO_CLEAR=AS ATTRIBUTE_TO_CLEAR=OC ATTRIBUTE_TO_CLEAR=OP SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true
    16:42:04.060 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/bin/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Tue Oct 24 16:42:04 EEST 2017] RevertSam INPUT=test.clipped.bam OUTPUT=unmapped.clipped.bam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS, XT, XN, AS, OC, OP] SANITIZE=true MAX_DISCARD_FRACTION=0.005 OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Tue Oct 24 16:42:04 EEST 2017] Executing as [email protected] on Linux 3.13.0-133-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01; Deflater: Intel; Inflater: Intel; Picard version: 2.13.2-2-g4a846f3-SNAPSHOT
    [Tue Oct 24 16:42:04 EEST 2017] picard.sam.RevertSam done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=173015040
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 97, Read name M04607:37:000000000-AR7YC:1:1108:4763:10619, The unaligned mate start position is 24122067, should be 0
    at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:454)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:812)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
    at picard.sam.RevertSam.doWork(RevertSam.java:250)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

  • garrulus_glandarius (Ukraine, Member)

    @Sheila, could you please have a look at the read that causes the error? How should I handle it?

    [email protected][Katana_test] samtools view test.clipped.bam | grep 'M04607:37:000000000-AR7YC:1:1108:4763:10619'

    M04607:37:000000000-AR7YC:1:1108:4763:10619 0 chr1 24122067 60 27S224M * 1 430 TGGCTGATTCTATTCCAGGGGTGGATAGTGACCATCTGTGAGAACCTGCCCCAGTGTAAGAAAACTTGTTTTTATTTTTAGATACTTTGGAAAGCTCTTTCAGAGCAATACAAATGAATGCCTGGGAGGAGGAGGTTTTGTGCCAGAGCCTTGCCCCCTCACTCACTCTTGGGGGTCCTGATGAACTCTTGGACCCTGTGGAAGATAAGAGTTAGAGACCTCGGCCTCCTGGTCAGTGGAGCCCTTGGCCT BBBBBBFFFFFFGGGGGGGGGAEEHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGHHHHHHHHHHHHHHGHHHGHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHGGHGGGGGGGGEHHGGHHHHGGHHGHHHHGHHGGGGGHHHHHHGHHHHHGGGGHHHHHHEHHHHHHHHHHHGHHGHHFHHEFHHHGGGGGFGGGGGGGGGGGGGFFGF;CGFGGGGFFFEFFFFFFB NM:i:3 MD:Z:80A29T6G133 MC:Z:251M AS:i:236 XS:i:20 X0:Z:GALE + GALE + GALE + GALE + GALE + GALE_UserDefined (42239994)_135551209 X1:Z:251M X2:i:24122039 X3:i:24122290 RG:Z:/media/sf_Desktop/170303_M04607_0037_000000000-AR7YC/clean_fastq/LV_S3
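
A minimal sketch of how records like the one above could be screened before running Picard. It assumes tab-separated SAM text as printed by `samtools view` and checks only the condition the error message names: a mate reference (RNEXT) of `*` combined with a non-zero mate position (PNEXT). The names `mate_field_problems` and `scan_sam` and the shortened example record are hypothetical, and the check is an approximation of the reported rule, not htsjdk's own validation code.

```python
def mate_field_problems(sam_line):
    """Return problems found in the mate fields of one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record needs at least 11 tab-separated fields")
    qname, rnext, pnext = fields[0], fields[6], int(fields[7])
    problems = []
    # An unset mate reference ('*') with a non-zero mate position is the
    # inconsistency the RevertSam error message points at.
    if rnext == "*" and pnext != 0:
        problems.append(f"{qname}: RNEXT is '*' but PNEXT is {pnext} (expected 0)")
    return problems


def scan_sam(lines):
    """Yield problems for every alignment record, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        yield from mate_field_problems(line)


# Hypothetical shortened record with the same FLAG, RNEXT and PNEXT as the read above.
example = "\t".join(
    ["read1", "0", "chr1", "24122067", "60", "4M", "*", "1", "430", "ACGT", "FFFF"]
)
for problem in scan_sam([example]):
    print(problem)
```

To scan a real file, the output of `samtools view test.clipped.bam` could be piped into a script that calls `scan_sam(sys.stdin)`. This only locates suspicious records; it does not repair them.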

  • SkyWarrior (Turkey, Member) ✭✭✭

    One problem I notice with that error is that BAM manipulators sometimes mess with the mate data, so the mate information needs to be fixed after such manipulation. Have you tried FixMateInformation in Picard to see if that solves your problem?

    Also, if you already know the primer sequences of the kit, you could trim them from the ends of your reads with Trimmomatic or a similar tool (cutadapt, maybe). That way you may not need to manipulate the BAM file again.

  • garrulus_glandarius (Ukraine, Member)

    @SkyWarrior,
    yes, I've tried FixMateInformation and it didn't work. It just threw the same error as RevertSam did!

    java -Xmx8g -jar /usr/local/bin/picard/build/libs/picard.jar FixMateInformation I=test.clipped.bam O=test.clipped.fixed.bam
    ...Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 97, Read name M04607:37:000000000-AR7YC:1:1108:4763:10619, The unaligned mate start position is 24122067, should be 0...

    There's something wrong with this read...

    Thanks for suggesting other ways to handle the primer sequences, but I prefer to stick with soft-clipping because it is more favourable for calling variants properly at the edges of amplicons.

  • SkyWarrior (Turkey, Member) ✭✭✭

    From the article, it looks like this clipping step should be performed after the MergeBamAlignment-Sort-SetUqMdtags steps. But I am not familiar with these amplicon-based libraries. I will look for some data that fits this description to experiment with.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @garrulus_glandarius
    Hi,

    To confirm: in all of the threads you have posted in, the underlying issue is that you run non-GATK tools on your BAM files, and the resulting files then produce errors in GATK tools? I have asked the team to look into this, so someone will get back to you soon.

    -Sheila
