Can I use Interleaved FASTQ?

MattB · Newcastle · Member ✭✭

I have some HiSeq WGS data that was made available to us as BAM files. These are coordinate-sorted and were aligned to some variant of b37 by Illumina's Isaac aligner, so I've reprocessed them into randomly ordered FASTQ in preparation for realignment, using samtools commands that were previously recommended somewhere on the GATK forum:

samtools bamshuf -uOn 128 LP2000728-DNA_E03.bam /ramdisk/tmp | samtools bam2fq - | pigz > LP2000728-DNA_E03.fastq.gz

Here pigz is a parallel implementation of gzip, which I've dropped in place of the gzip used in the original recommendation. This has given me interleaved FASTQ. Would this sort of input be usable in the MuTect2 tumour/normal pipeline and in the germline SNPs + Indels pipeline, or should I split the input into _1 and _2 files, as is the norm for Illumina sequencing runs? I'm not sure whether the relevant WDL can accept either one or two files without editing the workflow. Alternatively, is there a Picard tool that might be more appropriate?
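If the workflow does turn out to need separate _1 and _2 files, the interleaved output can be split without going back to the BAM. Below is a minimal Bash sketch, assuming every 4-line FASTQ record is immediately followed by its mate and that mate names differ only by an optional /1 or /2 suffix (singleton reads in the bam2fq stream would break this). `split_interleaved` is a hypothetical helper, not part of samtools or Picard; it checks mate names as it goes and stops at the first pair that does not match.

```shell
#!/usr/bin/env bash
# Sketch: split an interleaved FASTQ (R1, R2, R1, R2, ...) into two files.
# Assumption: strictly alternating 4-line records, mates named alike
# apart from an optional /1 or /2 suffix and any trailing comment.
set -euo pipefail

# split_interleaved INPUT PREFIX   (hypothetical helper)
# INPUT may be plain or gzip-compressed FASTQ.
# Writes PREFIX_1.fastq and PREFIX_2.fastq; returns non-zero on a bad pair.
split_interleaved() {
  local input="$1" prefix="$2"
  # gzip -cdf decompresses gzip data and passes plain text through unchanged.
  gzip -cdf "$input" |
    # Join each read pair (8 lines) into one tab-separated line.
    paste - - - - - - - - |
    awk -F'\t' -v r1="${prefix}_1.fastq" -v r2="${prefix}_2.fastq" '
      {
        # Read names without any comment or /1 and /2 suffix.
        n1 = $1; n2 = $5
        sub(/ .*$/, "", n1); sub(/ .*$/, "", n2)
        sub(/\/[12]$/, "", n1); sub(/\/[12]$/, "", n2)
        # An odd trailing record leaves $5 empty, which also fails here.
        if (substr($1, 1, 1) != "@" || substr($5, 1, 1) != "@" || n1 != n2) {
          print "pair " NR ": records are not mates (" $1 " vs " $5 ")" > "/dev/stderr"
          exit 1
        }
        print $1 "\n" $2 "\n" $3 "\n" $4 > r1
        print $5 "\n" $6 "\n" $7 "\n" $8 > r2
      }'
}

# Example (hypothetical file names):
# split_interleaved sample.fastq.gz sample   # -> sample_1.fastq, sample_2.fastq
```

Failing loudly on a name mismatch is deliberate: a single unpaired read would otherwise shift every later record into the wrong file. As an alternative, newer samtools releases can write the two mate files directly through the `-1` and `-2` output options of `samtools fastq` (the newer name for `bam2fq`).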

Answers

  • KateN · Cambridge, MA · Member, Broadie, Moderator, admin

    Thank you for your question. I'm not as familiar as I'd like to be with the Mutect2 pipeline, so I'm consulting with a few other members of my team. We will get back to you with an answer soon.

  • MattB · Newcastle · Member ✭✭

    Hi Kate, thanks, this was exactly the info I needed to sanitise my current BAMs. I also found this Method/WDL within FireCloud itself, which looks roughly equivalent, although it appears to clear slightly fewer tags than the tutorial above.

  • KateN · Cambridge, MA · Member, Broadie, Moderator, admin

    Fantastic, I'm glad that helped! And thank you for the link to the method, I'll remember that for the future.

Sign In or Register to comment.