Local realignment output issue

cwardell, Tokyo, Japan (Member)

Hi

I've followed the suggested protocol for local realignment - first using RealignerTargetCreator and then IndelRealigner, but have unexpected results.

Let's call the two BAMs I'm realigning "normal" and "tumour", or N and T for short. Once they were realigned, I split the resulting NT BAM file back into the original N and T BAM files using read group tags (although I see from the docs that the tool can create separate files natively) and discovered something odd. I was expecting the pre-realignment and post-realignment N and T files to contain the same number of reads, with only the coordinates the reads are mapped to being different.

Instead, the post-realignment files contain significantly fewer reads. Unaligned reads and reads not aligned to the autosomes or sex chromosomes have been removed, but these reads alone do not account for the difference; large numbers of reads aligned to the 24 chromosomes are now missing as well.
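For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of a per-contig read count diff. It assumes the BAM records have already been dumped to SAM text (tab-separated lines, reference name in column 3); the function names `count_reads_by_contig` and `missing_reads` and the toy records are made up for illustration and are not part of GATK.

```python
from collections import Counter


def count_reads_by_contig(sam_lines):
    """Count alignment records per reference name (SAM column 3).

    Header lines (starting with '@') and blank lines are skipped.
    Unmapped reads are counted under '*', as in the SAM format.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[2]] += 1
    return counts


def missing_reads(before, after):
    """Return {contig: number of records lost} for contigs that shrank."""
    return {
        contig: before[contig] - after.get(contig, 0)
        for contig in before
        if before[contig] > after.get(contig, 0)
    }


# Toy records (hypothetical data, not from the files discussed here).
pre_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
post_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
]

pre_counts = count_reads_by_contig(pre_lines)
post_counts = count_reads_by_contig(post_lines)
lost = missing_reads(pre_counts, post_counts)
print(lost)
```

On real data the lines could come from the text output of `samtools view` for each file. A per-contig breakdown like this shows whether the loss is confined to unmapped and non-chromosomal reads or also affects the main chromosomes.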

Can you tell me more about the reads that are removed? I suspect it is an alignment quality issue, but I cannot find a direct reference to this behaviour in the documentation. I'm currently keeping both my pre- and post-realignment BAM files, but ultimately there will be space constraints and I'll have to choose one set, so I would like to make the most informed decision possible.

Regards
Chris

Answers

  • cwardell, Tokyo, Japan (Member)

    Very astute. Yes, I did. I had assumed that specifying windows to realign within would simply make the process faster by only realigning within those regions, but I think you're suggesting that all the data outside those regions is discarded. Am I correct? If so, that resolves the issue, although it would be very welcome if a future update added a flag to realign within windows without discarding the rest of the data.

    Thanks for your help.

  • cwardell, Tokyo, Japan (Member)

    As ever, thanks for your input - problems solved, things running smoothly again.
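
The explanation discussed in this thread, that reads outside the supplied windows are dropped from the output, can be sanity-checked by counting how many pre-realignment reads fall outside the windows and comparing that with the number of missing reads. The following is a rough sketch under stated assumptions: intervals are given as `contig:start-end` strings (1-based, inclusive), records are SAM text lines, and a read is classified only by its leftmost mapped position (SAM column 4), which is a simplification rather than the overlap test GATK itself applies. The helper names are hypothetical.

```python
def parse_interval(text):
    """Parse 'contig:start-end' (1-based, inclusive) into a tuple."""
    contig, span = text.rsplit(":", 1)
    start, end = span.split("-")
    return contig, int(start), int(end)


def start_in_intervals(contig, pos, intervals):
    """True if the read's leftmost position lies inside any interval."""
    return any(c == contig and s <= pos <= e for c, s, e in intervals)


def split_by_intervals(sam_lines, interval_strings):
    """Return (inside, outside) record counts for the given windows.

    Unmapped reads (reference name '*') are always counted as outside.
    Simplification: only POS is tested, not the full alignment span.
    """
    intervals = [parse_interval(s) for s in interval_strings]
    inside = outside = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, pos = fields[2], int(fields[3])
        if contig != "*" and start_in_intervals(contig, pos, intervals):
            inside += 1
        else:
            outside += 1
    return inside, outside


# Toy records and window (hypothetical data).
records = [
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
inside, outside = split_by_intervals(records, ["chr1:90-200"])
print(inside, outside)
```

If the `outside` count for the pre-realignment file roughly matches the number of reads missing after realignment, that supports the windows explanation rather than a quality filter.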
