Variant calling of a personal genome: usage question

Nilaksha (Colombo, Sri Lanka), Member

Hi all,
I have been using the GATK tools successfully for variant calling on exome sequences, but now I need to do the same for a personal genome. The genome was sequenced in two runs with 7 lanes per run, so I now have 28 FASTQ files (2 paired-end read files × 7 lanes × 2 runs). I haven't dealt with such a large number of files at once before. My proposed approach is to:

1) Align, dedup, realign and recalibrate per lane (so I get 14 aligned, deduped, realigned and recalibrated BAM files).
2) Merge the BAM files to produce a single BAM file.
3) Call variants on the single BAM file using HaplotypeCaller.

Do you think my approach is feasible? Or do you have any alternative approaches? Furthermore, what is the best tool to merge the bam files?

Thanks in advance.
Regards!
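
To make the file bookkeeping in the plan above concrete, here is a minimal Python sketch that enumerates the 14 lane-level units (2 runs × 7 lanes), each with its own read group but one shared SM value. The run names, sample name and file-naming pattern are hypothetical placeholders, not something stated in the question.

```python
from itertools import product

# Hypothetical names for illustration only; substitute your own.
RUNS = ["run1", "run2"]   # two sequencing runs
LANES = range(1, 8)       # 7 lanes per run
SAMPLE = "genome01"       # one SM value shared by every lane


def lane_units(runs=RUNS, lanes=LANES, sample=SAMPLE):
    """Return one record per run/lane pair: its two FASTQ files,
    a distinct read group, and the lane-level BAM it will produce."""
    units = []
    for run, lane in product(runs, lanes):
        prefix = f"{sample}_{run}_L{lane}"
        units.append({
            "fastq_r1": f"{prefix}_R1.fastq.gz",
            "fastq_r2": f"{prefix}_R2.fastq.gz",
            # ID differs per lane; SM is identical everywhere.
            "read_group": {"ID": f"{run}.L{lane}", "SM": sample},
            "bam": f"{prefix}.recal.bam",
        })
    return units


units = lane_units()
print(len(units), "lane-level BAMs from", 2 * len(units), "FASTQ files")
# prints: 14 lane-level BAMs from 28 FASTQ files
```

Generating the SM value from a single variable, rather than typing it per lane, is one way to avoid lanes ending up with slightly different sample names.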

Best Answer

  • Geraldine_VdAuwera (Cambridge, MA), admin
    Accepted Answer

    Hi there,

    Your plan is totally fine; that is the correct way to proceed. The only detail I can add is that technically, you don't need to merge the lane BAMs before calling; you can just pass them in as multiple inputs to HaplotypeCaller. As long as you tag the sample with the same SM tag in all files, HC will merge the data internally. But if you do want to combine the files anyway, e.g. for storage purposes, you can simply pass them to PrintReads as multiple inputs and it will write out a single output.
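
As a rough illustration of passing the lane BAMs as multiple inputs, the Python sketch below only assembles a command line; it does not run anything. It assumes a GATK 3-style invocation (`-T HaplotypeCaller`, `-R`, `-I`, `-o`), which matches the realignment-era workflow discussed here; newer GATK releases use a different launcher and option spelling. The jar path and all file names are hypothetical.

```python
import shlex


def haplotypecaller_cmd(bams, reference, output_vcf, jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3-style HaplotypeCaller command with one -I per BAM.
    The jar name and file paths are placeholders (assumptions)."""
    cmd = ["java", "-jar", jar, "-T", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        cmd += ["-I", bam]   # repeat -I once for each lane-level BAM
    cmd += ["-o", output_vcf]
    return cmd


# Hypothetical lane-level BAM names: 2 runs x 7 lanes.
lane_bams = [f"run{r}_L{l}.recal.bam" for r in (1, 2) for l in range(1, 8)]
cmd = haplotypecaller_cmd(lane_bams, "reference.fasta", "genome01.vcf")

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The same pattern of repeating the input option would apply when handing the files to PrintReads to write a single merged output, as described in the answer.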

Answers

  • Nilaksha (Colombo, Sri Lanka), Member

    Hey, nice to hear. :) And thanks a lot for the quick replies. I tagged all the BAMs with the same SM tag, but there is a possibility that I used a lowercase letter in one tag and a capital in another :D So I will proceed with producing a single BAM file. Thanks again.

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie

    Happy to help! Note that if the SM tags are different (even just in letter case), the corresponding data will be treated as coming from different samples, even if they are in the same BAM file. So I do recommend you fix the tags. You can do it with Picard tools' AddOrReplaceReadGroups.
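
One way to check for this before calling is to scan the `@RG` lines of the BAM headers for SM values that differ only by case. The sketch below is a minimal Python example that works on header text (for instance the output of `samtools view -H`); the function names and the sample header are made up for illustration.

```python
from collections import defaultdict


def sample_tags(header_text):
    """Collect the SM value from every @RG line of SAM header text."""
    tags = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                tags.append(field[3:])
    return tags


def case_conflicts(tags):
    """Group SM values that are equal when lowercased but spelled differently."""
    groups = defaultdict(set)
    for tag in tags:
        groups[tag.lower()].add(tag)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# Hypothetical header: two lanes whose SM tags differ only by a capital letter.
header = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:run1.L1\tSM:genome01\tLB:lib1\tPL:ILLUMINA",
    "@RG\tID:run2.L1\tSM:Genome01\tLB:lib1\tPL:ILLUMINA",
])

print(case_conflicts(sample_tags(header)))
# prints: [['Genome01', 'genome01']]
```

An empty list means no case-only mismatches were found; any reported group points to read groups whose SM tag should be corrected.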

  • Nilaksha (Colombo, Sri Lanka), Member

    Thanks for the tip :)
