GATK4 realignment

igor (New York; Member)

As discussed in a blog post, GATK4 removes the realignment step:

As announced in the GATK v3.6 highlights, variant calling workflows that use HaplotypeCaller or MuTect2 now omit indel realignment. This change does not apply to workflows that call variants with UnifiedGenotyper or the original MuTect. We still recommend indel realignment for these legacy workflows.

I understand that HaplotypeCaller and MuTect2 do their own internal realignment, but I would like to examine the BAMs manually or feed them to other variant callers. It's nice to have the cleanest possible version. Technically, HaplotypeCaller can output a realigned BAM, but as the documentation states:

The assembled haplotypes and locally realigned reads will be written as BAM to this file if requested. Really for debugging purposes only. Note that the output here does not include uninformative reads so that not every input read is emitted to the bam.
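
For readers who have not used this option, here is a minimal sketch of how the bamout output might be requested from GATK4 HaplotypeCaller, built as an argument list in Python. The file names and the interval are hypothetical placeholders, and limiting the run to an interval with `-L` is an assumption made here to keep the debugging output small.

```python
import shlex


def bamout_command(reference, input_bam, interval, out_vcf, out_bam):
    """Build a GATK4 HaplotypeCaller command that also writes a bamout file.

    All paths and the interval are caller-supplied placeholders.
    """
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,      # reference FASTA
        "-I", input_bam,      # input BAM
        "-L", interval,       # restrict the run to one region
        "-O", out_vcf,        # variant calls
        "-bamout", out_bam,   # assembled haplotypes and realigned reads
    ]


# Hypothetical file names and region, for illustration only.
cmd = bamout_command(
    "ref.fasta",
    "sample.bam",
    "chr20:10000000-10001000",
    "region.vcf.gz",
    "region.bamout.bam",
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where `gatk` is installed. As the documentation quoted above notes, the resulting BAM omits uninformative reads, so it is not a drop-in replacement for the input BAM.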

Is there a recommended approach going forward? I am guessing you may have had an internal discussion about this. Should I keep the realignment step in GATK3 and move the other steps to GATK4? That seems terribly inelegant and will probably cause issues eventually.

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @igor
    Hi,

    I am assuming you will only examine the BAMs when there is a discrepancy between the callers. In that case, you can output the bamout file for just those regions/sites. If you look at the Variant Discovery hands-on tutorial in the presentations section, you will see an example of HaplotypeCaller reassembling the reads in an active region even after Indel Realignment was performed. So, it would be best to use the bamout file rather than the Indel Realigned file.

    That said, there is an effort to port the Indel Realignment tools to GATK4. You can keep track of the status here.

    -Sheila

  • igor (New York; Member)

    I wanted to make sure I feed the cleanest possible BAM to all variant callers, not just check for discrepancies. I just wanted to know if I am missing any developments regarding the realignment step. I am happy to hear there are plans to port it to GATK4 (although it sounds like that is not imminent).

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @igor
    Hi,

    If you are going to use the same BAM file for both local-reassembly-based variant callers and position-based variant callers, it may be best to use the Indel Realignment workflow. However, keep in mind that HaplotypeCaller determines variants from its own locally reassembled reads (the ones written out by bamout). These reads may differ from the Indel Realigned BAM file and cause discrepancies between callers.

    -Sheila
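
The Indel Realignment workflow mentioned in this thread is the two-step GATK3 procedure (RealignerTargetCreator followed by IndelRealigner). A rough sketch of the two commands, built as argument lists in Python, is shown below. The jar path, file names, and the known-indels resource are hypothetical placeholders, and this is only an illustration of the hybrid setup, not an official recommendation.

```python
import shlex

GATK3_JAR = "GenomeAnalysisTK.jar"  # assumed location of a GATK 3.x jar


def realignment_commands(reference, input_bam, known_indels, targets, output_bam):
    """Build the two GATK3 indel realignment commands as argument lists."""
    base = ["java", "-jar", GATK3_JAR]
    # Step 1: find intervals that may need realignment.
    create_targets = base + [
        "-T", "RealignerTargetCreator",
        "-R", reference,
        "-I", input_bam,
        "-known", known_indels,
        "-o", targets,
    ]
    # Step 2: realign reads over those intervals.
    realign = base + [
        "-T", "IndelRealigner",
        "-R", reference,
        "-I", input_bam,
        "-targetIntervals", targets,
        "-known", known_indels,
        "-o", output_bam,
    ]
    return [create_targets, realign]


# Hypothetical file names, for illustration only.
commands = realignment_commands(
    "ref.fasta",
    "sample.bam",
    "known_indels.vcf",
    "targets.intervals",
    "sample.realigned.bam",
)
for command in commands:
    print(shlex.join(command))
```

The realigned BAM from the second step would then be the input to the position-based callers. HaplotypeCaller would still perform its own reassembly on top of it, as noted above.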
