Do alignment differences have a large effect on GATK HaplotypeCaller?

cindylanzao (Shenzheng, Member)

Hi,
We aligned the same sample with two different versions of bwa and got two different BAM files (we compared them with deeptools and found they were quite different). However, the GATK germline pipeline gave very similar variant results for both. Is this because HaplotypeCaller is more tolerant of alignment differences, since it does its own realignment, or are we overlooking some other factor that would explain this?
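One way to put a number on "very similar variant results" is to compare the two call sets site by site. The sketch below is a minimal illustration under simple assumptions: it keys each variant on (CHROM, POS, REF, ALT) from uncompressed VCF text, splits multi-allelic ALT fields, and ignores genotypes, QUAL and FILTER. Indels are compared exactly as written, so two representations of the same indel that differ only in left-alignment would count as discordant. The function names are hypothetical; in practice, tools such as `bcftools isec` do this more thoroughly.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key type for this sketch: (CHROM, POS, REF, ALT)
Variant = Tuple[str, int, str, str]


def variant_keys(lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys from uncompressed VCF text lines.

    Header lines (starting with '#') and blank lines are skipped, and
    multi-allelic ALT fields are split so each alternate allele is its own key.
    """
    keys: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def compare_callsets(a: Set[Variant], b: Set[Variant]) -> Dict[str, float]:
    """Count shared and private variants and the Jaccard similarity."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

Usage would look like `compare_callsets(variant_keys(open("calls_bwa_v1.vcf")), variant_keys(open("calls_bwa_v2.vcf")))`, where the file names are placeholders. A Jaccard value close to 1 means the two pipelines called nearly the same sites.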

Answers

  • cindylanzao (Shenzheng, Member)

    @AdelaideR said:
    Hi Cindy -

    It is best practice to generate all BAM files with the same version of bwa, so that HaplotypeCaller does not miss any potential active regions.

    You are correct that HaplotypeCaller performs a local realignment, as described in step 2 of its documentation (https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.9.0/org_broadinstitute_hellbender_tools_walkers_haplotypecaller_HaplotypeCaller.php):

    "For each active region, the program builds a De Bruijn-like graph to reassemble the active region and identifies what are the possible haplotypes present in the data. The program then realigns each haplotype against the reference haplotype using the Smith-Waterman algorithm in order to identify potentially variant sites."

    In that case, a concern would be whether the bwa versions differ so much that active regions are missed.

    Also, for consistency and reproducibility, all BAMs should be generated the same way, because GATK is sensitive to BAM format errors that some upstream programs introduce.

    Hi AdelaideR,
    The story has a dramatic ending: the two BAM files differed because one had been run through MarkDuplicates and the other had not. After running MarkDuplicates on both, they are almost identical. Anyway, a happy ending. Thanks!
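The documentation quoted in the answer says each reassembled haplotype is realigned to the reference with the Smith-Waterman algorithm to find candidate variant sites. The following is a textbook Smith-Waterman local alignment with a linear gap penalty, shown only to illustrate the idea. The scoring values (match, mismatch, gap) are arbitrary assumptions. GATK's own implementation uses affine gap penalties with its own tuned parameters, so this is not its code. `alignment_differences` is a hypothetical helper that lists the columns where the haplotype departs from the reference, i.e. the candidate SNVs and indels.

```python
from typing import List, Tuple


def smith_waterman(ref: str, hap: str, match: int = 2, mismatch: int = -3,
                   gap: int = -4) -> Tuple[int, str, str]:
    """Local alignment of hap against ref with a linear gap penalty.

    Returns (best_score, aligned_ref, aligned_hap), with '-' marking gaps.
    """
    rows, cols = len(ref) + 1, len(hap) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def s(i: int, j: int) -> int:
        # Substitution score for ref[i-1] vs hap[j-1]
        return match if ref[i - 1] == hap[j - 1] else mismatch

    # Fill the score matrix; local alignment floors every cell at 0.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + s(i, j)
            up = H[i - 1][j] + gap    # gap in hap: deletion relative to ref
            left = H[i][j - 1] + gap  # gap in ref: insertion in hap
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    a_ref: List[str] = []
    a_hap: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(i, j):
            a_ref.append(ref[i - 1]); a_hap.append(hap[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a_ref.append(ref[i - 1]); a_hap.append("-"); i -= 1
        else:
            a_ref.append("-"); a_hap.append(hap[j - 1]); j -= 1
    return best, "".join(reversed(a_ref)), "".join(reversed(a_hap))


def alignment_differences(a_ref: str, a_hap: str) -> List[Tuple[int, str, str]]:
    """Columns where the aligned haplotype differs from the reference
    (mismatches and gaps), as (column, ref_char, hap_char)."""
    return [(k, r, h) for k, (r, h) in enumerate(zip(a_ref, a_hap)) if r != h]
```

For example, aligning the haplotype `ACGTTCGTAC` to the reference `ACGTACGTAC` yields a single difference at column 4 (A to T), which is the kind of candidate site HaplotypeCaller would then genotype. Because this alignment is redone from the reassembled haplotypes, small differences in how the original aligner placed individual reads tend to matter less than whether the region is flagged as active at all.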
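The resolution was that one BAM had been through MarkDuplicates and the other had not. Picard/GATK MarkDuplicates does not delete reads by default. Instead, it groups reads (or read pairs) that share the same unclipped 5' alignment position and orientation, keeps the one with the best score (by default the sum of base qualities), and sets the SAM duplicate flag (0x400) on the rest. That changes the FLAG field of many records, which can make two BAMs look very different to a file-level comparison. HaplotypeCaller's default read filters skip reads with this flag, but the unmarked BAM only adds redundant evidence at the same sites, so the calls can still come out similar. Below is a heavily simplified, single-end sketch of that marking logic. The `Read` record and `mark_duplicates` function are hypothetical, and the sketch ignores read pairs, libraries, and optical duplicates, all of which the real tool handles.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


@dataclass
class Read:
    # Hypothetical minimal record for illustration, not a pysam/htsjdk type.
    name: str
    chrom: str
    unclipped_5prime: int  # 5' reference position, counting soft-clipped bases
    reverse: bool          # True if the read maps to the reverse strand
    base_quals: List[int]
    flag: int = 0


def mark_duplicates(reads: List[Read]) -> List[Read]:
    """Flag all but the best read in each (chrom, 5' position, strand) group.

    Reads are marked in place and none are removed.
    """
    groups: Dict[Tuple[str, int, bool], List[Read]] = {}
    for r in reads:
        groups.setdefault((r.chrom, r.unclipped_5prime, r.reverse), []).append(r)
    for group in groups.values():
        # Keep the read with the highest sum of base qualities (first wins ties).
        best = max(group, key=lambda r: sum(r.base_quals))
        for r in group:
            if r is not best:
                r.flag |= DUPLICATE_FLAG
    return reads
```

In this sketch, two forward-strand reads starting at the same position form one group and the lower-quality one is flagged, while a reverse-strand read at the same position is kept because orientation is part of the grouping key.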
