Generation of haplotype or phase data for SNPs in the flanking region of STRs

Does anyone know of an effective way to determine haplotypes or phasing data for SNPs and STRs? I understand that STRs are inherently difficult for aligners; however, I'm trying to determine haplotypes for a large number of STRs (including the SNPs in their flanking regions) across a large number of samples, so manual verification is not really an option. We've developed an in-house Perl script that calls STRs accurately; however, it currently does not include flanking region information.

Any help is greatly appreciated.

Comments

  • Sheila (Broad Institute; Member, Broadie admin)

    @jonathanKing2011
    Hi,

    You can try running HaplotypeCaller to generate the haplotypes. When you use the -bamout argument, you will get a BAM file with "artificial haplotypes", which are the most likely haplotypes in the region. Then, you can use those artificial haplotypes in ReadBackedPhasing.

    -Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Note that you'll want to pass in your list of STRs (either as genome intervals or as a VCF) using -L so that HC runs only on them, and add -ip 100 for padding (or whatever length of flanking sequence you're interested in). Also add --disableOptimizations --dontTrimActiveAlignments --forceActive to make sure HC doesn't reduce the flanking regions if it finds nothing interesting in them.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    No need to run the haplotypes through ReadBackedPhasing though -- that's for a different workflow to get physical phasing (you would run that on the reads themselves).
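
For readers who want to script this across many samples, the suggestions in this thread can be sketched as a small Python helper that assembles the HaplotypeCaller command line. This is only a sketch under assumptions: it uses a GATK 3.x style invocation (`java -jar GenomeAnalysisTK.jar -T HaplotypeCaller`), since the flag names quoted above follow that style and other GATK releases may spell them differently. The helper name `haplotypecaller_command` and all file names are hypothetical placeholders, not anything from the thread.

```python
import shlex


def haplotypecaller_command(reference, bam, intervals, out_vcf, out_bam,
                            padding=100, jar="GenomeAnalysisTK.jar"):
    """Build an argument list for a GATK 3.x style HaplotypeCaller run.

    The STR list goes in via -L, flanking sequence is added with -ip, and
    the three long flags are the ones recommended in the thread to keep
    HaplotypeCaller from reducing the flanking regions.
    """
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,            # reference FASTA (placeholder path)
        "-I", bam,                  # input reads (placeholder path)
        "-L", intervals,            # STR intervals or VCF, per the thread
        "-ip", str(padding),        # flanking padding in bases
        "-o", out_vcf,              # variant calls
        "-bamout", out_bam,         # BAM with the "artificial haplotypes"
        "--disableOptimizations",
        "--dontTrimActiveAlignments",
        "--forceActive",
    ]


if __name__ == "__main__":
    # All file names here are made up for illustration.
    cmd = haplotypecaller_command(
        reference="reference.fasta",
        bam="sample1.bam",
        intervals="strs.intervals",
        out_vcf="sample1.vcf",
        out_bam="sample1.haplotypes.bam",
    )
    # Print a shell-quoted command to inspect before running it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string makes it easy to loop over samples and hand each list to `subprocess.run` once the paths have been checked.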
