Generation of haplotype or phase data for SNPs in the flanking region of STRs

Does anyone know of an effective way to determine haplotypes or phasing data for SNPs and STRs? I understand that STRs are inherently difficult for aligners; however, I'm trying to determine haplotypes for a large number of STRs (including the SNPs in their flanking regions) across a large number of samples, so manual verification is not really an option. We've developed an in-house Perl script that calls STRs accurately, but it currently does not include flanking-region information.

Any help is greatly appreciated.


  • Sheila (Broad Institute; Member, Broadie)


    You can try running HaplotypeCaller to generate the haplotypes. When you use the -bamout argument, you will get a BAM file containing "artificial haplotypes", which are the most likely haplotypes in the region. You can then use those artificial haplotypes in ReadBackedPhasing.


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Note that you'll want to pass in your list of STRs (either as genome intervals or as a VCF) using -L so that HC runs only on them, and add -ip 100 for padding (or whatever length of flanking sequence you're interested in). Also add --disableOptimizations --dontTrimActiveAlignments --forceActive to make sure HC doesn't trim away the flanking regions when it finds nothing interesting in them.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    No need to run the haplotypes through ReadBackedPhasing though -- that's for a different workflow to get physical phasing (you would run that on the reads themselves).
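
Putting the replies above together, the suggested HaplotypeCaller call could be assembled roughly as in the sketch below. It only builds and prints the argument list; it does not run GATK. It assumes a GATK 3-style invocation (`java -jar ... -T HaplotypeCaller`, with `-R`, `-I` and `-o`), which is not stated in the thread. The jar name, the file names and the helper `build_hc_command` are hypothetical. The three trailing flags are copied exactly as written in the reply above; flag spellings differ between GATK versions, so check them against the help output of the version you run.

```python
import shlex


def build_hc_command(reference, bam, str_intervals, out_vcf, out_bam,
                     padding=100, gatk_jar="GenomeAnalysisTK.jar"):
    """Assemble a HaplotypeCaller argument list (hypothetical helper, nothing is executed).

    Assumes GATK 3-style syntax; the jar name and all paths are placeholders.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,           # reference FASTA
        "-I", bam,                 # aligned reads for one sample
        "-L", str_intervals,       # STR list, as intervals or a VCF
        "-ip", str(padding),       # flanking sequence to include around each STR
        "-o", out_vcf,             # variant calls
        "-bamout", out_bam,        # BAM with the "artificial haplotypes"
        # Flags below are copied verbatim from the thread; verify for your version.
        "--disableOptimizations",
        "--dontTrimActiveAlignments",
        "--forceActive",
    ]


cmd = build_hc_command(
    "ref.fasta", "sample.bam", "strs.intervals",
    "sample.vcf", "sample.bamout.bam",
)
# Print a shell-safe version of the command for inspection or logging.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

For many samples, the same helper could be called once per BAM with a different `bam`, `out_vcf` and `out_bam`, and `padding` can be raised if more than 100 bases of flanking sequence are of interest.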