
Generation of haplotype or phase data for SNPs in the flanking region of STRs

Does anyone know of an effective way to determine haplotypes or phasing data for SNPs and STRs? I understand that STRs are inherently difficult for aligners; however, I'm trying to determine haplotypes for a large number of STRs (including the SNPs in their flanking regions) across a large number of samples, so manual verification is not really an option. We've developed an in-house Perl script that calls STRs accurately, but it currently does not include flanking-region information.

Any help is greatly appreciated.


  • Sheila (Broad Institute; Member, Broadie, Moderator)


    You can try running HaplotypeCaller to generate the haplotypes. When you use the -bamout argument, you will get a BAM file containing "artificial haplotypes", which are the most likely haplotypes in each region. You can then use those artificial haplotypes in ReadBackedPhasing.
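A minimal sketch for pulling the artificial haplotypes back out of the -bamout file, assuming you first convert it to text with `samtools view`. It assumes the haplotype records are the ones whose read group ID is `ArtificialHaplotype`, which is what GATK 3.x wrote as far as I know; check the @RG lines of your own file (`samtools view -H`) and pass the right ID if it differs. The helper names (`artificial_haplotypes`, `HaplotypeRecord`) are hypothetical, not part of GATK.

```python
import sys
from typing import Dict, Iterable, List, NamedTuple

# ASSUMPTION: read group ID that GATK 3.x HaplotypeCaller gives the
# artificial haplotypes in its -bamout file. Verify against the @RG lines
# of your own BAM and override it if it differs.
DEFAULT_HAPLOTYPE_RG = "ArtificialHaplotype"


class HaplotypeRecord(NamedTuple):
    name: str             # QNAME (column 1)
    contig: str           # RNAME (column 3)
    pos: int              # 1-based leftmost position (column 4)
    cigar: str            # CIGAR (column 6)
    seq: str              # haplotype sequence (column 10)
    tags: Dict[str, str]  # optional fields, e.g. {"RG": "...", "HC": "..."}


def parse_tags(fields: List[str]) -> Dict[str, str]:
    """Turn optional SAM fields such as 'RG:Z:foo' into {'RG': 'foo'}."""
    tags: Dict[str, str] = {}
    for field in fields:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def artificial_haplotypes(
    sam_lines: Iterable[str], read_group: str = DEFAULT_HAPLOTYPE_RG
) -> List[HaplotypeRecord]:
    """Keep only SAM records whose RG tag matches the haplotype read group."""
    records: List[HaplotypeRecord] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        tags = parse_tags(cols[11:])
        if tags.get("RG") != read_group:
            continue  # an ordinary sequencing read, not a haplotype
        records.append(
            HaplotypeRecord(cols[0], cols[2], int(cols[3]), cols[5], cols[9], tags)
        )
    return records


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view hc_haplotypes.bam > haps.sam
    #        python extract_haps.py haps.sam
    with open(sys.argv[1]) as handle:
        for hap in artificial_haplotypes(handle):
            print(hap.contig, hap.pos, hap.cigar, hap.seq, sep="\t")
```

Parsing plain SAM text keeps the sketch dependency-free; with many samples you could just as well read the BAM directly with a library such as pysam and apply the same read-group filter.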


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Note that you'll want to pass in your list of STRs (either as genome intervals or as a VCF) using -L so that HC runs only on them, and add -ip 100 for padding (or whatever length of flanking sequence you're interested in), as well as --disableOptimizations --dontTrimActiveRegions --forceActive to make sure HC doesn't trim away the flanking regions when it finds nothing interesting in them.
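Since the original question involves many samples, here is a minimal sketch that assembles one HaplotypeCaller command per sample with the arguments described above. It uses GATK 3.x argument syntax, which is what this thread refers to; GATK4 renamed several of these arguments, so check its documentation before reusing this. The trimming flag is written `--dontTrimActiveRegions`, the spelling in the GATK 3.x HaplotypeCaller documentation. The jar, reference, BAM, and interval file names and the `build_hc_command` helper are hypothetical placeholders.

```python
import shlex
from typing import List


def build_hc_command(
    gatk_jar: str,
    reference: str,
    bam: str,
    str_intervals: str,
    out_prefix: str,
    padding: int = 100,
) -> List[str]:
    """Build a GATK 3.x HaplotypeCaller argv that runs only on the STR list."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", str_intervals,        # STRs as intervals (e.g. BED) or as a VCF
        "-ip", str(padding),        # flanking bases added on each side
        "--disableOptimizations",   # don't skip regions without variants
        "--dontTrimActiveRegions",  # keep the full padded region
        "--forceActive",            # treat every base as active
        "-bamout", f"{out_prefix}.haplotypes.bam",  # artificial haplotypes
        "-o", f"{out_prefix}.vcf",
    ]


if __name__ == "__main__":
    # Hypothetical sample names: print one shell command per sample,
    # e.g. to feed into a batch scheduler.
    for sample in ["sample1", "sample2"]:
        cmd = build_hc_command(
            "GenomeAnalysisTK.jar", "ref.fasta", f"{sample}.bam", "strs.bed", sample
        )
        print(shlex.join(cmd))
```

Returning the arguments as a list (rather than one string) lets you pass them straight to `subprocess.run(cmd, check=True)` without shell quoting issues.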

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    There's no need to run the haplotypes through ReadBackedPhasing, though. That tool belongs to a different workflow for getting physical phasing, and you would run it on the reads themselves.
