RealignerTargetCreator Entropy values in output

Hi GATK team,
I recently attended a GATK workshop at ASHG, where I spoke with Geraldine about a possible way to use the RealignerTargetCreator command to find potential off-target regions in CRISPR/Cas9 WGS data.

She suggested that I use this utility and also told me that internally you calculate an entropy value for each region.
Would it be possible to include these entropy values in the output as well?

Many thanks for any help you may provide!




  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Luca, sorry for the late response -- this led to a bit more head-scratching than I expected. TL;DR: you can use the HaplotypeCaller to do what you want (further details below). I'm including details of what I found for both RTC and HC here for posterity.


    I found that using the RTC for what you want would require some code modification to get the tool to output the entropy scores.

    Right now the way it's done is that the code looks for mismatches using this:

    if ( p.getBase() != refBase )
        mismatchQualities += p.getQual();
    // not part of the if: the total is accumulated for every base
    totalQualities += p.getQual();

    and then

    if ( pileup.getNumberOfElements() >= minReadsAtLocus && (double)mismatchQualities / (double)totalQualities >= mismatchThreshold )
        hasPointEvent = true;

    and it separately looks for indels using:

    if ( p.isDeletion() || p.isBeforeInsertion() ) {
        hasIndel = true;
        if ( p.isBeforeInsertion() )
            hasInsertion = true;
    }

    And finally, if any of hasPointEvent, hasIndel or hasInsertion is true, then the tool outputs the coordinates of the interval to the output file. But it's not really set up to output an entropy score -- and indeed there is only a score for the mismatch case, not for the indel cases. So it would take some dev work to get this working to some extent.
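To make the mismatch score concrete, here is a minimal, self-contained sketch of the quality-weighted mismatch fraction and the threshold test quoted above. It is not GATK code: the class name `MismatchScoreSketch`, the array-based pileup representation, and the example values are all hypothetical, chosen only for illustration. It also returns 0.0 for an empty pileup, which is a simplification of my own.

```java
/** Hypothetical sketch; not part of the GATK codebase. */
class MismatchScoreSketch {

    /**
     * Quality-weighted mismatch fraction at one locus:
     * (sum of qualities of non-reference bases) / (sum of all base qualities).
     * bases[i] and quals[i] describe the i-th read base in the pileup.
     * Returns 0.0 when the total quality is zero (assumption of this sketch).
     */
    static double mismatchFraction(char[] bases, int[] quals, char refBase) {
        long mismatchQualities = 0;
        long totalQualities = 0;
        for (int i = 0; i < bases.length; i++) {
            if (bases[i] != refBase) {
                mismatchQualities += quals[i];
            }
            totalQualities += quals[i]; // accumulated for every base
        }
        return totalQualities == 0 ? 0.0 : (double) mismatchQualities / totalQualities;
    }

    /** Same shape as the test quoted above: enough reads and a high enough fraction. */
    static boolean hasPointEvent(char[] bases, int[] quals, char refBase,
                                 int minReadsAtLocus, double mismatchThreshold) {
        return bases.length >= minReadsAtLocus
                && mismatchFraction(bases, quals, refBase) >= mismatchThreshold;
    }

    public static void main(String[] args) {
        // Illustrative pileup: two reference bases and two mismatching bases.
        char[] bases = {'A', 'A', 'T', 'T'};
        int[] quals = {30, 30, 20, 20};
        double score = mismatchFraction(bases, quals, 'A');
        // Print the score alongside a made-up locus, as a modified tool might.
        System.out.println("chr1\t1000\t" + score);
    }
}
```

Note that a score like this exists only for the mismatch case; the indel flags are plain booleans, so there is nothing comparable to print for them.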


    A better option is to use a subset of the HaplotypeCaller machinery instead. HC has a hidden argument called --justDetermineActiveRegions which does just that: it runs only the entropy calculations and doesn't do any of the subsequent assembly, haplotype scoring or genotyping steps. There is also a documented argument called --activityProfileOut that allows you to output the activity profile, i.e. the per-base probability that there is something going on at that site. I think this will give you what you want; from there it's a question of parsing the output to be usable in your work.
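As a starting point for the parsing step, here is a rough sketch that filters an activity profile file for sites above a probability cutoff. The column layout is an assumption, not something confirmed in this thread: the sketch expects tab-delimited lines with contig, start, end, a feature label, and the probability in the fifth column, and it skips `#` lines and any row whose numeric fields do not parse (such as a header). Check the actual file before relying on it. The class name `ActivityProfileSketch`, the `Site` type, and the 0.5 cutoff are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; the input layout is an assumption (see lead-in). */
class ActivityProfileSketch {

    /** One retained site: contig, position, and activity probability. */
    static final class Site {
        final String contig;
        final int pos;
        final double prob;

        Site(String contig, int pos, double prob) {
            this.contig = contig;
            this.pos = pos;
            this.prob = prob;
        }
    }

    /** Keeps the sites whose probability (assumed 5th column) is at least minProb. */
    static List<Site> parse(List<String> lines, double minProb) {
        List<Site> sites = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or track/comment line
            }
            String[] fields = line.split("\t");
            if (fields.length < 5) {
                continue; // not the layout this sketch assumes
            }
            try {
                int pos = Integer.parseInt(fields[1]);
                double prob = Double.parseDouble(fields[4]);
                if (prob >= minProb) {
                    sites.add(new Site(fields[0], pos, prob));
                }
            } catch (NumberFormatException e) {
                // Non-numeric row, e.g. a column header: skip it.
            }
        }
        return sites;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, read that file; otherwise use made-up sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "chr1\t100\t100\tstate\t0.02",
                        "chr1\t101\t101\tstate\t0.85");
        for (Site s : parse(lines, 0.5)) {
            System.out.println(s.contig + "\t" + s.pos + "\t" + s.prob);
        }
    }
}
```

Adjacent retained sites could then be merged into intervals to get candidate regions, though the right cutoff and merging distance would need to be tuned on real data.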

    I hope this helps; let me know how it turns out!

  • lucapinello (Boston; Member)
    edited October 2015

    Many thanks for this very detailed and comprehensive explanation.

    I think the solution in the HaplotypeCaller is what I am looking for.

    I will try this approach and, if it works well, I will share my experience here in the forum!

    Thanks again for this fantastic tool.
