We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Haplotype caller active regions parameters not available in GATK4

dhwanidhwani indiaMember
edited May 2019 in Ask the GATK team

I would like to know like GATK 3.6 there was an option to define active regions, is there any option in GATK v4.2.1

The Active region parameters can be seen from this link:-
https://software.broadinstitute.org/gatk/documentation/tooldocs/3.5- /org_broadinstitute_gatk_tools_walkers_diagnostics_FindCoveredIntervals.php.

  • advanced input which has a parameter name --activeRegionIn,

If i correctly understand, this is the list of interval (can be in bed file) to define active regions which would be further used for phasing.

1) What is the similar parameter in the new GATK release?

2) Using new version of GATK, VCF files has only 2 rows out of ** 228 rows** that have PID information. How do i go in such a case. Have i done any mistake in my analysis .

My commands are as follows:-
/opt/apps/gatk/4.2.1/gatk Haome_analysis/reference/Homo_sapiens.GRCh37.dna.chromosome.6.fa -I base_recalib/abc_aligned_sorted_dupmarked_realigned_recalibrated.bam -O haplotype_caller/abc_haplotyper.g.vcf --emit-ref-confidence GVCF -L /home/dhwani.dholakia/archive/files_required_for_exome_analysis/coord.bed --max-assembly-region-size 1000 -mbq 25 -stand-call-conf 40


  • bshifawbshifaw Member, Broadie, Moderator admin
    edited May 2019

    Hi @dhwani

    HaplotypeCaller's method of identifying active regions is mentioned here : HC overview: How the HaplotypeCaller works.
    For more details on how the activity profile is computed and processed, as well as what options are available to modify the active region parameters, please see HC step 1: Defining ActiveRegions by measuring data entropy. You'll see a subheading called "Setting the ActiveRegion thresholds and intervals".

    Mind elaborating on the second question and uploading a snippet of the results?

  • dhwanidhwani indiaMember
    edited May 2019


    Thanks for your reply.

    The recent GATK documentation link and the link that you shared link have different arguments.
    In the link that you shared "Under sub heading "HC step 1: Defining ActiveRegions by measuring data entropy", the parameter to define active size ("-activeRegionMaxSize") is not there in the new documentation

    Could you please clear this ambiguity?

  • dhwanidhwani indiaMember


    Regarding second question:-

    When i used the parameter --max-assembly-region-size 1000 , i expect that the haplotypes should be phased for a large region around 1000 bp. However in the result of VCF file i got only two near by variants that were phased. The commands that i had used for the same were

    /opt/apps/gatk/4.2.1/gatk HaplotypeCaller --dbsnp /home/dhwani.dholakia/archive/files_required_for_exome_analysis/dbsnp/GRCH37.p17_refseq.vcf -R /home/dhwani.dholakia/archive/files_required_for_exome_analysis/reference/Homo_sapiens.GRCh37.dna.chromosome.6.fa -I base_recalib/DRR015487_aligned_sorted_dupmarked_realigned_recalibrated.bam -O haplotype_caller/haplotyper.g.vcf --emit-ref-confidence GVCF -L /home/dhwani.dholakia/archive/files_required_for_exome_analysis/coord.bed --max-assembly-region-size 1000 -mbq 25 -stand-call-conf 40

    Have i missed any important arguments that's why i am not getting the genotypes phased?

    Attaching my VCF file for further information
    I can only find two variants that are phased and not far from each other also. Why is this happening even when active region stretch was defined for a size of 1000 bp.

    PS. changed the extension of VCF file to .txt because .vcf was not supported for uploading

  • bshifawbshifaw Member, Broadie, Moderator admin

    Ahh, looks like the activeRegionMaxSize argument has been removed from Haplotypecaller in GATK4. The error in the documentation has been noted and will be corrected.

    I'll have to check with the dev team with your second question.

  • bshifawbshifaw Member, Broadie, Moderator admin

    It might also be helpful be follow the instructions in the following article until then.

  • bshifawbshifaw Member, Broadie, Moderator admin


    Here is a response from the team

    This is a known issue, and the phasing in VCF given by user does not look unusual. Basically, phasing output by HaplotypeCaller is very conservative and phases reads it's 100% sure about, which leads to low sensitivity.

    Look at issue 3368 and proposed fix in issue 5828

Sign In or Register to comment.