Different ARO with or without -justDetermineActiveRegions for HaplotypeCaller

I noticed that for one BAM, running HaplotypeCaller (I am using version 3.5) with and without -justDetermineActiveRegions gives different ARO results at several intervals.

For example, here is one difference (lines starting with < are without -justDetermineActiveRegions, lines starting with > are with it):

413386,413388c413386,413388
< chr7  100550695   100550914   size=219    1.00000
< chr7  100550914   100550915   end-marker  0.00000
< chr7  100550914   100551155   size=241    1.00000
---
> chr7  100550695   100550944   size=249    1.00000
> chr7  100550944   100550945   end-marker  0.00000
> chr7  100550944   100551155   size=211    1.00000

This is not what I expected. -justDetermineActiveRegions makes map() return immediately without doing any work, so if my understanding is correct it should not affect the active regions. But the test shows that it does, so I may be misunderstanding something.

Could you please comment on why it behaves this way?
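
To make a comparison like the one above easier to read, here is a minimal Python sketch that lists the rows present in only one of the two ARO outputs. It assumes each row has five whitespace-separated fields (contig, start, end, label, value), as in the diff excerpt; the function names parse_regions and diff_regions are hypothetical, and the input rows are copied from the excerpt rather than read from the real files.

```python
def parse_regions(lines):
    """Parse rows of the form: contig start end label value."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        contig, start, end, label, value = fields
        regions.append((contig, int(start), int(end), label, float(value)))
    return regions


def diff_regions(lines_a, lines_b):
    """Return (rows only in a, rows only in b), preserving input order."""
    a = parse_regions(lines_a)
    b = parse_regions(lines_b)
    set_a, set_b = set(a), set(b)
    only_a = [r for r in a if r not in set_b]
    only_b = [r for r in b if r not in set_a]
    return only_a, only_b


# Rows copied from the diff excerpt above.
without_flag = [
    "chr7  100550695   100550914   size=219    1.00000",
    "chr7  100550914   100550915   end-marker  0.00000",
    "chr7  100550914   100551155   size=241    1.00000",
]
with_flag = [
    "chr7  100550695   100550944   size=249    1.00000",
    "chr7  100550944   100550945   end-marker  0.00000",
    "chr7  100550944   100551155   size=211    1.00000",
]

only_without, only_with = diff_regions(without_flag, with_flag)
for row in only_without:
    print("<", *row)
for row in only_with:
    print(">", *row)

# Both runs cover the same overall span; only the internal boundary moves.
boundary_shift = only_with[0][2] - only_without[0][2]
print("boundary shift (bp):", boundary_shift)
```

In this excerpt the two runs cover the same span (100550695 to 100551155) and differ only in where the boundary between the two regions falls.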

Issue · Github
by Sheila

Issue Number: 1244
State: closed
Closed By: vdauwera

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator; admin)

    @ericm
    Hi,

    Can you please post the exact command you ran?

    Thanks,
    Sheila

  • ericm (Member)

    Hi Sheila, the commands used are:

    /path/to/jdk1.8.0_73/bin/java -Xmx10G -jar /path/to/GATK-3.5.jar -T HaplotypeCaller -L /path/to/bed.bed -R /path/to/gatk_resource_bundle/hg19/ucsc.hg19.fasta -I /path/to/bam.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o all31.vcf -ARO all31.bed
    
    /path/to/jdk1.8.0_73/bin/java -Xmx10G -jar /path/to/GATK-3.5.jar -T HaplotypeCaller -L /path/to/bed.bed -R /path/to/gatk_resource_bundle/hg19/ucsc.hg19.fasta -I /path/to/bam.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o all30.vcf -ARO all30.bed -justDetermineActiveRegions
    
  • Sheila, Broad Institute (Member, Broadie, Moderator; admin)

    @ericm
    Hi,

    I just tried this myself with some test data, and my ARO output is exactly the same with or without -justDetermineActiveRegions. Let me check with the team and get back to you.

    -Sheila

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie; admin)

    Hey @ericm, you're correct that we don't expect to see any major differences, but if I recall correctly some of the active region trimming and size adjustments are done after the stage at which -justDetermineActiveRegions makes HC bail out, to use the information gained from the local reassembly process. The minor differences you see here seem to be in line with that.

  • ericm (Member)

    Hi @Geraldine_VdAuwera, thanks for the comment.

    I found that the random number generator might also affect it. The downsampler uses the random number generator, which appears to be a globally shared one. If other logic, such as map() in HC, also draws random numbers, the generator's state advances as well. This can slightly change which reads the downsampler keeps, and in some rare situations the active regions may be affected.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie; admin)

    @ericm Indeed, the random downsampling can come into play as well.
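
The shared random number generator effect discussed in the last two replies can be illustrated with a small Python sketch. This is a toy model, not GATK code: the downsample function, the seed, the read counts, and the single extra draw standing in for work done inside map() are all assumptions made for illustration. It shows only the general mechanism, namely that an extra draw from a shared generator changes what a later downsampling step selects.

```python
import random


def downsample(reads, target, rng):
    """Keep at most `target` reads, chosen with the supplied generator."""
    if len(reads) <= target:
        return list(reads)
    return sorted(rng.sample(reads, target))


def run(consume_extra_draw):
    # One generator shared by every step; the seed value is arbitrary.
    rng = random.Random(47)
    reads = list(range(100))  # stand-in read identifiers

    first = downsample(reads, 10, rng)
    if consume_extra_draw:
        # Stands in for other code (for example map()) that draws from
        # the same shared generator between two downsampling steps.
        rng.random()
    second = downsample(reads, 10, rng)
    return first, second


full_run = run(consume_extra_draw=True)
early_exit_run = run(consume_extra_draw=False)

print("first step identical: ", full_run[0] == early_exit_run[0])
print("second step identical:", full_run[1] == early_exit_run[1])
```

The first downsampling step matches in both runs because the generator starts from the same state. The second step can pick a different subset once the extra draw has advanced the generator, even though the downsampling code itself is unchanged.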
