Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

How to best minimize variation between runs of HaplotypeCaller in GVCF mode?

I am using a combination of HaplotypeCaller local (non-spark), in GVCF mode, followed by GatherVcfs to merge them, and I get very different call results across runs. I would expect the probabilities/confidence values to change slightly, but not so much the number of calls. Is this normal?

I'm using the gatk from docker://broadinstitute/gatk:4.beta.6 . My BAM/BAI files pass validation.

I see other posts about results being non-deterministic. But I'm not passing any of the -nt or -nct flags in this case.

I'm splitting all my contigs (bed file) in roughly equal-sized chunks, and calling HaplotypeCaller, like so. The VCF file produced changes a lot if I do 8 chunks, vs 128. I'm not sure whether that makes things worse.

# chunk 000
java -jar /gatk/gatk.jar HaplotypeCaller -R ANN0859.bam --emitRefConfidence GVCF -L bed_chunk_000.bed -O ANN0859.bam_000.g.vcf -hets 0.010000
# chunk 001
java -jar /gatk/gatk.jar HaplotypeCaller -R ANN0859.bam --emitRefConfidence GVCF -L bed_chunk_001.bed -O ANN0859.bam_001.g.vcf -hets 0.010000
...

I merge them like so (passing all the chunks in order):

java -jar /gatk/gatk.jar GatherVcfs -I ANN0859.bam_000.g.vcf -I ANN0859.bam_001.g.vcf ...

The entire bed is sorted, and the chunks are not overlapping. I've made sure that I'm not losing any contigs when I split my bed file.

To provide an example difference for one of the chromosomes, I get the following calls (for 128 chunks) in the final output gVCF:

HanXRQChr00c0117        2497    .       G       <NON_REF>       .       .       END=2580        GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
HanXRQChr00c0117        10708   .       G       <NON_REF>       .       .       END=25539       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
(EOF)

And if I divide the work in 8 (longer) chunks, that last section just explodes into 1960 different calls:

HanXRQChr00c0117        10708   .       G       <NON_REF>       .       .       END=14265       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
HanXRQChr00c0117        14266   .       C       <NON_REF>       .       .       END=14267       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,42
...
HanXRQChr00c0117        14309   .       T       C,<NON_REF>     0.13    .       DP=2;MLEAC=0,0;MLEAF=nan,nan;RAW_MQ=7200        GT:PGT:PID      ./.:0|1:14309_T_C
HanXRQChr00c0117        14310   .       T       <NON_REF>       .       .       END=14315       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,45
HanXRQChr00c0117        14316   .       T       C,<NON_REF>     0.13    .       DP=2;MLEAC=0,0;MLEAF=nan,nan;RAW_MQ=7200        GT:PGT:PID      ./.:0|1:14309_T_C
HanXRQChr00c0117        14317   .       T       <NON_REF>       .       .       END=14321       GT:DP:GQ:MIN_DP:PL      0/0:1:3:1:0,3,45
...
HanXRQChr00c0117        14358   .       T       <NON_REF>       .       .       END=14359       GT:DP:GQ:MIN_DP:PL      0/0:4:12:4:0,12,180
HanXRQChr00c0117        14360   .       A       G,<NON_REF>     30.02   .       DP=4;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;RAW_MQ=14400        GT:AD:DP:GQ:PL:SB       1/1:0,1,0:1:3:45,3,0,4
5,3,45:0,0,1,0
...
HanXRQChr00c0117        25479   .       T       <NON_REF>       .       .       END=25484       GT:DP:GQ:MIN_DP:PL      0/0:8:24:8:0,24,296
HanXRQChr00c0117        25485   .       T       <NON_REF>       .       .       END=25485       GT:DP:GQ:MIN_DP:PL      0/0:8:21:8:0,21,315
HanXRQChr00c0117        25486   .       T       <NON_REF>       .       .       END=25521       GT:DP:GQ:MIN_DP:PL      0/0:6:18:6:0,18,217
HanXRQChr00c0117        25522   .       A       <NON_REF>       .       .       END=25524       GT:DP:GQ:MIN_DP:PL      0/0:7:15:7:0,15,225
HanXRQChr00c0117        25525   .       T       <NON_REF>       .       .       END=25539       GT:DP:GQ:MIN_DP:PL      0/0:5:9:3:0,9,133
(EOF)

I thought at first that maybe the chunk boundaries were at play, but those contigs are in the middle of a chunk file.

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @init_js
    Hi,

    Can you try with the latest release? The beta versions were not completely stable, so there may be changes that affect your outputs.

    -Sheila

  • init_jsinit_js Member
    edited April 2018

    Ok, i've had some time to test with GATK 4.0.1.2 (docker://broadinstitute/gatk:4.0.1.2), and the results do look more consistent on that version.

    I've got many core years of computation completed on the previous version 4.beta.6 however. Is there something to salvage from the .g.vcf produced on 4.beta.6 or are they complete garbage (I certainly hope not)?

    I've tested 4.0.1.2 haplotypecaller on two different input bam files (one small, one big). For the small file, I've divided my BED file into 8 and then 128 chunks, each followed by a gathergvcfs. For the large bam file, I've used 32 and 160 chunks. And I've run HaplotypeCaller twice for each configuration (large_or_small x num_chunks) for a total of 8 runs.

    To compare outputs, I'm using GNU diff on the textual output of the raw calls in the gvcf, i.e. the output of bcftools view, without headers.

    The small input file (BAM, 19MB), gives consistent results across runs, regardless of the number of "chunks" I split my bed file input into. There is no difference in the call sites.

    The large file, (BAM, 9.0GB), does have slightly different calls across runs, but a countable number of them.

    For the 32-chunk runs on the large file, I count 35 call sites total with differences, but in the numeric values, not in the actual bases called. e.g.:

    Some differences in the QUAL field:

    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ANN1002
    < HanXRQChr01   43744341        .       C       T,<NON_REF>     308.84  .       DP=8;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1,0;RAW_MQ=28800  GT:AD:DP:GQ:PL:SB       1/1:0,8,0:8:24:327,24,0,327,24,327:0,0,3,5
    ---
    > HanXRQChr01   43744341        .       C       T,<NON_REF>     316.84.       DP=8;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1,0;RAW_MQ=28800  GT:AD:DP:GQ:PL:SB       1/1:0,8,0:8:24:335,24,0,335,24,335:0,0,3,5
    

    Or some differences in the last field:

    < HanXRQChr06   88296845        .       C       T,<NON_REF>     208.83  .       BaseQRankSum=-1.471;ClippingRankSum=0;DP=12;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MQRankSum=0;RAW_MQ=43200;ReadPosRankSum=0.709        GT:AD:DP:GQ:PGT:PID:PL:SB       0/1:5,7,0:12:99:0|1:88296793_A_G:227,0,546,242,567,809:2,3,2,5
    ---
    > HanXRQChr06   88296845        .       C       T,<NON_REF>     208.83  .       BaseQRankSum=-1.471;ClippingRankSum=0;DP=12;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MQRankSum=0;RAW_MQ=43200;ReadPosRankSum=0.709        GT:AD:DP:GQ:PGT:PID:PL:SB       0/1:5,7,0:12:99:0|1:88296793_A_G:227,0,545,242,566,808:2,3,2,5
    

    But the number and type of call is the same.

    Comparing a .g.vcf that was merged from 32 runs of haplotypecaller vs one that was merged from 160, I count 1426 diff blocks (over the entire 9.1GB .g.vcf). Most of them, again, in the quality scores and metadata, with a few exceptions (under 100), where some bases are called in one case, but not the other. I can probably live with that.

    < HanXRQChr12   118744621       .       C       <NON_REF>       .       .       END=118744624   GT:DP:GQ:MIN_DP:PL      0/0:11:0:11:0,0,281
    < HanXRQChr12   118744625       .       A       AT,<NON_REF>    0       .       DP=13;ExcessHet=3.0103 MLEAC=0,0;MLEAF=0.00,0.00;RAW_MQ=40729.00        GT:AD:DP:GQ:PGT:PID:PL:SB       0/0:9,0,0:9:27:0|1:118744625_A_AT:0,27,405,27,405,405:4,5,0,0
    < HanXRQChr12   118744626       .       A       <NON_REF>       .       .       END=118744626   GT:DP:GQ:MIN_DP:PL      0/0:12:0:12:0,0,336
    < HanXRQChr12   118744627       .       T       TAATAAAAA,<NON_REF>     0       .       DP=13;ExcessHet=3.0103;MLEAC=0,0;MLEAF=0.00,0.00;RAW_MQ=40729.00        GT:AD:DP:GQ:PGT:PID:PL:SB      0/0:9,0,0:9:27:0|1:118744625_A_AT:0,27,405,27,405,405:4,5,0,0
    < HanXRQChr12   118744628       .       A       <NON_REF>       .       .       END=118744628   GT:DP:GQ:MIN_DP:PL      0/0:12:0:12:0,0,351
    ---
    > HanXRQChr12   118744621       .       C       <NON_REF>       .       .       END=118744628   GT:DP:GQ:MIN_DP:PL      0/0:12:0:11:0,0,281
    

    Is a change in quality scores expected across runs?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @init_js
    Hi,

    Glad to hear GATK4 stable version is more consistent :smile:

    I've got many core years of computation completed on the previous version 4.beta.6 however. Is there something to salvage from the .g.vcf produced on 4.beta.6 or are they complete garbage (I certainly hope not)?

    It is really up to you, but it would be best to re-run everything with the same version. In your case, you can try using the GVCFs in GATK 4.0.1.2 GenomicsDBImport and GenotypeGVCFs, but I am not sure if they will be compatible. You can try out FireCloud and see if you can re-run everything on the Cloud. We are giving away free credits , and perhaps that will give you a chance to test it out and at least re-run some (or hopefully all) of your analysis. In the featured workspaces, the costs of things and approximate runtimes are shown.

    For the different base calls, it looks like the GQs are low so they won't be different/output in the final VCF.

    Are the QUAL scores different in the final VCF? They can be slightly different depending on which reads are used. Can you also try adding --use-new-qual-calculator in GenotypeGVCFs?

    Thanks,
    Sheila

  • init_jsinit_js Member
    edited April 2018

    Thank you Sheila! Ok, yes, I think we will re-run everything on 4.0.1.2, just to gain more confidence. I'll check out the free creds.

    Re: Comparison. You ask "Are the QUAL scores different in the final VCF?". I have now multiple gVCFS produced out of different scatter-gather configurations (e.g. 32 vs 160). How do you suggest I perform the VCF comparison? different GenotypeGVCF runs on the same cohort -- one that includes sample_FOO_{A}.g.vcf , vs one that includes sample_FOO_{B}.g.vcf?

    I can certainly try the --use-new-qual-calculator. I asked about it on a different thread, but I wasn't sure why it wasn't turned ON by default in 4.0.1.2. (see https://gatkforums.broadinstitute.org/gatk/discussion/comment/48130/#Comment_48130). If you could provide a pointer to documentation on it, I would gladly check it out.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @init_js
    Hi,

    How do you suggest I perform the VCF comparison? different GenotypeGVCF runs on the same cohort -- one that includes sample_FOO_{A}.g.vcf , vs one that includes sample_FOO_{B}.g.vcf?

    Right now, don't you just have GVCFs from two different runs on the same sample? Or, did I miss something? Can you clarify what you mean by "merge VCFs"?

    I provided a link that should help with understanding newQual in the other thread :smile:

    -Sheila

  • init_jsinit_js Member

    I do have two GVCFs files for that sample, yes, but to go from GVCF to VCF I need to run another tool, i.e. GenotypeGVCFs, to jointcall that GVCF with the other GVCFs of other samples in my cohort. I think this might take a bit of time to setup as it is. Having higher QUAL might not mean more accurate (or closer to the truth) though, will it?

    (For those interested, the link on --new-qual is https://github.com/broadinstitute/gatk/issues/4614)

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @init_js
    Hi,

    I see. The differences in QUAL seem pretty minor, so they are nothing to worry about. There could be slight differences due to which reads are used. Have a look at this article for more information on how the old QUAL score is calculated.

    Having higher QUAL might not mean more accurate (or closer to the truth) though, will it?

    Hopefully this article will help with understanding QUAL values.

    -Sheila

Sign In or Register to comment.