Why can't I reproduce my results on different host computers?

aaronico (China; Member)

Dear colleague,
I have encountered a strange issue with GATK 3.3/3.7:
I used HaplotypeCaller to call variants on the same realn.recal.bam file, and it gave me different results when I ran it on different computers (the calls or annotations were not exactly identical). I eventually found that repeated runs give identical results only when they are run on the same computer.

This doesn't make sense to me: even if I used downsampling, I think I should be able to reproduce the results under any conditions. I have been stuck on this problem for a few days. Can you give me some advice? How could this happen, and how can I reproduce my results on different computers?

Here are my commands:

java -Xmx5G -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fasta -I sample.realign.recal.bam -L chr1 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -dt NONE -o sample.chr1.g.vcf.gz
java -Xmx2G -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R hg19.fasta --variant sample.chr1.g.vcf.gz -o sample.chr1.vcf.gz -stand_call_conf 30 -stand_emit_conf 10 -dt NONE
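To check whether two runs really differ in their calls, it can help to compare only the variant records. The "##" header lines can differ between runs even when the calls are identical; for example, GATK typically records the command line and run date in a ##GATKCommandLine line. Below is a minimal POSIX sh sketch, assuming gzip, grep, cmp and diff are available. The function name same_calls and the hostA/hostB paths are hypothetical.

```shell
#!/bin/sh
# same_calls (hypothetical helper): succeeds if two gzipped VCFs contain
# byte-identical records once all header lines starting with '#' are removed.
# On a mismatch, it prints the first lines of the diff so you can see
# which calls or annotations changed.
same_calls() {
  sc_a=$(mktemp) && sc_b=$(mktemp) || return 2
  # gzip -dc also reads block-gzipped (bgzip) output such as GATK's .vcf.gz
  gzip -dc "$1" | grep -v '^#' > "$sc_a"
  gzip -dc "$2" | grep -v '^#' > "$sc_b"
  if cmp -s "$sc_a" "$sc_b"; then
    sc_rc=0
  else
    sc_rc=1
    diff "$sc_a" "$sc_b" | head -n 20
  fi
  rm -f "$sc_a" "$sc_b"
  return "$sc_rc"
}

# Usage (hypothetical paths for the outputs from two hosts):
# same_calls hostA/sample.chr1.vcf.gz hostB/sample.chr1.vcf.gz
```

Comparing whole records, and not only CHROM/POS/ALT, is deliberate here: the question mentions that annotations (for example QUAL or INFO values) also differed, not just the sites called.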

PS: I used the same Java version for all my tests.


Best Answer


  • shlee (Cambridge; Member, Broadie, Moderator, admin)

    Hi @aaronico,

    I believe changes in v3.6 made calls reproducible, so repeated runs give identical results. Before that version, there was a random element to calling. Also, and I'm not sure exactly which version this pertains to, the default emit/call confidence thresholds changed.

    You wrote: "I have encounter a strange issues with GATK-3.3/3.7:"

    Which version are you getting the non-identical results for, 3.3 or 3.7?

  • aaronico (China; Member)

    Thanks, shlee and Geraldine. I tried both 3.3 and 3.7, and both produced non-identical results.
    I took Geraldine's advice to use '-pairHMM LOGLESS_CACHING', and that solved the problem! Many thanks!!
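The thread doesn't say why this flag fixes the problem. One plausible explanation, which is my assumption and not confirmed above, is CPU differences. In GATK 3.x, the default PairHMM implementation is, as far as I know, the vectorized VECTOR_LOGLESS_CACHING. It uses AVX instructions when the CPU supports them and otherwise falls back to a non-vectorized version. Hosts with different CPUs could therefore compute slightly different likelihoods. Pinning LOGLESS_CACHING would force the same code path everywhere, likely at some cost in speed.

The minimal sketch below does two things. It checks whether a Linux host advertises AVX in /proc/cpuinfo, using has_avx, a hypothetical helper. It then reruns the HaplotypeCaller command from the question with the flag added, guarded so nothing runs where the jar is absent.

```shell
#!/bin/sh
# has_avx (hypothetical helper): succeeds if the given cpuinfo file
# (default /proc/cpuinfo, Linux only) lists the "avx" flag as a whole word.
# "avx2" or "avx512f" alone do not match because of grep -w.
has_avx() {
  grep -qw avx "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx; then
  echo "AVX detected: the default vectorized PairHMM may be used on this host"
else
  echo "AVX not detected (or /proc/cpuinfo unavailable)"
fi

# HaplotypeCaller command from the question, with the PairHMM implementation
# pinned so every host should use the same likelihood code path.
if [ -f GenomeAnalysisTK.jar ]; then
  java -Xmx5G -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fasta \
    -I sample.realign.recal.bam -L chr1 --emitRefConfidence GVCF \
    --variant_index_type LINEAR --variant_index_parameter 128000 \
    -dt NONE -pairHMM LOGLESS_CACHING -o sample.chr1.g.vcf.gz
fi
```

Running the AVX check on each machine is a quick way to test the hypothesis. If the hosts that disagreed also differ in AVX support, that is consistent with the explanation above.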
