HaplotypeCaller gives different results when the --dbsnp argument is included

jacobhsu, Hong Kong, Posts: 14, Member
edited November 2013 in Ask the GATK team

Hi, I used the following commands to call variants from exactly the same file with HaplotypeCaller, but I got different results. The results from case 1 are not consistent with those from case 2: on some chromosomes case 1 yields more variants than case 2, and on others fewer. The differences amount to only a few variants per chromosome. Any idea why? I expected the results to be identical, since I am just adding the --dbsnp information.

Case 1 (without --dbsnp):

${java7} -jar $GATK/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R $reference_genome \
-I $input_file \
-L X \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 50 \
-o vcf_out/chrX.vcf

Case 2 (with --dbsnp and -nct 8):

${java7} -jar $GATK/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R $reference_genome \
-I $input_file \
-L X \
-nct 8 \
--dbsnp $dbsnp \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 50 \
-o vcf_out/chrX.vcf


Answers

  • ebanks, Posts: 684, GATK Developer, mod

    The difference has nothing to do with the --dbsnp argument; it comes from -nct, which only your second command uses. Any parallelization makes the calling non-deterministic.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • jacobhsu, Hong Kong, Posts: 14, Member

    Do you mean that if I run the case 2 analysis again, I may get yet another result? If so, how can we know which result is more reliable?

  • jacobhsu, Hong Kong, Posts: 14, Member

    Dear ebanks,

    Thanks for your reply. Could you please explain in more detail why parallelization makes the results differ?
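To see exactly which sites differ between two runs (for example, one with -nct 8 and one without, or two repeats of the same -nct run), a small comparison script can help. The sketch below is only an illustration, not part of GATK: the helper names vcf_sites and vcf_site_diff and the output file names are assumptions. It compares records by CHROM, POS, REF and ALT only, so differences in annotations, the ID column, or genotypes are ignored. Note that both commands in the question write to vcf_out/chrX.vcf, so each run needs its own -o path before the outputs can be compared.

```shell
#!/bin/sh
# Hypothetical helper: print one "CHROM:POS:REF:ALT" key per VCF record,
# skipping header lines (those starting with '#'), sorted and de-duplicated.
vcf_sites() {
    grep -v '^#' "$1" \
        | awk -F'\t' '{ print $1 ":" $2 ":" $4 ":" $5 }' \
        | LC_ALL=C sort -u
}

# Hypothetical helper: list sites present in only one of two VCF files.
# Lines prefixed "<" are unique to the first file, ">" to the second.
vcf_site_diff() {
    _sites_a=$(mktemp)
    _sites_b=$(mktemp)
    vcf_sites "$1" > "$_sites_a"
    vcf_sites "$2" > "$_sites_b"
    # comm -3 drops the shared lines; entries unique to the second file
    # are printed with a leading tab, which awk sees as an empty first field.
    LC_ALL=C comm -3 "$_sites_a" "$_sites_b" \
        | awk -F'\t' '{ if ($1 != "") print "< " $1; else print "> " $2 }'
    rm -f "$_sites_a" "$_sites_b"
}

# Example call (file names are assumptions, one per run):
# vcf_site_diff vcf_out/chrX.case1.vcf vcf_out/chrX.case2.vcf
```

Empty output means the two runs called the same set of sites. Running the same comparison on two repeats of case 2 would show directly whether the -nct run changes from one execution to the next.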
