Difference in the variant calling for same sample by HyplotypeCaller

I have base recalibrated the BAM file for the same sample using two different versions of dbSNP as the known sites. These is a particular position I am interested in. On using dbSNP v137 the genotype call is 0/1 while on using dbSNP v147 the genotype call is 0/0 in the g.vcf produced by HaplotypeCaller. I have pasted below the commands I used. The position is Chr2:241727475. The GATK version I have used is v2014.3-3.2.2-7-gf9cba99. I am unable explain this difference in the genotype calls as this particular position is not known in either of the dbSNP files. On checking the BAM file in visualizer like IGV i can understand the reason for the 0/1 call as the nucleotide change is A>G. I have also attached the image from IGV for each call. Please help me understand reason for the difference in these genotype calls.

v137
/usr/bin/java -Xmx25g -jar /nfshome/iomics/Apps/Toolkit//ClinicalDiagExome/GenomeAnalysisTK//GenomeAnalysisTK.jar -T BaseRecalibrator -L Neurology_3_Regions_genesymbol_sort.bed -I out_SO_RG_MD_IR.bam -R human_g1k_v37.fa -knownSites dbsnp_137.b37.vcf -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf -nct 1 -o out_SO_RG_MD_IR_recal_data.table

/usr/bin/java -Xmx25g -jar /nfshome/iomics/Apps/Toolkit//ClinicalDiagExome/GenomeAnalysisTK/GenomeAnalysisTK.jar -T BaseRecalibrator -L Neurology_3_Regions_genesymbol_sort.bed -I out_SO_RG_MD_IR.bam -R human_g1k_v37.fa -knownSites dbsnp_137.b37.vcf -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf -BQSR out_SO_RG_MD_IR_recal_data.table -nct 1 -o out_SO_RG_MD_IR_post_recal_data.table

Outcome:
2 241727475 . A G, 259.77 . BaseQRankSum=-4.491;ClippingRankSum=-1.336;DP=185;MLEAC=1,0;MLEAF=0.500,0.00;MQ=43.98;MQ0=0;MQRankSum=-0.384;ReadPosRankSum=-0.309 GT:AD:DP:GQ:PL:SB 0/1:111,71,0:182:99:288,0,1077,585,1263,1848:111,0,71,0

v147
/usr/bin/java -Xmx25g -jar /nfshome/iomics/Apps/Toolkit//ClinicalDiagExome/GenomeAnalysisTK//GenomeAnalysisTK.jar -T BaseRecalibrator -L Neurology_3_Regions_genesymbol_sort.bed -I out_SO_RG_MD_IR.bam -R human_g1k_v37.fa -knownSites dbSNP_build147_20160601.vcf -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf -nct 1 -o out_SO_RG_MD_IR_recal_data.table

/usr/bin/java -Xmx25g -jar /nfshome/iomics/Apps/Toolkit//ClinicalDiagExome/GenomeAnalysisTK/GenomeAnalysisTK.jar -T BaseRecalibrator -L Neurology_3_Regions_genesymbol_sort.bed -I out_SO_RG_MD_IR.bam -R human_g1k_v37.fa -knownSites dbSNP_build147_20160601.vcf -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf -BQSR out_SO_RG_MD_IR_recal_data.table -nct 1 -o out_SO_RG_MD_IR_post_recal_data.table

Outcome:
2 241727475 . A . . END=241727475 GT:DP:GQ:MIN_DP:PL 0/0:22
9:0:229:0,0,519

Regards

Manas

137.png 101.3K
147.png 101.8K

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @sulemanas
    Hi Manas,

    Hmm. There could be slight differences when using the two different versions. Notice the GQ in the 0/0 site is very low. Can you please try using the latest version of GATK and let us know if you see any difference?

    Thanks,
    Sheila

  • sulemanassulemanas IndiaMember

    HI Sheila

    I tried the new version of GATK 3.6. The variant call made for both dbSNP v137 and v147 was 0/1

    v137
    2 241727475 . A G, 1514.77 . BaseQRankSum=-5.874;ClippingRankSum=0.000;DP=831;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.322;RAW_MQ=1607666.00;ReadPosRankSum=-1.314 GT:AD:DP:GQ:PL:SB 0/1:497,315,0:812:99:1543,0,4342,2847,5171,8018:497,0,315,0

    v147
    2 241727475 . A G, 477.77 . BaseQRankSum=-4.732;ClippingRankSum=0.000;DP=779;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQ=1508144.00;ReadPosRankSum=-1.100 GT:AD:DP:GQ:PL:SB 0/1:484,200,0:684:99:506,0,6060,1861,6589,8450:484,0,200,0

    I can see that in both these cases the GQ is 99. Can you please help me understand if there is any difference in the algorithm being used in version 3.6?
    Also I am trying to understand how the call in version 3.3 could differ based only on the dbSNP file. I tried to check if this was a known position in the dbSNP but was unable to find it in both cases. Thus since this particular variant is of unknown significance, how could the genotype call differ?

    Regards

    Manas

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
    There have been quite a few improvements from 3.2 to 3.6 -- that spans almost two years of active development -- so it's very difficult to say which change had an effect on this particular call. My guess is that there was a bug producing a wrong call under some conditions, and now it's fixed.
Sign In or Register to comment.