HaplotypeCaller Indel detection

biocaitao Posts: 2 Member
edited November 2012 in Ask the GATK team

We find the HaplotypeCaller to be an excellent SNP caller, but we are confused by its indel results. We did targeted sequencing (6 samples in total: 3 cases vs. 3 controls). We followed the Best Practices recommendations except for VariantRecalibrator (there were only around 600 SNPs, which seemed too few for recalibration). HaplotypeCaller correctly detected a SNP, but the neighboring deletion looks strange: in samtools tview there is no clear sign of the deletion. We wonder whether it came from the de novo assembly performed by HaplotypeCaller, and whether it is credible. Thanks.

The command line:

java -Xmx4g -jar ~/GenomeAnalysisTK-2.1-9-gb90951c/GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I sample1.clean.dedup.recal.bam -I sample2.clean.dedup.recal.bam -I sample3.clean.dedup.recal.bam -I sample4.clean.dedup.recal.bam -I sample5.clean.dedup.recal.bam -I sample6.clean.dedup.recal.bam --dbsnp dbsnp_135.hg19.vcf -L target.interval_list  -stand_call_conf 50.0 -stand_emit_conf 10.0 -o samples_new.raw.snps.indels.vcf

HaplotypeCaller result:

SNP: 19448410   .    T  G   2126.64 .   AC=6;AF=0.500;AN=12;ActiveRegionSize=135;ClippingRankSum=18.283;DP=976;EVENTLENGTH=0;FS=1076.837;MLEAC=6;MLEAF=0.500;MQ=58.70;MQRankSum=-2.107;NVH=3;NumHapAssembly=17;NumHapEval=13;QD=2.18;QDE=0.73;ReadPosRankSum=-17.858;TYPE=SNP;  GT:GQ:PL    0/1:99:195,0,2945   0/1:99:936,0,6037   0/1:99:354,0,3059   0/1:99:301,0,4595   0/1:99:187,0,2191   0/1:99:203,0,2617
Indel: 19448411 .    GTGGCTCC   G   274.85  .   AC=3;AF=0.250;AN=12;ActiveRegionSize=135;ClippingRankSum=9.296;DP=1019;EVENTLENGTH=-7;FS=328.629;MLEAC=3;MLEAF=0.250;MQ=58.74;MQRankSum=-0.451;NVH=3;NumHapAssembly=17;NumHapEval=13;QD=0.46;QDE=0.15;ReadPosRankSum=-11.624;TYPE=INDEL;    GT:GQ:PL    0/0:99:0,106,14388  0/1:99:200,0,28094  0/0:45:0,45,15261   0/1:50:50,0,22048   0/1:74:74,0,10244   0/0:42:0,42,12913
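
One way to get a first impression of whether a call like this deserves scrutiny is to compare its INFO annotations against fixed cutoffs. The following is a minimal sketch, not a recommended filter: the function names and the cutoff values (QD below 2.0, FS above 200.0 for indels) are assumptions chosen for illustration and do not come from this thread. The INFO string is an excerpt of the indel record above.

```python
# Sketch: pull INFO annotations out of a VCF record and compare them with
# illustrative cutoffs. Cutoffs and names are assumptions for illustration.

def parse_info(info: str) -> dict:
    """Turn 'AC=3;AF=0.250;...' into a dict of strings (bare flags map to True)."""
    out = {}
    for item in info.strip(";").split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

# Hypothetical indel cutoffs: flag when QD is below, or FS is above, the limit.
INDEL_CUTOFFS = {"QD": ("<", 2.0), "FS": (">", 200.0)}

def flag_record(info: str, cutoffs: dict) -> list:
    """Return a description of every annotation that crosses its cutoff."""
    ann = parse_info(info)
    flags = []
    for key, (op, limit) in cutoffs.items():
        if key not in ann:
            continue  # annotation absent from this record, nothing to test
        value = float(ann[key])
        if (op == "<" and value < limit) or (op == ">" and value > limit):
            flags.append(f"{key}={value} {op} {limit}")
    return flags

# Excerpt of the INFO field of the indel record above.
indel_info = ("AC=3;AF=0.250;AN=12;DP=1019;FS=328.629;MQ=58.74;"
              "QD=0.46;ReadPosRankSum=-11.624;TYPE=INDEL;")

for flag in flag_record(indel_info, INDEL_CUTOFFS):
    print(flag)
```

Under these assumed cutoffs the deletion is flagged on both QD and FS, which is consistent with the doubt raised above, but it is not by itself evidence that the call is false.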

Answers

  • rpoplin Posts: 122 GATK Developer mod

    Hmmm, it is hard to really comment without seeing the data. Would you be able to post an IGV screenshot of the region? Also, you can add -debug to your command line and get all of the verbose debugging output from the HaplotypeCaller.

    I hope that helps,

  • biocaitao Posts: 2 Member
    edited November 2012

    Thanks a lot, rpoplin and Geraldine.

    In fact, I reran the whole pipeline using the latest 2.2 bundle, and I was surprised to see that the result for the deletion differs from last time: sample 6 changed from “0/0” to “0/1”. I also ran Pindel as a double check. It seems there are deletions of various lengths around the region in all samples. I am not sure whether the region is too complex for HaplotypeCaller to assemble.

    Some details: IGV view, command line, new result, and the debug info (I am sorry, I have removed the position).

    [image: igv_snapshot.png]

    java -Xmx4g -jar ~/GenomeAnalysisTK-2.2-5-g3bf5e3f/GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I sample1.clean.dedup.recal.bam -I sample2.clean.dedup.recal.bam -I sample3.clean.dedup.recal.bam -I sample4.clean.dedup.recal.bam -I sample5.clean.dedup.recal.bam -I sample6.clean.dedup.recal.bam --dbsnp dbsnp_137.hg19.vcf -L target.interval_list --debug -stand_call_conf 50.0 -stand_emit_conf 10.0 -o samples_new.raw.snps.indels.vcf
    

    deletion:

    GTGGCTCC        G       415.06  .       AC=4;AF=0.333;AN=12;ActiveRegionSize=135;BaseQRankSum=4.593;ClippingRankSum=6.318;DP=934;EVENTLENGTH=-7;FS=149.291;HaplotypeScore=67.2693;MLEAC=4;MLEAF=0.333;MQ=58.82;MQ0=0;MQRankSum=-0.061;NVH=3;NumHapAssembly=17;NumHapEval=13;QD=0.63;QDE=0.21;ReadPosRankSum=-7.676;TYPE=INDEL;extType=INDEL     GT:AD:GQ:PL     0/0:113,0:63:0,63,14050 0/1:222,15:99:322,0,27417       0/0:123,1:58:0,58,14897 0/1:174,3:44:44,0,21596 0/1:80,4:67:67,0,10023  0/1:105,3:22:22,0,12567
    

    debug info :

    http://cdn.vanillaforums.com/gatk.vanillaforums.com/FileUpload/c9/4f4ac3fc784e49b053f465f727b77a.txt

    Attachments: igv_snapshot.png (1891 x 729, 29K), debug.txt (14K)
  • rpoplin Posts: 122 GATK Developer mod

    Hi there,

    There aren't any deletions in the reads you are showing so I'm not sure what can be said about the evidence for or against the variant.

    I wonder if an issue might be that you are running the BaseRecalibrator with a small targeted sequencing bam. Maybe there isn't enough data to accurately estimate the error rate? What happens when you run with the original, unrecalibrated bam file?

    Thanks,
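
For readers following the thread: the AD field in the 2.2 deletion record gives a rough, per-sample view of how much read support the caller assigned to the deletion. The sketch below computes the ALT allele fraction for each sample. It assumes the FORMAT column is GT:AD:GQ:PL as in that record; the function name is hypothetical, and AD reflects how the caller assigned reads to alleles, which is not the same as counting deletions in a raw pileup.

```python
# Sketch: per-sample ALT allele fraction from the AD (ref,alt depth) field.
# Assumes FORMAT is GT:AD:GQ:PL, as in the GATK 2.2 record in this thread.

def alt_fractions(fmt: str, samples: list) -> list:
    """Return (genotype, ref_depth, alt_depth, alt_fraction) for each sample."""
    keys = fmt.split(":")
    gt_i, ad_i = keys.index("GT"), keys.index("AD")
    rows = []
    for sample in samples:
        fields = sample.split(":")
        ref_n, alt_n = (int(x) for x in fields[ad_i].split(",")[:2])
        total = ref_n + alt_n
        frac = alt_n / total if total else 0.0  # guard against zero depth
        rows.append((fields[gt_i], ref_n, alt_n, frac))
    return rows

# Sample columns copied from the deletion record (samples 1 to 6, in order).
samples = [
    "0/0:113,0:63:0,63,14050",
    "0/1:222,15:99:322,0,27417",
    "0/0:123,1:58:0,58,14897",
    "0/1:174,3:44:44,0,21596",
    "0/1:80,4:67:67,0,10023",
    "0/1:105,3:22:22,0,12567",
]

rows = alt_fractions("GT:AD:GQ:PL", samples)
for i, (gt, ref_n, alt_n, frac) in enumerate(rows, start=1):
    print(f"sample{i}\t{gt}\tref={ref_n}\talt={alt_n}\talt_frac={frac:.3f}")
```

By this measure every sample, including those called 0/1, has an ALT fraction below 0.07, well short of the roughly 0.5 expected at a clean heterozygous site, so inspecting the reads and the debug output as suggested above remains the sensible next step.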
