The Frontline Support team will be offline February 18 for President's Day but will be back February 19th. Thank you for your patience as we get to all of your questions!
Strange results generated from HaplotypeCaller when comparing "-minPruning 3" with "-minPruning 2"
Hi GATK developer,
I am working on a human WES project with an average sequencing depth 42X per individual. Recent version of GATK (v3.2-2) was used in my study. To investigate which "-minPruning" setting is better for my project, I ran HaplotypeCaller and tested the "-minPruning 3" and "-minPruning 2", so I generate two sets of gvcf files. I found some interesting cases when I compared these two sets of files.
For example in the same individual (we name this individual as A):
Locus "chr1 1654007" from the GVCF file for individual A (-minPruning 3) is listed as below:
chr1 1654007 . A . . END=1654007 GT:DP:GQ:MIN_DP:PL 0/0:64:0:64:0,0,1922
Locus "chr1 1654007" from the GVCF file for individual A (-minPruning 2) is listed as below:
chr1 1654007 rs199558549 A T, 155.90 . DB;DP=6;MLEAC=2,0;MLEAF=1.00,0.00;MQ=27.73;MQ0=0 GT:AD:DP:GQ:PL:SB 1/1:0,6,0:6:18:189,18,0,189,18,189:0,0,4,2
You can see it is quite different for the results in terms of the final genotyping call and the sequencing depth ("DP" tag) generated from "-minPruning 3" and "-minPruning 2" setting. I am wondering what exactly happens for this locus so I check this locus using IGV. I found at this locus, there are 6 reads supporting the ALT allele ("T") and 64 reads supporting the REF allele ("A"). So I believe the result from "-minPruning 3" should be correct.
Just wondering is it normal considering the different setting of "-minPruning 3" and "-minPruning 2"or is it just a bug? If it is just because of the different setting of "-minPruning 3" and "-minPruning 2", I would say the setting of "-minPruning" is critical for the final results so a guideline of how to set the "-minPruning" based on the sequencing depth will be important. How do you think?