Significant difference in VQSR or VariantAnnotation between v1.6 and v2.2-10

kyasuno Member
edited November 2012 in Ask the GATK team


I see a significant difference between the variant call sets produced from the same exomes by v1.6 and v2.2(-10).
Specifically, the overall novel TiTv dropped from around 2.6 with v1.6 to around 2.1 with v2.2 at a TruthSensitivity threshold of 99.0.
When I compared the variant sites of one sample using VariantEval, I got the following:

Filter    JexlExpression         Novelty    nTi   nTv  tiTvRatio
called    Intersection           known    14624  4563  3.2
called    Intersection           novel      856   312  2.74
called    filterIngatk22-gatk16  known      264   132  2
called    filterIngatk22-gatk16  novel       28    18  1.56
called    gatk16                 known        3     1  3
called    gatk16                 novel        1     1  1
called    gatk22-filterIngatk16  known      258    94  2.74
called    gatk22-filterIngatk16  novel      144   425  0.34
called    gatk22                 known        2     2  1
called    gatk22                 novel       17    30  0.57
filtered  FilteredInAll          known     1344   649  2.07
filtered  FilteredInAll          novel     1076  1642  0.66

The novel calls that v2.2 gains, i.e. sites not found in v1.6 at all or called in v2.2 but filtered in v1.6, have a novel TiTv of around 0.5 or lower (0.57 and 0.34, respectively). So I suspect that the VQSLOD scoring (or ranking) of SNPs has changed substantially, and in an unfavorable way.
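To make the arithmetic behind this explicit, here is a minimal Python sketch that pools the novel nTi and nTv counts from the table above. It assumes the set names mean what they appear to mean (for example, that `gatk22-filterIngatk16` is called in v2.2 and filtered in v1.6); the helper names `titv` and `pooled_titv` are just illustrative.

```python
# Novel-site counts (nTi, nTv) copied from the "called" rows of the table above.
novel = {
    "Intersection": (856, 312),
    "filterIngatk22-gatk16": (28, 18),   # assumed: filtered in v2.2, called in v1.6
    "gatk16": (1, 1),                    # assumed: called only in v1.6
    "gatk22-filterIngatk16": (144, 425), # assumed: called in v2.2, filtered in v1.6
    "gatk22": (17, 30),                  # assumed: called only in v2.2
}


def titv(n_ti, n_tv):
    """Transition/transversion ratio, or None when there are no transversions."""
    return n_ti / n_tv if n_tv else None


def pooled_titv(set_names):
    """Sum nTi and nTv over the given sets before taking the ratio."""
    total_ti = sum(novel[name][0] for name in set_names)
    total_tv = sum(novel[name][1] for name in set_names)
    return titv(total_ti, total_tv)


# Novel sites that v2.2 gains relative to v1.6.
gained_in_v22 = pooled_titv(["gatk22-filterIngatk16", "gatk22"])
# All novel sites called by each version in this sample.
called_v16 = pooled_titv(["Intersection", "filterIngatk22-gatk16", "gatk16"])
called_v22 = pooled_titv(["Intersection", "gatk22-filterIngatk16", "gatk22"])

print(f"novel TiTv, gained in v2.2: {gained_in_v22:.2f}")
print(f"novel TiTv, called by v1.6: {called_v16:.2f}")
print(f"novel TiTv, called by v2.2: {called_v22:.2f}")
```

Pooling the counts rather than averaging the per-row ratios keeps the tiny sets (such as `gatk16` with one Ti and one Tv) from distorting the result.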

The major updates in v2.2 that could affect my result are BQSRv2, ReduceReads, UG and VariantAnnotation. (Too many things to pinpoint the culprit...)
The previous BAM processing and variant calls were made using v1.6.
For the new call set, I used v2.1-9 for BQSRv2 and ReduceReads (so after the serious bug fix in ReduceReads, thank you for the fix) and v2.2-10 for UG and VQSR.

As a first clue, I found that the distribution of FS values changed dramatically from v1.6 (please see the attached plots). I am aware that the FS calculation was recently updated, but the distribution of the previous FS values makes more sense to me, because the current FS values do not seem to provide information that separates true positives from false positives.
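For anyone who wants to reproduce the FS comparison without the plots, the following is a rough Python sketch that pulls the `FS` annotation out of the INFO column of VCF records and summarizes it. The function names and the toy records are hypothetical and made up purely for illustration; they are not my real data.

```python
import statistics


def fs_values(vcf_lines):
    """Yield the FS annotation as a float for every record that carries one."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):  # column 8 is INFO
            key, _, value = entry.partition("=")
            if key == "FS" and value:
                yield float(value)


def summarize(values):
    """Return count, median and maximum of the values, or None if empty."""
    ordered = sorted(values)
    if not ordered:
        return None
    return {"n": len(ordered),
            "median": statistics.median(ordered),
            "max": ordered[-1]}


# Toy records (made up) just to show the expected input shape.
toy_records = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=10;FS=0.000\n",
    "1\t200\t.\tC\tT\t60\tPASS\tFS=3.5;DP=12\n",
    "1\t300\t.\tG\tA\t40\tLowQual\tDP=8\n",
]
toy_summary = summarize(fs_values(toy_records))
print(toy_summary)
```

With real call sets, the same functions can be run once per version by passing an open file handle for each VCF (file names left to the reader), and the two summaries compared side by side.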

Thanks in advance.

