Hi,
I observed a significant difference between the variant call sets produced from the same exomes by v1.6 and v2.2(-10). Specifically, the overall novel TiTv dropped from around 2.6 to around 2.1 in the v2.2 call sets at a TruthSensitivity threshold of 99.0. When I compared the variant sites of one sample using VariantEval, it showed the following:
Filter    JexlExpression         Novelty  nTi    nTv   tiTvRatio
called    Intersection           known    14624  4563  3.2
called    Intersection           novel    856    312   2.74
called    filterIngatk22-gatk16  known    264    132   2
called    filterIngatk22-gatk16  novel    28     18    1.56
called    gatk16                 known    3      1     3
called    gatk16                 novel    1      1     1
called    gatk22-filterIngatk16  known    258    94    2.74
called    gatk22-filterIngatk16  novel    144    425   0.34
called    gatk22                 known    2      2     1
called    gatk22                 novel    17     30    0.57
filtered  FilteredInAll          known    1344   649   2.07
filtered  FilteredInAll          novel    1076   1642  0.66
The new calls in v2.2 that were either not found in v1.6 or filtered in v1.6 have a novel TiTv of around 0.5 (0.57 and 0.34, respectively). So I suspect that the VQSLOD scoring (or ranking) of SNPs has changed substantially, and in an unfavorable way.
The major updates in v2.2 that affect my results are BQSRv2, ReduceReads, UG and VariantAnnotation (too many things to pinpoint the culprit). The previous BAM processing and variant calls were made using v1.6. For the new call set, I used v2.1-9 for BQSRv2 and ReduceReads (so after the serious bug fix in ReduceReads; thank you for the fix) and v2.2-10 for UG and VQSR.
As a first clue, I found that the distribution of FS values changed dramatically from v1.6 (please see the attached plots). I recognize that the FS calculation was recently updated, but the earlier distribution of FS values makes more sense to me, because the current FS values do not seem to provide information that separates true positives from false positives.
Thanks in advance. Katsuhito
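The tiTvRatio column in the VariantEval table above is simply nTi divided by nTv, so the figures can be rechecked by hand. The following is a minimal sketch, not part of the original post: it copies the novel-site counts from the table and also pools the two v2.2-only categories. The dictionary layout and the names `counts`, `titv` and `pooled` are illustrative assumptions.

```python
# (JexlExpression, Novelty) -> (nTi, nTv), copied from the VariantEval table
counts = {
    ("Intersection", "novel"): (856, 312),
    ("filterIngatk22-gatk16", "novel"): (28, 18),
    ("gatk16", "novel"): (1, 1),
    ("gatk22-filterIngatk16", "novel"): (144, 425),
    ("gatk22", "novel"): (17, 30),
    ("FilteredInAll", "novel"): (1076, 1642),
}


def titv(n_ti, n_tv):
    """Transition/transversion ratio from raw counts."""
    return n_ti / n_tv


# Per-category ratios, for comparison with the tiTvRatio column
for (expr, novelty), (n_ti, n_tv) in counts.items():
    print(f"{expr:<24}{novelty:<7}{titv(n_ti, n_tv):.2f}")

# Pool the novel sites that only v2.2 calls: sites filtered in v1.6
# plus sites absent from v1.6 altogether
v22_only = [("gatk22-filterIngatk16", "novel"), ("gatk22", "novel")]
pooled_ti = sum(counts[key][0] for key in v22_only)
pooled_tv = sum(counts[key][1] for key in v22_only)
pooled = titv(pooled_ti, pooled_tv)
print(f"pooled v2.2-only novel  {pooled:.2f}")
```

Pooled this way, the v2.2-only novel sites come out near 0.35 (161 transitions against 455 transversions), far below the 2.74 of the novel sites shared by both versions.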
Geraldine_VdAuwera
Posts: 2,239 admin
Hi Katsuhito,
We're not sure what's going on here, but we'll try to find out. To start, could you please run your 2.2 pipeline again without ReduceReads to rule out a problem with RR's annotations? Sorry for the inconvenience.
Geraldine Van der Auwera, PhD
ebanks
Posts: 479 mod
Thank you very much for reporting this problem! Ultimately, the problem is that the FisherStrand annotation currently doesn't work well with reduced reads (because they are always marked as being on the forward strand). I've added a fix so that in the next release the FS calculation will ignore reduced reads.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
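To illustrate why strand mislabeling matters, here is a minimal sketch that treats FS as a Phred-scaled p-value from Fisher's exact test on a 2x2 table of allele (ref/alt) by strand (forward/reverse). It is not GATK's implementation, and the read counts are invented for illustration: the second table assumes that every reference-supporting read has been relabeled as forward strand, which is one hypothetical way the problem described above could show up.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phred(p):
    """Phred-scale a p-value; the floor avoids log10(0)."""
    return -10 * log10(max(p, 1e-300))


# Rows: ref reads, alt reads. Columns: forward strand, reverse strand.
# Hypothetical site with no real strand bias
fs_true = phred(fisher_exact_two_sided(20, 20, 10, 10))
# Same site if all 40 ref reads were (wrongly) marked as forward strand
fs_mislabeled = phred(fisher_exact_two_sided(40, 0, 10, 10))

print(f"FS with true strands:       {fs_true:.1f}")
print(f"FS with mislabeled strands: {fs_mislabeled:.1f}")
```

Under these assumed counts the balanced table gives an FS near zero, while the mislabeled one gives a large value, so a site with no real strand bias would look strongly biased.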
Answers
Let me try it. Thanks.
Hi Geraldine,
I've just realized that I also have a variant call set that was called and filtered using GATK v2.1-9, although this set includes other samples as well. I'm attaching a figure that compares FS vs QD plots for the 3 versions. I think it suggests that ReduceReads is fine but that the FS calculation differs between v2.1-9 and v2.2-10.
Thanks
OK, that makes sense. We recently made some changes to how FS is calculated, and that seems a likely culprit. We'll take a closer look and get back to you once we have a clear idea of the problem.
Geraldine Van der Auwera, PhD
Hi Katsuhito, could you please submit a detailed bug report as explained here? Thanks!
Geraldine Van der Auwera, PhD
Hi Geraldine,
For this issue, it's unclear which region I should select, because I didn't see any error messages and the FS values differ across many (if not all) regions, as can be seen in the figure I attached previously. Would a portion of the VCF files for the sites shared between callsets v1.6 and v2.1-10 be enough to see the difference, or do you really need snippets of the BAMs? Let me know.
Thanks
Ah, fair point. We're definitely going to need BAM snippets, preferably containing regions where you find some different sites and some shared sites, if you can identify a region of reasonable length that contains all that.
Geraldine Van der Auwera, PhD
Hi Geraldine,
I've uploaded the requested BAM snippets, named bam_for_FS_check.tar.gz. I confirmed that v2.1-9 and v2.2-10 UGs produce very different FS values. Hope it helps to solve the issue. Thanks
It looks like you've uploaded an invalid bam file.
Can you please upload a valid bam file? Also, if you are not using a standard reference from our bundle, I'll need you to upload the reference files (fasta, fai, and dict) too.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
Let me check it again. I could run variant calling for the bam file with the same name. Thanks (PS. I'm using human_g1k_v37_decoy.fasta from the bundle.)
Sorry, no, I uploaded the wrong one.
I've uploaded the file (with the same name, overwritten). Thanks
OK, in that case, another question comes to mind: is the FisherStrand annotation from v2.1-9 reliable? Does this issue affect only the v2.2 series? Thanks
Yes. You should be able to use the VariantAnnotator (with -A FisherStrand) to re-annotate that value using the older version.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
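As a rough sketch of the re-annotation suggested above, the snippet below assembles a VariantAnnotator command line for an older GATK jar without running it. Only `-A FisherStrand` comes from the reply itself. The jar path and the BAM and VCF file names are hypothetical placeholders, and the remaining flags are assumed to follow the usual GATK 2.x command-line conventions, so they should be checked against the documentation of the version actually used.

```python
import shlex


def variant_annotator_cmd(gatk_jar, reference, bam, vcf_in, vcf_out):
    """Build (but do not run) a VariantAnnotator command that recomputes FS."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "VariantAnnotator",
        "-R", reference,
        "-I", bam,
        "--variant", vcf_in,
        "-A", "FisherStrand",  # the annotation named in the reply
        "-o", vcf_out,
    ]


cmd = variant_annotator_cmd(
    gatk_jar="gatk-2.1-9/GenomeAnalysisTK.jar",  # hypothetical install path
    reference="human_g1k_v37_decoy.fasta",
    bam="sample.bam",                            # hypothetical input BAM
    vcf_in="calls.vcf",                          # hypothetical v2.2 call set
    vcf_out="calls.fs_reannotated.vcf",          # hypothetical output name
)

# Print a shell-safe version of the command for inspection
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run(cmd, check=True)` once the paths point at real files.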
That sounds good. Thank you very much!