The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.10.4 has MAJOR CHANGES that impact throughput of pipelines. Default compression is now 1 instead of 5, and Picard now handles compressed data with the Intel Deflator/Inflator instead of JDK.
GATK version 4.beta.3 (i.e. the third beta release) is out. See the github release page for download and details.

Significant difference in VQSR or VariantAnnotation between v1.6 and v2.2-10

kyasunokyasuno Member
edited November 2012 in Ask the GATK team

Hi,

I observed a significant difference of the variant call sets from the same exomes between v1.6 and v2.2(-10).
In fact, I observed a significant decrease in the overall novel TiTv in the latter call sets from around 2.6 to 2.1 at TruthSensitivity threshold at 99.0.
When I looked at a sample to compare variant sites using VariantEval, it showed that

Filter JexlExpression Novelty nTi nTv tiTvRatio
called Intersection known 14624 4563 3.2
called Intersection novel 856 312 2.74
called filterIngatk22-gatk16 known 264 132 2
called filterIngatk22-gatk16 novel 28 18 1.56
called gatk16 known 3 1 3
called gatk16 novel 1 1 1
called gatk22-filterIngatk16 known 258 94 2.74
called gatk22-filterIngatk16 novel 144 425 0.34
called gatk22 known 2 2 1
called gatk22 novel 17 30 0.57
filtered FilteredInAll known 1344 649 2.07
filtered FilteredInAll novel 1076 1642 0.66

The novel TiTv of new calls in v2.2 not found in v1.6 or called in v2.2 but filtered in v1.6 demonstrated novel TiTv around 0.5. So I suspect that VQSLOD scoring (or ranking) of SNPs was changed substantially in somewhat an unfavorable way.

The major updates in v2.2 affecting my result were BQSRv2, ReduceReads, UG and VariantAnnotation. (Too many things to pin-point the culprit...)
The previous BAM processing and variant calls were made using v1.6.
For the new call set, I used v2.1-9 (so after serious bug fix in ReduceReads, thank you for the fix) for BQSRv2 and ReduceReads and v2.2-10 for UG and VQSR.

As a first clue, I found that distribution of FS values changed dramatically from the v1.6 (please see attached plots). Although I recognized that FS value calculations were recently updated, the distribution of previous FS values (please see attached) makes more sense for me because the current FS values do not seem to provide us information to classify true positives and false positives.

Thanks in advance.
Katsuhito

comp_v1.6_v2.2_FS.png
1500 x 529 - 242K

Best Answers

Answers

  • Let me try it. Thanks.

  • Hi Geraldine,

    I've just realized that I also have a variant call set that were called and filtered by using GATK v2.1-9 even though this set included other samples as well. I'm attaching a figure that compares 3 versions of FS vs QD plots. I think it suggests that ReduceReads is fine but FS calculations are different between v2.1-9 and v2.2-10.

    Thanks

    comp_v1.6_v2.2_v2.1.png
    1500 x 1110 - 374K
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    OK, that makes sense -- we recently made some changes to how the FS is calculated; seems to be a likely culprit. We'll take a closer look and get back to you once we have a clear idea of the problem.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Katsuhito, could you please submit a detailed bug report as explained here? Thanks!

  • Hi Geraldine,

    For this issue, it's ambiguous which region I should select because I didn't see any error messages and FS values differ across many (if not all) regions as can be seen in the figure attached previously. Do you think that some portion of VCF files for the sites shared between callsets v1.6 and v2.1-10 would be enough to see the difference or do you really need snippets of BAMs? Let me know.

    Thanks

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Ah, fair point. We're definitely going to need BAM snippets, preferably containing regions where you find some different sites and some shared sites, if you can identify a region of reasonable length that contains all that.

  • Hi Geraldine,

    I've uploaded the requested BAM snippets, named bam_for_FS_check.tar.gz.
    I confirmed that v2.1-9 and v2.2-10 UGs produce very different FS values.
    Hope it helps to solve the issue.
    Thanks

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    It looks like you've uploaded an invalid bam file.

    samtools view bam_for_FS_check.bam
       [bam_header_read] invalid BAM binary header (this is not a BAM file).
       [main_samview] fail to read the header from "bam_for_FS_check.bam".
    

    Can you please upload a valid bam file? Also, if you are not using a standard reference from our bundle, I'll need you to upload the reference files (fasta, fai, and dict) too.

  • Let me check it again. I could run variant calling for the bam file with the same name.
    Thanks (PS. I'm using humang_g1k_v37_decoy.fasta in the bundle.)

  • Sorry, no, I uploaded a wrong one.

  • I've uploaded the file (with the same name, overwritten). Thanks

  • Ok, in that case, another questions come to my mind: can the FisherStrand annotation from v2.1-9 be reliable? Does this issue affect only to v2.2 series? Thanks

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Yes. You should be able to use the VariantAnnotator (with -A FisherStrand) to re-annotate that value using the older version.

  • That sounds good. Thank you very much!

Sign In or Register to comment.