
HaplotypeCaller misses a true variant

flapa Bologna Member Posts: 3

Hi,
I'm updating my pipeline for exome sequencing analysis, so I've been trying out HaplotypeCaller (HC). I analyzed the same sample with both the UnifiedGenotyper (UG) walker and HC and compared the two output VCF files, and I found something worrying: HC failed to call a true novel variant! I know this variant is real because I validated it with Sanger sequencing after the first round of calling with UG.

I ran UG with GATK version 1.6-11-g3b2fab9. This is the VCF line for the variant:

chr7 45123943 . A T 3436.17 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=2.043;DP=114;Dels=0.00;FS=2.678;HRun=1;HaplotypeScore=0.0000;MQ=42.18;MQ0=1;MQRankSum=2.152;QD=30.14;ReadPosRankSum=-0.781;SB=-1010.47 GT:AD:DP:GQ:PL 1/1:8,105:114:99:3436,253,0

I ran HC with GATK version 2.7-4-g6f46d11 in both single-sample and multi-sample mode, but there is no trace of this variant in the VCF output.
I also noticed that, along with this novel variant, HC missed two other variants just upstream of it; these are their VCF lines from the UG output:

chr7 45123881 rs61740891 C T 654.25 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=0.205;DB;DP=43;DS;Dels=0.00;FS=65.862;HRun=0;HaplotypeScore=2.2312;MQ=30.27;MQ0=1;MQRankSum=3.254;QD=15.22;ReadPosRankSum=-3.921;SB=-3.02 GT:AD:DP:GQ:PL 0/1:18,25:43:99:684,0,176

chr7 45123888 . C T 161.90 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.293;DP=26;DS;Dels=0.00;FS=49.656;HRun=2;HaplotypeScore=0.0000;MQ=23.81;MQ0=1;MQRankSum=0.425;QD=6.23;ReadPosRankSum=-3.821;SB=-3.00 GT:AD:DP:GQ:PL 0/1:17,9:26:99:192,0,249

How is it possible?

Many thanks in advance

Best, Flavia
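
For anyone comparing calls like these, it can help to pull the per-sample allele depths and a couple of INFO annotations out of the records before looking at the reads. Below is a minimal Python sketch that parses the three UG records quoted above. The helper names (`parse_info`, `parse_record`) are made up for illustration and are not part of GATK. Splitting on whitespace is an assumption that only holds because the pasted lines use spaces and no field contains one; real VCF files are tab-delimited.

```python
# The three UnifiedGenotyper records quoted in the post, copied verbatim.
UG_RECORDS = [
    "chr7 45123943 . A T 3436.17 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=2.043;DP=114;Dels=0.00;FS=2.678;HRun=1;HaplotypeScore=0.0000;MQ=42.18;MQ0=1;MQRankSum=2.152;QD=30.14;ReadPosRankSum=-0.781;SB=-1010.47 GT:AD:DP:GQ:PL 1/1:8,105:114:99:3436,253,0",
    "chr7 45123881 rs61740891 C T 654.25 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=0.205;DB;DP=43;DS;Dels=0.00;FS=65.862;HRun=0;HaplotypeScore=2.2312;MQ=30.27;MQ0=1;MQRankSum=3.254;QD=15.22;ReadPosRankSum=-3.921;SB=-3.02 GT:AD:DP:GQ:PL 0/1:18,25:43:99:684,0,176",
    "chr7 45123888 . C T 161.90 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.293;DP=26;DS;Dels=0.00;FS=49.656;HRun=2;HaplotypeScore=0.0000;MQ=23.81;MQ0=1;MQRankSum=0.425;QD=6.23;ReadPosRankSum=-3.821;SB=-3.00 GT:AD:DP:GQ:PL 0/1:17,9:26:99:192,0,249",
]


def parse_info(info):
    """Turn the INFO column into a dict; flags without '=' (e.g. DB) map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_record(line):
    """Parse one single-sample VCF data line (hypothetical helper, not a GATK API)."""
    # Assumption: whitespace-separated paste; a real VCF would be split on tabs.
    chrom, pos, vid, ref, alt, qual, flt, info, fmt, sample = line.split()
    call = dict(zip(fmt.split(":"), sample.split(":")))
    ref_depth, alt_depth = (int(x) for x in call["AD"].split(","))
    info_fields = parse_info(info)
    return {
        "pos": int(pos),
        "gt": call["GT"],
        "ref_depth": ref_depth,
        "alt_depth": alt_depth,
        "alt_fraction": alt_depth / (ref_depth + alt_depth),
        "fs": float(info_fields["FS"]),
        "mq": float(info_fields["MQ"]),
    }


records = [parse_record(line) for line in UG_RECORDS]
for rec in records:
    print(
        f"{rec['pos']}  GT={rec['gt']}  AD={rec['ref_depth']},{rec['alt_depth']}  "
        f"alt_fraction={rec['alt_fraction']:.2f}  FS={rec['fs']}  MQ={rec['mq']}"
    )
```

Laid out this way, the quoted values show that the two upstream calls have FS above 49 and MQ below 31, whereas the first call has FS 2.678 and MQ 42.18. That does not settle anything on its own, but it is a reason to inspect the alignments in that region.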

Answers

  • flapa Bologna Member Posts: 3

    Hi Geraldine,

    I've just uploaded my data to the FTP server in a file named flapa_data.tar.gz; it contains a BAM file for the whole of chr7, where the uncalled variants fall.

    I hope this can be helpful!

    Flavia

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163

    I was able to reproduce your issue, so I'm now passing this on to the devs for in-depth debugging.

    Geraldine Van der Auwera, PhD

  • ebanks Broad Institute Member, Administrator, Broadie, Moderator, Dev Posts: 692

    Hi Flavia,

    I've taken a look at your example and would like to explain what's happening. If you look carefully at the HC call in that region you'll notice that it assembles it into a very large (120bp) deletion (with 90% of your reads supporting that call). The HC believes that those "SNPs" aren't real, but rather are artifacts from a misalignment around the deletion.

    I've attached a screenshot of your data that illustrates it quite nicely. The upper half shows the nice clean HC re-alignments around the deletion. The lower half shows the original reads; notice that the coverage drops dramatically over the deletion and that those "SNPs" occur near the breakpoints. These are classic signs of misalignment.

    Is it possible that the Sanger sequencing validation could be interpreted in this way too?

    [Attached screenshot: flavia.png]

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • flapa Bologna Member Posts: 3

    Hi Eric,

    Thank you so much for your very clear answer.
    The gene sequence is very repetitive, so after your explanation I think the Sanger sequencing could be interpreted in this way too.
    I'm now trying to run a more specific PCR, and I'll let you know whether I can replicate the validation.

    Thanks for your help!
    Flavia
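
Eric's diagnosis above rests on two signals: coverage that drops sharply across the deletion, and apparent SNPs that cluster near its breakpoints. The Python sketch below shows one way those two checks could be expressed. Everything in it is illustrative: the function names, the 10 bp window, and all coordinates and depths are toy values invented for the example, not the real positions from this thread. Only the 120 bp deletion length is taken from the discussion.

```python
def breakpoints(del_start, del_length):
    """Return (first deleted base, last deleted base), 1-based inclusive."""
    return del_start, del_start + del_length - 1


def near_breakpoint(pos, del_start, del_length, window=10):
    """True if pos lies within `window` bases of either end of the deletion."""
    left, right = breakpoints(del_start, del_length)
    return min(abs(pos - left), abs(pos - right)) <= window


def coverage_ratio(depths, region_start, del_start, del_length):
    """Mean depth inside the deletion divided by mean depth in the flanks.

    depths[i] is the read depth at position region_start + i.
    """
    left, right = breakpoints(del_start, del_length)
    inside, outside = [], []
    for offset, depth in enumerate(depths):
        pos = region_start + offset
        (inside if left <= pos <= right else outside).append(depth)
    if not inside or not outside:
        raise ValueError("region must cover both the deletion and its flanks")
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


# Toy data (NOT from the thread): a 120 bp deletion starting at position 1000,
# flanked by 100 bases of depth 100 on each side, with depth 10 inside it.
TOY_DEL_START = 1000
TOY_DEL_LENGTH = 120
toy_region_start = 900
toy_depths = [100] * 100 + [10] * 120 + [100] * 100
toy_snps = [995, 1003, 1060, 1122, 1200]

ratio = coverage_ratio(toy_depths, toy_region_start, TOY_DEL_START, TOY_DEL_LENGTH)
suspects = [p for p in toy_snps if near_breakpoint(p, TOY_DEL_START, TOY_DEL_LENGTH)]

print(f"inside/flank coverage ratio: {ratio:.2f}")
print(f"SNPs within 10 bp of a breakpoint: {suspects}")
```

To apply this to real data you would substitute the deletion position reported in your own HC VCF and per-base depths from your BAM. A low ratio together with SNPs flagged near the breakpoints only suggests misalignment; the reads still need to be inspected, for example in IGV as in the screenshot.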
