The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.10.2 is now available at
GATK version 4.beta.2 (i.e. the second beta release) is out. See the GATK4 BETA page for download and details.

Haplotype Score Region

Hi Everyone,

I had a few questions about the haplotype score.

In the technical documentation it states that "Higher scores are indicative of regions with bad alignments, often leading to artifactual SNP and indel calls. Note that the Haplotype Score is only calculated for sites with read coverage."

How is the haplotype group for each variant site determined? e.g. Does it take the closest two variants to the query site and then treat the query variant + closest two variants as the haplotype group?

Also, in the case of multiallelic SNPs (>2 SNPs), haplotype score is inappropriate since it only looks at whether a site can be explained by the segregation of two and only two haplotypes, correct? So multiallelic snps will be assigned poor haplotype scores OR will these sites not be annotated at all? If we have a case where there is a truly biallelic SNP and a couple of samples have some reads that are erroneously calling a third allele, this variant site will be assigned a poor haplotype score overall, correct?



Best Answers


  • Hi Dr. Banks,

    Okay the second part is clear. Regarding the first question I just want to clarify. If there is no connection between previous or successive variants then does the "haplotype" represents 21 bp nucleotide sequences of an individual read? So for an ideal heterozygous site, we would expect two exactly two "haplotypes" or "piles of reads", while in a less than ideal situation some reads will contain SNP artefacts and contribute to additional "haplotypes"? Thus, the probability that a variant site can be explained by 2 segregating piles of reads is diluted?

Sign In or Register to comment.