# VQSR and sex chromosomes

Member Posts: 1

Hi,

Maybe I have missed some obvious piece of documentation, but I am looking for best practices for using VQSR with sex chromosomes (especially X). I am trying to do variant calling on Anopheles gambiae genomes (which have sex chromosomes like humans do), and the results for chromosome X are not very encouraging. Is there any documentation or set of best practices for VQSR on X in particular? Or are people even using VQSR with sex chromosomes at all?

Clueless and lost,
Tiago

Hi Tiago,

We do not have any specific recommendations for sex chromosomes, largely because we don't do anything with them ourselves. In humans, at least, X and Y are notoriously difficult to call because reads there typically map with very poor quality. And since VQSR relies on large numbers of observations to do its job, it can't deal very well with localized aberrations such as those that might affect the sex chromosomes. So if the situation is similar in Anopheles, then unfortunately you may be in for a rough ride. Perhaps someone in the community has more experience dealing with this problem and will share their approach...

Geraldine Van der Auwera, PhD

• United Kingdom · Member · Posts: 404 ✭✭✭

Hi @tiagoantao. Which procedure did you decide on in the end? I'm asking because I'm unsure whether to run VQSR jointly across the autosomes and the sex chromosomes. They differ in depth, among other things, and some of the annotations used by VQSR are derived from or dependent on depth. Thanks.

• United Kingdom · Member · Posts: 404 ✭✭✭

This thread is related to another one on a similar topic.

I was afraid that the Y chromosome would perform differently under the VQSR model, because some of the annotations used by VQSR are depth dependent. Indeed, relatively more chromY sites have DP as their culprit annotation and fewer have QD; see the attached image. ChromX suffers from the same problem, although to a lesser extent. Based on this, I might run haploid regions separately through VQSR. Then again, the effect might have been caused by my calling chromY, and all of chromX, as diploid for all samples in the first place. I did that because the memory and CPU usage of the semi-abandoned/retired UnifiedGenotyper (UG) 3.3 seems to explode when I try to call chromY and the non-PAR regions of chromX in males as haploid. I'm looking forward to best practices for the sex chromosomes, but I know how much is on your plate already. Maybe I should just go and sequence bees and wasps instead...
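For anyone who wants to repeat the culprit comparison on their own data, here is a minimal sketch in plain Python. It assumes an uncompressed, tab-delimited VCF that has already been through VQSR, so that each recalibrated site carries a `culprit=` entry in its INFO column. The function names `culprit_counts` and `culprit_fractions` are made up for this illustration and are not part of GATK.

```python
from collections import Counter, defaultdict


def culprit_counts(vcf_lines):
    """Tally the VQSR `culprit` INFO value per contig.

    `vcf_lines` is any iterable of VCF text lines (for example an open file).
    Sites without a `culprit` entry are skipped.
    """
    counts = defaultdict(Counter)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, info = fields[0], fields[7]
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "culprit":
                counts[chrom][value] += 1
                break
    return counts


def culprit_fractions(counts):
    """Convert per-contig culprit counts into per-contig fractions."""
    fractions = {}
    for chrom, counter in counts.items():
        total = sum(counter.values())
        fractions[chrom] = {name: n / total for name, n in counter.items()}
    return fractions
```

Usage would look something like `with open("recalibrated.vcf") as fh: print(culprit_fractions(culprit_counts(fh)))`, where the file name is a placeholder. Comparing the DP and QD fractions between an autosome and chromX/chromY gives the same kind of picture as the plot, as numbers.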

• United Kingdom · Member · Posts: 404 ✭✭✭

P.S. I'm not sure FS is the best annotation for SNPs (or indels). Its values seem to cluster at 0, which makes it behave more like a binary classifier than a continuous value, and it was mentioned in another thread that "the VQSR algorithm assumes that annotation values follow a gaussian distribution". See the attached VQSR SNP plot for the FS and SOR annotations: it shows the clustering of FS at 0, which SOR doesn't really suffer from.
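A rough way to put a number on the clustering, sketched in plain Python under the assumption that FS and SOR are present as numeric `KEY=value` entries in the INFO column of an uncompressed VCF. The helper names `info_values` and `fraction_at_zero` are hypothetical, not GATK functions.

```python
def info_values(vcf_lines, key):
    """Collect the numeric values of one INFO annotation (e.g. "FS")."""
    values = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            name, _, value = entry.partition("=")
            if name == key:
                try:
                    values.append(float(value))
                except ValueError:
                    pass  # flag-style or non-numeric entry, ignore it
                break
    return values


def fraction_at_zero(values, tol=1e-9):
    """Fraction of values that are zero, within a small tolerance."""
    if not values:
        return 0.0
    return sum(1 for v in values if abs(v) <= tol) / len(values)
```

If `fraction_at_zero(info_values(lines, "FS"))` comes out close to 1 while the same call with `"SOR"` comes out close to 0, that matches what the plot suggests: FS piles up at a single value, while SOR is spread out.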

To be honest I don't think we'll be producing best practices for sex chromosomes anytime soon. But I will remove that old link at some point to at least not mislead folks into thinking we're advocating any particular method.

Geraldine Van der Auwera, PhD