VQSR and sex chromosomes

tiagoantao Posts: 1 Member


Maybe I have missed some obvious piece of documentation, but I am looking for best practices for using VQSR with sex chromosomes (especially X). I am trying to do variant calling on Anopheles gambiae genomes (which have sex chromosomes like humans), and the results for chromosome X are not very encouraging. Is there any documentation or set of best practices for VQSR on sex chromosomes, especially X? And are people using VQSR with sex chromosomes at all?

Clueless and lost,


  • Geraldine_VdAuwera Posts: 8,434 Administrator, GATK Dev admin

    Hi Tiago,

    We do not have any specific recommendations for sex chromosomes, largely because we don't do anything with them ourselves. In humans at least X and Y are notoriously difficult to call because the mapping of reads there is typically of very poor quality. And since VQSR relies on large numbers of observations to do its job, it can't deal very well with localized aberrations such as those that might affect the sex chromosomes. So if the situation is similar in Anopheles, then unfortunately you may be in for a rough ride. Perhaps someone in the community might have more experience dealing with this problem and will share their approach...

    Geraldine Van der Auwera, PhD

  • tommycarstensen United Kingdom Posts: 372 Member ✭✭✭

    Hi @tiagoantao . Which procedure did you decide on in the end? I'm asking, because I'm unsure whether to run VQSR across the autosomes and the sex chromosomes. They have different depths among other things, and some of the annotations used by VQSR are depth derived/dependent. Thanks.

  • tommycarstensen United Kingdom Posts: 372 Member ✭✭✭

    This thread is related to another one on a similar topic.

    I was afraid that the Y chromosome would perform differently under the VQSR model, because some of the annotations used by VQSR are depth dependent. Indeed, relatively more chromY sites have DP as the culprit and fewer have QD; see the attached image. ChromX suffers from the same problem, although to a lesser extent. Based on this I might run haploid regions separately through VQSR. Then again, the effect may have been caused by my calling chromY, and all of chromX, as diploid for all samples in the first place. The memory and CPU usage of the semi-abandoned/retired UG 3.3 seems to explode when I try to call chromY and male non-PAR chromX as haploid, so I didn't do it. I'm looking forward to best practices for the sex chromosomes, but I know how much is on your plate already. Maybe I should just go and sequence bees and wasps instead...

    [Attached image, 2027 x 1217]
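For anyone wanting to repeat the culprit comparison above, here is a minimal sketch in plain Python. It assumes an uncompressed VCF that has already been through VQSR, so that each record carries a `culprit` key in its INFO column. The helper names `parse_info` and `culprit_fractions` are made up for illustration and are not part of GATK.

```python
from collections import Counter, defaultdict


def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=12;culprit=QD;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        # Flag entries (no '=') are stored as True.
        info[key] = value if sep else True
    return info


def culprit_fractions(vcf_lines):
    """Return {contig: {annotation: fraction of sites with that culprit}}."""
    counts = defaultdict(Counter)
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        contig = fields[0]
        culprit = parse_info(fields[7]).get("culprit")
        # Records without a culprit value are ignored.
        if isinstance(culprit, str):
            counts[contig][culprit] += 1
    return {
        contig: {ann: n / sum(c.values()) for ann, n in c.items()}
        for contig, c in counts.items()
    }

# Usage sketch (the file name is hypothetical):
#   with open("recalibrated.vcf") as handle:
#       for contig, fractions in culprit_fractions(handle).items():
#           print(contig, fractions)
```

Comparing the per-contig fractions for the autosomes against those for X and Y should show whether DP is named as culprit more often on the sex chromosomes, without having to redraw the plot.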
  • tommycarstensen United Kingdom Posts: 372 Member ✭✭✭

    P.S. I'm not sure FS is the best annotation for SNPs (and indels): all the values seem to cluster at 0, which makes it behave more like a binary indicator than a continuous value, and it was mentioned in another thread that "the VQSR algorithm assumes that annotation values follow a gaussian distribution". See the attached VQSR SNP plot for annotations FS and SOR, which shows the clustering at 0 for FS; SOR doesn't suffer from it to the same degree.

    [Attached image: Screen Shot 2015-02-08 at 16.54.03.png, 1608 x 1604]
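One rough way to put numbers on the clustering described above, sketched in plain Python. It assumes the FS (or SOR) values have already been pulled out of the VCF into a list of floats. The function names are illustrative only, and these summaries are just a quick sanity check, not anything VQSR computes itself.

```python
import statistics


def zero_fraction(values):
    """Fraction of values that are exactly 0.0."""
    values = list(values)
    return sum(1 for v in values if v == 0.0) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric sample."""
    values = list(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # All values identical: no spread, so report no skew.
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)
```

A large share of exact zeros together with a strongly positive skew would be consistent with the pile-up at 0 seen in the plot, and is hard to reconcile with a roughly Gaussian shape. Running both functions on FS and on SOR gives a like-for-like comparison of the two annotations.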
  • Geraldine_VdAuwera Posts: 8,434 Administrator, GATK Dev admin

    To be honest I don't think we'll be producing best practices for sex chromosomes anytime soon. But I will remove that old link at some point to at least not mislead folks into thinking we're advocating any particular method.

    Geraldine Van der Auwera, PhD
