X Chromosome genotyping

Now that GATK supports ploidy, how do people generate chrX calls? My first guess is to have two separate genotyping runs: one for males with ploidy=1 and a second for females only with ploidy=2?

Answers

  • tommycarstensen · United Kingdom · Member ✭✭✭

    @jlrflores I just add --intervals X (or chrX depending on your reference sequence) to the command line. You would have to treat PARs differently. I think keeping it simple is the better option, but I could be wrong.

  • mayaab · Israel · Member ✭✭

    I wonder: running with that parameter makes GATK run only on chrX. How does this solve the problem?

  • mayaab · Israel · Member ✭✭

    Thanks, Sheila, for clearing up this issue for me!

  • tommycarstensen · United Kingdom · Member ✭✭✭

    @sheila Is running with ploidy = 1 for men and ploidy = 2 for women the GATK best practice for human chrX variant calling? Do you also recommend running VariantRecalibrator separately for men and women? Do you recommend running HC in GENOTYPE_GIVEN_ALLELES mode after VR (or before VR) to generate calls for the union set? Or perhaps there are no recommendations and I have to figure it out myself? Thanks!

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @tommycarstensen

    Hi,

    A new Best Practices is in revision and will be updated in the near future.
    As for how to handle the sex chromosomes, we have not fully reviewed it yet, but our suggestion is to run with ploidy = 1 for men and ploidy = 2 for women.
    You do not have to run VariantRecalibrator separately for men and women. It can accept different ploidies because it does not look at the genotype fields.
    You do not need to run HC in GENOTYPE_GIVEN_ALLELES mode at all.

    -Sheila

  • angel · Member

    Hi All,

    Thanks for the helpful discussion on this topic.

    @Sheila, after running GenotypeGVCFs separately for males and females, what is the best way to merge the male and female VCF files back together for a single VariantRecalibrator run?

    Thanks,
    angel

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    You don't; instead you run GenotypeGVCFs on all samples together. As long as you're running version 3.4 or 3.3 it will handle the ploidy differences appropriately.

  • KlausNZ · Member ✭✭

    Hi All,

    Thanks for the discussion so far - could you please confirm that this is the recommended ploidy-correct, mixed-gender approach (irrelevant options omitted for clarity, and apologies to the mitochondrion):
    Boys:
    HaplotypeCaller -I boy.bam -ploidy 2 -L 1-22 -o boy_1-22.g.vcf
    HaplotypeCaller -I boy.bam -ploidy 1 -L XY -o boy_XY.g.vcf
    Girls:
    HaplotypeCaller -I girl.bam -ploidy 2 -L 1-22X -o girl_1-22X.g.vcf

    Is there any need to merge boy_1-22.g.vcf with boy_XY.g.vcf?

    Joint genotyping to produce a VCF with two SAMPLE columns (girl, boy) and 24 chromosomes (1-22, X, Y):
    GenotypeGVCFs -V girl_1-22X.g.vcf -V boy_1-22.g.vcf -V boy_XY.g.vcf -o boy+girl.raw.vcf
    or
    GenotypeGVCFs -V girl_1-22X.g.vcf -V boy_1-22XY.g.vcf -o girl+boy.raw.vcf (if prior merging is required)

    If yes, yay! - thanks for the confirmation!
    @Geraldine_VdAuwera, could you please clarify your comment that 'GenotypeGVCFs will handle ploidy differences appropriately'? Is it not necessary to provide a value for -sample_ploidy? If we need to provide a value, what would that value be for the above example?

    If no, please correct

    Many thanks in advance!
    K
    PS: -L options of course provided in proper format!
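
    For anyone scripting the partitioning above, here is a minimal Python sketch. It assumes GATK 3-style flags (-T, -R, -I, -L, -ploidy, --emitRefConfidence GVCF, -o), b37-style contig names, and a jar called GenomeAnalysisTK.jar; the function names and file names are hypothetical, and the flags should be checked against your GATK version before use.

```python
def haplotypecaller_plan(sample, sex):
    """Return (contigs, ploidy, output) tuples for one sample.

    Mirrors the split discussed in this thread: males get a diploid
    autosome run and a haploid X/Y run; females get one diploid run.
    PARs are NOT treated specially here.
    """
    autosomes = [str(i) for i in range(1, 23)]
    if sex == "male":
        return [
            (autosomes, 2, f"{sample}_1-22.g.vcf"),
            (["X", "Y"], 1, f"{sample}_XY.g.vcf"),
        ]
    if sex == "female":
        return [(autosomes + ["X"], 2, f"{sample}_1-22X.g.vcf")]
    raise ValueError(f"unknown sex: {sex!r}")


def to_command(bam, reference, contigs, ploidy, output):
    """Build a GATK 3-style argument list (assumed flags, see lead-in)."""
    args = [
        "java", "-jar", "GenomeAnalysisTK.jar",  # hypothetical jar path
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-ploidy", str(ploidy),
    ]
    for contig in contigs:
        args += ["-L", contig]  # one -L per contig
    args += ["-o", output]
    return args


if __name__ == "__main__":
    for contigs, ploidy, out in haplotypecaller_plan("boy", "male"):
        print(" ".join(to_command("boy.bam", "ref.fasta", contigs, ploidy, out)))
```

    The resulting per-sample gVCFs would then all be passed together to a single GenotypeGVCFs run.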

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    @KlausNZ

    First, a caveat: last I heard we didn't actually process boys & girls any differently in our production pipeline; not because we're that insistent on equal opportunity (though we are for the rest, I'm happy to say) but because very few end-users seem to care that much about X and Y (not to mention MT, poor thing). Presumably because when something is sex-linked it tends to be obvious? My understanding is that people who do care go back and reprocess X and Y. But this could change in future since it's now much easier to set up a pipeline to handle them appropriately -- as you demonstrate.

    So yes, if you do care, then that would indeed be the way to do it properly. I don't think you need to merge boys' XY file with the rest as the engine should lump everything together at runtime when you joint-genotype, but if you wanted to do it to keep things tidy, you could do it with CatVariants.

    You only need to tell HC what the ploidy is (if not 2); in the next step both CombineGVCFs and GenotypeGVCFs are smart enough to understand, based on parsing the GT value, what is the ploidy of any particular sample at any particular site, and to do the right thing with that information. Sweet, right?
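
    To illustrate the idea of reading ploidy off the GT value (this is only a sketch of the concept, not GATK's actual implementation), a small Python helper could count the alleles in a VCF genotype string. The function name is hypothetical.

```python
import re


def ploidy_from_gt(gt):
    """Infer ploidy from a VCF GT string by counting its alleles.

    Alleles are separated by '/' (unphased) or '|' (phased); a missing
    allele ('.') still occupies one position.
    """
    if not gt:
        raise ValueError("empty GT value")
    return len(re.split(r"[/|]", gt))


if __name__ == "__main__":
    for gt in ("1", "0/1", "0|1", "./."):
        print(gt, ploidy_from_gt(gt))
```

    So a male sample with GT "1" on non-PAR X and a female sample with GT "0/1" at the same site can sit side by side in one joint-genotyped VCF.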

  • KlausNZ · Member ✭✭

    Thanks Geraldine, that is indeed super-sweet! We do care about X, and of course want to drive the fantastic car you've built as best we can. The sex chromosomes are over-represented in the 'Mendelian violations' set (GATK and other tools), so we're hoping to improve this.

    Many thanks again!
    K

  • gerhtb · South Africa · Member

    @Sheila,

    Hi Sheila

    I would just like to know if there are any updates on best practices for gender calling.

    Do you still recommend running with ploidy = 1 for men and ploidy = 2 for women?

    Do you have any comments on instead treating the male PAR (ploidy = 2) and non-PAR (ploidy = 1) regions differently?

    Thanks,
    Gerrit

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @gerhtb
    Hi Gerrit,

    We really don't work much with the sex chromosomes, so there have not been many updates.

    We do still recommend ploidy 1 for men and ploidy 2 for women.

    As for the PAR regions, we really cannot comment because it is not an area of research for us.

    -Sheila

  • SteveL · Barcelona · Member ✭✭

    Hi @tommycarstensen @gerhtb @KlausNZ, I am about to start addressing this issue and I was wondering if any of you have any tips or comments on how you have dealt with X-Chromosome (and "Y") calls.

    So far I have a batch of WES samples that have been mapped with BWA to hs37d5 and called with HC using ploidy of 2. Unfortunately I don't always know beforehand who is male/female (though it is clear after calling of course), so my plan is to take a sample and rerun the sex chromosomes with ploidy of 1, and see how the results compare for each gender.

    I presume one or all of you may have done something similar already, and I am curious how useful you found the process. In particular, I am curious whether you came to any conclusions regarding the PAR regions, and whether you processed them any differently. One possible option I see would be to always call males twice, and then use bedtools to build an X-chromosome VCF that is haploid where it should be and diploid in the PAR.

    Do you think this makes sense?

    Does anybody do analysis of Y-chromosome calls? I see that ExAC has produced some calls on the Y. (e.g. http://exac.broadinstitute.org/region/Y-2600000-2675000)

    I am surprised this is not something that concerns all y'all at the Broad @Sheila, as we often receive requests from collaborators to investigate potentially X-linked Mendelian traits.
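
    As a rough illustration of the PAR-aware idea above, the following Python sketch splits chrX into haploid and diploid intervals for a male sample. It assumes the GRCh37 X-chromosome PAR coordinates (PAR1 X:60001-2699520, PAR2 X:154931044-155260560) and a GRCh37 chrX length of 155270560; these should be verified against your own reference, and the names are hypothetical.

```python
# Assumed GRCh37 (b37/hs37d5) coordinates, 1-based inclusive; verify before use.
PAR_X_GRCH37 = [(60001, 2699520), (154931044, 155260560)]
X_LENGTH_GRCH37 = 155270560


def male_x_intervals(pars=PAR_X_GRCH37, length=X_LENGTH_GRCH37, contig="X"):
    """Return (interval, ploidy) pairs covering chrX for a male sample.

    PAR intervals get ploidy 2; everything else on X gets ploidy 1.
    """
    plan = []
    cursor = 1
    for start, end in sorted(pars):
        if cursor < start:
            plan.append((f"{contig}:{cursor}-{start - 1}", 1))
        plan.append((f"{contig}:{start}-{end}", 2))
        cursor = end + 1
    if cursor <= length:
        plan.append((f"{contig}:{cursor}-{length}", 1))
    return plan


if __name__ == "__main__":
    for interval, ploidy in male_x_intervals():
        print(interval, ploidy)
```

    Each interval could then be called once with the matching ploidy, which would avoid calling males twice and stitching the results together afterwards.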

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @SteveL
    Hi,

    We tend to exclude the Y chromosome in whole genome calling, but in whole exome calling, we include it because we have targets there.

    We don't do anything special when calling on the X chromosome.

    I think there is some data on the PAR regions we want to publish but that has not gotten done yet.

    -Sheila

  • KlausNZ · Member ✭✭

    @SteveL,

    Hi, good idea re the PAR. We're not doing anything deliberately special about the PARs; we limit calling to the GIAB 'safe intervals' and it is possible that some of the PAR may be excluded. We call autosomes and sex-chromosomes separately (not twice), and you could do similar and simply partition the PAR and non-PAR into the correct 'ploidy set', then merge the VCFs. Maybe run Coverage on selected X/Y regions to figure out gender before running HaplotypeCaller?
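
    A coverage-based sex check of the kind suggested here might look like the sketch below. It assumes you have already computed mean depth over non-PAR X targets and over autosomal targets by some other means; the 0.75 threshold is a hypothetical starting point (roughly midway between the ~0.5 expected for one X copy and ~1.0 for two) and would need tuning, particularly for exome captures.

```python
def infer_sex(mean_cov_x, mean_cov_autosomes, male_max_ratio=0.75):
    """Guess sex from the X-to-autosome mean coverage ratio.

    A single X copy should give a ratio near 0.5, two copies near 1.0.
    male_max_ratio is an assumed cutoff, not a validated value.
    """
    if mean_cov_autosomes <= 0:
        raise ValueError("autosomal coverage must be positive")
    ratio = mean_cov_x / mean_cov_autosomes
    return "male" if ratio < male_max_ratio else "female"


if __name__ == "__main__":
    print(infer_sex(15.2, 30.1))
    print(infer_sex(29.4, 30.1))
```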

    BTW, one of the unforeseen (should have seen that one coming...) outcomes of haploid calling is that genotypes are emitted as '1', not '1/1' (etc.). That of course makes perfect sense, but it causes problems with some of the downstream tools (no complaint; this includes some of our own). Also, we store all variants in a SQL database for filtering, so the queries have become a lot more convoluted because of the dual notation. Although ploidy-correct calling clearly reduces MV calls, and fewer HC calls change genotype after pedigree refinement, we're in two minds about its overall value because of the downstream effects.
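
    One possible workaround for tools that only understand diploid notation is to rewrite haploid genotypes on the way into the downstream system. The Python sketch below shows the idea; the function name is hypothetical, and note that the conversion throws away the ploidy information, so it is best applied to a copy of the data.

```python
def diploidize_gt(gt):
    """Rewrite a haploid GT ('1') as a homozygous diploid GT ('1/1').

    Genotypes that already contain a '/' or '|' separator are returned
    unchanged. A haploid missing call '.' becomes './.'.
    """
    if "/" in gt or "|" in gt:
        return gt
    return f"{gt}/{gt}"


if __name__ == "__main__":
    for gt in ("1", "0", ".", "0/1", "1|0"):
        print(gt, "->", diploidize_gt(gt))
```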

    Keen to hear what you find out!

  • Hi, I have a follow-up question on variant calling of chrX (and Y and MT as well). After calling with sex-dependent ploidy parameters, how should we do variant-based QC on these non-autosomal variants? I read somewhere in this forum that these variants should not be included in VQSR. Should I simply do hard filtering, or are there any other suggestions?

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @claratsm
    Hi,

    I am assuming you are referring to this thread: http://gatkforums.broadinstitute.org/discussion/2895/vqsr-and-sex-chromosomes
    As Geraldine mentioned, we do not have any specific recommendations because we do not work much with the sex chromosomes. I am hoping someone else in this thread or other users will jump in with some recommendations.

    -Sheila

  • SteveL · Barcelona · Member ✭✭

    I have some files generated, just haven't had a chance to look at them yet, but I will provide feedback as soon as I can - almost certainly by the end of next week, but hopefully sooner.

  • Looking forward to your feedback

  • @KlausNZ, could you kindly explain what these "safe intervals" are?
