FastaAlternateReferenceMaker -- how to create the two FASTA versions of a diploid?

StooffStooff Member
edited July 2016 in Ask the GATK team

Hi there,
I have been struggling with my data recently. I aligned my individuals to a reference FASTA and now have a VCF. Thanks to FastaAlternateReferenceMaker I can generate the alternate reference. But here is my problem: where an individual is homozygous for a SNP, its own "reference" sequence differs from the real reference. So I would like to know how to create a FASTA that includes all the genes containing the SNPs that were selected and used to create the alternate FASTA, but as a non-alternate version...

I thought of filtering the SNPs strictly in my VCF, but in that case I would miss all the genes that get discarded... Is there any solution for me?

Please let me know,
Best,
Stéphanie

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Stooff
    Hi Stéphanie,

    I am a little confused. Can you post some examples of what you mean?

    Do you mean you want the best supported allele to be inserted at a position?

    Thanks,
    Sheila

  • StooffStooff Member
    edited July 2016

    Hi Sheila,

    I didn't realize you had replied earlier... I meant something like the following:
    Ref : ATATGG
    Individual ref (2 1/1) : ACAGG
    Individual alt (2 1/1 & 4 0/1) : ACATG
    So if I run FastaAlternateReferenceMaker, it will give me the individual alt sequence. But since my individuals are diploid, I guess I should use the individual ref as well.
    Do you see what I mean? In that case the Ref doesn't match the individual ref, and I would like to get both sequences (individual ref & alt).

    Best,
    Stéphanie

    PS: I forgot to mention that I recently found a solution thanks to a colleague's Biopython script (the only drawback is that it is not automated at all).

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Stooff
    Hi Stéphanie,

    Ah, so you have found a solution? FastaAlternateReferenceMaker is not designed for what you are asking. It only chooses an alternate allele at random to insert into the FASTA.

    -Sheila

  • StooffStooff Member

    Hi Sheila,

    Yes, I totally understand. But when a diploid individual differs from the reference, why not offer the option to get two FASTA files representing the diploidy: one individual reference FASTA containing only the 1/1 SNPs, and one alternate FASTA version produced with FastaAlternateReferenceMaker?
    Is my question out of scope for you? I am pretty sure I am not the only one working with diploids, and for those working on population genetics like me it might be useful.

    If so, sorry for the bother. In any case I appreciate this tool and GATK in general. I use it on a daily basis, so I was surprised when I didn't find this option.

    Thanks for the support. If someone is trying to find a solution to this problem, I guess they could use vcfx from http://www.castelli-lab.net/apps/apps_vcfx.php but I haven't tried it, so I cannot recommend it.

    Question: I understand that the FASTA created with FastaAlternateReferenceMaker is built by choosing an alternate allele at random to insert. But in the case of a biallelic VCF, I guess all the 0/1 and 1/1 sites end up in the FASTA that is created, no?

    Stéph

  • StooffStooff Member

    @Geraldine_VdAuwera I totally agree and understand your point. Unfortunately I cannot offer any clear implementation with my poor script-writing skills. If anyone needs a Biopython script, I do have one though.

    But could you please reply to the following question: in the case of a biallelic VCF, do all the 0/1 and 1/1 sites end up in the FASTA that is created? (I suppose the randomness only concerns the choice among possible alternates, 0/1 or 0/2?)

    Please let me know, and many thanks for the replies, this time and all the others. It is a great pleasure to work with GATK and to have such support when needed.

    S.
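
The two-FASTA idea discussed in this thread (one sequence carrying only the 1/1 SNPs, one carrying every alternate allele) can be sketched in plain Python as below. This is only an illustration under stated assumptions, not a GATK feature and not the Biopython script mentioned above: the function names `apply_snps` and `diploid_pair` are hypothetical, the SNPs are assumed to be biallelic single-base substitutions given as simple tuples rather than parsed from a real VCF, and the alternate base G at position 4 is made up for the example.

```python
from typing import Iterable, List, Set, Tuple

# One SNP record: (1-based position, REF base, ALT base, genotype string).
# Hypothetical in-memory format; a real workflow would read these from a VCF.
Snp = Tuple[int, str, str, str]


def normalize_genotype(gt: str) -> str:
    """Treat phased and unphased calls alike, so '1|0' becomes '0/1'."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def apply_snps(reference: str, snps: Iterable[Snp], genotypes: Set[str]) -> str:
    """Substitute ALT into the reference at every SNP whose genotype is selected."""
    seq: List[str] = list(reference)
    for pos, ref, alt, gt in snps:
        if normalize_genotype(gt) not in genotypes:
            continue
        if seq[pos - 1] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def diploid_pair(reference: str, snps: List[Snp]) -> Tuple[str, str]:
    """Return (sequence with 1/1 SNPs only, sequence with 0/1 and 1/1 SNPs)."""
    hom_only = apply_snps(reference, snps, {"1/1"})
    all_alt = apply_snps(reference, snps, {"0/1", "1/1"})
    return hom_only, all_alt


if __name__ == "__main__":
    # Position 2 is homozygous alternate, position 4 is heterozygous.
    example = [(2, "T", "C", "1/1"), (4, "T", "G", "0/1")]
    individual_ref, individual_alt = diploid_pair("ATATGG", example)
    print(individual_ref)  # ACATGG
    print(individual_alt)  # ACAGGG
```

Note that with unphased 0/1 calls the second sequence simply collects every alternate allele; it is not guaranteed to match a real haplotype, and indels or multiallelic sites would need extra handling.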
