
FastaAlternateReferenceMaker -- how to create the two FASTA versions of a diploid?

Stooff (Member)
edited July 2016 in Ask the GATK team

Hi there,
I have been struggling with my data recently. I aligned my individuals to a reference FASTA and now have a VCF. With FastaAlternateReferenceMaker I can generate the alternate reference. Here is my problem: wherever an individual carries homozygous SNPs, its own reference sequence differs from the real reference. So I would like to know how to create a FASTA covering all the genes that contain the SNPs selected and used to build the alternate FASTA, but as a non-alternate version...

I thought of filtering the SNPs strictly in my VCF, but in that case I would lose all the genes that get discarded... Is there a solution for me?

Please let me know,
Best,
Stéphanie

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @Stooff
    Hi Stéphanie,

    I am a little confused. Can you post some examples of what you mean?

    Do you mean you want the best supported allele to be inserted at a position?

    Thanks,
    Sheila

  • Stooff (Member)
    edited July 2016

    Hi Sheila,

    I didn't realize that you had replied earlier... I meant something like the following:
    Ref : ATATGG
    Individual ref (2 1/1) : ACAGG
    Individual alt (2 1/1 & 4 0/1) : ACATG
    So if I run FastaAlternateReferenceMaker, it will give me the individual alt sequence. But since my individuals are diploid, I guess I should use the individual ref as well.
    Do you see what I mean? In that case the Ref does not match the individual ref, and I would like to get both sequences (individual ref and individual alt).

    Best,
    Stéphanie

    PS: I forgot to mention that I recently found a solution thanks to a colleague's Biopython script (the only drawback is that it is not automated at all).

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @Stooff
    Hi Stéphanie,

    Ah, so you have found a solution? FastaAlternateReferenceMaker is not designed for what you are asking. It only chooses an alternate allele at random to insert into the FASTA.

    -Sheila

  • Stooff (Member)

    Hi Sheila,

    Yes, I totally understand. But when a diploid individual differs from the reference, why not offer an option to output two FASTA files representing the diploidy: one individual reference FASTA carrying only the 1/1 SNPs, and one alternate FASTA as produced by FastaAlternateReferenceMaker?
    Is my question out of scope for you? I am pretty sure I am not the only one working with diploids, and for those working on population genetics like me it might be useful.

    If so, sorry for the bother. In any case, I appreciate this tool and GATK in general. I use it on a daily basis, so I was surprised not to find this option.

    Thanks for the support. If someone is looking for a solution to this problem, I guess they could use vcfx from http://www.castelli-lab.net/apps/apps_vcfx.php but I have not tried it, so I cannot recommend it.

    Question: I understand that the FASTA created with FastaAlternateReferenceMaker is built by choosing an alternate allele at random to insert. But in the case of a biallelic VCF, I guess all the 0/1 and 1/1 sites end up in the created FASTA, don't they?

    Stéph

  • Stooff (Member)

    @Geraldine_VdAuwera I totally agree and understand your point. Unfortunately, I could not offer a clear implementation with my poor script-writing skills. If anyone needs a Biopython script, I do have one though.

    But could you please answer the following question: in the case of a biallelic VCF, do all the 0/1 and 1/1 sites end up in the created FASTA? (I suppose the randomness only concerns the choice between possible alternates, e.g. 0/1 or 0/2?)

    Please let me know, and many thanks for the replies, this time and all the others. It is a great pleasure to work with GATK and to have such support when needed.

    S.
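
The two-sequence idea discussed in this thread (one sequence carrying only the 1/1 SNPs, one carrying every alternate allele) can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not GATK functionality and not the colleague's script mentioned above: the helper names `parse_snps`, `apply_snps` and `diploid_sequences` are hypothetical, the input is assumed to be a single-contig, single-sample VCF with biallelic SNPs only, and the alternate alleles in the toy records (T>C at position 2, T>G at position 4) are made up for the example.

```python
# Hypothetical helpers; plain Python, no GATK or Biopython calls.
# Assumptions: one contig, one sample column, biallelic SNPs only.

def parse_snps(vcf_text):
    """Return (pos, ref, alt, genotype) tuples for single-base records."""
    snps = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        # Treat phased (|) and unphased (/) genotypes alike in this sketch.
        genotype = sample_values[format_keys.index("GT")].replace("|", "/")
        if len(ref) == 1 and len(alt) == 1:
            snps.append((pos, ref, alt, genotype))
    return snps


def apply_snps(reference, snps, genotypes):
    """Substitute the ALT base at every SNP whose genotype is in `genotypes`."""
    seq = list(reference)
    for pos, ref, alt, genotype in snps:
        if genotype not in genotypes:
            continue
        if seq[pos - 1] != ref:  # VCF positions are 1-based
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def diploid_sequences(reference, vcf_text):
    """Return (hom_only, all_alt): 1/1 SNPs only, and 1/1 plus heterozygous SNPs."""
    snps = parse_snps(vcf_text)
    hom_only = apply_snps(reference, snps, {"1/1"})
    all_alt = apply_snps(reference, snps, {"1/1", "0/1", "1/0"})
    return hom_only, all_alt


# Toy data: reference from the thread, made-up ALT alleles.
REFERENCE = "ATATGG"
VCF_TEXT = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t2\t.\tT\tC\t50\tPASS\t.\tGT\t1/1",
    "chr1\t4\t.\tT\tG\t50\tPASS\t.\tGT\t0/1",
])

hom_only, all_alt = diploid_sequences(REFERENCE, VCF_TEXT)
print(hom_only)  # ACATGG
print(all_alt)   # ACAGGG
```

Note that with unphased heterozygous calls the second sequence is not a true haplotype; it simply carries every alternate allele, so recovering the two real chromosomes would still require phased genotypes.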
