FastaAlternateReferenceMaker for several individuals?

christopherS Member
edited October 2012 in Ask the GATK team

Hi GATK team,

I have a VCF file (from GATK) containing variants for a total of 20 individuals, and I'm wondering how to get a consensus sequence for each individual that reflects its own polymorphisms. Some individuals may be polymorphic at a particular position in a contig whereas others may not. I've checked the dedicated GATK tool (FastaAlternateReferenceMaker), but it doesn't meet my need because it generates only one consensus. I would need as many output files (each containing a consensus sequence) as there are mapped individuals.

Is there any way to achieve this using GATK?




  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭

    Sorry, C, but unfortunately this isn't possible with GATK tools.

  • pdexheimer Member ✭✭✭✭

    Maybe I'm misunderstanding the question, but couldn't you do this with SelectVariants followed by FastaAlternateReferenceMaker for each individual? You wouldn't get phased contigs, but you could at least get individual-specific variants.

  • Hi pdexheimer,

    You were definitely right! I used SelectVariants to get one VCF file per individual and then ran FastaAlternateReferenceMaker on each of these individual VCF files to generate the sequences. I checked with a ClustalW alignment and found my SNPs at the right positions for each individual.

    The drawback may come when dealing with a large number of individuals, since the commands have to be repeated for each one. In my case, however, I managed quite well. Could this be implemented in a future version of GATK?

  • pdexheimer Member ✭✭✭✭

    I don't know about future versions, but I think it would be a relatively simple Queue script to iterate over the samples in a VCF and invoke the two walkers in sequence for each sample. It wouldn't benefit from scatter/gather (s/g), but you could trivially parallelize across samples.
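
For readers who want to script the per-sample loop without Queue, here is a minimal Python sketch of the idea discussed in this thread. It is not an official GATK recipe. It assumes the GATK 2/3-style command line (`java -jar GenomeAnalysisTK.jar -T ...`), a plain-text (uncompressed) VCF, and a jar path held in the hypothetical `GATK_JAR` constant; the function names and output file names are made up for illustration. The `-env` flag (exclude non-variant sites) is also an assumption added here, so that sites where the selected sample carries no alternate allele are dropped before the FASTA is built. Newer GATK releases use a different command-line syntax, so adapt the argument lists accordingly.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

GATK_JAR = "GenomeAnalysisTK.jar"  # assumption: path to a GATK 2/3-style jar


def sample_names(vcf_lines):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            return fields[9:]  # sample columns follow the FORMAT column
    raise ValueError("no #CHROM header line found")


def commands_for(sample, reference, vcf):
    """Build the two GATK invocations for one sample (nothing is run here)."""
    sample_vcf = f"{sample}.vcf"      # hypothetical output naming
    sample_fasta = f"{sample}.fasta"  # hypothetical output naming
    select = [
        "java", "-jar", GATK_JAR, "-T", "SelectVariants",
        "-R", reference, "-V", vcf,
        "-sn", sample,  # keep only this sample
        "-env",         # assumption: drop sites non-variant in this sample
        "-o", sample_vcf,
    ]
    make_fasta = [
        "java", "-jar", GATK_JAR, "-T", "FastaAlternateReferenceMaker",
        "-R", reference, "-V", sample_vcf,
        "-o", sample_fasta,
    ]
    return [select, make_fasta]


def run_sample(sample, reference, vcf):
    """Run SelectVariants, then FastaAlternateReferenceMaker, for one sample."""
    for cmd in commands_for(sample, reference, vcf):
        subprocess.run(cmd, check=True)


def run_all(reference, vcf, workers=4):
    """Process every sample in the VCF, several samples at a time."""
    with open(vcf) as handle:
        samples = sample_names(handle)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so that any failure is raised here
        list(pool.map(lambda s: run_sample(s, reference, vcf), samples))
```

Calling `run_all("ref.fasta", "all_samples.vcf")` (both file names are placeholders) would write one VCF and one FASTA per sample into the working directory. Threads are sufficient here because each worker only waits on an external `java` process; as noted above, the resulting sequences are not phased.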
