I have a multisample VCF file generated with GATK, and I need to convert it into FASTA format to perform multiple sequence alignments.

Briefly, my protocol is:

1) Split the multisample VCF into single-sample VCFs (GATK SelectVariants)
2) Convert each single-sample VCF into a FASTA file for that genotype (GATK FastaAlternateReferenceMaker)
3) Run the sequence alignments
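Since steps 1 and 2 have to be repeated for every genotype, the sample names can be read from the `#CHROM` header line of the multisample VCF, where the sample columns start at field 10. A minimal sketch, assuming an uncompressed, tab-delimited VCF; the function name `list_vcf_samples` is hypothetical:

```shell
#!/usr/bin/env bash
# list_vcf_samples: hypothetical helper that prints one sample name per line.
# Assumes an uncompressed, tab-delimited VCF; per the VCF format, sample
# columns start at field 10 of the "#CHROM" header line.
list_vcf_samples() {
  local vcf="$1"
  awk 'BEGIN { FS = "\t" }
       $1 == "#CHROM" { for (i = 10; i <= NF; i++) print $i; exit }' "$vcf"
}
```

For example, `list_vcf_samples allpops.vcf` should print the genotype names one per line, ready to feed into a loop over steps 1 and 2.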

After two weeks of trying this out, I found that all steps work except Step 2. GATK does convert the VCF to FASTA, but it does not take into account the genotype (GT) or allele depth (AD) information, which is what actually determines the allele differences between the genotypes. As a result, I get the same FASTA file for every genotype.
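A quick way to check whether the single-sample VCFs themselves differ is to count, for each file, the records in which that sample's GT contains a non-reference allele. If SelectVariants kept every site of the multisample VCF, including sites where the sample is homozygous reference, then every single-sample VCF carries the same ALT alleles, and a tool that substitutes the ALT allele of each record would write the same sequence for every genotype. A minimal diagnostic sketch, assuming an uncompressed VCF with a single sample column; `count_nonref_sites` is a hypothetical helper, not a GATK tool:

```shell
#!/usr/bin/env bash
# count_nonref_sites: hypothetical diagnostic helper.
# Prints the number of data records in a single-sample VCF whose genotype
# contains at least one non-reference allele (anything other than 0 or ".").
# Assumes an uncompressed VCF with exactly one sample column (field 10).
count_nonref_sites() {
  local vcf="$1"
  awk 'BEGIN { FS = "\t"; n = 0 }
       /^#/ { next }                       # skip header lines
       {
         split($9, keys, ":")
         if (keys[1] != "GT") next         # GT, when present, is the first FORMAT key
         split($10, vals, ":")
         m = split(vals[1], alleles, "[/|]")   # handle unphased and phased GTs
         for (i = 1; i <= m; i++) {
           if (alleles[i] != "0" && alleles[i] != ".") { n++; break }
         }
       }
       END { print n }' "$vcf"
}
```

Comparing this count with the total number of data lines (`grep -vc '^#' G16.vcf`) shows whether homozygous-reference sites are still present in the per-sample file.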

I am not sure if this is the right way to approach this problem. Do you have any better suggestions for performing this task?


  • Sheila (Broad Institute; Member, Broadie, Moderator)


    Can you post your command for SelectVariants?


  • suda_ravindran (Hamburg; Member)

    Hi Sheila,

    I used this command for SelectVariants (shown here for extracting my sample "G1.6"):

    java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R \
    -V allpops.vcf \
    -o G16.vcf \
    -sn G1.6

    This was my FastaAlternateReferenceMaker command:

    java -jar GenomeAnalysisTK.jar \
    -T FastaAlternateReferenceMaker \
    -R \
    -o G16.fasta \
    -V G16.vcf

    I used the same commands for all my other genotypes. I have 24 genotypes in total, from four populations with six genotypes each.

    Thank you,

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    Hi Suda,

    Try adding -env and -trimAlternates to your SelectVariants command.
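Applied to all genotypes, the suggestion above could look like the loop below. It is a sketch in GATK 3 syntax that reuses the `-T SelectVariants` and `-T FastaAlternateReferenceMaker` invocations from this thread, with `-env` and `-trimAlternates` added to SelectVariants. `GATK_JAR`, `REF`, `VCF`, `run` and `make_sample_fastas` are hypothetical names (the reference path was not given in the thread), and setting `DRY_RUN=1` only prints the commands so they can be checked before running.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: adjust to your own files.
GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"
REF="${REF:-reference.fasta}"      # placeholder; the thread omits the -R value
VCF="${VCF:-allpops.vcf}"

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_sample_fastas: for each sample name given as an argument,
# 1) subset the multisample VCF to that sample, excluding sites that are
#    non-variant after subsetting (-env) and alternate alleles absent from
#    the sample's genotypes (-trimAlternates);
# 2) build that sample's FASTA from its own VCF.
make_sample_fastas() {
  local sample
  for sample in "$@"; do
    run java -jar "$GATK_JAR" -T SelectVariants -R "$REF" -V "$VCF" \
        -o "${sample}.vcf" -sn "$sample" -env -trimAlternates
    run java -jar "$GATK_JAR" -T FastaAlternateReferenceMaker -R "$REF" \
        -V "${sample}.vcf" -o "${sample}.fasta"
  done
}
```

For example, `DRY_RUN=1 make_sample_fastas G1.6` prints the two commands for sample G1.6; dropping `DRY_RUN=1` runs them. Naming the output files after the full sample name avoids collisions that could arise from stripping characters.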

