I have a multisample VCF file generated by GATK, and I need to convert it to FASTA format so that I can perform multiple sequence alignment.

Briefly, my protocol is:

1) Convert the multisample VCF to single-sample VCFs (using the GATK SelectVariants tool)
2) Convert each single-sample VCF to FASTA, one per genotype (using the GATK FastaAlternateReferenceMaker tool)
3) Align the resulting sequences

After two weeks of trying this out, I found that every step works except Step 2 of the protocol above. GATK does convert the VCF to FASTA, but it does not take into account the genotype (GT) or allele depth (AD) information, which is what actually determines the allele differences between the genotypes. As a result, I end up with the same FASTA file for all the genotypes.
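
To make the behaviour I expect from Step 2 concrete, here is a minimal Python sketch of a genotype-aware conversion. It is only an illustration under several assumptions that are mine and not part of GATK: a single-contig reference held in memory, SNPs only (indels and symbolic alleles are skipped), heterozygous calls written as IUPAC ambiguity codes, and missing calls written as N. The function name `vcf_to_sample_sequences` and the toy data are hypothetical.

```python
# Hypothetical sketch: build one sequence per sample from the GT field.
# Assumptions: single contig, SNPs only, diploid or haploid calls.

# IUPAC ambiguity codes for heterozygous two-base calls.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def vcf_to_sample_sequences(ref_seq, vcf_lines):
    """Return {sample_name: sequence} using each sample's GT at every SNP."""
    samples = []
    seqs = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            seqs = {s: list(ref_seq) for s in samples}
            continue
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        if any(a not in ("A", "C", "G", "T") for a in alleles):
            continue  # skip indels, '*' and other non-SNP records
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in gt:
                base = "N"  # missing genotype
            else:
                bases = {alleles[int(i)] for i in gt}
                if len(bases) == 1:
                    base = bases.pop()  # homozygous call
                else:
                    base = IUPAC.get(frozenset(bases), "N")  # heterozygous
            seqs[sample][pos - 1] = base  # VCF positions are 1-based
    return {s: "".join(chars) for s, chars in seqs.items()}


# Toy data, invented for illustration only.
reference = "ACGTACGT"
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t2\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/0:10,0\t1/1:0,12\t0/1:5,6",
    "chr1\t5\t.\tA\tG\t50\tPASS\t.\tGT\t1/1\t./.\t0/0",
]
for name, seq in vcf_to_sample_sequences(reference, vcf).items():
    print(f">{name}\n{seq}")
```

Because only substitutions are applied, every sample's sequence keeps the reference length, so the records are already column-aligned. Handling indels or filtering on AD would need extra logic that this sketch leaves out.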

I am not sure if this is the right way to approach this problem. Do you have any better suggestions for performing this task?

