
VCF to FASTA

I have a multisample VCF file generated with GATK, and I need to convert it to FASTA format so that I can perform multiple sequence alignments.

Briefly, my protocol is:

1) Convert the multisample VCF to single-sample VCFs (using the GATK SelectVariants tool)
2) Convert each single-sample VCF to FASTA, one per genotype (using the GATK FastaAlternateReferenceMaker tool)
3) Align the sequences

After two weeks of trying this out, I found that every step works except Step 2 of the protocol above. GATK does convert the VCF to FASTA, but it does not take into account the genotype (GT) or allele depth (AD) information, which is what actually distinguishes the alleles of the different genotypes. As a result, I end up with the same FASTA file for every genotype.
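
One way to check whether the single-sample VCFs really differ from each other is to count, for each file, the records whose GT field carries a non-reference allele. The following is only a minimal sketch: it assumes a single-sample VCF (sample in column 10) with GT as the first FORMAT subfield, and the function name count_nonref is hypothetical.

```shell
# Count the records in a single-sample VCF whose genotype contains
# at least one non-reference allele (allele index greater than 0).
# Assumption: GT is the first FORMAT subfield and the sample is column 10.
count_nonref() {
  awk -F'\t' '
    /^#/ { next }                 # skip header lines
    {
      split($10, f, ":")          # f[1] holds the GT string, e.g. 0/1
      if (f[1] ~ /[1-9]/) n++     # any non-zero digit means a non-reference allele
    }
    END { print n + 0 }           # print 0 when nothing matched
  ' "$1"
}
```

Running count_nonref G16.vcf and comparing the result with the total number of records in the file shows how many sites are kept even though the sample is homozygous reference or uncalled there.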

I am not sure if this is the right way to approach this problem. Do you have any better suggestions for performing this task?

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @suda_ravindran
    Hi,

    Can you post your command for SelectVariants?

    Thanks,
    Sheila

  • suda_ravindran (Hamburg; Member)

    Hi Sheila,

    I used this command for SelectVariants (I am just showing an example for extracting my sample called "G1.6"):

    java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R allaugust.okay.tr.fasta \
    -V allpops.vcf \
    -o G16.vcf \
    -sn G1.6

    This was my FastaAlternateReferenceMaker command:

    java -jar GenomeAnalysisTK.jar \
    -T FastaAlternateReferenceMaker \
    -R allaugust.okay.tr.fasta \
    -o G16.fasta \
    -V G16.vcf

    I used the same commands for all my other genotypes. I have 24 genotypes in total, coming from four different populations. Each population has 6 genotypes.
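
Repeating the two commands for every genotype can be wrapped in a small helper. This is only a sketch under stated assumptions: the helper names run and per_sample are hypothetical, the output prefix is derived by removing the dot from the sample name (G1.6 becomes G16, as in the example above), and by default the commands are printed rather than executed.

```shell
# Print each command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Run SelectVariants and then FastaAlternateReferenceMaker for one sample.
per_sample() {
  sample="$1"
  prefix=$(printf '%s' "$sample" | tr -d '.')   # G1.6 -> G16
  run java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R allaugust.okay.tr.fasta -V allpops.vcf -o "$prefix.vcf" -sn "$sample"
  run java -jar GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker \
    -R allaugust.okay.tr.fasta -o "$prefix.fasta" -V "$prefix.vcf"
}
```

With a hypothetical file samples.txt holding one sample name per line, the loop would be: while read -r s; do per_sample "$s"; done < samples.txt. Setting DRY_RUN=0 runs the commands instead of printing them.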

    Thank you,
    Suda

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @suda_ravindran
    Hi Suda,

    Try adding -env and -trimAlternates to your SelectVariants command.
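
For reference, here is a sketch of what the SelectVariants call could look like with those two flags added. In GATK 3, -env is the short form of --excludeNonVariants and -trimAlternates is the short form of --removeUnusedAlternates, so the per-sample VCF should keep only sites where the selected sample carries a variant allele. The wrapper function select_sample_cmd is hypothetical and only prints the command; the file names are the ones used earlier in this thread.

```shell
# Print a SelectVariants command for one sample with the suggested flags.
#   -env            = --excludeNonVariants     (drop sites that are non-variant in the selection)
#   -trimAlternates = --removeUnusedAlternates (drop ALT alleles absent from the selected genotypes)
select_sample_cmd() {
  sample="$1"   # sample name as it appears in the VCF header, e.g. G1.6
  out="$2"      # output VCF path, e.g. G16.vcf
  printf '%s ' java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R allaugust.okay.tr.fasta \
    -V allpops.vcf \
    -o "$out" \
    -sn "$sample" \
    -env \
    -trimAlternates
  printf '\n'
}
```

Calling select_sample_cmd G1.6 G16.vcf prints the full command line, which can be inspected and then pasted into the terminal.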

    -Sheila
