Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.


I have a multisample VCF file generated from GATK and I need to convert it into FASTA format for performing Multiple sequence alignments.

Briefly, my protocol is:

1) Convert the multisample VCF to single sample VCF (using GATK, SelectVariants option)
2) Convert the single sample VCF to FASTA for each genotype (using GATK, FastaAlternateReferenceMaker option)
3) Sequence alignments

After two weeks of trying this out, I figured that all steps work fine except for Step 2 of the protocol above. GATK does actually convert the
VCF to FASTA but it doesn't take into consideration the genotype (GT) or allele depth (AD) information which is actually the deciding factor for the differences in alleles of the different genotypes. In the end, I have the same fasta file for all the genotypes.

I am not sure if this is the right way to approach this problem. Do you have any better suggestions for performing this task?


  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin


    Can you post your command for SelectVariants?


  • suda_ravindransuda_ravindran HamburgMember

    Hi Sheila,

    I used this command for SelectVariants (I am just showing an example for extracting my sample called "G1.6"):

    java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R allaugust.okay.tr.fasta \
    -V allpops.vcf \
    -o G16.vcf \
    -sn G1.6

    This was my FastaAlternateReferenceMaker command:

    java -jar GenomeAnalysisTK.jar \
    -T FastaAlternateReferenceMaker \
    -R allaugust.okay.tr.fasta \
    -o G16.fasta \
    -V G16.vcf \

    I used the same command for all my other genotypes. I have 24 gentoypes in total, coming from four different populations. Each population has 6 genotypes.

    Thank you,

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    Hi Suda,

    Try adding -env and -trimAlternates to your command.


  • nkobmoonkobmoo ParisMember

    Hi @suda_ravindran ,

    Have you solved the problem ? I encountered also a similar problem.

  • HovhannesSahakyanHovhannesSahakyan Tartu, EstoniaMember

    @Sheila said:
    Hi Suda,

    Try adding -env and -trimAlternates to your command.


    Hello Sheila,

    I wanted to do the same with mtDNA. While using your suggestions I get the following error:

    INFO 17:09:16,431 GenomeAnalysisEngine - Strictness is SILENT
    INFO 17:09:16,534 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 17:09:16,690 GenomeAnalysisEngine - Preparing for traversal
    INFO 17:09:16,695 GenomeAnalysisEngine - Done preparing for traversal
    INFO 17:09:16,696 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 17:09:16,696 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 17:09:16,704 SelectVariants - Including sample 'HG00096'

    ERROR --
    ERROR stack trace

    org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: All samples must be diploid
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.getDiploidLikelihoodIndexes(GATKVariantContextUtils.java:688)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.determineDiploidLikelihoodIndexesToUse(GATKVariantContextUtils.java:647)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.fixDiploidGenotypesFromSubsettedAlleles(GATKVariantContextUtils.java:1421)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.updatePLsSACsAD(GATKVariantContextUtils.java:1403)
    at org.broadinstitute.gatk.tools.walkers.variantutils.SelectVariants.subsetRecord(SelectVariants.java:1080)
    at org.broadinstitute.gatk.tools.walkers.variantutils.SelectVariants.map(SelectVariants.java:854)
    at org.broadinstitute.gatk.tools.walkers.variantutils.SelectVariants.map(SelectVariants.java:309)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR MESSAGE: All samples must be diploid
    ERROR ------------------------------------------------------------------------------------------

    Can you please help to solve the problem?

  • HovhannesSahakyanHovhannesSahakyan Tartu, EstoniaMember

    I have an update on this issue. With GATK version 3.8.1 this error is eliminated. Nevertheless, all different one-sample mtDNA vcf files convert into exactly the same fasta sequences.

    Here are the commands:

    java -Xmx4G -jarGenomeAnalysisTK.jar -T SelectVariants -R hs37d5.fasta -V ALL.chrMT.phase3_callmom-v0_4.20130502.genotypes.vcf -o HG00096.vcf -sn HG00096 -env -trimAlternates

    java -Xmx4G -jar GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -L MT -R hs37d5.fasta -o HG00096.fasta -V HG00096.vcf

    Any help on this issue is appreciated!

    Issue · Github
    by bshifaw

    Issue Number
    Last Updated
    Closed By
  • bshifawbshifaw moonMember, Broadie, Moderator admin

    Hi @HovhannesSahakyan ,

    It seems like there is an issue with using SelectVariants on diploid samples for GATK V3.

    But there is a fix for this on GATK4, could you try running the command using GATK4?

  • HovhannesSahakyanHovhannesSahakyan Tartu, EstoniaMember

    Dear @bshifaw ,

    Thank you very much for your answer!

    I wrote in my previous post that SelectVariants now work with GATK 3.8.1. The bug was corrected with that version already. Nevertheless, I ran it with GATK4 as you mentioned.

    gatk --java-options "-Xmx4G" SelectVariants -R hs37d5.fasta -V ALL.chrMT.phase3_callmom-v0_4.20130502.genotypes.vcf -O HG00096.vcf -sn HG00096 --exclude-non-variants --remove-unused-alternates

    Then I ran FastaAlternateReferenceMaker with gatk3, because it seems there is no this tool in GATK4, is there?

    And the resulted fastas for the samples HG00096 and HG00097 look exactly the same as previously. So it does not matter with which version of GATK you'll run SelectVariants.

  • bshifawbshifaw moonMember, Broadie, Moderator admin

    I checked the Tool Docs, there doesn't seem to be FastaAlternateReferenceMaker in GATK4.

    Ahh, I miss read your previous post. Sounds like you were able to move past the first step which was creating single sample vcfs using SelectVariants. Now you are on the second step which is creating a alternate reference fasta using FastaAlternateReferenceMaker but the generated fasta are identical amongst the different fasta samples.

    Can you confirm that single sample vcfs were properly created by SelectVariants by checking the contents of the VCF files for different sample names and variants.

  • HovhannesSahakyanHovhannesSahakyan Tartu, EstoniaMember

    Thank you very much for your feedback!

    Yes, you are right. I am now in the second phase.

    I confirm that single sample VCF files are different for different samples, and the variant positions and substitutions are correct.

  • bshifawbshifaw moonMember, Broadie, Moderator admin

    @HovhannesSahakyan ,

    Would it be possible to share two of the vcfs files you are having trouble with so that we could try to debug it on our end?

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @HovhannesSahakyan

    Please follow this link for information on sharing your data: https://software.broadinstitute.org/gatk/documentation/article.php?id=1894


  • HovhannesSahakyanHovhannesSahakyan Tartu, EstoniaMember
    edited October 2018

    Dear @bshifaw and @bhanuGandham ,

    Can't upload the file. After successful login it gives the following error "550 example_vcfs_hovhannes.tar.gz: Overwrite permission denied".

    Meanwhile I managed to get fastas.

    First, I follow @bshifaw 's previous suggestion to use gatk4 and make single-sample vcf files. Previously I told that gatk3.8.1 does it, but sorry I was wrong.

    gatk --java-options "-Xmx4G" SelectVariants -R hs37d5.fasta -V ALL.chrMT.phase3_callmom-v0_4.20130502.genotypes.vcf -O HG00096.vcf -sn HG00096 --exclude-non-variants --remove-unused-alternates

    Then, I converted the single sample vcf into fasta with bcftools consensus. Vcf must be bgziped and indexed with tabix.

    samtools faidx hs37d5.fasta MT | bcftools consensus -I -o HG00096.fasta --sample HG00096 HG00096.vcf.gz

    FastaAlternateReferenceMaker of gatk3 still generates reference fastas for all vcfs (generated by gatk4).

  • bshifawbshifaw moonMember, Broadie, Moderator admin

    Thanks for sharing your solution!

Sign In or Register to comment.