Multiple input generate Multiple output

Dear GATK team,

As far as I know, we can pass several inputs to a single GATK call, and the results are written to a SINGLE output file with one column per input sample at the end of each record. My question is: is there a way to generate one output per input? For example, if I supply 100 samples as input, can GATK automatically generate 100 VCFs?

With the best regards

Best Answer


  • Sheila (Broad Institute; Member, Broadie admin)

    Hi Amin,

    Sorry. You will have to run HaplotypeCaller on each BAM file separately if you want a separate VCF output for each sample.
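
A minimal sketch of the per-BAM approach described in this answer, for readers who want to script it. It only builds the command lines; it does not run them. The command layout assumes GATK4 syntax (`gatk HaplotypeCaller -R ... -I ... -O ...`); GATK 3 releases, which were current when this thread was written, used `java -jar GenomeAnalysisTK.jar -T HaplotypeCaller` with a lowercase `-o` instead. The function name, file names, and output directory are hypothetical.

```python
from pathlib import Path


def haplotypecaller_commands(bam_paths, reference, out_dir):
    """Build one HaplotypeCaller command per BAM file (GATK4 syntax assumed).

    Returns a list of argument lists; nothing is executed here.
    """
    commands = []
    for bam in bam_paths:
        bam = Path(bam)
        # One output VCF per input BAM, named after the BAM file (hypothetical convention).
        out_vcf = Path(out_dir) / (bam.stem + ".vcf.gz")
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", str(reference),
            "-I", str(bam),
            "-O", str(out_vcf),
        ])
    return commands


if __name__ == "__main__":
    for cmd in haplotypecaller_commands(["sampleA.bam", "sampleB.bam"], "ref.fasta", "calls"):
        print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run` or written to a job script, so that every center runs the identical command with the same parameters.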


  • tommycarstensen (United Kingdom; Member ✭✭✭)

    @amin_davani I'm not cool enough to be part of the GATK team, but I'm pretty sure the answer to your question is no. You need to do that in a secondary step after variant calling. I'm not sure why one would want to do that...

  • amin_davani (Leuven; Member)
    edited August 2015

    @Sheila Thank you very much. I found a way to get what I need using bcftools, though I still need to validate the results.
    @tommycarstensen As @Geraldine_VdAuwera said, you are cool enough, and kind to reply. The reason I am testing this is that we developed a variant-sharing system named NGS-Logistics. At each center we use GATK to call variants directly from the BAM files, which guarantees that variants are called with the same parameters across centers. At the moment we run GATK once per sample. I thought that running GATK once per batch of samples would reduce the processing time, which is why I was looking for a solution.
    @Geraldine_VdAuwera What is the reason GATK cannot provide output like this?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I'm not sure I understand what you're trying to do, Amin. Have you read our Best Practices documentation? It explains why the workflow we recommend is the way it is.

  • amin_davani (Leuven; Member)

    @Geraldine_VdAuwera Yes, I have read that. Assume we have around 2000 exome samples in one center, 1000 in another, and so on. For a point query (Chr:Position) across the samples in the first center, I have to run GATK 2000 times. What we have in mind is to run GATK only once. I know we can pass several inputs to one GATK call, but the output will be a single file with 2000 sample columns at the end. That's fine, since we could parse those columns, but the main issue is the ALT column. We could work that out per sample from the sample's genotype information as well, but if GATK provided an option to accept several inputs and generate a separate output per input, it would be much easier, and of course the results would be more trustworthy.
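
For readers with the same need, here is a rough sketch of the post-processing idea discussed in the thread: take one multi-sample VCF and write a separate record set per sample, trimming the ALT column to the alleles that sample actually carries and renumbering its genotype to match. This is not a GATK feature and not necessarily what the poster did with bcftools; the function name is hypothetical. It keeps only the GT field, because other per-sample fields (for example AD or PL) are ordered by allele and would need their own re-indexing, and it blanks INFO because cohort-level annotations no longer describe a single sample. QUAL is copied unchanged and still reflects the multi-sample call. Calls split out of a multi-sample run may also differ from calls made on each sample alone.

```python
def split_by_sample(vcf_lines):
    """Split multi-sample VCF text lines into {sample_name: [single-sample VCF lines]}.

    ALT is trimmed to the alleles each sample carries and GT is renumbered to match.
    Records where a sample is hom-ref or uncalled are skipped for that sample.
    """
    meta, samples, out = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)  # meta-information lines are copied to every output
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            for s in samples:
                out[s] = meta + ["\t".join(fields[:9] + [s])]
            continue
        alts = fields[4].split(",")
        gt_index = fields[8].split(":").index("GT")
        for s, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            sep = "|" if "|" in gt else "/"
            alleles = gt.split(sep)
            # 1-based indices of the ALT alleles present in this sample's genotype
            carried = sorted({int(a) for a in alleles if a not in (".", "0")})
            if not carried:
                continue
            renumber = {old: new for new, old in enumerate(carried, start=1)}
            new_gt = sep.join(
                a if a in (".", "0") else str(renumber[int(a)]) for a in alleles
            )
            new_alt = ",".join(alts[i - 1] for i in carried)
            # CHROM POS ID REF | trimmed ALT | QUAL FILTER | INFO blanked | GT only
            record = fields[:4] + [new_alt] + fields[5:7] + [".", "GT", new_gt]
            out[s].append("\t".join(record))
    return out


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
        "1\t100\t.\tA\tC,T\t50\tPASS\tAC=1,2\tGT:DP\t0/1:10\t2/2:8",
    ]
    for name, lines in split_by_sample(demo).items():
        print(name)
        print("\n".join(lines))
```

In practice a dedicated tool is likely the safer route: `bcftools view` can subset a VCF to one sample with `-s` and drop unused ALT alleles with `-a` (`--trim-alt-alleles`), and it handles the allele-indexed fields that this sketch discards.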
