Multiple inputs generating multiple outputs

Dear GATK team,

As far as I know, we can pass several inputs to a single GATK call, and the results are written to a SINGLE output file with one column per input at the end of each record. My question is: is there a way to generate one output per input? For example, if I add 100 samples as input, can GATK automatically generate 100 VCFs?

With best regards

Best Answer


  • Sheila (Broad Institute; Member, Broadie admin)

    Hi Amin,

    Sorry. You will have to run HaplotypeCaller on each BAM file separately if you wish to have a single VCF output for each sample.


  • tommycarstensen (United Kingdom; Member)

    @amin_davani I'm not cool enough to be part of the GATK team, but I'm pretty sure the answer to your question is no. You need to do that in a secondary step after variant calling. I'm not sure why one would want to do that...

  • amin_davani (Leuven; Member)
    edited August 2015

    @Sheila Thank you very much. I found a way to get what I need using bcftools, though I still need to validate the results.
    @tommycarstensen As @Geraldine_VdAuwera said, you are cool enough, and kind to reply. The reason I am testing this is that we developed a system for variant sharing named NGS-Logistics. At each center we use GATK to call variants directly from the BAM files; this way we guarantee that variants are called with the same parameters across centers. For the moment we run GATK once per sample. I thought that running GATK once for each batch of a few samples would reduce the processing time, which is why I was looking for a solution.
    @Geraldine_VdAuwera What is the reason GATK cannot provide output like this?
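
The post above does not say which bcftools commands were used. One possible approach, sketched here under that caveat, is to call `bcftools view` once per sample with `-s` (select the sample) and `-a` (trim ALT alleles the selected sample does not carry). The sample names, file names, and the helper `per_sample_commands` are hypothetical; the sketch only builds and prints the commands rather than running them.

```python
import shlex


def per_sample_commands(samples, multi_vcf, out_dir):
    """Build one `bcftools view` command per sample (hypothetical helper).

    -s SAMPLE  keep only that sample's genotype column
    -a         trim ALT alleles not seen in the kept sample
    -O z -o    write bgzip-compressed VCF to the given path
    """
    commands = []
    for sample in samples:
        out_path = f"{out_dir}/{sample}.vcf.gz"
        commands.append(
            ["bcftools", "view", "-s", sample, "-a",
             "-O", "z", "-o", out_path, multi_vcf]
        )
    return commands


if __name__ == "__main__":
    # Placeholder sample and file names, for illustration only.
    for cmd in per_sample_commands(["sampleA", "sampleB"],
                                   "cohort.vcf.gz", "per_sample"):
        print(shlex.join(cmd))
```

Each printed command could then be run by hand or through `subprocess.run(cmd, check=True)`. As noted in the post, the per-sample results would still need to be validated against single-sample calls.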

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I'm not sure I understand what you're trying to do, Amin. Have you read our Best Practices documentation? It explains why the workflow we recommend is the way it is.

  • amin_davani (Leuven; Member)

    @Geraldine_VdAuwera Yes, I have read that. Assume we have around 2000 exome samples in one center, 1000 in another, and so on. For a point query (Chr:Position) across the samples located in the first center, I have to run GATK 2000 times. What we have in mind is to run GATK only once. I know we can pass several inputs to one GATK call, but the output will be a single file with 2000 sample columns at the end. That's fine, since we could parse those columns, but the main issue is the ALT column. We could recompute it from the corresponding sample's information, but if GATK provided an option to accept several inputs and generate a separate output per input, it would be much easier, and of course the results would be more trustworthy.
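
To make the ALT issue concrete: in a multi-sample VCF the ALT column lists every alternate allele seen in any sample, so a per-sample record has to drop the alleles that sample does not carry and renumber its genotype. The following is a minimal sketch of that recomputation, not something GATK provides. The function `extract_sample` is hypothetical, it keeps only the GT field (AD, PL and similar fields would need matching trimming), and it blanks INFO because site-level annotations such as AC no longer apply. QUAL is copied unchanged even though it comes from the joint call.

```python
import re


def extract_sample(line, sample_index):
    """Reduce one multi-sample VCF data line to a single sample.

    Keeps only GT, drops ALT alleles the sample does not carry, and
    renumbers the genotype. Returns None when the sample is hom-ref
    or uncalled at this site.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    gt = sample_values[format_keys.index("GT")]

    # Split "0/2" or "1|2" into allele indices, keeping the separators.
    tokens = re.split(r"([/|])", gt)
    used = sorted({int(t) for t in tokens if t.isdigit() and int(t) > 0})
    if not used:
        return None

    # Map old ALT indices to new ones, e.g. {2: 1} if only ALT #2 is carried.
    remap = {old: new for new, old in enumerate(used, start=1)}
    remap[0] = 0
    new_gt = "".join(str(remap[int(t)]) if t.isdigit() else t for t in tokens)
    new_alts = ",".join(alts[old - 1] for old in used)

    # CHROM, POS, ID, REF, trimmed ALT, QUAL, FILTER, INFO, FORMAT, sample
    return "\t".join(fields[:4] + [new_alts] + fields[5:7] + [".", "GT", new_gt])


# Made-up record with two ALT alleles and three samples.
record = "chr1\t100\t.\tA\tC,T\t50\tPASS\tAC=1,1\tGT:DP\t0/1:10\t0/2:12\t0/0:9"
print(extract_sample(record, 1))
```

For the second sample, which carries only `T`, the ALT column becomes `T` and the genotype `0/2` is renumbered to `0/1`. The third sample is hom-ref, so the function returns `None` for it.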

