New to the forum? Ask your questions here!

Tiffany_at_Broad (Cambridge, MA; Member, Broadie, Moderator, admin)
edited November 2018 in Ask the GATK team

We are facing a high volume of spam that we are working to prevent. Due to this, new users are only able to comment on current question threads, not create new ones. Please post your GATK question here if this applies to you. We apologize for this inconvenience and should have this resolved in the next few days.



  • Dear GATK-Team,

    I want to use CombineGVCFs to combine 242 samples into one file for analysis with GenotypeGVCFs and SelectVariants.

    I think I'm missing something obvious, but how do I do that in the command without entering all 242 file names manually?
    I tried (which did not work):

        java -jar GenomeAnalysisTK.jar \
           -T CombineGVCFs \
           -R reference.fasta \
           --variant *.g.vcf \
           -o output.g.vcf

    Also, a 'for' or 'while' loop does not seem applicable here. So how can I do this without entering all 242 file names manually?

    Thank you for your time and ideas.
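    A minimal sketch of two workarounds, not taken from the original answer. The `--variant *.g.vcf` attempt fails because the shell expands the glob to `--variant a.g.vcf b.g.vcf ...`, so only the first file is bound to `--variant` and the remaining names arrive as bare arguments GATK does not accept. One option is a text file listing the gVCFs (GATK 3 treats a `--variant` value ending in `.list` as a list of inputs; check the docs for your version); the other is a loop that builds `--variant <file>` pairs in a bash array. The function names `write_gvcf_list` and `build_variant_args`, and the one-directory layout, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pass many gVCFs to CombineGVCFs without typing each name.
# Assumption: all inputs live in one directory and end in ".g.vcf".

# write_gvcf_list DIR OUT: write every DIR/*.g.vcf path, one per line, to OUT.
# GATK 3 reads a --variant value ending in ".list" as a list of input files.
write_gvcf_list() {
  local dir=$1 out=$2
  local files
  shopt -s nullglob                      # an unmatched glob expands to nothing
  files=( "$dir"/*.g.vcf )
  shopt -u nullglob
  if (( ${#files[@]} == 0 )); then
    echo "no .g.vcf files found in $dir" >&2
    return 1
  fi
  printf '%s\n' "${files[@]}" > "$out"
}

# build_variant_args DIR: fill the global array VARIANT_ARGS with one
# "--variant <file>" pair per gVCF; this is the loop-based alternative.
build_variant_args() {
  local dir=$1 f
  VARIANT_ARGS=()
  shopt -s nullglob
  for f in "$dir"/*.g.vcf; do
    VARIANT_ARGS+=( --variant "$f" )
  done
  shopt -u nullglob
}
```

    With the list file, the call becomes:

        write_gvcf_list . gvcfs.list
        java -jar GenomeAnalysisTK.jar -T CombineGVCFs -R reference.fasta \
           --variant gvcfs.list -o output.g.vcf

    With the array, run `build_variant_args .` and put `"${VARIANT_ARGS[@]}"` where `--variant *.g.vcf` was; the quoting keeps each file name a single argument.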

  • Dear GATK team,
    I'm having trouble applying hard filters to a call set. I (1) extracted the SNPs from the call set and then wanted to (2) apply the filter to the SNP call set, so I typed:

        ./gatk- VariantFiltration -V raw_snps.vcf -R /home/path/to reference.fa -O filtered_snps.vcf --"QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"

    But I got the error message: "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" IS NOT A RECOGNIZED OPTION. Can you help me with this? Thank you in advance!
    All the best,
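    The error comes from the last argument: `--"QD < 2.0 ..."` glues `--` onto the JEXL expression, so GATK reads the whole expression as the name of an option it does not know. A sketch of the corrected call, assuming GATK 4.x, where the expression is passed to `--filter-expression` (short name `-filter`) and a label to `--filter-name`. The label `snp_hard_filter` and the file paths are placeholders. Keeping the command in a bash array guarantees the expression reaches GATK as one argument.

```shell
#!/usr/bin/env bash
# Sketch of a GATK4 VariantFiltration call (paths and filter name are placeholders).
# Assumes the gatk wrapper is on PATH; use ./gatk if running from the install dir.

# The JEXL expression must reach GATK as a single argument, so keep it quoted.
SNP_FILTER='QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'

# Store the command in an array: each element is exactly one argument.
filter_cmd=(
  gatk VariantFiltration
  -R reference.fa
  -V raw_snps.vcf
  -O filtered_snps.vcf
  --filter-expression "$SNP_FILTER"
  --filter-name snp_hard_filter
)

# Print each argument on its own line to check it; run it with: "${filter_cmd[@]}"
printf '%s\n' "${filter_cmd[@]}"
```

    Sites matching the expression get the filter name written to their FILTER column; the others are marked PASS.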

  • @Geraldine_VdAuwera
    Thank you very much, Geraldine, for your fast response and help. I will try it with the text file and will look into GenomicsDB.

    All the best,

  • @Akidne Many thanks! That works!

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    @Rita_S Welcome to the forum, we have received your question and will get back to you.

  • zongji (Member)

    Hi there,

    Is it possible to use the set of SNPs generated by GATK to directly generate the site frequency spectrum (SFS)? ANGSD has this function, but its input is BAM files.

    Can someone tell me how to calculate SFS using the results of GATK?

    Thanks a lot.

  • Several Cromwell newbie questions here:

    1. Are out-of-memory errors retryable? This is unfortunately a common problem in my multi-user HPC environment, e.g. if a competing job on the same node as my task didn't request sufficient slots and squeezed out my task as a result. From my local testing, killing a task command manually from a terminal does not result in a retry, but I haven't had the chance to force an OOM failure yet.

    2. Is there a recommended way to write task definitions so that they can support multiple backends (including Local), without getting the log warning about "unrecognized runtime attribute"? Or do tasks and workflows always need to be backend-specific? The current behavior (at least with my rough workflows) seems too noisy and inflexible.

    3. When I tried to use the SGE backend, I quickly ran into problems with executables not being on the PATH. (A) Is there a way to propagate $PATH for a workflow? (B) The task actually just hangs and never completes, rather than registering the failure (or even writing the error to stderr or stdout). Is this a known bug? (Using 35-71debed-SNAP, if it matters, although what I actually did to build it was "git checkout 36".)

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    @natechols - Thank you for sharing your question. We did not have a link to the WDL/Cromwell forum in our banner, but it can be found here:

    I would recommend posting your question there to get a faster response. Let us know if you have any trouble doing so.

  • @AdelaideR I posted here because the WDL/Cromwell forum won't let new users post questions.

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    @natechols - We have a restriction on new users with fewer than 10 posting points. You are at 7 currently. Maybe if we have a little back and forth, you will be able to post after your next response?

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev, admin)

    @natechols -

    1. Not at present, but this is certainly on our radar. Unless @Ruchi can contradict me... did your "always retry this job up to n times" apply to SGE memory failures?
      1.b. Not terminating certainly sounds like a bug... could you provide more detail on what caused it?
    2. If you know that you've written a task to be multi-backend then it's fine to ignore those "unrecognized" warnings. They're intended to help you catch typos, say if you had dockar: "ubuntu:latest" for example.
    3. When you submit jobs to SGE without Docker, I believe you're relying on whatever environment the cluster has set up for you. If relying on that environment is acceptable, you might just need to use a fully qualified path to the tool you want to use.

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    @zongji I found a reference to this issue in a [previous forum posting] (

    They recommended RTGTools for calculating allele frequency from a VCF file.

    I would be interested to hear how this works out. We have the JEXL expression tool in GATK, but it may not be exactly what you are looking for if you care about the spectrum, and not just filtering on allele frequency.
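    For a rough first look without leaving the shell, a folded SFS can be tallied directly from the genotype columns of a GATK VCF. This is a minimal sketch, not a GATK or RTG Tools feature, and it assumes: an uncompressed VCF, biallelic SNPs only, diploid `GT` as the first FORMAT field, and that any site with a missing allele is skipped rather than projected down. Folding (counting the minor allele) is used because REF is not necessarily the ancestral allele. The function name `folded_sfs` is hypothetical.

```shell
#!/usr/bin/env bash
# folded_sfs VCF: print "minor_allele_count<TAB>number_of_sites" for each
# bin 1..n/2 of an uncompressed VCF. Assumptions: biallelic sites, diploid
# GT as the first FORMAT field; sites with any missing allele are skipped.
folded_sfs() {
  awk -F '\t' '
    /^#/ { next }                         # skip header lines
    $5 ~ /,/ { next }                     # skip multiallelic sites
    {
      alt = 0; n = 0; missing = 0
      for (i = 10; i <= NF; i++) {
        split($i, fmt, ":")               # GT is the first colon-separated field
        ng = split(fmt[1], a, "[/|]")     # unphased "/" or phased "|"
        for (j = 1; j <= ng; j++) {
          if (a[j] == ".") missing = 1
          else { n++; if (a[j] == "1") alt++ }
        }
      }
      if (missing || n == 0) next
      minor = (alt <= n - alt) ? alt : n - alt
      if (minor > 0) count[minor]++       # monomorphic sites are not counted
      if (n > max_n) max_n = n
    }
    END {
      for (k = 1; k <= int(max_n / 2); k++) printf "%d\t%d\n", k, count[k] + 0
    }' "$1"
}
```

    Run it as `folded_sfs filtered_snps.vcf`; for a bgzipped file, `folded_sfs <(zcat calls.vcf.gz)` works because the function only needs a readable path. Dedicated tools handle missing data and unfolding properly, so treat this as a sanity check.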

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    @natechols We have added another link to the banner for the WDL/Cromwell section of the forum.

    Here is the link

    Thank you for your patience.

  • Ruchi (Member, Broadie, Moderator, Dev, admin)

    @ChrisL @natechols -- the SGE memory failures should certainly be retried by maxRetries. Please report back if you don't see that working!

  • Thanks all for the help, I will follow up with further questions on the other board. @ChrisL my complaint about (2) is that this makes debugging more difficult because there will always be a bunch of spurious warning messages. Obviously I can modify the code to change the log level, but I'd prefer to avoid using custom builds in production if possible.

  • AdelaideR (Unconfirmed, Member, Broadie, Moderator, admin)

    Thanks @natechols. Please move this discussion over to the Cromwell forum so that they see your questions right away; this is the GATK forum, and they may not see your request here. @ChrisL, if you could keep an eye on this thread and help get it into the Cromwell space for future reference, it would be much appreciated.

  • jparker4 (Member)

    Hi GATK team, I posted here last Friday but my post seems to have disappeared. I am having a problem when running GenomicsDBImport on a set of gVCF files.

    The error in question is: A USER ERROR has occurred: Failed to create reader from file:///mnt/fastdata/mbp15jdp/GTEx/SRA-private/SCAPT_gVCF/Variantcalls.dir/Frontal_Cortex-5074-NPJ8.g.vcf.gz

    And the command I am running is:

        gatk --java-options -Xmx10G GenomicsDBImport -V Variantcalls.dir/Frontal_Cortex-5074-NPJ8.g.vcf.gz -V Variantcalls.dir/Frontal_Cortex-4565-NPJ8.g.vcf.gz -V Variantcalls.dir/Frontal_Cortex-4996-NPJ8.g.vcf.gz -V Variantcalls.dir/Frontal_Cortex-4005-NPJ8.g.vcf.gz -V Variantcalls.dir/Frontal_Cortex-4435-NPJ8.g.vcf.gz -V Variantcalls.dir/Cerebellar_Hemisphere-4643-NPJ8.g.vcf.gz -V

    Any help would be much appreciated, thanks.

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator, admin)

    Hi @jparker4

    Let's handle this issue here.
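    While the cause is being tracked down, one thing worth ruling out is an input that is missing, unreadable, or lacks its tabix index (`.tbi`) next to it, since GenomicsDBImport reads block-compressed gVCFs through their index. This is a check, not a diagnosis of the error above. The sketch also writes a sample map, so the long chain of `-V` arguments can be replaced by GATK4's `--sample-name-map` option (a tab-separated file of sample name and path). Taking the sample name from the file name is an assumption for illustration; the names must be unique. The function name `make_sample_map` is hypothetical.

```shell
#!/usr/bin/env bash
# make_sample_map DIR OUT: check every DIR/*.g.vcf.gz and write a
# GenomicsDBImport sample map (sample<TAB>path) to OUT.
# Assumption: the sample name is the file name minus ".g.vcf.gz".
# Returns non-zero if any file is unreadable or has no .tbi index.
make_sample_map() {
  local dir=$1 out=$2 f name status=0
  : > "$out"                              # start with an empty map
  shopt -s nullglob
  for f in "$dir"/*.g.vcf.gz; do
    if [ ! -r "$f" ]; then
      echo "unreadable: $f" >&2; status=1; continue
    fi
    if [ ! -e "$f.tbi" ]; then
      echo "missing index: $f.tbi" >&2; status=1; continue
    fi
    name=$(basename "$f" .g.vcf.gz)
    printf '%s\t%s\n' "$name" "$f" >> "$out"
  done
  shopt -u nullglob
  return "$status"
}
```

    If the check passes, the import could then look like this (`my_database` and `intervals.list` are placeholders; GenomicsDBImport needs a workspace path and intervals):

        make_sample_map "$PWD/Variantcalls.dir" cohort.sample_map
        gatk --java-options "-Xmx10G" GenomicsDBImport \
           --sample-name-map cohort.sample_map \
           --genomicsdb-workspace-path my_database \
           -L intervals.list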

