SelectVariants produces empty files

nikkinath (Germany, Member)

I have genome sequencing data for 8 samples, each under a different condition. The goal is to identify variants for each sample. I followed the GATK Best Practices for variant calling (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145).
For variant calling I tried different combinations:

  1. HaplotypeCaller -> GenotypeCaller -> SelectVariants
  2. GenotypeCaller -> HaplotypeCaller -> SelectVariants
  3. GenotypeCaller -> HaplotypeCaller -> SamSort -> SelectVariants
  4. GenotypeCaller -> HaplotypeCaller -> SamSort -> SelectVariants(Discovery option)
  5. GATK 3.4 and GATK 3.8
  6. HaplotypeCaller -> GenotypeCaller -> VCFTools

There are no error messages. SelectVariants appears to go through the whole file but produces empty output.

When the runs do produce some output, the data shrinks from 300 GB (the VCF file from HaplotypeCaller) to 2 GB (the VCF file from SelectVariants). In this case, one sample ends up with only a limited number of SNVs, which is a problem for downstream analysis.

I am unsure if there is some parameter that should be included for the genome data.
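
One quick way to tell a truly empty output (header only) from one that is merely smaller than expected is to count the non-header records. The following is a minimal sketch using only the Python standard library; the function name `count_vcf_records` is hypothetical, and it assumes a plain-text VCF or a gzip/bgzip-compressed one whose name ends in `.gz`.

```python
import gzip


def count_vcf_records(path):
    """Return (header_lines, variant_records) for a .vcf or .vcf.gz file."""
    # bgzip output is gzip-compatible, so gzip.open can read it as text.
    opener = gzip.open if str(path).endswith(".gz") else open
    header_lines = 0
    records = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                # Meta-information lines (##...) and the #CHROM column line.
                header_lines += 1
            elif line.strip():
                # Any other non-blank line is one variant record.
                records += 1
    return header_lines, records
```

A result with a non-zero header count and zero records suggests the tool wrote a valid header but every record was dropped by the selection criteria, which is a different situation from a failed or truncated run.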

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
    edited April 3

    Hi @nikkinath

    1) The versions of GATK you are using, 3.4 and 3.8, are very old. We are now on GATK 4.1.1.0. I suggest you try the latest version and see if the issue persists.
    2) If you still see the problem after updating, please reach out to us with the exact commands you are using and we will help you out.

  • I ran GATK HaplotypeCaller without GVCF mode and got a correct output VCF file. I want to do joint genotyping for BSA analysis with QTLseqr, but GATK GenotypeGVCFs only accepts GVCF input. I cannot run HaplotypeCaller in GVCF mode because it takes too much time, even if I run the alignment file in intervals. My question: how can I use the joint genotyping command on the VCF file?
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @Sher_Afzal_Khan

    You can only run GenotypeGVCFs on GVCF files.

    A workaround would be to use HaplotypeCaller from the latest GATK version, 4.1.1.0. As mentioned in the release notes, there is a substantial (~33%) speedup to HaplotypeCaller in GVCF mode (-ERC GVCF) in that release.
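
For reference, the GVCF route discussed in this thread can be sketched as a sequence of GATK4 command lines: HaplotypeCaller in GVCF mode per sample, CombineGVCFs, GenotypeGVCFs, and then SelectVariants per sample. The Python helpers below only build the argument lists and do not run anything. All function names, file names, and sample names are placeholders chosen for illustration, and the options should be checked against the documentation for the GATK version actually in use.

```python
import shlex


def haplotypecaller_gvcf_cmd(reference, bam, out_gvcf, interval=None):
    """HaplotypeCaller in GVCF mode (-ERC GVCF), optionally limited to one interval."""
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
           "-O", out_gvcf, "-ERC", "GVCF"]
    if interval:
        cmd += ["-L", interval]
    return cmd


def combine_gvcfs_cmd(reference, gvcfs, out_gvcf):
    """Merge per-sample GVCFs into one multi-sample GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs_cmd(reference, gvcf, out_vcf):
    """Joint genotyping; the input must be a GVCF, not a regular VCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", gvcf, "-O", out_vcf]


def select_sample_cmd(reference, vcf, sample, out_vcf):
    """Extract one sample and drop sites where that sample carries no variant."""
    return ["gatk", "SelectVariants", "-R", reference, "-V", vcf,
            "-sn", sample, "--exclude-non-variants", "-O", out_vcf]


if __name__ == "__main__":
    ref = "ref.fasta"                      # placeholder reference
    samples = ["sampleA", "sampleB"]       # placeholder sample names
    gvcfs = [f"{s}.g.vcf.gz" for s in samples]
    steps = [haplotypecaller_gvcf_cmd(ref, f"{s}.bam", g)
             for s, g in zip(samples, gvcfs)]
    steps.append(combine_gvcfs_cmd(ref, gvcfs, "cohort.g.vcf.gz"))
    steps.append(genotype_gvcfs_cmd(ref, "cohort.g.vcf.gz", "cohort.vcf.gz"))
    steps += [select_sample_cmd(ref, "cohort.vcf.gz", s, f"{s}.vcf.gz")
              for s in samples]
    for cmd in steps:
        print(shlex.join(cmd))
```

Building the commands as lists keeps each argument separate, so they can be passed to `subprocess.run` later without shell quoting problems, and printing them first makes it easy to share the exact commands when asking for help.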
