SelectVariants produces empty files
I have genome sequencing data for 8 samples, each from a different condition. The goal is to identify the variants in each sample. I followed the GATK Best Practices for variant calling (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145).
For variant calling, I tried several combinations:
- HaplotypeCaller -> GenotypeCaller -> SelectVariants
- GenotypeCaller -> HaplotypeCaller -> SelectVariants
- GenotypeCaller -> HaplotypeCaller -> SamSort -> SelectVariants
- GenotypeCaller -> HaplotypeCaller -> SamSort -> SelectVariants (Discovery option)
- Both GATK 3.4 and GATK 3.8
- HaplotypeCaller -> GenotypeCaller -> VCFtools
There are no error messages. SelectVariants appears to process the whole file but produces empty output.
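To tell a truly empty result (header only) apart from a very small one, the non-header records of the output VCF can be counted and the sample columns listed. A minimal Python sketch, assuming the file is either plain text or gzip/BGZF-compressed; `open_vcf` and `summarize_vcf` are hypothetical helpers, not part of GATK.

```python
import gzip
from typing import List, Tuple


def open_vcf(path: str):
    """Open a VCF as text, transparently handling gzip/BGZF compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic bytes; BGZF is gzip-compatible
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_vcf(path: str) -> Tuple[List[str], int]:
    """Return (sample names from the #CHROM line, number of data records)."""
    samples: List[str] = []
    n_records = 0
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns after FORMAT (index 8) are sample names.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_records += 1  # any non-blank, non-header line is a record
    return samples, n_records
```

For example, `summarize_vcf("sample1.selected.vcf.gz")` (hypothetical file name) returning a zero record count means only the header was written, and the sample list shows whether the expected sample column is present at all.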
In the runs where the output is not empty, it is drastically reduced: from 300 GB (the VCF file from HaplotypeCaller) to 2 GB (the VCF file from SelectVariants). In that case, one sample ends up with very few SNVs, which is a problem for downstream analysis.
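To see whether the low SNV count is specific to one sample, the genotypes in the multi-sample VCF can be tallied per sample. A minimal Python sketch, assuming the VCF is supplied as an iterable of text lines; `count_snvs_per_sample` is a hypothetical helper, and its SNV definition (single-base REF and a single-base A/C/G/T ALT that appears in the sample's GT) is an assumption, not GATK's own variant-type classification.

```python
import re
from typing import Dict, Iterable, List


def count_snvs_per_sample(lines: Iterable[str]) -> Dict[str, int]:
    """Count, per sample, records where the sample's GT carries a
    single-nucleotide ALT allele. Records with a multi-base REF are skipped,
    so SNVs represented together with a deletion are not counted."""
    samples: List[str] = []
    counts: Dict[str, int] = {}
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            counts = {s: 0 for s in samples}
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this record
        ref, alts = fields[3], fields[4].split(",")
        fmt = fields[8].split(":")
        if "GT" not in fmt or len(ref) != 1:
            continue
        gt_idx = fmt.index("GT")
        for name, sample_field in zip(samples, fields[9:]):
            values = sample_field.split(":")
            if gt_idx >= len(values):
                continue
            # GT alleles are separated by "/" (unphased) or "|" (phased).
            for a in re.split(r"[/|]", values[gt_idx]):
                if a.isdigit() and 0 < int(a) <= len(alts):
                    alt = alts[int(a) - 1].upper()
                    if len(alt) == 1 and alt in "ACGT":
                        counts[name] += 1
                        break  # count each record at most once per sample
    return counts
```

For a bgzipped file, the line source can be `gzip.open(path, "rt")`; running the function on the HaplotypeCaller output and on the SelectVariants output shows at which step a given sample loses its SNVs.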
Is there a parameter I should include when running SelectVariants on genome data?