Best practices for calling variants in RNA-seq
I have been trying to follow your recommendations for processing RNA-seq data. For the most part the recommendations are easy to follow and implement (thank you!), BUT I've hit a snafu.
I am attempting to run BaseRecalibrator (GATK v3.5) on a non-model organism for which there is no set of known variants. The docs for this tool specifically state that --knownSites is optional, with a default of NA:
--knownSites NA A database of known polymorphic sites,
yet when I try to run without specifying this option I get this error:
ERROR MESSAGE: Invalid command line: This calculation is critically dependent on being able to mask out known variant sites. Please provide a VCF file containing known sites of genetic variation.
suggesting that a VCF file of known sites is required. I would think there should be a way to recalibrate for machine artifacts without knowing the variants in advance. Why is this supposedly optional VCF input apparently required, and is there a way to recalibrate these BAM files without giving the program known variants?
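
For reference, an invocation of roughly the following shape leads to the situation described above. This is only a sketch: the file names are hypothetical placeholders, and it assumes the standard GATK 3 engine arguments (`-T`, `-R`, `-I`, `-o`) with no known-sites argument passed. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own reference and BAM file.
REF="reference.fasta"
BAM="rnaseq.split.bam"
OUT="recal_data.table"

# GATK 3.x BaseRecalibrator call with no known-sites argument supplied.
CMD="java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R $REF -I $BAM -o $OUT"

# Print the command so it can be inspected; to run it, use: eval "$CMD"
echo "$CMD"
```

Running the printed command against GATK v3.5 is what produces the "Invalid command line" error quoted above.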