Best practices for calling variants in RNA-seq
I have been trying to follow your recommendations for processing RNA-seq data. For the most part the recommendations are easy to follow and implement (thank you!), BUT I've hit a snafu:
I'm attempting to run BaseRecalibrator (GATK v3.5) on a non-model organism for which there is no set of known variants. The docs for this tool specifically state that --knownSites is optional, with a default of NA:
--knownSites NA A database of known polymorphic sites,
yet when I try to run without specifying this option I get this error:
ERROR MESSAGE: Invalid command line: This calculation is critically dependent on being able to mask out known variant sites. Please provide a VCF file containing known sites of genetic variation.
This suggests that a VCF file of known sites is in fact required. I would expect there to be a way to recalibrate for machine artifacts without knowing the variants in advance. Why is this supposedly optional VCF input apparently not optional, and is there a way to recalibrate these BAM files without supplying known variants?
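For reference, here is a minimal sketch of the kind of invocation described above, in GATK 3.x syntax with no known-sites VCF. The file names (reference.fasta, sample.bam, recal_data.table) are hypothetical placeholders, not my actual paths. The script only prints the command; uncommenting the last line runs it, which with GATK v3.5 gives the error quoted above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder inputs (hypothetical names): reference genome and RNA-seq BAM.
REF="reference.fasta"
BAM="sample.bam"

# GATK 3.x BaseRecalibrator call with no -knownSites argument.
cmd=(java -jar GenomeAnalysisTK.jar
     -T BaseRecalibrator
     -R "$REF"
     -I "$BAM"
     -o recal_data.table)

# Print the command line for inspection.
printf '%s\n' "${cmd[*]}"

# Uncomment to run; with GATK v3.5 this fails with the
# "critically dependent on being able to mask out known variant sites" error.
# "${cmd[@]}"
```

Adding `-knownSites` with a VCF path to this command is what the error message asks for.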