post-transcriptional modifications in RNA-seq base recalibration Best Practices?
Are the authors of GATK concerned about Post-transcriptional Modifications to mRNA negatively impacting Base Recalibration and Variant Calling via GATK RNA-seq Best practices?
Reverse Transcriptases have difficulty correctly reading modified nucleotides. The Reverse Transcriptase may produce an error at a modified nucleotide when making the cDNA. Illumina will then read the resulting cDNA correctly (and give high quality score). Thus, even though the Illumina reads are correctly reporting the base in the cDNA (with high quality scores), it will be "wrong" compared to the reference, and not masked by dbSNP since it is only a Post-transcriptional modification. This will severely reduce the resulting empirical quality scores calculated by BaseRecalibrator.
For DNA-seq, BaseRecalibrator masks "--knownSites" of polymorphism when calculating empirical Quality scores. The "--knownSites" is usually a VCF from e.g. dbSNP.
In RNA-seq, do the authors of GATK recommend any kind of VCF with known Post-transcriptional modifications in mRNA?