BQSR for RNA-seq data

I am calling variants in my RNA-seq data using the GATK Best Practices pipeline. I have made it through step 3 (split and trim, and reassign mapping qualities). The next step, indel realignment, was labelled as optional, so I skipped it. Now I am at base recalibration and I am having trouble understanding the tool documentation, which says to run the following command:

java -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R reference.fasta \
-I my_reads.bam \
-knownSites latest_dbsnp.vcf \
-o recal_data.table

My question is: how am I supposed to make the latest_dbsnp.vcf file that is passed to -knownSites? Also, when this step has finished, how do I use recal_data.table to get a recalibrated BAM file for variant calling in the next step? Thank you very much for your help.

Best Answer
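
The known-sites file is normally downloaded rather than created. For human data, dbSNP VCFs matching the reference build are distributed in the GATK resource bundle. For the second part of the question, the usual GATK 3 follow-up is to pass the recalibration table to PrintReads with -BQSR. Below is a minimal sketch of that step, not a definitive pipeline. It reuses the file names from the command above; the output name recal_reads.bam, the apply_bqsr function, and the DRY_RUN switch are assumptions made for illustration. By default it only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch: apply a BQSR table with GATK 3 PrintReads.
# File names follow the BaseRecalibrator command in the question.
GATK_JAR="GenomeAnalysisTK.jar"
REF="reference.fasta"
IN_BAM="my_reads.bam"
RECAL_TABLE="recal_data.table"   # written by BaseRecalibrator
OUT_BAM="recal_reads.bam"        # hypothetical output name

# apply_bqsr is a hypothetical helper, not part of GATK.
apply_bqsr() {
    # Build the PrintReads command as positional parameters.
    set -- java -jar "$GATK_JAR" \
        -T PrintReads \
        -R "$REF" \
        -I "$IN_BAM" \
        -BQSR "$RECAL_TABLE" \
        -o "$OUT_BAM"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): print the command only.
        echo "$*"
    else
        # Real run: execute GATK.
        "$@"
    fi
}

apply_bqsr
```

Running it with DRY_RUN=0 executes the command and writes recal_reads.bam, which is the file to give to the variant caller in the next step.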