How can I run BaseRecalibrator with an empty VCF file?

Dear all,
I have aligned my sequences against a made-up reference genome composed of several different genomes. Now I need to re-map (BQSR) the alignments using GATK. The command to do so is:
gatk BaseRecalibrator \
-R {ref}.fa \
-I {deduplicated_alignment}.bam \
-O {deduplicated_alignment}_recalibration.table \
--known-sites {ref}.vcf

Since the reference genome is essentially fake, there is no data on its genetic variability (or rather: it would take years to track down all the publications on the genetic variability of the many genomes I have pasted together).
Can I run this step WITHOUT the VCF file? Or does it make sense to create an empty VCF file, and in that case, what values should I give to the different columns of the file?
Thank you
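For reference, a VCF with no variant records still needs a header: the mandatory ##fileformat line and the tab-separated column line (#CHROM POS ID REF ALT QUAL FILTER INFO). Because there are no data rows, no column values have to be chosen at all. The sketch below writes such a file. The file name empty_known_sites.vcf is hypothetical, and whether a given GATK version accepts a header-only known-sites file without further metadata (for example ##contig lines matching the reference) is an assumption you would need to check. Keep in mind that with no known sites, every mismatch against the reference is still counted as an error.

```shell
#!/usr/bin/env bash
# Sketch: write a header-only VCF (no variant records) as a placeholder known-sites file.
# The file name is hypothetical; adapt it to your workflow.
set -euo pipefail

vcf="empty_known_sites.vcf"

# The ##fileformat line is mandatory; the column header line must be tab-separated.
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"

# GATK normally expects an index next to a known-sites file.
# Assumption: recent GATK 4 releases take -I here; older 4.0.x releases used -F.
# Only attempted if gatk is on the PATH.
if command -v gatk >/dev/null 2>&1; then
  gatk IndexFeatureFile -I "$vcf"
fi
```

The resulting file could then be passed as --known-sites empty_known_sites.vcf in the BaseRecalibrator command above.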

Answers

  • bhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @Gigiux

    The BQSR algorithm treats every mismatch against the reference as an indication of sequencing error. However, real genetic variation is expected to mismatch the reference, so it is critical to give the tool a database of known polymorphic sites (known sites) so that it can skip over those sites. If you gave it an empty VCF file, real variants would be counted as errors, which defeats the purpose of running BQSR.

    Regards
    Bhanu
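A toy illustration of the point above: the sketch counts mismatches per reported quality score and converts the mismatch rate into an empirical Phred-scaled quality, skipping positions listed as known sites. It is not GATK's implementation. The input files, positions, and the +1/+2 smoothing in the estimate are hypothetical choices for this example, the counts are deliberately tiny, and real BQSR also models covariates such as read group, machine cycle, and sequence context.

```shell
#!/usr/bin/env bash
# Toy sketch, NOT GATK's algorithm: every mismatch against the reference is
# counted as an error unless its position is in the known-sites list.
set -euo pipefail

# Hypothetical observations: position <TAB> reported quality <TAB> mismatch (1 = differs from reference)
printf '100\t30\t0\n101\t30\t0\n102\t30\t1\n103\t30\t0\n104\t30\t1\n105\t30\t0\n' > observations.tsv

# Hypothetical known sites: position 104 is a real variant, 102 is a sequencing error.
printf '104\n' > known_positions.txt
: > no_known_positions.txt   # empty list, the equivalent of an empty VCF

# Print "<reported quality> <empirical quality>" for each reported-quality bin.
# Empirical Q = -10 * log10((mismatches + 1) / (observations + 2)), a smoothed estimate chosen for this sketch.
empirical_quality() {
  awk -F'\t' '
    FILENAME == ARGV[1] { known[$1] = 1; next }    # first file: known positions (may be empty)
    !($1 in known)      { n[$2]++; mm[$2] += $3 }  # second file: count observations outside known sites
    END {
      for (q in n) printf "%s %.1f\n", q, -10 * log((mm[q] + 1) / (n[q] + 2)) / log(10)
    }' "$1" observations.tsv
}

empirical_quality known_positions.txt     # prints "30 5.4": the real variant is masked
empirical_quality no_known_positions.txt  # prints "30 4.3": the real variant counts as an error
```

With the empty list, the real variant at position 104 is treated as a sequencing error, so the estimated quality for the Q30 bin drops; this is the effect the answer above warns about.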

  • Gigiux Member

    I understand that, but as I said, there are no known variants for this reference, so the VCF would be empty. Should I skip BQSR altogether, or is there another way of re-mapping the reads that does not require a VCF file? Thanks

  • bhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @Gigiux

    BQSR does not do re-mapping. It only recalibrates the base quality scores so that they are more accurate.
    Having said that, if you do not have known-sites information, it's best to skip the BQSR step. This comes with the caveat that you can expect more false positives in your variant calls.

    Regards
    Bhanu
