Effects of '--disable_indel_quals'

Hi,

Apologies if this has been previously asked or explained elsewhere, but I could not find anything by searching.

We're hoping to reduce the size of our BAM files by including the '--disable_indel_quals' flag during BQSR, but we're wondering:
1) If we're currently only interested in calling SNPs, will the lack of indel tags affect any of our downstream analyses?
2) If we eventually become interested in calling indels, could we rerun BQSR on the files created with '--disable_indel_quals' to obtain those indel tags?

Thank you in advance!

Best,
Monica
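
For readers unsure what the "indel tags" in the question are: BQSR in GATK 3 writes per-base insertion and deletion qualities into the optional SAM tags BI and BD, each a string as long as the read, which is why they inflate file size. The following is a minimal sketch, assuming SAM-format text lines (for example from `samtools view`), of how one might detect or drop those tags. The function names and the example record are hypothetical and only for illustration.

```python
# Optional SAM tags that GATK 3 BQSR uses for base insertion/deletion qualities.
INDEL_QUAL_TAGS = ("BI", "BD")


def indel_tags_in_record(sam_line):
    """Return the set of BI/BD tags present in one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    optional = fields[11:]
    return {f.split(":", 1)[0] for f in optional} & set(INDEL_QUAL_TAGS)


def strip_indel_tags(sam_line):
    """Return the same record with any BI/BD optional fields removed."""
    fields = sam_line.rstrip("\n").split("\t")
    kept = [f for f in fields[11:] if f.split(":", 1)[0] not in INDEL_QUAL_TAGS]
    return "\t".join(fields[:11] + kept)


# Hypothetical 4-base read carrying both indel quality tags and a read group.
EXAMPLE_RECORD = "\t".join([
    "read1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
    "ACGT", "IIII", "BD:Z:KKKK", "BI:Z:MMMM", "RG:Z:grp1",
])
```

Calling `indel_tags_in_record` on a handful of records from a BAM is a quick way to confirm whether a file still carries the tags before and after reprocessing.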

Answers

  • mdr232 (Member)
    edited April 2016

    Many thanks! One more question that I've been unable to find a clear answer for -- if a newer version of the GATK is released with a different indel realignment or BQSR method, is it generally okay to rerun a BAM that has already been realigned and recalibrated through the new version's realignment and recalibration? Or will it depend on the changes between the old and new method(s)? Thanks again!

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @mdr232
    Hi,

    It is best to use the original bam file. However, if you have already pre-processed the bams and would like to save time, you may not even need to re-run the pre-processing steps again.

    -Sheila

  • amywilliams, Ithaca, NY (Member)

    @Sheila said:
    @mdr232
    Hi,

    It is best to use the original bam file. However, if you have already pre-processed the bams and would like to save time, you may not even need to re-run the pre-processing steps again.

    -Sheila

    Hi Sheila,

    We have some bams that contain the indel tags but would like to regenerate bams without these tags (to save space). Can we start from our current realigned and recalibrated bams and rerun recalibration on these (including generating the recalibration table and applying recalibration) to get bams without the tags? I'm surprised by the requirement to use the original bam file. Since all the reads are present and realigned, why not restart the recalibration step from this bam to generate a bam without the tags?

    Many thanks,
    Amy

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    edited June 2016

    @amywilliams
    Hi Amy,

    That answer was for a different case. I think you are asking about simply removing the indel quals from your BAM files. In your case, you can simply re-run PrintReads with --disable_indel_quals.

    -Sheila
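
As a rough illustration of the suggestion above, here is a minimal sketch that assembles a GATK 3-style PrintReads command line with '--disable_indel_quals'. The jar path and file names are placeholders, and the helper name is hypothetical. The GATK 3 documentation describes this flag in connection with the -BQSR argument, so it is worth confirming on a small test BAM that the BI/BD tags are actually gone from the output.

```python
import subprocess


def print_reads_without_indel_quals(gatk_jar, reference, input_bam, output_bam):
    """Build a GATK 3-style PrintReads command that omits indel quality tags.

    All paths are supplied by the caller; nothing here is specific to a
    particular installation.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "PrintReads",
        "-R", reference,
        "-I", input_bam,
        "--disable_indel_quals",
        "-o", output_bam,
    ]


# Placeholder file names, for illustration only.
cmd = print_reads_without_indel_quals(
    "GenomeAnalysisTK.jar", "reference.fasta", "recal.bam", "recal.no_indel_quals.bam"
)

# To actually run it (requires Java and GATK 3 to be installed):
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.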

  • amywilliams, Ithaca, NY (Member)

    Good to know. We weren't aware of this and have unfortunately rerun recalibration on a few bams. Is this a problem? If needed we can do some reprocessing of the bams from some earlier stage in the pipeline.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    edited July 2016

    @amywilliams
    Hi Amy,

    Have a look at this article. You should stick with the same version throughout your analysis. I would either simply remove the indel qual scores as I suggested above, or re-run all the pre-processing steps on all your BAMs with the latest version of GATK.

    -Sheila

  • amywilliams, Ithaca, NY (Member)

    @Sheila:

    Thanks for the information. To clarify, the situation is this: for many bams, we can remove the indel tags using PrintReads as you mentioned (which is great). For some others, we mistakenly reran BQSR (using the same version of the GATK) in order to remove these tags. The question is, is this OK? Can BQSR work with recalibrated scores? I would assume that recalibration on top of previous recalibration would be fine since it will use the base scores in the bam itself to build a table for recalibration.

    Also, regarding your comment above: for a future release of the GATK, is it ideal to start from a bam that has only had markdup run on it, or can we take bams that we've already run through the entire GATK pipeline and restart from the realignment step? Again, it seems like running from that step forward would work, since realigning a realigned bam shouldn't hurt things and the base scores can just be recalibrated using the most up-to-date algorithm.

    Many thanks!

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    All of what you describe will work in the sense that it will run successfully, and the results are expected to be valid. However, we prefer to ensure that all samples are processed in exactly the same way (with the same program version) to eliminate any possible batch effects, so in most cases we would revert the files back to their original state and reprocess from scratch. It's up to you to choose how strict you want to be in your process.
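
One way to audit the "same program version" point is to compare the @PG lines in each BAM header. The following is a minimal sketch, assuming the header text has already been extracted (for example with `samtools view -H`) and that the tools in the pipeline recorded @PG lines with PN or ID and VN fields as the SAM specification allows. The function names, sample names, and version strings in the example are hypothetical.

```python
from collections import defaultdict


def program_versions(sam_header_text):
    """Map each program named in @PG header lines to the set of versions seen."""
    versions = defaultdict(set)
    for line in sam_header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs; values may contain ':'.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        name = fields.get("PN", fields.get("ID", "unknown"))
        versions[name].add(fields.get("VN", "unknown"))
    return dict(versions)


def inconsistent_programs(headers_by_sample):
    """Return programs that appear with more than one version across samples."""
    seen = defaultdict(set)
    for header in headers_by_sample.values():
        for name, vns in program_versions(header).items():
            seen[name] |= vns
    return {name: sorted(vns) for name, vns in seen.items() if len(vns) > 1}


# Hypothetical headers for two samples processed with different versions.
EXAMPLE_HEADERS = {
    "sampleA": "@HD\tVN:1.5\n@PG\tID:GATK PrintReads\tVN:3.5\tCL:-T PrintReads",
    "sampleB": "@HD\tVN:1.5\n@PG\tID:GATK PrintReads\tVN:3.6\tCL:-T PrintReads",
}
```

An empty result from `inconsistent_programs` only shows that the recorded versions agree; it does not prove the samples went through identical steps.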

  • amywilliams, Ithaca, NY (Member)

    @Geraldine_VdAuwera:

    OK this makes sense. I guess we should not have deleted the original bams with the quality scores from the sequencer. I guess that's the recommendation? One thing we could do to avoid batch effects would be to run BQSR twice on all the bams. In any case, thanks for your help!

    -Amy
