Does it hurt to run the "data pre-processing for variant discovery" workflow on "analysis-ready" BAM files?

This might be a very naive question, but I would like to confirm.

I have some BAM files from a collaborator (who got them from the Broad Institute, I believe), but I do not know whether they are "analysis-ready" (probably yes?). Is there a reliable way to check?

To be safe, I'm running the "Best Practices" data pre-processing for variant discovery workflow on those files. If they are already analysis-ready, it shouldn't hurt, right? Thank you!


  AdelaideR

    Hi @minimax

    I would ask the collaborator which genome they aligned their reads to. That determines which reference genome (hg19 or GRCh38) you need to supply.
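If asking is not an option, the BAM header often carries hints. Below is a minimal sketch, assuming you have dumped the header to text first (for example with `samtools view -H`). The function name `parse_header` and the `CHR1_LENGTHS` table are hypothetical helpers written for this illustration. The sketch guesses the build from the length of chromosome 1 in the `@SQ` lines and lists the programs recorded in the `@PG` lines. It only reports what the tools chose to write into the header, so treat the result as a hint rather than proof that a file is analysis-ready.

```python
# Sketch only: parse_header and CHR1_LENGTHS are illustrative names,
# not part of samtools, GATK, or any other library.

# Chromosome 1 lengths for two common human reference builds.
CHR1_LENGTHS = {
    249250621: "hg19/GRCh37",
    248956422: "GRCh38/hg38",
}


def parse_header(header_text):
    """Return (reference_guess, program_names) from SAM/BAM header text."""
    reference = "unknown"
    programs = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        # Header fields after the record type look like TAG:VALUE.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@SQ" and tags.get("SN") in ("1", "chr1"):
            # Guess the build from the recorded length of chromosome 1.
            reference = CHR1_LENGTHS.get(int(tags.get("LN", 0)), "unknown")
        elif fields[0] == "@PG":
            # Prefer the program name; fall back to the record ID.
            programs.append(tags.get("PN") or tags.get("ID", ""))
    return reference, programs


# Made-up header for demonstration; a real one would come from your BAM.
example = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@PG\tID:bwa\tPN:bwa",
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates",
])
reference, programs = parse_header(example)
print(reference, programs)
```

In the program list, entries for an aligner, duplicate marking, or base recalibration would suggest that the corresponding pre-processing step was already run. The exact names depend on the tool versions used, so which names to look for is an assumption you should confirm with the collaborator.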

    A BAM is just a file that contains reads aligned to a reference, so whether it is "analysis-ready" can depend on many factors: how the samples were sequenced, whether the DNA was extracted from contaminated tissue, etc. Many of these issues originate in the wet lab, so having a good idea of how the samples were treated, sequenced and preprocessed can help you decide whether they will be ready for

