Recommendations for human populations very different from the human reference

Hi,
I am working on genomes from human populations that are very different from (and more diverse than) the populations used to build the human reference genome, so we are worried about biasing our analyses by relying too heavily on the reference genome.
In particular, I have a question about the indel realignment step (RealignerTargetCreator and IndelRealigner): do you recommend using the lists of known indels based on the human reference for populations very different from the reference? Or should I run this step without the -known option?
Thanks,
Gwenna
Best Answer
Sheila Broad Institute admin
@GBr
Hi Gwenna,
You can try comparing your outputs with and without a -known file. I don't think it will make much of a difference either way. As long as you input the BAM file, the tool will look for areas that could benefit from realignment. Have a look at this PowerPoint for more information on how Indel Realignment works: https://www.broadinstitute.org/gatk/events/slides/1503/GATKwh6-BP-2-Realignment.pdf (page 9 might be of interest).
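One possible way to do that comparison is to run RealignerTargetCreator twice, once with -known and once without, and then diff the two target lists. The snippet below is only a rough sketch, not an official GATK utility. It assumes each target list has one interval per line, written as contig:start-stop (or contig:pos for a single position), and the helper names parse_intervals and compare_targets are made up for illustration.

```python
def parse_intervals(lines):
    """Turn 'contig:start-stop' or 'contig:pos' lines into a set of
    (contig, start, stop) tuples. The line format is an assumption."""
    intervals = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and '@' header lines, if any are present.
        if not line or line.startswith("@"):
            continue
        # Split on the LAST colon so contig names containing ':' survive.
        contig, _, span = line.rpartition(":")
        start, _, stop = span.partition("-")
        # A single position has no '-', so stop falls back to start.
        intervals.add((contig, int(start), int(stop or start)))
    return intervals


def compare_targets(with_known, without_known):
    """Compare two target lists (any iterables of lines, e.g. open files)."""
    a = parse_intervals(with_known)
    b = parse_intervals(without_known)
    return {
        "shared": a & b,
        "only_with_known": a - b,
        "only_without_known": b - a,
    }


def summarize(result):
    """Return one 'label: count' line per category, in sorted label order."""
    return [f"{label}: {len(result[label])}" for label in sorted(result)]
```

To use it, open the two target list files and pass the file objects to compare_targets, then print the lines from summarize. This only checks for identical intervals, so targets that overlap but differ in their boundaries are counted as different. If the "only_with_known" set is small, the -known file is probably not changing much for your samples.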
-Sheila
Answers
@Sheila
Thanks for the suggestion and for pointing to this powerpoint. I will compare what I obtain with and without a -known file.
/Gwenna