Pooled sequencing: indel realignment creates different results in different pools

Hi,
In the attached IGV screenshot there are six BAM panels. The top three (wider) panels are the three pools after indel realignment. The bottom three (narrower) panels are the same three pools just before indel realignment (but after deduplication).

I'm looking for SNPs that show large allele-frequency differences between these three pools. The pools are sequenced to about 20x depth, and each contains 20-40 individuals.

As you can see, in the realigned pools this looks like a very promising SNP on paper: the High and Ref pools carry the reference A allele, while the Low pool has a T allele that is nearly fixed at that position. However, the bottom three panels make it clear that this is just a false positive caused by the indel being realigned differently in the three pools.

I examined 38 highly differentiated SNPs by eye in IGV, and 13 of them were clear false positives caused by indel realignment. Do you know whether this has been observed before? Is there any a priori reason to avoid performing indel realignment on pooled sequencing data?

Many thanks,
m

Answers

  • ebanks (Broad Institute)

    Hi there,

    If consistency across pools is important to you, then you should realign all of your data together instead of realigning each pool separately. That will ensure that any realignment is done the same way in all of your pools.

    Out of curiosity, what command line are you using for the Indel Realigner job?
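A minimal sketch of what "realign all of your data together" could look like with GATK 3.x, where RealignerTargetCreator and IndelRealigner are run once over all three pools (multiple `-I` inputs) rather than once per pool. The jar path, reference, BAM file names, and intervals file name are hypothetical placeholders. The script only builds and prints the two command lines; it does not run GATK.

```python
import shlex
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"  # hypothetical path to a GATK 3.x jar
REFERENCE = "reference.fasta"      # hypothetical reference FASTA
POOLS = ["High.dedup.bam", "Ref.dedup.bam", "Low.dedup.bam"]  # hypothetical BAMs
INTERVALS = "all_pools.intervals"  # hypothetical shared target file


def input_args(bams: List[str]) -> List[str]:
    """Repeat -I once per BAM so a single job sees reads from every pool."""
    args: List[str] = []
    for bam in bams:
        args += ["-I", bam]
    return args


def target_creator_cmd(bams: List[str]) -> List[str]:
    """One RealignerTargetCreator run over all pools -> one intervals file."""
    return ["java", "-jar", GATK_JAR, "-T", "RealignerTargetCreator",
            "-R", REFERENCE, *input_args(bams), "-o", INTERVALS]


def realigner_cmd(bams: List[str]) -> List[str]:
    """One IndelRealigner run over all pools; -nWayOut writes one BAM per input."""
    return ["java", "-jar", GATK_JAR, "-T", "IndelRealigner",
            "-R", REFERENCE, *input_args(bams),
            "-targetIntervals", INTERVALS, "-nWayOut", ".realigned.bam"]


if __name__ == "__main__":
    for cmd in (target_creator_cmd(POOLS), realigner_cmd(POOLS)):
        print(shlex.join(cmd))
```

The key design choice is that all pools go into the same IndelRealigner job, so the realigner sees every read at a site before deciding how to realign it. `-nWayOut` is used instead of `-o` so the pools still come out as separate BAMs.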

  • marakat

    FYI, I tried what I think you suggested (making one intervals file using input from all three BAMs), and the SNP site from the screenshot above looks identical; in other words, using a single intervals file didn't resolve the issue. The "Low" pool is still aligned differently from the "High" and "Ref" pools, even though all three pools showed the indel in the same place in the deduplicated files before realignment. Any further thoughts on how to fix this, or is it better to skip indel realignment altogether?
    Thanks,
    m

  • Geraldine_VdAuwera (Cambridge, MA)

    I believe what @ebanks meant was to realign the reads together, not just to use the same intervals file. Since the realigner uses the reads it sees to decide how to realign, it may make different decisions for different batches, even when given the same intervals. The key is that the realigner has to see all the reads together so that it can realign them all the same way.
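The point that the realigner's decision depends on which reads it sees can be illustrated with a deliberately simplified toy model. This is not how IndelRealigner is implemented; it only assumes that some placement of the indel is chosen by scoring the candidate placements against the reads in the current batch. The candidate names (`indel_left`, `indel_right`) and the per-read costs are invented for illustration.

```python
from typing import Dict, List

# Toy model, NOT GATK's implementation: each read is reduced to a hypothetical
# mismatch cost against each candidate placement of the indel.
Read = Dict[str, int]


def pick_consensus(reads: List[Read]) -> str:
    """Return the candidate placement with the lowest summed cost over the reads seen."""
    totals: Dict[str, int] = {}
    for read in reads:
        for candidate, cost in read.items():
            totals[candidate] = totals.get(candidate, 0) + cost
    # Iterating in sorted order makes ties resolve deterministically (alphabetically).
    return min(sorted(totals), key=lambda c: totals[c])


LEFT_FIT = {"indel_left": 0, "indel_right": 3}   # read fits the left placement better
RIGHT_FIT = {"indel_left": 2, "indel_right": 0}  # read fits the right placement better

# Invented read mixes: the Low pool happens to contain more right-fitting reads.
pools = {
    "High": [LEFT_FIT] * 4 + [RIGHT_FIT] * 1,
    "Ref": [LEFT_FIT] * 3 + [RIGHT_FIT] * 2,
    "Low": [LEFT_FIT] * 1 + [RIGHT_FIT] * 4,
}

# Realigning each pool separately: each batch makes its own choice.
per_pool = {name: pick_consensus(reads) for name, reads in pools.items()}

# Realigning all pools together: one choice, applied to every pool.
all_reads = [read for reads in pools.values() for read in reads]
joint = pick_consensus(all_reads)

print(per_pool)  # {'High': 'indel_left', 'Ref': 'indel_left', 'Low': 'indel_right'}
print(joint)     # indel_left
```

In the toy, scoring each pool separately lets the Low pool settle on a different placement, which would shift its reads relative to the other two pools and could show up as an apparent allele-frequency difference. Scoring all reads together yields a single placement for every pool.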
