UnifiedGenotyper: reduced reads issue with indel calling

Dear all,

I find the behaviour of UnifiedGenotyper (UG) with reduced reads and indels very odd. I am using GATK 2.7.4 and I am dealing with an indel that looks completely legitimate in the normal (unreduced) BAM. Its mpileup line is:
11 47354522 C 16 .,..+3CAC,+3cac.,,+3cac,+3cac..+3CAC,.,,, :=>!!=>!!>!=?;>>

The reduced BAM mpileup looks like:
11 47354522 C 9 ,,+3cac..+3CAC,.,,, C!>!=?;>>
which I don't really have an issue with. I presume the forward and reverse reads containing the indels were collapsed into two composite reads.
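
To make the comparison concrete, here is a minimal sketch that counts the depth and the indel-supporting reads in the two mpileup base strings above. The function name `count_pileup` is made up for illustration, and the parsing only covers the standard mpileup symbols (`^` plus a mapping-quality character, `$`, and `+N`/`-N` indel runs), so treat it as a rough check rather than a full pileup parser.

```python
import re

def count_pileup(bases: str) -> tuple[int, int]:
    """Return (read_depth, indel_supporting_reads) for one mpileup base string."""
    depth = 0
    indels = 0
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # '^' marks a read start and is followed by one mapping-quality character.
            i += 2
        elif ch == "$":
            # '$' marks a read end and carries no base of its own.
            i += 1
        elif ch in "+-":
            # '+3CAC' style: sign, decimal length, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            length = int(m.group())
            indels += 1
            i += 1 + len(m.group()) + length
        else:
            # Any other symbol ('.', ',', a base letter, '*', ...) stands for one read.
            depth += 1
            i += 1
    return depth, indels

normal = ".,..+3CAC,+3cac.,,+3cac,+3cac..+3CAC,.,,,"
reduced = ",,+3cac..+3CAC,.,,,"
print(count_pileup(normal))   # (16, 5)
print(count_pileup(reduced))  # (9, 2)
```

So the unreduced pileup has 5 of 16 reads carrying the insertion, while the reduced one shows only 2 of 9.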

Now this indel absolutely refuses to be called, until I set the option
-minIndelCnt 2
and then the indel is found and called. It is as if the UG could not see that the reads supporting the indel are collapsed.
Similarly in the VCF file the call looks like:
0/1:8,2:10:85:85,0,407
So depth 10, which I cannot make sense of. In any case with 2 reads supporting the indel, it is not consistent with the unreduced BAM file.
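
For reference, a small sketch of how the sample field above breaks down, assuming the FORMAT column is `GT:AD:DP:GQ:PL` (the FORMAT string is not shown in the call line, so that key order is an assumption, and `parse_sample` is a hypothetical helper name).

```python
def parse_sample(format_keys: str, sample: str) -> dict:
    """Split one VCF sample column into typed fields, keyed by the FORMAT string."""
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    ad = [int(x) for x in fields["AD"].split(",")]  # per-allele depths: ref, alt
    return {
        "GT": fields["GT"],
        "AD": ad,
        "DP": int(fields["DP"]),
        "GQ": int(fields["GQ"]),
        "PL": [int(x) for x in fields["PL"].split(",")],
        # Fraction of AD-counted reads that support the alternate allele.
        "alt_fraction": ad[1] / sum(ad) if sum(ad) else None,
    }

# FORMAT key order below is assumed, not taken from the VCF itself.
call = parse_sample("GT:AD:DP:GQ:PL", "0/1:8,2:10:85:85,0,407")
print(call["AD"], call["DP"], call["alt_fraction"])  # [8, 2] 10 0.2
```

Under that assumption the call reports 8 reference and 2 alternate reads, i.e. the same 2 indel-supporting reads as in the reduced pileup.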

Is there something I do not understand here, or is there a bug in how the GATK UG handles indels in reduced reads?

Thank you in advance for suggestions,

Answers

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Are you using an older version of the GATK? Indels should not be compressed in current versions of Reduce Reads.

  • vplagnol (Member)

    I am calling with 2.7.4 (the latest), but I did reduce some of the files with 2.6.5. It varies, depending on when the BAMs were created. Is the suggestion to re-reduce every file? I guess that may make sense, but it is a big chunk of computational work.

  • vplagnol (Member)

    mmm... I had absolutely no idea about it, but this is most likely my mistake.
    That's absolutely key for me to know, thanks. I must have missed it somewhere, but I guess it would be a good idea to make it very obvious in the documentation.

    Thanks.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I'll clarify this in the documentation, thanks for bringing it up.

  • vplagnol (Member)

    Also, I suppose that some versions will be backward compatible. Knowing which ones would help identify the cases where it is OK to upgrade without redoing earlier steps.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    The hard-line answer is that it's never really safe to mix and match between versions; ideally you should use the same version for every step of the pipeline on every sample in your project. If you do need to switch versions, you should redo any work that was previously completed with a different version.
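
One practical way to spot mixed versions is to look at the `@PG` lines in each BAM header (for example, the text printed by `samtools view -H`). The sketch below only relies on the `ID` and `VN` tags defined for `@PG` in the SAM specification. The helper name `program_versions` and the example header are hypothetical, and the exact `ID` and `VN` strings that ReduceReads writes are an assumption here, so check them against a real header.

```python
def program_versions(header_text: str) -> dict[str, str]:
    """Map each @PG ID in a SAM/BAM header to its VN (version) tag."""
    versions = {}
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "ID" in tags:
            versions[tags["ID"]] = tags.get("VN", "unknown")
    return versions

# Hypothetical header text; real ID/VN values may look different.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.5a\n"
    "@PG\tID:GATK ReduceReads\tVN:2.6-5\n"
)
print(program_versions(example_header))
```

Running this over the header of every reduced BAM in a project would show which files were produced by an older version and may need to be redone.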
