QCing IndelRealigner on low-coverage (10x) mouse WGS data

DEMember

Gidday,

I've run IndelRealigner on my mouse WGS *.bam files, using known-sites data from the Sanger MGP, and now I'm trying to figure out how "well" it worked.

The list created by RealignerTargetCreator contains 6,547,185 intervals.

I used the default settings, which means that:
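
For context on that number, a rough way to summarise the target list is sketched below. It assumes the usual GATK-style interval strings ("chr1:100-250", or "chr1:100" for a single position); the function names are just mine for illustration, not anything from GATK.

```python
import re
from statistics import median

# GATK-style interval strings: "chr1:100-250", or a single position "chr1:100".
INTERVAL_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_interval(line):
    """Return (contig, start, end) for one interval string (1-based, inclusive)."""
    m = INTERVAL_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised interval: {line!r}")
    start = int(m.group("start"))
    # A bare position such as "chr1:100" is treated as a 1 bp interval.
    end = int(m.group("end")) if m.group("end") else start
    return m.group("contig"), start, end


def summarise_intervals(lines):
    """Summarise interval count and lengths, skipping blank lines."""
    lengths = []
    for line in lines:
        if not line.strip():
            continue
        _, start, end = parse_interval(line)
        lengths.append(end - start + 1)
    if not lengths:
        raise ValueError("no intervals found")
    return {
        "count": len(lengths),
        "total_bp": sum(lengths),
        "median_bp": median(lengths),
        "max_bp": max(lengths),
    }
```

Calling `summarise_intervals(open("targets.intervals"))` (file name hypothetical) gives the count, the total bases covered, and the median and maximum interval length, which at least shows whether the list is dominated by single-position targets.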

1) -model was USE_READS - and from what I've read, this is the correct option to use, given that Smith-Waterman modelling doesn't give greatly improved results;

2) -LOD was 5.0 - but for my data, which is mouse whole-genome sequence at average 10x coverage, this may be too high and I might be losing true positives.

I've tried randomly picking candidate intervals from the interval list, and OC-tagged reads from the realigned.bam file, to check by eye. Is there a more empirical way of checking how good the realignment was? (I realise there's "no formal measure" as per the presentation, but I'm finding it hard to make a judgement call!)
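
To make the spot checks a bit less anecdotal, here is a minimal sketch of how the OC-tagged reads could be counted. It works on plain SAM text lines and relies only on the OC:Z: tag (original CIGAR) that IndelRealigner writes on reads whose alignment it changed; the function name is hypothetical and this is not an official QC metric.

```python
def realigned_fraction(sam_lines):
    """Count reads carrying an OC:Z: tag among SAM alignment lines.

    Returns (realigned, total). Header lines (starting with "@") and
    lines with fewer than the 11 mandatory SAM columns are ignored.
    """
    total = 0
    realigned = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        total += 1
        # Optional tags start at column 12 (index 11) in SAM.
        if any(tag.startswith("OC:Z:") for tag in fields[11:]):
            realigned += 1
    return realigned, total
```

The input could be the text output of something like `samtools view realigned.bam chr1:100-250` for one target interval. Comparing the realigned/total ratio across a random sample of intervals, or between runs with different -LOD values, gives a number to compare rather than an impression, though it still doesn't say whether each realignment was correct.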

My feeling from looking at the intervals or realigned reads is that the low coverage is a major issue in terms of identifying "true" indels, so preferably we'd go for specificity over sensitivity.

