Does Unified Genotyper use information from soft clipped portions of reads?
I have put 9 samples through a variant calling pipeline that closely follows the standard recommended one, and I am puzzling over some insertion calls from GATK 2.3-9. The insertions are called as het in samples whose reads appear to show no support for them. There are some interesting features:
- several of the other samples called concurrently contain reads supporting the insertion
- there are two reads in the sample in question that were soft clipped, and the soft clipped portion could be realigned in such a way as to support the insertion
- if I call the sample on its own, no insertion call is made
It seems to me that GATK is using information from the soft clipped portion of the reads (and performing a realignment on them), and that it is additionally inferring evidence for the insertion from haplotyping and linkage with other samples. I couldn't find anything in the documentation about how soft clipping is handled, so I was wondering if anybody could clarify whether soft clipped regions are used and, of course, offer any other thoughts on how calls like this might arise.
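For anyone wanting to check their own reads for the same pattern, here is a minimal sketch of how soft clipped reads can be picked out from the CIGAR column of a SAM record. It relies only on the SAM convention that `S` marks soft clipped bases and `H` marks hard clipped ones. The function name `soft_clip_lengths` is made up for illustration, and this says nothing about what GATK does internally.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft clipped base counts for a CIGAR string.

    Hypothetical helper for illustration; an unavailable CIGAR ("*")
    yields (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips, so drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return (0, 0)
    leading = core[0][0] if core[0][1] == "S" else 0
    # Require more than one element so a lone "S" is not counted twice.
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (leading, trailing)
```

Running this over the CIGAR strings of the reads overlapping a suspect site shows which reads carry clipped bases next to the putative insertion, and so which ones are worth inspecting by eye.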