Is it possible to disable DuplicateReadFilter in UnifiedGenotyper?
My understanding of DuplicateReadFilter is that is removes PCR duplicates (ie: identical reads that map to the same location in the genome). Is there a way to prevent UnifiedGenotyper from using this filter?
Many of my 'duplicate' reads are not really duplicates. I have 50bp single ended yeast RNA-seq data, and 10 samples/lane results in highly over-sampled data. Removing duplicates results in an undercounting of reads at the most highly expressed genes, and therefore an increased number of sub-clonal variants at these genes (because the reads from the major clonal population are discarded, but reads of sub-clonal populations are kept because they have a different sequence). At least I think that is what is happening.